Sunday, January 10, 2010

what's in a malware name

...that which we call conficker by any other name would taste as sour.

david harley, tom kelchner, and mary landesman have all posted their responses to an infosecurity article questioning the apparent lack of consistency in malware naming.

they all say more or less the same thing: the deluge of modern malware makes harmonization of names impossible. to a certain extent they're right, but to a certain extent they're also wrong - not so much in the technical details of their answer as in the way they frame the problem the infosecurity article was underlining.

now the truth is i had actually planned on writing about malware naming some time ago in response to another of david harley's articles in which he basically says malware names are irrelevant. i can see where he's coming from with that, and probably you can too. a malware detector doesn't care what the name of the malware is, only whether it's there or not - and the consumer of the malware detector generally won't care that much about the name either (certainly not whether it's the same name that all the other vendors use). in the consumer's worst case scenario all they really need is some sort of unique identifier, be it a number, a GUID, or some made up nonsense word (oh, wait, that's what they get now) in the event that they need to call up the vendor for support.

but there's a problem with this line of thinking and i'll demonstrate it with a little thought experiment. let's take all the bones in the human body and replace their current identifiers (such as scapula, ulna, radius, etc) with numbers, or GUIDs, or made-up nonsense words. now try having an intelligible discussion with someone about bones you've broken over your lifetime. can you imagine how much more difficult that would be? replacing the current names with made-up nonsense words would only pose the difficulty of adjusting to new names, but GUIDs would be far too unwieldy for people to use, and numbers would carry numerical-relationship baggage that would confuse the issue. let's take one more step in this thought experiment, however. let's say there are 50 different people, each with their own set of replacement identifiers for the bones in the human body, and let's say that they are collectively trying to advise people on bone health. how well is that really going to work? not very well, obviously.

while it is true that malware today is far too numerous to harmonize the naming for each and every instance, we can't let the great become the enemy of the good. if the anti-malware world revolved exclusively around the production and consumption of malware detectors then names really would be unimportant and irrelevant, but the fact is that in such a world people like david harley and tom kelchner and mary landesman wouldn't be blogging about such things because those blogs would also be irrelevant.

the thought experiment above demonstrates when names are important and why consistent names are important. names are important when you're dealing with people rather than just technology. they are important when you are trying to communicate information about threats, trends, etc. to people. people need names for things, and frankly they need to be fairly simple names - that's why storm, loveletter, and code red catch on while waledac, virut, and sality wallow in obscurity, and why people keep misspelling conficker. heck, it's why meteorologists name significant weather formations like hurricanes using human given names like harry or katrina. people also need for multiple authorities to agree on the names for things or else they can't integrate data from multiple sources and are left disoriented and confused.

again, we can't let the great (harmonizing the naming of all malware instances) become the enemy of the good (harmonizing the naming of the relative handful of malware instances the industry considers significant enough to write about in things like year-end threat reports). it may be impossible to coordinate names for each malware instance in existence, and entirely pointless even if it were possible, but the same does not hold true for the small set of malware that vendors write about by name. just so we're clear, i'm not suggesting that such coordination need take place before releasing detection for the aforementioned malware. what i have in mind is something not unlike the now-defunct common malware enumeration, except using names instead of numbers - a post hoc harmonized second name (a common name or layman's name) for those few pieces of malware that the industry feels it needs to communicate to the masses about.
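to make the idea a bit more concrete, here's a minimal sketch of what such a post hoc common-name registry might look like, assuming a shared table that maps each vendor's own detection name to one agreed layman's name. the vendors, detection names, and counts are all hypothetical; the point is only that once the significant few share a common name, data from different vendors' reports can be integrated instead of sitting in separate vocabularies.

```python
# A minimal sketch of a post hoc "common name" registry, assuming a shared
# table that maps each vendor's own detection name to one agreed name.
# All vendor names, detection names, and counts here are hypothetical.
from typing import Dict, Optional, Tuple

# (vendor, vendor's own detection name) -> agreed common name
COMMON_NAMES: Dict[Tuple[str, str], str] = {
    ("vendor_a", "Worm.Win32.Alpha.b"): "conficker",
    ("vendor_b", "W32/BetaWorm"): "conficker",
    ("vendor_c", "Worm:Win32/Gamma"): "conficker",
}


def common_name(vendor: str, detection: str) -> Optional[str]:
    """Return the harmonized name, or None if this detection was never harmonized."""
    return COMMON_NAMES.get((vendor, detection))


def merge_reports(reports: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Combine per-vendor counts keyed by detection name into counts keyed by
    common name. Detections without a common name are kept under a
    vendor-qualified name so nothing is silently dropped."""
    merged: Dict[str, int] = {}
    for vendor, counts in reports.items():
        for detection, count in counts.items():
            key = common_name(vendor, detection) or f"{vendor}/{detection}"
            merged[key] = merged.get(key, 0) + count
    return merged


if __name__ == "__main__":
    reports = {
        "vendor_a": {"Worm.Win32.Alpha.b": 120, "Trojan.Foo": 5},
        "vendor_b": {"W32/BetaWorm": 80},
    }
    print(merge_reports(reports))  # {'conficker': 200, 'vendor_a/Trojan.Foo': 5}
```

anything that was never harmonized stays under its vendor-qualified name, so the table only ever needs to cover the handful of threats that actually get written about.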

of course, after all that is said and done, even if naming were consistent i fully realize that different vendors' reports would list different sets of malware, and to that end people still need to understand that such reports reflect not the actual threat landscape but what the vendor has seen of the threat landscape. that said, there should still be overlap between the sets of malware different vendors use in their reports, and if there isn't, that suggests sampling bias pronounced enough to render those models of the threat landscape irrelevant.


cdman83 said...

IMHO the single biggest problem with "malware naming harmonization" is the difference in detection methods.

For example, let's say that we have a malicious file F which is detected by two malware scanners, S1 and S2. Furthermore, let's say that S1 detects the file as "Malware1" because it contains the string ".evil" in the headers, and S2 detects it as "Malware2" because it contains the string ".bad". This means:

- both scanners correctly determine that the file is malware
- however, because of the different detection methods, you can't assume the equation "S1/Malware1" == "S2/Malware2" - there may very well be files out there which will be detected by S1 as Malware1 but by S2 as something else, and vice versa

And I'm not even considering the problem of naming "families" (i.e. if you have a downloader which today downloads Malware1 but tomorrow downloads Malware2 because the second one pays better, how should it be classified? As Malware1? Malware2? Something different?)

To put it another way: the set of "malicious files" detected by S1 is the union of smaller detections (which may overlap, i.e. a file might be detected by multiple heuristics/signatures), and so is the set of files detected by S2. Because they are developed separately, it isn't reasonable to expect anything like a one-to-one correspondence between these subsets.
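A minimal sketch of the two-scanner example, assuming scanners that do nothing but plain substring matching (real engines are far more elaborate). S1, S2, the signature strings, the extra "payload" signature, and the file contents are all hypothetical:

```python
# A minimal sketch of the two-scanner example. Signatures, names, and file
# contents are hypothetical; real scanners use far richer detection logic
# than a plain substring match.
from typing import Dict, Optional

# Each scanner is a union of signatures: byte pattern -> name it reports.
S1: Dict[bytes, str] = {b".evil": "Malware1"}
S2: Dict[bytes, str] = {b".bad": "Malware2", b"payload": "Malware3"}


def scan(signatures: Dict[bytes, str], data: bytes) -> Optional[str]:
    """Return the name of the first matching signature, or None if nothing matches."""
    for pattern, name in signatures.items():
        if pattern in data:
            return name
    return None


files: Dict[str, bytes] = {
    "F": b"header .evil .bad body",  # both scanners flag it
    "G": b"header .evil payload",    # S1 says Malware1, S2 says something else
    "H": b"header .bad body",        # only S2 flags it
}

results = {label: (scan(S1, data), scan(S2, data)) for label, data in files.items()}

for label, (n1, n2) in results.items():
    print(f"{label}: S1={n1}, S2={n2}")
```

File F is the case where it is tempting to conclude that S1/Malware1 == S2/Malware2. G and H show why that equation fails: the same S1 name lines up with two different S2 names, and S2 flags a file S1 misses entirely, so no one-to-one translation table falls out of the detections themselves.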

PS. IMHO one of the sources of this misunderstanding is that in "the good old days" virus writers worked separately and there were few new viruses. So (a) it was easy to distinguish "family X" from "family Y", and (b) the companies had time to check how others were detecting it and come up with a similar name. These days, however, there is a lot of cross-pollination and variants come out very rapidly, making this process unfeasible.

Also, I am all for giving nonsensical names to malware (such as GUIDs), because I've often seen people imagine that the complete behavior of the malware is encoded in the name (i.e. the malware is named "INF/Autorun", which means we don't have to worry that it spreads through IM, since the name doesn't say so).

David Harley said...

Well, I agree with you more than I disagree. :) But I thought my response needed more time and space than I have here, so I blogged it at ESET.

kurt wismer said...

i have no doubt there are many difficulties in the realm of "deconfliction" (which is what the CME called the process of deciding whether a particular instance was the same as an already enumerated piece of malware or not) but there was a process for it.

whatever those problems are, however, at the end of the day if a vendor can give something a name for use in a threat report, they can also agree with other vendors about what that name should be. they don't need to do that for all of them or even most of them - threat reports show what? the top 10 or 20 pieces of malware seen throughout the year? harmonizing those names should be doable.

- also, good point on the name interpretation. caro-compliant names never say what the malware doesn't do, they only imply some part of what it does do.

Mike said...

You make a good point, and it is one I often make about encryption. There are just too many standards out there for any smooth communication to occur. I think there are some companies who are getting it right with their approach to malware, but many malware just can't seem to get their fundamentals down.

cdman83 said...

@kurt: you seem to have been spammed :-) (see Mike's comment above). But fear not, you are in good company ;-) (see this blogpost).

kurt wismer said...

without any other context i settled on the interpretation that mike was just expressing an opinion, but in light of the new context you've brought you appear to be right about mike being up to no good.

while i'm always slightly suspicious of comments that contain links, the links in this comment point to legitimate and well known vendors - and he points to 2 competing vendors, which seems unlikely behaviour for someone getting paid by one of them.

it's possible this person is astroturfing or maybe even trying to smear the reputation of both companies.

at any rate, it certainly shouldn't do anything to improve either's pagerank since the rel="nofollow" attribute is set. i also doubt regular readers will be swayed by it since i think they're already quite familiar with both companies. i think i'll leave the comment as evidence.