Monday, April 02, 2007

the myth of meaningful informal anti-malware tests

bad tests are not a problem unique to the anti-malware field, but they are one that those in the anti-malware community have encountered countless times... it's not that average folks are physically incapable of performing good tests, it's that they simply don't do it...

an informal test lacks strict adherence to established testing protocols, often because the standards set for good tests are so high that most people can't be bothered to go to all that trouble... only those who are dedicated to the process of anti-malware testing have ever done a half-decent job of it, because it requires a substantial investment of time and effort, not to mention a certain degree of expertise, to ensure that the test-bed is free of garbage/scanner fodder and duplicates...

i could stop there, i suppose, and hope that what i've said isn't too abstract, but i really prefer object lessons and one sort of fell into my lap... dr. anton chuvakin has used the results of one such bad test to support the now-popular opinion that anti-virus is dead... i saw where this was going early on (the original question was obviously loaded)... i guessed the results would be bad and suggested the most likely reason was that the malware was too new - to which i was told that some samples were weeks old, though weeks old can technically still be new if no one else has reported the sample yet... additionally, if a particular piece of malware isn't affecting many people the av companies may down-prioritize it in order to focus on samples that are affecting more people...

the test, as described so far, worked like this: someone who does incident response at a public university collected samples from compromised machines for a period of time and then, when the batch was big enough (weeks after the collection was started), submitted them to virustotal and took note of the results... the results were as follows: the best detection rate was 50%, the worst was 2%, and the average was 33%...
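
(just to be concrete about what those summary numbers mean, here's a minimal sketch of the arithmetic... the engine names and per-engine counts below are hypothetical - i've only chosen them so the best/worst/average work out to the figures quoted above - and the batch size of 50 is an assumption that i'll justify in a moment...)

    # hypothetical per-engine results for a virustotal-style scan of one batch of samples
    total_samples = 50  # assumed batch size (argued below to be the most likely figure)
    detections = {"engine_a": 25, "engine_b": 22, "engine_c": 18, "engine_d": 1}  # made-up counts

    # detection rate = fraction of the submitted samples each engine flagged
    rates = {name: hits / total_samples for name, hits in detections.items()}
    best = max(rates.values())                  # 0.50 -> "the best detection rate was 50%"
    worst = min(rates.values())                 # 0.02 -> "the worst was 2%"
    average = sum(rates.values()) / len(rates)  # 0.33 -> "the average was 33%"
    print(best, worst, average)

of course, the arithmetic itself isn't in dispute - the problems are with what the samples were and where they came from...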

now, i can see a lot of problems with such a test but let's start with the big ones:
  1. statistical significance - the sample size was too small... it may sound picky to those who want to believe, but even an unskilled labourer intuitively knows that asking 5 people their opinion on X does not give you a result that can be generalized to the entire population... this test (or at least the interpretation of it that we've been given) pretends to represent the ability of anti-virus products to protect against malware that is currently in the wild affecting people, but most likely has a sample size of about 50 (a 2% detection rate requires at least 50 samples - 1 out of 50 - while larger sample sizes make an exact 50% detection rate increasingly unlikely; see the sketch after this list)... since the wildlist has over 1600 different pieces of malware on it, and since that only represents self-replicating malware (which is believed to be in the minority now), a sample size of 50 (or even 100) just doesn't cut it...
  2. sample selection bias - the samples came from compromised machines in a production environment... maybe that sounds reasonable to you but let me ask you this - if you test a variety of lie detectors against people who have proven themselves to be good at fooling lie detectors, are you really measuring how good those lie detectors are in the real world? the answer is no, you're only measuring how good they are against an artificially constructed set of people... the same goes for malware from compromised production machines - the malware you find there is the malware that has proven itself to be good at evading anti-malware scanners, not in-the-wild malware in general...
  3. sample selection bias - the samples came solely from computers in an academic institution... sorry to say, but universities are not a typical environment... if anything, they are the place where the incidence of truly malicious insider threats is greatest (uni students practicing their l33t skills)... home users don't intentionally compromise their own computers, and enterprise users may consider it (or even try it) but there are more/stronger controls in place (stricter logical access controls, more personally significant administrative controls, etc.) to prevent and/or deter it...
  4. test-bed integrity - the test was carried out by incident response personnel... we really have no way of knowing how clean (in the sense of being free of detrimental garbage) the test-bed was... even if it was an incident response technician who was capable of reverse engineering each sample of malware (and had the free time in which to do it), simply reverse engineering the samples would not be enough to determine whether they were different enough from one another to be considered distinct pieces of malware... and since many incident response technicians simply follow a script, there's a serious question as to whether all the samples really were malware rather than scanner fodder, and whether any of them were duplicates...
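
(as promised in point 1, here's a minimal sketch of the sample size problem... none of this comes from the test itself - the function names are mine and the wald-style interval is just a standard back-of-the-envelope approximation - it's only meant to show how coarse and uncertain a detection rate measured on ~50 samples really is...)

    import math

    def min_samples_for_rate(rate):
        # the smallest non-zero detection rate expressible with n samples is 1/n,
        # so a reported rate of 2% implies a test-bed of at least 1/0.02 = 50 samples
        return math.ceil(1 / rate)

    def wald_interval(observed_rate, n, z=1.96):
        # rough 95% confidence interval for a detection rate measured on n samples...
        # the wald approximation assumes the samples were a random draw from the
        # malware population, which points 2 and 3 above argue they were not -
        # this only illustrates the sample size issue, not the bias issues
        margin = z * math.sqrt(observed_rate * (1 - observed_rate) / n)
        return max(0.0, observed_rate - margin), min(1.0, observed_rate + margin)

    print(min_samples_for_rate(0.02))  # 50
    print(wald_interval(0.33, 50))     # roughly (0.20, 0.46) - a spread of ~26 points
    print(wald_interval(0.33, 1600))   # roughly (0.31, 0.35) at a wildlist-like scale

in other words, even before you get to the selection bias problems, an average of 33% measured on ~50 samples could plausibly reflect anything from about 20% to about 46%... that is not a number you can generalize from...
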
now, as i've already said, those of us who have been in the anti-malware community for a while have seen countless examples of these kinds of tests, so we know to take them with more than just a single grain of salt... unfortunately, the average person doesn't know why they should be wary of such tests, and when security experts use these same sorts of tests to prop up FUD that scares people away from anti-virus products it doesn't help anyone... it's bad science, and a doctor of physics should know better...

4 comments:

Unknown said...

I really came close to posting on my own blog something about this. I think AV has sadly been misconstrued over years and years of consumeristic marketing hype. AV is very much useful and good to have, but it is not necessarily meant to protect against "everything" or against the newest attacks.

Of course, that can mean some people have bought into the marketing hype that has twisted the purpose of AV and declare it dead just because it won't catch everything. This is what I call denouncing something because it has a flaw (god forbid). Sadly, many of the same people will say there is no silver bullet... Which kinda means we're just gonna always say everything is dead forever... D'oh!

kurt wismer said...

well, don't let me stop you from posting about this if there's something you feel you need to say about it... everyone has a voice...

i agree with you that the av hype has really backfired and hurt the av industry... for years i have been saying that these products are not solutions but rather they are just tools that can help us to protect ourselves...

if people would just stop looking at av as if it's supposed to be some kind of panacea and start treating it as just another tool with a limited scope of applicability (just like everything else) folks would be much better off...

Anonymous said...

The results are obvious, in fact - there are some trends that lead to complete ineffectiveness of AV tools.

1. Only undetectable malware can bring revenue to its author.
2. The number of malware is increasing rapidly.

Both trends lead to anti-virus lab overload and a growing number of undetectable malware. Also, the time delta between a new malware release and its analysis by AV people is growing, as many users can't identify malware files and send them to the lab (especially if it is a rootkit).

The real problem is that AV companies advertise their software as first-line defense tools, but reactive signature detection can only be a second-line defense!

kurt wismer said...

@ilya rabinovich
"The results are obvious, in fact- there are some trends that lead to complete ineffectivity of AV tools."

av tools are far from completely ineffective... you've repeated the standard arguments for why anti-virus isn't effective against new/unknown malware but failed to acknowledge that new/unknown malware doesn't stay new/unknown for long and that it's not just new/unknown malware out there...

for example:
"1. Only undetectable malware can bring revenue to its author."

arguable, at best, since (for example) people have been hitting my blog looking for cell-phone spyware products long after detection for them was added to anti-virus products... and then there was the person looking for removal instructions for a virus made in 1992...

"2. The number of malware increasing rapidly."

and anti-virus vendors continue to make advances in the automation of malware analysis, thus accelerating the rate at which they can process malware samples...

"The real problem is that AV companies advertize their software as a first-line defense tools, but reactive signatures detection can be only second-line defense!"

traditionally, anti-virus was the first line of defense - and, with the exception of those who use technologies that examine content as it's just coming off the wire, for most folks anti-virus still is the first line of defense...

what's interesting to me, though, is that this would be brought up by someone whose product represents the last line of defense and it makes me think that i should write a post about what the defensive lines in end-point anti-malware security are...