Tuesday, April 12, 2011

it's not a detection rate

(this has been stewing for a little while now)

look, i realize that virustotal performs a series of detection tests in order to get its results. i also know that it expresses those results as something that looks a lot like a rate. but as much as you may want to, as much intuitive sense as it may make, don't mistake those results for a detection rate.

first, let's deal with the elephant in the room. in the anti-malware world, the term "detection rate" has already been used for something else. traditionally a detection rate is arrived at by testing an anti-malware product against many malware samples in order to see how good the product is at detecting malware. this is what detection rate has meant for somewhere on the order of two decades, and it bears little relation to what virustotal does.

the inverse of that method, to test a single sample against many anti-malware products in order to see how bad anti-malware technology is, is in theory similar to what virustotal does but it differs in two very important ways:
  1. the purpose of virustotal's test is to give the user an indication of whether the submitted sample is likely to be malware rather than to determine how bad anti-malware technology is
  2. the way virustotal uses anti-malware products in its testing does not lend itself to an accurate determination of whether a particular product can detect a particular sample (as i've discussed over and over again, and as hispasec themselves mention)
i mulled over the idea that even though it's not a detection rate, and it's not even an inverse detection rate, maybe it could at least represent the lower bound of an inverse detection rate since the most obvious methodological problems (if we were trying to interpret virustotal results the way some people seem to want) would lead to detection capabilities being under-reported. even if it could be called that, however, an inverse detection rate lower bound is so abstract that there's little benefit in using the term with the general population.
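
to make the difference in shape concrete, here's a minimal python sketch. everything in it is an assumption made up for illustration: the product names, the sample names, the true/false values, and both function names. it models none of how virustotal actually runs its scanners. it only shows that a traditional detection rate is computed across many samples for one product, while a virustotal-style result is computed across many products for one sample.

```python
# hypothetical scan outcomes: results[product][sample] is True when that
# product flagged that sample. all names and values here are invented.
results = {
    "product_a": {"sample_1": True,  "sample_2": True,  "sample_3": False, "sample_4": True},
    "product_b": {"sample_1": True,  "sample_2": False, "sample_3": False, "sample_4": True},
    "product_c": {"sample_1": False, "sample_2": True,  "sample_3": True,  "sample_4": True},
}


def detection_rate(results, product):
    """one product against many samples: the traditional meaning of the term."""
    flags = results[product]
    return sum(flags.values()) / len(flags)


def per_sample_result(results, sample):
    """one sample against many products: the shape of a virustotal-style result."""
    hits = sum(1 for flags in results.values() if flags[sample])
    return hits, len(results)


print(detection_rate(results, "product_a"))    # a statement about product_a
print(per_sample_result(results, "sample_3"))  # a statement about sample_3
```

the first number says something about a product and the second says something about a sample. even in this toy table they are different quantities, and the toy table is still more generous than reality because it assumes every true/false entry is an accurate reflection of what each product can do.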

i'm tempted to suggest people just call the results a score, but when you compare "virustotal results" with "virustotal score" you realize you're not really saving much more than 2 keystrokes by using that term. there's no intuitive meaning to be had in either of them. it seems the kind of intuitive meaning that people hope to convey by calling it a detection rate simply can't be had.

as such, whether you're a recognized and well-regarded expert like dancho danchev who explicitly calls it a detection rate, or brian krebs who tries to infer meaning about anti-malware technology by looking at the results, or even if you're just someone who relies on such experts for their accurate analyses and informed opinions - remember that virustotal is for testing samples, not anti-malware products. don't try to infer meaning from virustotal test results about something virustotal isn't meant to test. you will most likely fail.

2 comments:

David Harley said...

Amen.....

Mr Adam Smolkowicz said...

Hey thanks for this article