Wednesday, January 24, 2007

the myth of overwhelming numbers

one of the myths put forward by the "anti-virus is dead" crowd is that anti-virus companies just can't keep up with the malware authors... this article about the death of anti-virus, balanced though it may be, contains the following quote that demonstrates exactly what i'm talking about:
"The traditional, signature-based technologies are simply not able to keep up with the sheer volume of malware that's out there," Jaquith told TechNewsWorld. "There are over 200,000 unique pieces of malware out there. Some host intrusion vendors say that number is closer to a million."
200,000 indeed - that does seem like an awfully big number, so how could anti-virus companies possibly keep up? in actuality, this is an example of lying with numbers... although the 200,000 figure is fairly accurate (at least it's in line with what mcafee and f-secure were saying last time i checked), it is not something you keep up with... keeping up is something you do with a rate (a certain number of things per minute or hour or day or year), and 200,000 isn't a rate, it's a total... to turn that total into a rate we'd need to include the time period over which it accumulated, and although 200,000 sounds like a pretty big number, 200,000 over the course of 20 years doesn't sound quite so overwhelming...
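turning a total into a rate is just a division by the time period, and a couple of lines of python make the point... this is only a back-of-the-envelope sketch: it assumes the 200,000 samples accumulated evenly over 20 years of 365 days each, and the function name average_daily_rate is made up for illustration...

```python
DAYS_PER_YEAR = 365  # ignoring leap days; this is a rough estimate

def average_daily_rate(total: int, years: float) -> float:
    """Convert a cumulative total into an average per-day rate over `years` years."""
    return total / (years * DAYS_PER_YEAR)

# 200,000 samples, assumed to have accumulated evenly over roughly 20 years
rate = average_daily_rate(200_000, 20)
print(f"{rate:.1f} samples per day")
```

the even-accumulation assumption is exactly what makes this figure misleading, since malware production has been anything but constant.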

i could end there, but that would be misleading in the opposite direction... 200,000 over 20 years works out to about 27 per day, but that's not the right figure for today, because it assumes a constant rate of malware production over those 20 years and that rate has been anything but constant... in this article stanislav shevchenko (the head of kaspersky labs' virus lab) is quoted as saying that november 2006 saw 10,000 malware samples added to their database - that works out to about 333 per day... what's more, it's apparently 5 times the number from january 2006...
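to see just how far the flat 20-year average is from the recent rate, here's a small comparison sketch... it assumes november has 30 days and reads the "5 times january" remark literally, so january 2006 is taken to be 10,000 / 5 = 2,000 samples over 31 days (that january figure is derived, not quoted)...

```python
def daily_rate(samples: float, days: int) -> float:
    """Average number of samples per day over a period of `days` days."""
    return samples / days

nov_2006 = daily_rate(10_000, 30)        # quoted november 2006 count; november has 30 days
jan_2006 = daily_rate(10_000 / 5, 31)    # assumption: january was exactly 1/5 of november
twenty_year_avg = 200_000 / (20 * 365)   # flat average from the 20-year calculation

print(f"november 2006: {nov_2006:.0f} per day")
print(f"january 2006:  {jan_2006:.0f} per day")
print(f"november vs 20-year average: {nov_2006 / twenty_year_avg:.1f}x")
```

the november rate comes out roughly 12 times the flat average, which is why the 27-per-day figure understates the current load.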

now you're probably thinking we're back in overwhelming-numbers territory again, but wait - stanislav has more figures: specifically, a good analyst takes about 5 minutes to process an average malware sample... add in the stated constraint of a 12 hour shift and a good analyst should be able to process 144 samples per day (12 hours x 60 minutes / 5 minutes), so november's incredible figures could probably have been handled by a grand total of 3 analysts with time to spare (3 x 144 = 432 per day against roughly 333), assuming there weren't too many samples that were significantly more complicated than average - and even if that assumption is false, anti-virus labs tend to have more than just 3 analysts...
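the staffing arithmetic can be written out as a sketch too... it assumes every sample takes exactly the average 5 minutes and that an analyst spends the whole 12 hour shift on samples with no breaks, which is optimistic, so treat the result as a lower bound; the function names are hypothetical...

```python
import math

MINUTES_PER_SAMPLE = 5   # average analysis time quoted by shevchenko
SHIFT_HOURS = 12         # stated shift length

def samples_per_analyst_per_day(minutes_per_sample: float = MINUTES_PER_SAMPLE,
                                shift_hours: float = SHIFT_HOURS) -> float:
    """How many average samples one analyst can get through in a single shift."""
    return shift_hours * 60 / minutes_per_sample

def analysts_needed(samples_per_day: float, capacity: float) -> int:
    """Smallest whole number of analysts whose combined capacity covers the load."""
    return math.ceil(samples_per_day / capacity)

capacity = samples_per_analyst_per_day()   # samples per analyst per day
load = 10_000 / 30                         # november 2006, about 333 per day
needed = analysts_needed(load, capacity)
spare = needed * capacity - load           # unused capacity, in samples per day
print(capacity, needed, round(spare))
```

rounding up with math.ceil matters here: 333 / 144 is about 2.3, and you can't hire a third of an analyst.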

so clearly, right now anti-virus companies are not being overwhelmed by samples... that only leaves the future to worry about... if the numbers increased 5-fold from january to november in 2006, what are they going to be like this time next year, or the year after that?...
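here's what a naive extrapolation of that 5-fold jump looks like... it assumes the growth compounded at a steady monthly rate over the 10 months from january to november and simply keeps going, which is precisely the kind of straight-line-into-doom assumption being questioned here, so this is an illustration of the worry rather than a forecast...

```python
GROWTH_JAN_TO_NOV = 5   # november 2006 was reportedly ~5x january 2006
MONTHS_ELAPSED = 10     # january to november

# implied constant month-over-month growth factor (assumes steady compounding)
monthly_factor = GROWTH_JAN_TO_NOV ** (1 / MONTHS_ELAPSED)

def project(monthly_count: float, months_ahead: int) -> float:
    """Naively extrapolate a monthly count, assuming the same compounding continues."""
    return monthly_count * monthly_factor ** months_ahead

print(f"implied growth: {(monthly_factor - 1) * 100:.1f}% per month")
print(f"november 2007 (naive): {project(10_000, 12):,.0f} samples")
```

under those assumptions the monthly count would grow roughly 7-fold in a year, which is the number that makes graphs look scary.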

the first graph shown here depicts the increase in the malware production rate over time for the past several years and admittedly it looks pretty grim but i saw a graph with a nearly identical shape (though obviously with very different numbers) ten or more years ago... somehow the world hasn't ended yet, anti-virus hasn't died yet, and life goes on... the more you do a repetitive task, the more patterns start to emerge and with those patterns comes the potential for more automation which in turn leads to faster malware analysis... advances in automation have sped up and will continue to speed up the analysis process so the future doesn't really look all that bleak either...
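the point about automation is that the number of analysts needed depends on two curves, not one: the incoming volume and the time each sample takes... the sketch below holds the volume at a hypothetical level (november 2006's rate grown about 6.9x) and varies the per-sample time; the 2.5 and 1 minute values are hypothetical stand-ins for what better automation might achieve, not measured figures...

```python
import math

def analysts_needed(samples_per_day: float, minutes_per_sample: float,
                    shift_hours: float = 12) -> int:
    """Analysts required when each one works a single shift per day."""
    minutes_of_work = samples_per_day * minutes_per_sample
    return math.ceil(minutes_of_work / (shift_hours * 60))

# hypothetical future load: november 2006's ~333 per day grown about 6.9x
future_load = (10_000 / 30) * 6.9

# hypothetical per-sample times as automation improves (not measured figures)
for minutes in (5, 2.5, 1):
    print(f"{minutes:>4} min/sample -> {analysts_needed(future_load, minutes)} analysts")
```

a graph of sample counts alone only shows the first curve; every gain in analysis speed pulls the required headcount back down.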

[edited to fix the year specified - apparently i'm still getting used to this whole 2007 thing]

4 comments:

Rob Lewis said...

The graph may have been the same shape, Kurt, but the units were likely in the tens, not thousands. That alone says something.

kurt wismer said...

actually, at least one side of the graph was in the hundreds... it made it clear that it would reach the thousands soon... the implication was that it would soon outstrip our ability to cope with it, but that never happened...

the implication most people take away from graphs that show geometric curves is that doom lies just beyond the far edge of the graph but that's based on an assumption that we will continue dealing with things exactly as we are right now, that there will be no advances, and obviously that assumption is not valid... we are continuously making advances that help us do things faster...

Rob Lewis said...

These comments from Kaspersky are a little telling about this topic, Kurt.

Kaspersky seeks help from international police to fight cybercrime

http://www.networkworld.com/news/2007/013107-kaspersky-cybercrime.html

Obviously from your arguments, you do not agree.

kurt wismer said...

actually, i think that article is pretty orthogonal to the myth of overwhelming numbers...

it talks about solving the malware problem, and since the problem has many dimensions (not all of which are technical) any solution to the malware problem has to deal with all of those dimensions...

of course, the very notion of a solution is itself a little misleading...

now, i do realize that kaspersky is quoted as saying they are overwhelmed, but the figures for my myth article also come from kaspersky labs... kaspersky's got to say what he's got to say in order to get people on board with the idea that changes need to be made to assist in applying legal pressure against the people responsible... if he said everything was a-ok, the perception would be that there is no need to do more than is already being done...