Wednesday, February 06, 2013

debating AV effectiveness with security experts

a rather disheartening conversation took place on twitter over the weekend. as public conversations sometimes do, it grew beyond my ability to do it justice through description, so instead i'll provide some screenshots and links to a couple of branches of the discussion.

because i don't follow either dan kaminsky or robert graham, i knew nothing about this discussion until someone retweeted the tweet pictured below (i included as much context as i could):

what first made me take interest in this was that robert graham seemed to be talking about 2 different things as though they were the same. the AV that's only 4% effective (or 0% when he's done with it) is different from the AV that organizations spend 40% of their budget on.

the apparently ineffective AV is actually the scanner component of the AV; as you can see, he describes his methodology for bypassing it - a methodology that essentially amounts to malware q/a, which happens to be a countermeasure against heuristic detection, a feature of scanners.

the AV that organizations spend 40% of their budget on (assuming that's an accurate figure, i wouldn't know) is the enterprise security suite, which includes other things beyond just the scanner. for example, the tweet by dan kaminsky that seems to have started the entire conversation alludes to the failure of symantec's product to stop 44 out of 45 pieces of malware in the recently publicized attack on the new york times. but as symantec rightly pointed out, their product includes a reputation system which, for all intents and purposes, behaves much like a whitelist - if something doesn't have a good reputation (and new things have no reputation at all) then it gets flagged. that is about as different from a traditional scanner as one can imagine, and bypassing it isn't nearly as straightforward.
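to make that distinction concrete, here's a minimal sketch of the two decision models. everything in it - the databases, the hashes, the threshold - is invented for illustration and has nothing to do with symantec's (or anyone's) actual implementation; the point is only why "new and unknown" gets flagged by a reputation system even when no signature matches.

```python
# hypothetical sketch: signature scanning vs. reputation-based flagging.
# all data and names below are made up for illustration only.

KNOWN_BAD_SIGNATURES = {"deadbeef", "cafebabe"}      # toy signature database
REPUTATION_DB = {"a1b2c3": 0.95, "d4e5f6": 0.90}     # file hash -> prevalence/trust score

def scanner_verdict(file_hash: str) -> str:
    """a traditional scanner only flags what it already recognizes as bad."""
    return "block" if file_hash in KNOWN_BAD_SIGNATURES else "allow"

def reputation_verdict(file_hash: str, threshold: float = 0.5) -> str:
    """a reputation system flags anything that hasn't earned a good reputation.
    a brand new, never-before-seen file has no reputation at all, so it gets flagged."""
    score = REPUTATION_DB.get(file_hash, 0.0)         # unknown files default to zero
    return "allow" if score >= threshold else "flag"

# a freshly repacked sample defeats the scanner but not the reputation check
new_sample = "0f9e8d"                                 # not in either database
print(scanner_verdict(new_sample))     # -> allow  (no signature match)
print(reputation_verdict(new_sample))  # -> flag   (no reputation = suspicious)
```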

talking about 2 different "AV"s as though they were the same is symptomatic of not being able to see beyond the abstraction that is AV. conceptually AV is an abstraction that encompasses a variety of disparate preventative, detective, and recovery techniques. most people, however, just see AV as a magic box that you turn on and it just protects things. the only component that behaves anything like that is the real-time scanner, but it is not the only component in a security suite (especially an enterprise security suite) by any stretch of the imagination.

failing to see beyond the abstraction means, unfortunately, that you will fail to argue intelligently about the subject. it also means you will probably fail to make effective use of AV. if you don't know how a tool works, how can you possibly hope to use it to its fullest advantage? furthermore, how can you value something you don't understand? just as you can't price a car based on the effectiveness of the wheels, you shouldn't value AV based on the supposed effectiveness of the scanner.

one of the things that also became clear as i read some of the subsequent tweets was that robert seems to think there's nothing special about his attacks. but the fact is his attacks are special. as a penetration tester he launches targeted attacks. targeted attacks take more effort and more human capital to execute, and he himself described some of that extra effort. this basically means targeted attacks are more expensive to launch than the more automated variety and that they don't scale as well. consequently targeted attacks represent a minority of the overall attacks being performed. note, however, that that doesn't necessarily mean targeted attacks are a minority of the attacks a particular organization sees, as it's entirely possible that an organization may be a juicy enough target to receive a great deal of attention from targeted attackers.

from what i can tell, dan kaminsky also has difficulty seeing beyond the abstraction of AV. much of what he says above reflects the idea that AV is a magic box that you simply turn on and it should protect you. in the preceding example it appears he thinks there's an expectation that organizations solve all their security problems with scanners when in fact the expectation is simply that they have AV suites in their toolbox and that they use the appropriate tool (which may or may not be part of the suite) for the job (as rik ferguson attempted to explain).

dan also quoted the DBIR as showing AV to only be effective 3% of the time. i wondered about that so i looked a little deeper. DBIR stands for Data Breach Investigations Report. let the meaning of that phrase sink in a little bit. a data breach investigations report is a report about data breach investigations. data breach investigations are investigations of data breaches, and data breaches only occur when all the effort you put into preventing them has failed.

you can't judge how successful something is by only looking at its failures.

one of the consequences of this is that dan has actually failed to understand the statistic he's reported. the 3% of cases where AV detected the breach still represent failures, because the detection came after the breach had already happened. this can happen due to things like signature updates (a scanner can detect more today than it could yesterday).

another consequence is that trying to use the DBIR to evaluate the effectiveness of AV introduces a self-selection bias, because the failure itself is what causes the event to be included in the study; a success would have excluded the event from the study entirely. now one might have entertained the possibility that dan simply wasn't familiar with selection bias, but as we will see, that appears not to be the case.
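to see why a failure-only sample can't tell you anything about effectiveness, consider the following toy simulation. every number in it is invented purely for illustration (the 90% detection rate is not a claim about any real product); the point is only that attacks the AV prevented never become breach investigations, so they never enter the dataset in the first place.

```python
# toy simulation (made-up numbers) of why measuring AV effectiveness from a
# breach-investigations dataset is hopeless: the sample only ever contains failures.
import random

random.seed(1)
TRUE_DETECTION_RATE = 0.90   # purely illustrative; not a real-world figure
LATE_DETECTION_RATE = 0.03   # chance AV notices *after* the breach (e.g. via a later signature update)

encounters = 100_000
breach_reports = []          # this is all a DBIR-style dataset ever gets to see

for _ in range(encounters):
    if random.random() < TRUE_DETECTION_RATE:
        continue             # AV prevented it: no breach, no investigation, not in the report
    detected_after_the_fact = random.random() < LATE_DETECTION_RATE
    breach_reports.append(detected_after_the_fact)

print(f"true effectiveness over all encounters: {TRUE_DETECTION_RATE:.0%}")
print(f"'AV detected it' rate within breach reports: {sum(breach_reports)/len(breach_reports):.1%}")
# the second number says nothing about the first - every prevented attack
# was excluded from the sample before we ever started counting.
```

no matter what true detection rate you plug in, the fraction "detected" inside the breach reports only reflects how often the scanner caught up after the fact.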

it appears that dan has in fact heard of selection bias before, not to mention the WildList too (bravo). unfortunately it doesn't appear that he can use them properly.
  • AV testers don't define the WildList, WildList reporters do (in a sense, but it's probably more accurate to say that the WildList is the result of a particular type of sampling)
  • AV testers typically don't do WildList testing, although virus bulletin does offer a WildList-based certification in addition to larger, more inclusive tests
  • if a product detects only 90-something percent of the WildList, it's generally considered to be crap, because the WildList is the absolute bottom of the barrel of performance metrics. anything less than 100% is an embarrassment.
  • AV testers don't define the set of malware they're going to test against, they cull samples from as wide a variety of real-life sources as they can and describe them as being 'in-the-wild' so as to distinguish them from malware that only exists in a 'zoo' or malware that was whipped up in a lab for testing purposes (something that's generally frowned on)
  • defining what you're going to measure is not actually selection bias. "Selection bias occurs when some part of the target population is not in the sampled population" ("Sampling: Design and Analysis" by Sharon L. Lohr) (now THAT's a textbook definition of selection bias - good thing i was minoring in statistics in university). if testers defined their target to be the samples they already had then by definition there couldn't possibly be any selection bias because the target population and the sampled population would be the same set.
this isn't to say there's no selection bias in the tests performed by AV testers. it's entirely possible that some classes of malware (targeted malware, perhaps) are harder to find samples of due to circumstances outside the testers' control. that being said, that bias is a lot more subtle than looking exclusively at failures, as the sketch below tries to illustrate.
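to put some numbers behind that textbook definition: selection bias shows up as the gap between a rate measured over the target population and the same rate measured over whatever sample you actually obtained. the toy detector and the population proportions below are entirely made up for illustration.

```python
# rough sketch of lohr's definition in code: the bias is the gap between the rate
# over the target population and the rate over the sampled population.
# all detectors, labels and proportions here are invented for illustration.

def detection_rate(samples, detector):
    return sum(detector(s) for s in samples) / len(samples)

# pretend detector: catches "commodity" malware but misses "targeted" malware
detector = lambda sample: sample == "commodity"

target_population  = ["commodity"] * 950 + ["targeted"] * 50   # what we want to measure
sampled_population = ["commodity"] * 990 + ["targeted"] * 10   # targeted samples are hard to obtain

print(detection_rate(target_population, detector))   # 0.95 <- the number we actually care about
print(detection_rate(sampled_population, detector))  # 0.99 <- the number a biased sample yields

# if the target were *defined* as the samples in hand, the two populations would be
# the same set and the gap would be zero by construction - i.e. no selection bias.
```

when some part of the target (like targeted malware) is merely underrepresented in the sample, you get the subtle kind of gap shown above - nothing like the wholesale exclusion of every success that a breach-only dataset performs.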

now, it just so happens that i continued digging into the DBIR beyond just figuring out what went into it, and came across a rather interesting chart.
i highlighted the part that should really be interesting here. just to be clear, this only covers organizations that have to be PCI compliant, but unless organizations that are obligated to run up-to-date AV are somehow magically more stupid than everyone else, the rest of the organizations actually have less motivation to run up-to-date AV, so their numbers are probably just as low, if not lower.

now what this means is that there's really no way to use the DBIR to evaluate the effectiveness of AV, because it appears that most of the organizations included in the report can't even follow the most basic of AV best practices. it also suggests that dan hasn't read his own sources thoroughly enough. if i'm not mistaken, his 3% figure comes from the year when only 53% of PCI-bound organizations were running up-to-date AV. the subsequent year only 1% of breaches were discovered by signature-based anti-virus, and in the most recent report AV doesn't appear to have helped after the fact at all.

that ever-decreasing percentage of organizations running up-to-date AV is actually kind of disturbing, and it makes you wonder what's hurting organizations more: the amount of money they pay to AV vendors, or the amount of attention they pay to security experts pontificating on subjects they are demonstrably ignorant of?

i anticipate that there will be those thinking that all i do is criticize and that i have nothing constructive to offer, so let's think about how we'd really measure AV effectiveness. the independent AV tests are apparently not good enough - in fact dan kaminsky went so far as to say this:
so how would we measure effectiveness really? like effectiveness in the field, where AV is actually getting used? well first of all, we stop limiting our data collection to just those instances where AV failed, because that's just ridiculous. no, we'll need to collect data on each and every time the AV raises an alert as well, to supplement our failure statistics. oh, and we'll have to follow up each of those alerts to make sure they aren't false positives, because those certainly don't contribute to the effectiveness of AV. we'll also have to use all of the AV suite, rather than just parts of it, so that people can determine whether the effectiveness justifies the money they pay for the entire suite.

additionally, we'll need to control for variables - different organizations deploy different security measures and controls in conjunction with AV, and those may stop some malware before the AV ever gets a chance. that's not a bad thing, of course, and if they all used the same security measures then we could collect data on how effective AV is under those particular circumstances. but because each organization has different measures, they'll affect the AV results to differing degrees and that will skew the measurement. so we'll have to get organizations to either all use the same complementary measures or all stop using any complementary measures - neither of which seems very likely in production environments. that leaves us with trying to simulate what happens in the field - but then we're back in AV testing lab territory, which apparently 'no competent soul on the planet' trusts.
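just to make concrete what that bookkeeping would look like, here's a minimal sketch. the record format, field names and numbers are all hypothetical - it's only meant to show that measuring field effectiveness requires logging every alert and every miss, following up on false positives, and accounting for whatever other controls got there first.

```python
# hypothetical bookkeeping for a field measurement of AV effectiveness.
# the record format and all sample data are invented for illustration.
from dataclasses import dataclass

@dataclass
class Incident:
    av_alerted: bool        # did any component of the suite raise an alert?
    was_malicious: bool     # confirmed by follow-up investigation (filters out false positives)
    stopped_upstream: bool  # blocked by some other control before AV ever saw it

def field_effectiveness(incidents: list[Incident]) -> float:
    """fraction of malicious events that reached the AV suite and were caught by it."""
    reached_av = [i for i in incidents if i.was_malicious and not i.stopped_upstream]
    if not reached_av:
        return float("nan")
    return sum(i.av_alerted for i in reached_av) / len(reached_av)

def false_positive_rate(incidents: list[Incident]) -> float:
    """fraction of alerts that turned out not to be malicious at all."""
    alerts = [i for i in incidents if i.av_alerted]
    return sum(not i.was_malicious for i in alerts) / len(alerts) if alerts else 0.0

# invented sample data - the point is the bookkeeping, not the numbers
log = [
    Incident(av_alerted=True,  was_malicious=True,  stopped_upstream=False),
    Incident(av_alerted=False, was_malicious=True,  stopped_upstream=False),
    Incident(av_alerted=True,  was_malicious=False, stopped_upstream=False),
    Incident(av_alerted=False, was_malicious=True,  stopped_upstream=True),
]
print(field_effectiveness(log), false_positive_rate(log))  # 0.5 0.5
```

even with that in place, the result would only describe effectiveness under one organization's particular mix of complementary controls - which is exactly the confound described above.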

the reality is that it doesn't matter what kind of test you do, it's never going to match people's anecdotal experience. that's because testing is inherently designed around the idea of arriving at a single set of results representing how effective AV can be - it's never going to be able to reflect what happens when variables such as complementary controls, sub-optimal operation, targeting motivation, etc. aren't controlled for - and in the real world they aren't controlled for. tests necessarily reflect ideal circumstances and your mileage may (probably will) vary.

were i to be overly judgmental i might sign off this post with this little gem by none other than robert graham himself that i found yesterday while going through my RSS backlog:
The problem with our industry is that it's full of self-styled "experts" who are adept at slinging buzzwords and cliches. These people are skilled at tricking the masses, but they have actually zero expertise in cybersecurity.
but i prefer the school of thought from my own post about security experts from 2006 - security is just too big for anyone to be an expert in all parts of it. it seems to me that the expertise of dan and robert lies elsewhere.

it's important for people to recognize their own limitations, and i believe it's also important to recognize the limitations of the authorities you're listening to, lest you give credence to well-meaning but uninformed experts. anti-malware is a complicated field, more complicated than i think either dan or robert realizes, and if they have difficulty seeing beyond the AV abstraction, imagine how many other people do as well. i hope someday dan and robert and other experts like them can gain a deeper appreciation for how complex it is so that they can pass that along to those who depend on them to do the heavy cognitive lifting.