... and whatever comes after that, and whatever comes after that, and whatever comes after that, etc., ad infinitum...
Tuesday, April 24, 2007
so some experts are saying vista will have malware problems and a bunch of people are talking about it...
can an expert possibly tell me why this is news?
follow me on this one: we know that all general purpose computers (note the alternative) that accept new software are able to support viruses (something that's been known for some 20-odd years now)... we know that self-replication (the ability to make possibly evolved copies of oneself - the defining characteristic of viral malware) is not magical or special in any way, so it's reasonable to assume those same general purpose computers will support non-viral malware as well... further, we know that a computer with windows vista loaded on it (or really, any computer with a range of functionality broad enough to warrant ANY operating system) qualifies as a general purpose computer...
so the fact that vista will have malware problems shouldn't be news, it should be a foregone conclusion... just as the fact that linux can (and to a certain extent does) have malware problems, and mac os x can (and to an increasing extent will) have malware problems, and smartphones can (and to an increasing extent will) have malware problems should be foregone conclusions...
why has the industry forgotten this? why does the IT industry seem to keep forgetting the past? if it's not viruses then it's DRM or the finer points of key management or shannon's maxim... it's frustrating to watch a collection of supposedly smart people be so dense...
Saturday, April 21, 2007
when is a bad test not a bad test?
when it's FUD'ing snake oil...
there have been a number of posts about anti-virus/anti-malware testing recently... even i posted about testing in response to what has become a series of posts about anti-virus testing over on anton chuvakin's blog (1, 2, 3, 4)... well, this post is a follow-up because anton has managed to post the original test paper that his series of posts was based on...
to say that i was unimpressed would be an understatement... let's start with the number of samples - you may recall from my previous post that i said that the minimum number of samples needed to account for the 2% detection rate that was being claimed was 50... according to the actual paper
"Of the 35 malware files, three invalid files were removed from the sample set, leaving 32 malware binaries used in the final tests and performance calculations"
so if only 32 samples were used, how is it that the lowest scoring product only detected 2% of the samples? detecting just a single sample gives a detection rate of 3%, not 2%, and all products tested detected more than just one sample... it can't be blamed on anton misremembering the figure he was told either, since the actual test paper states
"the lowest was tied between ClamAV and FileAdvisor with a 2% detection rate"
in one place and
"two products tied for the lowest detection rate at 2%"
in another... thankfully the chart with their results clears this up - it's 2 raw detections (not a 2% detection rate), which means a 6% detection rate (which was also correctly reported in that same chart)... now, you'll have to forgive me for calling a spade a spade, but this level of mathematical incompetence (recognizing that you can't have a 2% detection rate with only 32 samples is grade school math, and simply reading the column marked "percent" in a chart takes even less skill than that) is inexcusable for people who wish to have their test taken seriously... given such a complete lack of mathematical acumen, it's almost understandable that they failed to realize that a test bed of 32 samples isn't anywhere near large enough to give statistically significant results...
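just to make the arithmetic above concrete, here's a quick sanity check in python - the function and its name are mine, purely for illustration, and the numbers are the ones discussed above...

```python
def detection_rate(detections: int, samples: int) -> float:
    """detections expressed as a percentage of the sample set"""
    return 100.0 * detections / samples

print(f"{detection_rate(1, 32):.1f}%")  # 3.1% - so a 2% rate is impossible with 32 samples
print(f"{detection_rate(2, 32):.1f}%")  # 6.2% - what 2 raw detections actually works out to
# the smallest sample set in which a single detection is only 2% of the total:
print(next(n for n in range(1, 101) if detection_rate(1, n) <= 2))  # 50
```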
bias also figured heavily in the test... not only because they only used samples that got through the layers of protection already present on the systems they culled the samples from (thereby missing a potentially huge chunk of what's really posing a threat to users and irrevocably compromising the results and the integrity of the test itself), but also because, on a deeper level, the test was written from the perspective of an incident response technician... what is the perspective of an incident response technician? well, these are the people who spend their days dealing with the after-effects of the failure of security software and/or preparing for the next failure so as to make cleaning up after that event easier than cleaning up after the previous one... all they see are security products' failures because that's their job, and if it weren't for the immutable fact that all security products fail they wouldn't have that job and they'd have to find some other form of employment (such as performing and publishing dubious tests)... just as police (who deal with criminals on a daily basis) are prone to developing an imbalanced view of society if they're not careful, so too are incident response technicians prone to developing an imbalanced view of the efficacy of security products if they aren't exposed to security products' successes (which are generally invisible by design)... this acute form of perceptual bias taints the entire test at fundamental levels - including the design and methodology of the test (as evidenced by their belief that they need only collect samples that successfully compromised production systems that already had protective measures in place)...
so far these problems would seemingly be attributable to the testers simply being inexperienced and/or suffering from false authority syndrome... it's time to shine a light on a part of the test that can't easily be attributed to that... there was one product in the test that didn't do too badly - in fact it did better than all other products, it detected 50% more than the next best product, and it was one of the few products in the test that weren't used through virustotal... that product was asarium by proventsure (no link for reasons that are about to be made clear)... go and read the paper carefully (it's only 4 pages) and tell me if you can see something a little off about it... yes, that's right - the product that did the best, the product that was better than all others by a wide margin, was the product made by the company whose president helped write the paper and is the contact listed in the abstract... the test, which is titled "Antiviral shortcomings with respect to 'real' malware" by gary golomb, jonathan gross, and rich walchuck, is NOT independent... one or more of its authors has a clear vested interest in making one product look good at the expense of all others... this puts all the other problems with this test into a new and decidedly unfavourable light... the bad math, the sample selection bias, the insignificant sample size, etc. - in light of this revelation they all point to a cooked test designed to make all products other than asarium look worse than they really are (FUD) in order to make asarium look better than it is in comparison to them (snake oil)... the test, therefore, becomes little more than a marketing stunt by a disreputable company whose product should probably be given a wide berth...
and poor anton chuvakin - though widely regarded as a security expert, not only is he clearly not an authority on malware himself but apparently he also can't recognize a fraud/pretender when he sees one... that doesn't bode well for average folks' ability to do the same, does it...
Sunday, April 08, 2007
defensive lines in end-point anti-malware security
some comments on my post about anti-malware tests have inspired me to examine what the various defensive lines are for end-point anti-malware security... this isn't going to include things like network intrusion detection/prevention systems because those are in the network rather than on the end-point...
as such, the first line of defense happens just as content is coming off the wire and into the system... this is before it gets written to disk (assuming the content ever does get written to disk, which sometimes is not a valid assumption) and is represented by things like exploit scanning network proxies (one might also consider the inbound traffic filter of a personal firewall as working at this stage of the defense)... i say this is the first line of defense because (assuming you have defenses here) this is the first one that malware would encounter when traveling to your system... this hasn't always been a defensive line however; before the internet became ubiquitous, malware tended to be passively shared either on removable media like floppy disks or over the telephone line when downloading from bulletin board systems - both of which happened to suit the second line of defense quite well...
the second line of defense (which used to be the first line of defense) starts just after content has been written to disk and continues for as long as the content persists on the disk... this is traditionally the point at which new incoming materials would be screened against a blacklist (an anti-virus product, generally)... content shared on removable media is suited to this defensive line because the media is just another disk to check... content shared via BBSes was similarly suited to this defensive line because the content had to be saved to disk before you could do anything risky with it (like try to execute it)...
the third line of defense happens just before the content is executed... this is where application whitelisting comes in (though on-access scanning with anti-virus will also trigger here since accessing the content is usually a prerequisite for executing it)... this of course assumes that the content is executable or otherwise interpretable in some way as instructions for the computer to follow (basically that it's some sort of program)... if not then it's not really malware (even exploit code is a program of sorts, though it tends to be executed in an unconventional way)...
the last line of defense happens after the content is executed... this is where behaviour-based protection (sandboxes, behaviour blockers, some/most types of HIPS, immunization, change-detection, outbound traffic filtering, etc.) comes into play, obviously, because at this point the defenses are looking for bad behaviour from the running programs... there's no later stage than while the malware is executing to prevent the malware from doing its bad deed - if the defenses at this stage fail to detect/stop the malware's bad behaviour then prevention has failed...
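to spell the ordering out in a different form, here's a toy sketch of these four lines as a pipeline that content passes through - the stage names and the placeholder checks are purely illustrative (they stand in for things like an exploit-scanning proxy, an on-access scanner, an application whitelist, and a behaviour blocker), not any real product's api...

```python
from typing import Callable, List, Tuple

# placeholder checks - each returns True if it would block the content at
# that line of defense; real implementations would be an exploit-scanning
# proxy, an on-demand/on-access scanner, an application whitelist, and a
# behaviour blocker (or similar), respectively
def network_check(content: bytes) -> bool:        # line 1: off the wire, before disk
    return False

def on_disk_check(content: bytes) -> bool:        # line 2: written to / resident on disk
    return False

def pre_execution_check(content: bytes) -> bool:  # line 3: just before it runs
    return False

def runtime_check(content: bytes) -> bool:        # line 4: while it runs - the last chance
    return False

LINES_OF_DEFENSE: List[Tuple[str, Callable[[bytes], bool]]] = [
    ("network", network_check),
    ("on-disk", on_disk_check),
    ("pre-execution", pre_execution_check),
    ("runtime", runtime_check),
]

def run_gauntlet(content: bytes) -> str:
    """pass content through each line of defense in order; if none of them
    flag it, prevention has failed and you're left with detection/recovery"""
    for name, check in LINES_OF_DEFENSE:
        if check(content):
            return f"blocked at the {name} line of defense"
    return "prevention failed - time for detection and recovery"

print(run_gauntlet(b"probably not malware"))
```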
Tags:
anti-malware,
malware,
prevention,
security
Monday, April 02, 2007
the myth of meaningful informal anti-malware tests
bad tests are not necessarily a problem unique to the anti-malware field, but they are one that those in the anti-malware community have encountered countless times... it's not that average folks are physically incapable of performing good tests, it's that they simply don't do them...
an informal test lacks strict adherence to established testing protocols, often because the standards set for good tests are so high that most people can't be bothered to go to all that trouble... only those who are dedicated to the process of anti-malware testing have ever done a half decent job of it because it requires a substantial investment of time and effort, not to mention a certain degree of expertise to ensure that the testbed is free of garbage/scanner fodder and duplicates...
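to give a trivial, concrete example of one small part of that hygiene (and only the easy part), weeding out byte-for-byte duplicates can be automated with something like the sketch below - the "samples" directory name is made up, and a hash comparison obviously can't tell you whether two different files are really variants of the same thing, or whether a file is malware at all (that still takes analysis and expertise)...

```python
import hashlib
from pathlib import Path
from typing import Dict, List

def group_by_hash(sample_dir: str) -> Dict[str, List[Path]]:
    """group the files in sample_dir by their sha256 digest so that exact
    duplicates can be spotted and removed from the test-bed"""
    groups: Dict[str, List[Path]] = {}
    for path in Path(sample_dir).iterdir():
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups.setdefault(digest, []).append(path)
    return groups

# "samples" is a hypothetical directory of collected files
for digest, paths in group_by_hash("samples").items():
    if len(paths) > 1:
        print(digest[:12], "appears", len(paths), "times:", [p.name for p in paths])
```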
i could stop there, i suppose, and hope that what i've said isn't too abstract, but i really like object lessons and one sort of fell into my lap... dr. anton chuvakin has used the results of one such bad test to support the now popular opinion that anti-virus is dead... i saw where this was going early on (the original question was obviously loaded)... i guessed the results would be bad and suggested the most likely reason was that the malware was too new - to which i was told that some samples were weeks old, though a sample that's weeks old can technically still be new enough if no one else has reported it yet... additionally, if a particular piece of malware isn't affecting many people, the av companies may down-prioritize it in order to focus on samples that are affecting more people...
the test, as described so far, worked like this: someone who does incident response at a public university collected samples from compromised machines for a period of time and then when the batch was big enough (weeks after the collection was started) this person submitted them to virus total and took note of the results... the results were as follows: the best detection rate was 50%, the worst was 2%, and the average was 33%...
now, i can see a lot of problems with such a test but let's start with the big ones:
- statistical significance - the sample size was too small... it may sound picky to those who want to believe, but even an unskilled labourer intuitively knows that asking 5 people their opinion on X does not give you a result that can be generalized to the entire population... this test (or at least the interpretation of it that we've been given) pretends to represent the ability of anti-virus products to protect against malware that is currently in the wild affecting people but most likely has a sample size of about 50 (50 is the minimum necessary for a product to get a 2% detection rate, but larger sample sizes make a perfect 50% detection rate increasingly unlikely)... since the wildlist has over 1600 different pieces of malware on it and since that only represents the self-replicating malware (which are believed to be in the minority now), a sample size of 50 (or even 100) just doesn't cut it (see the back-of-the-envelope sketch after this list)...
- sample selection bias - the samples came from compromised machines in a production environment... maybe that sounds reasonable to you but let me ask you this - if you test a variety of lie detectors against people who have proven themselves to be good at fooling lie detectors, are you really measuring how good those lie detectors are in the real world? the answer is no, you're only measuring how good they are against an artificially constructed set of people... the same goes for malware from compromised production machines - the malware you find there is the malware that has proven itself to be good at evading anti-malware scanners, not in-the-wild malware in general...
- sample selection bias - the samples came solely from computers in an academic institution... sorry to say, but universities are not a typical environment... if anything, they are the place where the incidence of truly malicious insider threats is the greatest (uni students practicing their l33t skills)... home users don't intentionally compromise their own computers, and while enterprise users may consider it (or even try it) there are more/stronger controls in place (stricter logical access controls, more personally significant administrative controls, etc.) to prevent and/or deter it...
- test-bed integrity - the test was carried out by incident response personnel... we really have no way of knowing how clean (in the sense of being free of detrimental garbage) the test-bed was... even if it was an incident response technician who was capable of reverse engineering each sample of malware (and had the free time in which to do it), simply reverse engineering the samples would not be enough to determine whether the samples were sufficiently different from one another to be considered distinct pieces of malware... and since many incident response technicians simply follow a script, there's a serious question as to whether all the samples really were malware instead of scanner fodder and whether any of them were duplicates...
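to put a rough number on that first point about statistical significance, here's a back-of-the-envelope calculation using the standard normal-approximation margin of error for a proportion (z of 1.96 for 95% confidence, worst-case p of 0.5) - this is my own illustration, not anything from the test, and the sample counts are the estimated 50 and the wildlist-sized 1600 mentioned above...

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """approximate 95% margin of error for a proportion p measured from n samples"""
    return z * math.sqrt(p * (1 - p) / n)

# a 50% detection rate measured on a 50-sample test-bed:
print(f"+/- {margin_of_error(0.5, 50) * 100:.1f} points")    # prints roughly +/- 14 points
# the same 50% rate measured on a wildlist-sized test-bed of 1600 samples:
print(f"+/- {margin_of_error(0.5, 1600) * 100:.1f} points")  # prints roughly +/- 2.5 points
```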