Comments on anti-virus rants: tavis ormandy's sophail presentation

One of the other considerations to bear in mind is...

2011-09-27T18:58:28.529-04:00

One of the other considerations to bear in mind is, if you like, one of the 'dirty little secrets' of the industry - that of having to deal with ridiculous tests of our products. There exist thousands (perhaps millions) of corrupt/dead/intended/junk files, which are only relevant to people with large collections of 'malware' for the purposes of testing. The way that many companies (by no means all) deal with this is simply to add signature-based detections for those samples. CRC is a fast method for a certain proportion of those, and although it's hardly robust, a CRC taken across some specific parts of those files is unlikely to be all that problematic. This comes down ultimately to the problem of verification. Testing of AV is a lost cause if you don't verify the samples. No tester truly verifies all the samples, ergo, all testing is next to useless. Notwithstanding that, people continue to test our products against piles of useless files - therefore, the only solution is, to some extent, to 'cheat' by adding - en masse - detection for such useless crap. The low impact way (from a user perspective) is to use 'signatures'. This gives the testers the illusion that they are having some sort of real impact on the confidence that users have in AV products, and gives AV vendors time to get on with the real job of dealing with actual malware (for which task you require emulators and much more modern methods than CRCs across a file). I seriously doubt that Sophos (or any other decent AV company) is purely using CRC as their frontline protection - if they were, they'd be out of business.
As to the emulation, Vess already covered that perfectly.

@Bruce Thompson "If it's relatively easy ...

2011-09-14T22:22:33.429-04:00

@Bruce Thompson
"If it's relatively easy to spoof the CRC32, then it's also easy to obfuscate it. A single bit change could result in the signature no longer matching. This, to me, is the inherent weakness of any checksum based detection scheme."

it's the weakness of all known malware detection systems regardless of how they're implemented - once you change a known malware it is no longer the same malware and thus no longer qualifies as "known".

however, since known malware detection was only ever appropriate for known malware it has never been appropriate to rely on it exclusively. it has always been called for to use methods for detecting unknown malware to complement the known malware detection algorithms. there are a variety of reasons why unknown malware detection methods can't be relied on exclusively either.

I wouldn't be concerned about collisions with ...

2011-08-26T15:54:11.781-04:00

I wouldn't be concerned about collisions with good files, my concern would be creating false negatives. If it's relatively easy to spoof the CRC32, then it's also easy to obfuscate it. A single bit change could result in the signature no longer matching. This, to me, is the inherent weakness of any checksum based detection scheme.

To put it another way, a CRC32 match on a known bad actor is a good indication that you've found that bad actor, or you've found something that for some mysterious reason is masquerading as that bad actor. Finding no matches for anything known tells you precisely nothing about what it is you're looking at.
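
To make the false-negative point concrete, here is a minimal sketch in Python. The sample bytes, the signature name and the `scan` helper are all made up for illustration, and the checksum is taken over the whole file, which is an assumption and not a description of how any real product works.

```python
import zlib

# Hypothetical signature database: CRC32 of a known-bad sample -> name.
known_bad = b"pretend this is a malicious payload"
signatures = {zlib.crc32(known_bad): "Example.Malware.A"}

def scan(data: bytes):
    """Return the matching signature name, or None if nothing matches."""
    return signatures.get(zlib.crc32(data))

# Flip a single bit in the first byte of the sample.
variant = bytes([known_bad[0] ^ 0x01]) + known_bad[1:]

print(scan(known_bad))  # the original sample matches its signature
print(scan(variant))    # the one-bit variant does not match anything
```

CRC32 is guaranteed to change when exactly one bit of the input changes, so the variant can never match the old signature. The "no match" result says nothing about whether the variant is harmful.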

As Vesselin Bontchev surmises, the Sophos emulator...

2011-08-06T21:20:18.350-04:00

As Vesselin Bontchev surmises, the Sophos emulator doesn't have a static cutoff of 500 cycles. Like F-Prot's - and, I suspect, most other code emulators in most other decent threat-detection products - our emulator is controllable at runtime, scan-by-scan.

In fact, one of the reasons Ormandy's paper shows only a small number of older executable packers being handled by hard-wired code in our product (e.g. UPX, PECompact) - something he assumes is a reason for criticism - is that we use the emulator to assist with the bulk of our unpacking needs.

That generally takes a _lot_ more than 500 CPU cycles :-)

As Kurt says in the main article, the "halting problem" says we may end up emulating for ever, with no result. So it's no good just leaving an emulator to run "to see what happens".

But, as Vesselin suggests, you can run the emulator in a controlled way, giving it more and more rope only if the results so far seem to justify it.

Very loosely put, the greater your certainty you're on the track of something interesting, the more millions of instructions you give to the emulator.

the CRC collision with good files seems like a dif...

2011-08-06T13:05:11.157-04:00

the CRC collision with good files seems like a difficult attack to mount as i imagine it would be difficult to predict what part of the file the AV vendor chooses to calculate a CRC of.

good insight about the emulation, though. thanks for that. the dynamic approach you describe does indeed make sense when something fishy is found. i didn't really have any frame of reference to judge what's large enough for when nothing fishy is found though.

A minor point regarding the CRC problem - I think ...

2011-08-06T12:48:05.015-04:00

A minor point regarding the CRC problem - I think that both Tavis and you have missed it. The fact that CRC checksums can easily be forged is indeed a problem (theoretically) in using them for malware detection. The problem, however, is not that an attacker can create a file that has a CRC you already use and plant it on your system, as both Tavis and you surmise. The problem is that the attacker can create a piece of malware that has the same CRC as that of a well-known and widespread legitimate program. When the AV program puts the CRC for the malware in its database, it could suddenly create thousands of false positives all over the world. Normally, such things are found during testing, but SNAFUs do happen from time to time.

The reason why I said that it was a theoretical problem is that during my 23-year career as anti-virus researcher, I have yet to see a malware author using this attack.
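
The false-positive scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two "files" are stand-in byte strings, the CRC is the standard zlib CRC32 taken over the whole file, and `forge_suffix`, `signatures` and `scan` are hypothetical names. A real product may checksum only selected parts of a file, which is exactly what makes the attack hard to aim.

```python
import zlib

# Standard reflected CRC-32 table (polynomial 0xEDB88320), as used by zlib.
TABLE = []
for n in range(256):
    c = n
    for _ in range(8):
        c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
    TABLE.append(c)

# The top byte of each table entry is unique, so it identifies the entry.
INDEX_BY_TOP_BYTE = {entry >> 24: i for i, entry in enumerate(TABLE)}

def forge_suffix(prefix: bytes, target_crc: int) -> bytes:
    """Return 4 bytes such that crc32(prefix + suffix) == target_crc."""
    # Walk backwards from the desired final register to find the four
    # table entries that the last four update steps must use.
    want = target_crc ^ 0xFFFFFFFF
    indices = []
    for _ in range(4):
        i = INDEX_BY_TOP_BYTE[want >> 24]
        indices.append(i)
        want = ((want ^ TABLE[i]) << 8) & 0xFFFFFFFF
    # Walk forwards from the register after `prefix`, choosing each byte
    # so that the update step selects the required table entry.
    reg = zlib.crc32(prefix) ^ 0xFFFFFFFF
    suffix = bytearray()
    for i in reversed(indices):
        suffix.append((reg ^ i) & 0xFF)
        reg = TABLE[i] ^ (reg >> 8)
    return bytes(suffix)

# Stand-ins for a widespread legitimate program and a new piece of malware.
legit = b"stand-in for a popular legitimate program"
malware_body = b"stand-in for a malicious file"
malware = malware_body + forge_suffix(malware_body, zlib.crc32(legit))

# The vendor adds a whole-file CRC signature for the malware sample.
signatures = {zlib.crc32(malware): "Example.Malware.B"}

def scan(data: bytes):
    """Return the matching signature name, or None if nothing matches."""
    return signatures.get(zlib.crc32(data))

print(scan(malware))  # the sample the signature was made from
print(scan(legit))    # the legitimate stand-in shares the CRC, so it is flagged too
```

Four appended bytes are enough to steer a CRC32 to any chosen value, which is why the checksum alone cannot tell the two files apart.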

Another minor point about the cut-off of the emulator. If the Sophos scanner indeed has a static cut-off of 500 cycles for the emulator, this indeed isn't very smart. The problem is not only that it is too short; the main problem is that it is static. For instance, our emulator uses a default cut-off of 2 million instructions - but the detection language in the database can control the emulator. For instance, a detection entry for a malware variant can say "oh, and if you found this sequence of bytes while emulating, keep emulating for X more cycles". In other words, the cut-off is dynamic, not static.
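
A toy sketch of the difference between a static and a dynamic cut-off, in Python. Nothing here reflects any real emulator or detection language: the "emulator" simply counts each byte as one instruction, and `Extension`, `emulate` and the numbers used are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Extension:
    """Hypothetical detection entry: 'if you see these bytes, keep going'."""
    trigger: bytes  # byte sequence to look for while emulating
    extra: int      # extra instructions to allow once it is seen

def emulate(code: bytes, default_budget: int, extensions: list) -> int:
    """Toy emulator: every byte counts as one executed instruction.

    Returns how many instructions ran before the cut-off (or the end
    of the code) was reached.
    """
    budget = default_budget
    fired = set()  # each extension may raise the budget only once
    executed = 0
    while executed < len(code) and executed < budget:
        executed += 1
        for n, ext in enumerate(extensions):
            # Did the bytes executed so far just end with the trigger?
            if n not in fired and code.endswith(ext.trigger, 0, executed):
                budget += ext.extra
                fired.add(n)
    return executed

# 10 filler bytes, a 2-byte marker, then 100 more filler bytes.
code = b"\x90" * 10 + b"\xDE\xAD" + b"\x90" * 100

static_run = emulate(code, 20, [])
dynamic_run = emulate(code, 20, [Extension(b"\xDE\xAD", 50)])
print(static_run, dynamic_run)
```

With a static budget the run stops at the default limit regardless of what was seen. With the extension in place, the marker is reached inside the default budget, so the run is allowed to continue for the extra instructions. A marker that sits beyond the default budget would never be seen, which is why the default still has to be generous.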