Tuesday, April 08, 2008

no such thing as non-executable files

i happened to notice a post by benny ketelslegers presenting a digested view of a symantec post by andrea lelli on file format vulnerabilities... i didn't give the symantec post much thought when i first encountered it, but when i read benny's post i realized that i (and perhaps many others) have been taking the issue of executable files for granted and that the word choices being made could lead to an altogether wrong concept of what a program is and by extension what sorts of things can effectively be malicious agents on a computer...

perhaps my first exposure to the idea that there are no non-executable files came when i was still a teenager learning that for every sequence of bytes there exists a turing machine for which that sequence of bytes is a virus... i didn't even really know that much about turing machines at the time (other than that they were a way to model computers) but i understood what it meant: for every sequence of bytes there exists a possible computing environment where those bytes are interpretable as a self-replicating program...

it's an interesting statement, but it's awfully narrowly focused on viruses... it turns out that the focus on viruses was just because of the context... in reality, for every sequence of bytes there exists a possible computing environment where those bytes are interpretable as a program...

that probably still sounds pretty formal to the average person, but not as formal as fred cohen's reference to the generality of interpretation... the simplest way i can express this is that there is no fundamental difference between data and code, that program code basically is data but that it's data for which there happens to be some part of the computer (some other program or subsystem) that will treat it as a collection of one or more instructions... any piece of data you can imagine can potentially be interpreted (or even misinterpreted) as program code, although it may not have much meaning or do anything useful as a program...
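
to make that a little more concrete, here's a minimal sketch in python... the stack machine, its four instructions and the `run` function are all made up for illustration (they don't come from cohen's work or from the symantec post)... the only point is that this machine assigns a meaning to every possible byte value, so any sequence of bytes at all is a valid program for it, even if the program doesn't do anything useful...

```python
def run(data: bytes) -> list[int]:
    """Interpret every byte as an instruction for a made-up stack machine."""
    stack: list[int] = []
    for b in data:
        # hypothetical encoding: low 2 bits pick the operation, the rest is an argument
        op, arg = b % 4, b // 4
        if op == 0:                      # push the argument
            stack.append(arg)
        elif op == 1:                    # add the top two values (skipped if fewer than two)
            if len(stack) >= 2:
                stack.append(stack.pop() + stack.pop())
        elif op == 2:                    # duplicate the top value (skipped if empty)
            if stack:
                stack.append(stack[-1])
        else:                            # op == 3: drop the top value (skipped if empty)
            if stack:
                stack.pop()
    return stack


message = b"hello"
print(message.decode("ascii"))  # the bytes treated as data: hello
print(run(message))             # the same bytes treated as code: [26, 27]
```

the same five bytes are a greeting to one reader and a (pointless) program to another... nothing about the bytes themselves decides which one they are...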

this ability to treat arbitrary data as code is what allows us to add new, previously unimagined capabilities to our computers in the form of new software... it's what makes our general purpose computers 'general purpose', but there's still more to the story... in practice the ability to treat a particular chunk of data as code may not even be intentional... it can be an emergent property resulting from excessive complexity and/or poor implementation... this leads to what i often refer to as exotic execution, and data that makes use of such a property is generally referred to as an exploit...
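
the unintentional case can be sketched with a toy simulation too... everything here is hypothetical: the `open_document` function, the 8 cell caption buffer and the three opcodes are invented for illustration and don't model any real file format or reader... the assumption is a reader that keeps a document's caption and its own instructions side by side in one flat memory, and that copies the caption in without checking that it fits...

```python
BUF = 8  # size of the caption buffer, in memory cells (hypothetical)


def open_document(caption: bytes) -> list[str]:
    """Toy reader: cells 0..7 hold the caption, cells 8..15 hold the reader's own code."""
    memory = bytearray(16)
    memory[BUF:BUF + 2] = bytes([1, 0])  # the reader's intended code: SHOW, HALT

    # the flaw: the caption is copied in without checking it fits in BUF cells
    caption = caption[:len(memory)]      # the toy memory is finite, but the buffer edge is ignored
    memory[0:len(caption)] = caption

    # now run whatever is sitting where the reader's code is supposed to be
    actions: list[str] = []
    pc = BUF
    while pc < len(memory):
        op = memory[pc]
        if op == 0:                      # HALT
            break
        elif op == 1:                    # SHOW
            actions.append("show caption")
        elif op == 2:                    # something the reader's author never meant to run
            actions.append("unintended action")
        pc += 1                          # any other value is treated as a no-op
    return actions


print(open_document(b"hi"))                          # ['show caption']
print(open_document(b"A" * BUF + bytes([2, 0])))     # ['unintended action']
```

the second document is "just a caption" as far as its format is concerned, but because of the missing length check its last two bytes end up being run as instructions... real exploits are far more involved than this, but the shape is the same: data landing somewhere the computer will treat as code...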

all this is to say that all data may also be considered by the computer to be program code under the right conditions, and determining if those conditions have or have not been met can be very hard... as such, to say something is not program code, to say that it is non-executable, is an oversimplification that can (especially when uttered by authoritative sources like symantec) promulgate the idea that there are safe, non-executable file types, which in turn gives people a false sense of security when handling those types of files...

indeed, the majority of the file types referenced by the symantec post explicitly support the ability to contain code that their respective reader applications are intended to execute (whether that be javascript in pdf documents or visual basic in microsoft office documents)... the fact that those file types can also carry known exploits should have little bearing on whether we consider them executable, and the very fact that people need to be corrected about the safety of such files points to a more endemic false sense of security about data in general... there are in fact no safe file types and no such thing as non-executable files...
