Monday, August 09, 2010

numbers, context, and background

one of the things i've come across while reading various sources is an attempt to pin down an intended target nation for the stuxnet worm based on prevalence data. the theory goes something like 'since nation X is where most instances of stuxnet are found, therefore nation X was the intended target (because obviously more work was put into spreading it there)'.

this theory has some problems, however. first and foremost is that not all the numbers agree. while we have symantec saying that ~60% are in Iran, we also have eset saying that ~60% are in the US just 3 days earlier. they can't both be right - or can they? and if they are, what are the implications for the targeted nation theory?

as is always the case when there are contradictory numbers, we have to look at how those numbers were arrived at. in fact, even when there aren't contradictory numbers, we should still be paying close attention to how those numbers were arrived at.

close examination of vikram thakur's post on the symantec site suggests that their number represents actual infected machines trying to connect to their C&C server (on top of everything else, stuxnet is also a botnet) during a 3 day period between july 19 and july 22. they were able to gather this data because they redirected the domains hosting the C&C servers to themselves, so it seems like it would be a pretty accurate snapshot of the pool of infected machines at a particular point in time.

eset's numbers in david harley's post came from their installed clients throughout the world. their cloud-based technology reported the instances - however, since stuxnet employs stealth it's more likely that rather than reporting infected machines (where it would be active and hidden) it's actually reporting infected USB drives. it could also be reporting both if eset's products can see through the stealth, but the key point is that eset's numbers almost certainly include infected USB drives while symantec's do not. the USB numbers are important because that's how this worm spreads and if one were going to work on targeting a particular nation, spreading infected USB drives in that nation would be the way to do it.

furthermore, eset's numbers appear to be from the time detection for the worm was added until the time the statistic was reported, rather than just the 3 day period covered by symantec's figures. this means that eset's figures represent a measurement of how many instances of the worm there were over its detected lifetime to that point, while symantec's figures represent a measurement of how many infected machines remained at that point. this is important because by the time symantec started collecting its data, negative population controls had already been in effect for some time.
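to make the difference between the two kinds of measurement concrete, here is a minimal sketch in python. every number and region name in it is invented purely for illustration (none of it comes from symantec or eset); it only shows that a cumulative count and a point-in-time snapshot of the same outbreak can rank regions in opposite order once cleanup is uneven.

```python
def shares(counts):
    """Convert raw per-region counts into percentage shares of the total."""
    total = sum(counts.values())
    return {region: round(100 * n / total, 1) for region, n in counts.items()}

# HYPOTHETICAL numbers, made up for this example only.
# Every instance detected since detection was added (drives and machines).
cumulative_detections = {"region_a": 600, "region_b": 250, "other": 150}

# HYPOTHETICAL: how many of those instances were cleaned up before the
# snapshot was taken (population controls working unevenly by region).
cleaned_up = {"region_a": 588, "region_b": 100, "other": 120}

# What is left at snapshot time: instances detected minus instances removed.
snapshot = {
    region: cumulative_detections[region] - cleaned_up[region]
    for region in cumulative_detections
}

cumulative_share = shares(cumulative_detections)
snapshot_share = shares(snapshot)

print("cumulative share:", cumulative_share)
print("snapshot share:  ", snapshot_share)
```

in this made-up case region_a dominates the cumulative figures while region_b dominates the snapshot, even though both are derived from the same underlying outbreak.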

those controls, like the worms they're intended to control, are not necessarily uniformly effective across the entire globe. some products have greater market share in some regions than they do in others, and the dominant product in certain regions might be poor at controlling particular worms and thus allow those worms greater reproductive advantages in those regions than they might find in others. the presence of population controls like anti-malware software affects both the death and birth rates of worm instances and, as anyone who's heard johnny long discuss hackers for charity knows, such controls are not uniformly present or effective across all regions.

there are actually a variety of other factors, in addition to such controls, that contribute to how well and in what way a worm or virus spreads, as discussed in some detail by jeff kephart, david chess, and steve white in "Computers and Epidemiology". some of these factors, like the degree of connectedness of susceptible hosts (and how often adequate contact between such hosts happens), can be influenced by computing culture, which in turn can be influenced by culture in the more general sense, geopolitical climate, and even socioeconomic considerations. hypothetically speaking, a nation that is cut off from US technologies due to trade sanctions (as mentioned by brian krebs) could well exhibit a higher rate of software sharing as part of 'alternative' procurement techniques and in so doing raise the region above the epidemic threshold for some unspecified worm.
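the idea of an epidemic threshold can be sketched with a very simplified model. what follows is my own toy approximation (a deterministic susceptible-infected-susceptible style equation), not the full model from the paper, and the names `contact_rate` and `cure_rate` along with all the values are assumptions chosen for illustration. the infected fraction grows through contact between infected and susceptible hosts and shrinks as infections get cleaned up.

```python
def infected_fraction(contact_rate, cure_rate, steps=2000, dt=0.01, start=0.001):
    """Integrate di/dt = contact_rate * i * (1 - i) - cure_rate * i.

    i is the fraction of hosts infected. Simple Euler steps are used;
    all parameters are hypothetical, not measured values.
    """
    i = start
    for _ in range(steps):
        growth = contact_rate * i * (1 - i)  # new infections via contact
        decay = cure_rate * i                # infections removed by controls
        i += dt * (growth - decay)
    return i

# Same cure rate everywhere; only the amount of contact (sharing) differs.
low_sharing = infected_fraction(contact_rate=0.5, cure_rate=1.0)
high_sharing = infected_fraction(contact_rate=2.0, cure_rate=1.0)

print("low sharing region: ", low_sharing)
print("high sharing region:", high_sharing)
```

in this toy model the infection dies out when contact_rate is below cure_rate and settles at a persistent level when it is above, so two regions with identical controls can end up with very different prevalence just because hosts in one of them exchange software or media more often.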


ultimately both symantec and eset could be right since they were measuring different things over different periods of time. what that means for the targeted nation theory is that things aren't as clear-cut as either set of numbers would suggest on its own. what we do know is that stuxnet appeared to enjoy more reproductive success in Iran than elsewhere. whether that's down to purely epidemiological factors or intentional injection of the worm into the local computing population by a malicious actor is unclear, but eset's data would support an argument against the latter option as the effort seems to have been expended elsewhere. on the other hand, if we were to entertain the notion that the US was the target based on the amount of infected materials floating around the computing population, then we are left once again with the conclusion that stuxnet was a failure since, in spite of all that effort, the prevalence of actual infected machines in the US was minuscule.

i don't think much can be read into the fact that there were more infected machines in Iran than elsewhere since such pockets of infection are actually normal - especially for self-replicating malware that must be spread by physical media. some region had to draw the short straw and this time it was Iran.

2 comments:

David Harley said...

Actually, while the prevalence data is interesting (else I wouldn't have blogged it), I've grown less convinced that it tells us anything useful about targeting. It would have been more useful earlier in its lifecycle, but by the time I first blogged it, it had already spread beyond SCADA sites, which I'd say is actually a more critical FAIL than its low prevalence, since it essentially scuppered its ability to effect a targeted attack. At any rate, in its present form.

I agree that Symantec and ESET (not to mention Microsoft, who were also tracking geographical data) weren't measuring the same things (or at the same time). I'd say that the only way to draw any useful conclusions would be to look at several sources and note similarities and divergences. But I see these data as more qualitative than quantitative. After all, the data from each vendor comes from (mostly) discrete user/system populations.

Actually, Stuxnet isn't limited to USB devices. Unfortunately, our telemetry, while it's accurate in geographical terms, doesn't give relative proportions for different vectors.

Thanks for your input. Blogging data like these is something I don't do very often, and I'm still thinking about the best way of presenting 'em. :)

kurt wismer said...

"I've grown less convinced that it tells us anything useful about targeting."

i'm not convinced it tells us anything about it either, given both the contradictory implications from the different datasets and the fact that viruses and worms are often more prevalent in one area than in another. i think there's a strong motivation for us to find patterns and meaning, but i'm not convinced there are any to find in this case.

"Actually, Stuxnet isn't limited to USB devices."

i realize it also copies itself to network shares, but as it requires a hardcoded absolute path for the lnk exploit to work, it seems unlikely that those copies would activate (unless someone was using mapped drives).

assuming there were mapped drives, that could account for a higher probability of infection between nodes within an organization (making the org behave like a clique) but i'm not sure yet what the larger implications of that would be.

"Blogging data like these is something I don't do very often, and I'm still thinking about the best way of presenting 'em. :)"

well, people will probably always find ways of reading more into numbers than is actually there, but i suppose trying to head at least some of them off at the pass might not hurt.