Monday, July 06, 2009

on the efficacy of passphrases

so now that we've all had a good think about whether masking passwords is really the right way to go, let's question another bit of wisdom - let's question whether passphrases are really as secure as we think they are.

you see, with all the thinking out loud recently about the password authentication system, one of the topics that got discussed in various sectors was the passphrase. the passphrase idea is that instead of using a relatively short string of bytes (which we'll all hope for the sake of argument isn't a dictionary word) as the shared secret you use to authenticate yourself with some system, you use an actual complete sentence which, although made up of dictionary words, is quite a bit longer and thus theoretically harder to guess or attack by brute force.

the theory goes that the more characters are in your shared secret (be it a password or passphrase or whatever) the more possible combinations a program would have to go through before it eventually found the one that matched. additionally, since a passphrase is a proper sentence made out of proper words the consensus seems to be that it would be easier to remember.

but are those things true? let's look at the length issue. one of the things you'll often encounter when selecting a password is the need to make it complex. complex in this context means that it draws characters from as wide an array of the printable ascii character set as possible and that it doesn't contain dictionary words - basically the more it looks like a random string of characters the better. the reason for this is 2-fold: the wide array of characters is to maximize the number of possible values for each character in the password so that the total number of possible passwords a brute force attack would have to go through is maximized, and the random, non-dictionary-word property is to remove any constraints on what the various characters might be that would ultimately lower the total number of possible passwords (for example, in english the letter Q is almost always followed by the letter U). natural language introduces considerable constraints - although it takes 7 bits per character to represent the entire set of printable characters, the english language is generally known to contain a little over 3 bits of information per character because of all those constraints. that means that an english passphrase of 20 characters is about as strong as a random password about half that length. at least it would be if that were the only issue involved.
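to make that arithmetic concrete, here's a rough sketch in python. the 3 bits per character figure is just the one used above (published estimates vary, and some are lower, which would only make the passphrase look weaker), the 95 is the count of printable ascii characters, and the function names are made up for illustration.

```python
import math

PRINTABLE_ASCII = 95         # printable ascii characters, space through tilde
ENGLISH_BITS_PER_CHAR = 3.0  # assumption: the rough figure quoted in this post


def random_password_bits(length, alphabet_size=PRINTABLE_ASCII):
    """Bits of information in a truly random password of the given length."""
    return length * math.log2(alphabet_size)


def english_phrase_bits(length, bits_per_char=ENGLISH_BITS_PER_CHAR):
    """Estimated bits of information in an english passphrase of the given length."""
    return length * bits_per_char


def equivalent_random_length(phrase_length,
                             bits_per_char=ENGLISH_BITS_PER_CHAR,
                             alphabet_size=PRINTABLE_ASCII):
    """Length of a random password carrying as many bits as the passphrase."""
    phrase_bits = english_phrase_bits(phrase_length, bits_per_char)
    return phrase_bits / math.log2(alphabet_size)


if __name__ == "__main__":
    print(f"20 character english passphrase: {english_phrase_bits(20):.0f} bits")
    print(f"equivalent random password length: {equivalent_random_length(20):.1f}")
```

under those assumptions a 20 character passphrase carries about 60 bits, which works out to a random password of a little over 9 characters.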

now how is your memory? a while back when i discussed choosing good passwords i said that you should have a different password for each account you make so that if one gets compromised the rest remain secure. nothing about using passphrases obviates the need to have a distinct one for each account, so can you remember 20, 50, 100 different sentences? can you remember them all perfectly? can you remember which one goes with which site? perhaps not.

how could an average user make that easier (besides reusing passphrases)? the most obvious tactic would be to use culturally significant phrases. things like:
"The rain in Spain falls mainly in the plains"
"Are you smarter than a fifth grader"
"Rudolph the red-nosed reindeer had a very shiny nose"
i say this is the most obvious tactic because we've already seen it in the media - if you'll recall, way back there was a television program called millennium starring lance henriksen. henriksen's character was recruited by a clandestine organization that supplied its recruits with passphrase protected computers, and in henriksen's case the passphrase was "Soylent Green is people" (a quote from the movie soylent green).

this adds another layer of constraints on the set of probable values and my gut instinct is that this constraint is so significant that rather than mounting a full brute force attack it would be feasible to simply mount a dictionary attack. i will concede that there are a very large number of culturally significant phrases that people might choose from, but i doubt the cardinality of that set differs significantly from that of the set of words in an actual dictionary. furthermore, it wouldn't be too difficult for an attacker to trick people at large into generating a dictionary of passphrases for him/her simply by setting up a free website that requires an account logon in order to access porn.
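to get a feel for how much that matters, here's a back-of-the-envelope sketch in python. the million-phrase list size is a number i've picked purely for illustration, not a measurement of anything, and the function names are made up.

```python
import math

PRINTABLE_ASCII = 95              # printable ascii characters, space through tilde
HYPOTHETICAL_PHRASES = 1_000_000  # assumption: an invented size for a phrase list


def dictionary_bits(entries):
    """Bits needed to pick one entry out of a list, assuming each is equally likely."""
    return math.log2(entries)


def equivalent_random_length(entries, alphabet_size=PRINTABLE_ASCII):
    """Length of a random password with as many possibilities as the list has entries."""
    return math.log(entries) / math.log(alphabet_size)


if __name__ == "__main__":
    bits = dictionary_bits(HYPOTHETICAL_PHRASES)
    length = equivalent_random_length(HYPOTHETICAL_PHRASES)
    print(f"{HYPOTHETICAL_PHRASES} phrases: {bits:.1f} bits")
    print(f"about as strong as a random password of {length:.1f} characters")
```

if the attacker's list really does contain your phrase, then under that assumption a million-entry list gives you about 20 bits, roughly what a 3 character random password would, no matter how long the phrase itself is.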

as such, i question both the idea that passphrases will be easier to remember when used properly and the idea that a passphrase is more computationally expensive to attack. i think the strength gained by increasing the length is a phantom improvement, that the added strength is thrown away due to the various additional constraints on the set of possible values. i think there will be no ease of use benefit as the sheer number of passphrases will still require them to be recorded somehow (and since they're so much longer than normal passwords, recording them all will be more onerous). i like the idea of expanding password fields to accept more characters, but i see no reason to use that extra space for anything other than longer strong passwords.