Always use Salt

I'm not a very good cook. One reasons is that I've never mastered the art of tasting the food as I go along and seasoning it properly. When cooking, I never seem to add the right amount of salt. As a programmer, though, I always use salt.

"Salt", in programming, isn't a seasoning, it's a tool that is used alongside cryptographic hash functions. So I'll begin by explaining what a cryptographic hash function is.

When you forget your password and call up the system administrator, they'll offer to reset your password. They will not offer to tell you what your old password was... in fact, they can't find out even if they wanted to! This is a good thing—it means that a system administrator with a grudge can't log in under your account and leave incriminating evidence; it also means that they can't check other systems (perhaps ones the administrator doesn't have access to) to see if you used the same password there. And it means that even if a hacker manages to break into the system and promote themselves to system administrator, they won't automatically know everyone's password.

But how does it work? After all, the system administrator has access to all of the files in the system—isn't there a file that has all the passwords? How else would the computer itself be able to tell that you entered the right password when trying to log in?

In fact, there is no such file, because all decent systems use a cryptographic hash function.

A hash function is just a function that takes as input one of a large number of inputs (like any string used as a password) and gives as output one of a small number of outputs (like a 32 bit number). A simple example of a hash function is the "mod" function: just treat your input as a BIG number (a string of up to 160 bytes can be treated as a 128-digit number) and trim off all but the last few digits. These sorts of functions are used for securing passwords, for generating "fingerprints" of files and other information, and as building blocks in more complex cryptographic schemes.

A hash function would not be useful for cryptography if there were a simple way to find the input given the output (find the password given the hash of it). But it would also be unusable if you could find two inputs giving the same output. With the mod function for instance, any two inputs with the same last few letters will have the same hash. All you need to do is guess the last couple of letters of someone's password and you can get in!

Of course any hash function can be broken by a technique called "brute force". Brute force simply means trying different inputs until you find one that gives the output you're looking for. This approach always works, but it may take a million years of computing power (even with quantum computers). That's a LOT better than the 2 seconds it takes to guess the last letters of a password, which is why we don't use mod as a cryptographic hash function. Inventing real cryptographic hash functions is one of those tasks you shouldn't attempt unless you are one of the top ten experts in the world in cryptography: use a published standard instead like MD5 or SHA-1. Except don't use MD5 or SHA-1 because they've been broken: I would recommend SHA-256 instead.

Okay, this is fine, but what does it have to do with salt?

Well... if you are trying to recover a random string from a secure hash, the brute force method will take an unreasonable amount of time. But most passwords are not random strings! Most are a simple dictionary word or a name, perhaps with some a->@ and i->1 replacements. What a hacker can do is to precompute the secure hash of every word in the dictionary plus a bunch of alternates and compare the hashed passwords against this file. (All the words in the dictionary is a LOT fewer than all the possible strings of characters, which makes this far more feasible than brute force.) There are data structures like rainbow tables capable of storing of all passwords of decent length made up of letters and digits, or you can even use Google as a reverse-hashing table for common words.

As a user, you can avoid this by using a passphrase instead of a password. But as a programmer, you can protect your users from the dangers of common inputs by using salt. Before you feed the password, the file, or whatever input into a secure hash function, generate a random string of 5-10 characters and prepend it to your intended input. This random string is called the "salt" because it's sprinkled in with the actual text being hashed. Store both the hash and the salt. When you need to test to test a sample password (or other input) to see if it matches, first prepend the salt you had stored, then apply the hash function and see if it matches.

(Note: some people use a single salt value for ALL inputs to their program. Bad idea... then an attacker can prepend this salt to a list of dictionary words and precompute the hash values. But if each securely hashed value has a different salt then a precomputing attack would have to be separately targeted against each individual password and it could only be prepared if that person's salt were already known.)

It's a simple technique, can be applied while still using all the standard cryptography libraries to do the hashing, and it provides nearly complete protection against a whole class of attacks. That's why I recommend that programmers always use salt.

Posted Mon 05 May 2008 by mcherm in Programming