Two more uses for a Secure Hash

In a previous article I talked about using salt with a secure hash, but all of the examples assumed that the secure hash would be used to validate passwords. Secure hashes can be used for many other purposes, and I'll illustrate that by describing two other possible uses.

The first example is the generation of unique IDs for identifying a document or block of text. Suppose that you are creating a system that stores documents of some sort along with some metadata. It might be as big as a document management system, something like blogging software that stores messages, or perhaps a photo-sharing site. We'll assume that you need some sort of unique ID for the documents; perhaps to use as a key for the metadata. And let's say that you have to make sure that if the SAME document is added multiple times that it receives the same key each time.

The easiest and most common approach to generating unique IDs is to use a central service that assigns new IDs on demand—often a database sequence. The requirement that a document added multiple times receive the same ID means that we cannot simply assign the next value in the sequence without first checking to see if this document is one that has already been submitted. This means we will need to keep available the full contents of all documents ever submitted to the system. It becomes impossible to support the archiving old documents or storing them in a location with occasional downtime. We are also required to have a single central authority (and potential source of failure) to verify the contents and issue IDs.

So instead, we take a secure hash of the document, and use that as an ID. There is now no need for a "central authority" listing the full content of the documents; we could even put the documents into some sort of "cold storage" yet still be able to recognize duplicates. You may be thinking "but what if two documents happen to have the same hash?". If it were not a secure hash, then that might be a danger -- someone could modify an existing document in such a way as to cause a collision. Because it's secure we only need to worry about accidental collisions.

Accidental collisions sound like an insurmountable problem, but in reality you can completely ignore them. Using a 128 bit hash, one would expect to try 22 quintillion different documents (that's 3 billion documents for each and every human on earth) before getting even a 50% chance of a collision. Your program is in much greater danger of producing errors due to cosmic radiation flipping bits on your hard drive than due to accidental hash collisions.

Signup for YouTube A second practical example of the use of secure hashes is one you see very often when signing up for accounts on random websites. Although our lives would much easier if sites used OpenID for signups, most simply ask for an email address. Then to verify it, they send you an email with a link to click -- once you follow that link, your account is activated.

What form should the link take? Suppose you used something like this: "http://mysite.com/activate?addr=janet@yahoo.com". The problem is that the user could guess this easily. Even if they DIDN'T own the "janet@yahoo.com" address, they could sign up for it, then type this URL in -- without ever receiving the email. Here is where the secure hash comes in. Take a secure hash of the address and send that in instead. It'll look something like this: "http://mysite.com/activate?hash=dbb7fc9e". Don't forget to include a salt with the email address before hashing it, otherwise a knowlegable attacker could still guess the URL to type in.

Posted Thu 12 June 2008 by mcherm in Programming