An Exception to Every Rule

I like automated code scanners, really I do. They can scan your code either before or after you check it in and review it for code formatting, memory errors, or even potential security problems. They can prevent lots of foolish errors and unnecessary inconsistencies.

But there is one catch: the tools are "dumb", and there always needs to be a way for a knowledgeable human to override them. Usually it's a special comment that tells the tool not to report a particular infraction. The override is for that rare special case where there is a good reason why the rule does not apply.

Now, I know a certain portion of my readers are already forming objections, certain that THEIR pet peeve is the one case to which there should never be an exception. For instance, rules requiring consistent indentation throughout the project are good, but suppose some of your source files are from an external source? It's not wise to reformat them just to placate the code scanner -- that will make it harder to merge in changes when the next revision is released.

Or for another example, the capitalization rule that Java instance variables always begin with a lowercase letter seems very good, but when my team created an XML binding mechanism that mapped XML fields (which are case sensitive and often begin with a capital letter) to instance variables, being consistent with the XML file was more important than following the standard Java conventions.
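To make the idea of an override concrete, here is a minimal sketch of the same situation in Python rather than Java, assuming pylint with its default naming rules is the scanner. The `PurchaseOrder` class, its fields, and the `bind` helper are hypothetical names invented for illustration; `invalid-name` is pylint's naming-convention check, and the trailing comment switches it off for just that one line.

```python
import xml.etree.ElementTree as ET


class PurchaseOrder:
    """Hypothetical binding class whose attributes mirror XML element names."""

    def __init__(self) -> None:
        # The XML schema capitalizes its element names, and matching it is
        # more important here than the usual snake_case convention, so the
        # naming check is disabled on exactly these two lines.
        self.OrderId = ""  # pylint: disable=invalid-name
        self.Customer = ""  # pylint: disable=invalid-name


def bind(xml_text: str) -> PurchaseOrder:
    """Copy each child element's text into the attribute of the same name."""
    order = PurchaseOrder()
    for child in ET.fromstring(xml_text):
        if hasattr(order, child.tag):
            setattr(order, child.tag, child.text or "")
    return order
```

The override is narrow (one line, one check) and the reason sits right next to it, so the next reader can tell the violation is deliberate.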

I can still hear someone arguing. Some reader out there is still complaining that there is a special reason (security? correctness checks?) why some rules must be absolute. Frankly, I think this reader is just a control freak and they should learn to let programmers act like the professionals they are, but in order to convince you, I'm going to tell a story.


This is the story of a programming rule so VERY absolute that before you read the tale you'll agree that there is NO possible reason to violate this rule. Yet after an automated code scanner discovered the violation, the catastrophic result was that for 2 years, all cryptography originating from a major operating system was completely insecure.

So... on with the story. (And many thanks to Larry Osterman, from whom I first learned this sordid tale.)

The programming rule that I am sure you will agree with is that one should never read from uninitialized memory. In many modern programming languages this is difficult or impossible (Java, for instance, guarantees that instance variables are pre-initialized to a default value such as 0), but in C it is so easy that people constantly do it by mistake. Since there is NEVER a reason to read from uninitialized memory (which could contain any random junk), surely we can expunge this behavior from our code, right?

Now, Linus Torvalds and his compatriots manage the source code for the kernel of the Linux operating system, but a usable system requires much more -- a whole set of systems and applications which are tested and configured to work together. This is called a "Linux distribution" and there are several major distributions in wide use: SUSE, Fedora, and Mandriva among others. The single most popular today is Ubuntu, which is based on Debian.

The folks at Debian collect code from a number of different projects and carefully review it, test for compatibility, and then certify the resulting distribution and provide a place to download the finished product. One of those components is OpenSSL, an open-source implementation of some cryptography libraries that provide support for basic functions like secure random number generation. And one of the tests that the Debian folks perform is to analyze the code with Valgrind, an automated code scanner that detects memory leaks and similar problems.

Valgrind detected that there was a problem in one of the OpenSSL routines that generated random numbers -- it accessed uninitialized memory. Not knowing how to suppress the warning, someone at Debian decided to "fix" the code.

What they didn't realize is that the OpenSSL developers had known exactly what they were doing. The crux of a cryptographically secure random number generator is an entropy collector. You see, the problem with generating "random" numbers on a computer is that nothing on the computer is truly random. You can mix and mash bits with the fanciest hash function in the world, but if the seed you start it off with is just the current time off the clock and an attacker can guess that (all but the last few digits are awfully easy to guess), then the attacker can repeat the same process and determine what "random" key was chosen, thus completely cracking your security.
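To see how little protection a clock-based seed gives, here is a toy sketch. It is an illustration under stated assumptions, not how OpenSSL works: `make_key` and `recover_seed` are hypothetical helpers, and Python's `random` module stands in for "the fanciest hash function in the world" precisely because it is deterministic once the seed is known.

```python
import random
from typing import Optional


def make_key(seed: int) -> bytes:
    """Derive a 16-byte 'secret' key from a PRNG seeded with one integer."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(16))


def recover_seed(key: bytes, approx_time: int, window: int) -> Optional[int]:
    """Try every seed within +/- window seconds of the attacker's estimate."""
    for candidate in range(approx_time - window, approx_time + window + 1):
        if make_key(candidate) == key:
            return candidate
    return None


# The victim seeds with the current time (a made-up Unix timestamp here).
victim_time = 1_230_700_000
victim_key = make_key(victim_time)

# The attacker only knows the time to within a minute, which is plenty.
guessed = recover_seed(victim_key, approx_time=victim_time + 37, window=60)
```

With a one-minute window the attacker needs at most 121 attempts, and once the seed is found, every "random" value the victim produced can be regenerated.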

So a cryptographic random number generator (as opposed to a garden variety RNG) goes to great lengths to collect "entropy". It may start with the time off the clock, but it mixes in the number of milliseconds between key presses on the keyboard. And it mixes in the process ID of the OpenSSL process, data traveling over the network socket, micro-timing of the hard-drive motor and of mouse movements, and anything else it can use as a source of "randomness". When the OpenSSL developers first created the data structure for the entropy pool, they intentionally left the memory uninitialized, because it was yet another source of randomness.

Unfortunately, a Debian maintainer didn't realize this. And they also made a minor error when changing the code: they commented out the offending line in such a way that every source of entropy other than the first one (the process ID) was effectively discarded instead of being mixed in. So there was NO randomness except the process ID (which is very easily guessed).
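A toy model shows why a process ID alone is hopeless as a secret. This is a sketch, not OpenSSL's actual code: `derive_key` and `brute_force_pid` are hypothetical helpers, SHA-256 stands in for the real mixing function, and the limit of 32768 assumes the traditional Linux default for process IDs.

```python
import hashlib
from typing import Optional

MAX_PID = 32768  # assumption: traditional Linux default upper bound on PIDs


def derive_key(pid: int, other_entropy: bytes, mix_other: bool) -> bytes:
    """Mix the PID and (optionally) other entropy into a pool, then take a key."""
    pool = hashlib.sha256()
    pool.update(pid.to_bytes(4, "big"))
    if mix_other:
        # In the broken variant this step never happens, so the key
        # depends on the PID and nothing else.
        pool.update(other_entropy)
    return pool.digest()[:16]


def brute_force_pid(key: bytes) -> Optional[int]:
    """Enumerate every possible PID and return the one that reproduces the key."""
    for pid in range(1, MAX_PID):
        if derive_key(pid, b"", mix_other=False) == key:
            return pid
    return None


secret_entropy = b"keyboard timings, network data, and so on"
broken_key = derive_key(4242, secret_entropy, mix_other=False)
healthy_key = derive_key(4242, secret_entropy, mix_other=True)

cracked = brute_force_pid(broken_key)    # finds the PID
resisted = brute_force_pid(healthy_key)  # finds nothing
```

In the broken variant there are fewer than 32768 possible keys, so an attacker can simply generate all of them in advance. Once the other entropy is mixed in, the same search comes up empty.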


Then Debian was released with this bug, and Ubuntu picked it up and distributed it further. And 2 years went by. During those 2 years, everything done using the RNG on Debian or Ubuntu Linux was insecure because the keys were guessable. Everything! Any SSL connection made from such a machine. Any secure certificate signed by such a machine. And no one noticed for two whole years.

So the moral of the story is, don't behave like that ignominious Debian developer and change code that you don't understand. But also realize that for any supposedly-universal rule, there is some special case exception. In almost all circumstances, the rule is good, but it is still wise to provide some way for the experts who do need to violate the rule to declare that they are doing so on purpose (preferably with a required explanation of why). Otherwise, someone who doesn't know what they are doing is likely to break things.

Posted Wed 31 December 2008 by mcherm in Programming