Dragons in the Algorithm
Adventures in Programming
by Michael Chermside

Threadsafe Java Servlets - a solution

In a previous post I wrote about how nearly all web applications built on Java servlets suffer from potential threading issues. Web browsers can make multiple simultaneous requests, which will result in multiple threads concurrently modifying the (not threadsafe) HTTPSession. Most people just ignore the problems (which strike rarely), some serialize all requests for the same session, but neither of these works as well in a world where AJAX-based user interfaces are becoming more common. I hope to describe the basic outline of a solution; explaining, as I go, the reasoning I used in coming up with it.

As far as I can tell, there IS no universal solution to this problem -- only particular solutions that optimize for certain kinds of application behavior. In our case (we're building an online mortgage application) nearly all of the interaction will occur with a logged-in user, or at the very least will care whether the user is logged in, which almost certainly means reading user information from the Session. So almost every request will be reading from the Session. However, many of the requests will be used for display not update -- they will be reading from the Session but not updating it.

Chris OkasakiThat observation immediately suggests the use of immutable data structures. Immutable data structures are a technique for concurrent programming that has been known for many years, but it has received a renewed spurt of interest recently sparked largely by the 1996 thesis of Chris Okasaki and others who expanded on it. What everyone had always known was that if a data structure is immutable (can't be changed) then multiple threads can safely read it - essentially each thread gets a A Frozen Data Structure"frozen" copy of the data as it looked at the moment the thread began working with it. What Chris's thesis added was a coherent way of analyzing the performance behavior of immutable data structures and some clever ideas that made them nearly as fast as traditional mutable data structures.

So if we use an immutable data structure, then we can be sure that all requests that are just reading data and not updating it can proceed at full speed. They will receive a copy of the data as it looked when the thread began processing the request. Any changes to session that happen during the time that the request is being processed will not be seen, but that is probably the best behavior we could ask for -- it is equivalent to assuming that the read-only request was executed so quickly that it finished before the update occurred.

So we'll use some sort of an immutable data structure to hold our session data -- perhaps it's a linked list, or an immutable map, so far it doesn't really matter. And we'll handle requests that just read data by grabbing a copy of the structure once -- that way no matter how logn the request takes, it will be seeing a single, coherent view of the session data. For requests that must both read and write data we can do the reading the same way: take a copy of the data structure at the beginning of the processing and do all reads from that copy. The difference is that at the end of the service we will have done some writes to the data -- with an immutable data structure this means we'll have generated a new structure which is just like the original except for the small number of changes we have made. The question remaining to be solved is what to do with this changed data.

The "obvious" solution is to just update the pointer in the HTTPSession to point to our newly changed data, but this won't work. This won't be safe in the face of threading issues. Imagine two requests that both do writes are occuring simultaneously: one modifies the customer's current requested loan amount while the other modifies the customer's address. Whichevre is processed second may overwrite the changes of the first -- so one thread would update loan amount then the second (unaware of the first) will set back the loan amount while updating the address. In the end, the user thinks the loan amount has been changed, but it hasn't.

I can think of two solutions here: "one-writer-at-a-time" and "merging". The general idea behind "one-at-a-time" would be for any thread that intends to make modifications to obtain a lock before beginning. Then we would allow as many "read-only" threads as desired, but only one "read-write" thread per user session could execute simultaneously. If the application consists of lots of simultaneous AJAX calls to obtain data but very few that perform updates, then this is a simple and straightforward approach.

To make this idea more clear, let me define an API that would achieve "one-writer-at-a-time" semantics. In the Session, we would place a single ThreadsafeSessionData object, and all programmers would be enjoined not to use the HTTPSession.putValue(). A method ThreadsafeSessionData.getReadonlyData() would return an immutable Map containing immutable data objects with the contents of the session. Servlets that only needed to read data could call this once to obtain a consistent view of the data. Another method, ThreadsafeSessionData.getEditableData() would obtain a lock on the session (blocking if needed) and return an EditableData object. The EditableData object would have a getReadonlyData() method returning the same immutable Map, but it would also have a setValue() method for editing the data (by creating a new immutable Map). And it would have a commitChanges() method which would take the current immutable Map, store it in the ThreadsafeSessionData object, release the lock, and then lock down the EditableData object (no calling setValue() after commitChanges()).

It is possible to go even further than this. Multiple writer threads could operate at the same time, so long as they are not modifying the same piece of data, or relying on a value which is simultaneously being modified. The concept is rather like version control systems that use merging rather than locking. But it requires merging the data after a "commit", which adds a great deal of complexity to the solution. As a programming exercise, I think it would be fun to develop, but finding a reliable means of merging changes is challenging enough that I think for production use I'll stop with "one-writer-at-a-time".

Posted Tue 07 October 2008 by mcherm in Programming