Dragons in the Algorithm
Adventures in Programming
by Michael Chermside

Depending on Someone Else's Code


We had an interesting problem arise the other day, a problem about code dependency. Our problem was in no way unique, so it seemed worthwhile to write out the problem and our proposed solution.

The problem

The core of the problem is that we need to depend on a codebase that was never designed for anyone else to depend on. In this case, we have two divisions within the company -- I'll call them "Eastern Bank" and "Western Bank". Both provide banking services which are pretty similar: both offer credit cards, checking accounts and savings accounts; both let customers view their transactions and do bill-pay. But there are differences too: the fees and the limits are quite different, and some of the extra features like rights management can be quite different. And the Western accounts are stored on a different back-end system than the Eastern accounts, which requires many small changes to the logic.

Now, there is a large system -- in this case online servicing -- which the Eastern banking team has been developing for about 2 years now and which the Western team wants to start using. It has all kinds of functionality which the Western team wants: showing balances and transactions, making payments and transactions, managing disputes, etc. It would be foolish for the Western teams to start from scratch; clearly they should use what has been built for Eastern customers.

Copy the Code

The first plan was to just take a copy of the Eastern banking code, make the changes in limits and policies, and start to add the Western-specific features. But this would be a grave mistake. Sure, it would work fine for the next year or two, but since the Western team only had a copy of the code, they would need to re-discover and fix every bug the Eastern team found. And 2 years from now, the Eastern teams would have added major new features -- they would have done 4 years' worth of work, and the Western teams would only be able to take advantage of the first 2 years' worth. It is not enough to benefit from the other team's PAST work; we need to benefit from their FUTURE work also.

Extend a Library

So perhaps a better idea is to treat the Eastern banking code as a library, and use subclassing, template methods, and other patterns to modify just the pieces that need to be different for Western customers. This is an excellent solution, except for one thing: there is a big difference between code that works and a library well-designed for extension. Creating a library is a lot harder.

In order for the Western team to be able to keep the bulk of the Eastern banking code and change only the pieces that need to be different, the Eastern banking code needs to be written with this in mind. If a limit or policy is different for Western customers, that limit or policy needs to be designed so it is pluggable. If data on Western accounts is to be pulled from a different back end while keeping the same processing logic, then the code to pull data needs to be cleanly separated from the processing logic. It is possible to keep these kinds of concerns separate so they can be modified independently -- this is what a good library designer does when building extensible and reusable code. But it requires a lot of work and attention to flexibility.
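Here is a minimal Java sketch of what "pluggable" and "cleanly separated" might look like. Everything in it is hypothetical: `TransferLimitPolicy`, `TransactionSource`, `TransferService` and the dollar limits are invented for illustration and are not taken from the actual Eastern codebase. The point is that the processing logic depends only on two small interfaces. Each bank can therefore supply its own limit and its own data source without touching that logic.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical types for illustration only; not from the real Eastern code.

/** Pluggable policy: the limit is looked up through an interface, not hard-coded. */
interface TransferLimitPolicy {
    BigDecimal dailyLimit();
}

/** Data access is separated from processing so a different back end can be plugged in. */
interface TransactionSource {
    List<BigDecimal> todaysTransferAmounts(String accountId);
}

/** Processing logic that depends only on the two interfaces above. */
class TransferService {
    private final TransferLimitPolicy limitPolicy;
    private final TransactionSource source;

    TransferService(TransferLimitPolicy limitPolicy, TransactionSource source) {
        this.limitPolicy = limitPolicy;
        this.source = source;
    }

    /** Returns true if the new transfer keeps the day's total within the limit. */
    boolean canTransfer(String accountId, BigDecimal amount) {
        BigDecimal alreadySent = source.todaysTransferAmounts(accountId).stream()
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        return alreadySent.add(amount).compareTo(limitPolicy.dailyLimit()) <= 0;
    }
}

class ExtensionDemo {
    public static void main(String[] args) {
        // Same processing logic; each bank plugs in its own (made-up) limit and back end.
        TransactionSource easternBackEnd = id -> List.of(new BigDecimal("400"));
        TransactionSource westernBackEnd = id -> List.of(new BigDecimal("4000"));
        TransferService eastern = new TransferService(() -> new BigDecimal("1000"), easternBackEnd);
        TransferService western = new TransferService(() -> new BigDecimal("25000"), westernBackEnd);
        System.out.println(eastern.canTransfer("E-1", new BigDecimal("700")));
        System.out.println(western.canTransfer("W-1", new BigDecimal("700")));
    }
}
```

Running `ExtensionDemo` should print `false` and then `true`. The `canTransfer` logic is identical in both cases, and only the plugged-in policy and data source differ. Designing every limit and every data-access point this way is the extra effort a library designer has to put in.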

And in this case, substantial extra work on the part of the Eastern banking team to support the Western use case is not really an option. It would require that team to take on additional complexity, and to carefully test the Western bank functionality (something they are not experts on) with every change. It would slow down the pace of development for Eastern banking (which is actually a significantly bigger group of developers). We cannot use this approach.

Use a Long-Lived Branch

Fortunately, we use git -- surely a version control system can solve our problems! Instead of copying the Eastern banking code, the Western team can fork their repo and then do Western changes in the fork. When Eastern banking makes changes, Western can take advantage of that work by merging from the Eastern branch. Western bank gets to take advantage of past and future work by the Eastern teams, and as a bonus, they can absorb the changes at a time they control -- performing the merge only at the start of a sprint if they like, rather than whenever the Eastern team does a release.

It sounds too good to be true, and it is. This, too, would work just fine at first -- so long as the Western teams haven't done much work in their own fork. But in a couple of years it will completely break down. Eventually, the Western teams will have made quite a few modifications. Some of them will be cleanly separated -- the behavior with regard to user rights management is very different, so that code will probably be in entirely different files. But many of the changes will be scattered throughout the code, for the same reason that we couldn't just use subclassing and template methods to modify the original. Every place where a policy is different or back-end access works differently will be a change. If I had to guess, I would expect perhaps 5% of the code to be modified eventually.
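Here is a hypothetical example of the kind of code that produces these scattered changes. The fee policy is written directly into the processing method rather than being pluggable. The class name and the amounts are invented for illustration. A Western fork that needs a different fee or waiver threshold would have to edit these lines in place. Any later Eastern change to the same method could then collide with that edit during a merge.

```java
import java.math.BigDecimal;

// Hypothetical Eastern code: the policy values live inside the processing logic.
class StatementFeeCalculator {
    BigDecimal monthlyFee(BigDecimal averageBalance) {
        // Policy baked in: a Western fork can only change these by editing this method.
        BigDecimal waiverThreshold = new BigDecimal("1500");
        BigDecimal fee = new BigDecimal("12");
        if (averageBalance.compareTo(waiverThreshold) >= 0) {
            return BigDecimal.ZERO; // fee waived for high balances
        }
        return fee;
    }
}

class FeeDemo {
    public static void main(String[] args) {
        StatementFeeCalculator calc = new StatementFeeCalculator();
        System.out.println(calc.monthlyFee(new BigDecimal("1000")));
        System.out.println(calc.monthlyFee(new BigDecimal("2000")));
    }
}
```

Multiply this by every policy and every back-end call in a large system, and the estimate above of a few percent of the code being touched becomes easy to believe.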

And what happens when you merge changes into a branch where 5% of the code has been modified? You get merge conflicts... lots of them, every single time. Eventually the merge process becomes unmanageable (I have seen this happen), because every merge requires extensive manual review of individual conflicts, which consumes a huge amount of developer time (with little benefit) and offers a myriad of opportunities to introduce errors.

Proposed Solution

So, we can't just copy the code because that would mean we lose out on all future development. We can't subclass and extend it because the code isn't designed to support doing that. We can't fork it because eventually the new development will be significant enough to trigger constant merge conflicts. What can we do?

I hope that a combination of the last two approaches will work better than either alone. Western bank should make a fork of the original code, and continue to import all changes made to the original branch (but on their own schedule -- at the start of a sprint, not whenever Eastern bank releases). But they should not do all of the Western development in this branch, which would eventually modify enough code to cause frequent merge conflicts. Instead, they should check in only a few minimal changes to this branch. These include making private methods public, making final methods non-final (that's Java lingo for allowing them to be overridden in subclasses), and introducing the occasional template method pattern. Then they should create a separate repository that subclasses the code in this branch and uses template methods and similar patterns to modify its behavior and implement the Western functionality.
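Here is a minimal Java sketch of the proposed split. It assumes the Eastern code looks something like the hypothetical `PaymentProcessor` below; the class names, limits and back-end labels are all invented.

* **Changes in the fork:** only modifiers change. `dailyLimit()` goes from `private` to `protected`, and the `backEndName()` hook loses its `final` modifier.
* **Result:** `submitPayment()` now acts as a template method.
* **Western code:** `WesternPaymentProcessor` would live in the separate Western repository. Both classes are shown in one file here only so the example compiles on its own.

```java
import java.math.BigDecimal;

// --- In the Western fork of the Eastern repository (hypothetical class). ---
// Only modifiers were changed in the fork; the method bodies match Eastern's.
class PaymentProcessor {
    // Template method: the overall algorithm stays in the Eastern code.
    public String submitPayment(String accountId, BigDecimal amount) {
        if (amount.compareTo(dailyLimit()) > 0) {
            return "REJECTED: over limit";
        }
        return "ACCEPTED via " + backEndName();
    }

    protected BigDecimal dailyLimit() {   // in the fork: was private
        return new BigDecimal("1000");
    }

    protected String backEndName() {      // in the fork: was final
        return "eastern-core";
    }
}

// --- In the separate Western repository, which depends on the fork. ---
class WesternPaymentProcessor extends PaymentProcessor {
    @Override
    protected BigDecimal dailyLimit() {
        return new BigDecimal("25000");
    }

    @Override
    protected String backEndName() {
        return "western-core";
    }
}

class BranchDemo {
    public static void main(String[] args) {
        BigDecimal amount = new BigDecimal("5000");
        System.out.println(new PaymentProcessor().submitPayment("E-1", amount));
        System.out.println(new WesternPaymentProcessor().submitPayment("W-1", amount));
    }
}
```

Running `BranchDemo` should print `REJECTED: over limit` followed by `ACCEPTED via western-core`. Because the fork's diff against Eastern is only a couple of modifiers, a later Eastern edit to the body of `submitPayment()` is much less likely to conflict. All of the Western behavior stays in the subclass.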

The Western teams get a staging area (the branch) where they can absorb all future changes made by the Eastern team. They avoid merge conflicts by making only the minimal changes needed for their own code to extend and modify the Eastern code. Hopefully, they can even keep the Eastern bank tests working in their branch. That would make it easy to tell bugs introduced by improperly imported code from those introduced by improper overriding and extending. And in their separate repository, they can modify the Eastern behavior as extensively as they like, using traditional programming techniques like subclassing.

Do you think this will work? Do you have any suggestions (or better yet, experience) in how to solve this problem for two long-lived and complex systems? Reach out and let me know: write to mcherm@mcherm.com.

Posted Sun 28 January 2018 by mcherm in Programming