Friday, September 05, 2008

Maintainable code -- order matters

It's been a while since I mumbled about writing maintainable code. It's after midnight now so it must be time to start stringing the typos together.

One thing I see fairly often is code that is meant to fetch data from a store that implements some sort of caching. An example might be a user-ID-to-username lookup routine, or a mapping from names to authorization credentials, whatever. It boils down to this: given some key, find the value for it -- updating a cache if necessary -- and return that value. There are at least two ways to write this code: the obvious way, and the better way.

The Obvious Way
if key in cache:
    cache.maybe_track_query(key)
    return cache.values(key)
else:
    value = figure_out_the_value_for(key)
    cache.add(key, value)
    cache.maybe_track_query(key)
    return value

Earlier today, I was looking at code where almost 70 lines of C separated the two copies of the query-tracking logic. Bah, that's not right. This sort of duplication leads to subtle bugs that are hard to find, because humans aren't good at detecting small discrepancies. Never create problems humans have a hard time fixing.

When you're writing code that solves this sort of problem, think about whether or not you can restructure your code so each action happens once. The maintainer will thank you. Actually, that's a lie. Maintainers don't really thank people. What will really happen is you will reduce by one the number of times the maintainer wishes to break your ribs.

The Better Way
if key not in cache:
    value = figure_out_the_value_for(key)
    cache.add(key, value)
cache.maybe_track_query(key)
return cache.values(key)
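
If it helps to see that shape as something you can actually run, here is a sketch with a plain dict standing in for the cache; the stats dict plays the role of maybe_track_query, and compute is whatever expensive lookup the cache is fronting (these names are mine, not from any real API):

def cached_lookup(cache, key, compute, stats):
    # Fill the cache on a miss...
    if key not in cache:
        cache[key] = compute(key)
    # ...then track the query and return through exactly one code path.
    stats[key] = stats.get(key, 0) + 1
    return cache[key]

With that in place, a user-ID-to-username lookup is just cached_lookup(usernames, 1001, fetch_username, query_counts), and the tracking and returning happen in exactly one place.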

I know that this structure costs an extra cache lookup on a miss. If your cache really is slow to look things up in, then feel free to make the code uglier with a boolean that tracks the result of the existence check, something like the sketch below.
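
A minimal sketch of that uglier variant, wrapped in a function so it stands on its own and keeping the same hypothetical cache API and figure_out_the_value_for helper from the examples above:

def lookup(cache, key):
    # Remember the result of the existence check so the miss path can
    # return the freshly computed value without a second cache read.
    found = key in cache
    if found:
        value = cache.values(key)
    else:
        value = figure_out_the_value_for(key)
        cache.add(key, value)
    cache.maybe_track_query(key)
    return value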

Scratch that.

Your cache isn't that slow.

If you think your cache is that slow, you are wrong, wrong, wrong until such time as you've hooked up a profiler and seen, in at least two common usage scenarios, that your cache lookups actually are slow.
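
If you want a number to argue with before reaching for the profiler, a back-of-the-envelope timeit run against an in-process dict -- my assumption about the cache here, and no substitute for profiling a real workload -- looks something like this:

import timeit

# Time a million dict hits and a million membership tests.  This is a
# sanity check on the "my cache is slow" theory, not a profiler run.
setup = "cache = dict((i, str(i)) for i in range(10000))"
hits = timeit.Timer("cache[4242]", setup).timeit(1000000)
checks = timeit.Timer("4242 in cache", setup).timeit(1000000)
print("one million lookups:          %.3f s" % hits)
print("one million membership tests: %.3f s" % checks)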

This may seem like a contradiction of my earlier post advocating a fail-early approach. On some overly literal level it may be. However, it is entirely consistent with the goal of helping the maintainer: make the intent clear.
