When I posted a link to the old “Friends don’t let friends do Java” picture, I never expected it to receive so many comments. I certainly never expected to be called a neo-luddite, either. I guess I should have been prepared for that, since it was a fairly cheap jab at Java. It got me thinking about all sorts of complexity, though.

Before I proceed any further, I must disclaim that I am as guilty as any other programmer: under the pressure of budget and time constraints I have cranked out code that I am genuinely ashamed of. When forced to choose between sabotaging business plans and dying a little inside, I will always choose the latter without a second thought. Still, I feel it’s important for us as craftsmen, philosophers and artisans to do some occasional introspection, reflection and reality checking.

Classifying complexity

If the code works and the business task is solved, does it really matter how that was achieved? Yes, it does. It’s fairly well established that technical debt (however vaguely defined) is something to be avoided at all costs, or at least paid off as soon as possible, before it starts charging you interest. Yet code complexity is a form of debt that is often overlooked. Sometimes people even use tools and metrics that push you to increase complexity for no apparent reason at all. I’m going to focus on two kinds of complexity today: pattern explosion and dependency hell.

Explosion of patterns and fake extensibility

Now, Java has been ridiculed a lot for its users’ abuse of patterns. Much of that abuse occurs naturally, because the language is extremely verbose and at the same time insufficiently powerful: you couldn’t make something like TransactionAwarePersistenceManagerFactoryProxy up if you wanted to. But this is not exclusive to Java at all. I see people building extreme class hierarchies in Ruby too, and even when they avoid crazy inheritance chains, they still end up applying every pattern prematurely.

Maybe it’s better illustrated by an example: http://madewithenvy.com/ecosystem/articles/2015/active-record-and-the-srp/

What he wanted to do: create a record with a provided email and send a notification, which is about two lines of code in any sane codebase. What he actually did: build two classes with dependency injection and abstract out a repository and a notifier interface. That’s two extra classes and two interface contracts for what amounts to a chance at some future extensibility. I mean, fine, if you really need this stuff, you need it; if your application requires twenty different notifiers for this particular case, by all means go for it. But you don’t reach for a sledgehammer to crack a nut. What I find especially sad is that this kind of abstraction is often considered perfectly fine by all sorts of OO best-practice advocates. It looks fine, its metrics are fine: method length is fine, class length is fine, the number of arguments per method is fine. Code Climate or a similar tool will probably rate this code an A. I disagree.
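
To make the contrast concrete, here’s a rough Ruby sketch of what I mean. The names (SignupsController, SignupMailer, SignupService) are mine, made up for illustration; they aren’t taken from the linked article.

    # The straightforward version: the actual work is about two lines.
    class SignupsController < ApplicationController
      def create
        user = User.create!(email: params[:email])
        SignupMailer.welcome(user).deliver_later
        redirect_to user
      end
    end

    # The "extensible" version: two extra classes and two implicit contracts
    # (a repository and a notifier) for a single concrete use case.
    class SignupService
      def initialize(repository:, notifier:)
        @repository = repository
        @notifier = notifier
      end

      def call(email)
        user = @repository.create(email: email)
        @notifier.notify(user)
        user
      end
    end

Until a second repository or a second notifier actually shows up, all that indirection buys you is more files to read.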

DHH has been fighting this kind of architecture astronautics for ages. He hasn’t succeeded. Now, I know how easily a Rails application turns into a giant ball of mud. But this is a case where taking the medicine preemptively is far worse than any illness it might cure. You don’t use the nuclear option until you’ve exhausted all the other ones.

Dependency hell

Dependencies are another huge source of accidental complexity that you might willingly add to your app without thinking too much.

Consider a simple task: let’s say you have a feed of posts and comments and would like to filter out those that are in Orcish. Let’s also say you don’t mind some false negatives (i.e. Orcish comments slipping through the cracks) and you’re fine with an 80% success rate, but you do mind false positives (i.e. Elven comments being classified as Orcish). What’s the best way to deal with this situation?

Now, language detection is a well-known problem. Common solutions include training a neural network, running Bayesian analysis on trigrams, and so on. These are fairly costly solutions, however. For example, the JavaScript franc library includes 250 KB of raw JSON with a bunch of trigrams. That’s 250 KB of JSON you have to load before you even do anything meaningful. After that, the target text is split into trigrams, checked against that huge dictionary, and so on and so forth. Does this solve the problem? Yes. Is it the best way to do it? It depends.
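
Greatly simplified, the trigram approach boils down to something like the following. This is a toy Ruby sketch, not franc’s actual code; the tiny profiles hash stands in for the big per-language frequency tables you would have to ship.

    # Toy sketch of trigram-based language scoring (not franc's real implementation).
    # The expensive part in real life is the profiles data: one large
    # trigram-frequency table per supported language.
    def trigrams(text)
      text.downcase.gsub(/[^[:alpha:]]+/, " ").chars.each_cons(3).map(&:join)
    end

    def score(text, profile)
      trigrams(text).sum { |trigram| profile.fetch(trigram, 0.0) }
    end

    # Tiny stand-in profiles; real tables hold thousands of trigrams per language.
    profiles = {
      "orcish" => { "zug" => 0.012, "dab" => 0.010, "lok" => 0.008 },
      "elvish" => { "ala" => 0.011, "ene" => 0.009, "ion" => 0.008 },
    }

    detected_language, _profile = profiles.max_by { |_name, profile| score("Zug zug, dabu!", profile) }
    # detected_language => "orcish"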

Our generic solution didn’t take into account that we’re okay with an 80% success rate and that only one language needs to be blocked. Do we really need this dependency? I say we could drop it and build a much better ad-hoc solution that will be two orders of magnitude faster. Just take a list of the 100 most common Orcish words that are unique to Orcish and join them into one big regex: /word1|word2|word3|word4|...|word100/. Given how most sane languages implement regular expressions, it will probably be blazingly fast, require minuscule memory, and be perfectly fit for embedding in a browser, even a mobile one.
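
Here’s a minimal Ruby sketch of that idea. The word list is obviously a made-up placeholder, and the same regex would work just as well in browser-side JavaScript.

    # Ad-hoc Orcish detector: one regex built from a hand-picked word list.
    # The words below are placeholders; in practice you'd curate ~100 words
    # that occur in Orcish and (practically) nowhere else.
    ORCISH_WORDS = %w[lok ogar kagh zug dabu]  # ...and so on, up to ~100 entries
    ORCISH_RE = /\b(?:#{ORCISH_WORDS.map { |w| Regexp.escape(w) }.join("|")})\b/i

    def probably_orcish?(text)
      # A single hit is enough; we accept false negatives, so no scoring is needed.
      ORCISH_RE.match?(text)
    end

    probably_orcish?("Lok'tar ogar!")  # => true
    probably_orcish?("Mae govannen")   # => false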

Obviously, this approach doesn’t scale too well and has some very noticeable limitations (a relatively high false-negative rate), but it’s dead simple and doesn’t drag in a huge dependency. We never fire an assault rifle at mosquitoes, even though it would kill them, because that would simply be wasteful (and sometimes dangerous). Yet when it comes to dependencies, developers often pile them on without so much as a twitch of an eye.

Oftentimes, the performance hit you take by abusing dependencies is not really an issue: one second of load time or five, who cares if it only has to happen once per session. But there is another thing to consider before making your application more complex for no reason: abstraction leakage. Joel Spolsky formulated his brilliant Law of Leaky Abstractions back in 2002. What it means for us, practically, is that either you understand the inner mechanics of the abstractions you use, or one day you will have to deal with an abstraction that falls apart right in your hands. In my practice, I’ve encountered breakage at every level of the stack: hardware fails, the OS fails spectacularly, networking fails, infrastructure like the RDBMS fails so often it’s not even funny, your libraries fail, your framework fails, your client’s browser fails, and obviously your application fails all the time. You can’t prevent things from failing, that’s a fool’s errand, but what isn’t there cannot fail. The fewer moving parts there are, the more robust the mechanism generally is. Think about that before adding another 35 MB “compact language detector” gem to your project.

Conclusion

As I mentioned at the beginning of this essay, we are often faced with external constraints handed down directly from The Powers That Be. Short on budget and short on time, we build quick and dirty hacks that solve our immediate problems but lead to more problems down the road. It’s very similar in nature to technical debt: you add a dependency or a pattern today, and you pay the cognitive price for the rest of your project’s life. Since it isn’t immediately apparent, measured, quantified and turned into KPIs and metrics, creeping complexity doesn’t get much attention. And even if it did, there is often no easy fix for it.

However, I’m looking into the future with optimism. Dependency overuse is starting to get frowned upon, which is a good thing in my book. A lot of work is also being done in language development, where new generations of languages are almost always superior to their predecessors: think Swift, Clojure, Elixir, Scala, Rust. These new tools let us do more in less code, with less cognitive cost and stronger safety guarantees. And nothing makes lazy me happier than getting more for less.

A journey of a thousand miles starts with one little step.