Wishful Thinking — Paul Murphy

Originally published on Tumblr.

Most people don’t think of engineers as wishful thinkers, but to a large extent, we are.

To paraphrase Martin Geddes, how can anyone consider the Internet fit for purpose when we have no mechanism to guarantee that any part of it will satisfy an unknown demand? That it works most of the time doesn’t mean it will always work. In most cases, that doesn’t matter. In some, it does. Building software on this infrastructure and expecting it to always work is wishful thinking, and people do it every day.

Expecting an API to always deliver its advertised functionality requires us to double down on that wishful thinking. First, public APIs depend on the Internet. And second, no service behind an API can scale infinitely, no matter how well engineered. At some point it will be unable to fulfill its contract.

Most API operators have no idea at what point their service will fail, since failure can be the result of so many different kinds of stress. API consumers are further in the dark as they have no means of even knowing that potentially fatal stress is building up.

Dealing with failure is a requirement, but dealing with failure elegantly and getting the job done are two different things. If an API is unavailable, no amount of error handling is going to help. But being able to anticipate failure might prevent it altogether. If all clients know that an API is overloaded, some percentage of them might “back off”, thereby preventing a hard failure. Today, this is impossible, since no feedback mechanism allows a client to monitor the health of an API in real time. APIs have no “gauges” which tell the client how close they might be to a breaking point.
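To make the "gauge" idea concrete, here is a minimal sketch of what a client could do if an API reported its own load. Everything in it is an assumption made for illustration: the load value between 0.0 and 1.0, the 0.8 threshold, and the function names are hypothetical, not part of any real API.

```python
import random


def backoff_probability(load: float, threshold: float = 0.8) -> float:
    """Map a reported load (0.0 to 1.0) to the chance that a client skips a request.

    Hypothetical: assumes the API publishes a load gauge and that
    threshold is strictly below 1.0.
    """
    if load <= threshold:
        return 0.0
    # Rise linearly from 0 at the threshold to 1 at full load.
    return min(1.0, (load - threshold) / (1.0 - threshold))


def should_send(load: float, rng: random.Random, threshold: float = 0.8) -> bool:
    """Decide whether this client sends its request or backs off this time."""
    return rng.random() >= backoff_probability(load, threshold)
```

Because each client decides at random, roughly the right percentage of them back off as the gauge climbs, without any coordination between them.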

These two topics, API specification and API instrumentation, are two sides of the same coin.

In the physical world, specifications are common, expected, and critical. In the software world they are rarely even discussed. This silence is the result of decades of wishful thinking. Software specification languages have languished in academic obscurity while everyone builds software components that are “good enough” without stating under what circumstances they will work as expected. If we can’t attach specifications to standard library components like hash tables, how can we possibly attach specifications to complex APIs made up of hundreds of components?

We can’t, so it’s amazing that any software works at all.

As with the Internet, and fundamental software components, most APIs are good enough in most cases, but not all. If it were possible to fix this problem – and that’s a very big if considering the foundation – should we try?

Maybe the problem is too complex already. Maybe traditional hardware techniques don’t scale. Maybe adding specifications to a sophisticated API is as crazy as the idea of adding specifications to a biological system. Maybe we need a completely different approach, something adaptive, modeled on biological systems themselves.

Machine learning techniques give us a possible clue, but machine learning requires a feedback loop. These techniques suggest that we can avoid specification if we provide a feedback mechanism. Earlier I mentioned the idea of clients backing off when a server becomes overwhelmed. That’s a good example of a feedback loop allowing a system to avoid local failure.
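One way to picture such a feedback loop is additive increase, multiplicative decrease, the same general idea TCP congestion control uses. The sketch below is only an illustration under assumptions: the class name, the default rates, and the boolean "overloaded" signal are all hypothetical, and how a client would actually learn that a server is overloaded is left open.

```python
class AimdRateController:
    """Adjust a client's request rate from server feedback.

    Hypothetical sketch: the rate is in requests per second and the
    'overloaded' flag is assumed to come from the API somehow.
    """

    def __init__(self, initial_rate: float = 10.0, increase: float = 1.0,
                 decrease_factor: float = 0.5, min_rate: float = 1.0,
                 max_rate: float = 1000.0) -> None:
        self.rate = initial_rate
        self.increase = increase
        self.decrease_factor = decrease_factor
        self.min_rate = min_rate
        self.max_rate = max_rate

    def on_result(self, overloaded: bool) -> float:
        """Update and return the allowed rate after one round of feedback."""
        if overloaded:
            # Back off sharply so the server can recover.
            self.rate = max(self.min_rate, self.rate * self.decrease_factor)
        else:
            # Probe gently for spare capacity.
            self.rate = min(self.max_rate, self.rate + self.increase)
        return self.rate
```

Nothing here states how much load the server can take. The client simply keeps adjusting to whatever the feedback tells it, which is the point: the loop stands in for the missing specification.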

Massively complex systems like our bodies are able to cope with components of varying and unexpected tolerances, as well as sudden local failures. So if natural selection led to a design so apparently friendly to wishful thinking, it might just be arrogant to think we can do better.

We’ve gotten very far without any sort of software specification. Did we just get lucky? If today’s incredible software progress depends on wishful thinking, we may well have. Despite that, we should probably give the feedback loop a little more thought. I have a feeling we’re going to need it.