When APIs don't return the same data

Originally published on Tumblr.

We expect a function call, f(x), to return the same value for a given x, no matter how often it’s called. An API call is a remote function call, so we expect the same kind of behavior. The only times we don’t expect this to hold are when we’re dealing with the physical world, when the function is something like rand(), or when the function’s implementation changes. Setting rand() aside, in the other two cases we need a consistent way of letting the caller know something about the context of the result.

Let’s think about the physical world first.

Consider the function: weather(location). We expect the result to change over time. The result is meaningless without a timestamp. Is it the implementor’s responsibility to include that in the result, or is it the caller’s to track the relationship between the result and the time? I would argue that it’s the implementor’s, because the context is part of the data.
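To make that concrete, here is a minimal sketch of a weather(location) call that carries its own timestamp. The names WeatherReading, temperature_c, and observed_at are made up for illustration, and the temperature is a placeholder rather than a real measurement.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class WeatherReading:
    """A result that carries its own context (hypothetical shape)."""
    location: str
    temperature_c: float
    observed_at: datetime  # without this, the value cannot be interpreted


def weather(location: str, now: Optional[datetime] = None) -> WeatherReading:
    """Return a reading stamped with the time it was observed.

    The temperature is a hard-coded placeholder; a real service would
    query a sensor or an upstream feed here.
    """
    observed_at = now if now is not None else datetime.now(timezone.utc)
    return WeatherReading(location=location,
                          temperature_c=21.5,
                          observed_at=observed_at)


reading = weather("Boston")
print(reading.location, reading.temperature_c, reading.observed_at.isoformat())
```

Because the timestamp travels with the value, two callers who see different answers can tell right away that they are looking at two different moments, not two contradictory facts.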

Now let’s consider the case of a changing implementation.

In the API world, this is tricky. One of the reasons people use APIs is to isolate themselves from implementation.

When we work with a local library, we know that an implementation won’t change out from under us, so, barring interaction with the real world, we expect the same result every time we call a function. When the implementation changes, the library version changes. Everything is explicit.

API versions rarely change, but implementations change all the time. This means that, in some cases, results will change, which can lead to quite a bit of confusion.

A few months ago I was looking at the Echo Nest API and noticed that certain audio processing calls always return the version number of the software package that generated them. This is a clever way of explicitly exposing the “library” version. It is possible because all of the processing is presumably done by a single, isolated piece of software.
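Here is a rough sketch of why a version number in the result helps the caller. The field names analyzer_version and value are assumptions made for this example; they are not the Echo Nest’s actual response fields.

```python
def explain_change(old: dict, new: dict) -> str:
    """Compare two results for the same input and say why they differ.

    Both dicts are hypothetical API results with a "value" and an
    "analyzer_version" key.
    """
    if old["value"] == new["value"]:
        return "unchanged"
    if old["analyzer_version"] != new["analyzer_version"]:
        return ("changed: analyzer went from "
                f"{old['analyzer_version']} to {new['analyzer_version']}")
    return "changed: same analyzer version, so something else differs"


# Two made-up results for the same audio file, fetched months apart.
january = {"value": 118.0, "analyzer_version": "3.1"}
june = {"value": 120.0, "analyzer_version": "3.2"}
print(explain_change(january, june))
```

With the version in hand, a surprising change becomes a question the caller can answer on their own instead of a support ticket.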

Another part of the API isn’t so neat. It derives information like genre by looking at the content of a music file and also at comments made by people on the Net. The result is almost certainly derived from the output of multiple algorithms. Over time, the result changes, but the user has no way of knowing why. Reading the support forums, I saw that this often confused people. Some were angry.

At OP3Nvoice we have the same problem. We are working on some functionality that will be derived from multiple pieces of software that will change over time. The goal makes sense. We want to give our users the best possible results at all times. These results will magically get better, which implies that they’ll change. We’re still trying to figure out how to communicate that change to our users.

We know that if we give them no context they’ll be confused, and sometimes angry, just like the Echo Nest users I saw. If we give them the version number of every piece of software used to derive the data, we put a heavy burden on them to track each number, and we expose some proprietary implementation details. We’ve been thinking of generating a synthetic version number for the lot, but haven’t settled on a good method.
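One possible approach, offered as a sketch and not as the method we have settled on, is to hash the individual component versions into a single opaque identifier. The component names and version strings below are invented for the example.

```python
import hashlib
from typing import Mapping


def synthetic_version(components: Mapping[str, str], length: int = 12) -> str:
    """Collapse many component versions into one opaque identifier.

    Components are sorted by name so the result does not depend on the
    order in which they were listed.
    """
    canonical = "\n".join(
        f"{name}={version}" for name, version in sorted(components.items())
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:length]


# Hypothetical pipeline; none of these names refer to real software.
pipeline = {"transcriber": "2.4.1", "keyword-extractor": "0.9", "ranker": "1.3"}
print(synthetic_version(pipeline))
```

The identifier changes whenever any component changes, and it reveals nothing about which components exist. The trade-off is that a hash has no ordering, so a caller can tell that two results came from different pipelines but not which one is newer. Pairing it with a release date or a counter would cover that gap.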

This isn’t a problem I’ve heard discussed in the API community. It’s possible that I’ve just missed the discussion, or it’s possible that we’re enough of an edge case that no one has in fact been talking about the problem. I’d love to hear what my readers have to say about this.