Memcached protocol is not enough
Reliability: Memcached ought to be a caching layer
Memcached was originally conceived as a caching layer: the data that went into it were just duplicates. The effect of this decision is clearly visible in the design of client libraries. When you kill the SQL database, you immediately get a huge exception shouting that data can't be saved. But some memcached clients only silently report a failure:
    >>> mc.set('key', 'value')
    True
    >>> os.system('killall memcached')
    >>> mc.set('key', 'value2')
    False
    irb(main):003:0> cache = MemCache::new '127.0.0.1:11211'
    irb(main):004:0> cache["key"] = "value"
    => "value"
    $ killall memcached
    irb(main):005:0> cache["key"] = "value"
    => "value"
This behavior is fine when memcached is treated as a caching layer, but it's unacceptable when memcached is used as persistent storage.
Optimistic locking: versioning
When multiple clients modify a single value in memcached, it's very hard to avoid race conditions. Once again, that is not a big problem for a caching layer, but for a persistence layer it can result in losing precious data. Usually you can work around this problem by using version numbers for records. We can start with a very simple versioning scheme:
    def versioned_set(key, value):
        version = mc.incr('%s.version' % (key,))
        return mc.set('%s.%s' % (key, version), value)

    def versioned_get(key):
        version = mc.get('%s.version' % (key,))
        return mc.get('%s.%s' % (key, version))
Using this scheme we can implement optimistic locking. But what happens when a client dies between the incr and set operations? To get this right, the code is going to become quite complicated. What we actually need is for the versioning to be done on the server side.
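To illustrate what server-side versioning buys us, here is a minimal sketch of compare-and-swap style optimistic locking. `FakeStore` is an in-memory stand-in, not a real memcached client; real servers expose the same idea via gets/cas commands, and all the names below are illustrative:

```python
import itertools

class FakeStore:
    """In-memory stand-in for a store with server-side versions (CAS)."""
    def __init__(self):
        self._data = {}                     # key -> (version, value)
        self._versions = itertools.count(1)
        self._seen = {}                     # version observed by last gets()

    def gets(self, key):
        # Fetch a value and remember which version of it we saw.
        version, value = self._data.get(key, (None, None))
        self._seen[key] = version
        return value

    def set(self, key, value):
        # Every write gets a fresh server-assigned version number.
        self._data[key] = (next(self._versions), value)
        return True

    def cas(self, key, value):
        # Store only if nobody changed the key since our gets().
        version, _ = self._data.get(key, (None, None))
        if version != self._seen.get(key):
            return False
        return self.set(key, value)

def increment(store, key, delta):
    # Classic optimistic-locking loop: retry until our CAS wins the race.
    while True:
        current = store.gets(key) or 0
        if store.cas(key, current + delta):
            return current + delta
```

Because the version lives on the server and changes atomically with every write, a client dying mid-operation leaves no dangling state: another client's CAS simply fails and retries.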
Speed: parallel multi_get
Client libraries are designed for the extremely fast, in-memory-only memcached. It's so fast that they don't even try to query multiple servers at the same time; there would be no benefit in that. Even though Tokyo Tyrant and MemcacheDB are quite fast, it can sometimes take a while to find your data on the disk: a single disk seek can cost 10ms. Memcached clients query the servers sequentially, as they are not used to such latency.
A proper parallel implementation of the multi_get method is doable, but it makes the library code much more complicated, and in fact slower for the in-memory memcached use case.
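A sketch of what a parallel multi_get could look like, using threads and an in-memory stand-in for the per-server client (`FakeServer` and the function names are illustrative, not a real library API):

```python
from concurrent.futures import ThreadPoolExecutor

class FakeServer:
    """Stand-in for one memcached-like server."""
    def __init__(self, data):
        self._data = data

    def get_multi(self, keys):
        return {k: self._data[k] for k in keys if k in self._data}

def parallel_multi_get(servers_to_keys):
    # Fire one get_multi per server concurrently.  With disk-backed
    # stores, total latency becomes roughly max(seek time) across
    # servers instead of the sum of all seek times.
    results = {}
    with ThreadPoolExecutor(max_workers=len(servers_to_keys)) as pool:
        futures = [pool.submit(server.get_multi, keys)
                   for server, keys in servers_to_keys.items()]
        for future in futures:
            results.update(future.result())
    return results
```

For a pure in-memory server the thread-pool overhead dominates, which is exactly why existing clients don't bother.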
Speed: CPU is wasted
Memcached uses a text protocol, which means you can speak to the server using telnet. But it also means that the server needs to scan for newline characters just to recognize a command. Eventually protocol parsing becomes a huge waste of CPU. It's so painful that a weird thing was developed in response: a multi-threaded version of memcached.
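For comparison, here is a simplified sketch of what text-protocol parsing forces the server to do (a real parser also handles noreply flags, data blocks and pipelined commands):

```python
def parse_text_command(buffer):
    # Nothing in the text protocol says where a command ends, so the
    # server must scan the buffer for the "\r\n" terminator before it
    # can even tell what the command is.
    end = buffer.find(b'\r\n')
    if end == -1:
        return None                 # incomplete command; wait for more bytes
    words = buffer[:end].split(b' ')
    return words[0], words[1:]
```

That scan-then-split work happens on every single request, which is where the CPU goes.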
Network bottleneck: too many packets
When you ask memcached for a bunch of small values, it sends back a bunch of tiny packets. Believe it or not, this can saturate a network switch and flood your CPU with unneeded network interrupts. We should be able to group many requests in a single packet and receive the response as a single datagram too.
This feature must be supported by both the client and the server.
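On the client side, the core of request grouping is simply buffering writes and flushing them together. A sketch, with illustrative names (a real client would also have to match up the batched responses):

```python
class BatchingConnection:
    """Queue requests and flush them in one write, so many small
    requests leave the host as a single packet instead of N tiny ones."""
    def __init__(self, sock):
        self._sock = sock
        self._pending = []

    def queue(self, request):
        self._pending.append(request)

    def flush(self):
        # One sendall() instead of one write per request: far fewer
        # packets on the wire, far fewer interrupts on the other end.
        self._sock.sendall(b''.join(self._pending))
        self._pending = []
```

The server-side half, grouping the responses into one datagram, is the part the plain text protocol gives you no way to express.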
The silver bullet: Memcached binary protocol
Memcached developers have proposed a major upgrade to the current protocol: a binary protocol. It's much better than the old text protocol and actually fixes all the protocol flaws I described above:
– It has a CAS field that can help clients with versioning and optimistic locking.
– The parsing of the requests has a negligible CPU cost.
– It tries to fix the network-packets problem by allowing requests to be grouped.
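To show why parsing becomes cheap, here is a sketch of decoding the binary protocol's fixed 24-byte request header in Python. The field layout below is my reading of the binary-protocol draft (magic, opcode, key length, extras length, data type, vbucket/status, total body length, opaque, CAS):

```python
import struct

# Fixed-size header: one unpack, no scanning for terminators.
REQUEST_HEADER = struct.Struct('!BBHBBHIIQ')   # 24 bytes, network byte order

def parse_request_header(data):
    (magic, opcode, key_len, extras_len, data_type,
     vbucket, body_len, opaque, cas) = REQUEST_HEADER.unpack_from(data)
    # body_len tells us exactly how many more bytes to read -- contrast
    # this with hunting for "\r\n" in the text protocol.
    return {'magic': magic, 'opcode': opcode, 'key_len': key_len,
            'body_len': body_len, 'cas': cas}
```

Since the header states the total body length up front, the server always knows how many bytes to wait for, which also makes grouping requests back-to-back in one packet straightforward.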
On the other hand there are also several flaws in the binary protocol:
– There is no mainline implementation of it.
– There are no client libraries that support it.
– Extending the protocol – for example adding new commands – is not defined anywhere.
– I don’t like the ‘server version’ command; it doesn’t tell you anything useful about the server.
– Some of the concepts are barely documented, like the exact behavior of the CAS feature or how exactly to group requests.
– I think it’s overcomplicated in some aspects.
The thing is that the memcached binary protocol is much more than just a caching-layer protocol. It's funny to see that the memcached developers actually created a protocol that fits a persistent key-value store better than a dumb cache.
The memcached binary protocol seems to be a reasonable protocol for my needs in a realtime search engine.
Actually, I've already implemented a server that uses this protocol, and a Python client library. The server supports Berkeley DB storage, Tokyo Cabinet, a raw filesystem and an experimental append-only storage. The client code allows querying multiple servers in parallel and supports grouping multiple small requests. However, the code is very experimental, has a lot of bugs, and will probably never work on anything other than my computer (TM).