Memcached protocol is not enough

By: on May 21, 2009


source

Memcached protocol is not enough

A few months ago I was wondering if it’s feasible to build a scalable realtime search engine using shared-nothing architecture. One of the essential project decisions I need to make, is to choose a decent communication protocol to storage nodes. Recently, the memcached protocol is becoming a standard as a key-value protocol. It’s not only used by a memcached cache-server, but it has also been adopted in persistent key-value databases like Tokyo-Tyrant, LightCloud or MemcacheDB. However there are several things that make this protocol a very bad choice for a persistent database.

Reliability: Memcached ought to be a caching layer
As the original concept memcached was a caching layer. The data that went into it were just duplicates. The effect of this decision is clearly visible in the design of client libraries. When you kill the SQL database, you’ll immediately get a huge exception shouting that data can’t be saved. While some memcached clients only silently report a failure:

 &gt;&gt;&gt; mc.set('key', 'value')<br> True<br> &gt;&gt;&gt; os.system('killall memcached')<br> &gt;&gt;&gt; mc.set('key', 'value2')<br> False<br>

Others, like the Ruby client, just ignore errors:

 irb(main):003:0&gt; cache = MemCache::new '127.0.0.1:11211'<br> irb(main):004:0&gt; cache["key"] = "value"<br> =&gt; "value"<br> $ killall memcached<br> irb(main):005:0&gt; cache["key"] = "value"<br> =&gt; "value"<br>

This is behavior is fine when memcached is treated as a caching layer, but it’s unacceptable when it’s used as a persistent storage.
Optimistic locking: versioning
When multiple clients are modifying a single value in memcached, it’s very hard to avoid race conditions. Once again, it’s not a very big problem for a caching layer. But for a persistence layer it can result in losing precious data. Usually you can work around this problem by using version numbers for records. We can start with very simple versioning schema:

 def versioned_set(key, value):<br> &nbsp;&nbsp; &nbsp;version = mc.incr('%s.version' % (key,))<br> &nbsp;&nbsp; &nbsp;return mc.set('%s.%s' % (key, version), value)<br><br> def versioned_get(key):<br> &nbsp;&nbsp; &nbsp;version = mc.get('%s.version' % (key,))<br> &nbsp;&nbsp; &nbsp;return mc.get('%s.%s' % (key, version))<br>

Using this schema we can implement optimistic locking. But what will happen when a client dies between incr and set operations? To get this right the code is going to become quite complicated. What we actually need, is for the versioning to be done on the server side.

Speed: parallel mutli_get
Client libraries are designed for extremaly fast in-memory only memcached. It’s so fast that they don’t even try to query multiple servers at the same time, there would be no benefit in that. Even though Tokyo Tyrant and MemcacheDB are quite fast, it can sometimes take a while to find your data on the disk – single disk seek can cost 10ms. Memcached clients will query servers sequentially as they are not used to such latency.
Proper parallel implementation of multi_get method is doable, but it makes a library code much more complicated, and in fact slower for the in-memory memcached use case.
Speed: CPU is wasted
Memcached uses a text protocol. It means that you can speak to the server using telnet. But it also means that a server needs to find new line characters to be able to recognize a command. Eventually protocol parsing becomes a huge waste of CPU. It’s so painful, that a weird thing was developed in response: a multi-threaded version of memcached.
Network bottleneck: too many packets
When you ask memcached for a bunch of small values, it will send a bunch of tiny packets as a response. Believe it, or not, but it can saturate a network switch and flood your CPU with unneeded network interrupts. We should be able to group many requests in a single packet and receive response also as a single datagram. 
This feature must be supported by both client and server.
The silver bullet: Memcached binary protocol
Memcached developers proposed a major upgrade to current protocol, a binary protocol. It’s much better than the old text protocol. It actually fixes all the protocol flaws I described above:
 – It has a CAS field, that can help clients in versioning and optimistic locking.
 – The parsing of the requests has a negligible CPU cost.
 – It tries to fix the network-packets problem by allowing requests to be grouped.
On the other hand there are also several flaws in the binary protocol:
 – There is no mainline implementation of it.
 – There are no client libraries that support it.
 – Extending the protocol – for example adding new commands – is not defined anywhere.
 – I don’t like the ‘server version‘ command, it doesn’t tell anything about the server.
 – Some of the concepts are barely documented – the exact behavior of the CAS feature or how to exactly group requests.
 – I think it’s overcomplicated in some aspects.
The thing is that memcached binary protocol is much more than just a caching layer protocol. It’s funny to see that memcached developers actually created a protocol that fits more a persistent key-value storage use case rather than a dumb cache.
Memcached binary protocol seems to be a reasonable protocol for my needs in a realtime search engine.
Actually, I’ve already implemented a server that uses this protocol and a python client library. The server supports Berkeley DB storage, Tokyo Cabinet, raw filesystem and an experimental append-only storage. The client code allows querying of multiple servers in parallel and supports grouping multiple small requests. However, the code is very experimental, has a lot of bugs and it probably never will work on anything other than my computer (TM).

Share

5 Comments

  1. Mebiblu says:

    The text protocol also has check and set, with the caveat that no python client library supports it.

  2. Steve Yen says:

    Hi, interesting thoughts, and I agree that binary protocol is the way to go for the future.

    Some corrections:

    • on “no mainline implementation”…
      The memcached server versions 1.3 has implemented and supported binary protocol for awhile, as does the just-released 1.4 version.

    On “no client libraries”…
    The libmemcached C client library supports binary protocol. There are now several language wrappers around libmemcached for popular scripting languages, such as PHP, Ruby, etc.

  3. marek says:

    @SteveYen
    Thanks for suggestions!

    I tried to use a binary-protocol branch of memcached some time ago, and I don’t have good memories. I assumed that it was just a proof-of-concept. I definetely must check out the latest memcached release.

    I also tried the python implementation (what I think was the reference implementation), but without success.

    It could be interesting to take a look at libmemcached and see how some features are implemented there.

  4. No doubt that memcached is a very powerful caching solution, but in my opinion, if we are to work in a distributed enviornment, and we require all the data to be available at cost of a local read or write call, then better solution is to go with NCache. I used it, and in its Partitioned-replica topology, all the data is not only available at a cost of a local operation, but also, it provides a 100% fail-safe mechanism. I sat up a testing environment for its fail-safe environment, and it proved itself reliable and scalable solution.
    As you’ve discussed that you wanted to develop a search engine, so you will probably be thinking that how to load the prior information into the cache? I mean, do you really think it is wise to gather all the data from the web at the time of initiation, and make your user suffer with bad or no results? while I was going through the documentation of this caching solution, I read about a set of functions provided for pre-loading the cache, and keeping all the data in the cache as well as in Data store up-to-dated; mainly, ICacheLoader, IReadThrough, IWriteThrough and IWriteBack.
    Even though, it is mainly for .NET Platform, but I think the developers of NCache are providing interfaces for other platforms.
    I’d say do look it up and see if it suites you.

    I used the Express edition, which is free. you can check it out @ http://www.alachisoft.com/ncache/ncache_express.html.

    Max.

  5. bhg says:

    thanks for the info, but i am sure memcached is the best choice for linux if not for windows

Leave a Reply

Your e-mail address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*