Many users of Rabbit have been asking us how Rabbit copes with many large messages in queues, to the extent that the total size of these messages exhausts the available physical memory (RAM). As things stand at the moment, the answer is "not very well". Although we have a persistence mechanism, that is not quite an answer either: whilst it does ensure that messages are written to disk, it does not remove messages from RAM. So, we’ve been looking at writing a disk-based queue so that, should RAM become tight, we can start to push messages out to disk and collect them later from there.
However, there is this thing called swap, and it seems wise to test how Rabbit copes when we just allow it to expand into swap. The current releases of Rabbit monitor memory usage, and by default use Channel.Flow to tell publishing clients to stop sending messages when memory gets tight. However, if you start up Rabbit with -rabbit memory_alarms false then the memory monitoring does not occur and so clients will not be told to stop sending messages when we run out of memory. This means we can just start hammering more and more messages into Rabbit and exhaust RAM. Cue fitting an extra 160GB hard disc to be used solely as swap.
Quickly, we hit another problem. The OTP platform, which sits atop Erlang and provides a series of common behaviours for Erlang processes, has a couple of places where it specifies default timeout values of 5 seconds on replies coming back to messages. When the whole computer is stalled swapping out pages, these timeouts can often be exceeded, and so we went through the code base and set all such timeouts to infinity. This does not alter behaviour in the non-this-computer-is-in-a-lot-of-pain case, but when the computer is unwell, it allows Erlang to soldier on regardless (albeit somewhat more slowly!). For the brave souls of you who wish to test this for yourself, hg clone/pull from the usual repository and update to the latest on the
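To illustrate why a fixed reply timeout hurts on a machine that is busy swapping, here is a small Python analogy. It is not Erlang and not how OTP implements its calls: the function names, the thread pool and the scaled-down delays are all assumptions made for illustration. A reply that arrives late is treated as a failure under a finite timeout, while waiting without a timeout (the equivalent of infinity) simply takes longer.

```python
import concurrent.futures
import time


def slow_reply(delay_s):
    """Stand-in for a server process that is slow because the machine is swapping."""
    time.sleep(delay_s)
    return "ok"


def call(pool, delay_s, timeout_s):
    """Send a 'request' and wait for the reply; timeout_s=None waits forever."""
    future = pool.submit(slow_reply, delay_s)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "timeout"


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    # Scaled-down version of a finite default timeout: the reply is too slow.
    with_default = call(pool, delay_s=0.2, timeout_s=0.05)
    # The "infinity" setting: the caller just waits until the reply arrives.
    with_infinity = call(pool, delay_s=0.2, timeout_s=None)

print(with_default, with_infinity)
```

When replies are fast, both settings behave identically, which is why changing the timeout does not affect the healthy case.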
Unless it turns out that swapping just works extremely well, it’s pretty likely that we’re going to be writing our own disk-backed queue, and if we do, we need to be able to demonstrate that it was worthwhile, i.e. that it works better than just using swap. Thus we need to measure the performance when using swap to give us something to compare against.
So, we have two tests. Before getting on to the differences, I’ll start by mentioning the similarities. All message payloads are 10MB in size. Both the client and server are run on the same machine and are communicating using the loopback network device. The machine has an Intel Core2 Quad CPU Q9400 running at 2.66GHz, and 4GB of RAM. When the tests are started, about 3GB of that RAM is available. Each test is started on a fresh running instance of Rabbit, with an empty database. The kernel is the Debian stock 64-bit 2.6.28 kernel, and I’m using Erlang R12B-3 (Debian version: 1:12.b.3-dfsg-4). When fetching messages, Basic.Get is used and no-ack is turned on. I used the Erlang AMQP client.
The first test type is pushing in N messages and then pulling them back out again. I capture the elapsed time for each action (be it a publish or a get), and then graph the results.
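The shape of this first test can be sketched in a few lines of Python. This is not the code used for the measurements here, which ran against Rabbit through the Erlang AMQP client. The deque below is an in-memory stand-in for the broker, the payload is scaled down from 10MB, and the name fill_then_drain is hypothetical.

```python
import time
from collections import deque

PAYLOAD_BYTES = 1024  # scaled down; the tests in the post use 10MB payloads


def fill_then_drain(n, payload_bytes=PAYLOAD_BYTES):
    """Publish n messages, then get n messages, timing every single action."""
    queue = deque()                 # stand-in for the broker's queue
    payload = bytes(payload_bytes)
    timings = []                    # list of (action, elapsed microseconds)

    for _ in range(n):
        start = time.perf_counter()
        queue.append(payload)       # "publish"
        timings.append(("publish", (time.perf_counter() - start) * 1e6))

    for _ in range(n):
        start = time.perf_counter()
        queue.popleft()             # "get"
        timings.append(("get", (time.perf_counter() - start) * 1e6))

    return timings


timings = fill_then_drain(64)
print(len(timings), "actions timed")
```

Recording one elapsed time per action, rather than one total per phase, is what makes both the cumulative and the per-operation graphs possible.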
So, when N is 64, 128 or 256, it’s not really too exciting. This is easily explained: 256 10MB messages easily fit into the 3GB of RAM available. Thus nothing much to report on. First let’s see the cumulative time graphs. Note the axes: we have time on the y-axis, not the x-axis, so a steeper gradient means slower performance.
Next we can take the first differential of these graphs and see how much time is being spent on each operation. The y-axis is now logarithmic:
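The relationship between the two kinds of graph is simple enough to show with made-up numbers. In this Python sketch the per-operation times are invented for illustration and are not measurements from these tests. The cumulative series is the running sum of per-operation times, and the first difference of the cumulative series recovers the per-operation times.

```python
from itertools import accumulate


def cumulative(per_op_us):
    """Running total of per-operation times: what the cumulative graphs plot."""
    return list(accumulate(per_op_us))


def first_difference(cumulative_us):
    """Difference between successive cumulative values: time spent per operation."""
    previous = [0] + cumulative_us[:-1]
    return [now - before for before, now in zip(previous, cumulative_us)]


per_op = [120, 110, 900, 130]        # invented per-operation times in microseconds
total = cumulative(per_op)           # [120, 230, 1130, 1260]
recovered = first_difference(total)  # [120, 110, 900, 130]
print(total, recovered)
```

A single slow operation (900 here) shows up as a steeper step in the cumulative series and as a spike in the differenced one, which is why a logarithmic y-axis helps on the second kind of graph.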
In all cases we see that getting messages is slightly faster than publishing messages, and that as the number of messages in the system, and hence memory used, increases, we see slightly bigger spikes. This, I’m guessing, is the garbage collector having more work to do, but so far, nothing too surprising. Now let’s see what happens when we ramp up to 512 messages. This is 5GB of data, there’s only 4GB RAM in the box, and only 3GB is free at the start of the test. So it’s pretty certain we’re going to hit swap.
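The arithmetic behind that expectation can be written out as a short Python sketch. It assumes 1GB = 1024MB, takes "about 3GB free" as exactly 3GB, and ignores any per-message overhead inside Rabbit, so the result is only a rough ceiling.

```python
MESSAGE_MB = 10
FREE_RAM_MB = 3 * 1024  # "about 3GB" free at the start of each test, taken as exact


def total_payload_mb(n):
    """Total payload size of n messages, in MB."""
    return n * MESSAGE_MB


def fits_in_free_ram(n):
    """True if the payloads alone fit into the free RAM."""
    return total_payload_mb(n) <= FREE_RAM_MB


def max_messages_in_free_ram():
    """Largest message count whose payloads alone fit into the free RAM."""
    return FREE_RAM_MB // MESSAGE_MB


for n in (64, 128, 256, 512, 1024):
    print(n, total_payload_mb(n), "MB", "fits" if fits_in_free_ram(n) else "needs swap")

print("ceiling:", max_messages_in_free_ram(), "messages")
```

Under these assumptions the ceiling comes out at 307 messages, so 256 messages fit and 512 do not.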
Everything’s going along just fine until we get to about 310 messages in Rabbit, and then performance starts to become somewhat less predictable. Fetching messages is on the whole slower than before, though on the differential graph, we do see some spikes showing that there are periods where performance recovers. Presumably this correlates with large numbers of pages being swapped back in, which then allows Rabbit to run reasonably quickly for short periods of time.
Just for fun, I also did this with N as 1024, though as it took 20 mins to run, I only did this test once:
It’s clear here that publishing when we’ve run out of RAM isn’t too bad, and this makes sense — all that is required is that a page is swapped out and we’re given a new page to write to. Getting messages is much slower as we may have to both read from and write to swap.
The next test is more interesting. For a given N, start by publishing N messages, then publish-and-fetch-a-message 2N times, and finally drain the remaining N messages. Fewer graphs this time, just one before we hit swap, where N is 64:
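The three phases of this second test can be sketched in Python as follows. As before, this is an illustration rather than the code behind the graphs: the deque stands in for Rabbit, the payload is scaled down from 10MB, and the names steady_state_test and timed are hypothetical.

```python
import time
from collections import deque


def steady_state_test(n, payload_bytes=1024):
    """Fill with n messages, do 2n publish-and-get pairs, then drain n messages."""
    queue = deque()                 # stand-in for the broker's queue
    payload = bytes(payload_bytes)
    timings = []                    # list of (phase, elapsed microseconds)

    def timed(phase, action):
        start = time.perf_counter()
        action()
        timings.append((phase, (time.perf_counter() - start) * 1e6))

    def publish_and_get():
        queue.append(payload)       # one in...
        queue.popleft()             # ...one out, so the queue depth stays at n

    for _ in range(n):
        timed("fill", lambda: queue.append(payload))
    for _ in range(2 * n):
        timed("publish+get", publish_and_get)   # timed as a single action
    for _ in range(n):
        timed("drain", queue.popleft)

    assert not queue                # everything published has been fetched
    return timings


phases = steady_state_test(64)
print(len(phases), "actions timed")
```

The middle phase holds the queue depth constant at N while messages keep flowing through, so any slowdown there comes from churn rather than from a growing backlog.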
Note that for the middle segment, the time is for publishing and getting a message. Now, as soon as we have N as 256, we start running out of memory. This is only in the middle segment and, again, does make sense: although we can fit 310 messages in memory, as we are publishing and getting, the memory is (presumably) much more fragmented and as such we can fit in fewer messages. We’re also at the mercy of the garbage collector to reclaim messages that we no longer need to hold.
In the cumulative graph here, we can see that it starts off pretty much the same as when N is 64: the gradient gets a bit steeper when we start publishing and getting, but when we get to about 330 messages, suddenly we hit the first step, when we run out of memory and start making use of swap. Now let’s see what happens when N is 512. Again, this one took so long that I only ran it once:
Again, the step where we start swapping is clearly visible at 310, though of course in this test, we’re still ramping up and just publishing messages at this point. Interestingly, in the one-in-one-out phase of the test, performance seems to repeat its pattern (in the differential graph). Whilst we’ve had some guesses, we’re really not too sure what’s going on here, though it’s likely very specific to the swap algorithms, kernel and interaction with the garbage collector. Fun.
So it’s good to see that nothing really goes wrong: it does keep working, and if you don’t need Rabbit to be amazingly fast but want lots and lots of big messages in your bunny, then this is perhaps a good enough solution. Certainly pairing Rabbit with a good SSD swap disk may work well enough for you. For others though, we now have a repeatable set of metrics that allow us to test different designs for a disk-backed queue.
Some of you may have noticed that when I first published this, all the graphs had y-axes that said milliseconds, not microseconds. Publishing a message does not take over 100 seconds, fear not. I had just managed to not read the documentation about what Erlang’s now() function returned and had failed to consider whether the values were likely to be milliseconds. Fortunately, I’d saved all the graphs in postscript, so a quick find and replace in emacs and everything’s better!
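For anyone wanting to avoid the same slip: Erlang’s now() returns a tuple of {MegaSecs, Secs, MicroSecs}, so the natural unit for a difference between two readings is microseconds. The conversion is sketched here in Python with invented timestamps. The helper names are hypothetical, and in Erlang itself timer:now_diff does the same job.

```python
def now_to_microseconds(now):
    """Convert a {MegaSecs, Secs, MicroSecs}-style tuple to microseconds."""
    mega_secs, secs, micro_secs = now
    return (mega_secs * 1_000_000 + secs) * 1_000_000 + micro_secs


def now_diff_us(later, earlier):
    """Elapsed time between two now()-style tuples, in microseconds."""
    return now_to_microseconds(later) - now_to_microseconds(earlier)


start = (1240, 500000, 250000)       # invented readings, not from the tests
end = (1240, 500000, 370000)
elapsed_us = now_diff_us(end, start)
print(elapsed_us, "microseconds =", elapsed_us / 1000, "milliseconds")
```

Reading that 120000 as milliseconds would turn a 0.12 second operation into a two minute one, which is exactly the kind of error the original axis labels made.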