Offering a RabbitMQ server on the public internet poses a few challenges. Today we’ll look at the problem of controlling fast producers.
What’s the problem with a fast producer? First, “fast” means “faster than the consumers can pull messages off the queues”. That means that our queues start getting long. Long queues mean that
- messages start to sit for longer and longer in queues, ruining latency, and
- with enough messages, RabbitMQ starts paging them to disk to relieve memory pressure, ruining performance, and
- with enough of those messages, the broker’s disks fill up and RabbitMQ uses disk-based flow control to block everyone from publishing messages to the broker.
Even once we or RabbitMQ have throttled the fast producer – even if we stop the producer completely – it takes time for the consumers to chew through the backlog. For some applications – ones where we simply need a big buffer to temporally decouple parts of our application – there’s no problem. For applications wanting low latency messages, high latency can be disastrous.
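To put rough numbers on that backlog, here is a minimal sketch of the arithmetic. The rates and durations are hypothetical, chosen only for illustration, and the model assumes steady publish and consume rates, which real workloads rarely have.

```python
def backlog_after(publish_rate, consume_rate, seconds):
    """Messages queued after `seconds` of sustained publishing (never negative)."""
    return max(0.0, float(publish_rate - consume_rate) * seconds)


def drain_time(backlog, consume_rate, publish_rate=0.0):
    """Seconds needed to clear `backlog` once publishing drops to `publish_rate`."""
    net = consume_rate - publish_rate
    if net <= 0:
        return float("inf")  # consumers never catch up
    return backlog / net


# Hypothetical numbers: 1,000 msg/s published, 800 msg/s consumed, for 10 minutes.
backlog = backlog_after(1000, 800, 600)
print(backlog)                   # 120000.0 messages waiting
print(drain_time(backlog, 800))  # 150.0 seconds, even with the producer stopped
```

In this example a message published just before the producer stops waits two and a half minutes for delivery, which is the latency problem described above.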
Now we know the problem, what can we do? Firstly, note that RabbitMQ and TCP already provide throttling mechanisms. If something publishes faster than RabbitMQ can write to a queue, RabbitMQ will apply “flow control” to the producer, forcing it to slow down. But in the context of RabbitMQ as a service, we have a bigger problem. We want to share resources among several users. We may even share a single RabbitMQ broker between users, because vhosts are cheap. How do we ensure that one user doesn’t compromise someone else’s use of the service?
One approach, the one that RabbitMQ Bigwig uses, is to throttle users at the TCP level. This has the advantage of not having to alter RabbitMQ itself. It implies giving each user a separate, non-standard port: how else could we identify a user when we have no idea from what host or port they might connect? However, we can’t naïvely throttle a user’s connection. Consider that a producer will publish a message to the broker, and the broker (when publisher confirms are enabled) will acknowledge receipt by sending a basic.ack back to the producer. In contrast, the broker will deliver a message to a consumer, and the consumer will send the broker a basic.ack back. If we tightly throttle a fast producer, we slow down the rate of messages arriving… but we also end up preventing consumers from acknowledging messages they’ve received! In other words, while we’ve prevented more messages from arriving and clogging the broker further, we’ve also removed the user’s ability to fix the problem!
RabbitMQ Bigwig adds one extra trick to prevent this second problem. We provide users with separate producer and consumer ports. We apply ingress limits to the producer port, so that no user can degrade the service for other users, and a tight ingress limit to the consumer port. This permits the user to publish as hard as she likes (subject to the constraints of her plan) while letting her consume as fast as she likes. There’s no point in trying to circumvent the control by publishing to the consumer port: its ingress rate, designed to admit the small, low-cost basic.ack frames, simply isn’t useful for publishing.
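To see why a tight consumer-port limit still lets acknowledgements through, here is a small simulation using a token bucket, a common way to model a rate limit. It is only a sketch under assumptions: the limits, the frame sizes (roughly 21 bytes for a basic.ack frame, 1 KiB for a publish) and the helper names are invented for illustration, and the bucket drops excess frames where real traffic shaping would more likely delay them. It does not show how RabbitMQ Bigwig actually configures its rules.

```python
class TokenBucket:
    """Byte-based token bucket: refills at `rate` bytes/s, holds at most `burst` bytes."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, nbytes, now):
        """Spend tokens and return True if `nbytes` may pass at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


def frames_passed(bucket, frame_bytes, attempts_per_second, seconds):
    """Count frames admitted when a client offers frames at a steady rate."""
    passed = 0
    for i in range(attempts_per_second * seconds):
        if bucket.allow(frame_bytes, i / attempts_per_second):
            passed += 1
    return passed


ACK_BYTES = 21        # assumed approximate size of a basic.ack frame
PUBLISH_BYTES = 1024  # hypothetical size of a small published message
CONSUMER_LIMIT = 4096 # hypothetical consumer-port ingress limit, bytes/s

# A consumer acknowledging 100 messages per second for 10 seconds.
acks = frames_passed(TokenBucket(CONSUMER_LIMIT, CONSUMER_LIMIT), ACK_BYTES, 100, 10)

# Someone trying to publish 100 messages per second through the consumer port.
publishes = frames_passed(TokenBucket(CONSUMER_LIMIT, CONSUMER_LIMIT), PUBLISH_BYTES, 100, 10)

print(f"acks admitted: {acks} of 1000")
print(f"publishes admitted: {publishes} of 1000")
```

With these numbers every acknowledgement gets through, while only a few dozen of the thousand attempted publishes do, so the consumer port is useless as a back door for publishing.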
To summarise, RabbitMQ Bigwig
- identifies a user by giving her a unique pair of ports, and so
- prevents anyone from compromising someone else’s network performance by applying Quality of Service rules at the TCP layer.
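The port-pair idea can be sketched as a simple mapping between users and ports. Everything here is hypothetical (the base port, the consecutive-pair layout and the function names); the post does not say how RabbitMQ Bigwig actually assigns ports.

```python
from dataclasses import dataclass

BASE_PORT = 20000  # hypothetical start of the allocated port range


@dataclass(frozen=True)
class PortPair:
    producer: int
    consumer: int


def allocate(user_index):
    """Give the n-th user (0-based) two consecutive ports."""
    producer = BASE_PORT + 2 * user_index
    return PortPair(producer, producer + 1)


def identify(port):
    """Map a destination port back to (user_index, role)."""
    offset = port - BASE_PORT
    if offset < 0:
        raise ValueError(f"port {port} is outside the allocated range")
    role = "producer" if offset % 2 == 0 else "consumer"
    return (offset // 2, role)


pair = allocate(3)
print(pair)                     # PortPair(producer=20006, consumer=20007)
print(identify(pair.consumer))  # (3, 'consumer')
```

Because the destination port alone identifies both the user and the role, a rate limit can be attached per port without inspecting any AMQP traffic.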