October 31, 2016

Over the last couple of years I’ve been reading and talking about a lot of things related to distributed systems. This is a common train of thought around here, and after working on this on and off for the past 18 months or so (the version you’re seeing here is in fact version 3, having repeatedly changed my mind about how it should work), I’m placing my own stake in the ground. Potboiler is an AP Event Sourcing system. More specifically, it’s an MVP/research prototype of one, with known issues, and it’s not even slightly suitable for production use. However, I think the shortcuts I’ve taken aren’t fundamental problems with the design, but trade-offs made to get to something usable enough to play with and get a sense of whether the design works. There are a couple of larger problems still to be solved, but I’ll address those in a minute.

Potboiler actually has multiple event logs. Each node has a single node-specific log that it can append arbitrary events to (currently blobs of JSON), and it also stores a local copy of all the other logs it knows about, but it can only append events to those that other nodes tell it belong there (which may be the originating node for that event, or another intermediate node). An individual log is linear and strictly append-only, but to aid ordering of events across logs, each event has a Hybrid Logical Clock timestamp applied. Nodes tell the other nodes they know about when new log events come in, and they also periodically poll those nodes to reconcile any events that failed to get through the first time. They also periodically poll other nodes for the nodes *they* know about, in order to try to build a fully connected topology of all nodes in the network. Nodes also have clients, which they tell about new events. The intention is for there to be a Potboiler node per server that needs any of its data (or, in less ideal situations, one within each presumed-reliable network subset), so that it shares fate with the other services that use it.
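The Hybrid Logical Clock timestamps mentioned above can be sketched in Python. This follows the standard HLC update rules (physical time merged with a logical tie-breaking counter); the class and method names are illustrative, not Potboiler’s actual API, and the physical clock is injected so the behaviour is easy to test:

```python
from dataclasses import dataclass

@dataclass(order=True)
class HLCTimestamp:
    wall: int      # largest physical time observed (e.g. unix millis)
    logical: int   # tie-breaking counter for events at the same wall time

class HLC:
    """Hybrid Logical Clock: timestamps respect causality while
    staying close to physical time."""
    def __init__(self, now_fn):
        self.now = now_fn          # injected physical clock
        self.ts = HLCTimestamp(0, 0)

    def send(self):
        """Tick for a local event or an outgoing message."""
        pt = self.now()
        if pt > self.ts.wall:
            self.ts = HLCTimestamp(pt, 0)
        else:
            self.ts = HLCTimestamp(self.ts.wall, self.ts.logical + 1)
        return self.ts

    def recv(self, remote):
        """Merge the timestamp carried by an incoming event."""
        pt = self.now()
        wall = max(self.ts.wall, remote.wall, pt)
        if wall == self.ts.wall == remote.wall:
            logical = max(self.ts.logical, remote.logical) + 1
        elif wall == self.ts.wall:
            logical = self.ts.logical + 1
        elif wall == remote.wall:
            logical = remote.logical + 1
        else:
            logical = 0
        self.ts = HLCTimestamp(wall, logical)
        return self.ts
```

The useful property for ordering log events is that any event that causally follows another always gets a strictly larger timestamp, even when the physical clocks involved disagree.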

I’ve also built the first client of Potboiler: a Key/Value store that uses Potboiler to distribute update events. All of the keys are arbitrary Unicode strings, and the values are CmRDTs. The only one so far is a variant on the Last-Writer-Wins Register with Hybrid Logical Clocks for timestamps, but I plan to add OR-Set soon. By default, there is a table called “_config” using LWW for its values. If you add an entry to _config, the key name indicates a new table to create, and the value is its configuration (currently just {"crdt": "LWW"}). There’s also a simple browser for the K/V tables.
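A Last-Writer-Wins Register with HLC timestamps can be sketched like this. The (wall, logical, node_id) tuple used for ordering is an illustrative assumption — using the node id as the final tie-breaker is one common choice, not necessarily Potboiler’s exact encoding:

```python
class LWWRegister:
    """Last-Writer-Wins register: a CmRDT where the write with the
    largest timestamp wins, so replicas converge in any delivery order."""
    def __init__(self, value=None, ts=(0, 0, "")):
        self.value = value
        self.ts = ts   # (hlc_wall, hlc_logical, node_id)

    def set(self, value, ts):
        # Local writes and remote update events take the same path:
        # keep whichever write carries the larger timestamp.
        if ts > self.ts:
            self.value, self.ts = value, ts

    def merge(self, other):
        # Commutative, associative, and idempotent -- the properties
        # that make delivery order (and re-delivery) irrelevant.
        self.set(other.value, other.ts)
```

Two replicas that see the same update events in different orders end up with the same value, which is exactly what lets the K/V store sit on top of an AP event log.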

Trade-offs to get to this point are as follows:

  • Postgres is used for storage of everything (logs, K/V tables, etc.). This was purely for ease of use. The K/V tables should probably stay there, but the logs should move somewhere better optimised for that use case, rather than paying the overhead of a full-blown RDBMS for a non-relational workload (possibly RocksDB, which was used briefly in an earlier version).
  • All communication is over HTTP and JSON, again for ease of use. This will probably continue for external APIs, but something more performant might be a good idea, at least for internal communication. One of the earlier versions used ZeroMQ, and that might be worth revisiting.

Major problems remaining:

  • Cross-event dependencies – I’m considering options along the lines of an event declaring a list of other event ids it depends on, with clients not being informed of an event until they’ve been told about all of its dependencies. This avoids per-client logic and should work with all the main scenarios for events.
  • Scaling as the number of events gets large – the current design assumes all nodes have every event since the beginning of time. I’m considering options like “history nodes”: rarer nodes that store all events (on the assumption that complete rebuilds of data are relatively rare), while the other nodes store only the last N events for some decently large N. This weakens some of the AP properties achieved so far, but might be an acceptable trade-off (and N can be made very large if you want a lot of local storage).
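The dependency-declaring idea from the first bullet could work roughly like the sketch below: events are buffered until every event id they depend on has already been delivered to the client. The `id`/`deps` field names are hypothetical, not a committed format:

```python
class DependencyGate:
    """Sketch of cross-event dependencies: hold back each event until
    all of the event ids it declares as dependencies have been
    delivered, then release it (and anything it unblocks)."""
    def __init__(self):
        self.delivered = set()   # ids already handed to the client
        self.pending = []        # events waiting on undelivered deps

    def on_event(self, event):
        """event: dict with 'id' and 'deps' (a list of event ids).
        Returns the events released to the client, in delivery order."""
        self.pending.append(event)
        released = []
        progress = True
        while progress:
            progress = False
            for ev in list(self.pending):
                if all(d in self.delivered for d in ev["deps"]):
                    self.pending.remove(ev)
                    self.delivered.add(ev["id"])
                    released.append(ev)
                    progress = True
        return released
```

An event arriving before its dependency is simply held; once the dependency turns up, both are released in dependency order — which is the node-side behaviour that would keep the logic out of every client.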

Over the next few months I plan on adding more clients, hopefully without having to change the core node code, and to further demonstrate that a CRDT-based AP approach to data is the right way to go (despite some of my colleagues’ opinions to the contrary).

