I tend to think of Oracle’s BDB Java Edition as the solution to all storage performance problems: the least performance sacrifice you can possibly make to get ACID storage in Java. So when our automated configuration import for Magnolia got slower and slower, it was naturally what I turned to.
It’s particularly well suited to this case, because Jackrabbit has a very simple contract for a persistence manager to implement, and handles indexing (and hence searching) separately, so there’s no need to handle query optimization.
Unfortunately, it didn’t immediately prove to be reliable, and I had to shelve it until I had some more time to work on it. In fact, the problems turned out to be simple; they just benefited from fresh eyes.
I haven’t wanted to delve much into Jackrabbit, so I have ignored how it uses the persistence manager. PersistenceManager.store(ChangeLog) does all its work in a transaction. This is probably enough to make Jackrabbit ACID; however, either Magnolia or Jackrabbit has this wrong: you can certainly leave your Magnolia instance in an inconsistent state if you interrupt the application.
I’ve mostly noticed this because various strategies to get the BDB environment shut down properly have failed. You want to share an environment between multiple workspaces, because Magnolia sets up lots of them, and BDB’s cache would be very inefficient if it was split up into lots of little segments. It’s not easy to use finalize to clean up the environment: I use the collections interface, which I have to pass the environment to directly, and of course BDB runs several threads, which means an environment won’t be garbage collected until it’s closed. In the end, I’ve settled on tracking the set of workspaces using the environment. I also have a fall-back: a ServletContextListener that will shut down any open environments when the servlet context shuts down.
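To make the bookkeeping concrete, here is a minimal sketch of that kind of tracking. It is not the code from the repository: the class name SharedEnvironments, its methods, and the idea of keying environments by home directory are all made up for illustration, and a plain AutoCloseable stands in for the BDB environment so the example needs nothing beyond the JDK. It also assumes the environment should be closed as soon as the last workspace using it goes away.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical helper: shares one closeable resource per home directory among
 * several workspaces. The type E stands in for a BDB environment; nothing in
 * this class calls the BDB API.
 */
final class SharedEnvironments<E extends AutoCloseable> {

    /** An open environment together with the workspaces currently using it. */
    private static final class Entry<E> {
        final E environment;
        final Set<String> workspaces = new HashSet<>();

        Entry(E environment) {
            this.environment = environment;
        }
    }

    private final Map<String, Entry<E>> open = new HashMap<>();
    private final Function<String, E> opener;

    /** The opener creates an environment for a home directory (assumed cheap to call once). */
    SharedEnvironments(Function<String, E> opener) {
        this.opener = opener;
    }

    /** Registers a workspace, opening the environment on first use. */
    synchronized E acquire(String home, String workspace) {
        Entry<E> entry = open.get(home);
        if (entry == null) {
            entry = new Entry<>(opener.apply(home));
            open.put(home, entry);
        }
        entry.workspaces.add(workspace);
        return entry.environment;
    }

    /** Unregisters a workspace and closes the environment once no workspace uses it. */
    synchronized void release(String home, String workspace) {
        Entry<E> entry = open.get(home);
        if (entry == null || !entry.workspaces.remove(workspace)) {
            return; // unknown workspace, or already released
        }
        if (entry.workspaces.isEmpty()) {
            open.remove(home);
            close(entry.environment);
        }
    }

    /** Fall-back: closes every environment that is still open. */
    synchronized void closeAll() {
        for (Entry<E> entry : open.values()) {
            close(entry.environment);
        }
        open.clear();
    }

    /** Number of environments currently open. */
    synchronized int openCount() {
        return open.size();
    }

    /** Closes one environment, rethrowing any failure as an unchecked exception. */
    private static void close(AutoCloseable environment) {
        try {
            environment.close();
        } catch (Exception e) {
            throw new IllegalStateException("could not close environment", e);
        }
    }
}
```

Each persistence manager would call acquire when it is initialised and release when it is closed, and the fall-back listener would only need to call closeAll from its contextDestroyed method. Holding the workspace names in a set rather than a bare counter means a double release is harmless.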
I still need to write some decent documentation for this, but there is the Mercurial repository if you want to try it out. It’s a Maven project.
It is fast, but I haven’t measured the performance. Because of the above-mentioned problems with Magnolia and transactions, you may not want to use it for your production Magnolia author instance, but it’s great for developers and public instances. A remotely accessible service would be at least as reliable as using an external DBMS. It’s already no worse than using an embedded DB or other embedded persistence manager.