Towards Production: Getting in Shape
Behind Automerge-RS
In 2017, Ink & Switch decided to collaborate with Martin Kleppmann to see if we could build an easy-to-use CRDT in JavaScript based on Martin's research. By and large, I think we were successful, as can be seen in Automerge v0.14.1.
While we hit the goal of being a correct CRDT that was easy to use, there were a number of shortcomings. It was too slow, and because every change in the history was stored in full, in an uncompressed format, the storage footprint of a document was too large.
Martin began work on the performance branch to address these problems. The new, but unreleased, version of Automerge is much more robust and solves many of the performance problems of the original. Most importantly, it introduces an extremely compact compressed binary format that radically improves the memory and storage footprint of these documents. This performance branch is very solid for day-to-day use, but it is not recommended for anything in production, as tweaks to the binary format and API calls are still happening regularly.
Early this year, I decided that Automerge was ready to move from its proof-of-concept, JavaScript-only implementation to the cross-platform, high-performance implementation I've always desired. To accomplish this, Alex Good and I teamed up to write AutomergeRS, an API-compatible implementation of Automerge's backend half, implemented in Rust, with WASM bindings. The package also contains an early implementation of a Rust-native frontend, as well as C bindings for the backend, to enable frontend implementations in other languages such as Swift. While there is not an official npm package for AutomergeRS yet, I have rolled one for myself. It only works in Node, as WASM loading is async in the browser and would require the library to be wrapped in a promise.
We modified a pre-existing performance test made by Martin to ensure we met or exceeded the performance of the JavaScript performance branch.
Next Steps
There are many steps in getting Automerge ready for the big time, but a very important one that I am personally focused on is making sure Automerge is the performance king of CRDTs. That crown is currently held by Yjs, an excellent CRDT that focuses specifically on collaborative text editing. Now, there are some apples-to-oranges issues in comparing them, as these two CRDTs have very different goals. Yjs is JavaScript-only, while Automerge intends to be available on all platforms. Yjs manages lists (or things you can build out of lists) through a bespoke API, while Automerge allows you to manage your document as a plain old JavaScript object with arbitrarily nested maps and lists. Yjs does not preserve the document history in a meaningful way, whereas Automerge preserves the entire history and allows for easy point-in-time recovery and branching, similar to Git. And Automerge allows you to move the CRDT portion of the document off the render thread, or even onto a different machine, via the frontend/backend split, while Yjs simply focuses on being fast enough that you shouldn't need to.
Nevertheless, I wanted to see how far we needed to go to catch up, and what I could learn from their approach. We adapted an existing benchmark suite used to compare Yjs to other CRDTs. This test suite is very comprehensive and has been invaluable to work with, but for this post I want to cherry-pick a handful of tests that I think distill the current performance story.
| | yjs | automerge1 | automergeWASM |
|---|---|---|---|
| Version | 13.4.1 | 09-08-2020 | 09-08-2020 |
| [B1.2] Insert string of length N (time) | 4 ms | 441 ms | 35 ms |
| [B1.4] Insert N characters at random positions (time) | 99 ms | 947 ms | 224 ms |
| [B3.1] 20√N clients concurrently set number in Map (time) | 95 ms | 516 ms | 8 ms |
First, in test [B3.1], Automerge does very well with concurrency and maps. This makes sense, since Automerge supports maps natively whereas Yjs implements them on top of lists. Second, Yjs dominates in sequential character insertions, as seen in test [B1.2]. Yjs has a number of very clever optimizations around this specific scenario, because text entry (its target use case) is mostly sequential and thus deserves specific optimizations. The first is compressing spans of elementIds that share a common actor and have sequential counters; with it, test [B1.2] becomes a single op, where Automerge creates an op per character. The second is keeping a set of "hot" list positions cached to speed up indexOf(elementId) operations, which are predictable in mostly sequential text entry.
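The first of those optimizations can be sketched as a simple run-length compression over element IDs. This is an illustrative version only, assuming a hypothetical `{ actor, counter }` ID shape; the names and structures are not Yjs's actual internals:

```javascript
// Sketch: runs of element IDs with the same actor and consecutive
// counters collapse into a single span, so N sequential insertions
// by one actor are stored as one record instead of N.
function compressSpans(ids) {
  const spans = [];
  for (const { actor, counter } of ids) {
    const last = spans[spans.length - 1];
    if (last && last.actor === actor && counter === last.start + last.length) {
      last.length += 1; // extends the current run
    } else {
      spans.push({ actor, start: counter, length: 1 }); // starts a new run
    }
  }
  return spans;
}
```

Sequential typing produces one long run, which is why a test like [B1.2] compresses so dramatically, while random-position inserts break the runs apart.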
Both of these are very clever optimizations that we could take advantage of, but, to me, the standout test is the one that shows Automerge falling behind on random inserts. In that scenario neither of these optimizations plays a part, and since (under the hood) both libraries use the same Lamport comparison to do the CRDT work, we should be neck and neck. As it stands, we need to reduce our execution time by about 50% to catch up.
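That shared Lamport comparison amounts to ordering operation IDs by counter, with ties broken deterministically by actor ID. A minimal sketch, assuming a hypothetical `{ counter, actor }` shape (the function name and shape are illustrative, not either library's actual code):

```javascript
// Sketch of a Lamport-timestamp comparison for operation IDs.
function lamportCompare(a, b) {
  // Higher counters were generated causally later, so they sort after.
  if (a.counter !== b.counter) return a.counter - b.counter;
  // Concurrent ops (equal counters) are ordered by actor ID, so every
  // replica converges on the same total order.
  if (a.actor === b.actor) return 0;
  return a.actor < b.actor ? -1 : 1;
}
```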
After some aggressive profiling, I was able to determine that a large chunk of time was spent in memory allocations related to change request versioning. To explain: the frontend does not generate a fully formed change. It instead creates a 'change request', which is like a change but missing four things: the pred, startOp, and deps fields, and list elementIds. It's the backend's job to fill these in before creating the canonical change object. The problem is that the backend needs to hold on to previous versions, so that when a request says "insert after list index 7" it knows what the state of the world was at that time and can correctly translate 7 into an elementId. This requires the backend to keep multiple versions of the document state in memory. Currently we use immutable data structures, so the cost of doing so is low… but not zero. And in the usual case of the frontend and backend being caught up, this versioning does nothing while extracting a hefty tax.
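To make the index-translation problem concrete, here is a minimal sketch, assuming a hypothetical map from version number to the list's element IDs as of that version (all names here are illustrative, not Automerge's actual internals):

```javascript
// Sketch: the backend must resolve "index 7" against the list *as it
// was* at the version the frontend was looking at, not the current one.
function resolveIndex(versions, requestVersion, index) {
  const elemIds = versions.get(requestVersion); // list state at that version
  if (!elemIds) throw new Error(`unknown version ${requestVersion}`);
  return elemIds[index]; // the element ID the index referred to then
}
```

The same index can name different elements at different versions, which is exactly why the backend has to retain those snapshots.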
I tried disabling the versioning code (which would produce wrong results when the frontend and backend get out of sync) to understand its performance cost, and saw speedups between 15% and 30% from that alone. This is no small matter when all I need is a total gain of 50%. The solution is to have the frontend produce fully formed changes, complete with pred and elementIds.
That change is the subject of the pull request I currently have open against Automerge, which I hope to have merged soon. This would be the first time my Rust work results in a significant change to the Automerge API; up until now I've been trying to work entirely within the constraints laid out by Martin's design. With any luck it will be accepted into Automerge/performance, and I'll be back with better benchmarks next week.