
CDVCS

CDVCS (Confluent Distributed Version Control) is a datatype we have developed for replikativ. It models strong consistency inside an eventually consistent system. You can think of it as git for data.

Convergence

While in git you have to manually push to different peers to propagate changes, CDVCS automatically converges on all replicas. The same type of merge conflicts as in a distributed version control system like git can occur, but you will be able to resolve them on any peer. This means you have to provide conflict resolution, either automatically by inspecting the conflicting actions or by asking the user.
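To make the notion of a conflict concrete, here is a minimal sketch in Python. It is not replikativ's implementation or API: the state layout (a commit graph mapping each commit id to its parents, plus a set of heads) and the names `new_cdvcs`, `commit`, `join`, `has_conflict` and `merge` are assumptions chosen for illustration. Replicas converge by joining their states, a conflict shows up as more than one head, and a merge commit resolves it.

```python
import uuid


def new_cdvcs():
    """Hypothetical initial state: a single root commit that is also the head."""
    return {"graph": {"root": []}, "heads": {"root"}}


def commit(state, commit_id=None):
    """Append a commit on top of the single current head."""
    if len(state["heads"]) != 1:
        raise ValueError("conflicting heads, merge first")
    (head,) = state["heads"]
    cid = commit_id or uuid.uuid4().hex
    graph = dict(state["graph"])
    graph[cid] = [head]
    return {"graph": graph, "heads": {cid}}


def ancestors(graph, cid):
    """All commits reachable from cid through parent links (excluding cid)."""
    seen, stack = set(), list(graph[cid])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def join(a, b):
    """Combine two replicas: union the graphs, drop heads that another head covers."""
    graph = {**a["graph"], **b["graph"]}
    candidates = a["heads"] | b["heads"]
    covered = set()
    for head in candidates:
        covered |= ancestors(graph, head)
    return {"graph": graph, "heads": candidates - covered}


def has_conflict(state):
    """More than one head means the histories have diverged."""
    return len(state["heads"]) > 1


def merge(state, commit_id=None):
    """Resolve a conflict with a commit whose parents are all current heads."""
    cid = commit_id or uuid.uuid4().hex
    graph = dict(state["graph"])
    graph[cid] = sorted(state["heads"])
    return {"graph": graph, "heads": {cid}}
```

In this sketch two peers that both commit on top of the same head produce a joined state with two heads. Any peer that sees both heads can call `merge`, which is where an application would plug in its own conflict resolution for the diverged write operations.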

Advantages of CDVCS:

  • Prototyping: you can easily track the changes of your application and visualize the commit history as a graph. This is only possible because all replicas must agree on the commit graph, which means they agree on a total order of events.
  • Replication of strongly consistent systems: if you just want to replicate your database to your application endpoints, you can serialize all write operations through one replikativ peer and then apply them to your desired storage engine. As long as you can guarantee that there is at most one writer among all potentially writing peers, e.g. through a coordinator like Apache ZooKeeper, you can stream existing architectures through CDVCS.
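The single-writer replication case can be sketched as follows. This is an illustration under assumptions, not replikativ code: the commit graph is a plain dict from commit id to parent ids, `operations` is a hypothetical mapping from commit id to a list of `(key, value)` writes, and the storage engine is stood in for by a Python dict. With only one writer the history is a chain, so it can be replayed in order.

```python
def linear_history(graph, head):
    """Walk from head to root and return the commit ids oldest first.

    Raises if a merge commit is found, since this sketch assumes one writer.
    """
    order = []
    cid = head
    while cid is not None:
        parents = graph[cid]
        if len(parents) > 1:
            raise ValueError("merge commit found, history is not linear")
        order.append(cid)
        cid = parents[0] if parents else None
    order.reverse()
    return order


def replay(graph, head, operations, store=None):
    """Apply the writes of every commit, in history order, to the store."""
    store = {} if store is None else store
    for cid in linear_history(graph, head):
        for key, value in operations.get(cid, []):
            store[key] = value
    return store
```

Because every replica agrees on the same chain of commits, every endpoint that replays it ends up with the same store contents.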

Weaknesses of CDVCS:

  • Conflict management: managing conflicts is cumbersome. You have to reason about how to reconcile diverging commit histories, which requires understanding all potential interactions of conflicting write operations. While the resolution can be decided manually, this is often undesirable because it becomes tedious in many edge cases.
  • Availability: to ensure that merge operations do not introduce further conflicts and keep diverging on a slow network, merges block by waiting for an adaptive period of time before they commit, which reduces availability. The delay is determined by the ratio #merges/#all-commits, so divergence slows down if peers repeatedly merge concurrently. This effectively slows down writes and with them the capability to store data. Other CRDTs such as an OR-Map do not sacrifice availability, because they resolve conflicts internally without agreement between the conflicting peers.
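The adaptive delay can be illustrated with a small sketch. Only the ratio #merges/#all-commits comes from the description above. The linear scaling and the `max_delay_s` parameter are assumptions made for this example, and the actual function used by replikativ may differ.

```python
def merge_ratio(graph):
    """Fraction of commits in the graph that are merges (more than one parent)."""
    if not graph:
        return 0.0
    merges = sum(1 for parents in graph.values() if len(parents) > 1)
    return merges / len(graph)


def merge_delay(graph, max_delay_s=10.0):
    """Hypothetical delay before committing a merge.

    Assumption: the delay grows linearly with the merge ratio, up to max_delay_s.
    """
    return max_delay_s * merge_ratio(graph)
```

A history without merges yields no delay in this sketch, while a history in which peers keep merging concurrently pushes the ratio, and therefore the wait before the next merge, upwards.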

CDVCS works nicely if you have just one writer in your system. Even if the single writer is only guaranteed from the outside (e.g. it is always the same user), little conflict management is needed as long as the writing peers are mostly online. CDVCS can therefore be a valid choice for many personal apps which do not share state between users.

Background

The paper describing its technical design can be found here. A detailed version of it was presented at PaPoC2016.