Sunday, November 04, 2012

Consensus Protocol

You have probably heard of 2-phase commit, Paxos, and consensus protocols in general. Paxos has a reputation for being hard to understand and, although it has been proven correct, it's not uncommon to hear that it isn't really required and that some "simpler" algorithm can be used instead.

The posts I came across (listed below) give a great explanation of the shortcomings of 2-phase commit when failures occur, and of 3-phase commit, which handles only one type of failure (fail-stop) well. They give great examples of the hard distributed-systems problems that make these protocols not robust enough.

And this is a great quote from these articles:
Mike Burrows, inventor of the Chubby service at Google, says that “there is only one consensus protocol, and that’s Paxos” – all other approaches are just broken versions of Paxos.
I'd definitely recommend you take the time to read them:

Consensus Protocols: Two-Phase Commit
Consensus Protocols: Three-phase Commit
Consensus Protocols: Paxos

And, just to finalize, when I see quotes like the one above and the number of issues with 2PC and 3PC, I wonder how reliable consensus protocols like the one MongoDB uses to pick the primary replica actually are:
We use a consensus protocol to pick a primary. Exact details will be spared here but that basic process is:
1. get maxLocalOpOrdinal from each server.
2. if a majority of servers are not up (from this server's POV), remain in Secondary mode and stop.
3. if the last op time seems very old, stop and await human intervention.
4. else, using a consensus protocol, pick the server with the highest maxLocalOpOrdinal as the Primary.

Any server in the replica set, when it fails to reach master, attempts a new election process.
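
To make the four quoted steps concrete, here is a minimal Python sketch of the decision a single server might make from its own point of view. It is not MongoDB's implementation: the `Member` type, the `pick_primary` function, and the `stale_after_seconds` threshold are all hypothetical names I made up, and I am assuming "last op time" means the age of the newest operation among reachable servers. Most importantly, step 4's "using a consensus protocol" is exactly the part the quote leaves out, so the sketch only computes the candidate this one server would propose.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Member:
    """One server's view of a replica set member (hypothetical type)."""
    name: str
    reachable: bool               # is it up, from this server's POV?
    max_local_op_ordinal: int     # step 1: ordinal of its newest op
    last_op_age_seconds: float    # how long ago that op was applied


def pick_primary(members: Sequence[Member],
                 stale_after_seconds: float = 3600.0) -> Optional[str]:
    """Return the name of the candidate primary, or None to stay Secondary.

    The staleness threshold is an assumed parameter, not a MongoDB setting.
    """
    up = [m for m in members if m.reachable]

    # Step 2: without a strict majority visible, remain Secondary and stop.
    if len(up) * 2 <= len(members):
        return None

    # Step 4 (candidate only): highest maxLocalOpOrdinal among reachable
    # servers. Getting every server to agree on it is not modeled here.
    best = max(up, key=lambda m: m.max_local_op_ordinal)

    # Step 3: if even the newest op looks very old, wait for a human.
    if best.last_op_age_seconds > stale_after_seconds:
        return None

    return best.name


members = [
    Member("a", True, 10, 5.0),
    Member("b", True, 12, 3.0),
    Member("c", False, 20, 1.0),  # has the newest data but is unreachable
]
candidate = pick_primary(members)
```

Note that in this example server "c" holds the most recent operation but is ignored because it is unreachable from this server's point of view. Two servers with different views could compute different candidates, and that is the kind of case where the details of the real consensus step matter.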