Flavours of Consensus: Sync, Async and Byzantine

Consensus is hard/impossible in general. To understand the source of this difficulty, it’s instructive to look at the three flavours of consensus: Synchronous, Asynchronous and Byzantine.

A synchronous setup is one where messages have bounded delays, which also rules out message loss: a “lost” message is equivalent to a message with infinite delay. The only bad thing that can happen is node failure, and only of the crash-stop kind. That restriction, too, is indirectly a consequence of bounded delay: if message delays were unbounded, there would be no way to distinguish a failed node from one whose messages are merely delayed or lost.

In Synchronous Consensus (SC), bounded delays allow us to design “round”-based protocols, where each round lasts longer than the message delay bound. Given N nodes and up to F failures, consensus can be achieved in no more than F+1 rounds.
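The round structure can be sketched with the classic “FloodSet” idea: for F+1 rounds, every live node broadcasts all values it has seen, then all survivors decide deterministically (e.g. the minimum). This is a simplified toy model — nodes here crash cleanly at a round boundary, whereas the F+1 bound is really needed because a real crash can happen mid-broadcast, reaching some nodes but not others:

```python
# Toy sketch of round-based synchronous consensus (FloodSet style).
# Simplification: a crashed node sends nothing from its crash round on;
# real crash-stop nodes can fail mid-broadcast, which is what makes
# F+1 rounds necessary in general.

def floodset(initial_values, crash_schedule, F):
    """initial_values: {node_id: value}.
    crash_schedule: {node_id: round at which that node crashes}."""
    NEVER = F + 2  # sentinel: node never crashes within the protocol
    known = {n: {v} for n, v in initial_values.items()}
    for rnd in range(1, F + 2):                      # F+1 rounds
        alive = [n for n in known
                 if crash_schedule.get(n, NEVER) > rnd]
        # Snapshot what each live node broadcasts this round.
        msgs = [set(known[n]) for n in alive]
        for n in alive:                              # live nodes receive
            for m in msgs:
                known[n] |= m
    # Survivors decide deterministically: the minimum known value.
    return {n: min(known[n]) for n in known
            if n not in crash_schedule}

# Example: 4 nodes, F = 1, node 0 crashes in round 1.
decisions = floodset({0: 3, 1: 7, 2: 5, 3: 9}, {0: 1}, F=1)
assert len(set(decisions.values())) == 1             # survivors agree
```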

Asynchronous Consensus (AC) is the setting where messages may have unbounded delays, or be lost entirely, in addition to node failures. The Byzantine setting goes further and allows “malicious failures”, such as nodes lying or “equivocating” (e.g. telling node A the value is ‘0’ but telling node B it is ‘1’). SC and AC need not worry about such mischief.
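A toy illustration of why equivocation is so damaging (node names here are hypothetical): with three nodes and one Byzantine traitor, naive majority voting breaks, because the traitor can echo a different value to each honest node:

```python
# Two honest nodes A and B with different inputs, plus one Byzantine
# node that equivocates: it tells each honest node whatever that node
# already believes, so each sees a "majority" for its own value.
honest = {"A": 0, "B": 1}          # honest nodes' own inputs
byzantine_says = {"A": 0, "B": 1}  # traitor's per-recipient answers

def decide(me):
    other = "B" if me == "A" else "A"
    votes = [honest[me], honest[other], byzantine_says[me]]
    return max(set(votes), key=votes.count)   # naive majority vote

# A decides 0, B decides 1: agreement is violated.
assert decide("A") == 0 and decide("B") == 1
```

This is exactly why 3 nodes cannot tolerate 1 Byzantine fault without stronger tools (signatures or extra rounds of cross-checking).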

In terms of difficulty of achieving consensus:

Synchronous (SC) ≪ Asynchronous (AC) < Byzantine

So the real difficulty of consensus arises from unbounded message delays. AC protocols like Paxos and Raft try to be round-based like SC protocols, but because of unbounded delays they must contend with many possible interleavings of messages, especially stale messages arriving late. The famous FLP theorem states that in an asynchronous setting no consensus protocol can be guaranteed to terminate, i.e. liveness cannot be guaranteed. (Remember, however, that FLP applies only to “deterministic” protocols.)
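A sketch of how Paxos-style protocols cope with those stale interleavings: every message carries a monotonically increasing ballot (proposal) number, and acceptors simply ignore anything older than what they have already promised. This is a simplified single-acceptor fragment, not full Paxos:

```python
# Simplified Paxos-style acceptor: stale messages (lower ballots)
# are rejected, so late-arriving messages from old rounds are harmless.

class Acceptor:
    def __init__(self):
        self.promised = -1        # highest ballot promised so far
        self.accepted = None      # (ballot, value) last accepted, if any

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("reject", self.promised)       # stale prepare: ignore

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "accepted"
        return "rejected"                      # stale accept: ignore

a = Acceptor()
a.prepare(5)
a.accept(5, "x")
# A delayed message from an older ballot arrives late and is ignored:
assert a.accept(3, "y") == "rejected"
assert a.accepted == (5, "x")
```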

Beyond the Canonical Settings

There are many interesting intermediate settings. For instance:

  1. What if we have bounded delays but messages can be lost? i.e. if a message doesn’t arrive within the bound, it is safe to assume it was lost rather than delayed. One can detect such losses using timestamps, if the system provides them.
  2. Or bounded delays without losses but with node failures? i.e. if a message is not received then we can safely assume the node to have failed, simplifying failure detection. (But if a message is received, that does not mean the node is still alive — it could have sent the message and then failed.)
  3. Or bounded delays, but node failures that are NOT crash-stop? That is, a machine can resurrect. This can usually be solved using version numbers for nodes if the node has non-volatile memory. (If not, it’s a harder problem with churn in membership.)
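The version-number idea in item 3 can be sketched as follows (class and method names here are hypothetical): each node persists an “incarnation” number in non-volatile memory and bumps it on every restart, so peers can discard messages from a node’s previous life instead of confusing a resurrected node with its old self:

```python
# Sketch of detecting resurrected nodes via persisted incarnation
# (version) numbers: peers track the highest incarnation seen per node
# and drop messages tagged with an older incarnation.

class PeerView:
    def __init__(self):
        self.incarnation = {}     # node_id -> highest incarnation seen

    def on_message(self, node_id, incarnation, payload):
        seen = self.incarnation.get(node_id, 0)
        if incarnation < seen:
            return None           # message from a dead incarnation: drop
        self.incarnation[node_id] = incarnation
        return payload

view = PeerView()
view.on_message("n1", 1, "hello")       # first incarnation
view.on_message("n1", 2, "rebooted")    # node restarted as version 2
assert view.on_message("n1", 1, "stale") is None  # old self is ignored
```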

There can be settings harder than the traditional Byzantine one, Bitcoin being the most prominent/practical example. You can have malicious nodes, changing cluster memberships, message loss/delays and every possible “bad” thing one can think of. And yet consensus here is solvable (with very high probability) — which is mind-blowing.
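The probabilistic trick at the heart of Bitcoin-style (Nakamoto) consensus is the proof-of-work lottery: nodes race to find a nonce whose block hash falls below a target, and honest nodes adopt the longest valid chain. A minimal sketch, with the difficulty set tiny so it runs instantly (real Bitcoin targets are astronomically harder):

```python
# Toy proof-of-work: find a nonce such that SHA-256(block || nonce)
# starts with a given hex prefix. Difficulty is deliberately trivial.
import hashlib

def mine(block_data: bytes, difficulty_prefix: str = "00"):
    nonce = 0
    while True:
        digest = hashlib.sha256(
            block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(difficulty_prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine(b"prev_hash|txs")
assert digest.startswith("00")
```

Because finding such a nonce is costly and checking it is cheap, a malicious minority cannot cheaply rewrite history, which is what makes agreement achievable here only with very high probability rather than with certainty.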