QuorumBasedReplication (from the "Multiplexing calls in ZEO RPC" wiki on Tres Seaver's Zope.org site)

Assumption

  We want multiple cooperating ZEOStorageServer(s), with clients
  able to initiate writes at any one of them.

Protocol

 1. The ZEOClientApplication begins a transaction with its
    ZEOClientStorage, calling 'tpc_begin'.  We will refer to the
    ZEOStorageServer invoked here as the ReplicationCoordinator, and
    to the other ZEOStorageServer(s) as ReplicationPeer(s).

 2. The ZEOClientApplication then issues multiple 'store' messages
    to its storage (one for each object modified by the transaction).
    For each 'store', the ReplicationCoordinator solicits a vote from
    each ReplicationPeer, passing its own server ID, the object ID, and
    the new GenerationNumber.

 3. Each ReplicationPeer votes, replying:

  * "Abort" if the new GenerationNumber <= its own
    CommittedGenerationNumber.

  * "Yes" if it has no current TentativeGenerationNumber and if the
    new GenerationNumber > its own CommittedGenerationNumber.

    o the peer copies the supplied GenerationNumber into its own
      TentativeGenerationNumber.

  * "Yes" if it has a current TentativeGenerationNumber and if the
    new GenerationNumber > its own TentativeGenerationNumber.

    o the peer copies the supplied GenerationNumber into its own
      TentativeGenerationNumber.

  * "No" if it has a current TentativeGenerationNumber and if the
    new GenerationNumber <= its own TentativeGenerationNumber.

 4. The ReplicationCoordinator tallies these votes:

  * Upon receiving any "Abort" reply:

    - broadcasts an "Abort Tentative" message to all peers, supplying
      its server ID, the object ID, and the now-cancelled
      TentativeGenerationNumber;

    - clears its own TentativeGenerationNumber;

    - enqueues a read request for that object from the replying peer;

    - raises a ConflictError.

  * Upon failing to achieve a majority, either through the receipt of
    explicit "No" votes or through timeout:

    - broadcasts an "Abort Tentative" message to all peers, supplying
      its server ID, the object ID, and the now-cancelled
      TentativeGenerationNumber;

    - clears its own TentativeGenerationNumber;

    - raises a ConflictError.

  * Upon achieving a majority of "Yes" votes, returns normally to the
    client.

 5. After successfully completing all 'store' requests, the client invokes
    'tpc_vote' on the ReplicationCoordinator, which then marks all stored
    objects as committed (i.e., TentativeGenerationNumber ->
    CommittedGenerationNumber) and enqueues an "Update Object" message
    to the ReplicationPeers for each object.

    Alternate scenario: the client chooses to abort the transaction by
      invoking 'tpc_abort'; the ReplicationCoordinator then broadcasts
      "Abort Tentative" messages for each object, clearing each
      TentativeGenerationNumber.

 6. As each ReplicationPeer receives "Update Object" messages, it:

  * updates its CommittedGenerationNumber, and the object itself, IFF
    the new GenerationNumber > its existing CommittedGenerationNumber.

  * clears its TentativeGenerationNumber, IFF the new GenerationNumber
    > its TentativeGenerationNumber.
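  Step 2 names three values the ReplicationCoordinator passes with each
  solicitation. The Python sketch below gathers them into one request and
  fans it out to every peer. It assumes a hypothetical `send` callable that
  delivers one request to one peer and returns that peer's reply, or `None`
  on timeout; `VoteRequest` and `solicit_votes` are illustrative names, not
  ZEO APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class VoteRequest:
    """The values step 2 says accompany each solicitation (illustrative type)."""
    server_id: str   # ID of the ReplicationCoordinator asking for the vote
    oid: bytes       # ID of the object being stored
    generation: int  # the new GenerationNumber proposed for that object


# Hypothetical transport: delivers one request to one peer and returns its
# reply ("Yes", "No" or "Abort"), or None if the peer did not answer in time.
SendFn = Callable[[str, VoteRequest], Optional[str]]


def solicit_votes(server_id: str, oid: bytes, generation: int,
                  peers: List[str], send: SendFn) -> Dict[str, Optional[str]]:
    """Ask every ReplicationPeer to vote on one 'store'; collect replies by peer ID."""
    request = VoteRequest(server_id, oid, generation)
    return {peer: send(peer, request) for peer in peers}
```

  A real server would send the requests concurrently; the comprehension
  sends them one after another only to keep the sketch short.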
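  The voting rules in step 3 reduce to a short decision function. This
  Python sketch assumes that each peer keeps both numbers per object ID and
  treats an object it has never committed as CommittedGenerationNumber 0;
  `PeerVoter` is a hypothetical name. It checks "Abort" first, in the order
  the rules are listed, so a number that is stale relative to the committed
  copy is reported as "Abort" rather than "No".

```python
from typing import Dict


class PeerVoter:
    """Step-3 voting rules for one ReplicationPeer (illustrative sketch)."""

    def __init__(self) -> None:
        # Assumption: both numbers are kept per object ID, and an object the
        # peer has never committed counts as CommittedGenerationNumber 0.
        self.committed: Dict[bytes, int] = {}
        self.tentative: Dict[bytes, int] = {}

    def vote(self, oid: bytes, generation: int) -> str:
        if generation <= self.committed.get(oid, 0):
            # The peer already holds a committed copy at least this new.
            return "Abort"
        current = self.tentative.get(oid)
        if current is None or generation > current:
            # No competing claim, or this one is newer: record it and agree.
            self.tentative[oid] = generation
            return "Yes"
        # Another tentative write with an equal or higher number is pending.
        return "No"
```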
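  Step 4's tally can be sketched as a function over a complete set of
  replies; the protocol reacts to replies as they arrive, which this sketch
  does not model. Several details are assumptions, not taken from the text:
  `ConflictError` is a local stand-in for ZODB's exception, side effects are
  appended to an `actions` list instead of being sent, a timeout appears as
  `None`, the read request goes to the first peer that answered "Abort", and
  "majority" means more than half of `electorate`, whose size the caller
  chooses (the text does not say whether the coordinator's own vote counts).

```python
from typing import Dict, List, Optional, Tuple


class ConflictError(Exception):
    """Local stand-in for ZODB's ConflictError, so the sketch runs on its own."""


Action = Tuple[object, ...]


def tally(server_id: str, oid: bytes, tentative: int,
          replies: Dict[str, Optional[str]], electorate: int,
          actions: List[Action]) -> None:
    """Apply the step-4 rules to the replies gathered for one 'store'.

    replies maps peer ID -> "Yes", "No" or "Abort"; None stands for a timeout.
    Returns normally on a majority of "Yes" votes, else raises ConflictError.
    """
    aborting = [peer for peer, reply in replies.items() if reply == "Abort"]
    if aborting:
        # A peer has already committed this generation or a newer one.
        actions.append(("broadcast", "Abort Tentative", server_id, oid, tentative))
        actions.append(("clear_tentative", oid))
        actions.append(("enqueue_read", oid, aborting[0]))
        raise ConflictError(f"peer {aborting[0]} holds an equal or newer committed generation")

    yes_votes = sum(1 for reply in replies.values() if reply == "Yes")
    if 2 * yes_votes <= electorate:
        # No strict majority, whether from explicit "No" votes or timeouts.
        actions.append(("broadcast", "Abort Tentative", server_id, oid, tentative))
        actions.append(("clear_tentative", oid))
        raise ConflictError("no majority of Yes votes")
    # Majority reached: return normally so the client's 'store' succeeds.
```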
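  Step 5 and its alternate scenario can be sketched on the coordinator side
  as below. The sketch assumes one open transaction at a time, carries the
  object data in the "Update Object" message, and queues outgoing messages
  in an `outbox` list; `Coordinator` and `record_store` (a hook called once
  step 4 has returned normally for an object) are hypothetical. The method
  names `tpc_vote` and `tpc_abort` mirror the storage calls in the text, but
  the bodies are illustrative, not ZEO's.

```python
from typing import Dict, List, Tuple


class Coordinator:
    """Step-5 bookkeeping on the ReplicationCoordinator (illustrative sketch)."""

    def __init__(self, server_id: str) -> None:
        self.server_id = server_id
        self.committed: Dict[bytes, int] = {}  # oid -> CommittedGenerationNumber
        # oid -> (TentativeGenerationNumber, object data) for the open transaction
        self.pending: Dict[bytes, Tuple[int, bytes]] = {}
        self.outbox: List[tuple] = []  # messages queued for the ReplicationPeers

    def record_store(self, oid: bytes, generation: int, data: bytes) -> None:
        """Hypothetical hook: called once step 4 returned normally for oid."""
        self.pending[oid] = (generation, data)

    def tpc_vote(self) -> None:
        # Tentative -> committed, then queue one "Update Object" per object.
        for oid, (generation, data) in self.pending.items():
            self.committed[oid] = generation
            self.outbox.append(("Update Object", oid, generation, data))
        self.pending.clear()

    def tpc_abort(self) -> None:
        # Cancel every tentative claim made by this transaction.
        for oid, (generation, _data) in self.pending.items():
            self.outbox.append(("Abort Tentative", self.server_id, oid, generation))
        self.pending.clear()
```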
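  A peer's handling of "Update Object" (step 6) can be sketched in Python
  under the same per-object assumption as the voting rules, with a missing
  committed entry read as 0; `PeerReplica` and `on_update_object` are
  hypothetical names. The two conditions are checked independently, as the
  two bullets state them.

```python
from typing import Dict


class PeerReplica:
    """Step-6 handling of "Update Object" on a ReplicationPeer (illustrative)."""

    def __init__(self) -> None:
        self.committed: Dict[bytes, int] = {}   # oid -> CommittedGenerationNumber
        self.tentative: Dict[bytes, int] = {}   # oid -> TentativeGenerationNumber
        self.objects: Dict[bytes, bytes] = {}   # oid -> current object data

    def on_update_object(self, oid: bytes, generation: int, data: bytes) -> None:
        # Apply the update only if it is newer than what the peer has committed.
        if generation > self.committed.get(oid, 0):
            self.committed[oid] = generation
            self.objects[oid] = data
        # Independently, drop a tentative claim that the update supersedes.
        # As the rule is written (strict >), an equal number leaves the claim set.
        current = self.tentative.get(oid)
        if current is not None and generation > current:
            del self.tentative[oid]
```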