History for QuorumBasedReplication
Assumption
We want to have multiple, cooperating ZEOStorageServer(s), with clients
able to initiate writes at each one.
Protocol
1. The ZEOClientApplication begins a transaction with its
ZEOClientStorage, calling 'tpc_begin'. We will refer to the
ZEOStorageServer invoked here as the ReplicationCoordinator, and
to the other ZEOStorageServer(s) as ReplicationPeer(s).
2. The ZEOClientApplication then issues multiple 'store' messages
to its storage (one for each object modified by the transaction).
For each 'store', the ReplicationCoordinator solicits a vote from
each ReplicationPeer, passing the server ID, the object ID, and the
new GenerationNumber.
3. Each ReplicationPeer votes, replying:
* "Abort" if the new GenerationNumber <= its own
CommittedGenerationNumber.
* "Yes" if it has no current TentativeGenerationNumber and if the
new GenerationNumber > its own CommittedGenerationNumber.
o (the peer copies the supplied GenerationNumber as its own
TentativeGenerationNumber)
* "Yes" if it has a current TentativeGenerationNumber and if the
new GenerationNumber > its own TentativeGenerationNumber.
o (the peer copies the supplied GenerationNumber as its own
TentativeGenerationNumber)
* "No" if it has a current TentativeGenerationNumber and if the
new GenerationNumber <= its own TentativeGenerationNumber.
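
The voting rules in step 3 can be sketched as a small function. This is only an illustration: the names `PeerObjectState` and `vote` are hypothetical and not part of ZEO, and the sketch assumes the "Abort" check takes precedence over the other rules, which the text implies but does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerObjectState:
    """Per-object bookkeeping on one ReplicationPeer (hypothetical type)."""
    committed: int = 0               # CommittedGenerationNumber
    tentative: Optional[int] = None  # TentativeGenerationNumber, if any


def vote(state: PeerObjectState, new_generation: int) -> str:
    """Return "Abort", "Yes", or "No" for a proposed GenerationNumber."""
    # Assumption: the "Abort" rule is evaluated before the others.
    if new_generation <= state.committed:
        return "Abort"
    # Covers both "Yes" cases: no tentative number yet, or a lower one.
    if state.tentative is None or new_generation > state.tentative:
        state.tentative = new_generation
        return "Yes"
    # A tentative number exists and the proposal does not exceed it.
    return "No"
```

A peer that has committed generation 5 would abort a proposal of 5, accept 6, refuse a second proposal of 6, and accept 7.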
4. The ReplicationCoordinator tallies these votes:
* Upon receiving any "Abort" reply, it:
- broadcasts an "Abort Tentative" message to all peers, supplying
its server ID, the object ID, and the now-cancelled
TentativeGenerationNumber;
- clears its own TentativeGenerationNumber;
- enqueues a read request for that object from the replying peer;
- raises a ConflictError.
* Upon failing to achieve a majority, either through the receipt of
explicit "No" votes or through timeout, it:
- broadcasts an "Abort Tentative" message to all peers, supplying
its server ID, the object ID, and the now-cancelled
TentativeGenerationNumber;
- clears its own TentativeGenerationNumber;
- raises a ConflictError.
* Upon achieving a majority of "Yes" votes, it returns normally to the
client.
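
The tally in step 4 might look roughly like the following. The function name `tally` and its return values are hypothetical. The sketch assumes that a majority means more than half of the ReplicationPeers polled; the text does not say whether the coordinator counts its own vote. A timed-out peer is represented as `None`.

```python
from typing import Iterable, Optional


def tally(votes: Iterable[Optional[str]], peer_count: int) -> str:
    """Classify the replies to one 'store' request.

    Returns "abort" (some peer replied "Abort"), "ok" (majority of "Yes"),
    or "no-quorum" (too many "No" votes or timeouts).
    """
    yes = 0
    for reply in votes:
        if reply == "Abort":
            # Any single "Abort" decides the outcome immediately.
            return "abort"
        if reply == "Yes":
            yes += 1
    # Assumption: strict majority of the peers polled.
    if yes * 2 > peer_count:
        return "ok"
    return "no-quorum"
```

In this sketch both "abort" and "no-quorum" would lead the coordinator to broadcast "Abort Tentative", clear its own TentativeGenerationNumber, and raise a ConflictError; only "abort" additionally enqueues a read from the replying peer.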
5. After successfully completing all 'store' requests, the client invokes
'tpc_vote' on the ReplicationCoordinator, which then marks all stored
objects as committed (i.e., TentativeGenerationNumber ->
CommittedGenerationNumber) and enqueues updates for each object.
Alternate scenario: the client chooses to abort the transaction,
invoking 'tpc_abort'; the ReplicationCoordinator then broadcasts
"Abort Tentative" messages for each object, clearing each
TentativeGenerationNumber.
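
A minimal sketch of the coordinator's side of step 5 follows. The class `CoordinatorTxn`, its `outbox` list, and the message tuples are hypothetical stand-ins for whatever queueing and wire format a real implementation would use; only the message names "Update Object" and "Abort Tentative" come from the text.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str, str, int]  # (kind, server ID, object ID, generation)


class CoordinatorTxn:
    """Tracks generation numbers for one transaction (hypothetical)."""

    def __init__(self, server_id: str) -> None:
        self.server_id = server_id
        self.tentative: Dict[str, int] = {}  # object ID -> tentative generation
        self.committed: Dict[str, int] = {}  # object ID -> committed generation
        self.outbox: List[Message] = []      # messages queued for the peers

    def tpc_vote(self) -> None:
        # Promote each tentative generation to committed and queue an update.
        for oid, gen in self.tentative.items():
            self.committed[oid] = gen
            self.outbox.append(("Update Object", self.server_id, oid, gen))
        self.tentative.clear()

    def tpc_abort(self) -> None:
        # Cancel each tentative generation and tell the peers.
        for oid, gen in self.tentative.items():
            self.outbox.append(("Abort Tentative", self.server_id, oid, gen))
        self.tentative.clear()
```

Either call leaves the transaction with no tentative numbers; only 'tpc_vote' changes the committed generations.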
6. As each ReplicationPeer receives "Update Object" messages, it:
* updates its CommittedGenerationNumber, and the object itself, IFF
the new GenerationNumber > its existing CommittedGenerationNumber.
* clears its TentativeGenerationNumber, IFF the new GenerationNumber
> its TentativeGenerationNumber.
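
The two rules in step 6 can be sketched as follows, with the comparisons taken exactly as written. `ReplicaObject` and `apply_update` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaObject:
    """One object's replicated state on a ReplicationPeer (hypothetical)."""
    data: bytes = b""
    committed: int = 0               # CommittedGenerationNumber
    tentative: Optional[int] = None  # TentativeGenerationNumber, if any


def apply_update(obj: ReplicaObject, new_generation: int, new_data: bytes) -> None:
    """Handle one "Update Object" message."""
    # Accept the update only if it is strictly newer than what is committed.
    if new_generation > obj.committed:
        obj.committed = new_generation
        obj.data = new_data
    # Clear the tentative number only if the update strictly exceeds it.
    if obj.tentative is not None and new_generation > obj.tentative:
        obj.tentative = None
```

Because the second comparison is strict, a peer whose TentativeGenerationNumber equals the generation being committed keeps that tentative number after the update. Under the voting rules this appears harmless, since later proposals must exceed the committed number anyway.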