Try-Confirm-Cancel (TCC) Protocol
TCC in Distributed Transactions: Bringing Order to Chaos
Intro
In the world of distributed transactions, there are many ways to make a set of operations succeed or fail as a unit. When several local transactions run on different databases but form one logical unit of work, we need a way to ensure that they either all commit or all fail. One way to do this is the Try-Confirm-Cancel (TCC) protocol. It’s not hard to understand once you see it in action.
What is a Distributed Transaction?
A distributed transaction is a transaction that spans multiple databases or systems. It is useful when a set of operations has to be performed across those systems as one unit: the work is either committed on all of them or rolled back on all of them. Each system involved is called a participant. When the transaction is committed or rolled back, every participant is told the outcome, and participants communicate (typically through a coordinator) to make sure the distributed transaction is carried out as expected.
The T-C-C Protocol
The Try-Confirm-Cancel protocol splits a transaction into three steps:
- We first try to execute the transaction, setting aside whatever it needs on every participant.
- If every try succeeds, we confirm the transaction; or
- if any try fails, we cancel the transaction.
If we’re using pessimistic locking, we may also have to wait for the locks to clear before executing the transaction.
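The three steps above can be sketched as a small coordinator loop. This is only a minimal illustration under stated assumptions: the `Participant` interface, the `run_tcc` function, and the `InMemoryParticipant` class are hypothetical names invented for this example, and the sketch ignores failures of Confirm and Cancel themselves.

```python
from typing import Protocol, Sequence


class Participant(Protocol):
    """Hypothetical interface each service exposes to the coordinator."""

    def try_(self, tx_id: str) -> bool: ...
    def confirm(self, tx_id: str) -> None: ...
    def cancel(self, tx_id: str) -> None: ...


def run_tcc(tx_id: str, participants: Sequence[Participant]) -> bool:
    """Return True if the transaction was confirmed, False if it was cancelled."""
    contacted = []
    for p in participants:
        contacted.append(p)
        try:
            ok = p.try_(tx_id)
        except Exception:
            # An error or timeout is treated the same as a refused Try.
            ok = False
        if not ok:
            # Cancel everyone we contacted, including the failing participant,
            # because it may have reserved something before it failed.
            for q in reversed(contacted):
                q.cancel(tx_id)
            return False
    # Every Try succeeded, so every participant is asked to confirm.
    for p in participants:
        p.confirm(tx_id)
    return True


class InMemoryParticipant:
    """Toy participant that only tracks a state per transaction id."""

    def __init__(self, name: str, can_reserve: bool = True):
        self.name = name
        self.can_reserve = can_reserve
        self.state = {}  # tx_id -> "reserved" | "confirmed" | "cancelled"

    def try_(self, tx_id: str) -> bool:
        if not self.can_reserve:
            return False
        self.state[tx_id] = "reserved"
        return True

    def confirm(self, tx_id: str) -> None:
        self.state[tx_id] = "confirmed"

    def cancel(self, tx_id: str) -> None:
        self.state[tx_id] = "cancelled"
```

Calling `run_tcc("tx1", [flight, hotel])` with two healthy participants leaves both in the confirmed state. If any participant refuses its Try, every participant contacted so far ends up cancelled.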
TRY Phase (Command) in Distributed Transactions
The TRY phase puts a service in a pending state. For example, a TRY request in a flight booking service will reserve a seat for the customer and insert a customer reservation record with a reserved state into the database. If any service fails to make a reservation or times out, the coordinator will send a cancel request in the next phase.
The TRY phase is the first phase. The TRY command checks that the distributed transaction is likely to succeed and sets aside the resources it needs. If a TRY command fails, the transaction fails and, depending on the logic, a CANCEL command (which must be idempotent) may be issued. The distributed transaction moves on to confirmation only if every TRY command succeeds; if any TRY command fails, it is rolled back.
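To make the flight booking example concrete, here is a minimal sketch of a TRY handler using Python's built-in `sqlite3` module. The table layout, the `init_db` helper, and the `try_reserve` function are assumptions made for illustration, not part of any particular booking system.

```python
import sqlite3


def init_db(conn: sqlite3.Connection, seats) -> None:
    """Create the hypothetical schema and mark every seat as available."""
    conn.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, available INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE reservations ("
        "tx_id TEXT PRIMARY KEY, customer TEXT NOT NULL, "
        "seat TEXT NOT NULL, state TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO seats VALUES (?, 1)", [(s,) for s in seats])
    conn.commit()


def try_reserve(conn: sqlite3.Connection, tx_id: str, customer: str, seat: str) -> bool:
    """TRY: hold the seat and insert a reservation record in 'reserved' state."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        row = conn.execute(
            "SELECT state FROM reservations WHERE tx_id = ?", (tx_id,)
        ).fetchone()
        if row is not None:
            # A repeated TRY for the same transaction reports the existing state.
            return row[0] == "reserved"
        cur = conn.execute(
            "UPDATE seats SET available = 0 WHERE seat = ? AND available = 1", (seat,)
        )
        if cur.rowcount != 1:
            return False  # seat is unknown or already held
        conn.execute(
            "INSERT INTO reservations VALUES (?, ?, ?, 'reserved')",
            (tx_id, customer, seat),
        )
        return True
```

Note that the seat is only held here. Nothing is billed and the reservation is not final until a later CONFIRM moves the record out of the reserved state.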
CONFIRM Phase (Command) in Distributed Transactions
The CONFIRM phase moves the service to a confirmed state. In a flight booking system, a confirm request will confirm that a seat is booked for the customer, and the customer will be billed. The customer reservation record in the database will be updated to a confirmed state. If any service fails to confirm or times out, the coordinator will either retry the confirmation until it succeeds or, after a certain number of retries, hand the transaction over for manual intervention.
The CONFIRM phase is the second phase, and it runs only after every TRY has succeeded. The CONFIRM command finalizes the work that TRY set aside. The transaction counts as committed only once every confirmation has succeeded. Because the resources were already reserved in the TRY phase, a failed confirmation is normally retried, as described above, instead of being rolled back.
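The retry-then-escalate behaviour described above could look roughly like the sketch below. The function name `confirm_with_retry`, the `NeedsManualIntervention` exception, and the exponential backoff between attempts are all assumptions chosen for illustration; the text only says that the coordinator retries and eventually involves a human.

```python
import time
from typing import Callable


class NeedsManualIntervention(Exception):
    """Raised when CONFIRM keeps failing after the retry budget is spent."""


def confirm_with_retry(
    confirm: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call confirm() until it succeeds; return the number of attempts used."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            confirm()
            return attempt
        except Exception as exc:  # timeouts and errors are both retried
            last_error = exc
            if attempt < max_attempts:
                # Assumed policy: wait twice as long after each failure.
                sleep(base_delay * 2 ** (attempt - 1))
    raise NeedsManualIntervention(
        f"confirm failed after {max_attempts} attempts"
    ) from last_error
```

Retrying is only safe if the participant's confirm operation is idempotent, because a confirm that timed out may in fact have been applied.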
CANCEL Phase (Command) in Distributed Transactions
The CANCEL command is used to invalidate or reverse a transaction. A distributed transaction can be cancelled at any point before it is confirmed. CANCEL is the reverse of the initial action (Try). When CANCEL is issued, the transaction manager rolls back the transaction: it carries out, one at a time, the Cancel operation that corresponds to each preliminary Try operation and discards whatever those Try operations had set aside.
A distributed transaction might fail for various reasons, such as a network failure, a power outage, or an application error. Cancelling reverses the distributed transaction and returns every participant to the state it was in before the transaction started. However, distributed transactions are asynchronous, so it’s very likely that a participant has already committed its local Try work by the time the cancel arrives. That is why Cancel has to undo that work with a compensating operation instead of relying on a database rollback.
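Because a cancel request can be delivered more than once, or can arrive at an unexpected moment, a cancel handler usually needs to check the current state of the reservation before it acts. The following is a minimal in-memory sketch of that idea; `SeatService`, `State`, and the choice to record a cancel that arrives before its Try are assumptions made for this example.

```python
from enum import Enum


class State(Enum):
    RESERVED = "reserved"
    CONFIRMED = "confirmed"
    CANCELLED = "cancelled"


class SeatService:
    """Toy participant that holds seats in memory."""

    def __init__(self, seats):
        self.free = set(seats)
        self.reservations = {}  # tx_id -> (seat or None, State)

    def try_reserve(self, tx_id: str, seat: str) -> bool:
        if tx_id in self.reservations:
            # Replayed Try, or a Try that arrives after its own Cancel.
            return self.reservations[tx_id][1] is State.RESERVED
        if seat not in self.free:
            return False
        self.free.remove(seat)
        self.reservations[tx_id] = (seat, State.RESERVED)
        return True

    def confirm(self, tx_id: str) -> None:
        seat, state = self.reservations[tx_id]
        if state is State.CANCELLED:
            raise RuntimeError("cannot confirm a cancelled reservation")
        self.reservations[tx_id] = (seat, State.CONFIRMED)

    def cancel(self, tx_id: str) -> None:
        record = self.reservations.get(tx_id)
        if record is None:
            # Cancel arrived before Try: remember it so a late Try is refused.
            self.reservations[tx_id] = (None, State.CANCELLED)
            return
        seat, state = record
        if state is State.CANCELLED:
            return  # already cancelled, so repeating the call changes nothing
        if state is State.CONFIRMED:
            raise RuntimeError("cannot cancel a confirmed reservation")
        # Compensate for Try: release the seat and mark the record cancelled.
        self.free.add(seat)
        self.reservations[tx_id] = (seat, State.CANCELLED)
```

Calling `cancel` twice for the same transaction releases the seat only once, which is what makes it safe for a coordinator to resend the request after a timeout.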
Problems with the TCC protocol
Even though the TCC protocol has some benefits, it also has some drawbacks.
- First, it's slow. Every business operation takes two rounds of calls to each participant (Try, then Confirm or Cancel), and you have to wait until the second round finishes before the transaction is complete.
- If you use pessimistic locking, you may also have to wait to acquire the locks, which slows your distributed transactions down further.
- Additionally, it is more complicated to implement. Every participating service has to provide Try, Confirm, and Cancel operations, so it may not be ideal for all situations.
Conclusion
Distributed transactions are useful when we perform operations that span multiple systems. A distributed transaction behaves like a single transaction across several databases and services. The TCC protocol is one way to coordinate such a transaction so that it either commits everywhere or fails everywhere. We first try to execute the transaction. If every try succeeds, we confirm it; if any try fails, we cancel it. Remember that distributed transactions are mostly asynchronous, and thinking in terms of the TCC commands will help you understand the state of a distributed transaction better.