In last blog, we talk about four kinds of problems when we lack of a reasonable implementation of transaction:
- Lost Update;
- Inconsistent Retrieval;
- Dirty Read;
- Premature Write;
Today, we focus on how we make it.
How
The first tool we can come up with when dealing with concurrent access problem is locking. Besides it, we will also see how optimistic concurrency control
and timestamp order
make it.
Lock
The basic idea of lock
is very simple: when multiple clients want to access same object, only one client will get the lock
and continue, other clients will wait until lock
is free – Serialization
between multiple clients. In the context of Transaction
, lock
is relative complex.
Two Phase Locking
We have introduced the conflict operations between transactions, and in order to solve it we can make transaction serially equivalent (or totally serialization). In details, Serial Equivalence
requires that all of a transaction’s accesses to a particular object be serialized with respect to accesses by other transactions, in other words, all pairs of conflicting operations of two transactions should be executed in the same order.
To ensure this, a transaction is not allowed any new locks after it has released a lock. The first phase of each transaction is a ‘growing phase’, during which new locks are acquired. In the second phase, the locks are released (a ‘shrinking phase’). This is called two-phase locking.
- Timeline if no locking, so in different order:
Timeline Transaction T Transaction U 1 x = read(i); write(i, 10) 2 y = read(j); write(j, 30) 3 write(j, 20) 4 z = read (i) - Timeline if has locking, so prevent this situation:
Timeline Transaction T Transaction U 0.5 gain lock of i
1 x = read(i); write(i, 10) 1.5 gain lock of j
2 y = read(j); write(j, 30) 2.5 try to gain lock of j
, waiting…3.5 try to gain lock of i
, waiting…x deadlock detected, abort one of transaction
Lock implementation
The lock
implementation in Database
is not same as programming language like Java
, which has lock information in the object head.
In Database
, lock is usually implemented by hash table (Map<Object, Lock>
). lock
is not within objects because so many objects (every line of table) may need too much memory/disk, but they may actually need that much lock.
Deadlock
When it comes to deadlock, we have two main aspects to solve it: one is prevent deadlock, another is detection of deadlock and re-assign lock.
- How to prevent: The core of this strategy is to lock all required resources at the beginning of transaction, so we can prevent deadlock in the middle of execution and abort what we have changed; However, it is sometimes impossible to know all the required resources before we begin, especially in interactive scenario.
- How to detect
- Lock graph circle detection
Actually, there exists another very naive and simple solution to solve deadlock – timeout lock, in other words, make lock preemptive. By using this method, no deadlock will happen because finally some lock will timeout and the transaction have to release it and other transaction will go on.
The solution is simple, but also has many defeat. The worst problem is that transactions are sometimes aborted due to their locks becoming vulnerable when other transactions are waiting for them, but there is actually no deadlock. In an overloaded system, the number of transactions timing out will increase, and transactions taking a long time can be penalized. In addition, it is hard to decide on an appropriate length for a timeout.
Increasing concurrency
When using the lock, we can have the correctness of logic, but we lost some performance also. In order to decrease the penalty as more as we can, we should increase concurrency between clients.
- Increase granularity: use as smaller lock as we can, lock as short as we can;
- Use read-write lock: reader share lock and not block each other;
- Use two version lock: an optimistic scheme that allows one transaction to write tentative versions of objects while other transactions read from the committed versions of the same objects.
- Use hierarchic locks: rather than acquire a whole collection of locks, make the abstraction of lock hierarchy and gain the top level if necessary. One of example of it is in bank, when bank going to sum the balance in a branch bank, using hierarchic lock (one branch bank lock rather than many accounts lock) reduce the number locks.
Optimistic concurrency control
The concepts behind the optimistic concurrency control is first try to execute the transaction, then try to submit if no conflict or abort if conflict.
- Working phase: read from tentative version (from the same transaction) or last committed version; write create a new tentative version;
- Validation phase: decide whether there is conflicts according to read-write rules between overlapping transactions as following shows;
- Update phase: tentative version is made permanent if no conflicts;
Tv | Ti | Rule |
---|---|---|
write | read | 1.Ti must not read objects written by Tv. |
read | write | 2.Tv must not read objects written by Ti. |
write | write | 3.Ti must not write objects written by Tv and Tv must not write objects written by Ti. |
Timestamp Order
All operations are ordered by a timestamp (maybe a logic timestamp). Requests from transactions can be totally ordered according to their timestamps. The basic timestamp ordering rule is based on operation conflicts and is very simple:
A transaction’s request to write an object is valid only if that object was last read and written by earlier transactions. A transaction’s request to read an object is valid only if that object was last written by an earlier transaction.
Tentative versions
Tentative version is a very commonly used technique in transactions. It is required not only because the usage of concurrent access, but also to make transaction isolated.
For a server of recoverable objects to participate in transactions, it must be designed so that any updates of objects can be removed if and when a transaction aborts. To make this possible, all of the update operations performed during a transaction are done in tentative versions of objects in volatile memory. Each transaction is provided with its own private set of tentative versions of any objects that it has altered. All the update operations of a transaction store values in the transaction’s own private set. Access operations in a transaction take values from the transaction’s own private set if possible, or failing that, from the objects.
To Be Continued
In this blog, we review how to do concurrency control in transaction, in next blog, we will move to Distributed Transaction
.
Written with StackEdit.
评论
发表评论