Readings in Database Systems, 3rd Edition

Stonebraker & Hellerstein, eds.


Concurrency Control 1: Locking and Degrees of Consistency

Transaction Refresher

Statement of problem:

  • Database: a fixed set of named resources (e.g. tuples, pages, files, whatever)
  • Consistency Constraints: must be true for DB to be considered "consistent". Examples:
    • sum(account balances) = sum(assets)
    • P is index pages for R
    • ACCT-BAL > 0
    • each employee has a valid department
  • Transaction: a sequence of actions bracketed by begin and end statements. Each transaction ("xact") is assumed to maintain consistency (this can be guaranteed by the system).
  • Goal: Concurrent execution of transactions, with high throughput/utilization, low response time, and fairness.

 

The ACID test for transaction management:

  • Atomicity: Either all actions within a transaction occur, or none do ("all or nothing at all")
  • Consistency: Each transaction takes the database from one consistent state to another. No intermediate states are observable. (We will refine this definition today)
  • Isolation: Events within a transaction must be invisible to other transactions. This allows transactions to be aborted in isolation (i.e. avoids cascaded aborts)
  • Durability: Once a transaction is committed, its results must be preserved even in the case of failures.

 

C & I guaranteed by concurrency control. A & D guaranteed by recovery.

 

A transaction schedule:

T0                          T1
read(A)
A = A - 50
write(A)
                            read(A)
                            temp = A * 0.1
                            A = A - temp
                            write(A)
read(B)
B = B + 50
write(B)
                            read(B)
                            B = B + temp
                            write(B)

 

The system "understands" only reads and writes; it cannot assume any semantics of other operations.

Arbitrary interleaving can lead to:

  • temporary inconsistencies (ok, unavoidable)
  • "permanent" inconsistencies, that is, inconsistencies that remain after transactions have completed.

 

Some definitions:

  • Schedule: A "history" or "audit trail" of all actions in the system, the xacts that performed them, and the objects they affected.
  • Serial Schedule: A schedule is serial if all the actions of each single xact appear together.
  • Equivalence of schedules: Two schedules S1, S2 are considered computationally equivalent if:
  1. The set of transactions that participate in S1 and S2 are the same.
  2. For each data item Q in S1, if transaction Ti executes read(Q) and the value of Q read by Ti was written by Tj, then the same will hold in S2. [reads are all the same]
  3. For each data item Q in S1, if transaction Ti executes the last write(Q) instruction, then the same holds in S2. [the same writers "win"]
  • Serializability: A schedule S is serializable if there exists a serial schedule S’ such that S and S’ are computationally equivalent.

 

One way to think about concurrency control – in terms of dependencies:

  1. T1 reads N ... T2 writes N: a RW dependency
  2. T1 writes N ... T2 reads N: a WR dependency
  3. T1 writes N ... T2 writes N: a WW dependency

Can construct a "serialization graph" for a schedule S (SG(S)):

  • nodes are transactions T1, ..., Tn
  • Edges: Ti -> Tj if there is a RW, WR, or WW dependency from Ti to Tj

Theorem: A schedule S is serializable iff SG(S) is acyclic.
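
One way to check this mechanically: build SG(S) and look for a cycle. A minimal sketch in Python (not from the notes; the (xact, op, item) tuple representation of a schedule is assumed for illustration):

    from collections import defaultdict

    def serialization_graph(schedule):
        """Edges Ti -> Tj for every RW, WR, or WW dependency in the schedule."""
        edges = defaultdict(set)
        for i, (ti, op_i, item_i) in enumerate(schedule):
            for tj, op_j, item_j in schedule[i + 1:]:
                if ti != tj and item_i == item_j and 'W' in (op_i, op_j):
                    edges[ti].add(tj)
        return edges

    def has_cycle(edges):
        """Standard white/grey/black DFS; a grey successor means a back edge (cycle)."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = defaultdict(int)

        def visit(node):
            color[node] = GREY
            for succ in edges[node]:
                if color[succ] == GREY or (color[succ] == WHITE and visit(succ)):
                    return True
            color[node] = BLACK
            return False

        return any(color[n] == WHITE and visit(n) for n in list(edges))

    # The interleaved transfer schedule shown above, as (xact, op, item) tuples:
    S = [('T0', 'R', 'A'), ('T0', 'W', 'A'),
         ('T1', 'R', 'A'), ('T1', 'W', 'A'),
         ('T0', 'R', 'B'), ('T0', 'W', 'B'),
         ('T1', 'R', 'B'), ('T1', 'W', 'B')]

    print(has_cycle(serialization_graph(S)))   # False: equivalent to the serial order T0, T1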
 

Locking

A technique to ensure serializability while hopefully preserving high concurrency as well. The winner in industry.
Basics:

  • A "lock manager" records what entities are locked, by whom, and in what "mode". Also maintains wait queues.
  • A well-formed transaction locks entities before using them, and unlocks them some time later.

Multiple lock modes: Some data items can be shared, so not all locks need to be exclusive.
Lock compatibility table 1:
Assume two lock modes: shared (S) and exclusive (X) locks.

  S X
S T F
X F F

If you request a lock in a mode incompatible with an existing lock, you must wait. 
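
A minimal sketch of the rule (Python, names assumed): a request is grantable only if its mode is compatible with every lock already held on the item.

    COMPATIBLE = {
        ('S', 'S'): True,  ('S', 'X'): False,
        ('X', 'S'): False, ('X', 'X'): False,
    }

    def can_grant(requested_mode, held_modes):
        """True iff the requested mode is compatible with all currently held modes."""
        return all(COMPATIBLE[(held, requested_mode)] for held in held_modes)

    print(can_grant('S', ['S', 'S']))   # True: readers share
    print(can_grant('X', ['S']))        # False: the writer must wait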

Two-Phase Locking (2PL):

  • Growing Phase: A transaction may obtain locks but not release any lock.
  • Shrinking Phase: A transaction may release locks, but not obtain any new lock. (in fact, locks are usually all released at once to avoid "cascading aborts".)

Theorem: If all xacts are well-formed and follow 2PL, then any resulting schedule is serializable
(note: this is if, not if and only if!)
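
A small illustrative sketch of the discipline (not from the notes; the lock-manager interaction is elided): once a transaction releases any lock it enters its shrinking phase and may not acquire more, and the strict variant simply releases everything at commit.

    class TwoPhaseXact:
        def __init__(self, name):
            self.name = name
            self.held = set()        # (item, mode) pairs currently locked
            self.shrinking = False   # flipped by the first release

        def lock(self, item, mode):
            if self.shrinking:
                raise RuntimeError(f"{self.name}: 2PL violation, lock after first unlock")
            self.held.add((item, mode))

        def unlock(self, item, mode):
            self.shrinking = True    # shrinking phase begins
            self.held.discard((item, mode))

        def commit(self):
            # strict 2PL: hold everything to EOT, then release all at once
            self.shrinking = True
            self.held.clear()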
 

Gray, et al.: Granularity of Locks

Theme: Correctness and performance

  • Granularity tradeoff: small granularity (e.g. field of a tuple) means high concurrency but high overhead. Large granularity (e.g. file) means low overhead but low concurrency.
  • Possible granularities:
    • DB
    • Areas
    • Files
    • Pages
    • Tuples (records)
    • fields of tuples
  • Want hierarchical locking, to allow "large" xacts to set large locks, "small" xacts to set small locks
  • Problem: T1 S-locks a record in a file, then T2 X-locks the whole file. How can T2 discover that T1 has locked the record?
  • Solution: "Intention" locks
          NL   IS   IX   S    SIX  X
    NL    T    T    T    T    T    T
    IS    T    T    T    T    T    F
    IX    T    T    T    F    F    F
    S     T    T    F    T    F    F
    SIX   T    T    F    F    F    F
    X     T    F    F    F    F    F

 

    • IS and IX locks
    • T1 obtains S lock on record in question, but first gets IS lock on file.
    • Now T2 cannot get X lock on file
    • However, T3 can get IS or S lock on file (the reason for distinguishing IS and IX: if there were only I, T3 couldn’t get an S lock on file)
    • For higher concurrency, one more mode: SIX. Intuitively, you read all of the object but only lock some subparts. Allows concurrent IS locks (IX alone would not). Note: gives S access, so disallows IX to others.
    • requires that xacts lock items root to leaf in the hierarchy, unlock leaf to root
    • generalization to DAG of resources: X locks all paths to a node, S locks at least one.
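
A sketch of these rules in Python (names assumed): the compatibility matrix above as a lookup, plus the root-to-leaf requirement that IS/S requests need IS or IX on the parent, while IX/SIX/X requests need IX or SIX on the parent.

    # COMPAT[held][requested]: can the two modes coexist on the same node?
    COMPAT = {
        'NL':  {'NL': True, 'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': True},
        'IS':  {'NL': True, 'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': False},
        'IX':  {'NL': True, 'IS': True,  'IX': True,  'S': False, 'SIX': False, 'X': False},
        'S':   {'NL': True, 'IS': True,  'IX': False, 'S': True,  'SIX': False, 'X': False},
        'SIX': {'NL': True, 'IS': True,  'IX': False, 'S': False, 'SIX': False, 'X': False},
        'X':   {'NL': True, 'IS': False, 'IX': False, 'S': False, 'SIX': False, 'X': False},
    }

    # Which modes on the parent license a request on a child node:
    PARENT_ALLOWS = {
        'IS':  {'IS', 'IX'},
        'S':   {'IS', 'IX'},
        'IX':  {'IX', 'SIX'},
        'SIX': {'IX', 'SIX'},
        'X':   {'IX', 'SIX'},
    }

    def may_request(child_mode, parent_mode):
        """True if holding parent_mode on the parent licenses requesting child_mode below it."""
        return parent_mode in PARENT_ALLOWS[child_mode]

    # T1 gets IS on the file, then S on the record; T2's X request on the file
    # now conflicts with T1's IS lock and must wait.
    print(may_request('S', 'IS'))   # True
    print(COMPAT['IS']['X'])        # False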

Some implementation background (not in paper):

  • maintain a lock table as hashed main-mem structure
  • lock/unlock must be atomic operations (protected by critical section)
  • typically costs several hundred instructions to lock/unlock an item
  • suppose T1 has an S lock on P, T2 is waiting to get X lock on P, and now T3 wants S lock on P. Do we grant T3 an S lock?

No! (starvation, unfair, etc.) So...

    • Manage FCFS queue for each locked object with outstanding requests
    • all xacts that are adjacent and compatible are a compatible group
    • The front group is the granted group
    • group mode is most restrictive mode amongst group members
  • Conversions: often want to convert (e.g. S to X for "test and modify" actions). Should conversions go to back of queue?
  • No! Instant deadlock (more notes on deadlock later). So put conversions right after granted group.
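
A rough sketch of such a queue (Python, all names assumed): the front run of granted requests is the granted group, new requests go to the back FCFS, and conversion requests are placed immediately after the granted group.

    from collections import deque

    COMPATIBLE = {('S', 'S'): True, ('S', 'X'): False,
                  ('X', 'S'): False, ('X', 'X'): False}

    class LockQueue:
        def __init__(self):
            self.queue = deque()                 # entries: [xact, mode, granted?]

        def granted_group(self):
            return [e for e in self.queue if e[2]]

        def group_mode(self):
            modes = {e[1] for e in self.granted_group()}
            return 'X' if 'X' in modes else ('S' if modes else None)

        def request(self, xact, mode):
            gm = self.group_mode()
            # grant only if compatible with the granted group AND nobody is already waiting
            ok = (gm is None or COMPATIBLE[(gm, mode)]) and all(e[2] for e in self.queue)
            self.queue.append([xact, mode, ok])
            return ok

        def convert(self, xact, new_mode):
            """Conversion goes right after the granted group, not to the back."""
            entry = next(e for e in self.queue if e[0] == xact and e[2])
            others = {e[1] for e in self.granted_group() if e[0] != xact}
            if all(COMPATIBLE[(m, new_mode)] for m in others):
                entry[1] = new_mode              # convert in place, still granted
                return True
            self.queue.remove(entry)
            self.queue.insert(len(self.granted_group()), [xact, new_mode, False])
            return False

    # The example above: T1 holds S, T2 waits for X, T3's S request must also wait.
    q = LockQueue()
    print(q.request('T1', 'S'), q.request('T2', 'X'), q.request('T3', 'S'))  # True False False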

Gray, et al.: Degrees of Consistency

First, a definition: a write is committed when the transaction has finished; otherwise, the write is dirty.

A Locking-Based Description of Degrees of Consistency:

This is not actually a description of the degrees, but rather of how to achieve them. But it’s easier to understand (?)

  • Degree 0: set short write locks on updated items
  • Degree 1: set long write locks on updated items ("long" = EOT)
  • Degree 2: set long write locks on updated items, and short read locks on items read
  • Degree 3: set long write and read locks
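
Put another way (a compact restatement, not from the paper), each degree fixes how long each kind of lock is held:

    # "long" = held to EOT, "short" = held only around the action, None = not set
    DEGREE_LOCKING = {
        0: {'write_locks': 'short', 'read_locks': None},
        1: {'write_locks': 'long',  'read_locks': None},
        2: {'write_locks': 'long',  'read_locks': 'short'},
        3: {'write_locks': 'long',  'read_locks': 'long'},
    }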

A Dirty-Data Description of Degrees of Consistency

Transaction T sees degree X consistency if...

  • Degree 0: T does not overwrite dirty data of other transactions
  • Degree 1:
    1. T sees degree 0 consistency, and
    2. T does not commit any writes before EOT
  • Degree 2:
    1. T sees degree 1 consistency, and
    2. T does not read dirty data of other transactions
  • Degree 3:
    1. T sees degree 2 consistency, and
    2. Other transactions do not dirty any data read by T before T completes.

Examples of Inconsistencies prevented by Various Degrees

  • Garbage reads:
  • T1: write(X); T2: write(X)

    Who knows what value X will end up being?

    Solution: set short write locks (degree 0)

  • Lost Updates:
  • T1: write(X)

    T2: write(X)

    T1: abort (restores X to pre-T1 value)

    At this point, T2's update is lost (note: the log contains T1.X.oldval, newval)

    Solution: set long write locks (degree 1)

  • Dirty Reads:
  • T1: write(X)

    T2: read(X)

    T1: abort

    Now T2’s read is bogus.

    Solution: set long X locks and short S locks (degree 2)

    Many systems do long-running queries at degree 2.

  • Unrepeatable reads:
  • T1: read(X)

    T2: write(X)

    T2: end transaction

    T1: read(X)

    Now T1 has read two different values for X.

    Solution: long read locks (degree 3)

  • Phantoms:
  • T1: read range [x - y]

    T2: insert z, x < z < y

    T2: end transaction

    T1: read range [x - y]

    z is a "phantom" data item (eek!)

    Solution: ??

    NOTE: two-phase well-formed implies degree-3 consistency. (Why?)
     

Further reading on consistency levels: Berenson, et al., SIGMOD 1995.

Notes on Deadlock

  • In OS world, deadlock usually due to errors or overloads
  • In DB/xact world with 2PL, they’re inherent.
  • Most common causes:
    • Differing access orders

      T1: X-lock P

      T2: X-lock Q

      T1: X-lock Q // block waiting for T2

        T2: X-lock P // block waiting for T1

    • lock-mode upgrades

      T1: S-lock P

      T2: S-lock P

      T1: convert S-lock on P to X-lock // block

      T2: convert S-lock on P to X-lock // block

  • Usual DB solution: deadlock detection (other option: deadlock avoidance)
    • Use "waits-for" graph and look for cycles
    • Empirically, in actual systems the waits-for graph shows:
      • cycles fairly rare
      • cycle length usually 2, sometimes 3, virtually never >3
      • use DFS to find cycles
    • When to look for cycles?
        1. whenever a xact blocks
        2. periodically
        3. never (use timeouts)

Typically, in centralized systems, run deadlock detection whenever blocking occurs. Arguably this is cheap: the most recently blocked transaction T must be the one that caused the deadlock, so just run a DFS starting from T.

In distributed systems, even this can be expensive, so often do periodic detection (more later)
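
A minimal sketch of detection-on-block (Python; the waits-for graph representation is assumed): when T blocks, run a DFS from T and report any cycle it reaches.

    def find_deadlock(waits_for, start):
        """Return a cycle (list of xacts) reachable from `start`, or None."""
        path, on_path = [], set()

        def dfs(t):
            path.append(t)
            on_path.add(t)
            for u in waits_for.get(t, ()):
                if u in on_path:                      # back edge: cycle found
                    return path[path.index(u):] + [u]
                cycle = dfs(u)
                if cycle:
                    return cycle
            on_path.discard(path.pop())
            return None

        return dfs(start)

    # The "differing access orders" example: T1 waits for T2, T2 waits for T1.
    print(find_deadlock({'T1': {'T2'}, 'T2': {'T1'}}, 'T2'))   # ['T2', 'T1', 'T2']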

  • Who to restart ("victim selection")
    1. current blocker
    2. youngest XACT
    3. least resources used
    4. fewest locks held (common)
    5. fewest number of restarts
 

© 1998, Joseph M. Hellerstein. Last modified 08/18/98.
Feedback welcomed.