Readings in Database Systems, 3rd Edition

Stonebraker & Hellerstein, eds.


Concurrency Control 1: Locking and Degrees of Consistency

Transaction Refresher

Statement of problem:

  • Database: a fixed set of named resources (e.g. tuples, pages, files, whatever)
  • Consistency Constraints: must be true for DB to be considered "consistent". Examples:
    • sum(account balances) = sum(assets)
    • P is the set of index pages for relation R
    • ACCT-BAL > 0
    • each employee has a valid department
  • Transaction: a sequence of actions bracketed by begin and end statements. Each transaction ("xact") is assumed to maintain consistency (this can be guaranteed by the system).
  • Goal: Concurrent execution of transactions, with high throughput/utilization, low response time, and fairness.

 

The ACID test for transaction management:

  • Atomicity: Either all actions within a transaction occur, or none do ("all or nothing at all")
  • Consistency: Each transaction takes the database from one consistent state to another. No intermediate states are observable. (We will refine this definition today)
  • Isolation: Events within a transaction must be invisible to other transactions. This allows transactions to be aborted in isolation (i.e. avoids cascaded aborts)
  • Durability: Once a transaction is committed, its results must be preserved even in the case of failures.

 

C & I guaranteed by concurrency control. A & D guaranteed by recovery.

 

A transaction schedule:

T0                          T1
read(A)
A = A - 50
write(A)
                            read(A)
                            temp = A*0.1
                            A = A - temp
                            write(A)
read(B)
B = B + 50
write(B)
                            read(B)
                            B = B + temp
                            write(B)

 

The system "understands" only reads and writes; cannot assume any semantics of other operations.

Arbitrary interleaving can lead to:

  • temporary inconsistencies (ok, unavoidable)
  • "permanent" inconsistencies, that is, inconsistencies that remain after transactions have completed.

 

Some definitions:

  • Schedule: A "history" or "audit trail" of all actions in the system, the xacts that performed them, and the objects they affected.
  • Serial Schedule: A schedule is serial if all the actions of each single xact appear together.
  • Equivalence of schedules: Two schedules S1, S2 are considered computationally equivalent if:
  1. The set of transactions that participate in S1 and S2 are the same.
  2. For each data item Q in S1, if transaction Ti executes read(Q) and the value of Q read by Ti was written by Tj, then the same will hold in S2. [reads are all the same]
  3. For each data item Q in S1, if transaction Ti executes the last write(Q) instruction, then the same holds in S2. [the same writers "win"]
  • Serializability: A schedule S is serializable if there exists a serial schedule S’ such that S and S’ are computationally equivalent.

 

One way to think about concurrency control – in terms of dependencies:

  1. T1 reads N ... T2 writes N: a RW dependency
  2. T1 writes N ... T2 reads N: a WR dependency
  3. T1 writes N ... T2 writes N: a WW dependency

Can construct a "serialization graph" for a schedule S (SG(S)):

  • nodes are transactions T1, ..., Tn
  • Edges: Ti -> Tj if there is a RW, WR, or WW dependency from Ti to Tj

Theorem: A schedule S is serializable iff SG(S) is acyclic.
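
As an illustration of the definitions above, here is a minimal sketch (Python, mine, not from the readings) that builds SG(S) from a schedule given as a list of (xact, action, item) steps and checks acyclicity with a DFS; the encoding and names are illustrative assumptions.

    def serialization_graph(schedule):
        """schedule: list of (xact, 'r' or 'w', item) in execution order."""
        edges = set()
        for i, (ti, ai, x) in enumerate(schedule):
            for tj, aj, y in schedule[i + 1:]:
                if x == y and ti != tj and (ai == 'w' or aj == 'w'):
                    edges.add((ti, tj))          # RW, WR, or WW dependency
        return edges

    def is_serializable(schedule):
        edges = serialization_graph(schedule)
        nodes = {t for t, _, _ in schedule}
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {n: WHITE for n in nodes}

        def dfs(n):
            color[n] = GRAY
            for a, b in edges:
                if a == n:
                    if color[b] == GRAY:         # back edge => cycle
                        return False
                    if color[b] == WHITE and not dfs(b):
                        return False
            color[n] = BLACK
            return True

        return all(dfs(n) for n in nodes if color[n] == WHITE)

    # The interleaved debit/credit schedule above:
    s = [('T0', 'r', 'A'), ('T0', 'w', 'A'),
         ('T1', 'r', 'A'), ('T1', 'w', 'A'),
         ('T0', 'r', 'B'), ('T0', 'w', 'B'),
         ('T1', 'r', 'B'), ('T1', 'w', 'B')]
    print(is_serializable(s))                    # True: every edge goes T0 -> T1

Note this compares all conflicting pairs, not just adjacent ones; for a sketch the O(n^2) scan is fine.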
 

Locking

A technique to ensure serializability while, hopefully, preserving high concurrency as well. The winner in industry.
Basics:

  • A "lock manager" records what entities are locked, by whom, and in what "mode". Also maintains wait queues.
  • A well-formed transaction locks entities before using them, and unlocks them some time later.

Multiple lock modes: Some data items can be shared, so not all locks need to be exclusive.
Lock compatibility table 1:
Assume two lock modes: shared (S) and exclusive (X) locks.

        S   X
  S     T   F
  X     F   F

If you request a lock in a mode incompatible with an existing lock, you must wait. 

Two-Phase Locking (2PL):

  • Growing Phase: A transaction may obtain locks but not release any lock.
  • Shrinking Phase: A transaction may release locks, but not obtain any new lock. (in fact, locks are usually all released at once to avoid "cascading aborts".)

Theorem: If all xacts are well-formed and follow 2PL, then any resulting schedule is serializable
(note: this is if, not if and only if!)
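
A minimal single-threaded sketch (mine, not from the readings) of a lock table enforcing the S/X compatibility matrix above; a real lock manager would block the requester and queue the request, here we just report the decision. Names and structure are illustrative assumptions.

    COMPAT = {('S', 'S'): True, ('S', 'X'): False,
              ('X', 'S'): False, ('X', 'X'): False}

    class LockTable:
        def __init__(self):
            self.held = {}                       # item -> list of (xact, mode)

        def request(self, xact, item, mode):
            holders = self.held.get(item, [])
            # grant only if compatible with every lock held by other xacts
            if all(COMPAT[(held_mode, mode)] or owner == xact
                   for owner, held_mode in holders):
                self.held[item] = holders + [(xact, mode)]
                return 'granted'
            return 'must wait'

        def release_all(self, xact):
            # shrinking phase: release everything at once (avoids cascading aborts)
            for item in list(self.held):
                self.held[item] = [(o, m) for o, m in self.held[item] if o != xact]

    lt = LockTable()
    print(lt.request('T1', 'A', 'S'))            # granted
    print(lt.request('T2', 'A', 'S'))            # granted: S is compatible with S
    print(lt.request('T3', 'A', 'X'))            # must wait: X conflicts with both S locks
    lt.release_all('T1')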
 

Gray, et al.: Granularity of Locks

Theme: Correctness and performance

  • Granularity tradeoff: small granularity (e.g. field of a tuple) means high concurrency but high overhead. Large granularity (e.g. file) means low overhead but low concurrency.
  • Possible granularities:
    • DB
    • Areas
    • Files
    • Pages
    • Tuples (records)
    • fields of tuples
  • Want hierarchical locking, to allow "large" xacts to set large locks, "small" xacts to set small locks
  • Problem: T1 S-locks a record in a file, then T2 X-locks the whole file. How can T2 discover that T1 has locked the record?
  • Solution: "Intention" locks
          NL   IS   IX   S    SIX  X
    NL    T    T    T    T    T    T
    IS    T    T    T    T    T    F
    IX    T    T    T    F    F    F
    S     T    T    F    T    F    F
    SIX   T    T    F    F    F    F
    X     T    F    F    F    F    F

 

    • IS and IX locks
    • T1 obtains S lock on record in question, but first gets IS lock on file.
    • Now T2 cannot get X lock on file
    • However, T3 can get IS or S lock on file (the reason for distinguishing IS and IX: if there were only I, T3 couldn’t get an S lock on file)
    • For higher concurrency, one more mode: SIX. Intuitively, you read all of the object but only lock some subparts. Allows concurrent IS locks (IX alone would not). Note: gives S access, so disallows IX to others.
    • requires that xacts lock items root to leaf in the hierarchy, unlock leaf to root
    • generalization to DAG of resources: X locks all paths to a node, S locks at least one.
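
A small sketch (mine, using the compatibility matrix above; node names and structure are illustrative assumptions) of the root-to-leaf rule: to get S or X on a node, first acquire the matching intention mode on every ancestor.

    # True = compatible, per the table above (the matrix is symmetric)
    COMPATIBLE = {
        'NL':  {'NL': True, 'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': True},
        'IS':  {'NL': True, 'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': False},
        'IX':  {'NL': True, 'IS': True,  'IX': True,  'S': False, 'SIX': False, 'X': False},
        'S':   {'NL': True, 'IS': True,  'IX': False, 'S': True,  'SIX': False, 'X': False},
        'SIX': {'NL': True, 'IS': True,  'IX': False, 'S': False, 'SIX': False, 'X': False},
        'X':   {'NL': True, 'IS': False, 'IX': False, 'S': False, 'SIX': False, 'X': False},
    }
    # intention mode required on ancestors for a given requested mode
    INTENT = {'IS': 'IS', 'S': 'IS', 'IX': 'IX', 'X': 'IX', 'SIX': 'IX'}

    def lock_hierarchically(held, path, mode):
        """path: nodes root..target, e.g. ['db', 'file1', 'rec42'].
        held: node -> list of modes already granted (to other xacts)."""
        for node in path[:-1]:
            intent = INTENT[mode]
            if not all(COMPATIBLE[m][intent] for m in held.get(node, [])):
                return 'wait at ' + node
            held.setdefault(node, []).append(intent)   # record the grant
        target = path[-1]
        if not all(COMPATIBLE[m][mode] for m in held.get(target, [])):
            return 'wait at ' + target
        held.setdefault(target, []).append(mode)
        return 'granted'

    # T1 holds S on a record, hence IS on the file and the DB:
    held = {'db': ['IS'], 'file1': ['IS'], 'rec42': ['S']}
    print(lock_hierarchically(held, ['db', 'file1'], 'X'))   # wait at file1: IS blocks X
    print(lock_hierarchically(held, ['db', 'file1'], 'S'))   # granted: IS is compatible with S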

Some implementation background (not in paper):

  • maintain a lock table as hashed main-mem structure
  • lock/unlock must be atomic operations (protected by critical section)
  • typically costs several hundred instructions to lock/unlock an item
  • suppose T1 has an S lock on P, T2 is waiting to get X lock on P, and now T3 wants S lock on P. Do we grant T3 an S lock?

No! (starvation, unfair, etc.) So...

    • Manage FCFS queue for each locked object with outstanding requests
    • all xacts that are adjacent and compatible are a compatible group
    • The front group is the granted group
    • group mode is most restrictive mode amongst group members
  • Conversions: often want to convert (e.g. S to X for "test and modify" actions). Should conversions go to back of queue?
  • No! Instant deadlock (more notes on deadlock later). So put conversions right after granted group.
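
An illustrative sketch (mine, not from the paper) of this queueing policy: a granted group at the front, a FCFS wait queue behind it, new requests at the back even when compatible, and conversions placed immediately after the granted group.

    from collections import deque

    COMPAT = {('S', 'S'): True, ('S', 'X'): False,
              ('X', 'S'): False, ('X', 'X'): False}

    class LockQueue:
        def __init__(self):
            self.granted = []                    # the granted group: (xact, mode) pairs
            self.waiting = deque()               # FCFS queue of (xact, mode)

        def group_mode(self):
            # most restrictive mode among granted members
            return 'X' if any(m == 'X' for _, m in self.granted) else 'S'

        def request(self, xact, mode):
            # grant only if no one is waiting and the mode is compatible
            # with the granted group; otherwise go to the back of the queue
            if not self.waiting and (not self.granted or
                                     COMPAT[(self.group_mode(), mode)]):
                self.granted.append((xact, mode))
            else:
                self.waiting.append((xact, mode))

        def convert(self, xact, new_mode):
            # conversions jump to just behind the granted group, not the back
            self.waiting.appendleft((xact, new_mode))

    q = LockQueue()
    q.request('T1', 'S')     # granted
    q.request('T2', 'X')     # waits behind T1's granted group
    q.request('T3', 'S')     # waits behind T2, even though S is compatible with T1

(Waking up waiters on release is omitted; the point is only where requests and conversions enter the queue.)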

Gray, et al.: Degrees of Consistency

First, a definition: A write is committed when the transaction is finished; otherwise, the write is dirty.

A Locking-Based Description of Degrees of Consistency:

This is not actually a description of the degrees, but rather of how to achieve them. But it’s easier to understand (?)

  • Degree 0: set short write locks on updated items
  • Degree 1: set long write locks on updated items ("long" = EOT)
  • Degree 2: set long write locks on updated items, and short read locks on items read
  • Degree 3: set long write and read locks
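
The same four policies, restated compactly (this is just the list above in table form; "short" means held only for the duration of the action, "long" means held to EOT):

    DEGREE_LOCK_POLICY = {
        0: {'write_locks': 'short', 'read_locks': 'none'},
        1: {'write_locks': 'long',  'read_locks': 'none'},
        2: {'write_locks': 'long',  'read_locks': 'short'},
        3: {'write_locks': 'long',  'read_locks': 'long'},
    }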

A Dirty-Data Description of Degrees of Consistency

Transaction T sees degree X consistency if...

  • Degree 0: T does not overwrite dirty data of other transactions
  • Degree 1:
    1. T sees degree 0 consistency, and
    2. T does not commit any writes before EOT
  • Degree 2:
    1. T sees degree 1 consistency, and
    2. T does not read dirty data of other transactions
  • Degree 3:
    1. T sees degree 2 consistency, and
    2. Other transactions do not dirty any data read by T before T completes.

Examples of Inconsistencies prevented by Various Degrees

  • Garbage reads:
  • T1: write(X); T2: write(X)

    Who knows what value X will end up being?

    Solution: set short write locks (degree 0)

  • Lost Updates:
  • T1: write(X)

    T2: write(X)

    T1: abort (restores X to pre-T1 value)

    At this point, T2's update is lost (note: log contains T1.X.oldval, newval)

    Solution: set long write locks (degree 1)

  • Dirty Reads:
  • T1: write(X)

    T2: read(X)

    T1: abort

    Now T2’s read is bogus.

    Solution: set long X locks and short S locks (degree 2)

    Many systems do long-running queries at degree 2.

  • Unrepeatable reads:
  • T1: read(X)

    T2: write(X)

    T2: end transaction

    T1: read(X)

    Now T1 has read two different values for X.

    Solution: long read locks (degree 3)

  • Phantoms:
  • T1: read range [x - y]

    T2: insert z, x < z < y

    T2: end transaction

    T1: read range [x - y]

    z is a "phantom" data item (eek!)

    Solution: ??

    NOTE: two-phase well-formed implies degree-3 consistency. (Why?)
     

Further reading on consistency levels: Berenson, et al., SIGMOD 1995.

Notes on Deadlock

  • In the OS world, deadlocks are usually due to errors or overloads
  • In the DB/xact world with 2PL, they're inherent.
  • Most common causes:
    • Differing access orders

      T1: X-lock P

      T2: X-lock Q

      T1: X-lock Q // block waiting for T2

        T2: X-lock P // block waiting for T1

    • lock-mode upgrades

      T1: S-lock P

      T2: S-lock P

      T1: convert S-lock on P to X-lock // block

      T2: convert S-lock on P to X-lock // block

  • Usual DB solution: deadlock detection (other option: deadlock avoidance)
    • Use "waits-for" graph and look for cycles
    • Empirically, in actual systems the waits-for graph shows:
      • cycles fairly rare
      • cycle length usually 2, sometimes 3, virtually never >3
    • so use DFS to find cycles
    • When to look for cycles?
        1. whenever a xact blocks
        2. periodically
        3. never (use timeouts)

Typically, in centralized systems, run deadlock detection whenever blocking occurs. Arguably this is cheap: the most recently blocked transaction T must be the one that caused the deadlock, so just run DFS starting from T.
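
A minimal sketch (mine, not tied to any particular system) of that policy: keep a waits-for graph and, when a transaction blocks, DFS from it looking for a cycle back to itself.

    def find_deadlock(waits_for, blocked):
        """waits_for: dict xact -> set of xacts it waits for.
        Returns a cycle through `blocked` as a list of xacts, or None."""
        path, seen = [blocked], set()

        def dfs(t):
            for u in waits_for.get(t, ()):
                if u == blocked:                 # found a cycle back to the new waiter
                    return True
                if u not in seen:
                    seen.add(u)
                    path.append(u)
                    if dfs(u):
                        return True
                    path.pop()
            return False

        return path if dfs(blocked) else None

    # The "differing access orders" example above, after T2 blocks on P:
    waits_for = {'T1': {'T2'}, 'T2': {'T1'}}
    print(find_deadlock(waits_for, 'T2'))        # ['T2', 'T1'] -> pick a victim and restart it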

In distributed systems, even this can be expensive, so often do periodic detection (more later)

  • Who to restart ("victim selection")
    1. current blocker
    2. youngest XACT
    3. least resources used
    4. fewest locks held (common)
    5. fewest number of restarts
 

© 1998, Joseph M. Hellerstein.  Last modified 08/18/98.
Feedback welcomed.