redbookcover.gif (13955 bytes) Readings in Database Systems, 3rd Edition

Stonebraker & Hellerstein, eds.

Buffer Management

DBMIN (Chou & DeWitt)

Theme: There are just a few basic access patterns in a query processing system. Make straightforward observations about locality behavior of these access patterns, and expose them to your buffer manager.

Old Stuff:

  • Background: blocks, frames, pages, "pinning"
  • Stochastic OS replacement policies: LRU, MRU, FIFO, LIFO, Clock, etc. None is uniformly appropriate for typical DBMS access patterns (see Stonebraker’s "OS Dump").
  • Domain Separation: Split up work/memory into categories, and do LRU within your category. If there are no pages in your chunk of memory, borrow from somewhere else. Example domains: a non-leaf level of a B+-tree. 8-10% improvement over global LRU(?)
    • domains are static
    • pages belong to partitions regardless of how they’re being used (e.g. no distinction between heap file page for sequential scan and for nested loop)
    • does not prevent over-utilization of memory by multiple users, since there’s no notion of "users" or "processes"
    • needs an orthogonal load-control facility
  • Group LRU: Like DS, but prioritize domains. Free buffers are stolen in order of priority (low high)
    • optimization: adjust sizes of domains using a "working-set" judgment (i.e. pages in last Ti refs are kept for domain i)
    • Effelsberg & Haerder: no convincing evidence that any of this works better than LRU or Clock.
  • The "New" Algorithm: Modification of INGRES.
    • Two observations:
      1. priority not a property of a page, but of a relation
      2. each relation needs a working set
    • Note: this is a query-based intuition!  Anticipates DBMIN.
    • Subdivide memory into a chunk per relation, and prioritize chunks.
    • Assign an empty resident set per relation.
    • Use MRU per relation, but each relation gets one buffer at all times.
    • Heuristics available to reprioritize chunks.
    • Simulation study looked good, but implementation in INGRES didn’t beat LRU.
  • Hot Set
    • Also uses query-based info
    • A set of pages which are accessed over and over form a "hot set".
      • If you graph buffer size against # of page faults, you see "hot points".
    • If your hot set fits in memory, you win! Otherwise you lose.
    • Example: NL-join, the hot set is |inner| + 1
    • Don’t let a query into the system unless its hot set fits in memory. Each query is given its hot set worth of buffers.
    • The idea assumes LRU is going on. But MRU is better for looping access, and makes the "hot point" go away
    • Using LRU over-allocates (i.e. under-utilizes) memory, since the "hot point" analysis can be fixed with MRU.


Based on the Query Locality Set Model (QLSM), which characterizes DBMS reference patterns in 3 ways:

  • Sequential: Straight Sequential (SS) & Clustered Sequential (CS)
  • Random: Independent Random (IR) and Clustered Random (CR)
  • Hierarchical: Straight Hierarchical (SH), Hierarchical with Straight Sequential (H/SS), Hierarchical with Clustered Sequential (H/CS), Looping Hierarchical (LH)


Questions: which query processing operators correspond to each category? Do the categories cover all the operators?


The DBMIN Algorithm:

  • associate a chunk of memory with each "file instance" (more like each table in the FROM clause). This is called the file instance’s locality set.
  • estimate max locality set sizes via looking at query plan & database stats. A query is allowed to run if the sum of its locality sets fits in free buffers.
  • a global page table and global free list is kept in addition to locality sets
  • on page request
    • if page in global table & the locality set, just update usage stats of page
    • else if page in memory but not in LocSet, grab page, and if not in another LocSet put it in our LocSet
    • else read page into LocSet (using a page from global free list)
    • If locality set gets bigger than max needed, choose a page to toss according to a LocSet-specific policy (to be discussed next)


Locality Set size & replacement policies for different reference patterns:

  • SS: LocSet size = 1. Replace that page as soon as needed.
  • CS: LocSet size = (#tuples in largest cluster)/(# of tuples per page). FIFO or LRU replacement work.
  • LS: LocSet size = size of relation. MRU is best.
  • IR: odds of revisit are low, so LocSet either 1, or the magic number b from the "Yao" formula. Residual value r = (k - b)/b of a page can be used to choose between 1 and b (i.e. k is number of accesses, b is # of pages that will be referenced, so this is # of revisits over # of pages). Replacement policy?
  • CR: Just like CS, but pages are not packed onto blocks. So it’s just # of tuples in largest cluster.
  • SH, H/SS: like SS
  • H/CS: like CS, but replace tuples with (key,ptr) pairs
  • LH: at each level h of an index, you have random access among pages. Use Yao to figure out how many pages you’ll access at each level in k lookups. LocSet is sum of these over all levels that you choose to worry about (maybe only the root!) LIFO with a few (4-5) buffers probably an OK replacement strategy.

 A Detailed Simulation Study (Welcome to Wisconsin!)

  • trace-based & distribution-based: traces used to model individual queries, but workload synthesized based on different distributions. Traces done on a popular benchmark database (from the Wisconsin Benchmark). Queries of 1 and 2 tables.
  • simulator models CPU, one I/O device, and RAM access. Simulation tuned to micro-benchmark of WiSS. Performance metric is query throughput.
  • 3 levels of sharing modeled: full, half, and no sharing
  • Disk arm was jostled around randomly.
  • Memory set big enough to hold about 8 concurrent working sets.
  • statistical confidence intervals on validity of results guarantee things within 5% of the mean (how often do you see that in CS performance studies these days?!  Why not?)
  • comparison of RAND, FIFO, CLOCK, HOT, Working Set, DBMIN
  • Bottom line:
    • as expected, DBMIN is the top line on every graph
    • Stochastic techniques are no good, though feedback-based load control helps
      • Later work on such load control shows it's VERY tricky in mixed workloads.
    • WS not a big winner either, though a popular OS choice
    • DBMIN beats HOT by 7-13% more throughput

Later work: LRU-K, by O'Neil & O'Neil.

1998, Joseph M. Hellerstein.  Last modified 08/18/98.
Feedback welcomed.