Readings in Database Systems, 3rd Edition

Stonebraker & Hellerstein, eds.


Buffer Management

DBMIN (Chou & DeWitt)

Theme: There are just a few basic access patterns in a query processing system. Make straightforward observations about locality behavior of these access patterns, and expose them to your buffer manager.

Old Stuff:

  • Background: blocks, frames, pages, "pinning"
  • Stochastic OS replacement policies: LRU, MRU, FIFO, LIFO, Clock, etc. None is uniformly appropriate for typical DBMS access patterns (see Stonebraker’s "OS Dump").
  • Domain Separation: Split up work/memory into categories, and do LRU within your category. If there are no free pages in your domain's chunk of memory, borrow from another. Example domain: a non-leaf level of a B+-tree. Roughly an 8-10% improvement over global LRU(?)
    • domains are static
    • pages belong to partitions regardless of how they’re being used (e.g. no distinction between heap file page for sequential scan and for nested loop)
    • does not prevent over-utilization of memory by multiple users, since there’s no notion of "users" or "processes"
    • needs an orthogonal load-control facility
  • Group LRU: Like DS, but prioritize domains. Free buffers are stolen in order of priority (low → high)
    • optimization: adjust sizes of domains using a "working-set" judgment (i.e. pages referenced in the last Ti references are kept for domain i)
    • Effelsberg & Haerder: no convincing evidence that any of this works better than LRU or Clock.
  • The "New" Algorithm: Modification of INGRES.
    • Two observations:
      1. priority not a property of a page, but of a relation
      2. each relation needs a working set
    • Note: this is a query-based intuition!  Anticipates DBMIN.
    • Subdivide memory into a chunk per relation, and prioritize chunks.
    • Assign an empty resident set per relation.
    • Use MRU per relation, but each relation retains at least one buffer at all times.
    • Heuristics available to reprioritize chunks.
    • Simulation study looked good, but implementation in INGRES didn’t beat LRU.
  • Hot Set
    • Also uses query-based info
    • A set of pages which are accessed over and over form a "hot set".
      • If you graph buffer size against # of page faults, you see "hot points".
    • If your hot set fits in memory, you win! Otherwise you lose.
    • Example: for an NL-join under LRU, the hot set is |inner| + 1 pages
    • Don’t let a query into the system unless its hot set fits in memory. Each query is given its hot set worth of buffers.
    • The idea assumes LRU replacement. But MRU is better for looping access, and makes the "hot point" go away
    • LRU thus over-allocates (i.e. under-utilizes) memory, since the "hot point" in the analysis can be eliminated by switching to MRU.
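The LRU-vs-MRU point can be checked with a toy simulation (a sketch of mine, not from any of the papers): repeated scans of an inner relation just one page larger than the buffer pool make LRU fault on every single reference, while MRU keeps most of the loop resident.

```python
from collections import OrderedDict

def faults(trace, nbufs, policy):
    """Count page faults for a reference trace under LRU or MRU."""
    buf = OrderedDict()                  # resident pages, oldest-first
    nfaults = 0
    for page in trace:
        if page in buf:
            buf.move_to_end(page)        # mark as most recently used
        else:
            nfaults += 1
            if len(buf) >= nbufs:
                # LRU evicts the least recently used page (front of dict);
                # MRU evicts the most recently used page (back of dict).
                buf.popitem(last=(policy == "MRU"))
            buf[page] = True
    return nfaults

# 10 sequential scans of a 6-page inner relation, with only 5 buffers:
trace = list(range(6)) * 10
print(faults(trace, 5, "LRU"))   # 60: every reference faults
print(faults(trace, 5, "MRU"))   # far fewer: most of the loop stays resident
```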

DBMIN

Based on the Query Locality Set Model (QLSM), which characterizes DBMS reference patterns in 3 ways:

  • Sequential: Straight Sequential (SS), Clustered Sequential (CS), and Looping Sequential (LS)
  • Random: Independent Random (IR) and Clustered Random (CR)
  • Hierarchical: Straight Hierarchical (SH), Hierarchical with Straight Sequential (H/SS), Hierarchical with Clustered Sequential (H/CS), Looping Hierarchical (LH)

 

Questions: which query processing operators correspond to each category? Do the categories cover all the operators?

 

The DBMIN Algorithm:

  • associate a chunk of memory with each "file instance" (roughly, each table in the FROM clause). This is called the file instance’s locality set.
  • estimate max locality set sizes via looking at query plan & database stats. A query is allowed to run if the sum of its locality sets fits in free buffers.
  • a global page table and global free list is kept in addition to locality sets
  • on page request
    • if the page is in the global table and in our locality set, just update its usage stats
    • else if the page is in memory but not in our locality set, use it, and if it belongs to no other locality set, adopt it into ours
    • else read page into LocSet (using a page from global free list)
    • If locality set gets bigger than max needed, choose a page to toss according to a LocSet-specific policy (to be discussed next)
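The request logic above can be sketched in Python (my paraphrase, not the paper's code; the `ABSENT` sentinel, the FIFO local policy, and the admission-control assumption are mine):

```python
ABSENT = object()   # sentinel: page not in memory at all

class LocalitySet:
    def __init__(self, max_size):
        self.max_size = max_size
        self.pages = []      # resident pages we own, in arrival order

class BufferPool:
    def __init__(self, nframes):
        self.nframes = nframes
        self.resident = {}   # page -> owning LocalitySet, or None if on free list

    def request(self, ls, page):
        owner = self.resident.get(page, ABSENT)
        if owner is ls:
            return "hit"                         # just update usage stats
        if owner not in (ABSENT, None):
            return "hit (other locality set)"    # use in place, don't adopt
        if owner is ABSENT and len(self.resident) >= self.nframes:
            # reclaim an unowned (free-list) frame; admission control --
            # running a query only if its locality sets fit -- guarantees
            # one exists
            victim = next(p for p, o in self.resident.items() if o is None)
            del self.resident[victim]
        self.resident[page] = ls                 # adopt into our locality set
        ls.pages.append(page)
        if len(ls.pages) > ls.max_size:          # local replacement (FIFO here)
            old = ls.pages.pop(0)
            self.resident[old] = None            # demote to global free list
        return "fault" if owner is ABSENT else "hit (free list)"
```

In real DBMIN each locality set enforces its bound with its own policy (MRU for loops, FIFO/LRU for clustered sequential, etc.); FIFO stands in for all of them in this sketch.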

 

Locality Set size & replacement policies for different reference patterns:

  • SS: LocSet size = 1. Replace that page as soon as needed.
  • CS: LocSet size = (# tuples in largest cluster)/(# of tuples per page). FIFO or LRU replacement works.
  • LS: LocSet size = size of relation. MRU is best.
  • IR: odds of revisiting a page are low, so the LocSet is either 1 or the magic number b from Yao’s formula. The residual value r = (k - b)/b of a page can be used to choose between 1 and b (k is the # of accesses, b is the # of pages that will be referenced, so r is the expected # of revisits per page). Replacement policy?
  • CR: Just like CS, but tuples are not packed onto pages. So the LocSet size is just the # of tuples in the largest cluster.
  • SH, H/SS: like SS
  • H/CS: like CS, but replace tuples with (key,ptr) pairs
  • LH: at each level h of an index, you have random access among pages. Use Yao to figure out how many pages you’ll access at each level in k lookups. LocSet size is the sum of these over all levels that you choose to worry about (maybe only the root!). LIFO with a few (4-5) buffers is probably an OK replacement strategy.
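The IR and LH sizing both lean on Yao's formula. A sketch (my implementation; the clamping for large k and the residual-value threshold are my assumptions, not the paper's): given N tuples spread evenly over n pages and k random tuple accesses, Yao gives the expected number of distinct pages touched.

```python
from math import prod

def yao(N, n, k):
    """Expected # of distinct pages touched by k random accesses
    to N tuples spread evenly over n pages (Yao's formula)."""
    per_page = N / n
    # clamp each factor at 0 so the result saturates at n for large k
    return n * (1 - prod(max(0.0, (N - per_page - i) / (N - i))
                         for i in range(k)))

def ir_locset_size(N, n, k, threshold=1.0):
    """DBMIN-style IR choice: 1 buffer if revisits are rare, else b.
    r = (k - b)/b is the expected # of revisits per page; the
    threshold for 'rare' is my assumption."""
    b = yao(N, n, k)
    r = (k - b) / b
    return round(b) if r > threshold else 1

# 50 random accesses to 10,000 tuples on 1,000 pages: revisits are
# unlikely, so a single buffer suffices.
print(ir_locset_size(10000, 1000, 50))   # 1
```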

 A Detailed Simulation Study (Welcome to Wisconsin!)

  • trace-based & distribution-based: traces used to model individual queries, but workload synthesized based on different distributions. Traces done on a popular benchmark database (from the Wisconsin Benchmark). Queries of 1 and 2 tables.
  • simulator models CPU, one I/O device, and RAM access. Simulation tuned to micro-benchmarks of WiSS (the Wisconsin Storage System). Performance metric is query throughput.
  • 3 levels of sharing modeled: full, half, and no sharing
  • Disk arm was jostled around randomly.
  • Memory set big enough to hold about 8 concurrent working sets.
  • statistical confidence intervals on validity of results guarantee things within 5% of the mean (how often do you see that in CS performance studies these days?!  Why not?)
  • comparison of RAND, FIFO, CLOCK, HOT, Working Set, DBMIN
  • Bottom line:
    • as expected, DBMIN is the top line on every graph
    • Stochastic techniques are no good, though feedback-based load control helps
      • Later work on such load control shows it's VERY tricky in mixed workloads.
    • WS not a big winner either, though a popular OS choice
    • DBMIN beats HOT by 7-13% more throughput

Later work: LRU-K, by O'Neil, O'Neil & Weikum.

© 1998, Joseph M. Hellerstein.  Last modified 08/18/98.
Feedback welcomed.