DBMIN (Chou & DeWitt)
Theme: There are just a few basic access patterns in a query processing system. Make
straightforward observations about locality behavior of these access patterns, and expose
them to your buffer manager.
- Background: blocks, frames, pages, "pinning"
- Stochastic OS replacement policies: LRU, MRU, FIFO, LIFO, Clock, etc. None is uniformly
appropriate for typical DBMS access patterns (see Stonebrakers "OS Dump").
- Domain Separation: Split up work/memory into categories, and do LRU within your
category. If there are no pages in your chunk of memory, borrow from somewhere else.
Example domains: a non-leaf level of a B+-tree. 8-10% improvement over global LRU(?)
- domains are static
- pages belong to partitions regardless of how theyre being used (e.g. no
distinction between heap file page for sequential scan and for nested loop)
- does not prevent over-utilization of memory by multiple users, since theres no
notion of "users" or "processes"
- needs an orthogonal load-control facility
- Group LRU: Like DS, but prioritize domains. Free buffers are stolen in order of priority
(low à high)
- optimization: adjust sizes of domains using a "working-set" judgment (i.e.
pages in last Ti refs are kept for domain i)
- Effelsberg & Haerder: no convincing evidence that any of this works better than LRU
- The "New" Algorithm: Modification of INGRES.
Note: this is a query-based intuition! Anticipates DBMIN.
Subdivide memory into a chunk per relation, and prioritize chunks.
Assign an empty resident set per relation.
Use MRU per relation, but each relation gets one buffer at all times.
Heuristics available to reprioritize chunks.
Simulation study looked good, but implementation in INGRES didnt beat LRU.
- priority not a property of a page, but of a relation
- each relation needs a working set
- Also uses query-based info
- A set of pages which are accessed over and over form a "hot set".
- If you graph buffer size against # of page faults, you see "hot points".
- If your hot set fits in memory, you win! Otherwise you lose.
- Example: NL-join, the hot set is |inner| + 1
- Dont let a query into the system unless its hot set fits in memory. Each query is
given its hot set worth of buffers.
- The idea assumes LRU is going on. But MRU is better for looping access, and makes the
"hot point" go away
- Using LRU over-allocates (i.e. under-utilizes) memory, since the "hot point"
analysis can be fixed with MRU.
Based on the Query Locality Set Model (QLSM), which characterizes DBMS reference
patterns in 3 ways:
- Sequential: Straight Sequential (SS) & Clustered Sequential (CS)
- Random: Independent Random (IR) and Clustered Random (CR)
- Hierarchical: Straight Hierarchical (SH), Hierarchical with Straight Sequential (H/SS),
Hierarchical with Clustered Sequential (H/CS), Looping Hierarchical (LH)
Questions: which query processing operators correspond to each category? Do the
categories cover all the operators?
The DBMIN Algorithm:
- associate a chunk of memory with each "file instance" (more like each table in
the FROM clause). This is called the file instances locality set.
- estimate max locality set sizes via looking at query plan & database stats. A query
is allowed to run if the sum of its locality sets fits in free buffers.
- a global page table and global free list is kept in addition to locality sets
- on page request
- if page in global table & the locality set, just update usage stats of page
- else if page in memory but not in LocSet, grab page, and if not in another LocSet put it
in our LocSet
- else read page into LocSet (using a page from global free list)
- If locality set gets bigger than max needed, choose a page to toss according to a
LocSet-specific policy (to be discussed next)
Locality Set size & replacement policies for different reference patterns:
- SS: LocSet size = 1. Replace that page as soon as needed.
- CS: LocSet size = (#tuples in largest cluster)/(# of tuples per page). FIFO or LRU
- LS: LocSet size = size of relation. MRU is best.
- IR: odds of revisit are low, so LocSet either 1, or the magic number b from the
"Yao" formula. Residual value r = (k - b)/b of a page can be used to
choose between 1 and b (i.e. k is number of accesses, b is # of
pages that will be referenced, so this is # of revisits over # of pages). Replacement
- CR: Just like CS, but pages are not packed onto blocks. So its just # of tuples in
- SH, H/SS: like SS
- H/CS: like CS, but replace tuples with (key,ptr) pairs
- LH: at each level h of an index, you have random access among pages. Use Yao to figure
out how many pages youll access at each level in k lookups. LocSet is sum
of these over all levels that you choose to worry about (maybe only the root!) LIFO with a
few (4-5) buffers probably an OK replacement strategy.
A Detailed Simulation Study (Welcome to Wisconsin!)
- trace-based & distribution-based: traces used to model individual queries, but
workload synthesized based on different distributions. Traces done on a popular benchmark
database (from the Wisconsin Benchmark). Queries of 1 and 2 tables.
- simulator models CPU, one I/O device, and RAM access. Simulation tuned to
micro-benchmark of WiSS. Performance metric is query throughput.
- 3 levels of sharing modeled: full, half, and no sharing
- Disk arm was jostled around randomly.
- Memory set big enough to hold about 8 concurrent working sets.
- statistical confidence intervals on validity of results guarantee things within 5% of
the mean (how often do you see that in CS performance studies these days?! Why not?)
- comparison of RAND, FIFO, CLOCK, HOT, Working Set, DBMIN
- Bottom line:
- as expected, DBMIN is the top line on every graph
- Stochastic techniques are no good, though feedback-based load control helps
- Later work on such load control shows it's VERY tricky in mixed workloads.
- WS not a big winner either, though a popular OS choice
- DBMIN beats HOT by 7-13% more throughput
Later work: LRU-K, by O'Neil & O'Neil.