Readings in Database Systems, 3rd Edition

Stonebraker & Hellerstein, eds.

Access Methods: B-Trees, R-Trees & GiSTs

First, a Review of Disks

hardware: platters, arms, spindle. Often a cache (often 1 track) -- very device-specific.
data layout: sectors, blocks, tracks, cylinders
"access time": ha!
seek time, rotational delay, transfer time
random vs. sequential I/O
Examples:

Seagate Hawk 2XL (2.15Mb) -- 5.5 msec average rotation delay, 9 msec average seek time (varies from 1 to 22 msec), 5MB/sec transfer rate
Note: average disk access time is about 10-15 msec, RAM access is more like 5 microsecs!
Another note: while transfer rate and capacity are growing wildly every year, seek time is not shrinking nearly as fast!

Access Methods: General Issues

Come in two flavors: heap files and indexes
Support AMI interface:

open (possibly with selection condition)
get_next
close
insert, delete, update_field

indexes usually do two things:

partition: partition a dataset or domain into buckets
label: provide a label for each bucket
this is sometimes done hierarchically (trees), sometimes not (hashing)
the whole game boils down to how you partition & label!

index's utility defined by queries, not data!
typically random I/O in index
Performance goals:

cold vs. hot lookups
# of I/Os vs. # of blocks worth of output
because of random vs. sequential, should always compare to heap file!

Implementation issues:

concurrency & recovery
cost estimation
bulk loading

The "Ubiquitous" B-tree

Query load: equality & linear range predicates

Basics

Basic B-tree properties: balanced, high fanout, some minimal fill factor (order)

in practice, may want minimal fill factor to be less than 50% (delete at empty has been shown good in TP applications)

B-tree vs. B+-tree. Note that B+-tree internal keys simply "direct traffic"; this is very useful.
intra-node operations

Compression

rear & front key compression
pointer compression
Very common in practice, since B+-tree is on critical path of TP apps

Variable-length keys

not too hard -- fill factor now expressed in terms of bytes, not entries.

Multiuser & Recovery issues

tricky, solved problem
high concurrency locking: need not be 2PL
watch repeatable read, too!
we will revisit this issue in detail later

Leaves usually contain record pointers (as in VSAM)

VSAM

unified B+-tree-based implementation for a file-system (recall Stonebraker's gripe)
store sibling leaves and parent in one cylinder
replicate parent along one track to minimize rotation time

R-trees

Query load: Equality and Range predicates in n dimensions

point in polygon
polygon in polygon
Overlaps polygon
Contains polygon

Applications: GIS, VLSI, n-d range queries over traditional data where n is typical

Notation.

R-tree

Structure: like a B+-tree, but keys are MBRs (rectilinear minimum bounding rectangles), one key per pointer
Search: Traverse multiple paths! (DFS)
Insertion:

Where: ChooseLeaf. Least enlargement. Ties broken by picking smallest key.
Node splitting as in B+-tree. Question: what goes left, what goes right?
AdjustTree required. Why not in B+-tree?

Deletion:

Seek and destroy
CondenseTree: throw entries from under-full pages back into tree. Make sure they end up at same level. Why different from B+-tree?

Update: delete, update, reinsert
Split algorithms:

exhaustive
quadratic: start with most distant seeds, greedily add based on maximal preference
linear: like quadratic, but picks seeds differently

Performance study

RISC-II VLSI data set
In practice, people (Postgres & SHORE) seem to use M/2, quadratic split

R-Tree Variants

R*-tree (Beckman, et al., SIGMOD 90)

like R-tree, but 3 distinctions:

insertion minimizes different badness metric (depends on tree level)
new & improved split algorithm (nlogn, more effective)
forced reinsert of 30% most "extreme" entries

decreases overlap of siblings
improves storage utilization
splits less often
shapes tend to be more quadratic (pack better, generate smaller parents, minimizes split of associated pixels)

Popular, since it out-performs R-tree on search. Concurrency problems because of reinsertion.

R+-Tree (Sellis, et al, VLDB87)

instead of overlapping keys, use multiple inserts (i.e. replicate items in leaves)
speeds search, complicates insert/delete/update
buggy!
NOTE: data vs. space partitioning. R-trees partition data. R+-trees partition space.
other space-partitioning trees: K-D-B trees, Quad trees

TV(telescopic vector)-tree (Lin, et al, VLDB Journal, 1994)

Goal: solve high-d problems
Keys: compressed (center, radius) pairs in n-d
compression "telescopes" out to only the appropriate dimensions at each level (other dimensions are "wildcards")
The "dimensionality curse" remains! (See discussion of indexability below)

hB (holey-brick B) trees (Lomet & Salzburg, TODS 90)

solve three problems

naturally handles swiss-cheese polygons
speeds intra-node search
space partitioning & data partitioning

idea: put a main-memory k-d tree (space-partitioning tree) on each node. This tree can represent a holey brick!
Neat idea, pretty complicated. Not really a tree -- can be a DAG. Journal paper is buggy.
Not necessarily more effective than R*-trees

...and a recent flurry of new trees. In practice, most 2-d access methods have proved to be within 20-30% of each other, though no systematic benchmarking has been done.

Generalized Search Trees

Indexing in OO/OR Systems

Quick access to user-defined objects
Support queries natural to the objects
Two previous approaches

Specialized Indices (“ABCDEFG-trees”)

redundant code: most trees are very similar
concurrency control, etc. tricky!

Extensible B-trees & R-trees (Postgres/Illustra)

B-tree or R-tree lookups only!
E.g. ‘WHERE movie.video < ‘Terminator 2’

A Third Approach

A generalized search tree. Must be:

Extensible in terms of queries
General (B+-tree, R-tree, etc.)
Easy to extend
Efficient (match specialized trees)
Highly concurrent, recoverable, etc.

Uses for GiSTs

New indexes needed for new apps...

find all supersets of S
find all molecules that bind to M
your favorite query here (multimedia?)

...and for new queries over old domains:

find all points in region from 12 to 1 o’clock
find all strings that match R. E.

Structure: balanced tree of (p, ptr) pairs

p is a key “predicate”
p holds for all objects below ptr
keys on a page may overlap
Key predicates: a user-defined class

This is the only extensibility required!

Key Methods

Consistent(E,q): E.p && q? (no/maybe)

Labeling

Union(P): new key that holds for all tuples in P

Partitioning

Penalty(E1,E2): penalty of inserting E2 in subtree E1
PickSplit(P): split P into two groups of entries

General technique:

traverse tree where Consistent is TRUE

For fancier things, see [Aoki98].

Insert

descend tree along least increase in Penalty
if there’s room at leaf, insert there
else split according to PickSplit
propagate changes using Union
Notes:

on overflow, can do R*-tree style reinsert

Delete

find the entry via Search, and delete it
propagate changes using Union
on underflow:

ordered keys, do B+-tree style borrow/coalesce
else reinsert stuff on page and delete page

GiSTS over (B+-trees)

Logically, keys represent ranges [x,y)
Queries:

Contains([a,b), v)

Consistent(E,q): (x<b) && (y > a)
Union(P): [MIN(xi), MAX(yi))
Penalty(E1, E2): – return MAX(y2 - y1, 0) + MAX(x1 - x2, 0)

if E1 is leftmost or rightmost, drop a term

PickSplit(P): split evenly in order

Key Compression

Keys may take up too much room on a page
Two extra key methods:

Compress(E)/Decompress(E)
Compression can be lossy: over-generalization OK

B+-tree Compression

Compress(E=([x,y), ptr)):

if E is leftmost return NULL, else return x

Decompress(E=(p, ptr)):

if E is leftmost, let x = -infinity, else let x = p.
if E is rightmost, let y = infinity, else let y be the value stored in the next key on the right.
if E is rightmost on a leaf page, let y = x+1.

GiSTs over R2 (R-tree)

Logically, keys represent bounding boxes
Queries: Contains, Overlaps, Equals
Consistent(E,q): E.p overlap/contain q?
Union(P): bounding box of all entries
Compress(E): form bounding box
Decompress(E): identity function
Penalty(E,F): size(Union({E,F}) - size(E)
PickSplit(P): R-tree or R*-tree methods

GiSTs over P(Z) (RD-tree)

Logically, keys represent bounding sets
Queries: Contains, Overlaps, Equals
Consistent(E,q):
Union(P): set-union of keys
Compress(E): Bloom filters, rangesets, etc.
Decompress(E): match compress
Penalty(E,F): |E.p U F.p| - |E.p|
PickSplit(P): R-tree algorithms

Research Issues

Concurrency, Recovery, Consistency – Kornacker, et al '97
Generalizing Search -- Aoki '98
Indexability -- Hellerstein/Papadimitriou/Koutsoupias '97, Koutsoupias/Taylor '98, Tsamoladas/Miranker '98
AM development tools: amdb (Kornacker/Shah, demo at SIGMOD '98)