Access Methods: BTrees, RTrees & GiSTs
First, a Review of Disks
 hardware: platters, arms, spindle. Often a cache (often 1 track)  very
devicespecific.
 data layout: sectors, blocks, tracks, cylinders
 "access time": ha!
 seek time, rotational delay, transfer time
 random vs. sequential I/O
 Examples:
 Seagate Hawk 2XL (2.15Mb)  5.5 msec average rotation delay, 9 msec average seek time
(varies from 1 to 22 msec), 5MB/sec transfer rate
 Note: average disk access time is about 1015 msec, RAM access is more like 5 microsecs!
 Another note: while transfer rate and capacity are growing wildly every year, seek time
is not shrinking nearly as fast!
Access Methods: General Issues
 Come in two flavors: heap files and indexes
 Support AMI interface:
 open (possibly with selection condition)
 get_next
 close
 insert, delete, update_field
 indexes usually do two things:
 partition: partition a dataset or domain into buckets
 label: provide a label for each bucket
 this is sometimes done hierarchically (trees), sometimes not (hashing)
 the whole game boils down to how you partition & label!
 index's utility defined by queries, not data!
 typically random I/O in index
 Performance goals:
 cold vs. hot lookups
 # of I/Os vs. # of blocks worth of output
 because of random vs. sequential, should always compare to heap file!
 Implementation issues:
 concurrency & recovery
 cost estimation
 bulk loading
The "Ubiquitous" Btree
Query load: equality & linear range predicates
Basics
 Basic Btree properties: balanced, high fanout, some minimal fill factor (order)
 in practice, may want minimal fill factor to be less than 50% (delete at empty has been
shown good in TP applications)
 Btree vs. B+tree. Note that B+tree internal keys simply "direct
traffic"; this is very useful.
 intranode operations
Compression
 rear & front key compression
 pointer compression
 Very common in practice, since B+tree is on critical path of TP apps
Variablelength keys
 not too hard  fill factor now expressed in terms of bytes, not entries.
Multiuser & Recovery issues
 tricky, solved problem
 high concurrency locking: need not be 2PL
 watch repeatable read, too!
 we will revisit this issue in detail later
Leaves usually contain record pointers (as in VSAM)
VSAM
 unified B+treebased implementation for a filesystem (recall Stonebraker's gripe)
 store sibling leaves and parent in one cylinder
 replicate parent along one track to minimize rotation time
Rtrees
Query load: Equality and Range predicates in n dimensions
 point in polygon
 polygon in polygon
 Overlaps polygon
 Contains polygon
Applications: GIS, VLSI, nd range queries over traditional data where n is typical
Notation.
Rtree
 Structure: like a B+tree, but keys are MBRs (rectilinear minimum bounding rectangles),
one key per pointer
 Search: Traverse multiple paths! (DFS)
 Insertion:
 Where: ChooseLeaf. Least enlargement. Ties broken by picking smallest key.
 Node splitting as in B+tree. Question: what goes left, what goes right?
 AdjustTree required. Why not in B+tree?
 Deletion:
 Seek and destroy
 CondenseTree: throw entries from underfull pages back into tree. Make sure they end up
at same level. Why different from B+tree?
 Update: delete, update, reinsert
 Split algorithms:
 exhaustive
 quadratic: start with most distant seeds, greedily add based on maximal preference
 linear: like quadratic, but picks seeds differently
 Performance study
 RISCII VLSI data set
 In practice, people (Postgres & SHORE) seem to use M/2, quadratic split
RTree Variants
R*tree (Beckman, et al., SIGMOD 90)
 like Rtree, but 3 distinctions:
 insertion minimizes different badness metric (depends on tree level)
 new & improved split algorithm (nlogn, more effective)
 forced reinsert of 30% most "extreme" entries
 decreases overlap of siblings
 improves storage utilization
 splits less often
 shapes tend to be more quadratic (pack better, generate smaller parents, minimizes split
of associated pixels)
 Popular, since it outperforms Rtree on search. Concurrency problems
because of reinsertion.
R+Tree (Sellis, et al, VLDB87)
 instead of overlapping keys, use multiple inserts (i.e. replicate items in leaves)
 speeds search, complicates insert/delete/update
 buggy!
 NOTE: data vs. space partitioning. Rtrees partition data. R+trees partition space.
 other spacepartitioning trees: KDB trees, Quad trees
TV(telescopic vector)tree (Lin, et al, VLDB Journal, 1994)
 Goal: solve highd problems
 Keys: compressed (center, radius) pairs in nd
 compression "telescopes" out to only the appropriate dimensions at each level
(other dimensions are "wildcards")
 The "dimensionality curse" remains! (See discussion of indexability
below)
hB (holeybrick B) trees (Lomet & Salzburg, TODS 90)
 solve three problems
 naturally handles swisscheese polygons
 speeds intranode search
 space partitioning & data partitioning
 idea: put a mainmemory kd tree (spacepartitioning tree) on each node. This tree can
represent a holey brick!
 Neat idea, pretty complicated. Not really a tree  can be a DAG. Journal paper is
buggy.
 Not necessarily more effective than R*trees
...and a recent flurry of new trees. In practice, most 2d access methods have
proved to be within 2030% of each other, though no systematic benchmarking has been done.
Generalized Search Trees
Indexing in OO/OR Systems
 Quick access to userdefined objects
 Support queries natural to the objects
 Two previous approaches
 Specialized Indices (“ABCDEFGtrees”)
 redundant code: most trees are very similar
 concurrency control, etc. tricky!
 Extensible Btrees & Rtrees (Postgres/Illustra)
 Btree or Rtree lookups only!
 E.g. ‘WHERE movie.video < ‘Terminator 2’
A Third Approach
 A generalized search tree. Must be:
 Extensible in terms of queries
 General (B+tree, Rtree, etc.)
 Easy to extend
 Efficient (match specialized trees)
 Highly concurrent, recoverable, etc.
Uses for GiSTs
 New indexes needed for new apps...
 find all supersets of S
 find all molecules that bind to M
 your favorite query here (multimedia?)
 ...and for new queries over old domains:
 find all points in region from 12 to 1 o’clock
 find all strings that match R. E.
Structure: balanced tree of (p, ptr) pairs
 p is a key “predicate”
 p holds for all objects below ptr
 keys on a page may overlap
 Key predicates: a userdefined class
 This is the only extensibility required!
Key Methods
 Search
 Consistent(E,q): E.p && q? (no/maybe)
 Labeling
 Union(P): new key that holds for all tuples in P
 Partitioning
 Penalty(E1,E2): penalty of inserting E2 in subtree E1
 PickSplit(P): split P into two groups of entries
Search
 General technique:
 traverse tree where Consistent is TRUE
 For fancier things, see [Aoki98].
Insert
 descend tree along least increase in Penalty
 if there’s room at leaf, insert there
 else split according to PickSplit
 propagate changes using Union
 Notes:
 on overflow, can do R*tree style reinsert
Delete
 find the entry via Search, and delete it
 propagate changes using Union
 on underflow:
 ordered keys, do B+tree style borrow/coalesce
 else reinsert stuff on page and delete page
GiSTS over (B+trees)
 Logically, keys represent ranges [x,y)
 Queries:
 Consistent(E,q): (x<b) && (y > a)
 Union(P): [MIN(xi), MAX(yi))
 Penalty(E1, E2): – return MAX(y2  y1, 0) + MAX(x1  x2, 0)
 if E1 is leftmost or rightmost, drop a term
 PickSplit(P): split evenly in order
Key Compression
 Keys may take up too much room on a page
 Two extra key methods:
 Compress(E)/Decompress(E)
 Compression can be lossy: overgeneralization OK
 B+tree Compression
 Compress(E=([x,y), ptr)):
 if E is leftmost return NULL, else return x
 Decompress(E=(p, ptr)):
 if E is leftmost, let x = infinity, else let x = p.
 if E is rightmost, let y = infinity, else let y be the value stored in the next key on
the right.
 if E is rightmost on a leaf page, let y = x+1.
GiSTs over R2 (Rtree)
 Logically, keys represent bounding boxes
 Queries: Contains, Overlaps, Equals
 Consistent(E,q): E.p overlap/contain q?
 Union(P): bounding box of all entries
 Compress(E): form bounding box
 Decompress(E): identity function
 Penalty(E,F): size(Union({E,F})  size(E)
 PickSplit(P): Rtree or R*tree methods
GiSTs over P(Z) (RDtree)
 Logically, keys represent bounding sets
 Queries: Contains, Overlaps, Equals
 Consistent(E,q):
 Union(P): setunion of keys
 Compress(E): Bloom filters, rangesets, etc.
 Decompress(E): match compress
 Penalty(E,F): E.p U F.p  E.p
 PickSplit(P): Rtree algorithms
Research Issues
