redbookcover.gif (13955 bytes) Readings in Database Systems, 3rd Edition

Stonebraker & Hellerstein, eds.

Access Methods: B-Trees, R-Trees & GiSTs

First, a Review of Disks

  • hardware: platters, arms, spindle. Often a cache (often 1 track) -- very device-specific.
  • data layout: sectors, blocks, tracks, cylinders
  • "access time": ha!
  • seek time, rotational delay, transfer time
  • random vs. sequential I/O
  • Examples:
    • Seagate Hawk 2XL (2.15Mb) -- 5.5 msec average rotation delay, 9 msec average seek time (varies from 1 to 22 msec), 5MB/sec transfer rate
    • Note: average disk access time is about 10-15 msec, RAM access is more like 5 microsecs!
    • Another note: while transfer rate and capacity are growing wildly every year, seek time is not shrinking nearly as fast!

Access Methods: General Issues

  • Come in two flavors: heap files and indexes
  • Support AMI interface:
    • open (possibly with selection condition)
    • get_next
    • close
    • insert, delete, update_field
  • indexes usually do two things:
    • partition: partition a dataset or domain into buckets
    • label: provide a label for each bucket
    • this is sometimes done hierarchically (trees), sometimes not (hashing)
    • the whole game boils down to how you partition & label!
  • index's utility defined by queries, not data!
  • typically random I/O in index
  • Performance goals:
    • cold vs. hot lookups
    • # of I/Os vs. # of blocks worth of output
    • because of random vs. sequential, should always compare to heap file!
  • Implementation issues:
    • concurrency & recovery
    • cost estimation
    • bulk loading

The "Ubiquitous" B-tree

Query load: equality & linear range predicates


  • Basic B-tree properties: balanced, high fanout, some minimal fill factor (order)
    • in practice, may want minimal fill factor to be less than 50% (delete at empty has been shown good in TP applications)
  • B-tree vs. B+-tree.  Note that B+-tree internal keys simply "direct traffic"; this is very useful.
  • intra-node operations


  • rear & front key compression
  • pointer compression
  • Very common in practice, since B+-tree is on critical path of TP apps

Variable-length keys

  • not too hard -- fill factor now expressed in terms of bytes, not entries.

Multiuser & Recovery issues

  • tricky, solved problem
  • high concurrency locking: need not be 2PL
  • watch repeatable read, too!
  • we will revisit this issue in detail later

Leaves usually contain record pointers (as in VSAM)


  • unified B+-tree-based implementation for a file-system (recall Stonebraker's gripe)
  • store sibling leaves and parent in one cylinder
  • replicate parent along one track to minimize rotation time



Query load: Equality and Range predicates in n dimensions

  • point in polygon
  • polygon in polygon
  • Overlaps polygon
  • Contains polygon

Applications: GIS, VLSI, n-d range queries over traditional data where n is typical



  • Structure: like a B+-tree, but keys are MBRs (rectilinear minimum bounding rectangles), one key per pointer
  • Search: Traverse multiple paths! (DFS)
  • Insertion:
    • Where: ChooseLeaf. Least enlargement. Ties broken by picking smallest key.
    • Node splitting as in B+-tree. Question: what goes left, what goes right?
    • AdjustTree required. Why not in B+-tree?
  • Deletion:
    • Seek and destroy
    • CondenseTree: throw entries from under-full pages back into tree. Make sure they end up at same level. Why different from B+-tree?
  • Update: delete, update, reinsert
  • Split algorithms:
    • exhaustive
    • quadratic: start with most distant seeds, greedily add based on maximal preference
    • linear: like quadratic, but picks seeds differently
  • Performance study
    • RISC-II VLSI data set
    • In practice, people (Postgres & SHORE) seem to use M/2, quadratic split

R-Tree Variants

R*-tree (Beckman, et al., SIGMOD 90)

  • like R-tree, but 3 distinctions:
    • insertion minimizes different badness metric (depends on tree level)
    • new & improved split algorithm (nlogn, more effective)
    • forced reinsert of 30% most "extreme" entries
      • decreases overlap of siblings
      • improves storage utilization
      • splits less often
      • shapes tend to be more quadratic (pack better, generate smaller parents, minimizes split of associated pixels)
  • Popular, since it out-performs R-tree on search.   Concurrency problems because of reinsertion.

R+-Tree (Sellis, et al, VLDB87)

  • instead of overlapping keys, use multiple inserts (i.e. replicate items in leaves)
  • speeds search, complicates insert/delete/update
  • buggy!
  • NOTE: data vs. space partitioning. R-trees partition data. R+-trees partition space.
  • other space-partitioning trees: K-D-B trees, Quad trees

TV(telescopic vector)-tree (Lin, et al, VLDB Journal, 1994)

  • Goal: solve high-d problems
  • Keys: compressed (center, radius) pairs in n-d
  • compression "telescopes" out to only the appropriate dimensions at each level (other dimensions are "wildcards")
  • The "dimensionality curse" remains!  (See discussion of indexability below)

hB (holey-brick B) trees (Lomet & Salzburg, TODS 90)

  • solve three problems
    • naturally handles swiss-cheese polygons
    • speeds intra-node search
    • space partitioning & data partitioning
  • idea: put a main-memory k-d tree (space-partitioning tree) on each node. This tree can represent a holey brick!
  • Neat idea, pretty complicated. Not really a tree -- can be a DAG. Journal paper is buggy.
  • Not necessarily more effective than R*-trees

...and a recent flurry of new trees.  In practice, most 2-d access methods have proved to be within 20-30% of each other, though no systematic benchmarking has been done.

Generalized Search Trees

Indexing in OO/OR Systems

  • Quick access to user-defined objects
  • Support queries natural to the objects
  • Two previous approaches
    • Specialized Indices (“ABCDEFG-trees”)
      • redundant code: most trees are very similar
      • concurrency control, etc. tricky!
    • Extensible B-trees & R-trees (Postgres/Illustra)
      • B-tree or R-tree lookups only!
      • E.g. ‘WHERE < ‘Terminator 2’

A Third Approach

  • A generalized search tree. Must be:
    • Extensible in terms of queries
    • General (B+-tree, R-tree, etc.)
    • Easy to extend
    • Efficient (match specialized trees)
    • Highly concurrent, recoverable, etc.

Uses for GiSTs

  • New indexes needed for new apps...
    • find all supersets of S
    • find all molecules that bind to M
    • your favorite query here (multimedia?)
  • ...and for new queries over old domains:
    • find all points in region from 12 to 1 o’clock
    • find all strings that match R. E.

Structure: balanced tree of (p, ptr) pairs

  • p is a key “predicate”
  • p holds for all objects below ptr
  • keys on a page may overlap
  • Key predicates: a user-defined class
    • This is the only extensibility required!

Key Methods

  • Search
    • Consistent(E,q): E.p && q? (no/maybe)
  • Labeling
    • Union(P): new key that holds for all tuples in P
  • Partitioning
    • Penalty(E1,E2): penalty of inserting E2 in subtree E1
    • PickSplit(P): split P into two groups of entries


  • General technique:
    • traverse tree where Consistent is TRUE
  • For fancier things, see [Aoki98].


  • descend tree along least increase in Penalty
  • if there’s room at leaf, insert there
  • else split according to PickSplit
  • propagate changes using Union
  • Notes:
    • on overflow, can do R*-tree style reinsert


  • find the entry via Search, and delete it
  • propagate changes using Union
  • on underflow:
    • ordered keys, do B+-tree style borrow/coalesce
    • else reinsert stuff on page and delete page

GiSTS over (B+-trees)

  • Logically, keys represent ranges [x,y)
  • Queries:
    • Contains([a,b), v)
  • Consistent(E,q): (x<b) && (y > a)
  • Union(P): [MIN(xi), MAX(yi))
  • Penalty(E1, E2): – return MAX(y2 - y1, 0) + MAX(x1 - x2, 0)
    • if E1 is leftmost or rightmost, drop a term
  • PickSplit(P): split evenly in order

Key Compression

  • Keys may take up too much room on a page
  • Two extra key methods:
    • Compress(E)/Decompress(E)
    • Compression can be lossy: over-generalization OK
  • B+-tree Compression
    • Compress(E=([x,y), ptr)):
      • if E is leftmost return NULL, else return x
    • Decompress(E=(p, ptr)):
      • if E is leftmost, let x = -infinity, else let x = p.
      • if E is rightmost, let y = infinity, else let y be the value stored in the next key on the right.
      • if E is rightmost on a leaf page, let y = x+1.

GiSTs over R2 (R-tree)

  • Logically, keys represent bounding boxes
  • Queries: Contains, Overlaps, Equals
  • Consistent(E,q): E.p overlap/contain q?
  • Union(P): bounding box of all entries
  • Compress(E): form bounding box
  • Decompress(E): identity function
  • Penalty(E,F): size(Union({E,F}) - size(E)
  • PickSplit(P): R-tree or R*-tree methods

GiSTs over P(Z) (RD-tree)

  • Logically, keys represent bounding sets
  • Queries: Contains, Overlaps, Equals
  • Consistent(E,q):
  • Union(P): set-union of keys
  • Compress(E): Bloom filters, rangesets, etc.
  • Decompress(E): match compress
  • Penalty(E,F): |E.p U F.p| - |E.p|
  • PickSplit(P): R-tree algorithms

Research Issues


1998, Joseph M. Hellerstein.  Last modified 08/18/98.
Feedback welcomed.