Extensibility & Object-Relational Systems
Motivation & Politics
In the early 80s, it became clear that Relational systems were not robust enough
for non-administrative data-intensive applications of the day:
Two buzz-phrases began to emerge: "Object-Oriented" and
Much vision & politics ensued:
- Various data models (NF2, ER, Functional, Semantic)
- Object-Oriented DB System Manifesto (OO-ness)
- Third-Generation DB System Manifesto (Extensibility)
- Many query languages proposed
Systems were built, companies started, etc.
Today, the field has settled down into two arenas:
- Persistent OO PL systems (e.g. EXODUS, ObjectStore, Objectivity, Versant, etc.)
- Query-based systems with OO features (e.g. Starburst, Postgres, Illustra, Informix &
ORacle "Universal Servers", DB/2 UDB)
- Almost nobody does both well
Few people continue to argue in terms of the paradigms. (Exhaustion?)
Three influential research systems:
- Starburst (IBM Almaden)
- POSTGRES (Berkeley)
- EXODUS (Wisconsin)
- Others include O2 (Altair), ORION (MCC), Iris (HP), Genesis (Texas)
We will focus on POSTGRES & EXODUS, discuss Starburst briefly.
- EXODUS was intended to be both a persistent PL system and a query system
- query processing engine never got built, though the EXODUS optimizer architecture was
influential (Graefe & DeWitt)
- Ended up focusing on OODB stuff, so we will discuss it more later.
- SHORE (follow-on to EXODUS) is delivering on EXODUS promises, but rather late.
- Persistent C++
- Query Processing (w/ GIS features): Paradise
- Extensible Optimizer: Opt++
- We'll see Exodus/SHORE work on pointer swizzling, client-server caching
Informix's buzzphrase for Illustra and look-alikes (courtesy Roger Sippl). Didn't
patent the name!
Query-based, extensible systems with some OO features like inheritance and OIDs.
SQL3 draft is very much like this.
Stonebrakers application matrix:
Argues that the upper right is growing, and will engulf upper left and lower right.
- Web servers, full-text collections
- Time-series data
- "Asset Management"
- image DB
Overview: Things needed in an Object-Relational DBMS
(From the "Third-Generation Database System Manifesto")
The relational data model (as implemented!) is "semantically impoverished":
- fixed set of base types (integer, float, etc)
- only structuring allowed is normal-form relations
- only operations are relational algebra, using comparators from base types
Instead, people want a "semantically rich" data model
- extensible ADTs
- complex types based on type constructors, and methods for those types
- "Object identity"
One of the nicest, cleanest language papers, influential in the design of
object-relational data models and query languages (starting with Postquel)
- ref types, nested dots, implicit joins (implication about OIDs)
- generalization hierarchies (converted to "collection inheritence")
- set-valued attrs (non-1st-normal-form!)
Stonebraker: Extensible ADTs
Idea: you should be able to add new atomic types to the system, along with
methods for the types, and new access methods.
Type is defined by:
- storage size (can be variable)
- input method
- output method
- any other methods a user wishes to provide
Then you could do standard relational processing over those types.
Example: add 2d spatial operators to RDBMS
Engineering issues (some from the paper, some not):
- must know about user-defined types and methods (table-driven)
- must be able to compute selectivity for user-defined predicates
- must know about cost of user-defined methods, and consider predicate pullup
- must know how to match user-defined predicates to fancy new access methods
- must know whether user-defined join predicates can be evaluated by hash or merge
- must have dynamic linking support (24x7 operation)
- methods called via "function pointers", "functors" or some similar
- can slow down standard processing
- support for "untrusted" functions (hot topic today)
- support for "large objects"
- caching for expensive methods (a la subquery caching)
- extensible aggregations
- 3 functions: init, iter, end
- Access Methods
- Stonebraker: An access method is a generic object that provides
- open, get-first, get-next, close, insert, delete
- if its fancy, it takes SARGs and evaluates them quickly
- needs to provide cost estimates to optimizer
- integration with CC
- can be solved by physical logging
- can open up logging interface for AM-specific logging
- integration with Buffer Manager
- In practice, almost nobody used the Postgres Access method extensibility
- Buffer Management
- Large objects require new schemes
- integration with access methods
Note: Extensible ADTs do not fundamentally change a relational system
- they fit naturally with the relational model
- as Stonebraker shows, they fit naturally into RDBMS query processing
- for relational "believers" this is all you should need to solve all problems
Stonebraker, Rowe, a few staff and many students, 1986-1994. Post-INGRES.
The Postgres Data Model
- Co-opt the OO terminology
- class = relation
- instance = tuple
- object-id = tuple-id
- method = attribute or function of attributes
- Support extensible ADTs
- extensible procedures using C functions
- binary operators, which interface to extensible AM
- Support type constructors
- trick: use queries
- columns can be parameterized Postquel functions (returns setof, or tuple)
- queries can live in fields of a tuple (returns setof or tuple)
- another exploitation of the view paradigm!
- these derived objects can optionally be cached (never implemented)
- nested-dots used to traverse complex object structures
- leverages EXISTING techniques for relational processing. Cute!
- added array support directly
- added class inheritance (gives method inheritance and collection hierarchies)
- originally written in LISP, then ran Lisp2C, resulting in a horrible built-in
inheritance mechanism over C
- only access methods added to Postgres were done "in house"
- B-tree and R-tree early
- linear hashing late
- GiST added in the last couple years
- "Fast Path" to AMs, as an alternative to "Persistent X"
- never well-documented or used outside Berkeley
- ADTs as described above
- No Overwrite Storage, time travel, etc
- Research project was "shut down" in 94. 2 Berkeley students did a major
cleanup (remove lispisms, remove a number of theses), migrated to SQL, and released
Postgres95. This was picked up by freeware hackers on the net, and now PostgreSQL
seems to be the serious freeware db of choice (www.postgresql.org).
- Postgres Rule System: Active Database support (we will skip this topic this year)
- Shared-mem parallel version, with new optimization techniques
- "Inversion" file system
- PICASSO UI (Rowe & students)
- Support for tertiary memory
- Method Indexing
- Partial Indexes
- Expensive predicate optimization
- Commercialized as Illustra
- Concepts ported into Informix
- Postgres code was a mess. Miraculous that it worked. More miraculous that Illustra
- Data model was sloppy but clever. Burst some OO bubbles.
- Roger King: "My cat is object-oriented"
- Missed an important distinction between Class and Collection.
- No-overwrite storage was not as bad as you think. Expect the concept to resurface.
- In many ways, the most successful of the 3 projects
- used to be written off as goofy research, but Illustra is now ahead of the game
- ADT extensibility & dynamic linking are very useful Illustras
"DataBlades", Oracle's "Data Cartridges"
- though not all aspects worked well, almost all were novel and influential
- Many radical ideas in one system!
Original goal: build a nice playpen for whatever comes next.
Extensible "in-house". Not by users!
No one survey paper seems to capture the work they did. Best bet: "Starburst Mid-Flight: As The Dust Clears", Haas, et al., TKDE 1990
- clean internal query representation (QGM). Key to Query Rewrite!
- non-normalized catalogs for efficiency normalized view for users
- uniform record structure across RSS & RDS
- a single expression evaluator for RSS & RDS
- WAL instead of shadow pages
- B+-tree compression
- Buffer Pool Manager accepts hints from optimizer (a la DBMIN)
- User-defined functions:
- table expressions: queries or C functions
- scalar functions
- no dynamic linking
- Rule-based query rewrite engine
- a little rule system with QGM as "working memory"
- conditions and actions are C functions that check and change QGM
- some nifty rule control mechanisms (rule classes, rule budgets, multiple conflict res.)
- Extensible access methods (as in POSTGRES)
- "Attachments": routines to be automatically called before/after dealing with
an access method
- used by Starburst Rule System to generate transition logs
- used to implement pre-computed joins (see below)
- Complex objects implemented in a "wrapper" (SQL-XNF), translated down to
"Proofs" of Extensibility:
- added "signature" attachment to automatically tag tuples with some derived
- though writing the attachment was hard, hooking it in took only 1 day
- added Outer Join
- not so simple required adding things in QGM, optimizer, and executor
- IMS attachment
- pre-computed joins using pointers, which are maintained
- written by "outsiders" Mike Carey & Beau Shekita from Wisconsin
in a summer visit to Almaden
- Grammar-like rule-based query optimizer combined with System R bottom-up construction
- 2nd Rule System called "Alert", based on infinitely-running queries.
Lightweight and pretty effective.
- Persistent C++ support (SMRC)
- Recursive query processing!
- Lots of technology for query rewriting
Many of the Starburst folks took a "vacation" from research the last few
years, and merged Starburst technology into DB2 UDB.
- Probably the best code of the bunch
- The worlds most effective query optimizer?
- The DB2 experience good for all concerned
- Attention paid to details, shows up in complex parts of system
- A long-term advantage? Depends on market, IBMs development culture...