Motivation & DBMS Architecture Overview

The technology trend angle: emphasis in CS research has shifted from computation to information management.

Evidence:

Hardware: high-performance computer companies on hard times (Thinking Machines, KSR, Cray, SGI?). The exemplary success story in massive parallelism: Teradata (now sold by NCR). Been around since the 70's. "Shared-Nothing" (sometimes called "clusters" or "NOW/COW/WOW" etc.) Successes have been largely database-centric.
"Low-end" users: scramble to webspace reflects desire to give/receive info. Success of these efforts is questionable, and the disorganization will get worse as things grow.
"High-end" users: scientists, the biggest users of high-powered computation, now have data management problems that exceed their appetite for cycles
Other researchers: architecture, OS, theoreticians, AI are all moving this way.
PS: you will see all this in the job market!

The utilitarian angle: "Database: the boring part of accounting"? Not anymore! Interesting, world-changing apps:

digital libraries
digital ``asset mgmt'' -- i.e., multimedia & entertainment
digital mapping & geo apps
scientific applications: earth science, DNA, molecular docking, experiment management, etc.
decision-support, data analysis & "mining"
your mom and pop care about this stuff! (as do funding agencies, companies, etc.)

Big, beautiful ideas: relational model & languages, concurrency control, query processing, etc.
Real, meaty systems work: the serious 24x7, high performance, complex systems engineering domain
Room for both kinds of contributions, separately and simultaneously

The database-centric view of the CS research universe (take with a large grain of salt):

OS & Architecture are ``finished'': Ascendancy of Linux and FreeBSD. If Microsoft & Intel can mass-market these, it must be easy.
PL has become arcane
Theory is ... theoretical.
AI is that which cannot be done
etc...
While you may (should!) disagree with all the above in some respects, it is true that DB research is notably relevant and fertile these days. Lots of meaty problems remain that people care about.

Tertiary storage: EOSDIS 1 Tb/day, keep it all for 15 years
Parallelism: data parallelism is natural in a DBMS. How to do DB operations in parallel and balance load well? WalMart (365 nodes, 6 Tb online, 4-billion-row table, 200 million updates daily, 4000 queries/day, 1500 users/week, 4 min decision-support response time w/ avg. 60000 rows out)
Data Analysis, Data Mining: given huge amounts of data, try to find interesting information in the data. What is the "killer query"?
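A minimal sketch of the parallelism point above, assuming a shared-nothing layout simulated inside one process: rows are hash-partitioned on the grouping key, each "node" aggregates its own partition, and the partial results are merged. The names (hash_partition, local_sum, parallel_sum_by_store) and the (store_id, amount) row shape are made up for illustration, and threads stand in for separate machines, so this shows the dataflow rather than a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, n_parts):
    """Send each (store_id, amount) row to a partition chosen by its key."""
    parts = [[] for _ in range(n_parts)]
    for store_id, amount in rows:
        parts[hash(store_id) % n_parts].append((store_id, amount))
    return parts


def local_sum(partition):
    """Work done independently by one node: SUM(amount) GROUP BY store_id."""
    totals = Counter()
    for store_id, amount in partition:
        totals[store_id] += amount
    return totals


def parallel_sum_by_store(rows, n_workers=4):
    parts = hash_partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_sum, parts))
    merged = {}
    for partial in partials:
        # Partitioning on the grouping key means no key appears in two
        # partitions, so merging is a plain union of the partial results.
        merged.update(partial)
    # Partition sizes are returned so load balance can be inspected.
    return merged, [len(p) for p in parts]


if __name__ == "__main__":
    even = [(i % 8, 1) for i in range(80)]   # 8 stores, uniform traffic
    skewed = [(0, 1) for _ in range(80)]     # one hot store
    print(parallel_sum_by_store(even))
    print(parallel_sum_by_store(skewed))
```

With uniform keys every partition gets the same number of rows; with a single hot key one node does all the work while the rest sit idle, which is the load-balancing question the notes raise.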

schema integration: trying to figure out how different schemas fit together. Hard!!!
DBMS integration: trying to semi-transparently glue different kinds of database systems together

Low-level ``record-at-a-time'' DML, i.e. physical data structures reflected in DML (no data independence)
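To make ``record-at-a-time'' concrete, here is a small sketch contrasting it with the set-at-a-time style that the relational model introduced. Everything here is hypothetical: the pointer-chained record store is a toy stand-in for a navigational system, and SQLite (via Python's sqlite3 module) is used only as a convenient relational engine, not as one of the systems discussed in these notes.

```python
import sqlite3

# Hypothetical record store: each employee record holds a physical "next"
# pointer, and the department record points at the head of its chain.
EMPLOYEES = {
    1: {"name": "Ann", "salary": 120, "next": 2},
    2: {"name": "Bob", "salary": 90, "next": 3},
    3: {"name": "Cy", "salary": 150, "next": None},
}
TOYS_DEPT = {"name": "Toys", "first_emp": 1}


def high_paid_navigational(dept, employees, threshold):
    """Record-at-a-time: the program itself follows the pointer chain."""
    names = []
    rec_id = dept["first_emp"]
    while rec_id is not None:
        rec = employees[rec_id]
        if rec["salary"] > threshold:
            names.append(rec["name"])
        rec_id = rec["next"]  # the physical layout is baked into the code
    return names


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
        INSERT INTO emp VALUES
            ('Ann', 'Toys', 120), ('Bob', 'Toys', 90), ('Cy', 'Toys', 150);
    """)
    return conn


def high_paid_declarative(conn, dept, threshold):
    """Set-at-a-time: state which rows are wanted, not how to reach them."""
    rows = conn.execute(
        "SELECT name FROM emp WHERE dept = ? AND salary > ? ORDER BY name",
        (dept, threshold),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(high_paid_navigational(TOYS_DEPT, EMPLOYEES, 100))
    conn = build_db()
    print(high_paid_declarative(conn, "Toys", 100))
    # Changing the physical design does not touch the query text.
    conn.execute("CREATE INDEX emp_dept_salary ON emp (dept, salary)")
    print(high_paid_declarative(conn, "Toys", 100))
```

If the pointer chain were replaced by some other storage structure, high_paid_navigational would have to be rewritten, while the SQL query keeps working after an index is added. That gap is what data independence refers to.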

1970: Codd's paper. The most influential paper in DB research. Set-at-a-time DML. Data independence. Allows for schema and physical storage structures to change ``under the covers''. Truly important theory, led to "paradigm shift" in thinking and in practice. (Papadimitriou: "as clear a paradigm shift as we can hope to find in computer science"). Turing award.
early-to-mid-70's: raging debate between the two camps (relational vs. low-level record-at-a-time systems). "Great debate" in 1975
mid 70's: 2 full-function (sort of) prototypes. Ancestors of essentially all today's commercial systems
Ingres: UCB 1974-77

a ``pickup team'', including Stonebraker & Wong. early and pioneering. begat Ingres Corp (CA), Sybase, MS SQL Server, Britton-Lee, Wang's PACE.

System R: IBM San Jose

15 PhDs. begat IBM's SQL/DS & DB2, Oracle, HP's Allbase, Tandem's Non-Stop SQL. System R arguably got more stuff ``right''

Both were viable starting points, proved practicality of relational approach. Beautiful example of theory -> practice!!
early 80's: commercialization of relational systems
mid 80's: SQL becomes ``intergalactic standard''.

design by committee leads to kitchen sink
standards body as designers, rather than codifiers
leads to wasting time (Sybase) or irrelevance of standard (Informix & IBM shipping SQL3 before standardized)

various players in research, industry and both scrambling to standardize the "next thing"

Two axes:

OODBMS: term is somewhat nebulous. usually, a persistent programming environment

Object-relational DBMS: an attempt to provide best of both worlds: queries & rich data types.
query interface.
Rich data types with lots of OO features, esp. object identity, type-extensibility and inheritance.
Basic ``outer'' data type is relation, with extensible data types in the fields.
relational theory applies to outer operations only
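The idea of a relation on the outside with extensible types in the fields can be sketched roughly as follows. This is only an approximation under stated assumptions: SQLite is not an object-relational system, and Python's sqlite3 adapter/converter hooks are used here as a stand-in for real type extensibility. The Point type, the landmark table, and the dist function are all invented for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A user-defined field type the engine knows nothing about."""
    x: float
    y: float


def adapt_point(p):
    # Python -> stored form (plain text, opaque to the engine).
    return f"{p.x};{p.y}"


def convert_point(raw):
    # Stored form (bytes) -> Python, applied to columns declared "point".
    x, y = raw.split(b";")
    return Point(float(x), float(y))


def dist_from_origin(stored):
    # User-defined function over the stored form, callable from SQL.
    x, y = (float(v) for v in stored.split(";"))
    return (x * x + y * y) ** 0.5


def open_db():
    sqlite3.register_adapter(Point, adapt_point)
    sqlite3.register_converter("point", convert_point)
    conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
    conn.create_function("dist", 1, dist_from_origin)
    conn.execute("CREATE TABLE landmark (name TEXT, loc point)")
    conn.executemany(
        "INSERT INTO landmark VALUES (?, ?)",
        [("a", Point(3.0, 4.0)), ("b", Point(0.0, 1.0))],
    )
    return conn


if __name__ == "__main__":
    conn = open_db()
    # Selection and projection are ordinary relational operations; only
    # dist() and the Point conversion look inside the field.
    print(conn.execute(
        "SELECT name, loc FROM landmark WHERE dist(loc) > 2").fetchall())
```

The engine treats loc as an opaque value: the relational operators work on the outer table, and anything that needs to look inside a Point goes through user-supplied code.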

Single-Site (i.e. traditional)
Parallel: lots of tightly-coupled machines solve one query together. A database supercomputer.
Distributed: geographically distributed machines, each "hosting" different data, participate in a more loosely coupled manner