Readings in Database Systems, 3rd Edition

Stonebraker & Hellerstein, eds.


Benchmarking

Goals

  • What's a benchmark for?
    1. What do you want to learn?
    2. How much are you prepared to invest?
      1. programmer time
      2. machine resources
      3. management time
      4. lost opportunity
    3. What are you prepared to give up?
    • Side-benefits to the community...

Types of Benchmarks

  • Generic benchmarks try to give a broad idea of price/performance across many applications.
  • Application-Specific benchmarks try to focus on a restricted class of workloads.  Some of the main consortia defining these:
    • SPEC (System Performance Evaluation Cooperative): scientific workloads, designed to measure workstation performance
    • The Perfect Club: scientific workloads on exotic parallel architectures
    • TPC (Transaction Processing Performance Council): transaction processing and decision support workloads, architecture-independent

Gray's criteria for a good app-specific benchmark:

  • Relevant: It must measure the peak performance and price/performance of systems when performing typical operations within that problem domain.
  • Portable: It should be easy to implement the benchmark on many different systems and architectures.
  • Scaleable: The benchmark should apply to small and large computer systems. It should be possible to scale the benchmark up to larger systems, and to parallel computer systems as computer performance and architecture evolve.
  • Simple: The benchmark must be understandable, otherwise it will lack credibility.
A counter-example: MIPS (Millions of Instructions Per Second)
  • irrelevant: doesn't translate directly to useful work
  • not portable:  Intel MIPS != Sun MIPS
  • not scalable: how does it apply to multiprocessors?

The importance of benchmark acceptance

  • Benchmarketing
  • Wisconsin benchmark history: DeWitt vs. Ellison.  The DeWitt clause.
  • Anon, et al.
  • Benchmark wars: escalating numbers lead to escalating tricks, wasted time ("just let us tune it as much as Vendor X did")
    • if you're running your own benchmark, avoid this with a "no reruns, no excuses" policy

Transaction Processing Performance Council (TPC)

TPC arose in 1988 (headed by consultant Omri Serlin) as a consortium of 35 hardware/software companies
  • define benchmarks for TP and DSS, define cost/performance metrics, provide official audits
  • Today: TPC-C is the TP benchmark (succeeding A and B), TPC-D is the decision-support benchmark
  • Focus on as real-world a scenario as possible:
    • performance is measured from the terminals to the server and back (all aspects of the system), as opposed to (say) SPEC.
    • note the overhead of doing this!
      • hard to set up
      • requires an auditor to certify reported results
      • tests functionality as well as performance

Writing a benchmark

  • try to measure core speed and potential bottlenecks
    • "micro-benchmarks", e.g. Sort
  • try to expose functionalities or lack thereof
    • e.g. Wisconsin, TPC-D
    • can backfire for adaptive parts of a system.  Optimizers are notoriously hard to "benchmark"
  • try to measure end-to-end performance on a realistic workload
    • e.g. TPC-C, TPC-D.
  • a statistical note: don't report average performance alone!  consider the variance.  if you must give one number, give 90th-percentile performance (see the sketch below).
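
To make the statistical point concrete, here is a minimal timing-harness sketch (Python, purely illustrative); run_query and benchmark are hypothetical stand-ins for whatever operation is actually under test.  It times the operation repeatedly and reports mean, standard deviation, and 90th-percentile latency, so you can see how much a single "average" number hides.

    import random
    import statistics
    import time

    def run_query():
        # Hypothetical stand-in workload: usually fast, occasionally very
        # slow (think cache miss or checkpoint stall).  Replace this with
        # the real operation under test.
        delay = 0.001
        if random.random() < 0.15:
            delay += 0.050          # occasional slow run
        time.sleep(delay)

    def benchmark(op, runs=200):
        # Time op() `runs` times; return per-run latencies in seconds.
        latencies = []
        for _ in range(runs):
            start = time.perf_counter()
            op()
            latencies.append(time.perf_counter() - start)
        return latencies

    if __name__ == "__main__":
        lats = sorted(benchmark(run_query))
        p90 = lats[int(0.9 * len(lats)) - 1]   # simple 90th-percentile estimate
        print(f"mean   : {statistics.mean(lats) * 1000:.2f} ms")
        print(f"stddev : {statistics.stdev(lats) * 1000:.2f} ms")
        print(f"90th%  : {p90 * 1000:.2f} ms")

On a workload with a slow tail like this simulated one, the mean alone obscures exactly the behavior that the 90th-percentile figure exposes.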

DB Benchmarks to be aware of:

  • Wisconsin (mostly for history)
  • TPC-C
  • TPC-D
  • Set Query: Pat O'Neil's complex query benchmark (shows off Model 204)
  • OO7:  OODBMS benchmark from Wisconsin and some of the OODBMS vendors
  • Sequoia: app-specific benchmark for Earth Science from Sequoia project at Berkeley (Stonebraker) and UCSB (earth scientists).  Used as an early Object-Relational and GIS benchmark (mostly R-trees and user-defined functions, but also one transitive closure query (!))
  • Bucky: an Object-Relational benchmark from Wisconsin focusing on structured types (refs and nested sets)
  • OR-1: a yet-to-be-completed Object-Relational benchmark from Wisconsin, Informix (Stonebraker) and IBM (Carey).  Was supposed to fix Bucky with more focus on "what matters to customers".
  • Gray's Benchmark Handbook now lives on the web
 

© 1999, Joseph M. Hellerstein.  Last modified 05/04/99.
Feedback welcomed.