Benchmarking
Goals
- What's a benchmark for?
- What do you want to learn?
- How much are you prepared to invest?
  - programmer time
  - machine resources
  - management time
  - lost opportunity
- What are you prepared to give up?
- Side-benefits to the community...
Types of Benchmarks
- Generic benchmarks try to give a broad idea of price/performance
  across many applications.
- Application-specific benchmarks try to focus on a restricted
  class of workloads. Some of the main consortia defining these:
  - SPEC (System Performance Evaluation Cooperative): scientific workloads,
    designed to measure workstation performance
  - The Perfect Club: scientific workloads on exotic parallel architectures
  - TPC (Transaction Processing Performance Council): transaction processing
    and decision-support workloads, architecture-independent
Gray's criteria for a good app-specific benchmark:
- Relevant: It must measure the peak performance and price/performance
  of systems when performing typical operations within that problem domain.
- Portable: It should be easy to implement the benchmark on many different
  systems and architectures.
- Scaleable: The benchmark should apply to small and large computer
  systems. It should be possible to scale the benchmark up to larger systems,
  and to parallel computer systems, as computer performance and architecture
  evolve.
- Simple: The benchmark must be understandable, otherwise it will
  lack credibility.
A counter-example: MIPS (Millions of Instructions Per Second)
- irrelevant: doesn't translate directly to useful work (see the illustration
  after this list)
- not portable: Intel MIPS != Sun MIPS
- not scalable: how does it apply to multiprocessors?
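A hypothetical illustration (the numbers are invented for the point, not taken
from any measurement): machine A runs a query using 200 million instructions at
100 MIPS, finishing in 2.0 seconds; machine B needs 500 million instructions for
the same query because of a different instruction set and compiler, and runs at
200 MIPS, finishing in 2.5 seconds. The machine with the higher MIPS rating does
less useful work per second.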
The importance of benchmark acceptance
- Benchmarketing
- Wisconsin benchmark history: DeWitt vs. Ellison. The DeWitt clause.
- Anon, et al.
- Benchmark wars: escalating numbers lead to escalating tricks, wasted time
  ("just let us tune it as much as Vendor X did")
  - if you're doing your own, avoid this with "no reruns, no excuses"
Transaction Processing Performance Council
TPC arose in 1988 (headed by consultant Omri Serlin) as a consortium of 35
hardware/software companies.
- define benchmarks for TP and DSS, define cost/performance metrics (example
  after this list), provide official audits
- Today: TPC-C is the TP benchmark (succeeding TPC-A and TPC-B), TPC-D is the
  decision-support benchmark
- Focus on as real-world a scenario as possible:
  - performance from terminals to the server and back, i.e., all aspects of the
    system (as opposed to, say, SPEC)
  - note the overhead of doing this!
    - hard to set up
    - requires an auditor to report results
  - functionality as well as performance
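As an example of the metrics: TPC-C reports throughput as tpmC (new-order
transactions completed per minute under the mixed workload) and price/performance
as dollars per tpmC, i.e., the total priced system cost divided by the throughput.
With hypothetical numbers, a $500,000 configuration sustaining 25,000 tpmC would
be reported at $20/tpmC.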
Writing a benchmark
- try to measure core speed and potential bottlenecks
  - "micro-benchmarks", e.g. Sort
- try to expose functionalities, or the lack thereof
  - e.g. Wisconsin, TPC-D
  - can backfire for adaptive parts of a system: optimizers are notoriously
    hard to "benchmark"
- try to measure end-to-end performance on a real-ish workload
- a statistical note: don't use average performance! consider variance.
  if you must give one number, give 90th-percentile performance (see the
  sketch after this list).
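A minimal sketch in Python of the micro-benchmark and percentile points above;
the sort workload, input size, and repetition count are arbitrary illustration
choices, not part of any standard benchmark.

    # Sort micro-benchmark reporting median and 90th-percentile latency
    # instead of the mean, so a few slow runs can't hide in an average.
    import random
    import time

    def run_once(n):
        """Time one in-memory sort of n random integers."""
        data = [random.randint(0, 1_000_000) for _ in range(n)]
        start = time.perf_counter()
        data.sort()
        return time.perf_counter() - start

    def percentile(samples, p):
        """Nearest-rank p-th percentile (p in 0..100) of a list of numbers."""
        ordered = sorted(samples)
        k = min(len(ordered) - 1, int(round(p / 100.0 * (len(ordered) - 1))))
        return ordered[k]

    if __name__ == "__main__":
        latencies = [run_once(100_000) for _ in range(30)]
        print("median latency: %.4fs" % percentile(latencies, 50))
        print("90th percentile latency: %.4fs" % percentile(latencies, 90))

Reporting the 90th percentile makes occasional slow runs visible in the headline
number rather than letting them wash out in an average.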
DB Benchmarks to be aware of:
- Wisconsin (mostly for history)
- TPC-C
- TPC-D
- Set Query: Pat O'Neil's complex query benchmark (shows off Model 204)
- OO7: OODBMS benchmark from Wisconsin and some of the O-vendors
- Sequoia: app-specific benchmark for Earth Science from the Sequoia project
  at Berkeley (Stonebraker) and UCSB (earth scientists). Used as an
  early Object-Relational and GIS benchmark (mostly R-trees and user-defined
  functions, but also one transitive closure query (!)).
- Bucky: an Object-Relational benchmark from Wisconsin focusing on structured
  types (refs and nested sets)
- OR-1: a yet-to-be-completed Object-Relational benchmark from Wisconsin,
  Informix (Stonebraker), and IBM (Carey). Was supposed to fix Bucky
  with more focus on "what matters to customers".
- Gray's Benchmark Handbook now lives on the web.