R18 — Reproducible Performance Regression Testing

Research Proposal — Mapped to backlog in docs/RESEARCH_BACKLOG.md

🔬 What's Novel

Extended replay bundle format capturing performance-critical state for reproducibility
Deterministic replay framework for database performance testing with state reconstruction
Methodology for capturing and reconstructing performance-critical state (LSM levels, cache, concurrency)
Performance regression detection framework comparing production and test latency distributions

🔧 Technical Approach

Phase 1 — State Identification

Identify the performance-critical state that the WAL alone does not capture: LSM level structure, block cache contents, connection pool state, and compaction queue depth. Then determine the minimal subset of that state needed for reproducible replay.
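
One way to approach the minimal-state question is greedy backward elimination: drop one state component at a time and keep the drop only if replay fidelity stays within tolerance. The sketch below is illustrative only. The component names, the `replay_error` callback, and the tolerance are assumptions, not part of SkeinDB, and the greedy result depends on iteration order, so it is not guaranteed to be the smallest possible set.

```python
from typing import Callable, FrozenSet, Iterable


def minimal_state(
    components: Iterable[str],
    replay_error: Callable[[FrozenSet[str]], float],
    tolerance: float,
) -> FrozenSet[str]:
    """Greedy backward elimination over captured state components.

    replay_error(subset) is a hypothetical callback that replays with only
    `subset` restored and returns a fidelity error (lower is better).
    """
    kept = set(components)
    for name in sorted(kept):
        trial = frozenset(kept - {name})
        # Keep the component out only if fidelity is still acceptable.
        if replay_error(trial) <= tolerance:
            kept = set(trial)
    return frozenset(kept)


def toy_error(subset: FrozenSet[str]) -> float:
    """Made-up error model for demonstration; real values would come from replays."""
    penalties = {"lsm_levels": 0.2, "block_cache": 0.1, "conn_pool": 0.005}
    return 0.02 + sum(p for name, p in penalties.items() if name not in subset)


candidates = ["lsm_levels", "block_cache", "conn_pool", "compaction_queue"]
needed = minimal_state(candidates, toy_error, tolerance=0.05)
```

With the toy error model, only the LSM level structure and block cache survive elimination; in practice each `replay_error` call would be an actual replay run, so the number of components tried bounds the cost.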

Phase 2 — Bundle Extension

Extend the replay bundle format with an LSM metadata snapshot, an approximation of cache state (LRU ordering, hot keys), and timing annotations on WAL records for pacing.
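
To make the three additions concrete, here is a minimal sketch of what an extended bundle could look like as a data structure with a JSON round trip. Every class and field name here (`ExtendedBundle`, `LsmLevel`, `offset_us`, and so on) is hypothetical; the actual SkeinDB bundle layout is not specified in this proposal.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LsmLevel:
    level: int
    file_count: int
    total_bytes: int


@dataclass
class CacheApprox:
    lru_order: List[str]  # block ids, least recently used first
    hot_keys: List[str]


@dataclass
class TimedWalRecord:
    seq: int
    offset_us: int  # microseconds since capture start, used for pacing
    payload_hex: str


@dataclass
class ExtendedBundle:
    lsm_levels: List[LsmLevel] = field(default_factory=list)
    cache: CacheApprox = field(default_factory=lambda: CacheApprox([], []))
    wal: List[TimedWalRecord] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys gives a stable encoding, convenient for hashing or diffing.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "ExtendedBundle":
        raw = json.loads(text)
        return ExtendedBundle(
            lsm_levels=[LsmLevel(**lvl) for lvl in raw["lsm_levels"]],
            cache=CacheApprox(**raw["cache"]),
            wal=[TimedWalRecord(**rec) for rec in raw["wal"]],
        )


bundle = ExtendedBundle(
    lsm_levels=[LsmLevel(level=0, file_count=4, total_bytes=1 << 20)],
    cache=CacheApprox(lru_order=["blk-7", "blk-2"], hot_keys=["user:42"]),
    wal=[TimedWalRecord(seq=1, offset_us=0, payload_hex="00ff")],
)
restored = ExtendedBundle.from_json(bundle.to_json())
```

Storing relative offsets rather than wall-clock timestamps keeps the bundle independent of when it was captured, which is one plausible way to support pacing during replay.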

Phase 3 — Deterministic Replay

Reconstruct storage state from bundle, warm cache according to captured LRU ordering, and replay queries with timing annotations to reproduce concurrency patterns.
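
The cache warming and paced replay steps can be sketched as follows, assuming the bundle stores LRU order from least to most recently used and that each record carries an offset in seconds from capture start. The function names, the `load_block` and `execute` callbacks, and the injectable clock are all assumptions for illustration. This single-threaded loop only paces dispatch; reproducing real concurrency would require handing each query to a worker pool at its scheduled time.

```python
import time
from collections import OrderedDict
from typing import Callable, Iterable, List, Tuple


def warm_cache(
    lru_order: Iterable[str],
    load_block: Callable[[str], bytes],
    capacity: int,
) -> "OrderedDict[str, bytes]":
    """Load blocks oldest-first so the final recency order matches the capture."""
    cache: "OrderedDict[str, bytes]" = OrderedDict()
    for block_id in lru_order:
        cache[block_id] = load_block(block_id)
        cache.move_to_end(block_id)  # mark as most recently used
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used block
    return cache


def paced_replay(
    records: Iterable[Tuple[float, str]],
    execute: Callable[[str], None],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Dispatch each (offset_seconds, query) at its captured offset.

    Returns how late each dispatch was, in seconds, as a replay-fidelity signal.
    """
    start = now()
    lateness: List[float] = []
    for offset_s, query in records:
        target = start + offset_s
        delay = target - now()
        if delay > 0:
            sleep(delay)
        lateness.append(max(0.0, now() - target))
        execute(query)
    return lateness
```

Passing `now` and `sleep` as parameters lets the pacing logic be exercised with a fake clock, so the scheduler itself can be checked without real waiting.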

Phase 4 — Regression Detection

Capture bundles from production, replay them in test environments against different code versions, compare the resulting latency distributions (p50, p99, p999), and alert on statistically significant deviations.
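
As a rough illustration of the comparison step, the sketch below summarizes latency samples by nearest-rank percentiles and uses a one-sided permutation test on a chosen quantile to decide whether a candidate build is slower than a baseline. The permutation test is a stand-in chosen for this example, since the proposal does not name a specific test; the function names, the 0.01 significance level, and the round count are likewise assumptions.

```python
import math
import random
from typing import Dict, Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    return {
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "p999": percentile(samples, 99.9),
    }


def permutation_p_value(
    baseline: Sequence[float],
    candidate: Sequence[float],
    q: float,
    rounds: int = 2000,
    seed: int = 0,
) -> float:
    """One-sided p-value for 'candidate quantile exceeds baseline quantile'."""
    rng = random.Random(seed)  # fixed seed keeps the verdict reproducible
    observed = percentile(candidate, q) - percentile(baseline, q)
    pooled = list(baseline) + list(candidate)
    n = len(baseline)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = percentile(pooled[n:], q) - percentile(pooled[:n], q)
        if diff >= observed:
            hits += 1
    # Add-one correction avoids reporting a p-value of exactly zero.
    return (hits + 1) / (rounds + 1)


def is_regression(
    baseline: Sequence[float],
    candidate: Sequence[float],
    q: float = 99.0,
    alpha: float = 0.01,
) -> bool:
    return permutation_p_value(baseline, candidate, q) < alpha
```

Tail quantiles such as p999 need many samples per run to be stable (on the order of thousands at minimum), so a replay would have to be long enough, or repeated, before a tail comparison is meaningful.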

🧪 Hypotheses

Performance-critical state (compaction level, cache hotness, concurrent operations) can be captured in extended replay bundles with manageable overhead.

Deterministic replay reproduces performance characteristics within acceptable variance for regression detection.

Performance replay enables root cause analysis for production performance regressions that are otherwise unreproducible.

🔗 SkeinDB Integration

Replay Bundles

Hash-Chained WAL

LSM / Compaction

Observability

Block Cache

📚 Key References

Curtsinger & Berger — "STABILIZER: Statistically Sound Performance Evaluation" (2013)
Tene et al. — "jHiccup: A Tool for Measuring and Visualizing JVM Pauses" (2013)
