🔬 What's Novel
- Extended replay bundle format capturing performance-critical state for reproducibility
- Deterministic replay framework for database performance testing with state reconstruction
- Methodology for capturing and reconstructing performance-critical state (LSM levels, cache, concurrency)
- Performance regression detection framework comparing production and test latency distributions
🔧 Technical Approach
Phase 1 — State Identification
Identify performance-critical state beyond WAL: LSM level structure, block cache contents, connection pool state, and compaction queue depth. Determine minimal state for reproducibility.
Phase 2 — Bundle Extension
Extended replay bundle format with: LSM metadata snapshot, cache state approximation (LRU ordering, hot keys), and timing annotations on WAL records for pacing.
Phase 3 — Deterministic Replay
Reconstruct storage state from bundle, warm cache according to captured LRU ordering, and replay queries with timing annotations to reproduce concurrency patterns.
Phase 4 — Regression Detection
Capture bundles from production, replay in test environments with different code versions, compare latency distributions (p50, p99, p999), and alert on statistically significant deviations.
🧪 Hypotheses
Performance-critical state (compaction level, cache hotness, concurrent operations) can be captured in extended replay bundles with manageable overhead.
Deterministic replay reproduces performance characteristics within acceptable variance for regression detection.
Performance replay enables root cause analysis for production performance regressions that are otherwise unreproducible.
🔗 SkeinDB Integration
📚 Key References
- Curtsinger & Berger — "STABILIZER: Statistically Sound Performance Evaluation" (2013)
- Tene et al. — "jHiccup: A Tool for Measuring and Visualizing JVM Pauses" (2013)