🔬 What's Novel
- Energy model for LSM compaction that accounts for CPU activity, SSD write amplification, and garbage collection (GC) interactions
- Constrained optimization framework for compaction scheduling with performance bounds
- Integration of external signals (power source, electricity pricing, carbon intensity) into database scheduling
- Empirical study of energy-performance tradeoffs in real database workloads
🔧 Technical Approach
Phase 1 — Energy Modeling
Model energy per compaction operation as a function of LSM state and compaction size. Include SSD garbage collection interactions and CPU power states in the model.
Phase 2 — Constraint Specification
Define performance constraints: maximum acceptable read amplification, write amplification, and compaction backlog (a proxy for space amplification). Energy optimization operates only within these bounds.
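One way to make these bounds explicit is a small constraint record that the scheduler checks before deferring work. The sketch below is illustrative only: `PerformanceBounds`, `LsmMetrics`, `violated_bounds`, and the field names and units are hypothetical, not part of any existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceBounds:
    max_read_amp: float      # e.g. sorted runs consulted per point lookup
    max_write_amp: float     # bytes written to storage per user byte
    max_backlog_bytes: int   # pending compaction debt


@dataclass(frozen=True)
class LsmMetrics:
    read_amp: float
    write_amp: float
    backlog_bytes: int


def violated_bounds(metrics: LsmMetrics, bounds: PerformanceBounds) -> list[str]:
    """Return the names of all bounds the current LSM state exceeds."""
    violations = []
    if metrics.read_amp > bounds.max_read_amp:
        violations.append("read_amp")
    if metrics.write_amp > bounds.max_write_amp:
        violations.append("write_amp")
    if metrics.backlog_bytes > bounds.max_backlog_bytes:
        violations.append("backlog_bytes")
    return violations


def may_defer(metrics: LsmMetrics, bounds: PerformanceBounds) -> bool:
    """Deferring compaction is allowed only while no bound is violated."""
    return not violated_bounds(metrics, bounds)


if __name__ == "__main__":
    bounds = PerformanceBounds(max_read_amp=10, max_write_amp=30,
                               max_backlog_bytes=4 * 2**30)
    print(violated_bounds(LsmMetrics(12, 20, 2**30), bounds))
```

Returning the list of violated bounds, rather than a single boolean, lets a scheduler decide which kind of compaction to prioritize.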
Phase 3 — Scheduling Algorithm
Predict future compaction needs, estimate energy cost at different scheduling times, and optimize scheduling to minimize total energy while satisfying all performance constraints.
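The sketch below shows one simple way to frame this step: time is divided into slots, each predicted compaction has a window in which it must run to keep the performance constraints satisfied, and each slot has a relative energy cost. It uses a greedy earliest-deadline-first heuristic, which is a stand-in chosen for brevity and is not guaranteed to find the minimum-energy schedule. All names (`PredictedJob`, `schedule`, `total_energy`) and the slot abstraction are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedJob:
    name: str
    release_slot: int    # earliest slot at which the compaction can run
    deadline_slot: int   # latest slot (inclusive) before a bound is violated
    base_energy_j: float  # energy estimate at a cost multiplier of 1.0


def schedule(jobs: list[PredictedJob], slot_cost: list[float],
             slot_capacity: int = 1) -> dict[str, int]:
    """Greedy heuristic: handle jobs by deadline, pick the cheapest free slot."""
    used = [0] * len(slot_cost)
    plan: dict[str, int] = {}
    for job in sorted(jobs, key=lambda j: j.deadline_slot):
        last = min(job.deadline_slot, len(slot_cost) - 1)
        window = [s for s in range(job.release_slot, last + 1)
                  if used[s] < slot_capacity]
        if not window:
            raise ValueError(f"no feasible slot for {job.name}")
        # Break cost ties in favor of the earlier slot.
        best = min(window, key=lambda s: (slot_cost[s], s))
        used[best] += 1
        plan[job.name] = best
    return plan


def total_energy(jobs: list[PredictedJob], plan: dict[str, int],
                 slot_cost: list[float]) -> float:
    """Total energy of a plan under the per-slot cost multipliers."""
    return sum(j.base_energy_j * slot_cost[plan[j.name]] for j in jobs)


if __name__ == "__main__":
    costs = [1.5, 1.0, 0.8, 1.2]
    jobs = [PredictedJob("L0->L1", 0, 1, 10.0),
            PredictedJob("L1->L2", 0, 3, 20.0)]
    plan = schedule(jobs, costs)
    print(plan, total_energy(jobs, plan, costs))
```

Because the deadline encodes the performance constraint, any plan the function returns keeps each job inside its window; a job with no free slot raises an error rather than being scheduled late.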
Phase 4 — External Signals
Integrate real-world signals: power source (battery vs. plugged in), electricity grid pricing, carbon intensity of the grid, and predicted workload patterns for lookahead scheduling.
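A minimal sketch of how such signals might be folded into a single per-slot cost that a scheduler can compare across time. The weighted-sum form, the weights, the battery penalty, and the names `SlotSignals` and `slot_cost` are all assumptions for illustration; how to trade price against carbon is a policy decision the project would need to define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotSignals:
    on_battery: bool          # True when running on battery power
    price_per_kwh: float      # electricity price for the slot
    carbon_g_per_kwh: float   # grid carbon intensity for the slot


def slot_cost(sig: SlotSignals, price_weight: float = 1.0,
              carbon_weight: float = 0.001,
              battery_penalty: float = 2.0) -> float:
    """Combine external signals into one relative cost per unit of energy."""
    cost = (price_weight * sig.price_per_kwh
            + carbon_weight * sig.carbon_g_per_kwh)
    if sig.on_battery:
        # Hypothetical policy: energy drawn from battery counts double.
        cost *= battery_penalty
    return cost


def cheapest_slot(forecast: list[SlotSignals]) -> int:
    """Index of the lowest-cost slot in a forecast (earliest wins ties)."""
    return min(range(len(forecast)), key=lambda i: (slot_cost(forecast[i]), i))


if __name__ == "__main__":
    forecast = [
        SlotSignals(False, 0.30, 400.0),  # evening peak
        SlotSignals(False, 0.10, 200.0),  # overnight
        SlotSignals(True, 0.10, 200.0),   # overnight, but on battery
    ]
    print([slot_cost(s) for s in forecast], cheapest_slot(forecast))
```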
🧪 Hypotheses
- Compaction timing significantly impacts total energy consumption due to SSD write amplification and CPU utilization patterns.
- Deferring compaction to off-peak periods reduces energy costs without causing unacceptable performance degradation.
- Energy-aware scheduling with explicit performance constraints provides practical tradeoffs for edge and cloud deployments.
🔗 SkeinDB Integration
📚 Key References
- Harizopoulos et al. — "Energy-Efficient Query Processing in Database Servers" (2008)
- Tsirogiannis et al. — "Analyzing the Energy Efficiency of a Database Server" (2010)