🔬 What's Novel
- Runtime dependency tracking repurposed as the primary signal for index candidate generation
- Cost-benefit model for continuous index adaptation in an online database
- Online index evolution algorithm that creates, modifies, and retires indexes without downtime
- Empirical comparison of automatic dependency-based vs. manual index design
🔧 Technical Approach
Phase 1 — Dependency Collection
During query execution, record the columns referenced in predicates, the key ranges scanned, join conditions, and ordering requirements. Aggregate these patterns across queries, weighting each by how often the query runs.
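A minimal sketch of the collection step, assuming dependencies are reported per executed query as (table, column, kind) records. The names `Dependency` and `DependencyCollector` are hypothetical and are not part of any SkeinDB API described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """One observed access dependency (hypothetical record shape)."""
    table: str
    column: str
    kind: str  # assumed kinds: "eq", "range", "join", "order"


class DependencyCollector:
    """Aggregates dependencies across queries, weighted by execution count."""

    def __init__(self):
        self.counts = Counter()

    def record(self, deps, executions=1):
        # Count each distinct dependency once per query, scaled by how
        # many times that query ran.
        for dep in set(deps):
            self.counts[dep] += executions

    def top(self, n):
        # Most frequently observed dependencies first.
        return self.counts.most_common(n)
```

Frequency weighting here is a plain sum of execution counts; a real collector would likely also decay old observations so that stale patterns fade.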
Phase 2 — Candidate Generation
Generate index candidates: single-column for equality predicates, composite for multi-column filters, and covering indexes for frequently accessed column subsets.
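The three candidate types can be sketched as follows. This is an illustration under stated assumptions, not the project's generator: `QueryPattern`, `IndexCandidate`, and the `covering_min_frequency` cutoff are hypothetical, and placing equality columns ahead of range columns in a composite key is a common heuristic chosen here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPattern:
    table: str
    eq_columns: tuple     # columns used in equality predicates
    range_columns: tuple  # columns used in range predicates
    projected: tuple      # all columns the query reads
    frequency: int        # observed executions


@dataclass(frozen=True)
class IndexCandidate:
    table: str
    key_columns: tuple
    include_columns: tuple = ()  # non-key columns for covering indexes


def generate_candidates(patterns, covering_min_frequency=100):
    candidates = set()
    for p in patterns:
        # Single-column candidates for equality predicates.
        for col in p.eq_columns:
            candidates.add(IndexCandidate(p.table, (col,)))
        # Composite candidate for multi-column filters
        # (assumed ordering: equality columns, then range columns).
        key = tuple(sorted(p.eq_columns)) + tuple(sorted(p.range_columns))
        if len(key) > 1:
            candidates.add(IndexCandidate(p.table, key))
        # Covering candidate for frequently executed patterns.
        if key and p.frequency >= covering_min_frequency:
            extra = tuple(sorted(set(p.projected) - set(key)))
            if extra:
                candidates.add(IndexCandidate(p.table, key, extra))
    return candidates
```

Returning a set deduplicates candidates that several query patterns propose independently.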
Phase 3 — Cost-Benefit Analysis
Evaluate each candidate on three terms: estimated query speedup (frequency × per-query improvement), the write overhead of maintaining the index, and storage cost. Select candidates whose speedup exceeds the combined write and storage cost.
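One way to make the net-benefit test concrete is shown below. The field names, the units (milliseconds per second of workload), and the `storage_cost_per_mb` conversion factor are assumptions made for this sketch; the text above does not specify how storage is put on the same scale as time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateEstimate:
    name: str
    query_frequency: float           # benefiting queries per second
    saving_per_query_ms: float       # estimated improvement per query
    write_frequency: float           # writes per second to the table
    maintenance_per_write_ms: float  # extra cost per write for the index
    storage_mb: float


def net_benefit(e, storage_cost_per_mb=0.01):
    # speedup = frequency x improvement
    speedup = e.query_frequency * e.saving_per_query_ms
    write_overhead = e.write_frequency * e.maintenance_per_write_ms
    # Assumed linear conversion of storage into the same cost unit.
    storage = e.storage_mb * storage_cost_per_mb
    return speedup - write_overhead - storage


def select_candidates(estimates, storage_cost_per_mb=0.01):
    # Keep only net-positive candidates, highest benefit first.
    scored = [(net_benefit(e, storage_cost_per_mb), e) for e in estimates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for score, e in scored if score > 0]
```

A read-heavy candidate with a large per-query saving passes easily, while an index on a write-heavy table that is rarely queried is rejected because maintenance cost dominates.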
Phase 4 — Online Adaptation
Continuously monitor dependency patterns, propose new indexes as workload evolves, retire indexes no longer providing benefit, and handle schema changes gracefully.
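A single adaptation step could look like the sketch below, assuming each monitoring window yields a net-benefit score per index name. The function `plan_adaptation` and both thresholds are hypothetical. Setting the creation threshold above the retirement threshold is a design choice made here to avoid repeatedly building and dropping an index whose benefit hovers near zero.

```python
def plan_adaptation(active, scores, create_threshold=10.0, retire_threshold=0.0):
    """Decide which indexes to create and which to retire.

    active: set of index names that currently exist.
    scores: dict mapping index name to net benefit in the latest window.
    """
    # Propose indexes that do not exist yet and clear the higher bar.
    to_create = sorted(
        name for name, score in scores.items()
        if name not in active and score > create_threshold
    )
    # Retire existing indexes that no longer pay for themselves.
    # An index absent from the scores is treated as providing no benefit.
    to_retire = sorted(
        name for name in active
        if scores.get(name, 0.0) <= retire_threshold
    )
    return to_create, to_retire
```

The function only produces a plan. Building and dropping the indexes online, and reacting to schema changes, are left to the storage engine and are not modeled here.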
🧪 Hypotheses
- Runtime dependency tracking captures access patterns that static analysis misses (correlated subqueries, dynamic predicates).
- Dependency-based index synthesis achieves better query performance than traditional rule-based index advisors.
- Continuous index adaptation maintains near-optimal performance as workloads evolve over time.
🔗 SkeinDB Integration
📚 Key References
- Chaudhuri & Narasayya — "An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server" (AutoAdmin) (1997)
- Petraki et al. — "Automatic Index Management for Structured Data" (2015)
- Ding et al. — "AI Meets DB: Future Directions for Self-Driving Database Systems" (2019)