R16 · Developer Experience & Tooling

Automatic Index Synthesis from Dependency Analysis

SkeinDB's dependency tracking records which key ranges and indexes each query touches. Because this information comes from actual execution, it can reflect real access patterns more faithfully than static EXPLAIN analysis. Inverting the dependency relationship, from "what does this query depend on" to "what indexes would benefit this query", lets SkeinDB automatically synthesize indexes matched to the observed workload.

Research Proposal — Mapped to backlog in docs/RESEARCH_BACKLOG.md

🔬 What's Novel

🔧 Technical Approach

Phase 1 — Dependency Collection

Record columns in predicates, key range patterns, join conditions, and ordering requirements from actual query execution. Aggregate patterns across queries weighted by frequency.
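
A minimal sketch of what collection and frequency-weighted aggregation could look like. The record shape (`QueryDependency`) and the `DependencyCollector` class are hypothetical names chosen for illustration; they are not SkeinDB APIs, and the real tracker may record different fields.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QueryDependency:
    """One observed access pattern (hypothetical record shape)."""
    table: str
    equality_cols: Tuple[str, ...] = ()   # columns in equality predicates
    range_cols: Tuple[str, ...] = ()      # columns scanned by key range
    join_cols: Tuple[str, ...] = ()       # columns used in join conditions
    order_by: Tuple[str, ...] = ()        # ordering requirements


class DependencyCollector:
    """Counts identical patterns so they can be weighted by frequency."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, dep: QueryDependency) -> None:
        # Called once per executed query (assumed hook point).
        self._counts[dep] += 1

    def weighted_patterns(self) -> List[Tuple[QueryDependency, float]]:
        """Return (pattern, share of all recorded queries), most frequent first."""
        total = sum(self._counts.values())
        pairs = [(dep, n / total) for dep, n in self._counts.items()]
        return sorted(pairs, key=lambda p: p[1], reverse=True)
```

Keeping the records hashable (frozen dataclass with tuple fields) lets identical patterns collapse into a single counter entry, which is one simple way to get the frequency weights that the later phases consume.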

Phase 2 — Candidate Generation

Generate index candidates: single-column for equality predicates, composite for multi-column filters, and covering indexes for frequently accessed column subsets.
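
The three candidate kinds can be sketched as follows. `IndexCandidate` and `generate_candidates` are hypothetical names, and placing equality columns before range columns in a composite key is a common heuristic assumed here, not something the proposal specifies.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class IndexCandidate:
    """A proposed index (hypothetical shape): key columns plus covered columns."""
    table: str
    key_cols: Tuple[str, ...]
    include_cols: Tuple[str, ...] = ()


def generate_candidates(
    table: str,
    equality_cols: Sequence[str],
    range_cols: Sequence[str] = (),
    projected_cols: Sequence[str] = (),
) -> List[IndexCandidate]:
    candidates: List[IndexCandidate] = []

    # Single-column candidates, one per equality predicate.
    for col in equality_cols:
        candidates.append(IndexCandidate(table, (col,)))

    # Composite candidate for multi-column filters:
    # equality columns first, then range columns (assumed heuristic).
    filter_cols = tuple(equality_cols) + tuple(
        c for c in range_cols if c not in equality_cols
    )
    if len(filter_cols) > 1:
        candidates.append(IndexCandidate(table, filter_cols))

    # Covering candidate: same key, plus the other columns the query reads.
    extra = tuple(c for c in projected_cols if c not in filter_cols)
    if filter_cols and extra:
        candidates.append(IndexCandidate(table, filter_cols, extra))

    # Drop duplicates while preserving generation order.
    return list(dict.fromkeys(candidates))
```

For a query filtering on `customer_id` and `status`, ranging over `created_at`, and reading `total`, this yields two single-column candidates, one three-column composite, and one covering variant of that composite.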

Phase 3 — Cost-Benefit Analysis

Evaluate each candidate: estimated query speedup (frequency × improvement), write overhead for maintaining the index, and storage cost. Select candidates with net-positive benefit.
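
One way to express the net-benefit rule, assuming all three terms can be converted to a common cost unit per second. The field names, the `storage_cost_per_byte` weight, and the linear cost model are illustrative assumptions; the proposal does not define them.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CandidateEstimate:
    """Estimated inputs for one candidate index (hypothetical fields)."""
    name: str
    query_frequency: float   # queries/sec that would use the index
    saving_per_query: float  # cost units saved per query
    write_rate: float        # writes/sec that must maintain the index
    cost_per_write: float    # cost units added per write
    storage_bytes: int       # estimated on-disk size


def net_benefit(est: CandidateEstimate, storage_cost_per_byte: float) -> float:
    speedup = est.query_frequency * est.saving_per_query  # frequency x improvement
    write_overhead = est.write_rate * est.cost_per_write
    storage_cost = est.storage_bytes * storage_cost_per_byte
    return speedup - write_overhead - storage_cost


def select_candidates(
    estimates: Iterable[CandidateEstimate],
    storage_cost_per_byte: float = 0.0,
) -> List[CandidateEstimate]:
    """Keep candidates whose net benefit is positive, best first."""
    scored = [(net_benefit(e, storage_cost_per_byte), e) for e in estimates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for score, e in scored if score > 0]
```

A read-heavy candidate (100 queries/sec saving 2 units each, 10 writes/sec costing 1 unit each) scores 190 and is kept; a write-heavy one (1 query/sec saving 1 unit, 50 writes/sec costing 1 unit each) scores -49 and is dropped.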

Phase 4 — Online Adaptation

Continuously monitor dependency patterns, propose new indexes as workload evolves, retire indexes no longer providing benefit, and handle schema changes gracefully.
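
The propose/retire loop could be sketched with an exponentially decayed benefit score per index and two thresholds, so that an index is not created and dropped repeatedly as its score hovers near a single cutoff. The class name, the decay factor, and both thresholds are assumptions for illustration; schema-change handling is not covered here.

```python
from typing import Dict, List, Set, Tuple


class AdaptiveIndexMonitor:
    """Tracks a decayed benefit score per index across observation windows."""

    def __init__(self, decay: float = 0.5,
                 create_above: float = 10.0,
                 retire_below: float = 1.0) -> None:
        self.decay = decay                # weight given to the previous score
        self.create_above = create_above  # propose when the score reaches this
        self.retire_below = retire_below  # retire when the score drops below this
        self.scores: Dict[str, float] = {}
        self.active: Set[str] = set()

    def observe_window(
        self, benefit_by_index: Dict[str, float]
    ) -> Tuple[List[str], List[str]]:
        """Fold in one window of measured benefit; return (proposed, retired)."""
        proposed: List[str] = []
        retired: List[str] = []
        # Indexes absent from this window count as zero benefit and decay.
        for name in set(self.scores) | set(benefit_by_index):
            previous = self.scores.get(name, 0.0)
            observed = benefit_by_index.get(name, 0.0)
            score = self.decay * previous + (1 - self.decay) * observed
            self.scores[name] = score
            if name not in self.active and score >= self.create_above:
                self.active.add(name)
                proposed.append(name)
            elif name in self.active and score < self.retire_below:
                self.active.remove(name)
                retired.append(name)
        return sorted(proposed), sorted(retired)
```

With the defaults above, an index that shows a benefit of 40 in one window is proposed immediately (score 20), then survives four idle windows as its score halves to 1.25, and is retired on the fifth when the score falls to 0.625.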

🧪 Hypotheses

H1

Runtime dependency tracking captures access patterns that static analysis misses (correlated subqueries, dynamic predicates).

H2

Dependency-based index synthesis achieves better query performance than traditional rule-based index advisors.

H3

Continuous index adaptation maintains near-optimal performance as workloads evolve over time.

🔗 SkeinDB Integration

Dependency Tracking
LSM / Compaction
SkeinQL RPC
Index Advisor
Web Admin

📚 Key References