🔬 What's Novel
- Replay bundles as a partial replication primitive for edge computing deployments
- Query routing protocol based on WAL coverage analysis at edge nodes
- Adaptive bundle sizing algorithm optimizing consistency-latency-cost tradeoff
- Continuum framework connecting CDN caching, partial replication, and full replication
🔧 Technical Approach
Phase 1 — Edge Bundle Protocol
Protocol for edge nodes to request and maintain replay bundles: initial transfer, incremental WAL streaming, bundle compaction, and retention policies based on edge storage capacity.
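A minimal sketch of how an edge node might maintain a bundle under a storage budget. All names here (`WalRecord`, `ReplayBundle`, `capacity_bytes`) are hypothetical and not taken from SkeinDB. It assumes WAL records are simple key/value writes with monotonically increasing LSNs, and that compaction means folding the oldest records into a base snapshot. Snapshot size is not counted against the budget, which a real retention policy would have to address.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WalRecord:
    lsn: int    # log sequence number, assumed strictly increasing
    key: str
    value: str
    size: int   # bytes this record occupies at the edge


@dataclass
class ReplayBundle:
    capacity_bytes: int                           # edge storage budget for raw WAL (assumed)
    base_lsn: int = 0                             # records <= base_lsn live in the snapshot
    snapshot: dict = field(default_factory=dict)  # compacted state
    wal: deque = field(default_factory=deque)     # retained WAL window
    wal_bytes: int = 0

    def apply_stream(self, records):
        """Incremental WAL streaming: append new records, then enforce retention."""
        for rec in records:
            last = self.wal[-1].lsn if self.wal else self.base_lsn
            if rec.lsn <= last:
                continue  # duplicate or stale record, already covered
            self.wal.append(rec)
            self.wal_bytes += rec.size
        self.compact()

    def compact(self):
        """Fold the oldest records into the snapshot until the WAL fits the budget."""
        while self.wal and self.wal_bytes > self.capacity_bytes:
            rec = self.wal.popleft()
            self.snapshot[rec.key] = rec.value
            self.base_lsn = rec.lsn
            self.wal_bytes -= rec.size

    def coverage(self):
        """Return the (low, high) LSN window this bundle can replay."""
        high = self.wal[-1].lsn if self.wal else self.base_lsn
        return self.base_lsn, high
```

With a 100-byte budget, streaming three 40-byte records forces the oldest one into the snapshot, so the replayable window shrinks from the low end while the high end keeps tracking the stream.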
Phase 2 — Query Routing
Router analyzes each query's dependencies and determines which edge nodes have sufficient WAL coverage. Routes the query to the nearest sufficient node, falling back to origin when no edge node covers it.
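The routing decision could look roughly like the sketch below. It assumes a query's dependencies reduce to a set of table names plus the LSN the query must observe, and that "sufficient coverage" means every required table has a WAL window containing that LSN. `EdgeNode`, `covers`, and `route` are illustrative names, and latency stands in for "nearest".

```python
from dataclasses import dataclass

ORIGIN = "origin"


@dataclass
class EdgeNode:
    name: str
    latency_ms: float
    coverage: dict  # table name -> (low_lsn, high_lsn) window held at this node


def covers(node, tables, required_lsn):
    """True if the node holds a window containing required_lsn for every table."""
    for table in tables:
        window = node.coverage.get(table)
        if window is None:
            return False
        low, high = window
        if not (low <= required_lsn <= high):
            return False
    return True


def route(tables, required_lsn, nodes):
    """Pick the lowest-latency sufficient node, or fall back to origin."""
    sufficient = [n for n in nodes if covers(n, tables, required_lsn)]
    if not sufficient:
        return ORIGIN
    return min(sufficient, key=lambda n: n.latency_ms).name


nodes = [
    EdgeNode("paris", 5.0, {"orders": (10, 50)}),
    EdgeNode("tokyo", 40.0, {"orders": (10, 90), "users": (0, 90)}),
]
```

Here a query on `orders` at LSN 30 goes to `paris`, the same query at LSN 70 goes to `tokyo` because `paris` has not streamed that far, and a query touching an uncovered table goes to origin.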
Phase 3 — Consistency Levels
Three levels: "strong" (origin only), "bounded staleness" (edge with recent bundle), "eventual" (any cached result). Clients specify a consistency preference per query.
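One way to express the three levels as a per-query check is sketched below. The enum values mirror the level names above. Everything else is an assumption for illustration: the `max_staleness_s` threshold, measuring "recent" by bundle age in seconds, and letting an edge bundle of any age satisfy "eventual" alongside a cached result.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    BOUNDED_STALENESS = "bounded staleness"
    EVENTUAL = "eventual"


def can_serve_at_edge(level, bundle_age_s, has_cached_result, max_staleness_s=5.0):
    """Decide whether an edge node may answer a query at the requested level.

    bundle_age_s is None when the node holds no bundle for the query.
    max_staleness_s is a hypothetical bound, not a value from the proposal.
    """
    if level is Consistency.STRONG:
        return False  # strong reads always go to origin
    if level is Consistency.BOUNDED_STALENESS:
        return bundle_age_s is not None and bundle_age_s <= max_staleness_s
    # EVENTUAL: any cached result will do, as will a bundle of any age
    return has_cached_result or bundle_age_s is not None
```

A client asking for `Consistency.BOUNDED_STALENESS` would be served at the edge only while the bundle is within the bound. The same node can still answer `Consistency.EVENTUAL` queries after the bundle has aged out.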
Phase 4 — Adaptive Sizing
Controller adjusting bundle coverage per edge node based on observed query patterns, network costs, and available edge storage. Minimizes origin traffic while satisfying consistency requirements.
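As a rough illustration of the controller's decision, the sketch below uses a greedy heuristic rather than the proposal's actual algorithm, which is not specified here. It assumes per-table statistics are available at each edge node, and it ranks tables by net origin traffic saved per byte of edge storage. `TableStats`, `plan_coverage`, and `origin_bytes_per_query` are hypothetical names, and consistency requirements are not modeled.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str
    queries_per_s: float     # query rate observed at this edge node
    bundle_bytes: int        # storage needed to cover the table (assumed > 0)
    sync_bytes_per_s: float  # WAL streaming traffic needed to keep it fresh


def plan_coverage(stats, storage_budget, origin_bytes_per_query):
    """Greedily choose tables to cover at one edge node within its storage budget."""

    def benefit(s):
        # Net origin traffic saved per second if this table is served at the edge.
        return s.queries_per_s * origin_bytes_per_query - s.sync_bytes_per_s

    # Skip tables whose streaming cost exceeds the traffic they would save.
    candidates = [s for s in stats if benefit(s) > 0]
    # Highest saving per byte of edge storage first.
    candidates.sort(key=lambda s: benefit(s) / s.bundle_bytes, reverse=True)

    chosen, used = [], 0
    for s in candidates:
        if used + s.bundle_bytes <= storage_budget:
            chosen.append(s.name)
            used += s.bundle_bytes
    return chosen


stats = [
    TableStats("orders", queries_per_s=100, bundle_bytes=400, sync_bytes_per_s=1000),
    TableStats("users", queries_per_s=10, bundle_bytes=100, sync_bytes_per_s=50),
    TableStats("logs", queries_per_s=1, bundle_bytes=300, sync_bytes_per_s=5000),
]
```

In this example `logs` is never covered because keeping it fresh costs more traffic than it saves. Whether `orders`, `users`, or both are covered depends on the storage budget passed in. Re-running the plan as observed rates change is what would make the sizing adaptive.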
🧪 Hypotheses
Replay bundles can serve as a partial replication primitive where edge nodes maintain bounded WAL windows.
Query routing can direct queries to edge nodes with sufficient WAL coverage, reducing origin load.
Adaptive bundle sizing optimizes the consistency-latency-cost tradeoff for real-world edge deployments.
🔗 SkeinDB Integration
📚 Key References
- Nishimura et al. — "MD-HBase: A Scalable Multi-Dimensional Data Infrastructure" (2011)
- Tao et al. — "TAO: Facebook's Distributed Data Store for the Social Graph" (2013)