R14 · Consistency & Distribution

Geo-Distributed Replay Bundles for Edge Caching

SkeinDB's replay bundles were designed for debugging, but they could also serve as a replication primitive. Instead of holding full database copies, edge nodes would maintain partial replicas built from bounded WAL slices. This creates a continuum between full replicas (consistent but expensive) and CDN caching (cheap but limited), with a configurable tradeoff between consistency and latency.
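To make the "partial replica" idea concrete, here is a minimal sketch, assuming a bundle is a base snapshot taken at some LSN plus the WAL entries that follow it, and that each WAL entry is a single key write. The names `WalEntry`, `EdgeReplica`, `covers`, and `read` are hypothetical and do not describe SkeinDB's actual bundle format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical WAL entry: one key write tagged with its LSN (assumed model).
@dataclass(frozen=True)
class WalEntry:
    lsn: int
    key: str
    value: Optional[str]  # None marks a delete


@dataclass
class EdgeReplica:
    base_lsn: int                 # LSN at which the base snapshot was taken
    base: Dict[str, str]          # key/value state as of base_lsn
    wal: List[WalEntry] = field(default_factory=list)  # entries after base_lsn, ascending

    @property
    def head_lsn(self) -> int:
        # Newest LSN this replica can reconstruct.
        return self.wal[-1].lsn if self.wal else self.base_lsn

    def covers(self, lsn: int) -> bool:
        # State at `lsn` is reconstructible only inside the bounded window.
        return self.base_lsn <= lsn <= self.head_lsn

    def read(self, key: str, at_lsn: int) -> Optional[str]:
        if not self.covers(at_lsn):
            raise LookupError(
                f"LSN {at_lsn} outside window [{self.base_lsn}, {self.head_lsn}]")
        value = self.base.get(key)
        # Replay the WAL slice up to (and including) the requested LSN.
        for entry in self.wal:
            if entry.lsn > at_lsn:
                break
            if entry.key == key:
                value = entry.value
        return value
```

A read pinned to an LSN outside `[base_lsn, head_lsn]` cannot be answered locally; in this model that is exactly the case where a full replica or the origin is still needed.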

Research Proposal — Mapped to backlog in docs/RESEARCH_BACKLOG.md

🔬 What's Novel

🔧 Technical Approach

Phase 1 — Edge Bundle Protocol

A protocol by which edge nodes request and maintain replay bundles, covering initial transfer, incremental WAL streaming, bundle compaction, and retention policies sized to each edge node's storage capacity.
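The four protocol steps can be sketched on the edge side as follows. This is a minimal illustration under several assumptions: LSNs are contiguous integers, each WAL entry carries a SHA-256 chain hash over the previous entry's hash (one plausible reading of the Hash-Chained WAL integration), and storage capacity is approximated by a maximum entry count. `EdgeBundle`, `link`, and `max_entries` are hypothetical names.

```python
import hashlib
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Tuple


@dataclass(frozen=True)
class WalEntry:
    lsn: int
    key: str
    value: Optional[str]  # None marks a delete
    chain_hash: str       # hash over the previous chain hash and this payload (assumed)


def link(prev_hash: str, lsn: int, key: str, value: Optional[str]) -> str:
    """Hypothetical chain-hash rule: sha256(prev | lsn | key | value)."""
    payload = f"{prev_hash}|{lsn}|{key}|{value!r}".encode()
    return hashlib.sha256(payload).hexdigest()


class EdgeBundle:
    """Hypothetical edge-side bundle: base snapshot plus a bounded WAL slice."""

    def __init__(self, base_lsn: int, base: Dict[str, str], base_hash: str, max_entries: int):
        # Initial transfer: a snapshot anchored at a known LSN and chain hash.
        self.base_lsn = base_lsn
        self.base = dict(base)
        self.base_hash = base_hash
        self.max_entries = max_entries  # stand-in for edge storage capacity
        self.wal: Deque[WalEntry] = deque()

    def head(self) -> Tuple[int, str]:
        if self.wal:
            last = self.wal[-1]
            return last.lsn, last.chain_hash
        return self.base_lsn, self.base_hash

    def append(self, entry: WalEntry) -> None:
        # Incremental streaming: reject gaps and broken hash chains.
        head_lsn, head_hash = self.head()
        if entry.lsn != head_lsn + 1:
            raise ValueError(f"gap: expected LSN {head_lsn + 1}, got {entry.lsn}")
        if entry.chain_hash != link(head_hash, entry.lsn, entry.key, entry.value):
            raise ValueError(f"hash chain mismatch at LSN {entry.lsn}")
        self.wal.append(entry)
        self.compact()

    def compact(self) -> None:
        # Compaction and retention: fold the oldest entries into the base
        # snapshot until the slice fits the storage budget.
        while len(self.wal) > self.max_entries:
            oldest = self.wal.popleft()
            if oldest.value is None:
                self.base.pop(oldest.key, None)
            else:
                self.base[oldest.key] = oldest.value
            self.base_lsn = oldest.lsn
            self.base_hash = oldest.chain_hash
```

Folding old entries into the base keeps the newest state intact while shrinking the window of historical LSNs the edge can serve, which is the retention tradeoff the phase describes.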

Phase 2 — Query Routing

A router that analyzes each query's dependencies and determines which edge nodes have sufficient WAL coverage to answer it. Each query is routed to the nearest sufficient node; queries that no edge node covers fall back to the origin.
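A minimal sketch of the routing rule, assuming coverage is tracked per table and per LSN window, that a query's dependencies reduce to the set of tables it reads plus the LSN it must observe, and that "nearest" means lowest measured round-trip time. `EdgeNode`, `Query`, and `route` are hypothetical names; the real dependency analysis would come from SkeinDB's query planner.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class EdgeNode:
    name: str
    rtt_ms: float            # measured latency from the client's region
    tables: FrozenSet[str]   # tables included in this node's bundle (assumed granularity)
    base_lsn: int            # oldest LSN reconstructible at this node
    head_lsn: int            # newest LSN applied at this node


@dataclass(frozen=True)
class Query:
    tables: FrozenSet[str]   # dependencies extracted from the query
    read_lsn: int            # LSN the query must observe


def route(query: Query, edges: List[EdgeNode]) -> str:
    """Return the nearest sufficient edge node's name, or 'origin'."""
    def sufficient(e: EdgeNode) -> bool:
        # Every dependency must be in the bundle, and the LSN inside the window.
        return query.tables <= e.tables and e.base_lsn <= query.read_lsn <= e.head_lsn

    candidates = [e for e in edges if sufficient(e)]
    if not candidates:
        return "origin"
    return min(candidates, key=lambda e: e.rtt_ms).name
```

For example, a query needing a table that no edge bundle includes, or an LSN older than every edge's window, always lands on the origin.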

Phase 3 — Consistency Levels

Three levels: "strong" (served only by the origin), "bounded staleness" (served by an edge node whose bundle is sufficiently recent), and "eventual" (any cached result is acceptable). Clients specify their consistency preference per query.
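The three levels can be read as a per-request admission rule. The sketch below assumes staleness is measured in seconds of lag between the edge bundle's head and the origin, and that a separate check has already decided whether the edge holds a cached result or sufficient coverage for the query. `Consistency`, `Request`, `max_staleness_s`, and `choose_server` are hypothetical names, not an existing SkeinDB client API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Consistency(Enum):
    STRONG = "strong"
    BOUNDED_STALENESS = "bounded_staleness"
    EVENTUAL = "eventual"


@dataclass(frozen=True)
class Request:
    level: Consistency
    max_staleness_s: Optional[float] = None  # required for BOUNDED_STALENESS


def choose_server(req: Request, edge_lag_s: float, edge_can_answer: bool) -> str:
    """Return 'edge' or 'origin' for one request (hypothetical policy)."""
    if req.level is Consistency.STRONG:
        # Strong reads never touch edge state.
        return "origin"
    if req.level is Consistency.BOUNDED_STALENESS:
        if req.max_staleness_s is None:
            raise ValueError("bounded staleness requires max_staleness_s")
        # Edge is eligible only if its bundle is recent enough.
        if edge_can_answer and edge_lag_s <= req.max_staleness_s:
            return "edge"
        return "origin"
    # EVENTUAL: any cached result is acceptable, however old.
    return "edge" if edge_can_answer else "origin"
```

Keeping the level on each request, rather than per connection, matches the per-query preference described above.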

Phase 4 — Adaptive Sizing

A controller that adjusts each edge node's bundle coverage based on observed query patterns, network costs, and available edge storage, minimizing origin traffic while still satisfying consistency requirements.
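One simple way to approximate the sizing decision is a greedy knapsack heuristic. This is a swapped-in technique for illustration, not the proposal's controller design. It assumes coverage is chosen per table, that demand and fallback cost are known from monitoring, and that each table's slice has a known storage footprint. `TableStats` and `plan_coverage` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TableStats:
    name: str
    queries_per_s: float           # observed demand for this table at the edge
    origin_cost_per_query: float   # network cost of falling back to the origin
    bundle_bytes: int              # edge storage needed for this table's slice


def plan_coverage(stats: List[TableStats], capacity_bytes: int) -> List[str]:
    """Greedily pick tables by origin cost avoided per byte of edge storage."""
    def density(t: TableStats) -> float:
        # Guard against zero-size slices to avoid division by zero.
        return t.queries_per_s * t.origin_cost_per_query / max(t.bundle_bytes, 1)

    chosen: List[str] = []
    used = 0
    for t in sorted(stats, key=density, reverse=True):
        if used + t.bundle_bytes <= capacity_bytes:
            chosen.append(t.name)
            used += t.bundle_bytes
    return chosen
```

Re-running the plan periodically as query patterns and costs shift would give the adaptive behavior; a greedy ratio ordering is not optimal for knapsack in general, so a production controller might need a better solver or a feedback loop on observed origin traffic.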

🧪 Hypotheses

H1

Replay bundles can serve as a partial replication primitive where edge nodes maintain bounded WAL windows.

H2

Query routing can direct queries to edge nodes with sufficient WAL coverage, reducing origin load.

H3

Adaptive bundle sizing optimizes the consistency-latency-cost tradeoff for real-world edge deployments.

🔗 SkeinDB Integration

Replay Bundles
Hash-Chained WAL
Cluster Control-Plane
ETag Validators
CDC Changefeed

📚 Key References