🔬 What's Novel
- First conflict-free schema evolution protocol for distributed database systems
- MVCC extension for schema versioning alongside data versioning in the same framework
- Automatic schema conversion during query execution across heterogeneous schema versions
- Analysis of schema change compatibility patterns in real-world applications
🔧 Technical Approach
Phase 1: Schema Versioning
Extend MVCC metadata to include schema version. Each row is tagged with the schema version it was written under, enabling multiple schema versions to coexist in storage.
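A minimal sketch of what per-row schema tagging could look like, assuming an in-memory store keyed by row key. The names `RowVersion`, `VersionedStore`, `txn_ts`, and `schema_version` are hypothetical and only illustrate the idea; they are not taken from any existing implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class RowVersion:
    """One MVCC version of a row, tagged with the schema it was written under."""
    key: str
    txn_ts: int           # data version: commit timestamp of the writing transaction
    schema_version: int   # schema version in effect when the row was written
    values: dict[str, Any]


class VersionedStore:
    """Hypothetical store that keeps every version of every row."""

    def __init__(self) -> None:
        self._versions: dict[str, list[RowVersion]] = {}

    def write(self, key: str, txn_ts: int, schema_version: int,
              values: dict[str, Any]) -> RowVersion:
        row = RowVersion(key, txn_ts, schema_version, dict(values))
        self._versions.setdefault(key, []).append(row)
        return row

    def read(self, key: str, snapshot_ts: int) -> Optional[RowVersion]:
        """Return the newest version visible at snapshot_ts, or None."""
        visible = [v for v in self._versions.get(key, []) if v.txn_ts <= snapshot_ts]
        return max(visible, key=lambda v: v.txn_ts, default=None)


# Two versions of the same row, written under different schema versions.
store = VersionedStore()
store.write("user:1", txn_ts=10, schema_version=1, values={"name": "Ada"})
store.write("user:1", txn_ts=20, schema_version=2,
            values={"name": "Ada", "email": "ada@example.com"})

old = store.read("user:1", snapshot_ts=15)   # sees the schema-v1 row
new = store.read("user:1", snapshot_ts=25)   # sees the schema-v2 row
```

Because the schema version travels with each row version, a snapshot read returns both the data and the schema it must be interpreted under.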
Phase 2: Concurrent Evolution
Nodes propose schema changes independently. Non-conflicting changes (add column, add index) apply immediately; conflicting changes (such as incompatible type changes) are queued for resolution.
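The apply-or-queue rule can be sketched as a small classifier. This is an illustration under simplifying assumptions: changes are classified by operation kind alone, and `SchemaChange`, `EvolutionLog`, and `NON_CONFLICTING_OPS` are hypothetical names. A real protocol would also need to compare each proposal against changes already applied on other nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaChange:
    node: str            # node that proposed the change
    op: str              # "add_column", "add_index", or "change_type"
    column: str
    type_name: str = ""  # target type, when the operation needs one


# Assumption: these operation kinds never conflict with each other.
NON_CONFLICTING_OPS = {"add_column", "add_index"}


@dataclass
class EvolutionLog:
    applied: list[SchemaChange] = field(default_factory=list)
    pending: list[SchemaChange] = field(default_factory=list)

    def propose(self, change: SchemaChange) -> str:
        """Apply non-conflicting changes immediately; queue everything else."""
        if change.op in NON_CONFLICTING_OPS:
            self.applied.append(change)
            return "applied"
        self.pending.append(change)
        return "queued"


log = EvolutionLog()
r1 = log.propose(SchemaChange("node-a", "add_column", "email", "text"))
r2 = log.propose(SchemaChange("node-b", "add_index", "email"))
r3 = log.propose(SchemaChange("node-c", "change_type", "age", "text"))
```

Here the two additive changes are applied on proposal, while the type change waits in `pending` for resolution.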
Phase 3: Query Adaptation
Handle schema heterogeneity at query time: detect version mismatch between query expectation and row schema, apply automatic conversion, and fail gracefully for truly incompatible cases.
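One way to picture query-time adaptation is a chain of per-version converters applied to a row until it matches the version the query expects. The sketch below assumes linear, integer schema versions and a hypothetical `CONVERTERS` registry; the example schema (a `name`, an `age` stored as text in v1, an `email` added in v2, `age` changed to an integer in v3) is invented for illustration.

```python
from __future__ import annotations

from typing import Any, Callable


class IncompatibleSchemaError(Exception):
    """Raised when a row cannot be converted to the expected schema version."""


Row = dict[str, Any]

# Hypothetical registry: (from_version, to_version) -> conversion function.
CONVERTERS: dict[tuple[int, int], Callable[[Row], Row]] = {
    (1, 2): lambda row: {**row, "email": None},            # v2 added a nullable column
    (2, 3): lambda row: {**row, "age": int(row["age"])},   # v3 changed age from text to int
}


def adapt_row(values: Row, row_version: int, expected_version: int) -> Row:
    """Upgrade a row step by step, or fail with a descriptive error."""
    if row_version > expected_version:
        raise IncompatibleSchemaError(
            f"row is at v{row_version}, query expects older v{expected_version}")
    current = dict(values)
    for v in range(row_version, expected_version):
        step = CONVERTERS.get((v, v + 1))
        if step is None:
            raise IncompatibleSchemaError(f"no conversion from v{v} to v{v + 1}")
        try:
            current = step(current)
        except (KeyError, TypeError, ValueError) as exc:
            raise IncompatibleSchemaError(
                f"conversion v{v} -> v{v + 1} failed: {exc}") from exc
    return current


converted = adapt_row({"name": "Ada", "age": "36"}, row_version=1, expected_version=3)

try:
    adapt_row({"name": "Bob", "age": "unknown"}, row_version=1, expected_version=3)
    failure = None
except IncompatibleSchemaError as exc:
    failure = str(exc)   # the query can surface this instead of crashing
```

Rows already at the expected version pass through unchanged, and a row that cannot be converted produces a single, catchable error rather than a partial result.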
Phase 4: Merge Protocol
Detect schema divergence across cluster nodes, compute a merged schema that preserves all non-conflicting changes, and propagate the unified schema to all nodes.
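The merge step can be sketched as a union over per-node schemas that sets aside any column whose type differs between nodes. This is a simplified illustration: schemas are modeled as plain column-to-type mappings, only additive divergence is handled (dropped columns and constraints are out of scope), and `merge_schemas` is a hypothetical helper name.

```python
from __future__ import annotations


def merge_schemas(
    node_schemas: dict[str, dict[str, str]],
) -> tuple[dict[str, str], dict[str, set[str]]]:
    """Return (merged, conflicts).

    merged    -- columns on which all nodes that define them agree
    conflicts -- columns defined with different types, with every type seen
    """
    merged: dict[str, str] = {}
    conflicts: dict[str, set[str]] = {}
    for node in sorted(node_schemas):            # deterministic processing order
        for column, type_name in node_schemas[node].items():
            if column in conflicts:
                conflicts[column].add(type_name)
            elif column not in merged:
                merged[column] = type_name
            elif merged[column] != type_name:
                # Divergent definitions: pull the column out of the merged schema.
                conflicts[column] = {merged.pop(column), type_name}
    return merged, conflicts


merged, conflicts = merge_schemas({
    "node-a": {"id": "int", "name": "text", "email": "text"},
    "node-b": {"id": "int", "name": "text", "age": "int"},
    "node-c": {"id": "bigint", "name": "text"},
})
```

In this example `email` and `age` were added on different nodes and both survive the merge, while `id` is reported as a conflict because the nodes disagree on its type.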
🧪 Hypotheses
Many schema changes (adding columns, indexes, renaming) can be applied concurrently without conflicts in practice.
MVCC version metadata can track schema version alongside data version without significant storage overhead.
Schema conflicts (incompatible type changes, contradictory constraints) can be reliably detected and reported to administrators.
🔗 SkeinDB Integration
📚 Key References
- Curino et al., "Schema Evolution in Wikipedia: Toward a Web Information System Benchmark" (2008)
- Kleppmann, "Schema Evolution in Avro, Protocol Buffers, and Thrift" (2012)