R02 · Storage & Query Optimization

Adaptive Row-Column Hybrid Execution

Modern workloads mix OLTP and OLAP patterns. SkeinDB offers optional column snapshots for analytics, but it lacks a formal model for deciding when to materialize them. A principled approach to automatic column projection materialization would provide HTAP capabilities without the complexity of separate engines, reusing SkeinDB's existing cache invalidation and dependency tracking infrastructure to make adaptive decisions online.

Research Proposal (mapped to backlog in docs/RESEARCH_BACKLOG.md)

🔬 What's Novel

🔧 Technical Approach

Phase 1: Cost Model

Formalize the cost of column snapshot creation (scan cost, storage overhead) vs. benefit (reduced I/O for projections). Include compaction interaction costs to model the full lifecycle.
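As a rough illustration of what such a cost model might look like, the sketch below compares the I/O saved per hour against storage and compaction upkeep, then subtracts the one-time build scan. Every name, field, and unit here is hypothetical and not part of SkeinDB; the linear form is an assumption chosen for readability, not a proposed result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnapshotCostInputs:
    """Hypothetical inputs, all in the same abstract I/O cost unit."""
    build_scan_cost: float              # one-time scan to create the snapshot
    storage_cost_per_hour: float        # storage overhead while the snapshot exists
    refresh_cost_per_compaction: float  # extra work when compaction touches it
    compactions_per_hour: float
    row_io_per_query: float             # projection served from row storage
    column_io_per_query: float          # same projection served from the snapshot
    queries_per_hour: float


def hourly_gain(c: SnapshotCostInputs) -> float:
    """I/O saved per hour minus recurring upkeep per hour."""
    saved = (c.row_io_per_query - c.column_io_per_query) * c.queries_per_hour
    upkeep = (c.storage_cost_per_hour
              + c.refresh_cost_per_compaction * c.compactions_per_hour)
    return saved - upkeep


def net_benefit(c: SnapshotCostInputs, horizon_hours: float) -> float:
    """Total gain over the horizon after paying the one-time build cost."""
    return hourly_gain(c) * horizon_hours - c.build_scan_cost


def break_even_hours(c: SnapshotCostInputs) -> Optional[float]:
    """Hours until the build cost is recovered, or None if it never is."""
    gain = hourly_gain(c)
    return c.build_scan_cost / gain if gain > 0 else None


example = SnapshotCostInputs(
    build_scan_cost=1000.0,
    storage_cost_per_hour=2.0,
    refresh_cost_per_compaction=10.0,
    compactions_per_hour=0.5,
    row_io_per_query=50.0,
    column_io_per_query=5.0,
    queries_per_hour=20.0,
)
```

With these illustrative numbers the snapshot gains 893 units per hour, so `net_benefit(example, 10)` is 7930 and the build cost is recovered after a little over one hour. A snapshot whose hourly gain is zero or negative never breaks even, which is the case the model is meant to screen out.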

Phase 2: Pattern Detection

Analyze query patterns to identify frequently accessed column subsets, scan-heavy queries that would benefit from a columnar format, and temporal access patterns that can drive adaptive thresholds.
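One way such pattern detection could be prototyped is with an exponentially decayed score per column subset, weighted by rows scanned so that scan-heavy queries count for more and old activity fades. This is a minimal sketch under assumptions: the `ColumnSubsetTracker` class, its half-life parameter, and the rows-scanned weighting are hypothetical and do not describe an existing SkeinDB component.

```python
import math
from typing import Dict, FrozenSet, Iterable, List, Tuple


class ColumnSubsetTracker:
    """Hypothetical tracker: decayed scan volume per projected column subset."""

    def __init__(self, half_life_s: float) -> None:
        # After half_life_s seconds without access, a score halves.
        self._rate = math.log(2) / half_life_s
        self._scores: Dict[FrozenSet[str], Tuple[float, float]] = {}

    def _decayed(self, score: float, last: float, now: float) -> float:
        return score * math.exp(-self._rate * (now - last))

    def record(self, columns: Iterable[str], rows_scanned: int, now: float) -> None:
        """Register one query that projected `columns` and scanned `rows_scanned` rows."""
        key = frozenset(columns)  # column order in the query does not matter
        score, last = self._scores.get(key, (0.0, now))
        self._scores[key] = (self._decayed(score, last, now) + rows_scanned, now)

    def top(self, k: int, now: float) -> List[Tuple[FrozenSet[str], float]]:
        """Return the k subsets with the highest score as of `now`."""
        current = [(key, self._decayed(score, last, now))
                   for key, (score, last) in self._scores.items()]
        current.sort(key=lambda item: item[1], reverse=True)
        return current[:k]


tracker = ColumnSubsetTracker(half_life_s=60.0)
tracker.record(["a", "b"], rows_scanned=100, now=0.0)
tracker.record(["b", "a"], rows_scanned=100, now=60.0)  # same subset, one half-life later
tracker.record(["c"], rows_scanned=120, now=60.0)
hottest = tracker.top(1, now=60.0)
```

Here the `{a, b}` subset ends with a score of about 150 (the first 100 rows decayed by half, plus 100 new rows), which ranks it above `{c}` at 120. The half-life plays the role of the adaptive threshold input: a shorter value reacts faster to shifts in access patterns but is noisier.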

Phase 3: Dependency Integration

Extend dependency tracking to column granularity. On a row-level update, mark only the affected column snapshots for incremental refresh or invalidation, avoiding full recomputation.
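The idea of column-granular tracking can be sketched as an inverted index from column name to the snapshots that include it, so that an update touching only some columns marks only the snapshots that depend on them. The `ColumnDependencyIndex` class and its methods below are hypothetical illustrations; SkeinDB's actual dependency tracking interface is not described in this proposal.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class ColumnDependencyIndex:
    """Hypothetical index: which snapshots depend on which columns."""

    def __init__(self) -> None:
        self._by_column: Dict[str, Set[str]] = defaultdict(set)
        # Rows each snapshot must re-read on its next incremental refresh.
        self.stale_rows: Dict[str, Set[int]] = defaultdict(set)

    def register(self, snapshot_id: str, columns: Iterable[str]) -> None:
        """Declare that `snapshot_id` materializes the given columns."""
        for col in columns:
            self._by_column[col].add(snapshot_id)

    def on_row_update(self, row_id: int, changed_columns: Iterable[str]) -> Set[str]:
        """Mark the row stale in every snapshot that covers a changed column."""
        affected: Set[str] = set()
        for col in changed_columns:
            # .get avoids creating empty entries for untracked columns
            affected |= self._by_column.get(col, set())
        for snapshot_id in affected:
            self.stale_rows[snapshot_id].add(row_id)
        return affected


index = ColumnDependencyIndex()
index.register("sales_by_price", ["price", "qty"])
index.register("customer_names", ["name"])
touched = index.on_row_update(row_id=7, changed_columns=["price"])
```

In this example only `sales_by_price` is marked, with row 7 queued for incremental refresh, while `customer_names` stays valid. A row-granular tracker would have had to invalidate both.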

Phase 4: Adaptive Materialization

Run a continuous controller that evaluates materialization decisions based on recent query patterns, resource availability, and cost-benefit thresholds. No offline workload analysis is required.
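A single evaluation step of such a controller might look like the sketch below: rank candidates by benefit-to-cost ratio, admit them greedily within a storage budget, and use a lower threshold for snapshots that already exist so the controller does not flap between creating and dropping. The `Candidate` type, the `plan` function, the two thresholds, and the greedy policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-projection estimates over one evaluation window."""
    name: str
    benefit: float   # estimated I/O saved
    cost: float      # estimated build plus upkeep, assumed > 0
    size_bytes: int


def plan(candidates: Iterable[Candidate],
         materialized: Set[str],
         budget_bytes: int,
         create_ratio: float = 2.0,
         drop_ratio: float = 1.0) -> Tuple[Set[str], Set[str]]:
    """Return (to_create, to_drop) for one controller step.

    Existing snapshots are kept while their ratio stays above drop_ratio;
    new ones need the stricter create_ratio (hysteresis). Materialized
    snapshots with no current candidate entry are dropped.
    """
    keep: Set[str] = set()
    used = 0
    ranked = sorted(candidates, key=lambda c: c.benefit / c.cost, reverse=True)
    for c in ranked:
        threshold = drop_ratio if c.name in materialized else create_ratio
        fits = used + c.size_bytes <= budget_bytes
        if c.benefit / c.cost >= threshold and fits:
            keep.add(c.name)
            used += c.size_bytes
    return keep - materialized, materialized - keep


to_create, to_drop = plan(
    candidates=[
        Candidate("A", benefit=300.0, cost=100.0, size_bytes=40),
        Candidate("B", benefit=150.0, cost=100.0, size_bytes=40),
        Candidate("C", benefit=180.0, cost=100.0, size_bytes=40),
    ],
    materialized={"B"},
    budget_bytes=100,
)
```

With these numbers the step creates `A` (ratio 3.0), keeps the existing `B` (ratio 1.5, above the drop threshold), and skips `C` (ratio 1.8, below the create threshold). Running `plan` periodically on fresh estimates is what would make the decision online rather than dependent on offline workload analysis.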

🧪 Hypotheses

H1

Query pattern analysis can predict which column projections offer the highest benefit-to-cost ratio for materialization.

H2

Dependency tracking can keep column snapshots consistent at lower overhead than full invalidation.

H3

Adaptive materialization decisions can be made online without offline workload analysis, adapting to changing access patterns.

🔗 SkeinDB Integration

ValueID Store
Dependency Tracking
LSM / Compaction
SkeinQL RPC
Column Snapshots

📚 Key References