Research Backlog (Adapted from January 2026 agenda)
This backlog turns the 20 research proposals in `docs/RESEARCH_AGENDA.md` into Codex-friendly, PR-sized tasks, mapped onto the existing Phase A–G build plan and the current `docs/PROJECT_BACKLOG.md` task numbering.
Notes:
- These items are research-oriented: the goal is to make each direction implementable and measurable.
- Tasks are designed to be optional and do not block core MySQL compatibility.
Reality sync (2026-05-27)
This file is the research task inventory. It is not the best place to read current maturity at a glance.
- Runtime truth: all R01-R20 tracks have executable coverage in code, methods, tests, or benchmark scaffolds.
- Current maturity split: R01-R17 and R20 are hardened; R18 and R19 remain prototype-implemented.
- Checklist state: 109 done / 0 open research checkboxes. The checklist is complete, but R18 and R19 are still prototype-strength in runtime maturity.
- Current source of truth for implemented-vs-partial status: `docs/TRUE_STATUS_MATRIX.md`.
Current partial research areas
| Track | Truth today | Remaining gap |
|---|---|---|
| R18 Perf regression replay | Replay bundles can carry performance profiles, deterministic replay can rehydrate cache hints, and variance reports exist. | Timing injection, stronger cache/LSM reconstruction fidelity, and CI distribution gates remain open. |
| R19 Wasm query operators | `wasm.plan.compile`/`run`/`inspect`/`perf_report`/`edge_package` exist with generated fixed-width artifacts and host fallback. | Production SIMD-lowered codegen and broader hardened operator coverage are not yet claimed. |
Recent verified closures
- 2026-05-17: R02 adaptive row/column hardening was finished across T103-T106, covering live-row-count cost modeling, hot-projection tracking, dependency-driven snapshot invalidation, and adaptive replacement decisions.
- 2026-05-17: R14 bounded-staleness bundle windows were closed with retained coverage-gap detection in `edge.bundle.status`.
- 2026-05-17: R16 workload-shift evaluation was closed with phased convergence reporting in `advisor.evaluate`.
- 2026-05-17: R18 deterministic replay reconstruction was closed with replay cache-hint rehydration and normalized replay-run checksums.
Use the task sections below for the detailed per-track inventory and evidence notes.
Mapping table (proposal → repo)
| ID | Proposal | Priority (agenda) | Primary repo specs / files | Backlog tasks |
|---|---|---|---|---|
| 1 | Learned Index Structures for ValueID Lookup | P2 | `docs/research_agenda/R01_*` | Phase 24 (T230–T235) |
| 2 | Adaptive Row-Column Hybrid Execution | — | `docs/COLUMN_SNAPSHOTS.md` + `docs/research_agenda/R02_*` | Extend Phase 10 (T103–T106) |
| 3 | Delta-Chain Topology Optimization | P1 | `docs/DELTA_VALUES.md` + `docs/research_agenda/R03_*` | Extend Phase 7 (T073–T076) |
| 4 | Differentially Private Aggregate Queries | P0 | `docs/research_agenda/R04_*` | Phase 25 (T240–T246) |
| 5 | Oblivious Query Execution | — | `docs/research_agenda/R05_*` | Phase 26 (T250–T256) |
| 6 | Forensic Query Language for Hash-Chained WAL | P1 | `docs/AUDIT_WAL.md` + `docs/research_agenda/R06_*` | Phase 27 (T260–T266) |
| 7 | Optimistic Concurrency with Client-Side Merge Functions | P1 | `docs/WASM_UDFS.md` + `docs/ETAG_VALIDATORS.md` + `docs/research_agenda/R07_*` | Phase 28 (T270–T276) |
| 8 | Incremental View Maintenance via Dependency Graphs | P0 | `docs/ETAG_VALIDATORS.md` + `docs/CDC_CHANGEFEED.md` + `docs/research_agenda/R08_*` | Phase 29 (T280–T287) |
| 9 | HTTP/3 and QUIC-Native DB Protocol | P2 | `docs/research_agenda/R09_*` | Phase 30 (T290–T295) |
| 10 | Vector Embeddings as First-Class ValueIDs | P0 | `docs/research_agenda/R10_*` | Phase 31 (T300–T307) |
| 11 | LLM-Assisted Query Autoparameterization | — | `docs/AUTOPARAMETERIZATION.md` + `docs/research_agenda/R11_*` | Extend Phase 22 (T215–T218) |
| 12 | Natural Language to SkeinQL with Verification | — | `docs/research_agenda/R12_*` | Phase 32 (T310–T316) |
| 13 | Causal Consistency via ETag Chains | P0 | `docs/ETAG_VALIDATORS.md` + `docs/research_agenda/R13_*` | Extend Phase 6 (T064–T067) |
| 14 | Geo-Distributed Replay Bundles for Edge Caching | P2 | `docs/TIME_TRAVEL_REPLAY.md` + `docs/research_agenda/R14_*` | Extend Phase 19 (T185–T188) |
| 15 | Conflict-Free Schema Evolution | — | `docs/research_agenda/R15_*` | Phase 33 (T320–T326) |
| 16 | Automatic Index Synthesis from Dependency Analysis | P1 | `docs/INDEX_ADVISOR.md` + `docs/research_agenda/R16_*` | Extend Phase 18 (T175–T179) |
| 17 | Query Intent Inference for Compatibility Migration | — | `docs/TELEMETRY_AND_MIGRATION.md` + `docs/research_agenda/R17_*` | Extend Phase 11 (T114–T118) |
| 18 | Reproducible Performance Regression Testing | — | `docs/TIME_TRAVEL_REPLAY.md` + `docs/research_agenda/R18_*` | Extend Phase 19 (T189) |
| 19 | WebAssembly-Native Query Operators | P2 | `docs/WASM_UDFS.md` + `docs/research_agenda/R19_*` | Extend Phase 8 (T084–T087) |
| 20 | Energy-Aware Compaction Scheduling | — | `docs/COMPACTION_SCHEDULER.md` + `docs/research_agenda/R20_*` | Extend Phase 21 (T204–T207) |
Task definitions (new additions)
Phase 24 — Learned indexes for ValueID lookup (R01)
- [x] T230: Instrument ValueID lookup distribution + export histograms
- [x] T231: Prototype learned model index (offline build) with fallback structure
- [x] T232: Integrate hybrid learned+fallback lookup into ValueStore read path (feature flag). Evidence: `ValueStoreConfig.enable_learned_index`, `ValueStore::get_with_trace`, and `learned_index_lookup_hits`/`learned_index_falls_back_for_new_keys`.
- [x] T233: Compaction-time model refresh policy + correctness tests. Evidence: `ModelRefreshPolicy`, `ValueStore::should_refresh`, `maybe_refresh`, `refresh_learned_index`, insert-triggered refresh checks, and `distribution_shift_triggers_refresh`.
- [x] T234: Benchmark harness: lookup p50/p99/p99.9 + memory overhead. Evidence: `ValueStore::benchmark()` and `benchmark_reports_quantiles`.
- [x] T235: Distribution shift tests + graceful degradation. Evidence: `ValueIdLookupDistribution::model_shift_l1`, `learned_index_falls_back_for_new_keys`, and `distribution_shift_triggers_refresh`.
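Below is a minimal sketch of the hybrid learned+fallback lookup idea (T231–T235). It assumes a sorted array of ValueIDs, a single least-squares linear model, and a recorded worst-case error bound. `LearnedIndex`, `build`, `lookup`, and the `overflow` map are hypothetical names, not the `ValueStore` API.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass
class LearnedIndex:
    """Hypothetical hybrid index: a linear model guesses a key's slot in a
    sorted array, and a binary search bounded by the model's worst-case error
    corrects the guess. Keys added after the last build go to a fallback map."""

    keys: list[int]                  # sorted ValueIDs captured at build time
    slope: float = 0.0
    intercept: float = 0.0
    max_err: int = 0
    overflow: dict[int, int] = field(default_factory=dict)

    def build(self) -> None:
        """Offline least-squares fit of array position against key."""
        n = len(self.keys)
        if n < 2:
            self.slope, self.intercept, self.max_err = 0.0, 0.0, n
            return
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst observed error bounds every search window, keeping lookups exact.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key: int) -> int:
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key: int) -> int | None:
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = max(lo, min(len(self.keys), guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return self.overflow.get(key)  # fallback path for keys the model never saw
```

The search window is only as wide as `max_err`, so a skewed key distribution degrades lookup cost gracefully but never correctness. That is why a distribution-shift check can safely trigger a rebuild later instead of immediately.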
Phase 25 — Differential privacy aggregates (R04)
- [x] T240: Add SkeinQL aggregate nodes (COUNT/SUM/AVG) with explicit DP parameters (experimental). Evidence: `dp.aggregate`, `DpAggregateSpec`, COUNT/SUM/AVG result columns, explicit epsilon/delta/mechanism/principal/seed fields, and `dp_budget_consumption_and_exhaustion`/`dp_aggregate_deterministic_noise` tests.
- [x] T241: Sensitivity analysis for single-table aggregates (bounded domains). Evidence: `resolve_dp_aggregates`, bounded `DpBounds` range sensitivities for SUM/AVG/percentile, count sensitivity 1.0, privacy metadata per aggregate, and focused assertions in `dp_budget_consumption_and_exhaustion`.
- [x] T242: Privacy budget manager (per user/role) + persistence. Evidence: `dp.budget.set`, `dp.budget.get`, `DpBudgetDisk` v2 persistence in `dp_budgets.json`, refresh-window resets, RDP query counts, and restart assertions in `dp_budget_consumption_and_exhaustion`.
- [x] T243: Noise mechanisms (Laplace / Gaussian policy) + deterministic tests (seeded RNG). Evidence: `DpRng`, `dp_laplace_noise`, `dp_gaussian_noise`, mechanism validation, seeded deterministic Laplace/Gaussian coverage, and `r04_dp_rng_deterministic_and_uniform`/`r04_dp_laplace_noise_has_correct_scale`/`dp_aggregate_deterministic_noise` tests.
- [x] T244: Privacy-aware caching rules (ETag includes privacy metadata). Evidence: `privacy_etag` in the `dp.aggregate` privacy output, derived from a v1 DP validator payload containing table version, query fingerprint, epsilon/delta, mechanism, principal, seed, and budget metadata; locked by `dp_budget_consumption_and_exhaustion`.
- [x] T245: Audit log entries for DP queries (budget consumption). Evidence: `DpAuditEvent`, `dp.audit.log`, persisted `dp_audit.json`, remaining epsilon/delta budget in events, usage summaries in `dp.budget.get`, and restart assertions in `dp_budget_consumption_and_exhaustion`.
- [x] T246: Evaluation harness: accuracy vs epsilon, overhead vs baseline. Evidence: `dp.evaluate`, `DpEvaluateParams`/`DpEvaluateResult`, exact baseline rows, seeded epsilon-grid trials, mean/p95/max absolute error, mean relative error, noisy latency, overhead-vs-exact metrics, SkeinAdmin Privacy controls, and `dp_evaluate_reports_accuracy_and_overhead`/`skeinadmin_privacy_panel_exposes_dp_evaluation_harness` tests.
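The bounded-sensitivity Laplace mechanism behind T241–T243 can be sketched as follows. This is a generic textbook illustration built on a seeded `random.Random`, not the `DpRng`/`dp_laplace_noise` implementation. `PrivacyBudget` and `dp_sum` are hypothetical, and the budget uses simple pure-epsilon composition.

```python
import random
from dataclasses import dataclass


@dataclass
class PrivacyBudget:
    """Hypothetical per-principal budget using simple (pure epsilon) composition."""
    epsilon_remaining: float

    def spend(self, epsilon: float) -> None:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.epsilon_remaining:
            raise PermissionError("privacy budget exhausted")
        self.epsilon_remaining -= epsilon


def laplace_noise(rng: random.Random, scale: float) -> float:
    # The difference of two i.i.d. Exponential(rate = 1/scale) draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_sum(values, lower: float, upper: float, epsilon: float,
           budget: PrivacyBudget, seed: int) -> float:
    """Noisy SUM over values clamped to the declared domain [lower, upper]."""
    budget.spend(epsilon)  # charge the budget before producing any output
    clamped = [min(max(v, lower), upper) for v in values]
    # Under add/remove-one-row neighbours, one row moves the clamped SUM by at
    # most max(|lower|, |upper|); that bound is the L1 sensitivity.
    sensitivity = max(abs(lower), abs(upper))
    rng = random.Random(seed)  # seeded so results are reproducible in tests
    return sum(clamped) + laplace_noise(rng, sensitivity / epsilon)
```

Under the replace-one-row neighbour definition, the SUM sensitivity would instead be `upper - lower`. A COUNT has sensitivity 1 under add/remove neighbours.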
Phase 26 — Oblivious query execution (R05)
- [x] T250: Threat model doc + “obliviousness levels” policy schema. Evidence: `docs/OBLIVIOUS_EXECUTION.md`, `ObliviousPolicy`, `oblivious.policy.set`, `normalize_oblivious_policy`, and persisted `ObliviousPolicyDisk` v1.
- [x] T251: ValueStore lookup padding + dummy reads (table/column policy). Evidence: `oblivious_padding_for`, `oblivious_dummy_lookups`, `compute_oblivious_padding`, and `oblivious_scan_keeps_results`.
- [x] T252: Oblivious scan primitive (fixed-size batches, padding). Evidence: `scan_table` applies deterministic padding/shuffle before returning real rows unchanged, locked by `oblivious_policy_explain_padding` and `oblivious_scan_keeps_results`.
- [x] T253: Oblivious sort/join primitive (limited scope, research mode). Evidence: `oblivious.explain`/`oblivious.evaluate` report `materialize_then_sort_join` for padded policies and expose target/dummy access envelopes for fixed-size inputs.
- [x] T254: Leakage evaluation harness (trace-based, mutual information metrics). Evidence: `oblivious.evaluate`, `ObliviousEvaluateResult`, empirical mutual-information metrics, and engine/RPC assertions comparing padded vs unpadded traces.
- [x] T255: Performance overhead report generator. Evidence: `oblivious.evaluate` performance payload with mean/max overhead ratio, total dummy rows/lookups, total observed accesses, and integration coverage in `r05_oblivious_padding_verification`.
- [x] T256: Admin UI settings for per-table obliviousness levels. Evidence: SkeinAdmin Privacy R05 controls for level/pad/target/dummy/shuffle/trace rows, `oblEvaluate()`, and `skeinadmin_privacy_panel_exposes_dp_evaluation_harness` asset coverage.
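Padding a scan result to a fixed batch size with dummy rows and a deterministic shuffle (the idea behind T251–T252) might look like the sketch below. `PaddedRow`, `pad_and_shuffle`, and the round-up-to-a-whole-batch policy are assumptions for illustration. Real rows carry their original index, so the caller gets them back unchanged.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PaddedRow:
    index: int      # original position of a real row; -1 marks a dummy
    payload: tuple


def padded_size(n_real: int, batch: int) -> int:
    """Round the observable row count up to a whole number of batches (minimum one)."""
    return max(batch, -(-n_real // batch) * batch)


def pad_and_shuffle(rows, batch: int, seed: int, width: int) -> list[PaddedRow]:
    """Emit a fixed-size, deterministically shuffled trace of real and dummy rows."""
    target = padded_size(len(rows), batch)
    trace = [PaddedRow(i, tuple(r)) for i, r in enumerate(rows)]
    trace += [PaddedRow(-1, (None,) * width) for _ in range(target - len(rows))]
    random.Random(seed).shuffle(trace)
    return trace


def strip_dummies(trace: list[PaddedRow]) -> list[tuple]:
    """Drop dummies and restore the original order, so callers see unchanged rows."""
    real = sorted((r for r in trace if r.index >= 0), key=lambda r: r.index)
    return [r.payload for r in real]
```

An observer who sees only trace lengths learns the result size rounded up to a multiple of `batch`. That is the leakage a trace-based mutual-information harness would then quantify.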
Phase 27 — Forensic query language (R06)
- [x] T260: Define SkeinForensic query grammar (minimal) + JSON form over SkeinQL. Evidence: `ForensicQueryParams.filter`, `forensic_filter_matches`, operators `and`/`or`/`not`/`eq`/`ne`/`gt`/`ge`/`lt`/`le`/`contains`, typed-literal operands, field equality shorthand, docs in `docs/AUDIT_WAL.md`, and focused engine/RPC tests.
- [x] T261: Build verifiable WAL index (time/table/user) consistent with hash chain. Evidence: `forensic_index_summary` emits timestamp/id ranges and `by_table`, `by_op`, and `by_actor` buckets tied to the returned chain/proof; actor remains `unknown` until authenticated principal metadata is recorded.
- [x] T262: Proof format for inclusion + boundary proofs; verifier tool. Evidence: `skein.forensic.proof.v1`, boundary `preceding_hash`/`following_hash`, `forensic_merkle_root`, `forensic_merkle_proof`, per-record `inclusion_proofs`, and `forensic.verify` tamper detection.
- [x] T263: `forensic.query` SkeinQL endpoint + exportable report bundles. Evidence: JSON-RPC dispatch, capability advertising, the `skein.forensic.bundle.v1` query manifest/proof/verification export shape, and RPC roundtrip coverage.
- [x] T264: Incremental verification via checkpoint anchors. Evidence: persisted `CheckpointAnchor` records and proof fields `checkpoint_anchor`, `next_checkpoint_anchor`, and `anchor_count`, with engine coverage after `checkpoint_for_shutdown()`.
- [x] T265: Case-study harness: simulated incident timelines + proofs. Evidence: `forensic_case_study_exports_incident_timeline` covers non-contiguous filtered incident timelines, inclusion proofs, and export-bundle verification strategy.
- [x] T266: SkeinAdmin “Forensics” page (query + verify + export). Evidence: DB/table/op/id/bundle/filter controls; `readForensicParams`; proof verification that first runs the query and then calls `forensic.verify` with the returned records/start hash; export that includes bundle/filter params; and static asset coverage.
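A hash-chained WAL plus a Merkle inclusion proof (T261–T262) can be illustrated with SHA-256 from the standard library. The `0x00`/`0x01` leaf/node domain prefixes, the duplicate-the-odd-node rule, and all function names are assumptions. They do not describe the actual `skein.forensic.proof.v1` encoding.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def chain(records: list[bytes], start: bytes = b"\x00" * 32) -> list[bytes]:
    """Hash chain: each entry commits to the previous hash and its own record."""
    out, prev = [], start
    for rec in records:
        prev = h(prev + rec)
        out.append(prev)
    return out


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    level = [h(b"\x00" + leaf) for leaf in leaves]   # domain-separated leaf hashes
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]              # duplicate the odd node
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def inclusion_proof(leaves: list[bytes], index: int):
    """Return (root, [(sibling_hash, sibling_is_right), ...]) for leaves[index]."""
    levels = merkle_levels(leaves)
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        index //= 2
    return levels[-1][0], proof


def verify(leaf: bytes, proof, root: bytes) -> bool:
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_right in proof:
        node = h(b"\x01" + (node + sibling if sibling_is_right else sibling + node))
    return node == root
```

Because each chain hash folds in its predecessor, editing any record changes every later leaf. The Merkle proof then lets a verifier check one record against a published root without replaying the whole log.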
Phase 28 — Merge functions for optimistic concurrency (R07)
- [x] T270: Conflict model (write-write, constraint, dependency) + detection hooks. Evidence: `merge.apply` handles `expected_etag`, `min_causality`, primary-key mismatch, and non-null constraint failures; focused tests cover conflict and non-null rejection paths.
- [x] T271: Merge function registry (Wasm) + capability model ("values-only" access). Evidence: `merge.wasm.*`, persisted `merge_wasm_registry.json` v1, `validate_merge_wasm_policy`, and executable values-only scalar Wasm merge modules.
- [x] T272: SkeinQL `merge.register`/`merge.apply` + SQL compat hook (If-Match). Evidence: typed SkeinQL params/results, RPC dispatch/capability advertising, ETag/min-causality merge guards, and `merge_apply_wasm_policy_executes_rpc`.
- [x] T273: Offline write queue format (client SDK spec) + merge result handling. Evidence: `docs/OFFLINE_WRITE_QUEUE.md` plus `crates/skeindb-skeinql/tests/offline_queue_roundtrip.rs`.
- [x] T274: Safety tests: cancellation + deterministic merges. Evidence: `merge_apply_wasm_policy_cancels_non_terminating_module`, `merge_apply_wasm_policy_executes_values_only_module`, and `cluster_rpc.rs::r07_merge_conflict_resolution_deterministic`.
- [x] T275: Bench: conflict rate + resolution success on example workloads. Evidence: read-only `merge.evaluate` returns `skein.merge.evaluate.v1` with conflict/resolution rates, mean/p95 timing, and per-case results.
- [x] T276: SkeinAdmin "Merge rules" page. Evidence: the Merge & CRDT panel now sends typed apply/register/simulate/evaluate/Wasm payloads, and `skeinadmin_merge_panel_exposes_r07_hardening_controls` locks the controls.
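The sketch below shows an If-Match-style guard with a deterministic merge fallback, matching the conflict model in T270 and T274. The ETag derivation, the `Conflict` exception, `apply_write`, and the counter-style `counter_merge` rule are all hypothetical. In the system described above, merge functions run as registered values-only Wasm modules rather than Python callables.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when a write cannot be applied or merged safely."""


def etag(row: dict) -> str:
    # Hypothetical validator: hash of the row's canonical JSON encoding.
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()[:16]


def counter_merge(base: dict, current: dict, proposed: dict) -> dict:
    """Deterministic, values-only merge: numeric fields combine both deltas;
    other fields conflict only if both sides changed them to different values."""
    merged = dict(current)
    for key, new in proposed.items():
        old = base.get(key)
        now = current.get(key, old)
        if isinstance(new, (int, float)) and isinstance(old, (int, float)):
            merged[key] = now + (new - old)
        elif new != old:
            if now not in (old, new):
                raise Conflict(f"both sides changed field {key!r}")
            merged[key] = new
    return merged


def apply_write(store: dict, pk, proposed: dict, base: dict,
                expected_etag: str, merge=None) -> dict:
    """If-Match semantics: write directly when the ETag still matches, else merge."""
    current = store[pk]
    if etag(current) == expected_etag:
        store[pk] = proposed
    elif merge is None:
        raise Conflict("ETag mismatch and no merge function registered")
    else:
        store[pk] = merge(base, current, proposed)
    return store[pk]
```

The merge sees only the base, current, and proposed values, with no other engine state. That restriction is what keeps its outcome deterministic across replicas and retries.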
Phase 29 — Incremental view maintenance (R08)
- [x] T280: `view.create` SkeinQL method with persisted definition (SkeinIR). Evidence: `view.create`, `views.json` format v2, restart persistence, and `view_dependency_usage_persists_in_views_json`.
- [x] T281: Dependency graph extension: view → base table deps at column granularity. Evidence: dependency objects include `columns`, `projection_columns`, `predicate_columns`, and `group_by_columns`, and `view.explain_deps` traverses direct/transitive graph edges.
- [x] T282: Delta derivation for a restricted operator set (filter, project, group-by). Evidence: restricted single-table filter/project views, grouped aggregate plans for grouped columns plus `COUNT`/`SUM`/`AVG`/`MIN`/`MAX`, and grouped source-row persistence.
- [x] T283: Incremental refresh pipeline (apply deltas from CDC stream). Evidence: `refresh_view_incremental`, `refresh_grouped_view_incremental`, change-log-driven stale marking, touched-row/touched-group recompute, and restart coverage.
- [x] T284: Cost-based switch: incremental vs full recompute. Evidence: `view.refresh` `mode:"auto"`, `view_refresh_stats`, `should_refresh_view_full`, and `view_grouped_auto_refresh_prefers_full_for_wide_change_sets`.
- [x] T285: Correctness oracle: compare incremental vs recompute on random workloads. Evidence: `view.evaluate` compares cloned incremental and full refresh results, and `r08_view_correctness_oracle_random_workload_matches_full_recompute` runs a deterministic pseudo-random workload.
- [x] T286: Bench: view maintenance overhead + query speedups. Evidence: `view.evaluate` returns mean incremental/full nanoseconds, speedup-vs-full, pending changes, touched primary keys, and recommended mode.
- [x] T287: SkeinAdmin “Views” page (status, refresh, explain deps). Evidence: the Views panel now exposes refresh mode, evaluate iterations, `view.evaluate`, status, drop, dependency summaries, RPC templates, and static asset coverage.
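Incremental maintenance of a grouped COUNT/SUM view from a change stream, checked against full recompute as an oracle (T282–T285), can be sketched as follows. The `(old_row, new_row)` change shape and every function name are hypothetical. MIN/MAX, which need retained source rows, and the cost-based switch are omitted.

```python
import random
from collections import defaultdict


def full_recompute(rows: dict, group_col: str, sum_col: str) -> dict:
    view = defaultdict(lambda: [0, 0])  # group -> [count, sum]
    for row in rows.values():
        agg = view[row[group_col]]
        agg[0] += 1
        agg[1] += row[sum_col]
    return {g: tuple(a) for g, a in view.items()}


def apply_change(view: dict, old, new, group_col: str, sum_col: str) -> None:
    """Apply one CDC change given old and new row images (either may be None)."""
    for row, sign in ((old, -1), (new, +1)):
        if row is None:
            continue
        count, total = view.get(row[group_col], (0, 0))
        count, total = count + sign, total + sign * row[sum_col]
        if count == 0:
            view.pop(row[group_col], None)  # the group disappears when empty
        else:
            view[row[group_col]] = (count, total)


def oracle_check(seed: int, steps: int = 500) -> bool:
    """Random workload: the incremental view must equal a full recompute after every change."""
    rng = random.Random(seed)
    rows, view = {}, {}
    for _ in range(steps):
        pk = rng.randrange(20)
        old = rows.get(pk)
        if old is not None and rng.random() < 0.3:
            new = None
            del rows[pk]
        else:
            new = {"g": rng.choice("abc"), "v": rng.randrange(100)}
            rows[pk] = new
        apply_change(view, old, new, "g", "v")
        if view != full_recompute(rows, "g", "v"):
            return False
    return True
```

Each change touches at most two groups, so refresh cost scales with the change set rather than the table. A wide change set is the case where falling back to full recompute wins.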
Phase 30 — HTTP/3 / QUIC-native protocol (R09)
- [x] T290: Protocol sketch: SkeinQL-over-QUIC framing + stream mapping. Evidence: `docs/TRANSPORT_QUIC.md` defines length-prefixed JSON frames, one request/response per bidirectional stream, envelope metadata, and stream mapping.
- [x] T291: Implement server prototype with a QUIC library (feature-flag). Evidence: `skeindb serve --quic --quic-cert --quic-key`, Quinn integration, `transport.capabilities`, and `quic_rpc_ping_roundtrip`.
- [x] T292: Prepared query handles over QUIC streams (read-only first). Evidence: `quic_prepared_query_roundtrip` prepares a query, executes it on a new QUIC stream, and verifies result rows.
- [x] T293: 0-RTT safety rules (no writes in 0-RTT by default). Evidence: QUIC metadata `rtt:"0rtt"` is documented, and `quic_zero_rtt_rejects_write` locks the read-only guard.
- [x] T294: Bench: p99 latency under concurrency vs HTTP/2 and MySQL/TCP. Evidence: `skeindb transport-bench`, `transport_bench_parses_flags`, `latency_stats_computes_percentiles`, and `transport_bench_reports_http2_quic_and_mysql`.
- [x] T295: Connection migration test harness (simulated IP change). Evidence: `quic_connection_migration_rebind` and `r09_quic_concurrent_multi_stream_rpcs` rebind the client UDP socket and verify continued RPC success.
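Length-prefixed JSON framing (T290) and the 0-RTT write guard (T293) can be sketched over an in-memory byte buffer. The 4-byte big-endian prefix, the `method`/`rtt` envelope fields, and the `WRITE_PREFIXES` list are assumptions. They are not the exact `docs/TRANSPORT_QUIC.md` wire format.

```python
import json
import struct

WRITE_PREFIXES = ("data.", "merge.", "schema.")  # hypothetical write-method families


def encode_frame(msg: dict) -> bytes:
    body = json.dumps(msg, separators=(",", ":")).encode("utf-8")
    return struct.pack(">I", len(body)) + body  # 4-byte big-endian length prefix


def decode_frames(buf: bytes) -> tuple[list[dict], bytes]:
    """Decode every complete frame; return (messages, leftover partial bytes)."""
    msgs, off = [], 0
    while len(buf) - off >= 4:
        (n,) = struct.unpack_from(">I", buf, off)
        if len(buf) - off - 4 < n:
            break  # frame incomplete: wait for more stream data
        msgs.append(json.loads(buf[off + 4 : off + 4 + n]))
        off += 4 + n
    return msgs, buf[off:]


def admit(msg: dict) -> bool:
    """0-RTT guard: early-data requests may only call read methods."""
    if msg.get("rtt") == "0rtt":
        return not msg.get("method", "").startswith(WRITE_PREFIXES)
    return True
```

0-RTT data can be replayed by an attacker, so refusing non-idempotent methods on early data is the conservative default.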
Phase 31 — Vector embeddings as first-class ValueIDs (R10)
- [x] T300: Add `ValueKind::Embedding` and typed literal support in SkeinQL. Evidence: `ValueKind::Embedding`, `Lit::Embedding { dims, v, model }`, `embedding_literal_roundtrip`, and embedding persistence through `value_store_item`.
- [x] T301: LSH bucket + content hash ValueID scheme (exact + approximate id). Evidence: `embedding_lsh_bucket`, `embedding_value_id`, `value_store_item` using `ValueKind::Embedding`, and `embedding_value_id_combines_lsh_bucket_and_content_hash`.
- [x] T302: ANN search operator (bucket filter + distance refine) (baseline). Evidence: `vector.search`, HNSW `HnswIndex`, LSH fallback/refinement, `r10_hnsw_index_basic`, `vector_prefilter_candidates_match_bucket`, and `quic_vector_search_roundtrip`.
- [x] T303: Hybrid query: filter predicates + ANN order-by. Evidence: vector filters in `vector.search`, `vector.cosine`/`vector.dot`/`vector.l2` order expressions, and `query_select_orders_by_vector_similarity`.
- [x] T304: Dependency tracking for embedding-derived queries (invalidate on source change). Evidence: `vector.search.cache` returns table-version dependencies, ETags, `not_modified`, and V2 causality tokens; ordinary `data.*` writes invalidate stale HNSW graphs; `vector.insert` emits prepared-query SSE invalidations; covered by `vector_search_cache_metadata_tracks_source_table_changes` and `vector_search_cache_invalidates_after_vector_insert_rpc`.
- [x] T305: Bench harness: recall/latency vs baseline index. Evidence: `vector.benchmark`, `VectorBenchmarkResult` latency percentiles, exact-vs-HNSW recall@k, `vector_benchmark_reports_recall_and_latency`, and `quic_vector_search_roundtrip` live RPC coverage.
- [x] T306: Example app: small RAG retrieval pipeline. Evidence: `samples/vector_rag_pipeline.py` seeds deterministic chunks, uses `vector.insert`/`vector.search`, assembles a grounded prompt, and exposes a no-server `--self-test`; `docs/tutorials/vector-rag.md`, the generated docs guide page, and `sample_assets.rs`/`docs_site_assets.rs` lock the tutorial and sample flow.
- [x] T307: SkeinAdmin “Embeddings” page (index status + query playground). Evidence: the Vector Search panel exposes DB/table/column/vector/PK/filter controls plus Search/Benchmark/Insert/Index Status actions, and `skeinadmin_vectors_panel_exposes_embedding_query_playground` locks the UI wiring.
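A ValueID that combines a random-hyperplane (SimHash) LSH bucket with an exact content hash (T301) can be sketched as below. The 16-bit bucket plus 48-bit hash layout, the float32 encoding, and the function names are assumptions for illustration. The real `embedding_value_id` layout is not described in this backlog.

```python
import hashlib
import random
import struct

BUCKET_BITS = 16  # hypothetical layout: 16-bit approximate id + 48-bit content hash


def hyperplanes(dims: int, bits: int = BUCKET_BITS, seed: int = 0) -> list[list[float]]:
    if not 0 < bits <= BUCKET_BITS:
        raise ValueError("bucket must fit in the reserved high bits")
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(bits)]


def lsh_bucket(vec: list[float], planes: list[list[float]]) -> int:
    """SimHash: one bit per random hyperplane, set when vec is on its non-negative side."""
    bucket = 0
    for i, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0:
            bucket |= 1 << i
    return bucket


def embedding_value_id(vec: list[float], planes: list[list[float]], model: str) -> int:
    """High bits: LSH bucket (approximate id). Low 48 bits: exact content hash."""
    payload = model.encode("utf-8") + struct.pack(f"<{len(vec)}f", *vec)
    content = int.from_bytes(hashlib.sha256(payload).digest()[:6], "big")
    return (lsh_bucket(vec, planes) << 48) | content
```

Vectors with a small angle between them tend to share high bits, so `id >> 48` can act as a cheap ANN prefilter. The low bits keep identical embeddings deduplicated exactly.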
Phase 32 — Natural language to SkeinQL with verification (R12)
- [x] T310: NL→SkeinQL prompt+schema packaging format (offline first). Evidence: `AiNlPromptPackage`, `build_nl_prompt`, `ai.nl.translate`, and `ai_nl_translate_packages_schema`/`ai_nl_translate_explain_execute_roundtrip`.
- [x] T311: Query explanation generator from dependency sets + planner info. Evidence: `ai.nl.explain`, `dependencies_for_query`, `ai_nl_preview`, and explanation fields for tables/projection/filters/order/limit/deps.
- [x] T312: Verification UI flow: explanation + sample rows + approval gate. Evidence: SkeinAdmin NL Lab query JSON / preview / approval-token controls plus `ai_nl_translate_explain_execute_rpc_roundtrip`.
- [x] T313: Safety policy: forbid writes unless explicit confirmation token. Evidence: the SkeinQL `Query` shape is read-query-only for this surface, `ai.nl.explain` only accepts SELECT, `ai.nl.execute` recomputes the approval token from query+args+deps, and tampered-query execution is rejected in `ai_nl_translate_explain_execute_roundtrip`.
- [x] T314: Evaluation harness: adapted text-to-SQL benchmarks (execution match). Evidence: `skeindb nl-eval`, `NlEvalReport.execution_matches`, `eval_examples_exact_and_exec_match`, and `eval_examples_uses_rule_translation_for_execution_match`.
- [x] T315: Iterative refinement protocol (user feedback loop). Evidence: prompt packages include a stable `fingerprint`, editable generated query JSON, explicit args, preview re-explain, and reapproval before execution in SkeinAdmin.
- [x] T316: SkeinAdmin “NL Query” page (experimental). Evidence: the NL Lab (R11-R12) panel wires `ai.nl.translate`, `ai.nl.explain`, and `ai.nl.execute` with approval-token-gated execution.
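Binding an approval token to the exact query, arguments, and dependency set (T313) can be sketched with an HMAC. The server-side secret, the canonical JSON encoding, the `op` field, and the function names are assumptions. The backlog states only that `ai.nl.execute` recomputes the token from query+args+deps.

```python
import hashlib
import hmac
import json


def approval_token(secret: bytes, query: dict, args: list, deps: list[str]) -> str:
    canonical = json.dumps(
        {"query": query, "args": args, "deps": sorted(deps)},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return hmac.new(secret, canonical, hashlib.sha256).hexdigest()


def execute_if_approved(secret: bytes, query: dict, args: list, deps: list[str],
                        token: str, run):
    """Recompute the token server-side; any edit to query/args/deps invalidates it."""
    if query.get("op") != "select":
        raise PermissionError("NL surface is read-only")
    expected = approval_token(secret, query, args, deps)
    if not hmac.compare_digest(expected, token):
        raise PermissionError("query changed since approval; re-explain and re-approve")
    return run(query, args)
```

`hmac.compare_digest` avoids leaking how many leading characters matched. Sorting `deps` makes the token independent of dependency discovery order.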
Phase 33 — Conflict-free schema evolution (R15)
- [x] T320: Schema version tagging in MVCC row versions. Evidence: `RowEntry.schema_version`/`RowEntryDisk.schema_version`, v3 table-row payloads with legacy v2 normalization on load, merge/apply row stamping, and `schema_version_tags_row_entries_and_normalizes_legacy_rows`.
- [x] T321: Concurrent schema changes protocol (add column/index) + conflict detection
- [x] T322: Query execution across schema heterogeneity (safe conversions). Evidence: schema-aware row materialization now adapts heterogeneous legacy rows via MySQL-compatible column defaults or `NULL` across batch/non-batch selects, keyed reads, and joins, plus `query_select_adapts_legacy_rows_to_schema_defaults`.
- [x] T323: Schema merge algorithm + roll-forward/rollback rules. Evidence: `schema.apply_merge` now rolls forward eligible proposals, marks deterministic losers `rejected`, returns `rolled_back` conflict details, and clears resolved losers from later `schema.merge_status` output; covered by `schema_evolution_merges_concurrent_column_and_index_changes` and `r15_schema_evolution_concurrent_column_and_index_changes`.
- [x] T324: Migration assistant: show divergence + propose resolution. Evidence: `schema.merge_status` now returns a structured `resolution` plan with `roll_forward`, `rollback`, and `wait` actions plus caller-facing suggestions; covered by `schema_merge_status_proposes_resolution_actions` and `r15_schema_evolution_concurrent_column_and_index_changes`.
- [x] T325: Rolling-deploy simulation harness. Evidence: `schema.simulate_rollout` now projects prepare/mixed/steady-state rollout stages, per-wave upgraded/legacy node counts, and legacy-row adaptation notes; covered by `schema_simulate_rollout_reports_mixed_version_waves` and `r15_schema_evolution_concurrent_column_and_index_changes`.
- [x] T326: SkeinAdmin “Schema evolution” page. Evidence: the Schema panel now exposes typed `schema.propose_change`, `schema.merge_status`, `schema.simulate_rollout`, and `schema.apply_merge` controls, plus focused asset coverage in `skeinadmin_schema_panel_exposes_r15_evolution_controls`.
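Materializing a row written under an older schema version (T320, T322) can be illustrated by filling columns newer than the row with their declared default or `NULL`. The `Column` shape, its `added_in` field, and `adapt_row` are hypothetical, and type conversions are omitted.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Column:
    name: str
    added_in: int                   # schema version that introduced the column
    default: Optional[Any] = None   # None models SQL NULL


def adapt_row(row: dict, row_version: int, schema: list[Column]) -> dict:
    """Project a stored row onto the current schema using its version tag."""
    out = {}
    for col in schema:
        if col.name in row:
            out[col.name] = row[col.name]
        elif col.added_in > row_version:
            out[col.name] = col.default  # column is newer than the row: default or NULL
        else:
            raise KeyError(f"row at v{row_version} is missing column {col.name!r}")
    return out
```

The per-row version tag is what distinguishes "column added later, use the default" from "data is missing". Without it the two cases look identical.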
Extensions to existing phases (inline patches)
The following additions extend existing phases in `docs/PROJECT_BACKLOG.md`.
Extend Phase 6 — Causal ETag chains (R13)
- [x] T064: Define causal ETag format (compressed dependencies / vector-clock hybrid). Evidence: the runtime emits `vector_clock_v2` tokens, `CAUSALITY_FORMAT_V2` is covered by `r13_vector_clock_causality`, and the wire format is documented in `docs/ETAG_VALIDATORS.md`/`docs/SKEINQL.md`.
- [x] T065: `min_causality` request field + response causality propagation rules. Evidence: `query.select`, `query.execute_prepared`, `merge.apply`, and `vector.search` accept `min_causality` and return `causality`; covered by `query_select_min_causality_enforced` and `query_execute_prepared_honors_causal_cache_validators`.
- [x] T066: Replication propagates causality metadata (no total order required). Evidence: replicated writes ship `x-skeindb-replication-causality`, replicas merge the applied watermark into `cluster.status`/`stats.snapshot.cluster.replication`, and `replicated_writes_include_causality_header` plus `cluster_replication_ships_schema_and_rows` lock the behavior.
- [x] T067: Cache interaction tests (If-None-Match with causal validators). Evidence: `query_select_min_causality_enforced` and `query_execute_prepared_honors_causal_cache_validators` verify `if_none_match` + `min_causality` on satisfied and unsatisfied dependency floors.
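A `min_causality` check over vector-clock tokens (T064–T066) reduces to a component-wise dominance test, and replicas merge clocks with a component-wise maximum. This sketch uses plain dicts keyed by node id. `satisfies`, `merge_clocks`, and `serve` are hypothetical, and the compressed `vector_clock_v2` encoding is not modeled.

```python
def merge_clocks(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Join of two vector clocks: component-wise maximum (applied on replicated writes)."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def satisfies(observed: dict[str, int], floor: dict[str, int]) -> bool:
    """True if the replica's clock dominates the client's min_causality floor."""
    return all(observed.get(node, 0) >= counter for node, counter in floor.items())


def serve(replica_clock: dict[str, int], min_causality: dict[str, int], run_query):
    if not satisfies(replica_clock, min_causality):
        raise TimeoutError("replica is behind the causal floor; retry or route elsewhere")
    result = run_query()
    # Return the replica clock so the client can send it as its next floor.
    return result, dict(replica_clock)
```

No total order is needed: a replica only has to have seen at least what the client has seen, node by node.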
Extend Phase 7 — Delta topology optimization (R03)
- [x] T073: Implement periodic full snapshots for deltas (K-depth policy). Evidence: `DeltaPolicy.snapshot_interval`, `ValueStore::put_with_delta`, and `delta_snapshot_interval_enforces_raw`.
- [x] T074: Skip-pointer (skip-list) delta chain encoding. Evidence: `SkipPatch`, geometric `build_skip_patches`, `materialize_with_trace`, and `skip_patches_reduce_steps`.
- [x] T075: Compaction-time topology restructuring policy. Evidence: `ValueStore::compact_deltas`, `DeltaCompactionReport`, and `delta_compaction_rewrites_deep_chains`.
- [x] T076: Bench: reconstruction latency vs write amplification. Evidence: `ValueStore::delta_benchmark()` p50/p99/p99.9 steps, topology byte/savings metrics, and `topology_analysis()` depth/fanout reports.
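A small model of a version chain shows why a K-depth snapshot policy (T073) bounds reconstruction cost. Version `v` is rebuilt from the nearest full snapshot at or below `v`, so at most K-1 patches are replayed. The integer-valued `DeltaChain` below is hypothetical, and skip patches (T074) are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaChain:
    """Hypothetical integer-valued version chain with a K-depth snapshot policy."""
    k: int
    raw: dict = field(default_factory=dict)     # version -> full value
    deltas: dict = field(default_factory=dict)  # version -> difference from version - 1
    latest: int = -1

    def put(self, value: int) -> None:
        v = self.latest + 1
        if v % self.k == 0:
            self.raw[v] = value                          # periodic full snapshot
        else:
            self.deltas[v] = value - self.get(v - 1)[0]  # store only the patch
        self.latest = v

    def get(self, v: int) -> tuple[int, int]:
        """Return (value, patches_applied) for version v."""
        base = v - v % self.k  # nearest snapshot at or below v
        value, steps = self.raw[base], 0
        for u in range(base + 1, v + 1):
            value += self.deltas[u]
            steps += 1
        return value, steps
```

Raising `k` stores fewer full values (less write amplification) at the price of longer replays. That is the trade-off the T076 benchmark measures.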
Extend Phase 8 — Wasm query operators (R19)
- [x] T084: Wasm operator ABI (columnar batches) + data interchange format. Evidence: `docs/WASM_OPERATORS.md`, `wasm.plan.compile`/`run`, `wasm_batch_v1` result format, `wasm.plan.inspect`, and `engine::tests::wasm_plan_compile_and_run`.
- [x] T085: Compile a restricted plan subset to Wasm (filter/project). Evidence: fixed-width non-null `u64`/`bool` filter/project plans now emit `generated_filter_project_v1` artifacts with embedded module metadata, `wasm.plan.run` executes those artifacts through Wasmtime while unsupported plans fall back to `host_interpreted_v1`, and focused coverage lives in `engine::tests::wasm_plan_compile_and_run`, `engine::tests::wasm_plan_compile_falls_back_for_unsupported_types`, `server::tests::wasm_plan_compile_run_rpc`, `server::tests::wasm_plan_run_batch_rpc`, and `tests/quic_rpc.rs::quic_wasm_plan_run_batch`.
- [x] T086: Wasm SIMD exploration + perf tests. Evidence: `wasm.plan.perf_report`, `WasmPlanPerfReportResult`, latency min/p50/p95/p99/mean stats for host and generated execution, output-parity checks, explicit `WasmPlanSimdExploration` candidate/enabled/notes fields, and focused engine/RPC tests. This is an exploration baseline; `supports_simd` remains false until a future production SIMD-lowered codegen path exists.
- [x] T087: Edge runtime packaging (ship plan artifact). Evidence: `wasm.plan.edge_package` now emits `standalone_execution` metadata and a packaged JavaScript runner with `runSkeinWasmPlanEdge` for generated artifacts, local `skein.wasm.batch.v1` encode/decode helpers, `WebAssembly.instantiate` execution for embedded `generated_filter_project_v1` modules, and `runSkeinWasmPlanHost` fallback for `host_interpreted_v1` artifacts.
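The compile-or-fallback decision in T085 can be sketched as a plain planning function. The rule is to generate a module only for fixed-width, non-null `u64`/`bool` filter/project plans and to use host interpretation otherwise. The `ColumnType` shape, `choose_artifact`, and `run_host` are hypothetical. Only the two artifact kind names come from the evidence above, and no Wasm is generated here.

```python
from dataclasses import dataclass

GENERATED = "generated_filter_project_v1"
HOST = "host_interpreted_v1"
FIXED_WIDTH = {"u64", "bool"}
SUPPORTED_OPS = {"filter", "project"}


@dataclass(frozen=True)
class ColumnType:
    name: str
    ty: str
    nullable: bool


def choose_artifact(ops: list[str], columns: list[ColumnType]) -> str:
    """Use generated code only when every operator and column is in the supported subset."""
    if any(op not in SUPPORTED_OPS for op in ops):
        return HOST
    if any(c.ty not in FIXED_WIDTH or c.nullable for c in columns):
        return HOST
    return GENERATED


def run_host(rows, predicate, projection: list[int]) -> list[tuple]:
    """Host-interpreted fallback: the semantics a generated module must reproduce."""
    return [tuple(row[i] for i in projection) for row in rows if predicate(row)]
```

Keeping the host path as the reference semantics is what makes output-parity checks between host and generated execution meaningful.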
Extend Phase 10 — Adaptive row/column execution (R02)
- [x] T103: Column snapshot cost model (build vs benefit). Evidence: `SnapshotManager::next_plan` already compares snapshot build cost against projected row-scan savings, and `observe_and_plan_snapshot` now feeds live table row counts so selective probes do not underestimate full snapshot build cost. Covered by `engine::tests::snapshot_cost_model_uses_live_table_rows_for_selective_queries`.
- [x] T104: Query pattern detector for hot projections. Evidence: `QueryPatternTracker` now retains a bounded per-table hot set of normalized column patterns and only replaces colder projections when a hotter candidate arrives. Covered by `engine::tests::query_pattern_tracker_retains_hot_projections` and `engine::tests::pattern_tracker_and_cost_model`.
- [x] T105: Dependency-driven incremental refresh/invalidation for snapshots. Evidence: schema-version bumps now preserve unaffected snapshots, column renames rewrite dependent snapshot/pattern metadata, and column drops invalidate only the dependent snapshots/patterns. Covered by `engine::tests::snapshot_dependencies_preserve_unrelated_snapshot_on_drop_column`.
- [x] T106: Adaptive controller (online materialization decisions). Evidence: `SnapshotManager::next_plan` now compares candidate materializations against the best active covering snapshot instead of treating every covering snapshot as a permanent blocker, allowing narrower hot projections to replace broader ones when the online benefit stays positive. Covered by `engine::tests::adaptive_snapshot_controller_replaces_broad_covering_snapshot_when_shifted`.
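The build-vs-benefit comparison behind T103 and T106 can be written as a simple inequality. Materialize a column snapshot when the expected scan savings over a planning horizon exceed the one-off build cost. The cost constants, horizon, and names are assumptions. The sketch mainly shows why the live table row count, not the rows touched by a selective probe, must drive the build cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    row_scan_cost: float = 1.0       # cost to read one row via the row store
    column_scan_cost: float = 0.2    # cost to read one row's projected values from a snapshot
    build_cost_per_row: float = 3.0  # cost to materialize one row into the snapshot


def snapshot_net_benefit(live_rows: int, rows_touched_per_query: int,
                         expected_queries: int, p: CostParams = CostParams()) -> float:
    """A positive result means building the snapshot is expected to pay off."""
    build = p.build_cost_per_row * live_rows  # the whole table must be materialized
    per_query_saving = (p.row_scan_cost - p.column_scan_cost) * rows_touched_per_query
    return expected_queries * per_query_saving - build


def should_replace(candidate_benefit: float, active_benefit: float) -> bool:
    """Adaptive controller: swap snapshots only when the candidate is strictly better."""
    return candidate_benefit > max(active_benefit, 0.0)
```

With 1,000,000 live rows and 1,000 probes touching 100 rows each, the build cost dominates and the benefit is negative. Passing 100 as the table size would wrongly make it positive.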
Extend Phase 11 — Intent inference for migration (R17)
- [x] T114: Pattern library for common MySQL idioms (pagination, polling, soft deletes). Evidence: `detect_migration_intents`, `detect_pagination_signal`, `detect_polling_signal`, `detect_soft_delete_signal`, hierarchy/EXISTS/COALESCE detectors, and focused `migration_intent_report_*` tests.
- [x] T115: Sequence-level intent detection (multi-query patterns). Evidence: `detect_polling_signal`, increasing-value correlation in `polling_values`, persisted `intent_history`, `window_ms` filtering, and `migration_intent_report_detects_polling_and_soft_delete`.
- [x] T116: Intent → SkeinQL mapping (cursor API, CDC subscribe, etc.). Evidence: `rewrite_preview_from_suggestion`, `rewrite_snippets_for_intent`, `migration.rewrite_preview`, and rewrite tests for pagination, EXISTS, self-join hierarchy, and recursive CTEs.
- [x] T117: SkeinAdmin “Migration assistant” page. Evidence: the Migration (R17) panel wires `migration.intent_report`, `migration.rewrite_preview`, and `migration.report_export`, renders rewrite cards, and exports migration reports from `web/skeinadmin/src/main.js`.
- [x] T118: Offline report exporter (JSON + markdown). Evidence: `migration.report_export`, `MigrationReportExportResult.report_json`, Markdown rendering in `migration_report_markdown`, and `migration_report_export_contains_json_and_markdown`.
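Two of the T114/T115 signals can be approximated with very little code. The first is OFFSET pagination, detected from statement text. The second is polling: the same query fingerprint re-issued with a strictly increasing bound value. The regexes, the minimum repeat count, and the function names are assumptions, not the `detect_*_signal` implementations.

```python
import re

OFFSET_RE = re.compile(r"\bLIMIT\s+\d+\s+OFFSET\s+\d+|\bLIMIT\s+\d+\s*,\s*\d+", re.IGNORECASE)
NUMBER_RE = re.compile(r"\b\d+\b")


def is_offset_pagination(sql: str) -> bool:
    return bool(OFFSET_RE.search(sql))


def fingerprint(sql: str) -> str:
    """Normalize whitespace/case and replace numeric literals with '?'."""
    return NUMBER_RE.sub("?", " ".join(sql.split()).lower())


def is_polling(history: list[tuple[int, str, int]], min_repeats: int = 3) -> bool:
    """history holds (timestamp_ms, sql, bound_value); polling = same fingerprint, rising bound."""
    by_fp: dict[str, list[int]] = {}
    for _, sql, bound in sorted(history):
        by_fp.setdefault(fingerprint(sql), []).append(bound)
    return any(
        len(bounds) >= min_repeats and all(x < y for x, y in zip(bounds, bounds[1:]))
        for bounds in by_fp.values()
    )
```

Once detected, a polling loop of this shape is a natural candidate for a CDC subscription rewrite. OFFSET pagination maps to a keyset cursor.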
Extend Phase 18 — Index synthesis from dependency analysis (R16)
- [x] T175: Dependency capture: predicate columns + range shapes + order-by needs. Evidence: `AdvisorIndexDependencyCapture` is returned on `advisor.index_synthesize` suggestions with predicate/equality/range/order/group/join/projection columns and `range_shape`; covered by `index_advisor_synthesizes_candidate` and `index_advisor_extracts_group_and_like`.
- [x] T176: Candidate generator for covering + composite indexes (from deps). Evidence: `advisor_candidate_columns_from_dependency` builds equality/join/range/group/order composite keys, `advisor_covering_include_from_dependency` emits projection-derived include columns, and `index_advisor_generates_composite_and_covering_from_dependencies` locks the behavior.
- [x] T177: Cost/benefit model includes write overhead + compaction overhead. Evidence: `AdvisorIndexCostEstimate` reports read benefit, write overhead, compaction overhead, write pressure, key/include width, and net score; covered by `index_advisor_cost_model_includes_write_and_compaction_overhead`.
- [x] T178: Index retirement (unused) + safety rules. Evidence: `advisor.retire_unused`, dry-run/stale-signal safety, action `retire` history entries, latest-action suppression rules, and retirement tests for inactive vs recently used advisor indexes.
- [x] T179: Evaluation harness: adaptation after workload shifts. Evidence: `advisor.evaluate` returns `skein.advisor.evaluate.v1` with phased top-suggestion convergence metrics; covered by `advisor_evaluate_reports_shift_convergence`, `advisor_evaluate_roundtrip_reports_shift_convergence`, and `r16_index_advisor_evaluate_reports_shift_convergence`.
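Deriving a composite key and a covering include list from captured dependencies (T175–T176) often follows a common B-tree heuristic. Equality and join columns come first, then either one range column or, when there is no range, the order-by columns, and remaining projected columns become includes. `DependencyCapture` and `candidate_index` are simplified, hypothetical stand-ins. This heuristic is not necessarily the exact `advisor_candidate_columns_from_dependency` logic.

```python
from dataclasses import dataclass, field


@dataclass
class DependencyCapture:
    """Simplified, hypothetical stand-in for a captured query dependency record."""
    table: str
    equality: list[str] = field(default_factory=list)
    join: list[str] = field(default_factory=list)
    ranges: list[str] = field(default_factory=list)
    order_by: list[str] = field(default_factory=list)
    projection: list[str] = field(default_factory=list)


def candidate_index(dep: DependencyCapture) -> tuple[list[str], list[str]]:
    """Return (key_columns, include_columns) for one suggested composite index."""
    key: list[str] = []

    def push(cols: list[str]) -> None:
        for c in cols:
            if c not in key:
                key.append(c)

    push(dep.equality)         # point predicates first: they form the seek prefix
    push(dep.join)
    if dep.ranges:
        push(dep.ranges[:1])   # one range column ends the usable seek prefix
    else:
        push(dep.order_by)     # with no range, the key can also deliver sort order
    include = [c for c in dep.projection if c not in key]  # covering columns
    return key, include
```

Columns after a range column in the key cannot narrow the seek. That is why the sketch stops at the first range column and leaves the rest to the include list or a residual filter.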
Extend Phase 19 — Edge replay bundles + performance replay (R14, R18)
- [x] T185: Replay bundle redaction policies (privacy-safe export). Evidence: `MaintenanceReplayExportParams.redaction`, optional `ReplayBundle.redaction`, `hash_pk`/`drop_pk` primary-key redaction before checksums, SkeinAdmin/CLI controls, and replay import/run coverage for redacted bundles.
- [x] T186: Geo-distributed “bundle windows” + routing rules (bounded staleness). Evidence: `edge.bundle.apply` now preserves disjoint coverage windows per table while merging only adjacent/overlapping ranges, and `edge.bundle.status` rejects bounded-staleness routes with reason `coverage_gap` when contiguous coverage is incomplete; covered by `edge_bundle_status_detects_coverage_gap`, `edge_bundle_status_reports_coverage_gap`, and `r14_edge_bundle_gap_blocks_bounded_staleness_route`.
- [x] T187: Performance bundle extensions (LSM state, cache warm hints, timing annotations). Evidence: optional `ReplayBundle.performance`, `ReplayBundlePerformanceProfile`, storage/cache/timing sections, checksum validation, and `replay_bundle_export_import_run_roundtrip`.
- [x] T188: Deterministic performance replay runner + variance report. Evidence: `maintenance.replay.run` now rehydrates captured select/patch cache counts inside the replay workspace, computes replay-run checksum parity over reconstructable snapshot state, and keeps raw disk/WAL/cache/timing deltas in `performance_report`; covered by `replay_bundle_run_rehydrates_cache_hints`, `maintenance_replay_run_rehydrates_cache_hints`, and `t188_replay_run_rehydrates_cache_hints`.
- [x] T189: Regression CI harness: compare latency distributions across commits. Evidence: `skeindb replay run --json --out`, `skeindb replay compare --baseline --candidate`, threshold flags for p95/p99/span/storage/cache-hot-table deltas, JSON comparison reports, non-zero exit on regressions, and focused CLI tests.
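The regression gate in T189 compares latency percentiles from a baseline run and a candidate run and flags any percentile whose relative increase exceeds a threshold. This sketch uses nearest-rank percentiles and hypothetical default thresholds. It mirrors the idea behind `skeindb replay compare`, not its flags or report schema.

```python
from __future__ import annotations

import math

DEFAULT_THRESHOLDS = {"p95": 0.10, "p99": 0.15}  # hypothetical max relative increase


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def compare(baseline: list[float], candidate: list[float],
            thresholds: dict[str, float] | None = None):
    """Return (regressions, report); a non-empty regression list should fail CI."""
    thresholds = thresholds or DEFAULT_THRESHOLDS
    report, regressions = {}, []
    for name, limit in thresholds.items():
        q = float(name.lstrip("p"))
        b, c = percentile(baseline, q), percentile(candidate, q)
        delta = (c - b) / b if b else (0.0 if c == 0 else math.inf)
        report[name] = {"baseline": b, "candidate": c, "delta": delta}
        if delta > limit:
            regressions.append(name)
    return regressions, report
```

A CI wrapper would exit with status 1 when the returned regression list is non-empty. Tail percentiles get looser thresholds here because they are noisier between runs.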
Extend Phase 21 — Energy-aware compaction (R20)
- [x] T204: Energy model instrumentation (CPU + IO estimate; optional external signals). Evidence: `CompactionEnergyConfig`, `CompactionEnergyRuntime`, `estimate_compaction_energy`, and `stats.snapshot.compaction.scheduler.energy`.
- [x] T205: Constrained scheduler (energy minimization subject to latency/space bounds). Evidence: `CompactionPolicyKind::EnergyAware`, slack/constraint scoring in `collect_compaction_runtime`, and safe-mode override preserving hard L0 limits.
- [x] T206: External signal integration (battery/plugged, time-of-use pricing hooks). Evidence: `maintenance.compaction.set_policy` accepts `external_signals`, persists `compaction.energy.*`, and SkeinAdmin exposes power/price/carbon controls.
- [x] T207: Evaluation harness: energy vs p99 latency tradeoffs. Evidence: `eval/compaction_scheduler_dashboard.py` compares `energy_aware` with fixed/workload policies and emits energy score plus p99 latency summaries.
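The constrained choice in T205 and T206 can be sketched as a selection over candidate start slots. Compaction is deferred to a cheap, mains-powered window unless a hard L0 limit is already reached, in which case safe mode runs it now. The `Slot` fields, the joules-per-MB constant, and `pick_slot` are assumptions, not `estimate_compaction_energy` or `collect_compaction_runtime`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    start_s: int           # seconds from now; slots[0] is assumed to be "now"
    price_per_kwh: float   # time-of-use price signal
    on_battery: bool       # power-source signal


def compaction_energy_kwh(bytes_rewritten: int, joules_per_mb: float = 50.0) -> float:
    """Crude CPU+IO estimate: energy proportional to bytes rewritten (hypothetical constant)."""
    return bytes_rewritten / 1e6 * joules_per_mb / 3.6e6  # 1 kWh = 3.6e6 J


def pick_slot(slots: list[Slot], bytes_rewritten: int, l0_files: int,
              l0_hard_limit: int, max_delay_s: int) -> tuple[Slot, float]:
    """Return the cheapest admissible slot and its estimated cost; safe mode runs now."""
    energy = compaction_energy_kwh(bytes_rewritten)
    if l0_files >= l0_hard_limit:
        now = slots[0]  # hard L0 bound wins over energy savings
        return now, energy * now.price_per_kwh
    admissible = [s for s in slots if s.start_s <= max_delay_s and not s.on_battery]
    best = min(admissible or slots[:1], key=lambda s: (s.price_per_kwh, s.start_s))
    return best, energy * best.price_per_kwh
```

`max_delay_s` stands in for the latency/space slack. The tighter the slack, the fewer cheap windows remain admissible.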
Extend Phase 22 — LLM-assisted semantic autoparameterization (R11)
- [x] T215: Label schema for “semantic constants” vs parameterizable literals. Evidence: `ai.autoparam.label_schema` exposes `skein.ai.autoparam.label_schema.v1` with literal context fields, label result fields, confidence bounds, and explicit `parameterize`, `semantic_constant`, and `unknown` cache-key policies.
- [x] T216: Pluggable classifier interface (offline model first). Evidence: `ai.autoparam.classifiers` exposes the supported classifier catalog, `offline_rules_v1` is selectable through `classifier` on classify/analyze requests, and unsupported classifier names return an error.
- [x] T217: Feedback loop: cache misses trigger reclassification. Evidence: `ai.autoparam.feedback` accepts `cache_event: "plan_cache_miss"`, re-runs the selected classifier, records `cached_before`/`reclassified`, and accumulates per-fingerprint miss and reclassification counts.
- [x] T218: Metrics: plan-cache hit rate vs classifier overhead. Evidence: `ai.autoparam.metrics` reports plan-cache hit/miss rate together with classifier invocation/latency counters and feedback reclassification totals.
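The T217 feedback loop and T218 metrics can be pictured as counters keyed by query fingerprint. A plan-cache miss triggers reclassification, and the report sets hit rate against classifier cost. `AutoparamMetrics`, its field names, and the `classifier` callable are hypothetical.

```python
import time
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AutoparamMetrics:
    hits: int = 0
    misses: int = 0
    classifier_calls: int = 0
    classifier_ns: int = 0
    reclassified: Counter = field(default_factory=Counter)  # fingerprint -> count

    def classify(self, classifier, literal):
        start = time.perf_counter_ns()
        label = classifier(literal)
        self.classifier_calls += 1
        self.classifier_ns += time.perf_counter_ns() - start
        return label

    def on_cache_event(self, fingerprint: str, hit: bool, classifier, literal):
        """Hits are only counted; a miss re-runs the classifier for that literal."""
        if hit:
            self.hits += 1
            return None
        self.misses += 1
        self.reclassified[fingerprint] += 1
        return self.classify(classifier, literal)

    def report(self) -> dict:
        total = self.hits + self.misses
        calls = self.classifier_calls
        return {
            "hit_rate": self.hits / total if total else 0.0,
            "classifier_calls": calls,
            "mean_classifier_ns": self.classifier_ns / calls if calls else 0.0,
            "reclassifications": sum(self.reclassified.values()),
        }
```

Charging classifier time only on misses keeps overhead proportional to the miss rate. The report then shows whether better labels are buying enough extra cache hits to justify the cost.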