Engineering roadmap

Last updated 2026-05-09. Tracked in docs/PROJECT_BACKLOG.md and docs/RESEARCH_BACKLOG.md.

SkeinDB is shipped one small, test-backed slice at a time. Each phase below groups a logical area; every item in the backlog ships with unit + integration tests and documentation updates before it is merged.

Phase 0 — Repo setup

Done

T001  done  Encoding primitives (VarU, Bytes/String, CRC32C)
T002  done  FileHeader read/write
T003  done  RecordFrame append/iterate
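Here is a minimal Python sketch of how these three primitives fit together. It assumes VarU is unsigned LEB128 and uses a hypothetical frame layout (VarU length, payload, little-endian CRC32C of the payload), so SkeinDB's actual on-disk encoding may differ. The iterator stops at the first torn or corrupt frame, which is the usual way a log reader finds the end of valid data.

```python
import struct
from typing import Iterator, Tuple


def encode_varu(n: int) -> bytes:
    """Unsigned LEB128 (assumed VarU format): 7 value bits per byte, high bit = more bytes follow."""
    if n < 0:
        raise ValueError("VarU encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varu(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one VarU starting at pos; return (value, position after it)."""
    value = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated VarU")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def append_frame(log: bytearray, payload: bytes) -> None:
    """Hypothetical frame layout: VarU(len) | payload | CRC32C(payload) as little-endian u32."""
    log.extend(encode_varu(len(payload)))
    log.extend(payload)
    log.extend(struct.pack("<I", crc32c(payload)))


def iterate_frames(log: bytes) -> Iterator[bytes]:
    """Yield payloads in order, stopping at the first torn or corrupt frame."""
    pos = 0
    while pos < len(log):
        try:
            length, start = decode_varu(log, pos)
        except ValueError:
            return  # the length prefix itself was cut off
        end = start + length
        if end + 4 > len(log):
            return  # torn write: payload or checksum incomplete
        payload = bytes(log[start:end])
        (stored,) = struct.unpack_from("<I", log, end)
        if stored != crc32c(payload):
            return  # checksum mismatch: treat as end of valid data
        yield payload
        pos = end + 4
```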

Phase 1 — Storage core

Active

T010  done  MANIFEST.log reader/writer
T011  done  WAL writer/reader + recovery
T012  done  ValueStore (.vseg) append/read + ValueID
T013  done  Sorted runs (.run) + simple LSM
T014  done  RowSeg (.rseg) + RowVersion encoding
T015  done  RowDir (row_id → head ptr)
T016  done  MVCC visibility
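MVCC visibility over a RowDir head pointer can be sketched as a newest-to-oldest walk of a row's version chain. This illustrates the idea under stated assumptions and is not SkeinDB's code: the `RowVersion` fields, the half-open `[begin_ts, end_ts)` visibility rule, and the plain dict standing in for RowDir are hypothetical, and in-flight (uncommitted) versions are left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RowVersion:
    begin_ts: int                        # commit ts of the write that created this version
    end_ts: Optional[int]                # commit ts of the write that superseded it; None = current
    value: dict
    prev: Optional["RowVersion"] = None  # next-older version in the chain


def visible(v: RowVersion, snapshot_ts: int) -> bool:
    """Visible if committed at or before the snapshot and not yet superseded at it."""
    return v.begin_ts <= snapshot_ts and (v.end_ts is None or snapshot_ts < v.end_ts)


def read_row(row_dir: Dict[int, RowVersion], row_id: int, snapshot_ts: int) -> Optional[dict]:
    """Start at the RowDir head pointer (newest version) and walk back to the first visible one."""
    v = row_dir.get(row_id)
    while v is not None:
        if visible(v, snapshot_ts):
            return v.value
        v = v.prev
    return None  # row did not exist (or was deleted) as of snapshot_ts
```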

Phase 2 — SQL + virtual metadata

Done

T020  done  Catalog schema + TableDef
T021  done  information_schema.tables / columns
T022  done  Minimal executor: CREATE TABLE, INSERT, SELECT
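Virtual metadata means information_schema rows are generated from the catalog when queried rather than stored. The sketch below is minimal: `ColumnDef`, the `TableDef` fields, and `information_schema_columns` are hypothetical stand-ins, while the output keys mirror column names from MySQL's information_schema.COLUMNS.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class ColumnDef:
    name: str
    data_type: str
    nullable: bool = True


@dataclass
class TableDef:
    schema: str
    name: str
    columns: List[ColumnDef] = field(default_factory=list)


def information_schema_columns(catalog: List[TableDef]) -> Iterator[Dict[str, object]]:
    """Derive information_schema.columns rows from the catalog on demand; nothing is stored."""
    for t in catalog:
        for pos, c in enumerate(t.columns, start=1):
            yield {
                "TABLE_SCHEMA": t.schema,
                "TABLE_NAME": t.name,
                "COLUMN_NAME": c.name,
                "ORDINAL_POSITION": pos,
                "DATA_TYPE": c.data_type,
                "IS_NULLABLE": "YES" if c.nullable else "NO",
            }
```

Because the rows are computed from the live catalog, DDL changes show up in information_schema without any extra bookkeeping.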

Phase 3 — MySQL protocol

Done

Baseline handshake, COM_QUERY, COM_STMT (prepared statement) parity, WordPress-class workload coverage, and an extensive set of scalar, date/time, and JSON functions. Follow-up parity work continues as the corpus grows.
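For reference, this is the MySQL client/server packet framing that a COM_QUERY rides on: a 3-byte little-endian payload length, a 1-byte sequence id, then the payload, whose first byte is the command (0x03 for COM_QUERY). The helper names are illustrative. The sketch shows the wire format only, not SkeinDB's server, and skips splitting payloads of 0xFFFFFF bytes or more.

```python
import struct
from typing import Tuple

COM_QUERY = 0x03        # command byte for a text-protocol query
MAX_PAYLOAD = 0xFFFFFF  # largest payload a single packet header can describe


def encode_packet(payload: bytes, seq: int) -> bytes:
    """Header = 3-byte little-endian payload length + 1-byte sequence id."""
    if len(payload) >= MAX_PAYLOAD:
        raise ValueError("payloads of 0xFFFFFF bytes or more must be split; not handled here")
    return struct.pack("<I", len(payload))[:3] + bytes([seq & 0xFF]) + payload


def decode_packet(buf: bytes) -> Tuple[int, bytes]:
    """Return (sequence id, payload) for one complete packet at the start of buf."""
    length = int.from_bytes(buf[0:3], "little")
    return buf[3], buf[4:4 + length]


def com_query(sql: str) -> bytes:
    # Each new command resets the sequence id, so the client's first packet uses 0.
    return encode_packet(bytes([COM_QUERY]) + sql.encode("utf-8"), seq=0)
```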

Phase 4 — Web console

Done

HTTP API (/api/v1/sql/exec), schema browser, SQL editor, data browse/edit + CSV/JSON import/export, users/privileges + status dashboard.

Phase 5 — SkeinQL native API

Done

Native SkeinQL RPC (/api/v1/rpc) with schema.*, query.*, tx.*, system.*.

Phase 6 — Cache-coherent HTTP queries

Done

Row ETags, If-Match/If-None-Match, query.prepare, SSE subscriptions.
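The following sketch traces the conditional-request flow using RFC 9110 semantics. If-None-Match (weak comparison) lets a client revalidate a cached row and receive 304. If-Match (strong comparison) makes an update fail with 412 if the row changed in the meantime. Deriving the ETag from table, row id, and commit timestamp is an assumption made for illustration, as are all function names.

```python
import hashlib
from typing import List, Optional


def row_etag(table: str, row_id: int, version_ts: int) -> str:
    """Hypothetical: a strong ETag derived from row identity plus its committed version timestamp."""
    digest = hashlib.sha256(f"{table}:{row_id}:{version_ts}".encode()).hexdigest()[:16]
    return f'"{digest}"'


def _tags(header: str) -> List[str]:
    # Naive comma split; adequate here because the generated tags are hex digests.
    return [t.strip() for t in header.split(",") if t.strip()]


def _opaque(tag: str) -> str:
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_hits(header: str, etag: str) -> bool:
    """If-None-Match uses weak comparison: W/ prefixes are ignored."""
    tags = _tags(header)
    return "*" in tags or any(_opaque(t) == _opaque(etag) for t in tags)


def if_match_hits(header: str, etag: str) -> bool:
    """If-Match uses strong comparison: weak validators never match."""
    tags = _tags(header)
    return "*" in tags or (not etag.startswith("W/") and etag in tags)


def handle_get(current: str, if_none_match: Optional[str]) -> int:
    if if_none_match is not None and if_none_match_hits(if_none_match, current):
        return 304  # client's cached copy is still current
    return 200


def handle_update(current: str, if_match: Optional[str]) -> int:
    if if_match is not None and not if_match_hits(if_match, current):
        return 412  # row changed since the client read it: lost update prevented
    return 200
```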

Phase 7 — Delta-chained values

Done

DELTA value kind, selection policy + metrics, compaction rebase bounded by chain depth.
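The sketch below shows one way to picture delta chains, delta selection, and depth-bounded rebasing. Everything in it is hypothetical: the prefix/suffix delta encoding, the `DELTA_OVERHEAD` threshold standing in for the selection policy, and the `ValueStore` API. Because reads reconstruct a value by walking back to a full copy, the compaction pass bounds read cost by rewriting any value whose chain is deeper than `max_depth`.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

DELTA_OVERHEAD = 16  # hypothetical per-delta bookkeeping cost used by the selection policy


@dataclass
class Full:
    data: bytes


@dataclass
class Delta:
    base: int      # ValueID of the value this delta applies to
    prefix: int    # bytes kept from the start of the base
    suffix: int    # bytes kept from the end of the base
    middle: bytes  # bytes that replace everything in between
    depth: int     # number of deltas between this value and a Full value


class ValueStore:
    def __init__(self, max_depth: int = 4):
        self.values: List[Union[Full, Delta]] = []  # list index = ValueID
        self.max_depth = max_depth

    def _depth(self, vid: int) -> int:
        v = self.values[vid]
        return 0 if isinstance(v, Full) else v.depth

    def read(self, vid: int) -> bytes:
        v = self.values[vid]
        if isinstance(v, Full):
            return v.data
        base = self.read(v.base)  # reconstruct by walking the chain back to a Full value
        return base[:v.prefix] + v.middle + base[len(base) - v.suffix:]

    def put(self, data: bytes, base: Optional[int] = None) -> int:
        if base is not None:
            old = self.read(base)
            limit = min(len(old), len(data))
            p = 0
            while p < limit and old[p] == data[p]:
                p += 1
            s = 0
            while s < limit - p and old[-1 - s] == data[-1 - s]:
                s += 1
            middle = data[p:len(data) - s]
            # Selection policy: store a delta only when it is clearly smaller than a full copy.
            if len(middle) + DELTA_OVERHEAD < len(data):
                self.values.append(Delta(base, p, s, middle, self._depth(base) + 1))
                return len(self.values) - 1
        self.values.append(Full(data))
        return len(self.values) - 1

    def compact(self) -> int:
        """Rebase pass: oldest first, rewrite any value deeper than max_depth as a Full copy."""
        rewritten = 0
        for vid, v in enumerate(self.values):
            if isinstance(v, Full):
                continue
            depth = self._depth(v.base) + 1  # the base may have been rebased earlier in this pass
            if depth > self.max_depth:
                self.values[vid] = Full(self.read(vid))
                rewritten += 1
            else:
                v.depth = depth
        return rewritten
```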

Phase 8 — Wasm extensions

Done

T080  done  Module store + UDF catalog
T081  done  Scalar UDF sandbox
T082  done  Safe cancellation (fuel/time)
T083  done  Aggregate + table-function UDFs
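The fuel/time idea can be modeled outside Wasm: charge a budget for each unit of work and abort once either the fuel or the deadline runs out. This Python sketch only models the policy, and `Budget`, `Cancelled`, and `run_udf` are hypothetical names. Inside a Wasm sandbox the runtime, not the guest, would do the metering.

```python
import time
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


class Cancelled(Exception):
    pass


class Budget:
    """Hypothetical per-invocation limits: abstract fuel units plus a wall-clock deadline."""

    def __init__(self, fuel: int, timeout_s: float):
        self.fuel = fuel
        self.deadline = time.monotonic() + timeout_s

    def charge(self, units: int = 1) -> None:
        self.fuel -= units
        if self.fuel < 0:
            raise Cancelled("fuel exhausted")
        if time.monotonic() > self.deadline:
            raise Cancelled("deadline exceeded")


def run_udf(step: Callable[[S], Tuple[bool, S]], state: S, budget: Budget) -> S:
    """Drive a step function to completion, charging one unit of fuel per step.
    Cancellation happens only between steps, so no step is left half-applied."""
    while True:
        budget.charge()
        done, state = step(state)
        if done:
            return state
```

Fuel gives deterministic limits that do not depend on machine speed, while the deadline catches cases where each unit of work is unexpectedly slow.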

Phase 9 — Tamper-evident WAL audit

Done

WALHeader v2, hash chaining, checkpoint anchors, verify CLI/API.
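Here is a minimal sketch of hash chaining with checkpoint anchors. It assumes SHA-256 over `prev_hash || payload` and an all-zero genesis hash; both are assumptions, and the WALHeader v2 layout is not shown. The chain alone detects an edit to a single record. An attacker who rewrites every later record, however, produces a self-consistent chain, and only anchors stored outside the log catch that case.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional

GENESIS = bytes(32)  # hypothetical all-zero "previous hash" for the first record


@dataclass
class AuditRecord:
    payload: bytes
    prev_hash: bytes
    hash: bytes


def chain_hash(prev_hash: bytes, payload: bytes) -> bytes:
    return hashlib.sha256(prev_hash + payload).digest()


def append(log: List[AuditRecord], payload: bytes) -> bytes:
    prev = log[-1].hash if log else GENESIS
    h = chain_hash(prev, payload)
    log.append(AuditRecord(payload, prev, h))
    return h


def verify(log: List[AuditRecord], anchors: Dict[int, bytes]) -> Optional[int]:
    """Return the index of the first record that fails verification, or None.
    `anchors` maps record index -> hash captured at a checkpoint and kept
    outside the log, so a fully recomputed (forged) chain is still caught."""
    prev = GENESIS
    for i, rec in enumerate(log):
        if rec.prev_hash != prev or rec.hash != chain_hash(prev, rec.payload):
            return i
        if i in anchors and anchors[i] != rec.hash:
            return i
        prev = rec.hash
    return None
```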

Phase 10 — Hybrid row/column snapshots

Active

T100, T101, and T102 are in place. Snapshot builds materialize rows as of the requested snapshot timestamp and emit a manifest plus .cseg sidecars; simple single-table SELECTs can read through a column-scan cursor; and an explicit optimizer rule routes only eligible, covered reads to snapshots. Hybrid merge planning remains open.

T100  done  Snapshot builder + .cseg writer
T101  done  Column scan operator
T102  done  Optimizer rule for column snapshots
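The eligibility check for routing a read to a snapshot might look like the following. The coverage test (every referenced column present in the snapshot) follows from the notion of a covered read. The freshness condition and all type and function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    table: str
    columns: FrozenSet[str]  # columns materialized in the .cseg sidecars
    as_of_ts: int            # snapshot timestamp the rows were materialized at


@dataclass(frozen=True)
class Query:
    tables: Tuple[str, ...]
    columns: FrozenSet[str]  # every column referenced by the SELECT list and WHERE clause
    read_ts: int


def choose_snapshot(q: Query, snapshots: Iterable[Snapshot],
                    last_write_ts: Dict[str, int]) -> Optional[Snapshot]:
    """Return a snapshot that can answer q exactly, or None to fall back to the row store."""
    if len(q.tables) != 1:
        return None  # only simple single-table reads are eligible
    table = q.tables[0]
    for s in snapshots:
        if s.table != table or not q.columns <= s.columns:
            continue  # snapshot does not cover every referenced column
        # Freshness (assumed rule): no commit to the table after the snapshot,
        # and the snapshot is not newer than the query's read timestamp.
        if s.as_of_ts <= q.read_ts and last_write_ts.get(table, 0) <= s.as_of_ts:
            return s
    return None
```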

Phase 11 — Compatibility telemetry

Done

Phase 12 — SkeinAdmin standalone console

Done

Phase 13 — Observability & server load stats

Done

Phase 14 — Cluster management

Active

Node identity, replication transport, CAS pull, read-only replicas, cluster.* RPCs, sharding metadata router (single-shard txns).
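Routing under a sharding metadata map with single-shard transactions can be sketched as a range lookup plus a check that every key lands on the same shard. Range partitioning, `ShardRouter`, and its methods are assumptions; SkeinDB's metadata may partition differently (for example, by hash).

```python
import bisect
from typing import Iterable, List, Sequence


class ShardRouter:
    """Hypothetical range-based shard map: split_keys[i] is the first key owned by shard_ids[i + 1]."""

    def __init__(self, split_keys: Sequence, shard_ids: Sequence[str]):
        if len(shard_ids) != len(split_keys) + 1:
            raise ValueError("need exactly one more shard than split keys")
        self.split_keys: List = list(split_keys)  # must be sorted ascending
        self.shard_ids: List[str] = list(shard_ids)

    def shard_for(self, key) -> str:
        return self.shard_ids[bisect.bisect_right(self.split_keys, key)]

    def route_txn(self, keys: Iterable) -> str:
        """Return the single shard a transaction touches; reject cross-shard transactions."""
        shards = {self.shard_for(k) for k in keys}
        if not shards:
            raise ValueError("transaction touches no keys")
        if len(shards) > 1:
            raise ValueError(f"cross-shard transaction over {sorted(shards)} is not supported")
        return shards.pop()
```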

Phase 15 — Additional performance improvements

Done

T150 through T153 are in place. Interned-column schema flags are persisted separately from the catalog, and single-table scan paths can precompile `eq`/`ne`/`in` predicates into ValueID comparisons. Row and snapshot executors materialize only the columns the query still needs. Eligible full scans run through a 1024-row scan->filter->project batch loop. The core MVCC layer exposes a validated visible-version cache keyed by row id plus snapshot bucket.
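The ValueID-first predicates, late materialization, and batched scan described above can be sketched together in Python. The intern `Dictionary`, the column-list layout, and the function names are hypothetical; only the 1024-row batch size comes from the text. The filter compares small integers, and strings are decoded only for the projected columns of rows that survive it.

```python
from typing import Callable, Dict, Iterator, List, Optional, Set

BATCH_SIZE = 1024


class Dictionary:
    """Hypothetical intern table mapping strings to small integer ValueIDs."""

    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}
        self.values: List[str] = []

    def intern(self, s: str) -> int:
        if s not in self.ids:
            self.ids[s] = len(self.values)
            self.values.append(s)
        return self.ids[s]

    def lookup(self, s: str) -> Optional[int]:
        return self.ids.get(s)  # None if the string was never stored


def compile_predicate(d: Dictionary, op: str, literals: List[str]) -> Callable[[int], bool]:
    """Translate eq/ne/in on strings into checks on ValueIDs, resolved once per query.
    NULL semantics are ignored in this sketch."""
    vids: Set[int] = {v for v in map(d.lookup, literals) if v is not None}
    if op in ("eq", "in"):
        return lambda vid: vid in vids
    if op == "ne":
        return lambda vid: vid not in vids
    raise ValueError(f"unsupported op {op!r}")


def scan(columns: Dict[str, list], n_rows: int, pred_col: str,
         pred: Callable[[int], bool], project: List[str],
         d: Dictionary, interned: Set[str]) -> Iterator[Dict[str, list]]:
    for start in range(0, n_rows, BATCH_SIZE):
        end = min(start + BATCH_SIZE, n_rows)
        # Filter: evaluate the predicate on encoded ValueIDs, producing a selection vector.
        col = columns[pred_col]
        sel = [i for i in range(start, end) if pred(col[i])]
        if not sel:
            continue
        # Project (late materialization): decode only requested columns for surviving rows.
        yield {
            c: [d.values[columns[c][i]] if c in interned else columns[c][i] for i in sel]
            for c in project
        }
```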

T150  done  Schema flag for interned columns + ValueID-first predicate ops
T151  done  Late materialization (decode only projected columns)
T152  done  Batch (vectorized) scan/filter/project pipeline
T153  done  MVCC Visible Version Index cache
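Here is a sketch of a visible-version cache keyed by `(row_id, snapshot bucket)`. One bucket covers many snapshot timestamps, so a cached entry is re-validated against the exact timestamp before use. A failed check (for example, after a newer version commits) falls back to the chain walk. `BUCKET_WIDTH` and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

BUCKET_WIDTH = 64  # hypothetical: snapshot timestamps per cache bucket


@dataclass
class Version:
    begin_ts: int
    end_ts: Optional[int]  # None = still the newest committed version
    value: str
    prev: Optional["Version"] = None


def is_visible(v: Version, ts: int) -> bool:
    return v.begin_ts <= ts and (v.end_ts is None or ts < v.end_ts)


class VisibleVersionCache:
    def __init__(self, row_dir: Dict[int, Version]):
        self.row_dir = row_dir  # row_id -> head (newest) version
        self.cache: Dict[Tuple[int, int], Version] = {}
        self.hits = self.misses = 0

    def get(self, row_id: int, ts: int) -> Optional[Version]:
        key = (row_id, ts // BUCKET_WIDTH)
        cached = self.cache.get(key)
        # Validate: the bucket spans many timestamps and newer commits may have superseded
        # the cached version. A row's versions cover disjoint [begin_ts, end_ts) ranges,
        # so a cached version that is visible at ts is the correct answer.
        if cached is not None and is_visible(cached, ts):
            self.hits += 1
            return cached
        self.misses += 1
        v = self.row_dir.get(row_id)
        while v is not None and not is_visible(v, ts):
            v = v.prev
        if v is not None:
            self.cache[key] = v
        return v
```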

How to follow

The GitHub repository is the authoritative record of current state. Pinned reading order:

1. docs/PROJECT_BACKLOG.md: live backlog with dated “Latest” notes per task
2. docs/TRUE_STATUS_MATRIX.md: implemented vs partial matrix
3. docs/RESEARCH_BACKLOG.md: research tracks
4. git log: commit-level granularity