Engineering roadmap
SkeinDB is shipped one small, test-backed slice at a time. Each phase below groups a logical area; every item in the backlog ships with unit + integration tests and documentation updates before it is merged.
Phase 0 — Repo setup
Status: Done
- T001 (done): Encoding primitives (VarU, Bytes/String, CRC32C)
- T002 (done): FileHeader read/write
- T003 (done): RecordFrame append/iterate
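As an illustration of the kind of primitive T001 covers, here is a minimal sketch of an unsigned variable-length integer codec. It assumes VarU uses LEB128-style encoding (7 payload bits per byte, high bit as a continuation flag); the roadmap does not specify the wire format, and the function names `encode_varu` and `decode_varu` are hypothetical.

```python
def encode_varu(n: int) -> bytes:
    """Encode a non-negative integer, 7 payload bits per byte (assumed LEB128 layout)."""
    if n < 0:
        raise ValueError("VarU encodes unsigned values only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # final byte has the high bit clear
            return bytes(out)


def decode_varu(buf: bytes, pos: int = 0) -> tuple:
    """Return (value, next_pos); raise ValueError if the input ends mid-value."""
    value = 0
    shift = 0
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
    raise ValueError("truncated VarU")
```

Small values fit in a single byte, which is why this style of encoding is common for lengths and IDs in record frames.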
Phase 1 — Storage core
Status: Active
- T010 (done): MANIFEST.log reader/writer
- T011 (done): WAL writer/reader + recovery
- T012 (done): ValueStore (.vseg) append/read + ValueID
- T013 (done): Sorted runs (.run) + simple LSM
- T014 (done): RowSeg (.rseg) + RowVersion encoding
- T015 (done): RowDir (row_id → head ptr)
- T016 (done): MVCC visibility
Phase 2 — SQL + virtual metadata
Status: Done
- T020 (done): Catalog schema + TableDef
- T021 (done): information_schema.tables / columns
- T022 (done): Minimal executor: CREATE TABLE, INSERT, SELECT
Phase 3 — MySQL protocol
Status: Done
Baseline handshake, COM_QUERY, COM_STMT parity, WordPress-class workload coverage, and an extensive scalar/date-time/JSON function set. Follow-up parity work continues through corpus growth.
Phase 4 — Web console
Status: Done
HTTP API (/api/v1/sql/exec), schema browser, SQL editor, data browse/edit + CSV/JSON import/export, users/privileges + status dashboard.
Phase 5 — SkeinQL native API
Status: Done
Native SkeinQL RPC (/api/v1/rpc) with schema.*, query.*, tx.*, system.*.
Phase 6 — Cache-coherent HTTP queries
Status: Done
Row ETags, If-Match/If-None-Match, query.prepare, SSE subscriptions.
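The conditional-request behavior named here can be sketched with standard HTTP semantics: a matching If-None-Match on a read yields 304 Not Modified, and a non-matching If-Match on a write yields 412 Precondition Failed. This is a sketch only. How SkeinDB derives row ETags is not stated in the roadmap, so `row_etag` (hashing a row id plus a version counter) and the other function names are assumptions.

```python
import hashlib
from typing import Optional


def row_etag(row_id: int, version: int) -> str:
    """Hypothetical strong ETag: a quoted hash of row id and version counter."""
    digest = hashlib.sha256(f"{row_id}:{version}".encode()).hexdigest()[:16]
    return f'"{digest}"'


def _tags(header: str) -> list:
    # A conditional header may carry several comma-separated entity tags.
    return [t.strip() for t in header.split(",")]


def conditional_read(current: str, if_none_match: Optional[str]) -> int:
    """Return 304 when the client's cached copy is still current, else 200."""
    if if_none_match is not None:
        tags = _tags(if_none_match)
        if "*" in tags or current in tags:
            return 304
    return 200


def conditional_write(current: str, if_match: Optional[str]) -> int:
    """Return 412 when the row changed since the client last read it, else 200."""
    if if_match is not None:
        tags = _tags(if_match)
        if "*" not in tags and current not in tags:
            return 412
    return 200
```

A client that caches a row at version 1 can revalidate cheaply with If-None-Match, and a write carrying the stale tag is rejected once the row moves to version 2.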
Phase 7 — Delta-chained values
Status: Done
DELTA value kind, selection policy + metrics, compaction rebase bounded by chain depth.
Phase 8 — Wasm extensions
Status: Done
- T080 (done): Module store + UDF catalog
- T081 (done): Scalar UDF sandbox
- T082 (done): Safe cancellation (fuel/time)
- T083 (done): Aggregate + table-function UDFs
Phase 9 — Tamper-evident WAL audit
Status: Done
WALHeader v2, hash chaining, checkpoint anchors, verify CLI/API.
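Hash chaining in general works by having each record's hash cover the previous record's hash, so altering any earlier record invalidates everything after it. The sketch below shows that idea with SHA-256 over an in-memory list; it is not the WALHeader v2 layout, which the roadmap does not describe. `GENESIS`, `append_record`, and `verify_chain` are hypothetical names.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed fixed starting value for the first record


def chain_hash(prev_hash: bytes, payload: bytes) -> bytes:
    """Hash the previous link together with this record's payload."""
    return hashlib.sha256(prev_hash + payload).digest()


def append_record(log: list, payload: bytes) -> None:
    """Append (payload, hash) where hash covers the previous record's hash."""
    prev = log[-1][1] if log else GENESIS
    log.append((payload, chain_hash(prev, payload)))


def verify_chain(log: list) -> int:
    """Return the index of the first record that fails verification, or -1."""
    prev = GENESIS
    for i, (payload, stored) in enumerate(log):
        if chain_hash(prev, payload) != stored:
            return i
        prev = stored
    return -1
```

A checkpoint anchor, in this picture, would be a chain hash recorded somewhere outside the log so that verification can start from a trusted point rather than from the beginning.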
Phase 10 — Hybrid row/column snapshots
Status: Active
T100, T101, and T102 are in place. Snapshot builds materialize rows at the requested snapshot timestamp and emit a manifest plus .cseg sidecars; simple single-table SELECTs can read through a column-scan cursor; and an explicit optimizer rule routes only eligible covered reads into snapshots. Hybrid merge planning remains open.
- T100 (done): Snapshot builder + .cseg writer
- T101 (done): Column scan operator
- T102 (done): Optimizer rule for column snapshots
Phase 11 — Compatibility telemetry
Status: Done
Phase 12 — SkeinAdmin standalone console
Status: Done
Phase 13 — Observability & server load stats
Status: Done
Phase 14 — Cluster management
Status: Active
Node identity, replication transport, CAS pull, read-only replicas, cluster.* RPCs, sharding metadata router (single-shard txns).
Phase 15 — Additional performance improvements
Status: Active
T150 through T153 are now in place. Interned-column schema flags persist separately from the catalog; single-table scan paths can precompile `eq`/`ne`/`in` predicates into ValueID comparisons; row and snapshot executors materialize only the columns the query still needs; eligible full scans run through a 1024-row scan->filter->project batch loop; and the core MVCC layer exposes a validated visible-version cache keyed by row id plus snapshot bucket.
- T150 (done): Schema flag for interned columns + ValueID-first predicate ops
- T151 (done): Late materialization (decode only projected columns)
- T152 (done): Batch (vectorized) scan/filter/project pipeline
- T153 (done): MVCC Visible Version Index cache
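To make T150 through T152 concrete, the sketch below combines three of the ideas: an `in` predicate resolved to ValueIDs once before the scan, a scan->filter->project loop over 1024-row batches, and decoding only the projected columns. It models rows as dicts of column name to ValueID and the value dictionary as plain Python dicts; those structures and the names `compile_in` and `batch_scan` are assumptions for illustration, not SkeinDB's internals.

```python
from itertools import islice

BATCH_ROWS = 1024  # batch size named in the roadmap text


def compile_in(value_to_id: dict, literals) -> "callable":
    """Resolve literals to ValueIDs once; literals absent from the dictionary can never match."""
    ids = frozenset(value_to_id[v] for v in literals if v in value_to_id)
    return lambda value_id: value_id in ids


def batch_scan(rows, pred_col, pred, project_cols, id_to_value):
    """Yield decoded projections of the rows whose pred_col ValueID passes pred."""
    it = iter(rows)
    while True:
        batch = list(islice(it, BATCH_ROWS))               # scan one batch
        if not batch:
            return
        kept = [r for r in batch if pred(r[pred_col])]      # filter on integer ValueIDs
        for r in kept:                                      # project, decoding late
            yield {c: id_to_value[r[c]] for c in project_cols}
```

The filter step compares integers only; string values are looked up only for rows that survive the filter and only for the columns the query projects.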
How to follow
The GitHub repository is the authoritative source for current project state. Recommended reading order:
- docs/PROJECT_BACKLOG.md — live backlog with dated “Latest” notes per task
- docs/TRUE_STATUS_MATRIX.md — implemented vs partial matrix
- docs/RESEARCH_BACKLOG.md — research tracks
- git log — commit-level granularity