SkeinDB Project Backlog

This backlog is designed for small PR-sized tasks. Each task should include tests.

Reality sync (2026-05-27)

This file is the core roadmap task inventory. It is not the best place to read current runtime maturity at a glance.

  • [x] = implemented and exercised in runtime/tests.
  • [ ] = still open.
  • All 140 top-level core roadmap checkboxes are currently closed.
  • Remaining partial areas and non-hardened work are tracked in docs/TRUE_STATUS_MATRIX.md, compatibility docs, and docs/RESEARCH_BACKLOG.md.

For the short implemented-vs-partial snapshot, see docs/TRUE_STATUS_MATRIX.md.

Phase 0 — Repo setup

  • Status: complete in runtime + tests (crates/skeindb-core/src/lib.rs, crates/skeindb-core/tests/phase0_format.rs)
  • [x] T001: Encoding primitives (VarU, Bytes/String, CRC32C)
  • [x] T002: FileHeader read/write
  • [x] T003: RecordFrame append/iterate

Phase 0 verification checklist:

  • [x] T001 evidence: VarU and hash/CRC tests (tests::varu_roundtrip_*, tests::value_id_is_stable, tests::audit_hash_is_stable) in crates/skeindb-core/src/lib.rs
  • [x] T002 evidence: FileHeader encode/decode + corruption tests in crates/skeindb-core/src/lib.rs and file roundtrip in crates/skeindb-core/tests/phase0_format.rs
  • [x] T003 evidence: RecordFrame append/decode/iterate + truncation/CRC tests in crates/skeindb-core/src/lib.rs and file-backed iteration in crates/skeindb-core/tests/phase0_format.rs
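
As an illustration of the kind of roundtrip the T001 VarU tests exercise, here is a minimal sketch. It assumes VarU is an unsigned LEB128-style varint (7 payload bits per byte, high bit as continuation flag); the backlog does not state the actual wire layout, and the function names encode_varu / decode_varu are hypothetical rather than the crate's API.

```rust
/// Encode `value` as an unsigned LEB128 varint.
/// ASSUMPTION: this layout is a stand-in; the real VarU format may differ.
pub fn encode_varu(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: continuation bit stays clear.
            out.push(byte);
            return;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Decode one varint from the front of `input`.
/// Returns `(value, bytes_consumed)`, or `None` if the input is truncated
/// or the encoding would overflow a u64.
pub fn decode_varu(input: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    // A u64 needs at most 10 bytes at 7 bits per byte.
    for (i, &byte) in input.iter().enumerate().take(10) {
        let chunk = (byte & 0x7f) as u64;
        // The 10th byte can only contribute the single remaining bit.
        if i == 9 && chunk > 1 {
            return None;
        }
        value |= chunk << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

Returning the consumed byte count lets a caller decode consecutive fields from one buffer, and rejecting truncated input mirrors the corruption-style tests listed above.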

Phase 1 — Storage core

  • [x] T010: MANIFEST.log reader/writer. Latest: crates/skeindb-core/src/manifest.rs now provides a typed append-only MANIFEST implementation over FileHeader(FileKind::Manifest) + RecordFrame payloads with five v1 record variants (AddFile, RemoveFile, SetCurrentVersion, SetLastLsn, CleanShutdown), replayable ManifestState derivation, and file-backed ManifestWriter/ManifestReader APIs. Unit tests cover record encode/decode for all variants, unknown-tag/file-kind rejection, state-apply behavior (add/update/remove), file roundtrip, writer reopen replay, and bad-header rejection.
  • [x] T011: WAL writer/reader + recovery. Latest: crates/skeindb-core/src/wal.rs now implements FileHeader(FileKind::Wal) + RecordFrame WAL files with typed v1 body records (BEGIN_TXN, MUTATION, COMMIT_TXN, ABORT_TXN) layered on the existing WAL header prefix, strict read-all and lenient committed-transaction recovery, and a file-backed WalWriter that truncates torn/corrupt tails before appending. Recovery emits only committed txns in log order, discards aborted txns, and reports truncated tail bytes. Unit tests cover v1/v2 record decode/roundtrip, unknown record rejection, committed-only recovery, torn-tail truncation on reopen, and bad-header rejection; integration tests cover file-backed roundtrip and truncated-tail recovery.
  • [x] T012: ValueStore (.vseg) append/read + ValueID. Latest: crates/skeindb-core/src/valuestore.rs now adds ValueSegmentWriter / ValueSegmentReader over FileHeader(FileKind::ValSeg) + framed VE1 records, ValueStore::write_segment_file / load_segment_file convenience helpers, explicit ValueId preservation for raw and custom-ID entries, and DELTA persistence via DELTA1 payloads stored inside VE1 raw_bytes. Loading recomputes delta depths, validates materialization through the existing hash-checked reconstruction path, and rebuilds learned-index state in memory. Integration tests cover file-backed roundtrip with RAW + DELTA + custom-ID EMBEDDING entries, writer reopen append semantics, and bad-header rejection.
  • [x] T013: Sorted runs (.run) + simple LSM (memtable + level0). Latest: crates/skeindb-core/src/run.rs now implements immutable .run files over FileHeader(FileKind::Run) with typed DataBlock / IndexBlock / footer encoding, a RunWriter that enforces strictly increasing keys and block-splits by target size, a RunReader with full-scan and binary-searched point lookups, and a SimpleLsm that keeps a memtable in a BTreeMap, flushes to run-######.run, and reads level0 newest-first. Tests cover out-of-order writer rejection, bad-header rejection, multi-block file roundtrip, memtable flush + overwrite visibility across multiple level0 runs, and reopen/discovery of existing runs.
  • [x] T014: RowSeg (.rseg) + RowVersion encoding. Latest: crates/skeindb-core/src/rowseg.rs now implements the RV1 row-version record (rec_type=0x10, rec_ver=1) over FileHeader(FileKind::RowSeg) + RecordFrame framing, with a FilePtr (12-byte file_id+offset) for MVCC chain pointers, RowGroupRef::Inline / RowGroupRef::ValueId group payloads, strict decode (rejects trailing bytes, unknown rec_type/rec_ver, unknown group ref kinds), and RowSegmentWriter::{create,open,append→FilePtr,sync} / RowSegmentReader::{open,read_all,read_at} APIs so callers can chain per-row version histories. Unit tests cover RV1 encode/decode roundtrips, bad header/type/version/trailing-bytes rejection, delete-flag handling, and FilePtr sentinel semantics; integration tests cover file-backed write/read roundtrips with chained prev_ptr, reopen/append semantics, mixed Inline/ValueId groups, and rejection of wrong-FileKind files.
  • [x] T015: RowDir (row_id -> head ptr). Latest: crates/skeindb-core/src/rowdir.rs now provides an in-memory BTreeMap<u64, RowDirEntry> mapping row_id to FilePtr head pointers, with Live/Tombstone variants so deletions can shadow older-run state. Persistence reuses the existing .run format: flush_to_run writes sorted big-endian row_id keys with [tag=0][FilePtr(12)] live values or [tag=1] tombstones; load_from_run/from_run replays them, applying tombstones as removals and live entries as upserts. Unit tests cover put/get/remove/forget, sorted iteration, live/tombstone encode-decode, and bad tag/length rejection. Integration tests (crates/skeindb-core/tests/rowdir.rs) cover file-backed roundtrip preserving row_id order, tombstone-in-newer-run shadowing older-run live entries, live-in-newer overriding live-in-older, empty-run roundtrip, and tombstone-only run shadowing multiple live entries.
  • [x] T016: MVCC visibility. Latest: crates/skeindb-core/src/mvcc.rs now resolves snapshot visibility by walking RowVersion::prev_ptr chains over .rseg records loaded through RowSegmentReader, using begin_ts != 0 && begin_ts <= snapshot_ts && (end_ts == 0 || end_ts > snapshot_ts) as the committed-visibility rule and treating Snapshot::latest() as +INF. resolve_head starts from an explicit FilePtr, resolve_row_id starts from RowDir::get(row_id), and MvccLookup distinguishes missing rows, visible live rows, and visible delete markers. The module includes a RowVersionResolver trait plus RowSegmentSet for file-backed multi-segment lookup, cycle / row-id / table-id mismatch detection, and staged-row skipping (begin_ts == 0). Unit tests cover snapshot-window evaluation, delete-marker visibility, historical chain walking, staged-head fallback, and cycle / row-id mismatch detection; integration tests (crates/skeindb-core/tests/mvcc.rs) cover file-backed delete-marker history, RowDir-based lookup across multiple segment files, and staged-head fallback to the previous committed version.
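
The committed-visibility rule quoted under T016 can be sketched in isolation. This is a simplified model, not the code in mvcc.rs: VersionWindow, SNAPSHOT_LATEST, and first_visible are hypothetical names, and the chain is represented as a newest-first slice instead of prev_ptr links over .rseg records.

```rust
/// Visibility window of one row version (hypothetical stand-in type).
#[derive(Debug, Clone, Copy)]
pub struct VersionWindow {
    /// 0 means "staged, not yet committed".
    pub begin_ts: u64,
    /// 0 means "still current, no end".
    pub end_ts: u64,
}

/// ASSUMPTION: "latest" is modelled as u64::MAX to stand in for +INF.
pub const SNAPSHOT_LATEST: u64 = u64::MAX;

/// The rule from the backlog entry:
/// begin_ts != 0 && begin_ts <= snapshot_ts && (end_ts == 0 || end_ts > snapshot_ts)
pub fn is_visible(v: VersionWindow, snapshot_ts: u64) -> bool {
    v.begin_ts != 0
        && v.begin_ts <= snapshot_ts
        && (v.end_ts == 0 || v.end_ts > snapshot_ts)
}

/// Walk a newest-first chain and return the index of the first visible
/// version. Staged heads (begin_ts == 0) are skipped naturally because
/// `is_visible` rejects them.
pub fn first_visible(chain: &[VersionWindow], snapshot_ts: u64) -> Option<usize> {
    chain.iter().position(|v| is_visible(*v, snapshot_ts))
}
```

With a chain of a staged head, a current version committed at 10, and an older version valid over [5, 10), a snapshot at 7 resolves to the oldest entry, while a snapshot at 10 or at SNAPSHOT_LATEST resolves to the version committed at 10.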

Phase 2 — SQL + virtual metadata

  • [x] T020: Catalog schema + TableDef
  • [x] T021: information_schema.tables + columns
  • [x] T022: Minimal executor: CREATE TABLE, INSERT, SELECT scan+filter+limit

Phase 3 — MySQL protocol

  • Status: baseline protocol plus broad COM_QUERY / COM_STMT compatibility are implemented in runtime/tests; follow-up parity work now continues through corpus growth and later backlog phases rather than open Phase 3 checkboxes.
  • [x] T030: Handshake + mysql_native_password
  • [x] T031: COM_QUERY SELECT literals
  • [x] T032: SQL translator (subset)
  • [x] T033: DDL/DML subset for corpus.sql
  • [x] T034: SQL_CALC_FOUND_ROWS + FOUND_ROWS
  • [x] T035: Index-backed secondary/unique index enforcement for MySQL duplicate-key semantics (runtime duplicate-key checks now reuse the shared secondary-index cache, including PRIMARY KEY-changing UPDATEs; MySQL duplicate-key failures now surface as 1062 / 23000 on the wire; creating a MySQL compatibility UNIQUE INDEX still rejects pre-existing duplicate rows; and per-table secondary-index cache metadata now persists/reloads on reopen)
  • [x] T036: Broaden COM_QUERY parity for WordPress-class workloads (the MySQL listener now covers the checked-in WordPress-style corpus and companion integration tests, including grouped/simple aggregate shims with HAVING, projection-grouped GROUP BY de-dup including wildcard projections after expansion, SQL_CALC_FOUND_ROWS, wildcard join projections, top-level comma joins and left-associative join chains, parenthesized boolean predicates, index DDL, bootstrap/session compatibility SET and SHOW forms, recursive/nested compatibility rewrites for the current subquery subset, and compatibility no-op LOCK TABLES / UNLOCK TABLES)
  • [x] T037: Deepen COM_STMT parity beyond the current baseline (complex-query result metadata, stricter driver/cursor semantics, fuller protocol coverage; prepare-time metadata now also covers supported scalar-expression projections including baseline arithmetic, broader scalar/date-time functions including FIND_IN_SET / ISNULL, DATE_FORMAT / FROM_UNIXTIME, DATEDIFF / TIMESTAMPDIFF, WEEKDAY / DAYOFWEEK / DAYOFYEAR, MONTHNAME / DAYNAME, QUARTER, LAST_DAY, EXTRACT(<unit> FROM ...), and baseline interval arithmetic through DATE_ADD / DATE_SUB / TIMESTAMPADD, supported subquery-compat SELECTs whose WHERE clauses rewrite cleanly, including the current IN / EXISTS / simple scalar-compare subset, the current nested compatibility path, limited negated boolean-tree wrappers when they can still rewrite cleanly, supported projection-level scalar subqueries, embedded scalar-subquery arithmetic, plus CASE / CAST plus simple aggregate / grouped-aggregate compatibility queries). Progress: the new scalar/date-time functions, COM_INIT_DB, and COM_STATISTICS wire commands broaden the prepared-statement surface, and the latest slice adds dedicated unit + MySQL-wire regressions for projection-subquery metadata parity.
  • [x] T038: Broaden COM_QUERY beyond the current WordPress-class baseline (deeper correlated/nested subqueries beyond the current recursive IN / EXISTS / simple scalar-compare compatibility path, broader join parity beyond the current left-associative ON plus simple base-table USING subset, broader date/time/function parity beyond the current scalar/date-time baseline, and broader ALTER TABLE variants beyond the current ADD/MODIFY/CHANGE/RENAME COLUMN/RENAME [KEY|INDEX]/RENAME TO/DROP COLUMN plus index metadata surface).
    Progress: significant surface expansion: added BETWEEN/NOT BETWEEN, COUNT(DISTINCT col), GROUP_CONCAT(), INSERT ... SELECT, UNION/UNION ALL, TRUNCATE TABLE, DROP DATABASE, RENAME TABLE, EXPLAIN stub, DO, SAVEPOINT stubs, CREATE VIEW/DROP VIEW stubs, locking hint stripping, session functions (USER(), LAST_INSERT_ID(), CONNECTION_ID()), information_schema.schemata/statistics, expanded SHOW commands (WARNINGS, ERRORS, PROCESSLIST, TRIGGERS, EVENTS, PROCEDURE STATUS, FUNCTION STATUS, PLUGINS, PROFILES, CREATE DATABASE), SET GLOBAL/FLUSH/ANALYZE/OPTIMIZE/CHECK/REPAIR/KILL no-ops, and 30+ additional scalar/date-time functions. Corpus expanded from 772→947 lines (283 statements).
    Latest batch: derived tables (FROM subqueries), CTEs (WITH...AS), REGEXP/RLIKE/NOT REGEXP, <=> (NULL-safe equality), NATURAL JOIN, FULL OUTER JOIN (fully executed), multi-table DELETE, multi-table UPDATE (stub), 11 JSON functions (JSON_EXTRACT, JSON_UNQUOTE, JSON_OBJECT, JSON_ARRAY, JSON_CONTAINS, JSON_LENGTH, JSON_TYPE, JSON_VALID, JSON_SET, JSON_KEYS, JSON_MERGE_PRESERVE), plus FIELD/ELT, INET_ATON/INET_NTOA, BIN/OCT/CONV, and hash functions (CRC32, MD5, SHA1/SHA, SHA2). Corpus now at 1130 lines (over 374 statements).
    Latest batch: multi-column GROUP BY with multiple group columns and aggregates, 12 new scalar functions (SUBSTRING_INDEX, ASCII, ORD, CHAR, STRCMP, BIT_LENGTH, OCTET_LENGTH, REGEXP_REPLACE, REGEXP_SUBSTR, TO_BASE64, FROM_BASE64), 5 new information_schema stub tables (routines, triggers, views, processlist, user_privileges).
    Latest batch: window functions (ROW_NUMBER()/RANK()/DENSE_RANK() with OVER(PARTITION BY ... ORDER BY ...)), SET @var = .../SELECT @var user variables, BIT_AND()/BIT_OR()/BIT_XOR() bitwise aggregates, multi-table UPDATE (upgraded from stub to real per-row implementation), 6 new scalar functions (DEGREES, RADIANS, PERIOD_ADD, PERIOD_DIFF, MAKEDATE, MAKETIME). Corpus now at 1240+ lines (over 370 statements).
    Latest batch: corpus.sql fully expanded, with all 16+ TODO blocks uncommented (IF/NULLIF, EXISTS, REGEXP, CAST, COUNT DISTINCT, INFORMATION_SCHEMA, LOCK/UNLOCK, window functions, CTEs, RIGHT/CROSS JOIN, derived tables, NOT EXISTS, IN/scalar subquery, multi-table DELETE/UPDATE, nested functions, SHOW PROCESSLIST/PLUGINS). ~60 new SQL statements added covering additional JOINs, INSERT...SELECT, DO, EXPLAIN, SHOW variants, system variables (@@version etc.), maintenance no-ops, CREATE/DROP VIEW, SAVEPOINT, GROUP_CONCAT DISTINCT, multi-column GROUP BY, session functions, locking hints, scalar functions, SET GLOBAL. Corpus now at 1657 lines (about 678 semicolon-terminated SQL statements) after the fully expanded compatibility sweep.
    Latest batch: correlated subqueries in projection (SELECT name, (SELECT COUNT(*) FROM orders WHERE user_id = users.id) FROM users), binary comparison operators in scalar expressions (>, <, >=, <=, =, !=, <>), multi-aggregate GROUP BY with ORDER BY support over JOINs, embedded subquery pre-evaluation in arithmetic expressions (salary - (SELECT AVG(salary) FROM users)), expression-based UPDATE SET values with per-row evaluation (UPDATE users SET salary = salary * 1.1 WHERE ... via the data_update_exprs engine method), WordPress Site Health-style information_schema.TABLES storage summaries, WordPress Users-screen role counts via COUNT(NULLIF(<predicate>, false)), and dedicated MySQL-wire regressions for WordPress installer/admin seed queries. A fresh live WordPress admin sweep across the core dashboard/content/settings surfaces now finishes with an empty debug.log; only the theme-owned nav-menus / widgets pages still return non-database 500s. Deeper parity work is still ongoing.

Phase 4 — Web console

  • [x] T040: HTTP API /api/v1/sql/exec
  • [x] T041: Console UI scaffold
  • [x] T042: Schema browser + SQL editor
  • [x] T043: Data browse/edit + import/export (CSV + JSON export/import)
  • [x] T044: Users/privileges + status dashboard

Phase 5 - SkeinQL native API

  • [x] T050: Define SkeinQL request/response types + error model (docs/SKEINQL.md)
  • [x] T051: Implement HTTP RPC endpoint POST /api/v1/rpc (system.ping, system.version)
  • [x] T052: Implement schema.* methods (list/describe/create/drop)
  • [x] T053: Implement query.select (single-table scan + filter + limit) over SkeinIR
  • [x] T054: Implement tx.begin/commit/rollback via SkeinQL

Phase 6 - Cache-coherent HTTP queries (ETags)

  • [x] T060: Row ETags for data.get and If-Match support for data.update
  • [x] T061: Planner dependency sets for simple indexed queries
  • [x] T062: query.prepare + GET /api/v1/q/{query_id} with ETag/If-None-Match
  • [x] T063: SSE subscription to ETag changes (query.subscribe)

Phase 7 - Delta-chained values

  • [x] T070: Add ValueEntry kind DELTA + patch codec (docs/DELTA_VALUES.md)
  • [x] T071: Delta selection policy + metrics
  • [x] T072: Compaction rebase (limit delta chain depth)

Phase 8 - Wasm extensions

  • [x] T080: Module store + catalog metadata for UDFs (docs/WASM_UDFS.md). Latest: crates/skeindb-core/src/wasm_catalog.rs now stores Wasm modules as immutable ValueKind::BlobChunk values inside ValueStore and tracks typed metadata in a WasmModuleCatalog persisted as wasm_catalog.json (format v1). Catalog entries include module_id, optional name, UDF kind (scalar / aggregate / table), ABI string, entrypoint symbol, ValueId, size, creation timestamp, and capability metadata (allowed_hostcalls, per-table read/write permissions, determinism, fuel/memory/output budgets). The core API supports install/list/get/drop, overwrite-on-install, module-byte materialization back out of ValueStore, strict validation for empty ids/entrypoints/ABIs/modules, and strict JSON load validation for unsupported format versions or malformed value_id hex. Unit tests cover install/overwrite/drop flows, invalid-request rejection, and value-id hex roundtrip; integration tests cover catalog + .vseg roundtrip preserving both metadata and module bytes plus JSON-load rejection for bad format versions and malformed value_id strings.
  • [x] T081: Scalar UDF execution sandbox with resource limits. Latest: crates/skeindb-core/src/wasm_udf.rs now executes scalar Wasm modules directly from WasmModuleCatalog + ValueStore via wasmtime, using the skein.wasm.udf.v1 ABI with host-side value encoding, a required exported memory, skein_alloc(len) allocator, and a scalar entrypoint export (typically skein_scalar(ptr, len) -> u64, returning ptr<<32 | len). The sandbox enforces capability-gated imports (currently only skein.log_debug mapped from allowed_hostcalls = ["log.debug"]), memory size limits via store resource limiting, output byte limits via max_output_bytes, and keeps filesystem/network/clock/random unavailable by exposing no such imports. Unit tests cover scalar value encode/decode roundtrips and trailing-byte rejection; integration tests cover constant-value execution, hostcall denial/allow flows for log.debug, output-size rejection, and memory-growth failure beyond the configured limit.
  • [x] T082: Safe cancellation (fuel/time budget) + tests. Latest: crates/skeindb-core/src/wasm_udf.rs now enforces per-module max_fuel budgets via Wasmtime fuel metering when configured, adds a bounded wall-clock deadline via epoch interruption, and surfaces explicit FuelExhausted / TimeoutExceeded errors instead of collapsing cancellation into generic execution failures. The default execute_scalar_udf(...) path keeps a conservative host timeout, while execute_scalar_udf_with_options(...) allows embedders and tests to override it. Integration tests now cover deterministic infinite-loop cancellation on fuel exhaustion, host timeout cancellation when max_fuel = 0, and successful execution of a later UDF after a cancelled call.
  • [x] T083: Aggregate and table-function UDFs. Latest: crates/skeindb-core/src/wasm_udf.rs now executes aggregate and table Wasm modules through the same Wasmtime sandbox used for scalar UDFs. Aggregate modules consume a one-shot encoded row batch via execute_aggregate_udf(...) and return a single encoded value; table modules consume encoded args via execute_table_udf(...) and return an encoded row set. Shared encode_rows(...) / decode_rows(...) helpers define the row-batch ABI, and the runtime reuses the existing capability checks, memory/output limits, and T082 fuel/timeout cancellation path for all three module kinds. Unit tests now cover row-batch encode/decode roundtrips; integration tests cover an aggregate module that sums row values and a table module that materializes rows from input arguments.
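
The scalar entrypoint convention mentioned under T081 (a u64 return value carrying ptr<<32 | len) can be sketched on the host side as follows. This is an illustrative sketch only: pack_ptr_len, unpack_ptr_len, read_output, and OutputError are hypothetical names, guest memory is modelled as a plain byte slice instead of a Wasmtime memory, and the bounds and max_output_bytes checks show the idea, not the actual sandbox code.

```rust
/// Pack a guest pointer and length into the u64 return convention
/// described above: high 32 bits = ptr, low 32 bits = len.
pub fn pack_ptr_len(ptr: u32, len: u32) -> u64 {
    ((ptr as u64) << 32) | (len as u64)
}

/// Split a packed return value back into (ptr, len).
pub fn unpack_ptr_len(packed: u64) -> (u32, u32) {
    ((packed >> 32) as u32, packed as u32)
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum OutputError {
    /// The guest reported more bytes than the configured output budget.
    TooLarge { len: usize, max: usize },
    /// The (ptr, len) range does not fit inside guest memory.
    OutOfBounds,
}

/// Read the guest's output bytes out of `memory`, enforcing an output
/// byte limit before touching the range.
/// ASSUMPTION: `memory` is a snapshot of the exported linear memory.
pub fn read_output(
    memory: &[u8],
    packed: u64,
    max_output_bytes: usize,
) -> Result<&[u8], OutputError> {
    let (ptr, len) = unpack_ptr_len(packed);
    let (start, len) = (ptr as usize, len as usize);
    if len > max_output_bytes {
        return Err(OutputError::TooLarge { len, max: max_output_bytes });
    }
    // checked_add guards against overflow on 32-bit hosts.
    let end = start.checked_add(len).ok_or(OutputError::OutOfBounds)?;
    memory.get(start..end).ok_or(OutputError::OutOfBounds)
}
```

Checking the length against the budget before slicing means an oversized result is rejected without reading any guest memory, which matches the spirit of the output-size rejection test described above.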

Phase 9 - Tamper-evident WAL audit

  • [x] T090: WALHeader v2 with hash chaining (docs/AUDIT_WAL.md)
  • [x] T091: checkpoint anchors + audit status
  • [x] T092: audit verify CLI/API + console page. Latest: SkeinAdmin's Forensics panel now exposes maintenance.audit_status and maintenance.audit_verify alongside filtered forensic.query, proof-backed forensic.verify, and forensic.export report bundle tools.

Phase 10 - Hybrid row/column snapshots

  • [x] T100: Snapshot builder (scan MVCC at snapshot_ts) + cseg writer (docs/COLUMN_SNAPSHOTS.md). Latest: snapshot builds now honor snapshot_ts, persist manifest.json + .cseg sidecars under data/snapshots/, and keep those artifacts in sync during incremental refresh.
  • [x] T101: Snapshot reader + column scan operator. Latest: simple single-table SELECT execution now loads projected and PK columns from snapshot manifests and .cseg sidecars instead of cloning in-memory snapshot rows.
  • [x] T102: Optimizer rule: use column snapshots for covered ranges. Latest: an explicit optimizer rule now chooses snapshot scans for covered current-time single-table SELECTs, including covered DISTINCT projections, projection-only GROUP BY, compatible HAVING over grouped projected columns or aliases, and broad equality-index prefilter shapes, when the cost model beats both a full row scan and the competing row-side candidate scan.

Phase 11 - Compatibility telemetry and migration hints

  • [x] T110: Feature flag instrumentation in MySQL translator
  • [x] T111: Internal storage for telemetry counters + query fingerprints (optional)
  • [x] T112: telemetry.compat_summary endpoint + console dashboard
  • [x] T113: telemetry.migration_hints generator (MySQL patterns -> SkeinQL calls)

Phase 12 - Standalone management console (SkeinAdmin)

  • [x] T120: SkeinAdmin placeholder scaffold (web/skeinadmin) + connection profiles
  • [x] T121: SkeinAdmin pages: schema/data/sql workspace
  • [x] T122: SkeinAdmin security: token UI + role-aware navigation. Latest: dedicated Security panel remains reachable from both sidebar and top-tab navigation, with create/list/revoke token flows using modal confirmations instead of browser dialogs.
  • [x] T123: SkeinAdmin cluster page (cluster.*) + actions. Latest: join/leave/remove/promote controls are all surfaced in the live cluster panel.
  • [x] T124: SkeinAdmin observability page (stats.*) — comprehensive dashboard with runtime, storage/dedup, MVCC/compaction, query/cache stats + auto-refresh
  • [x] T125: SkeinAdmin Easy Viewer (phpMyAdmin-inspired) — sidebar tree, sub-tabs, inline editing, search, export, operations. Latest: inline New DB flow, live create-table SQL preview, duplicate-column / identifier validation before create, required-field validation before insert, column sorting (click-to-sort headers), styled modal confirmations (replacing browser confirm()), search operator dropdown (LIKE/=/!=/>/</BETWEEN/IS NULL/IS NOT NULL/REGEXP), visual query builder tab (column picker, WHERE condition builder, ORDER BY/LIMIT, SQL preview, execute/copy/send), 5 new dashboard cards (Top Tables, Slow Query Log, Active Sessions, Index Health, Research Track Status). 2026-04-25 (v0.3.4): wired the previously-stubbed Top Tables / Slow Query Log / Active Sessions / Index Health cards to live RPCs (information_schema.tables via sql.exec, stats.slow_queries, stats.snapshot) through new silentRpc / unwrapCellValue helpers; fixed three r?.secret / r?.tokens / r?.queries response-shape bugs in securityCreateToken / securityRefreshTokens / securityTopQueries so the panel now reads from r.json.result.*; reordered the create-token flow so the fresh secret is no longer overwritten by the subsequent token list refresh; added auto-refresh on overview/security panel switches; relabeled the Active Sessions card to match stats.snapshot fields (Sessions / Open Txns / Avg Latency).
  • [x] T126: SkeinAdmin Engine Config panel — checkbox toggles for dedup, compression, encryption, MVCC, delta chains, time travel, compaction, cache, security, replication, CDC, QUIC. Latest: storage mode selector is aligned with the real runtime values json, segment, and hybrid.

Phase 13 - Observability and server load statistics

  • [x] T130: stats.snapshot and basic counters in server. 2026-05-25: stats.snapshot now also synthesizes a basic alerts block from the existing query tail-latency, CDC backpressure, and compaction-pressure telemetry so operators can see current warning/critical conditions without scraping multiple subtrees. 2026-05-27: settings-backed observability.alert_routes now annotate stats.snapshot.alerts with matched route IDs/targets plus routing.{configured,routed_alerts,matched_routes} summary metadata. Matched http:// and https:// targets now receive a JSON POST once per active alert while repeated polls are suppressed until the alert clears, with per-route and top-level delivery counters exposed in the snapshot. Coverage: server::tests::stats_snapshot_routes_operator_alerts_from_settings, server::tests::stats_snapshot_delivers_http_alert_routes_once_per_active_alert.
  • [x] T131: query fingerprinting + top_queries / slow_queries
  • [x] T132: GET /metrics (Prometheus-style) + labels
  • [x] T133: Console widgets for CPU/memory/disk/QPS/TPS/compaction — Overview dashboard with stat cards, dedup bar chart, auto-refresh. 2026-04-25 (v0.3.5): stats.snapshot now exposes real storage.total_rows / storage.total_tables / storage.disk_bytes (computed by walking the data dir) plus new top-level mvcc.{versions, delta_chains} and cache.{hit_pct, size_bytes, hits, misses} sections, and a new etag_hits counter increments on both If-None-Match paths in GET /api/v1/q/{id} — the dashboard cards that previously rendered "--" now show live values. 2026-05-25: the Overview panel now renders an Operational Alerts card from stats.snapshot.alerts, summarizing current query tail-latency, CDC backpressure, and compaction-pressure warnings/criticals in one place.
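
The once-per-active-alert delivery behaviour described under T130 (deliver on first sight, suppress repeated polls, re-arm after the alert clears) can be modelled with a small state machine. This is a sketch under assumptions: AlertDeliveryState and its poll method are hypothetical, alerts are identified by plain string IDs, and the actual HTTP POST is left to the caller.

```rust
use std::collections::HashSet;

/// Tracks which active alerts have already been delivered.
/// Hypothetical type for illustration; not the server's real structure.
#[derive(Default)]
pub struct AlertDeliveryState {
    delivered: HashSet<String>,
}

impl AlertDeliveryState {
    /// Given the alert IDs active in the current poll, return the IDs
    /// that should be delivered now (i.e. not yet delivered since they
    /// last became active).
    pub fn poll(&mut self, active: &[&str]) -> Vec<String> {
        let active_set: HashSet<&str> = active.iter().copied().collect();
        // Forget alerts that have cleared so they can fire again later.
        self.delivered.retain(|id| active_set.contains(id.as_str()));
        let mut to_send = Vec::new();
        for id in active {
            // `insert` returns true only the first time an ID is seen.
            if self.delivered.insert((*id).to_string()) {
                to_send.push((*id).to_string());
            }
        }
        to_send
    }
}
```

A caller would run poll on each stats.snapshot evaluation and POST only the returned IDs; an alert that clears and later re-fires is delivered again because its ID was dropped from the delivered set in between.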

Phase 14 - Cluster management and scale-out

  • [x] T140: Node identity (node_id) + cluster config model
  • [x] T141: Replication transport protocol (primary -> replica fanout over SkeinQL RPC)
  • [x] T142: CAS object pull protocol (replica fetch missing ValueIDs; objects.need/missing/fetch RPCs + Bloom contains)
  • [x] T143: Read-only replica serving + router (cluster.route_query RPC + replica write rejection)
  • [x] T144: cluster.* SkeinQL endpoints + join tokens + promote replica
  • [x] T145: Sharding metadata + router prototype (single-shard txns)

Phase 15 - Additional performance improvements

  • [x] T150: Schema flag for interned columns + ValueID-first predicate ops (docs/PERFORMANCE.md)
  • [x] T151: Late materialization (decode only projected columns)
  • [x] T152: Batch (vectorized) scan/filter/project pipeline
  • [x] T153: MVCC Visible Version Index cache. Latest: crates/skeindb-core/src/mvcc.rs now exposes a bounded VisibleVersionIndex keyed by row_id + snapshot_epoch_bucket, validates cached entries against the current RowDir head pointer plus the cached version's exact visibility window before reuse, and falls back to normal chain walking on head changes or same-bucket timestamp drift. Unit tests cover cache hits, same-bucket revalidation, head-change invalidation, and bounded eviction; integration tests cover file-backed reuse over RowSegmentSet + RowDir.
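
The revalidation logic described for the T153 cache can be sketched as below. This is a simplified model and not the VisibleVersionIndex implementation: pointers are plain u64 values instead of FilePtr, eviction is simple FIFO (the real bounding policy is not stated here), and the names VisibleVersionCache / CachedVersion are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// Cached resolution result (hypothetical stand-in; u64 replaces FilePtr).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct CachedVersion {
    pub head_ptr: u64,    // RowDir head at the time the entry was cached
    pub version_ptr: u64, // the version that was visible
    pub begin_ts: u64,
    pub end_ts: u64,      // 0 = still current
}

pub struct VisibleVersionCache {
    bucket_width: u64,
    capacity: usize,
    entries: HashMap<(u64, u64), CachedVersion>,
    order: VecDeque<(u64, u64)>, // insertion order for FIFO eviction
}

impl VisibleVersionCache {
    pub fn new(bucket_width: u64, capacity: usize) -> Self {
        assert!(bucket_width > 0 && capacity > 0);
        Self { bucket_width, capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    fn key(&self, row_id: u64, snapshot_ts: u64) -> (u64, u64) {
        (row_id, snapshot_ts / self.bucket_width)
    }

    /// Return the cached version pointer only if the head is unchanged and
    /// the cached window is visible at this exact snapshot timestamp;
    /// otherwise the caller falls back to a normal chain walk.
    pub fn get(&self, row_id: u64, snapshot_ts: u64, current_head: u64) -> Option<u64> {
        let e = self.entries.get(&self.key(row_id, snapshot_ts))?;
        let visible = e.begin_ts != 0
            && e.begin_ts <= snapshot_ts
            && (e.end_ts == 0 || e.end_ts > snapshot_ts);
        (e.head_ptr == current_head && visible).then_some(e.version_ptr)
    }

    pub fn insert(&mut self, row_id: u64, snapshot_ts: u64, entry: CachedVersion) {
        let key = self.key(row_id, snapshot_ts);
        if self.entries.insert(key, entry).is_none() {
            self.order.push_back(key);
            if self.order.len() > self.capacity {
                if let Some(old) = self.order.pop_front() {
                    self.entries.remove(&old);
                }
            }
        }
    }
}
```

Bucketing the key keeps the cache small, while the exact-timestamp window check on lookup is what makes same-bucket timestamp drift safe: two snapshots in one bucket can still disagree about visibility, so a cached entry is never trusted without revalidation.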

Phase 16 - Query coalescing (thundering herd protection)

  • [x] T160: Query fingerprint canonicalization (SkeinIR + SkeinQL) + auth scope keying
  • [x] T161: In-flight query map (leader/joiner) with cancellation semantics
  • [x] T162: Enable coalescing for GET /api/v1/q/{query_id} (cacheable) + tests
  • [x] T163: Metrics + limits + SkeinAdmin dashboard widget

Phase 17 - CAS-aware replication bandwidth bounds (object-aware sync)

  • [x] T165: Bloom summaries for ValueID existence (per valseg + union)
  • [x] T166: Object pull protocol (batch missing ValueIDs, fetch objects, verify hashes). Latest: added replica-side objects.pull, which batches locally-missing ValueIDs, calls remote objects.fetch, validates a lossless transferred VE1 payload (entry_b64) against the requested ValueID before import, and recursively fetches missing delta-base dependencies so pulled entries remain materializable after ingest. Tests cover batch fetching with local-hit skipping, delta-base dependency pulling, and hash-mismatch rejection.
  • [x] T167: Replication metrics: object hit-rate, saved bytes, ref-bytes vs obj-bytes. Latest: added ReplicationObjectCounters to the server counters, instrumented objects.need / objects.missing / objects.fetch with hit/miss accounting and byte accounting (hits accumulate ref_bytes, fetches accumulate obj_bytes), exposed a new cluster.replication_stats RPC (read-only, capability-listed) reporting need_*, missing_*, fetch_*, ref_bytes, obj_bytes, hit_rate, saved_bytes_ratio, and last_updated_ms, and embedded the same JSON under stats.snapshot.cluster.replication_objects. One end-to-end integration test (cluster_replication_stats_tracks_hits_misses_and_bytes) verifies the counters advance correctly across seed → need → missing → fetch → stats.snapshot.
  • [x] T168: Shard move/rebalance uses object manifests + progress reporting. Latest: cluster.shard.move / cluster.shard.rebalance now build shard-scoped object manifests from live row versions, ask the destination node which ValueIDs are missing via objects.need, pull only the missing objects via objects.pull before updating placement, and return manifest/progress summaries including total/missing object counts and bytes plus batch/pull/store outcomes. Tests cover engine-side manifest deduplication and an end-to-end shard move that transfers missing objects to a destination node.
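
The "batch locally-missing ValueIDs" step that objects.pull and the shard-move path rely on can be sketched as a pure planning function. The names and shapes here are assumptions for illustration: ValueId is modelled as a 32-byte array, plan_pull_batches is a hypothetical helper, and the remote fetch plus hash verification are out of scope.

```rust
use std::collections::HashSet;

/// ASSUMPTION: a ValueID is treated as an opaque 32-byte identifier.
pub type ValueId = [u8; 32];

/// Given the IDs a manifest references and the set already stored locally,
/// return batches of IDs that still need to be fetched.
/// Local hits are skipped and repeated IDs are requested only once,
/// preserving first-seen order.
pub fn plan_pull_batches(
    requested: &[ValueId],
    local: &HashSet<ValueId>,
    batch_size: usize,
) -> Vec<Vec<ValueId>> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut seen = HashSet::new();
    let missing: Vec<ValueId> = requested
        .iter()
        .copied()
        // `seen.insert` returns false for duplicates, dropping them.
        .filter(|id| !local.contains(id) && seen.insert(*id))
        .collect();
    missing.chunks(batch_size).map(|chunk| chunk.to_vec()).collect()
}
```

Each returned batch would correspond to one remote fetch call; in the flow described above, every fetched entry would then be validated against its requested ValueID before import, and any delta-base dependencies it references would be fed back through the same planning step.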

Phase 18 - Self-tuning index advisor

  • [x] T170: Telemetry feature extraction (predicates/order/group/join keys) + privacy-safe storage
  • [x] T171: Candidate index generator + duplication/prefix checks. Latest: advisor synthesis now suppresses exact duplicates, primary-key prefixes, prefixes already covered by existing MySQL-compatible indexes, and any suggestion IDs that were previously applied or dismissed.
  • [x] T172: Benefit estimator (Level 0 rule-based) + SkeinQL advisor.* endpoints. Latest: advisor.evaluate now emits measured before/after latency stats for benchmarkable equality, join-key filters, multi-range filters, narrow order-by, grouped phases including mixed range/order/group layouts, and non-grouped same-leading range+order by comparing live full scans against a hypothetical advisor-built secondary index; non-grouped range+order layouts without a same-leading key still fall back to observed-before / expected-after scan summaries.
  • [x] T173: Apply suggestion (CREATE INDEX) + progress + rollback-on-failure. Latest: advisor.apply_index now queues background secondary-index builds, advisor.history records queued/building/completed/failed lifecycle state with progress percentages, and failed builds record rollback metadata before the suggestion can surface again.
  • [x] T174: SkeinAdmin "Index Advisor" page + before/after performance report
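
The duplicate and prefix suppression described under T171 can be illustrated with a small filter. This is a sketch under assumptions, not the advisor's code: indexes are modelled as ordered lists of column names compared exactly, is_prefix_of and filter_candidates are hypothetical names, and the applied/dismissed suggestion-ID check is omitted.

```rust
/// True if `candidate` is a leading prefix of (or equal to) `existing`.
pub fn is_prefix_of(candidate: &[&str], existing: &[&str]) -> bool {
    candidate.len() <= existing.len()
        && candidate.iter().zip(existing).all(|(a, b)| a == b)
}

/// Drop candidate indexes that are redundant:
/// - prefixes of the primary key,
/// - prefixes of an already existing index,
/// - exact duplicates of a candidate already kept.
pub fn filter_candidates<'a>(
    candidates: Vec<Vec<&'a str>>,
    primary_key: &[&str],
    existing: &[Vec<&str>],
) -> Vec<Vec<&'a str>> {
    let mut kept: Vec<Vec<&'a str>> = Vec::new();
    for cand in candidates {
        if cand.is_empty() {
            continue;
        }
        let redundant = is_prefix_of(&cand, primary_key)
            || existing.iter().any(|ix| is_prefix_of(&cand, ix))
            || kept.iter().any(|k| k.len() == cand.len() && is_prefix_of(&cand, k));
        if !redundant {
            kept.push(cand);
        }
    }
    kept
}
```

A candidate on (a) is dropped when an index on (a, b) already exists, because the existing index can serve the same leading-column lookups; only candidates that add a genuinely new leading-column order survive.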

Phase 19 - Time travel and replay bundles

  • [x] T180: MVCC as_of reads (planner + executor) + SkeinQL as_of parameter (docs/TIME_TRAVEL_REPLAY.md)
  • [x] T181: SQL compatibility surface for as_of reads (session variable + query hint). Latest: MySqlSessionState now carries a skein_as_of_ms field; SET @@skein.as_of = '<iso>' | <epoch_ms> | NULL | DEFAULT parses ISO-8601 values (with Z / ±HH:MM offsets and fractional seconds) and integer epoch-millisecond values to control time-travel reads for subsequent SELECTs. The optimizer-style query hint /*+ SKEIN_AS_OF('<ts>') */ is extracted and stripped in sql_exec before parsing and overrides the session value per statement. Both forms thread through to the MVCC-aware query_select as_of filter (T180). Five tests (unit tests parse_as_of_timestamp_accepts_iso_and_epoch_forms, parse_skein_as_of_assignment_value_handles_null_default_and_iso, and extract_skein_as_of_hint_strips_hint_and_returns_epoch_ms, plus 2 integration tests) cover parsing, session SET/clear, and SELECT filtering via both hint and session variable.
  • [x] T182: History retention policy + garbage collection for old versions. Latest: new maintenance.history.* RPC surface — maintenance.history.status reports per-table live/tombstone/purgeable counts plus oldest_tombstone_commit_ts_ms; maintenance.history.set_policy persists history.retention.enabled and history.retention.window_ms via the settings subsystem; maintenance.history.gc purges MVCC tombstones whose commit_ts_ms <= horizon (explicit params or derived from retention policy). Pre-T180 tombstones (commit_ts_ms == 0) are always retained for safety. GC rebuilds the pk_index, bumps table_version so secondary indexes refresh lazily, clears cached vector indexes, and persists each touched table. maintenance.history.status is included in the read-only RPC allowlist alongside maintenance.compaction.status. Three engine unit tests (history_gc_purges_old_tombstones_and_preserves_live_rows, history_gc_retains_pre_t180_tombstones, history_gc_horizon_filters_recent_tombstones) cover the basic purge path, the pre-T180 safety retention, and the commit_ts_ms > horizon filter.
  • [x] T183: Replay bundle format + export/import tooling + deterministic replay runner. Latest: maintenance.replay.export now emits a typed replay bundle containing table schema snapshots, retained row versions, filtered ChangeEvent metadata, per-table checksums, and an overall manifest checksum; maintenance.replay.import materializes that bundle into a hidden .replay_workspaces/<workspace_id> data root; and maintenance.replay.run reopens the imported workspace and verifies deterministic integrity by recomputing canonical table and bundle checksums. The CLI now exposes skeindb replay export, skeindb replay verify, and skeindb replay run, and the server advertises the new replay methods through the SkeinQL RPC surface. Coverage includes CLI parse tests, an engine roundtrip unit test (replay_bundle_export_import_run_roundtrip), and an HTTP/RPC integration test (t183_replay_bundle_export_import_run_roundtrip).
  • [x] T184: SkeinAdmin pages for time travel and replay bundles + integrity status. Latest: SkeinAdmin now exposes a dedicated Time Travel & Replay panel with point-in-time query.select execution via as_of, live history retention/status controls backed by maintenance.history.*, replay bundle export/download/import flows backed by maintenance.replay.*, session-local replay workspace tracking, and a rendered checksum/integrity summary after maintenance.replay.run. Asset coverage includes skeinadmin_replay_panel_exposes_time_travel_and_integrity_controls.
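
The as_of value handling described in T181 can be sketched as follows. This is a minimal Python illustration, not the Rust implementation: the function names, the regular expression, and the choice to treat timestamps without an offset as UTC are all assumptions made for the example, and the accepted ISO-8601 forms are whatever `datetime.fromisoformat` accepts on the running Python version.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical pattern for the /*+ SKEIN_AS_OF('<ts>') */ hint form.
HINT_RE = re.compile(r"/\*\+\s*SKEIN_AS_OF\(\s*'([^']*)'\s*\)\s*\*/", re.IGNORECASE)


def parse_as_of_ms(raw: str) -> Optional[int]:
    """Return epoch milliseconds, or None when the value clears as_of."""
    text = raw.strip()
    if text.upper() in ("NULL", "DEFAULT"):
        return None
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        text = text[1:-1].strip()  # drop SQL string quotes
    if text.isdigit():
        return int(text)  # already integer epoch milliseconds
    if text[-1:] in ("Z", "z"):
        text = text[:-1] + "+00:00"  # older fromisoformat() rejects a bare Z
    moment = datetime.fromisoformat(text)
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    # Integer division of timedeltas avoids float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def extract_as_of_hint(sql: str) -> Tuple[str, Optional[int]]:
    """Strip the first SKEIN_AS_OF hint and return (sql_without_hint, epoch_ms)."""
    match = HINT_RE.search(sql)
    if match is None:
        return sql, None
    stripped = sql[: match.start()] + sql[match.end():]
    return stripped, parse_as_of_ms(match.group(1))


def effective_as_of(session_ms: Optional[int], hint_ms: Optional[int]) -> Optional[int]:
    """A per-statement hint overrides the session value."""
    return hint_ms if hint_ms is not None else session_ms
```

A caller would parse the session value once on SET, run `extract_as_of_hint` on each statement, and pass `effective_as_of(...)` to the read path.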

Phase 20 - Dedup-preserving encryption

Current truth: crypto primitives, envelope helpers, rotation helpers, and settings.encryption.* controls exist; data/encryption.json now persists per-database mode/active-key metadata plus the redacted audit ring, but the main engine read/write path does not yet route normal table values through EncryptedValueStore, and master key bytes still live only in memory, so they do not survive a restart.

  • [x] T190: Key management + AEAD wrappers (ENC_RANDOM, ENC_MLE_DB) (docs/CONVERGENT_ENCRYPTION.md). Latest: skeindb-core now exposes a standalone encryption baseline via encryption::DatabaseKeyManager, database-scoped encryption profiles, active key selection, and EncryptionEnvelope wrappers for ENC_RANDOM and ENC_MLE_DB. ENC_RANDOM derives a mode-specific AES-256-GCM-SIV key from the database master secret and uses randomized nonces; ENC_MLE_DB derives a deterministic content key from the database master secret plus a SHA-256 plaintext digest and returns an envelope carrying the derivation salt for later persistence work in T191. 2026-04-23 hardening: ENC_MLE_DB no longer uses a fixed zero AEAD nonce; the nonce is now HKDF-derived from the same (master_key, plaintext_digest) scope via a distinct info label, so dedup convergence is preserved while the fixed-nonce review finding is closed. Focused integration coverage lives in crates/skeindb-core/tests/encryption.rs.
  • [x] T191: ValueStore encryption metadata + encrypt/decrypt paths. Latest: new EncryptedValueStore wrapper in skeindb-core (crates/skeindb-core/src/encrypted_valuestore.rs) layers put_encrypted / get_decrypted / read_envelope / reencrypt_value over an underlying ValueStore without changing the on-disk .vseg format. The EncryptionEnvelope is fully self-describing (mode code, scope id, optional key id, optional 12-byte nonce, optional 32-byte derivation salt, ciphertext) and is stored as an ordinary ValueKind::Cell blob. EncryptionEnvelope::from_stored_bytes is a strict parser that rejects trailing bytes and bad version codes. Off values bypass the envelope entirely and are stored as raw bytes. Coverage: envelope_stored_bytes_roundtrip_via_from_stored_bytes, encrypted_value_store_roundtrip_under_three_modes (ENC_OFF / ENC_RANDOM / ENC_MLE_DB).
  • [x] T192: Key rotation + background re-encryption task + progress reporting. Latest: DatabaseKeyManager::rotate_active_key(db, new_key_id) returns a KeyRotationPlan { db, mode, previous_key_id, new_key_id }; DatabaseKeyManager::reencrypt_envelope(ctx, env, &mut ReencryptionProgress) rewrites a single envelope under the new active key (or returns Ok(None) when no rewrite is required) and updates inspected/rewritten/skipped-off/skipped-current counters plus the (previous_key_id, new_key_id) rotation context. EncryptedValueStore::reencrypt_value(ctx, value_id, &mut progress) writes the rewritten envelope back into the same ValueStore (old envelopes are left in place so historical reads continue to work under prior keys until a separate GC pass collects them). Coverage: rotate_active_key_then_reencrypt_envelope_progress_counters, encrypted_value_store_reencrypt_value_writes_new_envelope.
  • [x] T193: settings.encryption.* SkeinQL endpoints + SkeinAdmin UI + audit notes. Latest: new RPC methods settings.encryption.status, settings.encryption.set_mode, settings.encryption.register_key, settings.encryption.set_active_key, settings.encryption.rotate_key are wired through Engine (Engine.key_manager, Engine.encryption_audit) and the JSON-RPC dispatcher (crates/skeindb/src/server.rs). Master key bytes are accepted as base64 (standard / URL-safe / URL-safe-no-pad), validated to decode to exactly 32 bytes, and never persisted; operators re-register keys after restart. Mutating endpoints append a redacted EncryptionAuditEntry to a 256-event in-memory ring exposed by settings.encryption.status (recent_audit). SkeinAdmin gains a dedicated Encryption panel (sidebar + top tab) with cards for Status, Set Mode, Register Key, Set Active Key, and Rotate Key. Read-only allowlist + mysql_known_method registry are extended with the new method names. Coverage: skeinadmin_encryption_panel_exposes_key_management_controls.
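
Two of the mechanisms above lend themselves to a short sketch: the master key validation in T193 (three base64 spellings, exactly 32 bytes) and the ENC_MLE_DB idea from T190 of deriving both a content key and a nonce from the same (master_key, plaintext_digest) scope under distinct info labels. The Python below is only an illustration under stated assumptions. The info label strings and function names are hypothetical, using HKDF for the content key as well as the nonce is an assumption, and the AEAD step itself is omitted because AES-256-GCM-SIV is not in the Python standard library.

```python
import base64
import hashlib
import hmac
from typing import Tuple

# Hypothetical info labels; the real labels are not given in this backlog.
KEY_INFO = b"example/enc_mle_db/content-key"
NONCE_INFO = b"example/enc_mle_db/nonce"


def decode_master_key(text: str) -> bytes:
    """Accept standard, URL-safe, or unpadded URL-safe base64; require 32 bytes."""
    normalized = text.strip().replace("-", "+").replace("_", "/")
    normalized += "=" * (-len(normalized) % 4)  # restore stripped padding
    raw = base64.b64decode(normalized, validate=True)
    if len(raw) != 32:
        raise ValueError(f"master key must decode to 32 bytes, got {len(raw)}")
    return raw


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) extract-then-expand using HMAC-SHA-256."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_mle_material(master_key: bytes, plaintext: bytes) -> Tuple[bytes, bytes]:
    """Derive a deterministic (content_key, nonce) pair for one plaintext.

    Equal plaintexts under the same master key yield the same pair, which is
    what keeps ciphertexts convergent for dedup. Distinct info labels keep the
    key and the nonce independent of each other.
    """
    digest = hashlib.sha256(plaintext).digest()
    content_key = hkdf_sha256(master_key, digest, KEY_INFO, 32)  # AES-256 key size
    nonce = hkdf_sha256(master_key, digest, NONCE_INFO, 12)      # 12-byte nonce
    return content_key, nonce
```

Because the nonce depends on the plaintext digest, two different plaintexts never share a (key, nonce) pair by construction, which is the property a fixed zero nonce could not offer on its own.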

Phase 21 - Workload-guided compaction scheduler

  • [x] T200: Telemetry signals for compaction (L0 pressure, stalls, latencies) (docs/COMPACTION_SCHEDULER.md). Latest: stats.snapshot now scans live .rseg segment files for L0 pressure, records bounded soft/hard pressure events, and exposes recent point/range/write rates plus read/write latency percentiles for SkeinAdmin and future scheduler inputs.
  • [x] T201: Budget-based compaction scheduler + peak windows + bounds enforcement. Latest: persisted compaction.* settings now drive a live heuristic scheduler state in stats.snapshot.compaction.scheduler, including configured/effective IO+CPU budgets, peak-window scaling, task priority scoring, a pressure-driven background worker that can batch multiple removable file-pressure .rseg tasks in one tick while still rewriting canonical live-table segments as needed, and hard-pressure safe-mode write throttling for write-classified SkeinQL/HTTP requests.
  • [x] T202: maintenance.compaction.* endpoints (status/set_policy/pause/resume). Latest: maintenance.compaction.status, maintenance.compaction.set_policy, maintenance.compaction.pause, and maintenance.compaction.resume now expose and persist runtime scheduler policy through the main RPC surface, maintenance.compaction.status reports real worker runs, current/last task metadata, bytes rewritten/reclaimed, orphan cleanup counts, and last-error state, and SkeinAdmin now renders those worker counters directly.
  • [x] T203: Evaluation harness scripts + dashboards for stall rate and p99 latency. Latest: eval/compaction_scheduler_dashboard.py now emits a deterministic summary JSON, timeline CSV, and self-contained HTML dashboard comparing fixed leveling, fixed tiering, and workload-guided policies on stall rate and p99 latency.
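
The two headline metrics in T203, stall rate and p99 latency, can be computed from raw samples roughly as below. This is a sketch under assumptions and not the contents of eval/compaction_scheduler_dashboard.py: the nearest-rank percentile definition, the definition of stall rate as the fraction of observation intervals that recorded a stall, and every name here are illustrative choices.

```python
from typing import Dict, Sequence, Union

Number = Union[int, float]


def percentile_nearest_rank(samples: Sequence[Number], pct: int) -> Number:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct / 100 * n) in integer math
    return ordered[rank - 1]


def stall_rate(stalled_intervals: int, total_intervals: int) -> float:
    """Assumed definition: fraction of observation intervals with a write stall."""
    if total_intervals <= 0:
        raise ValueError("total_intervals must be positive")
    return stalled_intervals / total_intervals


def summarize(policy: str, latencies_ms: Sequence[Number],
              stalled_intervals: int, total_intervals: int) -> Dict[str, object]:
    """One summary row per policy, ready to dump as JSON or CSV."""
    return {
        "policy": policy,
        "p99_latency_ms": percentile_nearest_rank(latencies_ms, 99),
        "stall_rate": stall_rate(stalled_intervals, total_intervals),
    }
```

Nearest-rank always returns an observed sample and uses only integer arithmetic, so repeated runs over the same input give byte-identical summaries, which fits the deterministic output the task describes.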

Phase 22 - SQL autoparameterization and plan cache

  • [x] T210: SQL normalization (fingerprints) + parameter extraction (docs/AUTOPARAMETERIZATION.md)
  • [x] T211: Plan cache keyed by fingerprint + schema version + session flags
  • [x] T212: Integrate autoparam with query coalescing, ETag caching, and telemetry
  • [x] T213: SQL session variable: SET @@skein.autoparameterize = 1 + safety rules
  • [x] T214: SkeinAdmin top queries grouped by fingerprint + suggested parameter schemas
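
T210 and T211 together describe a familiar pattern: replace literals with placeholders to get a stable fingerprint, then key cached plans by fingerprint, schema version, and session flags. The sketch below illustrates that pattern in Python under simplifying assumptions. It handles only single-quoted strings and unsigned numeric literals, does not normalize keyword case, and all names are hypothetical; docs/AUTOPARAMETERIZATION.md is the reference for the actual rules.

```python
import hashlib
import re
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

# Strings (with '' escapes) are matched first so digits inside them are not split out.
LITERAL_RE = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def fingerprint(sql: str) -> Tuple[str, str, List[object]]:
    """Return (digest, template, params) for one SQL statement."""
    params: List[object] = []

    def replace_literal(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token.startswith("'"):
            params.append(token[1:-1].replace("''", "'"))
        elif "." in token:
            params.append(float(token))
        else:
            params.append(int(token))
        return "?"

    template = " ".join(LITERAL_RE.sub(replace_literal, sql).split())
    digest = hashlib.sha256(template.encode("utf-8")).hexdigest()[:16]
    return digest, template, params


class PlanCache:
    """Plans keyed by (fingerprint, schema version, session flags)."""

    def __init__(self) -> None:
        self._plans: Dict[Tuple[str, int, FrozenSet[str]], object] = {}

    def get_or_build(self, sql: str, schema_version: int, session_flags: Iterable[str],
                     build: Callable[[str], object]) -> Tuple[object, List[object]]:
        digest, template, params = fingerprint(sql)
        key = (digest, schema_version, frozenset(session_flags))
        if key not in self._plans:
            self._plans[key] = build(template)  # plan once per distinct key
        return self._plans[key], params
```

Including the schema version in the key means a DDL change naturally misses the cache instead of requiring explicit invalidation, and including session flags keeps sessions with different settings from sharing a plan.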

Phase 23 - CDC and dependency-driven changefeeds

  • [x] T220: WAL-to-change-event translator (table-level insert/update/delete) (docs/CDC_CHANGEFEED.md). Latest: the persisted CDC change log now records commit_ts_ms plus lsn-style sequence metadata for table-level insert/update/delete events and acts as the retained WAL-equivalent source for cdc.poll / SSE replay.
  • [x] T221: cdc.subscribe_table + cdc.poll/cdc.ack/cdc.close endpoints. Latest: table subscriptions now accept optional exact pk tuple filters, inclusive single-column pk_range bounds, ops allowlists over insert / update / delete source events, changed-column columns filters that only emit events where at least one selected column value changes and project included row images down to those columns, and format: "objects_json" | "plain_json" delivery; durable cdc_subscriptions.json state is now format v8 with backward-compatible v1/v2/v3/v4/v5/v6/v7 loading.
  • [x] T222: Dependency-driven query changefeeds (cdc.subscribe_query) using ETag dependency sets. Latest: prepared queries can now create CDC subscriptions through cdc.subscribe_query, and cdc.poll emits dependency-driven invalidate events carrying the current query ETag whenever an allowed source op touches one of the prepared query's dependency tables. Those dependency tables include base tables reached through view dependencies, tables referenced by set-operation branches like UNION / UNION ALL, and base tables referenced inside CTE definitions, while the CTE names themselves are suppressed so they are not treated as physical tables. Optional exact pk filters, inclusive pk_range filters, and changed-column columns filters are applied to the triggering table event before the invalidate is emitted, and included triggering row images are projected to those columns. Polling recomputes that base-table change set from the stored query so legacy durable subscriptions continue invalidating correctly after restart.
  • [x] T223: SSE/WebSocket streaming endpoint + backpressure + reconnect semantics. Latest: GET /api/v1/cdc/sse/{sub_id} and GET /api/v1/cdc/ws/{sub_id} stream both table CDC events and query invalidation events in the subscription's persisted event format, replay from the retained change log in bounded batches, resume from Last-Event-ID or from_offset, and emit backpressure / resnapshot control messages when lag, pause, or retention state requires operator attention.
  • [x] T224: Retention + resnapshot protocol when WAL horizon is exceeded. Latest: bounded retained CDC history now reports earliest_offset / latest_offset, cdc.poll returns explicit resnapshot_required responses when a consumer falls behind the retained horizon, and SSE emits a resnapshot control event with the same recovery metadata.
  • [x] T225: SkeinAdmin CDC page + subscription management + lag visualization. Latest: SkeinAdmin now exposes a dedicated CDC panel for table/query subscribe, poll, pause, resume, ack, and close flows with session-local lag bars, runtime pressure summaries, backpressure state badges, and recent-event inspection.
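
The retention behavior in T224 (a bounded change log that reports earliest_offset / latest_offset and answers resnapshot_required when a consumer has fallen behind) can be sketched as follows. The field names earliest_offset, latest_offset, and resnapshot_required come from the task text; the class, the response shape, the in-memory storage, and the offset numbering are assumptions made for illustration.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class RetainedChangeLog:
    """Bounded change log that keeps only the newest `capacity` events."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._events: Deque[Tuple[int, Any]] = deque()
        self._next_offset = 0

    def append(self, event: Any) -> int:
        offset = self._next_offset
        self._events.append((offset, event))
        self._next_offset += 1
        while len(self._events) > self._capacity:
            self._events.popleft()  # oldest events fall off the retained horizon
        return offset

    @property
    def earliest_offset(self) -> int:
        # With nothing retained, the next offset to be written is the earliest readable one.
        return self._events[0][0] if self._events else self._next_offset

    @property
    def latest_offset(self) -> int:
        return self._next_offset - 1  # -1 until the first append

    def poll(self, from_offset: int, max_events: int = 100) -> Dict[str, Any]:
        if from_offset < self.earliest_offset:
            # The consumer missed events that are no longer retained.
            return {
                "status": "resnapshot_required",
                "earliest_offset": self.earliest_offset,
                "latest_offset": self.latest_offset,
            }
        batch = [(o, e) for o, e in self._events if o >= from_offset][:max_events]
        next_offset = batch[-1][0] + 1 if batch else from_offset
        return {"status": "ok", "events": batch, "next_offset": next_offset}
```

A consumer that receives resnapshot_required would reload table state and then resume polling from an offset at or after the reported earliest_offset, rather than silently skipping the gap.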

Phase 24 - Website and documentation site polish

  • [x] T230: Homepage: add Docs nav CTA, mobile hamburger menu, maturity badges on feature cards, fix broken links (architecture image, paper), consistent API endpoints
  • [x] T231: Docs site homepage: sync with public site (mobile menu, Docs CTA, maturity badges, fixed quickstart endpoint, correct paper link)
  • [x] T232: Docs landing (docs.html): add client-side search/filter, mobile menu, polished footer, keyword metadata on cards
  • [x] T233: Footer overhaul across all pages — structured 4-column footer with Product/Documentation/Community sections
  • [x] T234: Research tracks on public site converted to clickable links pointing to docs/site/research pages

Phase 25 — PostgreSQL wire protocol compatibility

  • [x] T400: PG v3 wire protocol primitives (pg_wire.rs) — message framing, encode/decode for StartupMessage, RowDescription, DataRow, CommandComplete, ErrorResponse, ParameterStatus, BackendKeyData, Terminate. Includes PG connection handler with simple query protocol, trust/SCRAM auth, SSL rejection, and delegation to the shared SQL execution engine. 20 unit tests + 6 integration tests.
  • [x] T401: SCRAM-SHA-256 authentication — RFC 5802/7677 SASL exchange with trust mode fallback. Implements full SCRAM-SHA-256 state machine in pg_wire::scram module: HMAC-SHA-256, PBKDF2-HMAC-SHA256 (4096 iterations), ScramCredentials (stored_key + server_key derivation), ScramServer (client-first → server-first → client-final → server-final with proof verification). Wire helpers: write_auth_sasl, write_auth_sasl_continue, write_auth_sasl_final, parse_sasl_initial_response, parse_sasl_response. PG connection handler upgraded from cleartext to SCRAM-SHA-256 when SKEINDB_TOKEN is set; trust mode when unset. Deterministic salt derivation via pg_scram_salt_for_token. 12 new unit tests (HMAC known vector, PBKDF2, credential derivation, full exchange success/failure, GS2 header rejection, nonce missing, SASL message parsing).
  • [x] T402: PG session state — pg_settings HashMap on MySqlSessionState initialized with 13 PG defaults; SET key = value / SET key TO value parsing (with LOCAL/SESSION prefix support); RESET key / RESET ALL; pg_bootstrap_setting_value reads session overrides first; ParameterStatus sent to client on SET/RESET; SHOW, current_setting(name), and current_setting(name, missing_ok) now reflect session values, with the two-argument form returning NULL for unknown bootstrap probes when missing_ok = true; current_schema with optional parentheses and current_schemas(bool) now derive from the effective search_path, current_catalog aliases current_database(), and bootstrap current_user / current_role / session_user / user probes preserve the startup username. 10 unit tests.
  • [x] T403: PG connection handler + listener (in server.rs) — SSL negotiation (reject with 'N'), startup message parsing, trust/SCRAM auth, ParameterStatus batch, BackendKeyData, ReadyForQuery, simple query command loop on port 5432 (configurable via --pg flag, default 5432, 0 disables)
  • [x] T404: PG SQL dialect parser (pg_rewrite_sql) — double-quoted identifiers → backtick-quoted, $$dollar quoting$$ → single-quoted, :: type casts → CAST(… AS …), IS [NOT] DISTINCT FROM → null_safe_eq, FETCH FIRST n ROWS ONLY → LIMIT n, ARRAY[…] → PG array literal string. ILIKE and boolean literals were already implemented. RETURNING deferred to T405 (DML).
  • [x] T405: PG DML extensions — ON CONFLICT DO NOTHING → INSERT IGNORE INTO, ON CONFLICT (...) DO UPDATE SET ... EXCLUDED.col → ON DUPLICATE KEY UPDATE ... VALUES(col) via pg_rewrite_on_conflict post-pass; INSERT/UPDATE/DELETE ... RETURNING col1, col2, * extracted and stripped at pg_dispatch_sql level with follow-up SELECT using PK lookup for INSERT RETURNING; COPY table [ (col, ...) ] TO STDOUT, COPY table [ (col, ...) ] FROM STDIN, and COPY (SELECT ...) TO STDOUT now work over simple and extended query flows for the default text protocol plus explicit WITH (FORMAT text) / WITH (FORMAT csv) / WITH (FORMAT binary), PostgreSQL keyword-style WITH (TEXT) / WITH (CSV) / WITH (BINARY) aliases, legacy bare WITH TEXT / WITH CSV / WITH BINARY forms, text/csv NULL '...', CSV HEADER, CSV HEADER MATCH on import, and single-byte DELIMITER, QUOTE, and ESCAPE on supported CSV forms; unsupported COPY formats/options still return 0A000. Expanded unit and integration coverage.
  • [x] T406: PG DDL — SERIAL/BIGSERIAL/SMALLSERIAL → auto_increment + i64 type, CREATE SCHEMA → CREATE DATABASE alias (with IF NOT EXISTS), CREATE INDEX CONCURRENTLY (accepted/ignored), CREATE INDEX IF NOT EXISTS, COMMENT ON (silently accepted). 9 unit tests.
  • [x] T407: PG type OID mapping + encoding — bool→16, i64→20, text→25, jsonb→3802, timestamp→1114, arrays; text + binary format. Added 11 array OID constants (BOOL_ARRAY, INT4_ARRAY, INT8_ARRAY, FLOAT4_ARRAY, FLOAT8_ARRAY, TEXT_ARRAY, VARCHAR_ARRAY, DATE_ARRAY, TIMESTAMP_ARRAY, JSONB_ARRAY, UUID_ARRAY) with array_element_oid/scalar_to_array_oid utilities. Enhanced type inference heuristic from 3 to 10 types (bool, i64, f64, date, time, datetime, uuid, json, bytes, string). Added encode_binary_value() for PG binary wire format (BOOL, INT4, INT8, FLOAT4, FLOAT8, TEXT, VARCHAR, JSON, JSONB, UUID, BYTEA). Bind handler now accepts binary result format codes and stores them in PgPortal; Execute path applies format-aware encoding via pg_format_code_at — binary columns use encode_binary_value, text columns use pg_text_value_for_column. 29 new unit tests.
  • [x] T408: PG result encoding — RowDescription, DataRow, CommandComplete ("INSERT 0 1"), ErrorResponse with SQLSTATE codes. Latest: simple and extended PG queries now emit typed text-format RowDescription metadata for common numeric/text results, DataRow payloads, PG-style CommandComplete tags for DML/DDL, and ErrorResponse SQLSTATEs end-to-end over the live listener.
  • [x] T409: PG system catalogs (pg_catalog.rs) — pg_database, pg_namespace, pg_roles, pg_authid, pg_user, pg_group, pg_tablespace, pg_am, pg_description, pg_tables, pg_views, pg_indexes, pg_matviews, pg_sequences, pg_stats, pg_class, pg_attribute, pg_type, pg_index, pg_constraint, pg_proc (basic builtin metadata), pg_settings, pg_stat_activity, pg_stat_database. Latest: catalog tables are served through the shared virtual-table executor, including pg_am (heap/btree access methods aligned with current pg_class.relam OIDs), pg_description (currently empty but exposed with PostgreSQL-correct OID/int4/text column metadata), pg_class (tables + index entries with relkind r/i), pg_attribute (column metadata with SkeinDB→PG type OID mapping), pg_index (primary key + secondary indexes with indkey position vectors), pg_indexes (PostgreSQL-style index definitions), pg_stat_database (single-row database counters), pg_constraint (primary key p + unique u constraints with conkey arrays), and pg_proc metadata for the bootstrap builtin set plus selected timestamp/UUID/array/string/aggregate helpers with aggregate prokind rows, both current substring arities, both current current_setting arities, and current_schemas(bool). Column-level OID overrides ensure correct wire types for catalog-specific columns (OID, bool, int4, float4/float8).
  • [x] T410: PG startup query handling — SELECT version(), current_database(), current_catalog, current_schema (with optional parentheses), current_schemas(bool), current_user, current_role, session_user, user, SHOW server_version, SHOW server_version_num, SHOW standard_conforming_strings, SHOW max_identifier_length, SHOW transaction isolation level, and SELECT current_setting(...) including the two-argument missing_ok form for the common startup/bootstrap probes used by psql/Django/Rails/SQLAlchemy-style clients
  • [x] T411: PG extended query protocol — Parse/Bind/Describe/Execute/Sync/Close/Flush, named statements + portals, $1/$2 parameter placeholders. Latest: the PG listener now keeps connection-local prepared statements and portals, supports text-format Parse / Bind / statement+portal Describe / Execute / Close / Flush, substitutes $1/$2 placeholders through the shared SQL engine, and uses Sync to recover cleanly after extended-protocol errors.
  • [x] T412: PG function mapping (pg_functions.rs) — string_agg, array_agg, gen_random_uuid, to_char/to_timestamp, date_trunc, extract(epoch FROM ...), jsonb_build_object, ->>/#>> operators, || concat, ~/~* regex, ARRAY operations, unnest. Latest: shared PG function coverage now includes split_part(text, delimiter, n) with positive and negative field indexes, text RowDescription metadata on the PG wire path, and matching pg_catalog.pg_proc builtin metadata.
  • [x] T413: PG transaction semantics — ReadyForQuery status byte (I/T/E), failed-tx-block semantics, SAVEPOINT/RELEASE/ROLLBACK TO. Latest: the PG listener now preserves session state across simple queries, emits ReadyForQuery as I / T / E, rejects commands in aborted transaction blocks with 25P02, treats COMMIT after an aborted transaction as a rollback, and wires SAVEPOINT / RELEASE SAVEPOINT / ROLLBACK TO SAVEPOINT into the existing undo-log bookkeeping.
  • [x] T414: PG SQLSTATE error codes — 42P01 (undefined table), 42703 (undefined column), 23505 (unique violation), 42601 (syntax error), etc. Latest: PG simple-query errors now translate shared-engine/MySQL-style failures into PostgreSQL SQLSTATEs, including undefined tables, undefined columns, unique violations, syntax-path parser errors, unsupported features, savepoint lookup failures, and failed-transaction-block errors.
  • [x] T415: PG compatibility test corpus (tests/compat/pg_corpus.sql) — mirror MySQL corpus structure for PG dialect. Latest: tests/compat/pg_corpus.sql now exercises the current PG baseline over the live listener, covering startup probes, shared-engine SQL, and transaction/savepoint behavior through a dedicated pg_compat_corpus_roundtrip integration test.
  • [x] T416: PG unit tests — SQLSTATE error code mapping (7 tests), type OID mapping for all MySqlStmtColumnType variants, pg_text_value bool normalization + null handling, sql_type_to_desc PG type coverage (serial/boolean/real/decimal/json/blob/timestamp), sql_detect_verb for CREATE/DROP SCHEMA and COMMENT ON, pg_rewrite_sql edge cases (casts in function args, nested parens, mixed features). 19 new tests added in this pass, bringing total PG unit test count to ~40.
  • [x] T417: PG integration tests — SCRAM-SHA-256 auth (success + wrong-password rejection), binary result format via extended query, type OID inference for BOOL/INT8/FLOAT8/TEXT literals. 4 new integration tests added covering the T401 SCRAM handshake end-to-end, binary DataRow encoding, and RowDescription type OID correctness.
  • [x] T418: PG compatibility documentation (docs/PG_COMPAT.md) — refreshed to the current partial baseline (startup/auth, SSL rejection, simple query, tx/savepoint semantics, text-format extended query protocol, and PG corpus coverage) and linked backlog gaps
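
The credential derivation and proof check that T401 lists (stored_key + server_key derivation, client proof verification) follow RFC 5802 and can be sketched with the Python standard library. This is an illustration of the RFC formulas, not the pg_wire::scram code: the Python names are hypothetical stand-ins, SASLprep normalization of the password is omitted, and message parsing and nonce handling are left out.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ScramCredentials:
    """What a server stores instead of the password (RFC 5802, section 3)."""
    salt: bytes
    iterations: int
    stored_key: bytes
    server_key: bytes


def _hmac_sha256(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def _salted_password(password: str, salt: bytes, iterations: int) -> bytes:
    # SaltedPassword := Hi(password, salt, i), which is PBKDF2-HMAC-SHA256.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def derive_credentials(password: str, salt: bytes, iterations: int = 4096) -> ScramCredentials:
    salted = _salted_password(password, salt, iterations)
    client_key = _hmac_sha256(salted, b"Client Key")
    stored_key = hashlib.sha256(client_key).digest()
    server_key = _hmac_sha256(salted, b"Server Key")
    return ScramCredentials(salt, iterations, stored_key, server_key)


def client_proof(password: str, salt: bytes, iterations: int, auth_message: bytes) -> bytes:
    """Client side: ClientProof := ClientKey XOR HMAC(StoredKey, AuthMessage)."""
    salted = _salted_password(password, salt, iterations)
    client_key = _hmac_sha256(salted, b"Client Key")
    stored_key = hashlib.sha256(client_key).digest()
    signature = _hmac_sha256(stored_key, auth_message)
    return bytes(a ^ b for a, b in zip(client_key, signature))


def verify_client_proof(creds: ScramCredentials, auth_message: bytes, proof: bytes) -> bool:
    """Server side: recover ClientKey from the proof and compare H(ClientKey) to StoredKey."""
    signature = _hmac_sha256(creds.stored_key, auth_message)
    if len(proof) != len(signature):
        return False
    candidate_key = bytes(a ^ b for a, b in zip(proof, signature))
    return hmac.compare_digest(hashlib.sha256(candidate_key).digest(), creds.stored_key)


def server_signature(creds: ScramCredentials, auth_message: bytes) -> bytes:
    """Sent in server-final so the client can authenticate the server."""
    return _hmac_sha256(creds.server_key, auth_message)
```

The server never needs the password after `derive_credentials` runs: verification uses only stored_key, and a leaked stored_key alone does not let an attacker produce a valid proof without also observing an exchange.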

Phase 26 - Distribution and installation

  • [x] T419: Debian packaging metadata + signed apt repository publication pipeline. Latest: cargo-deb metadata is wired into crates/skeindb/Cargo.toml, and tagged releases can publish a signed apt branch with Packages, Release, InRelease, and exported key material.
  • [x] T420: Homebrew tap formula + release automation. Latest: the repo now ships Formula/skeindb.rb for tap-based installs, supports immediate HEAD installs from this repo, and tagged releases auto-render a stable formula from the release source tarball.

Research Agenda Extensions (Optional)

The repository includes a January 2026 research agenda with 20 proposals.

  • Overview: docs/RESEARCH_AGENDA.md
  • Task-level research tasks (T230+): docs/RESEARCH_BACKLOG.md

These items are intentionally separated from the core phases above to keep the main build plan focused.