Monitoring and metrics

This guide shows four quick ways to inspect a live SkeinDB node: the health endpoint, Prometheus metrics, a SkeinQL stats snapshot, and the built-in admin UI (SkeinAdmin).

Prerequisite: you have completed the Quickstart.

1. Start the server

cargo run -- serve --data ./data --http 8080 --mysql 3306 --pg 5432

2. Check basic liveness

curl -s http://127.0.0.1:8080/health

Use this for load balancers and simple smoke checks.
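
For scripted smoke checks it can help to wait until the endpoint answers before running anything else. The sketch below polls the health URL and gives up after a fixed number of attempts. It assumes a healthy node answers with a 2xx status, which this guide does not state explicitly; the function name and defaults are illustrative.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url, attempts=30, delay_s=1.0, timeout_s=2.0):
    """Return True once `url` answers with a 2xx status, else False.

    ASSUMPTION: a healthy SkeinDB node returns a 2xx status on /health.
    """
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or a non-2xx response:
            # treat all of these as "not healthy yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

For the server started in step 1, the call would be wait_for_health("http://127.0.0.1:8080/health").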

3. Inspect Prometheus-style metrics

SkeinDB exposes a plain metrics endpoint at GET /metrics:

curl -s http://127.0.0.1:8080/metrics | head -40

Minimal Prometheus scrape config:

scrape_configs:
  - job_name: skeindb
    static_configs:
      - targets: ["127.0.0.1:8080"]
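
If you want to read individual values without running Prometheus, the text format is simple enough to parse by hand. This is a simplified sketch, not a complete parser for the exposition format: it keeps the label set as part of the series key and ignores timestamps. The helper names are illustrative, and no particular SkeinDB metric names are assumed.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text-format lines into {series: float}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        if "}" in line:
            # Labelled series: everything up to the last "}" is the key.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            pieces = line.split(None, 1)
            series, rest = pieces[0], pieces[1] if len(pieces) > 1 else ""
        fields = rest.split()
        if not fields:
            continue  # no value on this line
        try:
            samples[series] = float(fields[0])  # fields[1], if any, is a timestamp
        except ValueError:
            continue  # not a numeric sample; ignore it
    return samples


def fetch_metrics(url="http://127.0.0.1:8080/metrics"):
    """Fetch and parse the metrics endpoint shown above."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```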

4. Pull a runtime stats snapshot over SkeinQL

curl -s -XPOST http://127.0.0.1:8080/api/v1/rpc \
  -H 'Content-Type: application/json' \
  -d '{"skeinql":"1.0","id":1,"method":"stats.snapshot","params":{}}' \
  | jq '.result | {uptime_s, qps, tps, sessions, process, storage, background}'

At the current baseline, the snapshot includes:

uptime
active and total sessions
QPS and TPS
process CPU and RSS
storage metrics such as wal_bytes and dedup_ratio
background task status
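
The same call can be made from a script. The sketch below mirrors the curl and jq command above: it sends the same request body and keeps the same top-level keys. The helper names are illustrative, and reading an "error" field on failure is an assumption based on JSON-RPC conventions, not something this guide confirms.

```python
import json
import urllib.request

# Top-level keys selected by the jq filter above.
SNAPSHOT_KEYS = ("uptime_s", "qps", "tps", "sessions",
                 "process", "storage", "background")


def build_snapshot_request(request_id=1):
    """Build the same request body as the curl example."""
    return {"skeinql": "1.0", "id": request_id,
            "method": "stats.snapshot", "params": {}}


def extract_snapshot(response):
    """Keep only the snapshot keys; missing keys become None."""
    result = response.get("result")
    if result is None:
        # ASSUMPTION: failures carry an "error" field, as in JSON-RPC.
        raise ValueError(f"no result in response: {response.get('error')!r}")
    return {key: result.get(key) for key in SNAPSHOT_KEYS}


def fetch_snapshot(base_url="http://127.0.0.1:8080"):
    """POST stats.snapshot to /api/v1/rpc and return the selected fields."""
    body = json.dumps(build_snapshot_request()).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/v1/rpc",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return extract_snapshot(json.load(resp))
```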

5. Watch the same data in SkeinAdmin

Open:

http://127.0.0.1:8080/admin

The most useful panels for day-to-day visibility are:

Overview for live health and headline server numbers
Telemetry for metrics-oriented summaries
Cluster when you are running replicas or shards
Time Travel & Replay and Forensics for operational investigations

6. What to watch first

For a single node, watch for these warning signs first:

wal_bytes rising steadily without checkpoints or compaction progress.
dedup_ratio dropping sharply after a workload change.
CPU or RSS growth that does not come back down after the workload quiets.
Background tasks stuck in a non-idle state.
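
Three of these signs can be checked mechanically by comparing a few snapshots taken some time apart. The sketch below is one possible shape for that, and it rests on several assumptions: that wal_bytes and dedup_ratio sit under the snapshot's storage object, that background maps task names to a state string where "idle" means idle, and that a 20% relative drop counts as "sharp". None of these are confirmed by this guide, so adjust them to what your node returns. The CPU and RSS check is left out because it needs a signal for when the workload has quieted.

```python
def first_checks(snapshots, dedup_drop=0.2):
    """Return warnings for a list of stats snapshots, oldest first.

    ASSUMPTIONS (hypothetical snapshot layout):
      snapshot["storage"]["wal_bytes"]   -> number
      snapshot["storage"]["dedup_ratio"] -> number
      snapshot["background"]             -> {task_name: state}, "idle" when idle
    """
    if len(snapshots) < 2:
        return []  # need at least two samples to see a trend
    warnings = []

    # WAL grew between every pair of consecutive samples.
    wal = [s["storage"]["wal_bytes"] for s in snapshots]
    if all(later > earlier for earlier, later in zip(wal, wal[1:])):
        warnings.append("wal_bytes rose in every sample")

    # Dedup ratio fell by more than `dedup_drop` (relative) overall.
    first_ratio = snapshots[0]["storage"]["dedup_ratio"]
    last_ratio = snapshots[-1]["storage"]["dedup_ratio"]
    if first_ratio > 0 and last_ratio < first_ratio * (1 - dedup_drop):
        warnings.append("dedup_ratio dropped sharply")

    # A task that reports the same non-idle state in every sample.
    latest_tasks = snapshots[-1].get("background", {})
    for name, state in sorted(latest_tasks.items()):
        if state != "idle" and all(
            s.get("background", {}).get(name) == state for s in snapshots
        ):
            warnings.append(f"background task {name!r} stuck in {state!r}")

    return warnings
```

A rising WAL is only a problem when no checkpoint or compaction follows, so treat the first warning as a prompt to look at the background tasks, not as a failure on its own.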
