All Research Tracks
R12 · AI/ML Integration

Natural Language to SkeinQL with Verification

Natural language database interfaces often produce incorrect queries, and users cannot verify correctness without understanding the generated query. SkeinQL's structured JSON-RPC format is more amenable to LLM generation than raw SQL. SkeinDB's dependency tracking can help verify intent by enumerating what data a query could return, enabling a human-in-the-loop verification step before execution.

Research Proposal — Mapped to backlog in docs/RESEARCH_BACKLOG.md

🔬 What's Novel

🔧 Technical Approach

Phase 1 — NL-to-SkeinQL Translation

LLM translation with schema context, worked examples, and SkeinQL documentation in prompts. Structured JSON-RPC output is easier to validate than free-form SQL.

Phase 2 — Explanation Generation

Dependency tracking generates natural language explanations: "This query will return rows from table X where column Y matches Z, touching these dependencies…"

Phase 3 — Verification Protocol

UI showing: generated query explanation in plain English, sample results (dry run on data subset), user actions to confirm, modify, or reject before full execution.

Phase 4 — Refinement Loop

Iterative refinement where user feedback is incorporated into subsequent generation attempts. Conversation history provides additional context for disambiguation.

🧪 Hypotheses

H1

LLMs generate more accurate queries in SkeinQL's structured JSON-RPC format than in free-form SQL.

H2

Dependency tracking generates explanations that help users verify query intent without understanding SkeinQL syntax.

H3

Iterative refinement with dependency-based feedback converges to correct queries faster than direct SQL editing.

🔗 SkeinDB Integration

SkeinQL RPC
Dependency Tracking
Web Admin (SkeinAdmin)
Schema Introspection
LLM Gateway

📚 Key References