🔬 What's Novel
- Natural language interface to a structured JSON-RPC database API (not SQL)
- Dependency-tracking-based query explanation generation for verifying AI-generated queries
- Human-in-the-loop verification protocol showing explanations + sample results before execution
- Empirical comparison of structured (SkeinQL) vs. SQL-based natural language interfaces
🔧 Technical Approach
Phase 1 — NL-to-SkeinQL Translation
LLM translation with schema context, worked examples, and SkeinQL documentation in prompts. Structured JSON-RPC output is easier to validate than free-form SQL.
Phase 2 — Explanation Generation
Dependency tracking generates natural language explanations: "This query will return rows from table X where column Y matches Z, touching these dependencies…"
Phase 3 — Verification Protocol
UI showing: generated query explanation in plain English, sample results (dry run on data subset), user actions to confirm, modify, or reject before full execution.
Phase 4 — Refinement Loop
Iterative refinement where user feedback is incorporated into subsequent generation attempts. Conversation history provides additional context for disambiguation.
🧪 Hypotheses
LLMs generate more accurate queries in SkeinQL's structured JSON-RPC format than in free-form SQL.
Dependency tracking generates explanations that help users verify query intent without understanding SkeinQL syntax.
Iterative refinement with dependency-based feedback converges to correct queries faster than direct SQL editing.
🔗 SkeinDB Integration
📚 Key References
- Scholak et al. — "PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models" (2021)
- Rajkumar et al. — "Evaluating the Text-to-SQL Capabilities of Large Language Models" (2022)