R11 · AI/ML Integration

LLM-Assisted Query Autoparameterization

SkeinDB's autoparameterization extracts literals into parameters, but some literals are semantically fixed (status codes, enum values, type discriminators), and parameterizing them erases information the planner can use. Language models, trained on code and SQL, can distinguish literals that should be parameterized from those that should stay inline, improving plan cache hit rates while avoiding semantic errors.
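To make the distinction concrete, here is an illustrative case; the orders table, its columns, and the rewrites shown are hypothetical:

```python
# Hypothetical query; the table and columns are invented for illustration.
query = "SELECT * FROM orders WHERE customer_id = 4821 AND status = 'shipped'"

# Purely syntactic autoparameterization rewrites both literals:
#     SELECT * FROM orders WHERE customer_id = ? AND status = ?
# But 'shipped' is a semantic constant: plans for 'shipped' vs. 'cancelled'
# rows may legitimately differ (e.g., a partial index on status), so
# semantic parameterization keeps it inline:
#     SELECT * FROM orders WHERE customer_id = ? AND status = 'shipped'
```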

Research Proposal — Mapped to backlog in docs/RESEARCH_BACKLOG.md

🔬 What's Novel

Existing autoparameterization is purely syntactic: every literal becomes a parameter, regardless of meaning. This track applies language-model priors over code and SQL to make that decision per literal, and feeds plan cache telemetry back into it.

🔧 Technical Approach

Phase 1 — Dataset Construction

Collect query corpora from open-source applications. Manually label each literal as "parameterizable" (a user-supplied value) or "semantic constant" (an enum, status code, or discriminator).
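A labeled example might look like the following sketch; the field names and the example row are assumptions, not a fixed schema:

```python
from dataclasses import dataclass
from enum import Enum

class Label(Enum):
    PARAMETERIZABLE = "parameterizable"       # user-supplied value
    SEMANTIC_CONSTANT = "semantic_constant"   # enum, status code, discriminator

@dataclass
class LabeledLiteral:
    query: str          # normalized query text
    literal: str        # the literal's verbatim text
    offset: int         # byte offset of the literal within the query
    column: str | None  # column the literal compares against, if resolvable
    table: str | None
    label: Label

example = LabeledLiteral(
    query="SELECT * FROM orders WHERE customer_id = 4821 AND status = 'shipped'",
    literal="'shipped'",
    offset=59,
    column="status",
    table="orders",
    label=Label.SEMANTIC_CONSTANT,
)
```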

Phase 2 — LLM Classification

Evaluate zero-shot prompting (GPT-4/Claude), few-shot prompting, and fine-tuned small models. Input context includes the query text, schema definition, column names, and value statistics.
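One way the zero-shot prompt could be assembled; the prompt wording and the context fields passed in are assumptions:

```python
def build_prompt(query: str, literal: str, column: str,
                 ddl: str, distinct_count: int) -> str:
    """Assemble zero-shot classification context for a single literal."""
    return (
        "Classify the SQL literal as PARAMETERIZABLE (a user-supplied value "
        "that varies across executions) or SEMANTIC_CONSTANT (an enum value, "
        "status code, or type discriminator fixed by application logic).\n\n"
        f"Schema:\n{ddl}\n\n"
        f"Query:\n{query}\n\n"
        f"Literal: {literal} (column: {column}, "
        f"~{distinct_count} distinct values observed)\n\n"
        "Answer with exactly one word: PARAMETERIZABLE or SEMANTIC_CONSTANT."
    )
```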

Phase 3 — Integration

Classify literals as statements arrive through the MySQL compatibility layer. Cache classification results per query fingerprint so the model runs at most once per query shape. Fall back to rule-based classification when LLM confidence is low.
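A minimal sketch of the cache-and-fallback flow, assuming a fingerprint computed upstream over the literal-stripped query shape; llm_classify, rule_classify, and the 0.9 threshold are placeholders:

```python
CONFIDENCE_THRESHOLD = 0.9  # assumed; would be tuned against Phase 1 labels

_classification_cache: dict[str, list[str]] = {}  # fingerprint -> labels

def llm_classify(query: str, literal: str) -> tuple[str, float]:
    """Stub for the Phase 2 model call; returns (label, confidence)."""
    return "parameterizable", 0.5

def rule_classify(literal: str) -> str:
    """Conservative fallback: an illustrative heuristic, not SkeinDB's rules."""
    return "semantic_constant" if literal.startswith("'") else "parameterizable"

def classify_literals(fingerprint: str, query: str,
                      literals: list[str]) -> list[str]:
    # Hot path: previously seen query shapes never reach the model.
    if fingerprint in _classification_cache:
        return _classification_cache[fingerprint]
    labels = []
    for lit in literals:
        label, confidence = llm_classify(query, lit)
        if confidence < CONFIDENCE_THRESHOLD:
            label = rule_classify(lit)  # fall back when the model is unsure
        labels.append(label)
    _classification_cache[fingerprint] = labels
    return labels
```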

Phase 4 — Feedback Loop

Plan cache misses trigger reclassification with additional context. Over time, the system builds a corpus of per-application parameterization knowledge.
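The feedback trigger could be wired roughly like this; the miss-rate heuristic, FingerprintStats, and reclassify_with_context are all assumptions:

```python
from dataclasses import dataclass, field

RECLASSIFY_MISS_RATE = 0.5  # assumed trigger; to be tuned

@dataclass
class FingerprintStats:
    lookups: int = 0
    misses: int = 0
    value_histogram: dict = field(default_factory=dict)

knowledge_base: dict[str, list[str]] = {}  # per-application corpus (sketch)

def reclassify_with_context(fingerprint: str, histogram: dict) -> list[str]:
    """Hypothetical: re-run Phase 2 classification with value-distribution
    and plan telemetry added to the prompt context."""
    return []

def on_plan_cache_miss(fingerprint: str, stats: FingerprintStats) -> None:
    stats.misses += 1
    if stats.misses / max(stats.lookups, 1) > RECLASSIFY_MISS_RATE:
        # Persist the refined labels so future executions of this query
        # shape reuse them without another model call.
        knowledge_base[fingerprint] = reclassify_with_context(
            fingerprint, stats.value_histogram)
```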

🧪 Hypotheses

H1

Given context (column names, query structure, and value patterns), LLMs can classify literals as parameterizable or semantically fixed with high accuracy.
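H1 can be measured directly against the Phase 1 labels. A sketch, reusing the Label and LabeledLiteral types from the Phase 1 example; the classify argument stands in for any of the Phase 2 models:

```python
def evaluate(examples: list[LabeledLiteral], classify) -> dict[str, float]:
    """Precision/recall for the SEMANTIC_CONSTANT class."""
    tp = fp = fn = 0
    for ex in examples:
        pred = classify(ex.query, ex.literal, ex.column)
        if pred is Label.SEMANTIC_CONSTANT and ex.label is Label.SEMANTIC_CONSTANT:
            tp += 1
        elif pred is Label.SEMANTIC_CONSTANT:
            fp += 1
        elif ex.label is Label.SEMANTIC_CONSTANT:
            fn += 1
    return {"precision": tp / max(tp + fp, 1), "recall": tp / max(tp + fn, 1)}
```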

H2

Semantic parameterization achieves higher plan cache hit rates than purely syntactic parameterization.
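Hit rate is hits over lookups; a sketch of the comparison, where the numbers are placeholders rather than measurements:

```python
def hit_rate(hits: int, lookups: int) -> float:
    """Plan cache hit rate: fraction of lookups served from cache."""
    return hits / lookups if lookups else 0.0

# Replay the same workload trace under both policies; these figures are
# placeholders, not measurements.
syntactic = hit_rate(hits=7_200, lookups=10_000)
semantic = hit_rate(hits=9_100, lookups=10_000)
print(f"uplift: {semantic - syntactic:+.2%}")
```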

H3

Fine-tuned small models can classify with latency low enough for online use in the query path. Because results are cached per fingerprint (Phase 3), this cost is paid only the first time a query shape is seen.
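A sketch of how H3 could be checked; the 1 ms budget and the classify callable are assumptions:

```python
import statistics
import time

LATENCY_BUDGET_MS = 1.0  # assumed budget for first-seen fingerprints

def p99_latency_ms(classify, samples: list[str]) -> float:
    """p99 of per-call classification latency over sample queries."""
    timings = []
    for query in samples:
        start = time.perf_counter()
        classify(query)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.quantiles(timings, n=100)[98]

# A model passes H3 if p99_latency_ms(model, holdout) <= LATENCY_BUDGET_MS.
```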

🔗 SkeinDB Integration

Touchpoints: MySQL Compat Layer · Query Plan Cache · SkeinQL RPC · Autoparameterization · Telemetry
