R04 · Security & Privacy

Differentially Private Aggregate Queries via SkeinQL

Organizations need to query sensitive data while protecting individual privacy. Differential privacy provides formal guarantees but is hard to retrofit onto SQL. SkeinQL's structured, versioned interface creates an opportunity to enforce differential privacy at the API level, with privacy budget management integrated into the query lifecycle and privacy-aware ETag-based caching.

Research Proposal: mapped to backlog in docs/RESEARCH_BACKLOG.md

🔬 What's Novel

🔧 Technical Approach

Phase 1: Sensitivity Analysis

Automatic sensitivity computation for SkeinQL aggregates (COUNT, SUM, AVG, percentiles). Handle joins and GROUP BY through composition theorems to bound total sensitivity.
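As a rough illustration of what automatic sensitivity computation could look like, the sketch below derives an L1 sensitivity bound for COUNT and SUM from declared column bounds and a cap on rows per individual. The `Aggregate` type, its field names, and the add-or-remove-one-individual neighboring model are assumptions made for this example; they are not part of SkeinQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregate:
    """Hypothetical description of one aggregate in a query plan."""
    kind: str                    # "COUNT" or "SUM"
    lower: float = 0.0           # clamp bounds for the aggregated column (SUM only)
    upper: float = 0.0
    max_rows_per_user: int = 1   # rows one individual may contribute, e.g. after a join


def l1_sensitivity(agg: Aggregate) -> float:
    """Bound how much the result can change when one individual is added or removed."""
    if agg.max_rows_per_user < 1:
        raise ValueError("max_rows_per_user must be at least 1")
    if agg.kind == "COUNT":
        per_row = 1.0
    elif agg.kind == "SUM":
        if agg.lower > agg.upper:
            raise ValueError("lower bound exceeds upper bound")
        # A clamped value moves the sum by at most the larger absolute bound.
        per_row = max(abs(agg.lower), abs(agg.upper))
    else:
        # AVG is commonly released as a noisy SUM divided by a noisy COUNT,
        # and percentiles need a different mechanism; both are out of scope here.
        raise ValueError(f"unsupported aggregate: {agg.kind}")
    # Each row lands in exactly one GROUP BY group, so the same bound holds for
    # the L1 norm of the whole vector of per-group results.
    return per_row * agg.max_rows_per_user
```

For instance, `l1_sensitivity(Aggregate("SUM", lower=-5.0, upper=10.0, max_rows_per_user=3))` gives a bound of 30.0: three rows, each contributing at most 10 in absolute value.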

Phase 2: Budget Management

Per-user/role privacy budgets with per-query-fingerprint tracking. Budget refresh policies (daily, per-session) and alerts when approaching budget exhaustion.
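A minimal sketch of such a ledger follows, assuming basic sequential composition (the epsilons of answered queries simply add up). The class name `BudgetLedger`, the `alert_fraction` parameter, and the string fingerprints are hypothetical and serve only to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import Dict


class BudgetExceeded(Exception):
    """Raised when a query would push a principal past its privacy budget."""


@dataclass
class BudgetLedger:
    """Privacy budget for one user or role, tracked per query fingerprint."""
    limit: float                   # total epsilon available per refresh period
    alert_fraction: float = 0.8    # fraction of the limit at which to raise an alert
    spent_by_fingerprint: Dict[str, float] = field(default_factory=dict)

    @property
    def spent(self) -> float:
        # Sequential composition: total privacy loss is the sum of all charges.
        return sum(self.spent_by_fingerprint.values())

    def charge(self, fingerprint: str, epsilon: float) -> bool:
        """Record a spend; return True once the alert threshold is reached."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.limit:
            raise BudgetExceeded(f"{fingerprint} needs {epsilon}, budget exhausted")
        previous = self.spent_by_fingerprint.get(fingerprint, 0.0)
        self.spent_by_fingerprint[fingerprint] = previous + epsilon
        return self.spent >= self.alert_fraction * self.limit

    def refresh(self) -> None:
        """Reset the ledger, e.g. from a daily or per-session refresh policy."""
        self.spent_by_fingerprint.clear()
```

Rejecting the charge before recording it keeps the ledger consistent when a query is refused. A real deployment would likely also need atomic updates across concurrent sessions, which this sketch does not attempt.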

Phase 3: Noise Mechanisms

Calibrated noise addition: Laplace mechanism for numeric aggregates, exponential mechanism for categorical selections. Support both the global (central) and local differential privacy models.
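The two mechanisms named above can be sketched with the Python standard library alone. The function names and signatures are illustrative assumptions, and the sampler uses ordinary floating-point arithmetic, which is known to be unsafe for production differential privacy; treat this as a sketch of the calibration, not a hardened implementation.

```python
import math
import random
from typing import Dict


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Add Laplace noise with scale sensitivity / epsilon to a numeric aggregate."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def exponential_mechanism(scores: Dict[str, float], sensitivity: float,
                          epsilon: float, rng: random.Random) -> str:
    """Pick a candidate with probability proportional to exp(eps * score / (2 * sens))."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    candidates = list(scores)
    top = max(scores.values())
    # Subtracting the top score keeps the exponent non-positive (no overflow)
    # and leaves the selection probabilities unchanged.
    weights = [math.exp(epsilon * (scores[c] - top) / (2.0 * sensitivity))
               for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Passing an explicit `random.Random` instance keeps the example reproducible in tests; a real system would draw from a cryptographically secure source instead.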

Phase 4: Cache Integration

ETag semantics extended with privacy metadata: a cached result's ETag encodes both data freshness and privacy cost, enabling privacy-aware HTTP caching without double-spending budget.
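One way to picture this is a cache that charges the budget only on a miss and replays the stored noisy answer on a hit, since re-releasing an already published noisy result incurs no further privacy loss. In the sketch below, the ETag layout, the `PrivateResultCache` class, and the `charge` callback are assumptions made for illustration, not SkeinDB's actual validator format.

```python
import hashlib
from typing import Callable, Dict, Tuple


def privacy_etag(fingerprint: str, data_version: str, epsilon: float) -> str:
    """Derive a quoted ETag from query identity, data version, and privacy cost."""
    payload = f"{fingerprint}|{data_version}|{epsilon:.6f}".encode("utf-8")
    return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'


class PrivateResultCache:
    """Cache of noisy results, keyed by query fingerprint."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get_or_compute(self, fingerprint: str, data_version: str, epsilon: float,
                       compute: Callable[[], float],
                       charge: Callable[[float], None]) -> Tuple[float, str, bool]:
        """Return (value, etag, charged); budget is spent only on a cache miss."""
        etag = privacy_etag(fingerprint, data_version, epsilon)
        cached = self._entries.get(fingerprint)
        if cached is not None and cached[0] == etag:
            # Same data and same privacy cost: replay the stored noisy answer.
            return cached[1], etag, False
        charge(epsilon)      # may raise if the budget is exhausted; nothing is cached then
        value = compute()    # expected to return a freshly noised result
        self._entries[fingerprint] = (etag, value)
        return value, etag, True
```

A client that sends the returned ETag in `If-None-Match` could then be answered with `304 Not Modified` whenever the data version and privacy cost are unchanged, at no additional budget cost.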

🧪 Hypotheses

H1

SkeinQL's structured representation enables automated sensitivity analysis without requiring user-provided sensitivity bounds.

H2

Privacy budget integration with query fingerprints enables efficient tracking and enforcement across multi-user sessions.

H3

ETag-based caching naturally extends to privacy-aware caching, validating both freshness and remaining privacy constraints.

🔗 SkeinDB Integration

SkeinQL RPC
ETag Validators
Dependency Tracking
ValueID Store
Wasm Runtime

📚 Key References