What's Novel
- First integration of differential privacy into a database's native API layer (vs. external query rewriting)
- Automatic sensitivity analysis for structured SkeinQL query operations
- Privacy-aware cache validation protocol extending HTTP ETags with privacy metadata
- Practical privacy budget management system for multi-user environments
Technical Approach
Phase 1: Sensitivity Analysis
Automatic sensitivity computation for SkeinQL aggregates (COUNT, SUM, AVG, percentiles). Handle joins and GROUP BY by bounding each individual's contribution to the result and applying composition theorems to bound the query's total privacy cost.
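A minimal sketch of how per-aggregate sensitivity might be derived, assuming column value ranges are available from schema metadata and each individual contributes at most a known number of rows. `ColumnBounds`, `sensitivity`, and `max_rows_per_user` are hypothetical names for illustration, not part of SkeinQL. Only COUNT and SUM are covered; AVG and percentiles usually need their own treatment and are rejected here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnBounds:
    """Hypothetical clamping range declared for a numeric column."""
    lower: float
    upper: float


def sensitivity(aggregate: str,
                bounds: Optional[ColumnBounds] = None,
                max_rows_per_user: int = 1) -> float:
    """L1 sensitivity of one aggregate when one individual is added or removed.

    Assumes every value is clamped to `bounds` and that an individual
    contributes at most `max_rows_per_user` rows (for example after a join).
    """
    if max_rows_per_user < 1:
        raise ValueError("max_rows_per_user must be at least 1")
    agg = aggregate.upper()
    if agg == "COUNT":
        # Each contributed row changes the count by exactly one.
        return float(max_rows_per_user)
    if agg == "SUM":
        if bounds is None:
            raise ValueError("SUM needs column bounds")
        # Each contributed row changes the sum by at most the largest magnitude.
        largest = max(abs(bounds.lower), abs(bounds.upper))
        return max_rows_per_user * largest
    raise ValueError(f"no sensitivity rule for {aggregate!r} in this sketch")
```

For example, `sensitivity("SUM", ColumnBounds(0, 100), max_rows_per_user=3)` gives 300.0: removing one individual can shift the sum by at most three rows of magnitude 100.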
Phase 2: Budget Management
Per-user/role privacy budgets with per-query-fingerprint tracking. Budget refresh policies (daily, per-session) and alerts when approaching budget exhaustion.
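The budget tracking described above can be sketched as a small in-memory ledger. This assumes basic sequential composition, where the epsilon values of successive queries simply add up. `BudgetLedger`, `BudgetExceeded`, and `alert_fraction` are hypothetical names, and a real system would persist the ledger rather than keep it in memory.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a query would push a user past the epsilon limit."""


class BudgetLedger:
    """Hypothetical per-user epsilon ledger keyed by query fingerprint."""

    def __init__(self, epsilon_limit: float, alert_fraction: float = 0.8):
        self.epsilon_limit = epsilon_limit
        self.alert_fraction = alert_fraction
        # user -> query fingerprint -> epsilon spent on that fingerprint
        self._spent = defaultdict(lambda: defaultdict(float))

    def spent(self, user: str) -> float:
        return sum(self._spent[user].values())

    def remaining(self, user: str) -> float:
        return self.epsilon_limit - self.spent(user)

    def charge(self, user: str, fingerprint: str, epsilon: float) -> bool:
        """Record a spend; return True once the alert threshold is reached."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so float rounding does not reject an exact fit.
        if self.spent(user) + epsilon > self.epsilon_limit + 1e-12:
            raise BudgetExceeded(f"{user} has {self.remaining(user):.3f} left")
        self._spent[user][fingerprint] += epsilon
        return self.spent(user) >= self.alert_fraction * self.epsilon_limit

    def refresh(self, user: str) -> None:
        """Reset a user's spend, e.g. from a daily or per-session policy."""
        self._spent.pop(user, None)
```

A caller would invoke `charge` before releasing a noisy result and `refresh` from whatever scheduler implements the refresh policy.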
Phase 3: Noise Mechanisms
Calibrated noise addition: Laplace mechanism for numeric aggregates, exponential mechanism for categorical selections. Support both global and local differential privacy models.
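As an illustration of the two mechanisms named above, here is a standard-library sketch. The function names and the `rng` parameter are assumptions made for this example. The Laplace sample is drawn as the difference of two exponential samples, and the code is not hardened against floating-point attacks on the noise distribution.

```python
import math
import random


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng=random) -> float:
    """Add Laplace noise with scale sensitivity / epsilon to a numeric result."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def exponential_mechanism(scores: dict, sensitivity: float,
                          epsilon: float, rng=random):
    """Pick a key with probability proportional to exp(eps * score / (2 * sens))."""
    if not scores:
        raise ValueError("scores must not be empty")
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    best = max(scores.values())
    # Subtracting the best score keeps exp() from overflowing and
    # leaves the selection probabilities unchanged.
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity))
               for s in scores.values()]
    return rng.choices(list(scores), weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible for testing; a deployment would want a cryptographically secure source instead.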
Phase 4: Cache Integration
ETag semantics extended with privacy metadata: a cached result's ETag encodes both data freshness and privacy cost, enabling privacy-aware HTTP caching without double-spending budget.
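One way such an ETag could be laid out is sketched below. The `data_version:fingerprint:epsilon` layout and the names `make_etag`, `parse_etag`, and `can_reuse` are assumptions for illustration, not a defined SkeinDB format; the sketch also assumes `data_version` contains no colon. The idea is that a matching tag lets the server return the already-released noisy result instead of drawing fresh noise and charging the budget again.

```python
import hashlib
from typing import NamedTuple


class PrivacyETag(NamedTuple):
    data_version: str   # identifies the snapshot the result was computed on
    fingerprint: str    # hash of the normalized query text
    epsilon: float      # privacy cost paid when the result was first released


def fingerprint_query(query_text: str) -> str:
    return hashlib.sha256(query_text.encode("utf-8")).hexdigest()[:16]


def make_etag(data_version: str, query_text: str, epsilon: float) -> str:
    """Build a quoted strong ETag carrying freshness and privacy cost."""
    return f'"{data_version}:{fingerprint_query(query_text)}:{epsilon}"'


def parse_etag(etag: str) -> PrivacyETag:
    data_version, fingerprint, epsilon = etag.strip('"').split(":")
    return PrivacyETag(data_version, fingerprint, float(epsilon))


def can_reuse(etag: str, current_data_version: str, query_text: str) -> bool:
    """True if the cached noisy result is still fresh for this exact query.

    On True the server can answer 304 Not Modified and charge no new epsilon.
    """
    tag = parse_etag(etag)
    return (tag.data_version == current_data_version
            and tag.fingerprint == fingerprint_query(query_text))
```

In this layout a client would send the tag back in `If-None-Match`, and the server would call `can_reuse` before deciding whether to run the query and spend budget.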
Hypotheses
SkeinQL's structured representation enables automated sensitivity analysis without requiring user-provided sensitivity bounds.
Privacy budget integration with query fingerprints enables efficient tracking and enforcement across multi-user sessions.
ETag-based caching naturally extends to privacy-aware caching, validating both freshness and remaining privacy constraints.
SkeinDB Integration
Key References
- McSherry, "Privacy Integrated Queries: An Extensible Platform for Privacy-Preserving Data Analysis" (2009)
- Wilson et al., "Differentially Private SQL with Bounded User Contribution" (2020)
- Kotsogiannis et al., "PrivateSQL: A Differentially Private SQL Query Engine" (2019)