ADR-002: Storage Layer Using SQLite, Parquet, and DuckDB

Accepted - 2026-04-20

The current scaffold writes JSON and Markdown artifacts into data/, but the roadmap calls for larger forensic datasets and local analytics workflows. A storage direction is needed before ingestion and analysis are scaled up.

Target a layered storage strategy:

  • SQLite for lightweight transactional metadata and run indexing
  • Parquet for columnar feature/analysis datasets
  • DuckDB for local analytical queries across Parquet/SQLite-backed data
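As a rough illustration of the first layer only, the sketch below shows what a SQLite run index could look like using Python's standard sqlite3 module. The table names, column names, and the helper functions open_index, record_run, and artifacts_for are hypothetical; the ADR does not define a schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not from the ADR.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id     TEXT PRIMARY KEY,
    started_at TEXT NOT NULL,
    status     TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS artifacts (
    run_id TEXT NOT NULL REFERENCES runs(run_id),
    path   TEXT NOT NULL,
    format TEXT NOT NULL CHECK (format IN ('json', 'markdown', 'parquet')),
    PRIMARY KEY (run_id, path)
);
"""


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the run index and make sure the schema exists."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def record_run(conn, run_id, started_at, artifacts):
    """Insert a run and its artifacts in one transaction."""
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute(
            "INSERT INTO runs VALUES (?, ?, 'complete')", (run_id, started_at)
        )
        conn.executemany(
            "INSERT INTO artifacts VALUES (?, ?, ?)",
            [(run_id, path, fmt) for path, fmt in artifacts],
        )


def artifacts_for(conn, run_id):
    """Return (path, format) pairs for a run, ordered by path."""
    rows = conn.execute(
        "SELECT path, format FROM artifacts WHERE run_id = ? ORDER BY path",
        (run_id,),
    )
    return rows.fetchall()
```

Keeping the index transactional means a run either appears with all of its artifacts or not at all, which is the property that makes SQLite a reasonable fit for metadata while bulk data lives in Parquet.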

The existing JSON outputs remain valid bootstrap artifacts until migration tasks are implemented.
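One possible first migration step is to read the scaffold's JSON artifacts and pivot them into a column-oriented shape that a Parquet writer could consume later. The sketch below uses only the standard library and assumes each JSON file under the data directory holds a single object; that layout, and the function names flatten, to_columns, and load_json_artifacts, are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted names, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


def to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row dicts into one list per column, filling missing values with None."""
    rows = [flatten(r) for r in records]
    names = sorted({name for row in rows for name in row})
    return {name: [row.get(name) for row in rows] for name in names}


def load_json_artifacts(data_dir: Path) -> list[dict[str, Any]]:
    """Read every *.json file directly under data_dir, in sorted filename order.

    Assumes one JSON object per file (an assumption, not stated in the ADR).
    """
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(data_dir.glob("*.json"))
    ]
```

The dict-of-lists result is the shape that columnar writers typically accept (for example pyarrow's Table.from_pydict, if pyarrow were the chosen Parquet library; the ADR does not name one). The original JSON files are only read, never modified, so they stay usable as bootstrap artifacts.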

  • Positive:
    • Supports both reproducible pipelines and ad hoc forensic analysis.
    • Keeps the local developer workflow simple: DuckDB can run SQL directly over Parquet files.
    • Enables incremental migration from scaffold JSON outputs.
  • Negative:
    • Introduces coordination overhead: formats and schemas must stay consistent across SQLite, Parquet, and DuckDB.
    • Requires migration and compatibility tests when introducing new sinks.
  • Follow-up:
    • Define storage schemas and migration checkpoints.
    • Add storage adapter tests for each target sink.
    • Document operator commands in docs/RUNBOOK.md once the storage layer is implemented.
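The adapter tests could take the form of a shared contract that every sink must pass. The sketch below is a minimal version under stated assumptions: the StorageSink interface, both sink classes, and check_round_trip are hypothetical names, and only a JSON-file sink (mirroring the current scaffold) and a SQLite sink are shown. A Parquet-backed sink would be expected to satisfy the same check.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Protocol


class StorageSink(Protocol):
    """Hypothetical adapter interface; the ADR does not define one."""

    def write(self, run_id: str, records: list[dict]) -> None: ...
    def read(self, run_id: str) -> list[dict]: ...


class JsonFileSink:
    """Mirrors the scaffold behaviour: one JSON file per run under a directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, run_id: str, records: list[dict]) -> None:
        path = self.root / f"{run_id}.json"
        path.write_text(json.dumps(records), encoding="utf-8")

    def read(self, run_id: str) -> list[dict]:
        path = self.root / f"{run_id}.json"
        return json.loads(path.read_text(encoding="utf-8"))


class SqliteSink:
    """Stores each record as a JSON payload row, keyed by run and position."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "run_id TEXT, seq INTEGER, payload TEXT, PRIMARY KEY (run_id, seq))"
        )

    def write(self, run_id: str, records: list[dict]) -> None:
        with self.conn:  # replace the run's records in one transaction
            self.conn.execute("DELETE FROM records WHERE run_id = ?", (run_id,))
            self.conn.executemany(
                "INSERT INTO records VALUES (?, ?, ?)",
                [(run_id, i, json.dumps(r)) for i, r in enumerate(records)],
            )

    def read(self, run_id: str) -> list[dict]:
        rows = self.conn.execute(
            "SELECT payload FROM records WHERE run_id = ? ORDER BY seq", (run_id,)
        )
        return [json.loads(payload) for (payload,) in rows]


def check_round_trip(sink: StorageSink) -> None:
    """Contract for every sink: records come back unchanged and in order."""
    records = [{"id": 1, "label": "a"}, {"id": 2, "label": None}]
    sink.write("run-001", records)
    assert sink.read("run-001") == records
    # Writing the same run again replaces its records rather than appending.
    sink.write("run-001", records[:1])
    assert sink.read("run-001") == records[:1]


def run_all_sink_checks() -> None:
    """Run the shared contract against each sink implementation."""
    with tempfile.TemporaryDirectory() as tmp:
        check_round_trip(JsonFileSink(Path(tmp)))
    check_round_trip(SqliteSink(sqlite3.connect(":memory:")))
```

Writing the contract once and running it against each sink keeps the compatibility tests cheap to extend when a new sink is introduced; in a test runner the same function could be parametrized over sink factories.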