forensics features migrate

Upgrade every feature parquet to the current schema version.

Real-corpus parquets store only article_id; URLs live in articles.db. The migrator JOINs against that DB once per run to derive section for every row. If the DB is missing, rows without a url column fall back to section = "unknown" (with a WARNING per file).
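The lookup described above can be sketched roughly as follows. This is an illustration, not the migrator's actual code: the `articles` table with `id` and `url` columns, the rule that the section is the first path segment of the URL, the "unknown" fallback for ids absent from the DB, and every function name are assumptions made for the example.

```python
import logging
import sqlite3
from pathlib import Path
from urllib.parse import urlparse

log = logging.getLogger("migrate_sketch")


def section_from_url(url):
    """Hypothetical rule: the first non-empty path segment is the section."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return parts[0] if parts else "unknown"


def load_url_map(db_path):
    """Read article_id -> url once per run; return None if the DB file is missing."""
    if not Path(db_path).is_file():
        return None
    conn = sqlite3.connect(str(db_path))
    try:
        # Assumed schema: a table named articles with columns id and url.
        return dict(conn.execute("SELECT id, url FROM articles"))
    finally:
        conn.close()


def derive_sections(article_ids, url_map, source_name="<file>"):
    """Return one section per article id, falling back to "unknown"."""
    if url_map is None:
        # One warning per file, mirroring the behavior described above.
        log.warning("%s: articles DB missing, section set to 'unknown'", source_name)
        return ["unknown"] * len(article_ids)
    # Assumption: ids that are absent from the DB also map to "unknown".
    return [
        section_from_url(url_map[a]) if a in url_map else "unknown"
        for a in article_ids
    ]
```

Loading the mapping into a dictionary once keeps the per-row work to a lookup, which matches the "once per run" behavior described above.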

forensics features migrate [OPTIONS]
Option               Description
--features-dir PATH  Override the features directory (default: <project_root>/data/features).
--articles-db PATH   Override the SQLite DB used for the article_id -> url JOIN (default: <project_root>/data/articles.db).
--dry-run            Log the would-be changes without touching any files.
--help               Show this message and exit.