Skip to content

forensics scrape

Crawl and fetch articles for configured authors

Terminal window
forensics scrape [OPTIONS] COMMAND [ARGS]...
OptionDescription
--discoverRun WordPress author discovery only
--metadataCollect article metadata only
--fetchFetch HTML and extract article text only
--dedupRun near-duplicate detection only
--archiveCompress data/raw/{year}/ to tar.gz
--dry-runWith —fetch: report count without HTTP
--force-refreshWith —discover: overwrite manifest
--all-authorsCollect metadata for every author in the manifest (ignore config.toml list)
--post-year-min INTEGERInclusive calendar year for WordPress posts (with —post-year-max); overrides config when set
--post-year-max INTEGERInclusive calendar year for WordPress posts (with —post-year-min); overrides config when set
--helpShow this message and exit.