forensics scrape
Crawl and fetch articles for configured authors
forensics scrape [OPTIONS] COMMAND [ARGS]...

Options

| Option | Description |
|---|---|
| --discover | Run WordPress author discovery only |
| --metadata | Collect article metadata only |
| --fetch | Fetch HTML and extract article text only |
| --dedup | Run near-duplicate detection only |
| --archive | Compress data/raw/{year}/ into a tar.gz archive |
| --dry-run | With --fetch: report the count without sending HTTP requests |
| --force-refresh | With --discover: overwrite the existing manifest |
| --all-authors | Collect metadata for every author in the manifest, ignoring the author list in config.toml |
| --post-year-min INTEGER | Earliest calendar year (inclusive) of WordPress posts; use with --post-year-max. Overrides the config value when set |
| --post-year-max INTEGER | Latest calendar year (inclusive) of WordPress posts; use with --post-year-min. Overrides the config value when set |
| --help | Show this message and exit. |
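The two year options form an inclusive range and take precedence over the configured values. The sketch below shows one way that precedence and the inclusive check could work. It assumes each flag overrides its own config value independently and that a missing bound means "unbounded"; the function names and parameters are hypothetical, not the tool's actual code.

```python
from datetime import date
from typing import Optional, Tuple

YearRange = Tuple[Optional[int], Optional[int]]


def resolve_year_range(
    cli_min: Optional[int],
    cli_max: Optional[int],
    config_min: Optional[int],
    config_max: Optional[int],
) -> YearRange:
    """Hypothetical: use each CLI bound when it was passed, else the config value."""
    lo = cli_min if cli_min is not None else config_min
    hi = cli_max if cli_max is not None else config_max
    if lo is not None and hi is not None and lo > hi:
        raise ValueError(f"post-year-min {lo} is after post-year-max {hi}")
    return lo, hi


def post_in_range(published: date, year_range: YearRange) -> bool:
    """Both bounds are inclusive calendar years; None leaves that side open."""
    lo, hi = year_range
    if lo is not None and published.year < lo:
        return False
    if hi is not None and published.year > hi:
        return False
    return True


# Config allows 2018-2024; the command line narrows only the upper bound.
rng = resolve_year_range(None, 2022, 2018, 2024)
print(rng)                                     # (2018, 2022)
print(post_in_range(date(2022, 12, 31), rng))  # True
print(post_in_range(date(2023, 1, 1), rng))    # False
```

Because both ends are inclusive, a post dated on the last day of the maximum year still passes. The first day of the following year does not.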
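`--archive` packs a year's raw data directory into a tar.gz file. The following minimal sketch uses Python's standard `tarfile` module to do the same thing. The output location (`data/raw/{year}.tar.gz`) and leaving the source directory in place are assumptions; this page does not say where the real archive is written or whether the originals are removed.

```python
import tarfile
from pathlib import Path


def archive_year(raw_root: Path, year: int) -> Path:
    """Pack raw_root/<year>/ into raw_root/<year>.tar.gz and return the archive path.

    Hypothetical helper: the destination path is an illustrative choice and the
    source directory is not deleted.
    """
    src = raw_root / str(year)
    if not src.is_dir():
        raise FileNotFoundError(f"no raw data directory: {src}")
    dest = raw_root / f"{year}.tar.gz"
    # "w:gz" writes a gzip-compressed tar; arcname keeps member paths relative,
    # so files are stored as "<year>/..." rather than with the full local path.
    with tarfile.open(str(dest), "w:gz") as tar:
        tar.add(str(src), arcname=str(year))
    return dest
```

For example, `archive_year(Path("data/raw"), 2023)` would produce `data/raw/2023.tar.gz` under these assumptions.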
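The table documents `--dry-run` only together with `--fetch`, and `--force-refresh` only together with `--discover`. This page does not say whether `forensics scrape` rejects, warns about, or silently ignores a modifier passed on its own. The hypothetical pre-flight check below only shows how those documented pairings could be verified before any work starts.

```python
from typing import Dict, List, Set

# Each modifier flag mapped to the stage flag the option table pairs it with.
MODIFIER_REQUIRES: Dict[str, str] = {
    "--dry-run": "--fetch",
    "--force-refresh": "--discover",
}


def check_flag_pairs(passed: Set[str]) -> List[str]:
    """Hypothetical: one message per modifier passed without its companion flag."""
    problems = []
    for modifier, stage in MODIFIER_REQUIRES.items():
        if modifier in passed and stage not in passed:
            problems.append(f"{modifier} is only documented together with {stage}")
    return problems


print(check_flag_pairs({"--fetch", "--dry-run"}))  # []
print(check_flag_pairs({"--dry-run"}))  # ['--dry-run is only documented together with --fetch']
```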
Built by Abstract Data