4  Corpus Audit & Chain of Custody

Forensic question: Can we prove the corpus was not modified between collection and analysis?

Input artifacts:
- data/articles.db
- data/analysis/corpus_custody.json (hash recorded at end of analysis)

Output artifacts:
- (none; verification only)

Run metadata: (auto-populated by first code cell)
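As a minimal sketch of the core check, the cell below recomputes a SHA-256 digest of the database file and compares it with the value stored in the custody record. The digest algorithm, the helper names `sha256_file` and `verify_custody`, and the JSON field name `sha256` are assumptions for illustration; the actual layout of corpus_custody.json may differ.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_custody(db_path, custody_path, key="sha256"):
    """Compare the current digest of db_path with the recorded one.

    `key` is a hypothetical field name; adjust it to the real record.
    """
    record = json.loads(Path(custody_path).read_text(encoding="utf-8"))
    recorded = record[key]
    current = sha256_file(db_path)
    return {"recorded": recorded, "current": current, "match": recorded == current}
```

A call such as `verify_custody("data/articles.db", "data/analysis/corpus_custody.json")` returns both digests and a `match` flag. A match shows only that the file is byte-identical to what was hashed when the record was written.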

4.0.1 Wayback spot-checks

Integrity spot-checks against the Internet Archive are optional and require network access. Enable them only in a trusted environment, then compare the normalized content hash of each archived snapshot with the digest of the corresponding clean_text value.
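The comparison step can be sketched offline, leaving the snapshot download to whatever client the trusted environment provides. The normalization shown here (Unicode NFKC plus whitespace collapsing) and the helper names are assumptions; the comparison is only meaningful if it uses the same normalization that produced the clean_text digests.

```python
import hashlib
import re
import unicodedata


def normalize_text(text):
    """Assumed normalization: NFKC, collapse whitespace runs, strip ends."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def content_digest(text):
    """Return the hex SHA-256 digest of the normalized text."""
    return hashlib.sha256(normalize_text(text).encode("utf-8")).hexdigest()


def spot_check(archived_text, clean_text_digest):
    """True if the archived snapshot text hashes to the stored digest."""
    return content_digest(archived_text) == clean_text_digest
```

Normalizing before hashing keeps incidental differences in spacing or Unicode form from registering as tampering, while any change to the wording still produces a different digest.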

Summary finding: The hashing and scrape-timestamp audits above establish reproducible chain-of-custody checks that downstream chapters can rely on.