4 Corpus Audit & Chain of Custody
Forensic question: Can we prove the corpus was not modified between collection and analysis?
Input artifacts:
- data/articles.db
- data/analysis/corpus_custody.json (hash recorded at end of analysis)
Output artifacts:
- (none; verification only)
Run metadata: (auto-populated by first code cell)
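The custody check behind the forensic question can be sketched as follows: recompute a digest of the database file and compare it with the value stored in the custody record. This is a minimal sketch under stated assumptions, not the notebook's actual cell. The choice of SHA-256, the JSON field name `sha256`, and the helper names `sha256_file` and `verify_custody` are all hypothetical, since the custody file's schema is not shown here.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so a large database is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_custody(db_path, custody_path, key="sha256"):
    """Return (matches, recorded, current) for the database file.

    `key` is an assumed field name; adjust it to the real custody schema.
    """
    record = json.loads(Path(custody_path).read_text(encoding="utf-8"))
    recorded = record[key].strip().lower()
    current = sha256_file(db_path)
    return recorded == current, recorded, current
```

A call such as `verify_custody("data/articles.db", "data/analysis/corpus_custody.json")` returns a mismatch if even one byte of the database changed after the hash was recorded. Note that a file-level hash also changes on benign rewrites (for example a SQLite `VACUUM`), so a mismatch indicates modification of the file, not necessarily of the article content.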
4.0.1 Wayback spot-checks
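Integrity spot-checks against the Internet Archive's Wayback Machine are optional and require network access, so enable them only in a trusted environment. For each sampled article, normalize the archived content, hash it, and compare the result with the stored digest of the corresponding clean_text.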
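One way such a spot-check might look is sketched below. The `ENABLE_WAYBACK` switch, the function names, the normalization rules (Unicode NFC plus whitespace collapsing), and the use of SHA-256 are assumptions made for illustration; they must match whatever produced the clean_text digests in the corpus. The snapshot lookup uses the Internet Archive's public availability endpoint. Extracting article text from the archived HTML is deliberately left out, because it has to reuse the same cleaning pipeline that produced clean_text.

```python
import hashlib
import json
import unicodedata
import urllib.parse
import urllib.request

ENABLE_WAYBACK = False  # hypothetical switch; leave off unless the network is trusted


def normalized_digest(text):
    """SHA-256 of text after NFC normalization and whitespace collapsing (assumed rules)."""
    canonical = " ".join(unicodedata.normalize("NFC", text).split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def closest_snapshot_url(page_url, timestamp=None, timeout=30):
    """Ask the Wayback availability endpoint for the snapshot closest to `timestamp`."""
    if not ENABLE_WAYBACK:
        raise RuntimeError("Wayback checks are disabled; set ENABLE_WAYBACK = True")
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDD or YYYYMMDDhhmmss
    query = urllib.parse.urlencode(params)
    endpoint = f"https://archive.org/wayback/available?{query}"
    with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
        payload = json.load(resp)
    closest = payload.get("archived_snapshots", {}).get("closest")
    # Return None when the archive holds no usable snapshot for this URL.
    return closest["url"] if closest and closest.get("available") else None


def spot_check(archived_text, stored_digest):
    """True when the archived text hashes to the digest stored for clean_text."""
    return normalized_digest(archived_text) == stored_digest.strip().lower()
```

A mismatch here is weaker evidence than a failed file hash: publishers edit pages after publication, and archived snapshots may differ in boilerplate, so mismatches are best treated as prompts for manual review.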
Summary finding: The hashing and scrape-timestamp audits above give downstream chapters a reproducible set of chain-of-custody checks.