# Runbook

Operational quick reference. Agents: append new sections here whenever you discover debug techniques, resolve recurring errors, add CLI commands, or learn environment facts that future operators need.
## Local Setup

### Peer reviewers

- One-shot: `make peer-setup` — `uv sync --extra dev --extra tui`, `uv run forensics validate`, then `uv run forensics peer-setup` (stdout: `uv sync` tiers, spaCy wheel note, Quarto link, and one `ollama pull <tag>` per `[baseline] models` entry when that list is non-empty).
- Hints only: `uv run forensics peer-setup` (after deps are installed).
- Endpoint smoke: `make peer-verify-network` runs `forensics validate --check-endpoints` (WordPress + Ollama probes as warnings).
- Ollama models present: `uv run forensics peer-setup --check-ollama` runs the baseline preflight against `baseline.ollama_base_url` and prints PASS or FAIL (it does not pull images).
- Analysis defaults: with an empty or omitted `[analysis] pipeline_b_mode` in `config.toml`, the nested model default is `percentile` (cross-author-comparable Pipeline B). Override with `pipeline_b_mode = "legacy"` only when you need the older absolute-cosine behavior.
### pipeline_b_mode and per-author config_hash (cohort compatibility)

`HypothesisConfig.pipeline_b_mode` participates in the analysis config hash (`include_in_config_hash: true`). Older on-disk runs produced when the effective default was `legacy` (for example, before the nested default switched to `percentile`) carry a different `config_hash` than current settings, even if your TOML never mentioned `pipeline_b_mode`.

Symptoms: compare-only or full runs fail validation with messages such as "Analysis artifact compatibility failed" or "stale or mismatched analysis config hashes" (from `validate_analysis_result_config_hashes` / `_validate_compare_artifact_hashes`, before `comparison_report.json` is rebuilt).

Remediation: re-run `uv run forensics analyze` for the full cohort so each `data/analysis/<slug>_result.json` is regenerated with the current hash, or set `pipeline_b_mode = "legacy"` in `config.toml` consistently when you intentionally need to reproduce the legacy Pipeline B behavior against existing artifacts. Downstream consumers (`comparison_report.json`, report hash gates) expect target and control `*_result.json` files to match the live analysis hash.
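For intuition, the gate reduces to "hash the hash-participating fields of the live config and compare to what the artifact recorded." The sketch below is illustrative only: the field set and hashing recipe are assumptions, and the real registry lives in `HypothesisConfig`.

```python
import hashlib
import json

def analysis_config_hash(analysis_cfg: dict) -> str:
    """Hash only fields flagged include_in_config_hash. The two field
    names below are stand-ins, not the project's actual registry."""
    hashed_fields = {k: v for k, v in sorted(analysis_cfg.items())
                     if k in {"pipeline_b_mode", "embedding_model_revision"}}
    canonical = json.dumps(hashed_fields, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def result_is_compatible(result: dict, live_cfg: dict) -> bool:
    """Mirror of the compare-time gate: a stale *_result.json fails."""
    return result.get("config_hash") == analysis_config_hash(live_cfg)

# An artifact written while the effective default was still "legacy" no
# longer matches a config whose default now resolves to "percentile":
old = {"config_hash": analysis_config_hash({"pipeline_b_mode": "legacy"})}
assert not result_is_compatible(old, {"pipeline_b_mode": "percentile"})
assert result_is_compatible(old, {"pipeline_b_mode": "legacy"})
```

This is why a TOML that never mentions `pipeline_b_mode` can still drift: the hash is computed over the effective value, not the written one.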
- Install dependencies: `uv sync`
- For Phase 10 (baseline generation): `uv sync --extra baseline`
- Validate environment: `uv run ruff check . && uv run ruff format --check .`
- Run tests: `uv run pytest tests/ -v`
- Run with coverage: `uv run pytest tests/ -v --cov-report=term-missing` (the coverage target `forensics` is configured in `pyproject.toml` `addopts`)
- Optional Textual TUI: `uv sync --extra tui` enables `tests/test_tui.py` and related progress tests. A plain `uv sync` skips them; coverage config omits `forensics/tui/*` from the aggregate denominator so `fail_under=75` still passes without the extra.
- Coverage including TUI (optional): after `uv sync --extra dev --extra tui`, measure the Textual package with the alternate coverage config so the denominator includes `forensics/tui/*`:

  ```
  uv run pytest tests/ -v --cov=forensics --cov-config=docs/coverage-tui.toml --cov-report=term-missing
  ```

  CI keeps the default `pyproject.toml` `[tool.coverage.run]` omit list for speed and a stable `fail_under`.
## Automated pipeline E2E (tests/integration/test_pipeline_end_to_end.py)

- What it covers: seeded `articles.db` (two authors: `fixture-target` + `fixture-control`), `FORENSICS_CONFIG_FILE` pointing at `tests/integration/fixtures/e2e/config.toml`, and `importlib.import_module("forensics.config.settings")` + `monkeypatch` on that module's `_project_root` so `get_settings().db_path` and artifact paths resolve under a disposable workspace (avoids the `forensics.config.settings` name shadowing the real settings submodule). Scrape is not run (the DB is populated in-process). Stages: `extract_all_features(..., skip_embeddings=True)` → `run_full_analysis(..., exploratory=True)` → optional Quarto `run_report(ReportArgs(notebook="index.qmd", report_format="html"))` when `quarto` is on `PATH` (copies the repo `index.qmd` + `_quarto.yml` into the temp root first).
- Regression gate: `data/analysis/comparison_report.json` must contain a non-empty `targets["fixture-target"]` entry (guards the "no configured target → empty comparison" failure mode). The seeded corpus asserts changepoint / convergence signal vs control (see `tests/integration/fixtures/e2e/corpus_seed.py`).
- Markers: `@pytest.mark.integration` only (not `slow`). The default `uv run pytest tests/` still runs this file as part of the unit job; CI also runs a dedicated `integration` workflow job (`pytest tests/ -m integration -v --no-cov` after `uv run python -m spacy download en_core_web_md`). For a local slice matching CI: `uv run pytest tests/ -m integration -v --no-cov` (always pass `--no-cov` when selecting only integration tests so `fail_under` is not measured on a tiny denominator).
## Running integration tests locally

- Sync deps: `uv sync` (plus the extras your branch's CI uses — the integration job installs spaCy `en_core_web_md`).
- Install the model CI uses: `uv run python -m spacy download en_core_web_md`.
- Run the integration-marked suite: `uv run pytest tests/ -m integration -v --no-cov`.
- Single-file E2E only: `uv run pytest tests/integration/test_pipeline_end_to_end.py -m integration -v --no-cov`.

If you need the default `addopts` out of the way for a one-off: `uv run pytest tests/integration/test_pipeline_end_to_end.py -v --override-ini "addopts=-ra -q --strict-markers" --no-cov`.
## Headless / agent invocation

For scripts, CI, or LLM agents driving the CLI:

- Flags: pass global options before the subcommand: `uv run forensics --output json --non-interactive <subcommand> …`. See `.claude/skills/forensics-cli/SKILL.md` (mirrored under `.cursor/skills/forensics-cli/`) for recommended command sequences, TUI guardrails, and when to retry.
- Exit codes: `docs/EXIT_CODES.md` defines the contract (`0` ok, `2` usage, `3` missing resource, `4` transient/retry, `5` conflict/idempotent skip). Parse the process exit code before parsing stdout.
- Stdout vs stderr: with `--output json`, successful commands emit exactly one JSON envelope line on stdout (`ok`, `type`, `schemaVersion`, `data` or `error`); status and logs belong on stderr. Do not scrape stdout for prose when in JSON mode.
- Discovery: `uv run forensics --output json commands` dumps the full command tree (params, help, examples) for tooling.
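A minimal envelope parser for agent tooling follows. It assumes only the documented keys (`ok`, `type`, `schemaVersion`, `data`, `error`); the `type` value and payload in the example are made up for illustration.

```python
import json

def parse_envelope(stdout_line: str) -> dict:
    """Parse the single JSON envelope line emitted on stdout in
    --output json mode and return its data payload. Raise on a
    malformed or error envelope. Check the exit code first."""
    envelope = json.loads(stdout_line)
    for key in ("ok", "type", "schemaVersion"):
        if key not in envelope:
            raise ValueError(f"malformed envelope: missing {key!r}")
    if not envelope["ok"]:
        raise RuntimeError(str(envelope.get("error")))
    return envelope["data"]

# Synthetic envelope with the documented key set (field values invented):
line = '{"ok": true, "type": "preflight", "schemaVersion": 1, "data": {"status": "ok"}}'
assert parse_envelope(line) == {"status": "ok"}
```

In a real script, `line` would be the stdout of a `subprocess.run([... "--output", "json", ...])` call, read only after the exit code has been handled.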
## GitNexus code graph (reindex)

After large refactors or merges, refresh the local graph so impact/context tools stay accurate:

- Default: `npx gitnexus analyze` (from repo root).
- Embeddings preserved: if `.gitnexus/meta.json` shows `stats.embeddings` greater than zero, use `npx gitnexus analyze --embeddings`. Running `analyze` without `--embeddings` removes any previously generated index embeddings (see the `AGENTS.md` / CLAUDE GitNexus section).
## Phase 17 diagnostic columns (§11.3 notebook)

Phase 17 adds exploratory report-side diagnostics alongside `FindingStrength` (which is unchanged). They are implemented in `src/forensics/models/report.py` and `src/forensics/models/direction_priors.py` (re-exported from `forensics.models`) and surfaced in `notebooks/09_full_report.ipynb` §11.3.

| Column / symbol | Meaning |
|---|---|
| `DirectionConcordance` (`direction_ai`, `direction_mixed`, `direction_non_ai`, `direction_na`) | After collapsing to one hypothesis test per `feature_name` (max \|Cohen's d\|), compares the sign of d to the `AI_TYPICAL_DIRECTION` priors. ≥50% of prior-backed features matching the AI-typical direction yields `direction_ai`; mixed partial agreement yields `direction_mixed`; no matches with at least one oppose yields `direction_non_ai`; no usable priors yields `direction_na`. Thresholds are exploratory until locked in `data/preregistration/preregistration_lock.json`. |
| `DirectionBreakdown` | Counts `dir_match` / `dir_oppose` / `dir_no_prior` (and optional feature lists) for transparency tables. |
| `VolumeRampFlag` (`volume_stable`, `volume_growth`, `volume_ramp`, `volume_decline`, `volume_unknown`) | Uses `n_post / n_pre` from the first non-degenerate hypothesis row with usable sample counts (`n_pre > 0`, neither count `-1`). Bands: stable `[0.5, 2.0]`, growth `(2, 5]`, ramp `> 5`, decline `< 0.5`. Confound: a large ratio often reflects corpus expansion or a cadence change, not model use; pair it with the direction columns. The 5× ramp cutoff is exploratory. |
| `volume_ratio` | The ratio used for the flag (or `null` when unknown). |

CI fixtures: `tests/fixtures/phase17/golden_cases.json` plus `tests/integration/test_phase17_classification.py` assert golden direction/volume/strength tuples without reading the gitignored `data/analysis/`. After a local full analyze, you may copy window-scoped rows from `data/analysis/<slug>_result.json` into that fixture and update `expected` in the same commit so CI stays deterministic.

TODO (not implemented): optional env/CLI overrides for the concordance or volume bands; until then, treat the diagnostics as fixed code plus pre-reg lock discipline.
## Phase 16 hash-break migration

Phase 16 intentionally changes the analysis-config hash, corpus fingerprint, and embedding revision contract. Treat any pre-Phase-16 `data/analysis/*_result.json` and preregistration locks as stale relative to a Phase-16 `config.toml` until you re-lock and re-run (see the GUARDRAILS sign "Pre-Phase-16 locked artifacts must be re-locked").
## Pre-registration lock (template → confirmatory)

TL;DR — confirmatory IS the default. `uv run forensics analyze` enforces `verify_preregistration()` (line 507 of `cli/analyze.py`) and refuses to run unless either (a) the lock matches the current `config.toml` analysis thresholds, or (b) you explicitly opt out with `--exploratory`. There is no "set it as default" knob — refusing to run without a lock is already the wired behavior.

Typical operator workflow:

| Action | Command | When |
|---|---|---|
| Initial lock | `uv run forensics lock-preregistration` | Once, after the methodology team agrees on thresholds |
| Confirmatory analyze | `uv run forensics analyze [--changepoint …]` | Every routine run; the lock is silently checked first |
| Methodology change | edit `config.toml`, then `uv run forensics --yes lock-preregistration`, then re-run analyze | Only when an intentional threshold change has been agreed |
| Sensitivity / dev iteration | `uv run forensics analyze --exploratory …` | When poking at thresholds before deciding to lock |

To check the live lock state at any time: `jq '{locked_at, content_hash}' data/preregistration/preregistration_lock.json`. To check whether the most recent run was confirmatory: `jq '.preregistration_status, .exploratory' data/analysis/run_metadata.json` — `ok` and `false` (or absent) means confirmatory.
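The same confirmatory check in Python, for scripts that have already loaded `run_metadata.json` into a dict (the keys are the documented ones; the helper name is illustrative):

```python
def run_was_confirmatory(meta: dict) -> bool:
    """True when a run counted as confirmatory: preregistration_status
    is "ok" and exploratory is false or absent. Any other status
    (missing, mismatch) or exploratory=true means non-confirmatory."""
    return meta.get("preregistration_status") == "ok" and not meta.get("exploratory", False)

# e.g. meta = json.loads(Path("data/analysis/run_metadata.json").read_text())
assert run_was_confirmatory({"preregistration_status": "ok"})
assert not run_was_confirmatory({"preregistration_status": "ok", "exploratory": True})
assert not run_was_confirmatory({"preregistration_status": "mismatch"})
```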
## Preregistration: publication lock checklist (governance)

Before treating pipeline outputs or reports as confirmatory for external publication:

1. Committed lock — `data/preregistration/preregistration_lock.json` (and any amendment notes such as `data/preregistration/amendment_*.md` referenced by your process) are committed on the branch you are releasing; `locked_at` is non-null and `analysis` + `content_hash` are present.
2. Lock matches config — after any intentional `config.toml` analysis-threshold change, run `uv run forensics --yes lock-preregistration` and commit the updated JSON. Local gate (same as CI): `uv run python scripts/verify_repo_preregistration_lock.py` must exit `0`.
3. Confirmatory runs only — publication runs use `uv run forensics analyze` without `--exploratory` (the same rule applies to `forensics all` / automation unless explicitly marked exploratory).
4. Run metadata — after the publication analyze, archive or cite `data/analysis/run_metadata.json`: `preregistration_status` must be `"ok"` and `exploratory` must be `false` or absent.

CI enforces (2) for every push/PR via the "Preregistration lock" job in `.github/workflows/ci-quality.yml` (`verify_repo_preregistration_lock.py`).
Detail / lifecycle:

- Write or refresh the lock from the current `config.toml` thresholds: `uv run forensics lock-preregistration` updates `data/preregistration/preregistration_lock.json` with `locked_at` (UTC ISO), an `analysis` snapshot, and a `content_hash`.
- Template / exploratory state: the committed repo default is an unfilled lock (`{"locked_at": null}` only). `verify_preregistration` reports `status="missing"` — confirmatory `analyze` exits non-zero until you run `lock-preregistration` or pass `--exploratory`.
- Verify after a run: read `data/analysis/run_metadata.json` → `preregistration_status` is `ok`, `missing`, or `mismatch`. A mismatch means the live settings no longer match the lock; confirmatory analyze hard-fails (exit code 1) after writing run metadata under `rid=preregistration-blocked`.
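The `content_hash` recipe ("SHA256-hashed canonical JSON", per the lock-preregistration command) can be sketched as below. The exact canonicalization in `src/forensics/preregistration.py` may differ, so treat this as illustrative, not a byte-compatible reimplementation:

```python
import hashlib
import json

def content_hash(analysis_snapshot: dict) -> str:
    """SHA256 over canonical JSON: sorted keys, compact separators.
    Canonical form makes the hash independent of dict insertion order."""
    canonical = json.dumps(analysis_snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Key order does not matter; any value change does:
a = {"threshold": 0.05, "pipeline_b_mode": "percentile"}
b = {"pipeline_b_mode": "percentile", "threshold": 0.05}
assert content_hash(a) == content_hash(b)
assert content_hash(a) != content_hash({**a, "threshold": 0.01})
```

This is why a `mismatch` status means the live thresholds drifted: the hash over the current config no longer equals the one frozen in the lock.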
## Pre-publication checklist (confirmatory lock)

Use this before treating any analysis drop as publication-ready (client deliverable, filing, or sworn work product). Exploratory runs (`--exploratory`) are fine for development; they must not be relabeled as confirmatory without completing the steps below.

1. Lock artifacts in version control: commit `data/preregistration/preregistration_lock.json` and any active amendment or methodology notes under `data/preregistration/` (for example `amendment_phase15.md`) on the same branch as the analysis config you intend to ship.
2. Config parity: the committed `config.toml` (or the `FORENSICS_CONFIG_FILE` used in CI) must be the same file that was hashed when the lock was written. After any threshold or analysis-model change, run `uv run forensics --yes lock-preregistration` and commit the updated lock.
3. Confirmatory run: execute `uv run forensics analyze` without `--exploratory` for the final corpus slice you are publishing. Do not hand-edit `run_metadata.json`.
4. Record proof in run metadata: open `data/analysis/run_metadata.json` from that run and confirm `preregistration_status` is `ok`, `exploratory` is `false` or absent, and `preregistration_message` is empty or informational (not a mismatch explanation).
5. Optional sanity: `jq '.preregistration_status, .exploratory' data/analysis/run_metadata.json` should print `ok` then `false` (or `null`).

CI automation: the "Preregistration lock matches config.toml" job in `.github/workflows/ci-quality.yml` runs `scripts/verify_repo_preregistration_lock.py` on every push/PR so the committed lock cannot be a template or out of sync with the repo `config.toml`. For helper-level regression coverage, run `tests/test_preregistration.py` locally or via the main test job. These checks do not substitute for steps 1–4 above — they do not ship your production lock or run your full analyze corpus.
## Embeddings (quarantine + re-extract)

- Default operator policy: when `analysis.embedding_model` / `embedding_model_version` / `embedding_model_revision` no longer match the first row of `data/embeddings/manifest.jsonl`, feature extraction archives the entire `data/embeddings/` tree to `data/embeddings_archive_<UTC>/` and starts clean (quarantine + re-extract). Re-run `uv run forensics extract` after updating the HF revision pin in `config.toml`.
- SentenceTransformer revision: vectors are produced with `SentenceTransformer(model, revision=…)` using `[analysis] embedding_model_revision` (commit SHA or branch). Each manifest row stores `model_revision` next to the legacy `model_version` label.
- Analyze without re-extracting: there is no supported path to silently mix revisions in confirmatory mode. For exploratory runs only, `forensics analyze --exploratory --allow-pre-phase16-embeddings` loads batches whose manifest revision differs from config, emitting a WARNING per article instead of raising. Confirmatory runs (the default) always hard-fail on mismatch so drift and downstream statistics cannot blend incompatible embedding spaces.
## Corpus custody (corpus_custody.json) — one-cycle v1 / v2

- `schema_version: 2`: `corpus_hash` fingerprints the analyzable corpus: non-duplicates only, ordered by `content_hash` (stable under insert order).
- `corpus_hash_v1`: legacy fingerprint (`ORDER BY id`, all rows) kept for one transition cycle so older verification semantics can be compared; see GUARDRAILS for removal timing (Phase 17).
- `verify_corpus_hash`: dispatches on `schema_version` (missing field → treat as v1).
- `forensics analyze` corpus gate: `--verify-corpus` / `--no-verify-corpus` are explicit overrides. If neither is passed, analyze uses `[chain_of_custody] verify_corpus_hash` from `config.toml` (repo default `true`; CI/minimal fixtures often set `false` so tests do not require a pre-seeded `corpus_custody.json`).
- Analyze chain-of-custody CLI overrides: `--verify-raw-archives` / `--no-verify-raw-archives` and `--log-all-generations` / `--no-log-all-generations` mirror `[chain_of_custody]` for a single run (same tri-state pattern as the corpus verify).
- Analyze subcommands: `forensics analyze run …` is equivalent to the default callback (explicit entrypoint). `forensics analyze compare-only …` runs compare-only with the same custody/author/`--compare-pair` options as the main command.
- Config audit: `uv run forensics config audit` lists analysis fields that differ from the `AnalysisConfig` defaults; add `--json` for machine-readable output.
- `[chain_of_custody] verify_raw_archives`: when `true`, `scrape --archive` logs a post-condition check after each `data/raw/{YYYY}.tar.gz` is written and SQLite paths are rewritten (non-empty archive on disk + rewrite row count).
- `[chain_of_custody] log_all_generations`: when `true`, each baseline article write emits a single INFO line (`custody {…}` JSON) from `forensics.baseline.orchestrator` for audit trails.
## Quick E2E spot-check (single author, exploratory)

When `data/articles.db` already has rows for a slug (skip the live scrape if you prefer): `uv run forensics extract --author <slug>` → `uv run forensics analyze --exploratory --author <slug> [--changepoint …]` → `uv run forensics report` (Quarto on `PATH`). Inspect under `data/analysis/`: `<slug>_result.json` (`config_hash`), `corpus_custody.json` (`schema_version`, `corpus_hash`, `corpus_hash_v1`), `<slug>_hypothesis_tests.json` (Phase 16 fields: `n_pre`, `n_post`, `n_nan_dropped`, `skipped_reason`, `degenerate`), `<slug>_convergence.json` (`n_rankable_per_family` when convergence ran). For an HTML-only fetch without discover/metadata/dedup/archive: `uv run forensics scrape --fetch` (same flag set as `FETCH_ONLY` in `dispatch_scrape`).
## Dedup performance cliff above hamming_threshold = 3

Near-duplicate detection (`forensics.scraper.dedup`) compares 128-bit simhashes by Hamming distance. The default `scraping.simhash_threshold` is 3 (aligned with the four 32-bit LSH banding guarantee). Raising the threshold widens the "near duplicate" neighborhood: each increment increases pairwise comparisons and union-find work superlinearly on large corpora. If you need looser dedup, prefer bounded batches or profiling first — do not raise the threshold on full-site runs without measuring wall time and duplicate-review cost.
## Migrating simhash fingerprints after D-01 (NFKC normalization)

Fingerprint values are versioned (`dedup_simhash_version`, currently `v2` in code as `SIMHASH_FINGERPRINT_VERSION`). Rows with a missing version, or a version other than `v2`, are excluded from the cached fingerprint set until recomputed; running dedup without migrating first can admit historical near-duplicates that no longer match the stored bands.

- Recompute all stale rows: `uv run forensics dedup recompute-fingerprints` (optional `--db PATH`, `--limit N` for tests). Text (default): prints a one-line human summary on stdout. JSON: `uv run forensics --output json dedup recompute-fingerprints` emits one envelope on stdout; read `.data` for `recomputed`, `skipped`, `errors`.
## Pipeline Operations

- Run full pipeline: `uv run forensics all` — implementation: `src/forensics/pipeline.py` (`run_all_pipeline`). It runs the full scrape (same as bare `forensics scrape` when no scrape flags are set), then extract, then `run_analyze(AnalyzeRequest(stages=AnalyzeStageFlags(timeseries=True, convergence=True)))` (not changepoint/drift unless you change the pipeline), then the Quarto report. See `docs/ARCHITECTURE.md`.
- Stage-by-stage (recommended when debugging): `uv run forensics scrape` (use `--discover` / `--metadata` / `--fetch` etc. as needed; see `--help`), then `uv run forensics extract`, then `uv run forensics analyze` (add `--changepoint`, `--drift`, … as needed). Each analyze run calls `verify_preregistration(settings)` before stages (see `src/forensics/cli/analyze.py`); threshold drift vs `data/preregistration/preregistration_lock.json` logs at WARNING, and `data/analysis/run_metadata.json` records `preregistration_status` (`ok` / `missing` / `mismatch`). SQLite in analyze (ADR-009 Option A): analyze still opens `data/articles.db` via `Repository` for slug ↔ `author_id` and roster wiring only; Parquet / `batch.npz` / manifests supply the measured signals. Keep the same `articles.db` that extract used (do not swap or truncate authors between extract and analyze without re-extracting), or joins and manifest filters can silently drop or mis-attribute rows. Finish with `uv run forensics report` (requires Quarto on `PATH`; output under `data/reports/` per `_quarto.yml`).
- Extract probability features (Phase 9): `uv run forensics extract --probability`
- Generate AI baseline (Phase 10): `uv run python scripts/generate_baseline.py --author {slug}`
- Validate environment before a run: `uv run forensics preflight` (pass `--strict` to promote warnings to failures). Hard-fails on Python < 3.13, missing `en_core_web_sm`, disk < 5 GB, config parse errors, or placeholder authors; warns for Quarto/Ollama/sentence-transformers cache misses. Machine-readable preflight: `uv run forensics --output json preflight` prints one JSON envelope on stdout (`sort_keys=True`; keys `ok`, `type`, `schemaVersion`, `data`). The `data` object holds `status` (`ok` / `warn` / `fail`), `strict`, `checks` (each `name` / `status` / `message`), `has_warnings`, and `has_failures`; exit codes match text mode (`1` only when any check is `fail`). The global `--output` must appear before the subcommand name. If `uv` ever mis-parses flags, use `uv run -- forensics --output json preflight`.
- Lock pre-registration thresholds: `uv run forensics lock-preregistration` writes `data/preregistration/preregistration_lock.json` (SHA256-hashed canonical JSON). Run before the first `analyze` to convert the run from exploratory to confirmatory. If a filled lock already exists, the CLI exits 5 (CONFLICT) unless you pass the global confirm flag before the subcommand: `uv run forensics --yes lock-preregistration` (not `lock-preregistration --yes`). Analyze always invokes `verify_preregistration` (same return statuses). See `src/forensics/preregistration.py`.
- Convergence permutation null (Phase 12 §5b): under `[analysis]` in `config.toml`, set `convergence_use_permutation = true` to draw an empirical null for each convergence window (p-values are logged only; detected windows are unchanged). Defaults: `convergence_use_permutation = false` (CPU), `convergence_permutation_iterations = 1000`, `convergence_permutation_seed = 42`. Wired from `src/forensics/config/settings.py` into `compute_convergence_scores` in `src/forensics/analysis/orchestrator/` (runner + `comparison.py`) and `src/forensics/analysis/comparison.py`.
- Blind newsroom survey (Phase 12 §1): `uv run forensics survey` runs the full pipeline across every qualified author on the manifest and ranks them by composite AI-adoption signal. Options: `--dry-run` (list qualified authors, no analysis), `--resume <run_id>` (skip authors already in `data/survey/run_<id>/checkpoint.json`), `--skip-scrape` (reuse existing corpus), `--author <slug>` (single-author debug run), `--min-articles` / `--min-span-days` (override `[survey]` thresholds). Output lands under `data/survey/run_<id>/` with `checkpoint.json` (written after each author) and `survey_results.json` (ranked, with the natural control cohort). Thresholds default to `SurveyConfig` in `config.toml` (`min_articles=50`, `min_span_days=730`, `min_words_per_article=200`, `min_articles_per_year=12.0`, `require_recent_activity=true`, `recent_activity_days=180`). Natural controls are authors whose composite score ≤ 0.2 AND `SignalStrength.NONE`; see `src/forensics/survey/scoring.py::identify_natural_controls`.
- Survey parallelism: with more than one pending author, `run_survey` may dispatch `ProcessPoolExecutor` workers sized by the env var `SURVEY_AUTHOR_WORKERS` or the default `min(8, os.cpu_count())`. Child processes do not inherit parent `pytest` monkeypatches — the survey test stubs set `SURVEY_AUTHOR_WORKERS=1` to force sequential in-process fakes.
- Calibration suite (Phase 12 §4): `uv run forensics calibrate` validates detector accuracy against synthetic ground truth. Options: `--positive-trials <n>` (spliced-corpus trials, default 5), `--negative-trials <n>` (unmodified-corpus trials, default 5), `--author <slug>` (target author; otherwise the most prolific), `--seed <int>` (splice-date RNG, default 42), `--output <path>` (override report path), `--dry-run` (emit an empty report without touching the DB — smoke-test only). Positive trials substitute post-splice articles with Phase 10 baseline AI text loaded from `data/ai_baseline/<slug>/articles.json`; a missing file triggers a warning and a best-effort no-op splice. Each trial runs in an isolated `data/calibration/run_<ts>/{positive,negative}_NN/` tree with its own `articles.db`. Final metrics (`sensitivity`, `specificity`, `precision`, `f1_score`, `median_date_error_days`) land in `data/calibration/calibration_<ts>.json`. A real calibration run is expensive (extract + full analysis per trial); the `--help` + pytest suite (`tests/test_calibration.py`) is the CI smoke test.
- Validate config + environment (Phase 12 §7a): `uv run forensics validate` parses `config.toml`, reports the author count, runs `run_all_preflight_checks(settings)`, and prints PASS/WARN/FAIL per check. Exits `1` when any preflight check hard-fails (spaCy model missing, placeholder authors, disk < 5 GB, config parse error, Python < 3.13). Pass `--check-endpoints` to also probe `https://www.mediaite.com/wp-json/wp/v2/types` and `http://localhost:11434/api/tags` with a 3s timeout — endpoint results are reported as PASS/WARN but do not affect the exit code. Use as a pre-commit or CI gate before running the pipeline. Preflight logic lives in `src/forensics/preflight.py`; the probes live in `src/forensics/cli/__init__.py::_probe_endpoint`.
- Single-file DuckDB export (Phase 12 §7b): `uv run forensics export [--output PATH] [--no-features] [--no-analysis]` folds `data/articles.db` (authors + articles via DuckDB's `sqlite` extension), optional `data/features/*.parquet`, and optional `data/analysis/*_result.json` into a single `.duckdb` file (default `data/forensics_export.duckdb`). Query it with any DuckDB client (`duckdb data/forensics_export.duckdb`, then `SHOW TABLES`). `ExportReport` returns `output_path`, `bytes_written`, and a `tables` dict of per-table rowcounts. The export lives in `src/forensics/storage/duckdb_queries.py::export_to_duckdb`; `*.duckdb` is gitignored.
- Interactive setup wizard (Phase 12 §2): run `uv sync --extra tui` once (installs `textual>=1.0.0` + `rich>=13.0`), then `uv run forensics setup` (or the bundled `forensics-setup` script) launches a 5-step Textual wizard: Dependencies (Python / spaCy / sentence-transformers / Quarto / Ollama status with pass/warn/fail icons), Discovery (probes `articles.db` for existing authors and lets you pick blind-survey vs hand-pick mode), Config (generates a complete `config.toml` from user inputs, with a timestamped backup of any existing file), Preflight (re-runs `run_all_preflight_checks(settings)` against the freshly written config), and Launch (emits the recommended next CLI command — `forensics survey` or `forensics all` — and exits so the user runs it in the shell with live logs). Keybindings: `q` quit, `n` next, `b` back. The module lives at `src/forensics/tui/`; core helpers (`check_dependencies`, `generate_config`, `write_config`, `discover_authors_summary`) are unit-testable without the Textual runtime (see `tests/test_tui.py`). The `forensics setup` Typer subcommand exits `1` when the `tui` extra is not installed and prints a friendly install hint.
- Survey dashboard + calibration notebooks (Phase 12 §6a+§6c): after `forensics survey` or `forensics calibrate`, render the team-facing dashboard with Quarto — `quarto render notebooks/10_survey_dashboard.ipynb --to html` (top-10 ranked authors, composite-score histogram with natural-controls overlay, earliest-convergence-window timeline, preregistration verification) or `quarto render notebooks/11_calibration.ipynb --to html` (sensitivity/specificity/precision/F1/median-date-error table, confusion-matrix heatmap, date-error histogram). Both notebooks locate the most recent `data/survey/run_*/survey_results.json` / `data/calibration/calibration_*.json` automatically, and degrade gracefully (printing a "run forensics survey/calibrate first" hint) when no data is present — safe to re-render at any time.
- Per-author drill-down renders (Phase 12 §6b): notebooks 05-07 now carry a `parameters`-tagged cell. To render a per-author drill-down, pass the slug via Quarto parameters — `quarto render notebooks/05_change_point_detection.ipynb -P author_slug:some-slug --to html` (same for `06_embedding_drift.ipynb`, `07_statistical_evidence.ipynb`). The default is `author_slug = "all"`, so existing renders are unchanged.
- Evidence-chain narrative (Phase 12 §6d): `from forensics.reporting.narrative import generate_evidence_narrative; generate_evidence_narrative(analysis_result, "jane-doe")` returns a deterministic ~200-400 word factual paragraph suitable for inclusion in the published report. Pass `score=`, `control_count=`, and `preregistration=` (a `VerificationResult` from `verify_preregistration()`) to enrich the output. The function is pure — the same inputs always produce byte-identical text — so it is safe to paste verbatim in confirmatory contexts.
## Exit codes and warnings

- Stages return non-zero on fatal errors (scrape failure, missing Quarto, analysis `typer.Exit`, report subprocess failure). `forensics all` propagates the first non-zero code.
- `forensics all` returns exit code `2` when preflight hard-fails (distinct from the `1` used by analyze).
- `insert_analysis_run` at the start of `all` / scrape / extract / analyze is best-effort: SQLite permission or I/O errors log "Could not record analysis_runs row" and the stage still continues where the code path allows.
- `forensics scrape` may exit `4` (TRANSIENT) when the run recorded at least one `scrape_errors.jsonl` line, every logged line is classified `transient: true` (timeouts, exhausted 429/5xx retries, etc.), and there was no successful ingest/fetch outcome for the run (see `docs/EXIT_CODES.md`). Each JSONL row now includes a boolean `transient` field for downstream tooling.
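For automation, the exit-code contract suggests retrying only on `4` and treating everything else as final. This wrapper is an illustrative helper for caller scripts, not part of the CLI:

```python
import subprocess
import sys
import time

# Contract from docs/EXIT_CODES.md:
OK, USAGE, MISSING, TRANSIENT, CONFLICT = 0, 2, 3, 4, 5

def run_with_retry(cmd: list[str], attempts: int = 3, delay: float = 5.0) -> int:
    """Re-run only on exit 4 (transient); every other code is final and
    returned for the caller to interpret (5 may be an idempotent skip)."""
    code = TRANSIENT
    for attempt in range(attempts):
        code = subprocess.run(cmd).returncode
        if code != TRANSIENT:
            return code
        time.sleep(delay * (attempt + 1))  # linear backoff between tries
    return code

# Demo with stand-in commands (a real call would be something like
# ["uv", "run", "forensics", "--output", "json", "scrape"]):
assert run_with_retry([sys.executable, "-c", "raise SystemExit(0)"]) == OK
assert run_with_retry([sys.executable, "-c", "raise SystemExit(4)"], attempts=2, delay=0) == TRANSIENT
```

Remember the rule from the headless section: check the returned code before parsing any stdout.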
## Expected Artifacts

After a successful full run, verify (paths depend on configured authors):

- `data/articles.db` — corpus + `analysis_runs`
- `data/authors_manifest.jsonl` — post-discover manifest
- `data/features/{slug}.parquet` — per-author features
- `data/embeddings/{slug}/batch.npz` — embeddings when not skipped
- `data/analysis/` — per-author `*_result.json`, `run_metadata.json`, and other stage JSON as enabled
- When present, `run_metadata.json` → `section_residualized_sensitivity.analysis_dir` is a project-relative path (e.g. `data/analysis/sensitivity/section_residualized`), not an absolute path — resolve it against the repo root when opening artifacts.
- `data/reports/` — Quarto HTML/PDF outputs (not a single `report.md` at repo root)

Phase 9 outputs: `data/probability/{author_slug}.parquet`, `data/probability/model_card.json`

Phase 10 outputs: `data/ai_baseline/{author_slug}/`, `data/ai_baseline/generation_manifest.json`

Legacy checklists that mention `data/raw/documents.json`, `data/analysis/analysis.json`, or `data/pipeline/summary.json` are obsolete for this codebase.
## Ollama Setup (Phase 10)

Required for AI baseline generation. Not needed for Phases 1-9.

### Peer reviewers (Makefile + CLI)

- `make peer-setup` — installs Python deps with the `dev` + `tui` extras, runs `forensics validate` (includes preflight), then `forensics peer-setup` for copy-paste `uv sync` tiers and one `ollama pull …` line per tag in `[baseline] models` (from the active `config.toml`).
- `make peer-hints` — runs only `forensics peer-setup` (use when deps are already synced).
- `make install-reviewer` / `install-baseline` / `install-probability` / `install-all-extras` — see `make help` for the exact `uv sync` lines.
- `forensics peer-setup --check-ollama` — probes `baseline.ollama_base_url` `/api/tags` for reachability and checks that the configured models are pulled (it does not run `ollama pull` for you).

```
# Install
brew install ollama

# Pull models (~14GB total)
ollama pull llama3.1:8b
ollama pull mistral:7b
ollama pull gemma2:9b

# Verify
ollama list

# Preflight check from the pipeline
uv run python scripts/generate_baseline.py --preflight
```

Hardware: an M1 Mac with 32GB unified memory runs all three 7-9B models comfortably (one at a time, ~5GB each). Ollama keeps the last-used model in memory; expect a ~10-15s cold load when switching.
Running baseline generation

```bash
uv sync --extra baseline   # install pydantic-ai + pydantic-evals
uv run python scripts/generate_baseline.py --preflight
uv run python scripts/generate_baseline.py --author <slug> --dry-run
uv run python scripts/generate_baseline.py --author <slug> --articles-per-cell 5
uv run python scripts/generate_baseline.py --all

# Via the analyze CLI:
uv run forensics analyze --ai-baseline --author <slug>
uv run forensics analyze --ai-baseline --skip-generation --author <slug>
uv run forensics analyze --verify-corpus
```

Artifacts land under `data/ai_baseline/{slug}/{model}/{mode}_{temp}/*.json` plus a top-level `generation_manifest.json` and per-cell `embeddings/`.
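To sanity-check a finished generation run, you can count articles per cell straight from the layout above (illustrative sketch; only the `{slug}/{model}/{mode}_{temp}/*.json` directory shape is assumed):

```python
from collections import Counter
from pathlib import Path


def baseline_cell_counts(baseline_dir: Path, slug: str) -> Counter:
    """Count generated JSON articles per (model, cell) directory for one author."""
    counts: Counter = Counter()
    # Pattern is model/cell/article.json relative to the author directory;
    # per-cell embeddings/ subdirs are one level deeper and not matched.
    for path in (baseline_dir / slug).glob("*/*/*.json"):
        counts[(path.parent.parent.name, path.parent.name)] += 1
    return counts
```

A cell with fewer articles than `--articles-per-cell` usually means generation was interrupted mid-cell.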
Quality-gate evals

```bash
uv sync --extra baseline
uv run python evals/baseline_quality.py --model llama3.1:8b
uv run python evals/baseline_quality.py --all-models --output /tmp/reports.json
```

The PerplexityRangeCheck evaluator silently passes when the Phase 9 extra (`probability`) is not installed — install both extras together for the full gate: `uv sync --extra probability --extra baseline`.
Model Downloads (Phase 9)

- GPT-2 reference model: ~500MB (auto-downloads on first `--probability` run)
- Falcon-7B pair (Binoculars, optional): ~28GB full / ~8GB quantized
- Embedding model (all-MiniLM-L6-v2): ~80MB (auto-downloads on first embedding run)

Throughput expectations:

- Article-level perplexity only: ~10 articles/min on CPU with GPT-2.
- Sentence-level perplexity (computed alongside) is ~5× slower because each sentence triggers its own forward pass; budget ~2 articles/min on CPU for a full `compute_perplexity` run. GPU throughput is ~50 articles/min.
- Binoculars (Falcon-7B pair) is GPU-only for practical runs; on CPU plan for ~1 article/min.
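The throughput figures above translate into a rough wall-clock budget (the per-minute rates are the runbook's estimates, not measurements made here):

```python
# Articles-per-minute figures quoted in the runbook.
RATES_PER_MIN = {
    "perplexity_article_cpu": 10,  # article-level only, GPT-2 on CPU
    "perplexity_full_cpu": 2,      # article + sentence-level passes
    "perplexity_full_gpu": 50,
    "binoculars_cpu": 1,
}


def estimate_minutes(n_articles: int, mode: str) -> float:
    """Naive linear estimate; ignores model load time and batching effects."""
    return n_articles / RATES_PER_MIN[mode]


print(estimate_minutes(100, "perplexity_full_cpu"))  # → 50.0
print(estimate_minutes(100, "perplexity_full_gpu"))  # → 2.0
```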
Running probability features

```bash
uv sync --extra probability   # install torch + transformers + accelerate
uv run forensics extract --probability --author <slug>
uv run forensics extract --probability --no-binoculars --device cpu
cat data/probability/model_card.json   # pinned model revisions + digest
```

Artifacts: `data/probability/{author_slug}.parquet` + `data/probability/model_card.json`.
Running tests with the slow gate

The default `uv run pytest` run skips tests marked `@pytest.mark.slow` (real GPT-2 load + inference). To run them explicitly:

```bash
uv run pytest -m slow tests/test_probability.py -v
uv run pytest -m "not slow" tests/ -v   # default behavior
```

Common Issues
Section titled “Common Issues”Command not found: uv
Section titled “Command not found: uv”- Install uv with the official script:
curl -LsSf https://astral.sh/uv/install.sh | sh
- Load uv into the current shell:
source "$HOME/.local/bin/env"
- Verify:
uv --version
Command not found: forensics

- Confirm dependencies are synced: `uv sync`
- Re-run via `uv run forensics --help`
Missing output files

- Re-run a focused stage command and inspect terminal output.
- Confirm the process has write access to the repository `data/` directory.
Failing tests

- Start with focused test runs: `uv run pytest tests/unit -v`, `uv run pytest tests/integration -v`
- Run a specific test: `uv run pytest -k "test_name" -v`
- For a narrow validation run that should not enforce the repository-wide coverage threshold, add `--no-cov` (for example, `uv run pytest --no-cov tests/unit/test_analyze_compare.py::test_name -q`).
- Fix regressions before adding new feature behavior.
Ollama connection refused

- Start the Ollama server: `ollama serve`
- Check it's running: `curl http://localhost:11434/api/tags`
- If there is a port conflict, check for an existing process: `lsof -i :11434`
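The `curl` check above can also be scripted. This is a minimal reachability probe (a sketch only; the pipeline's real preflight is `forensics validate --check-endpoints`):

```python
import json
import urllib.error
import urllib.request


def ollama_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """True when /api/tags answers with valid JSON; False on any failure."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            json.load(resp)  # valid JSON implies a healthy server
            return True
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

Handy in a shell loop while waiting for `ollama serve` to come up.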
uv sync fails on torch/transformers

- These are large deps (~2GB for torch). Ensure enough disk space.
- On M1 Mac, torch installs the MPS-compatible version automatically.
- If resolution fails, try: `uv sync --refresh`
Parquet schema mismatch

- If the feature-extraction schema changed, delete old Parquet files: `rm data/features/*.parquet`
- Re-run extraction: `uv run forensics extract`
- See GUARDRAILS.md Sign: “Parquet Schema Evolution”
Embedding model version mismatch

- The embedding model is pinned to `all-MiniLM-L6-v2` (384-dim).
- If embeddings look wrong, verify the model in the `config.toml` `[features]` section.
- See GUARDRAILS.md Sign: “Embedding Model Version Mismatch”
Quality Checks

```bash
# Full pre-commit validation
uv run ruff format .
uv run ruff check . --fix
uv run pytest tests/ -v

# Coverage report
uv run pytest tests/ -v --cov=src --cov-report=term-missing

# Property-based tests with stats
uv run pytest tests/ -v --hypothesis-show-statistics
```

Section diagnostics (Phase 15 J3 / J6 / J7)
URL-derived section tags (Phase 15 J1) unlock three diagnostic surfaces on the analyze sub-app. All commands write deterministic JSON / CSV / Markdown under `data/analysis/`; legacy `forensics analyze --changepoint` etc. still work unchanged.

```bash
# J3 — newsroom-wide section descriptive report + J5 gate verdict.
# Persists section_centroids.json, section_distance_matrix.{json,csv},
# section_feature_ranking.json, and section_profile_report.md.
uv run forensics analyze section-profile
uv run forensics analyze section-profile --output /tmp/profile.md
uv run forensics analyze section-profile --features-dir path/to/features

# J6 — per-author section-contrast tests (Welch + Mann-Whitney + per-family
# BH; Phase 15 C2 helper). Output: data/analysis/<slug>_section_contrast.json.
uv run forensics analyze section-contrast                    # every author
uv run forensics analyze section-contrast --author jane-doe  # one author

# J7 — residualize-sections per-run override. Flips
# analysis.section_residualize_features for the current process only;
# config.toml is NOT modified. Use this for A/B comparisons against the
# unadjusted CP run without touching the persisted config.
uv run forensics analyze --residualize-sections --changepoint
uv run forensics analyze all --residualize-sections   # via run_analyze()
```

Operational notes:

- `section-contrast` requires authors to have ≥ 2 sections each with ≥ 30 articles (`MIN_SECTION_ARTICLES`). Authors below the bar emit `{"pairs": [], "disposition": "insufficient_section_volume"}` rather than raising — downstream consumers must render “N/A”.
- A WARNING is emitted when every PELT feature passes BH for a single pair — wholly different registers across the entire feature set is suspicious and warrants a spot-check.
- `--residualize-sections` is a hot-fix knob. Persistent toggling lives in `config.toml` under `[analysis] section_residualize_features = true` so the change is captured by the config hash + preregistration lock.
Pre-registration lock workflow (confirmatory vs exploratory)

Every analyze run records `preregistration_status` in `data/analysis/run_metadata.json`. The status comes from `verify_preregistration(settings)` against `data/preregistration/preregistration_lock.json` and is one of:

- `ok` — a filled lock file matches the current analysis thresholds. The run is confirmatory.
- `missing` — no lock file (or the committed template, see below). The run is exploratory and any p-values are descriptive only.
- `mismatch` — a filled lock file exists but one or more analysis thresholds drifted since the lock. Logged at WARNING with a per-key diff; the run continues so an operator can inspect.
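Downstream tooling can gate on the recorded status like this (hypothetical helper; only the `preregistration_status` key described above is assumed):

```python
import json
from pathlib import Path


def is_confirmatory(run_metadata_path: Path) -> bool:
    """True only for status "ok"; "missing" and "mismatch" runs are exploratory."""
    metadata = json.loads(run_metadata_path.read_text())
    return metadata.get("preregistration_status") == "ok"
```

Use it before publishing any p-values from `data/analysis/`.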
Files in this directory

- `data/preregistration/preregistration_lock.json` — the operator-fillable lock template lives in the repo so a fresh checkout has a non-mismatching exploratory state out of the box. The template carries:
  - `preregistration_id` — opaque identifier for the run plan
  - `locked_at` / `locked_by` — null until the operator fills them
  - `config_hash` — null until the operator fills it
  - `amended_from` / `amendments` — pointers to the narrative docs (`amendment_phase15.md` etc.) that justify the locked hypotheses
  - `hypotheses` — H1..Hn list the operator must populate before claiming a confirmatory result
  - `expected_directions` — per-feature pre-declared direction of effect
- `data/preregistration/amendment_phase15.md` — phase-amendment narrative (committed). Reference any new hypothesis here before locking.
How the analyze CLI reads the file
Section titled “How the analyze CLI reads the file”verify_preregistration short-circuits the unfilled template (where
locked_at is null AND analysis block is absent) to missing so the
committed template never trips a false mismatch. The first fully-filled
lock — written by uv run forensics lock-preregistration — populates the
canonical analysis snapshot + SHA256 content_hash and converts the
next run from exploratory to confirmatory.
Locking workflow

```bash
# 1. Edit the template and commit it (preregistration_id + hypotheses +
#    expected_directions are the operator-authored fields). Do NOT fill
#    locked_at / locked_by / config_hash by hand — those are written by
#    the lock-preregistration command in step 2.
$EDITOR data/preregistration/preregistration_lock.json
git add data/preregistration/preregistration_lock.json
git commit -m "Pre-register analysis plan for <author> / <window>"

# 2. Snapshot the current thresholds + content hash. This OVERWRITES the
#    file with the canonical confirmatory lock — keep your template-edit
#    commit so the hypothesis history stays in git.
uv run forensics lock-preregistration

# 3. Run the analysis. run_metadata.json::preregistration_status lands
#    as "ok" and the narrative report renders as confirmatory.
uv run forensics analyze
cat data/analysis/run_metadata.json | jq .preregistration_status  # → "ok"
```

Re-running step 2 after every config change keeps the lock current. If you change a threshold without re-locking, the next analyze run logs WARNING and records `preregistration_status: "mismatch"` — fix it before publishing.
Migrations (Phase 15)

Storage-layer migrations now land through the Typer CLI rather than the old `scripts/`-only path. Two entry points, both idempotent:

```bash
# SQLite: authors.is_shared_byline, schema_version bookkeeping
uv run forensics migrate

# Feature parquets: stamp forensics.schema_version + add section column
uv run forensics features migrate            # in place (writes backup copy)
uv run forensics features migrate --dry-run  # preview only, no writes
```

- `forensics migrate` calls `Repository.apply_migrations()`; the same runner also fires on every `Repository` context-manager open, so operators rarely need to invoke it directly — but it's the canonical surface for a migrate-then-analyze deploy script.
- `forensics features migrate` walks `data/features/*.parquet` and runs the Phase-15 Step-0.3 helper. Backups land under `data/features/_pre_phase15_backup/` (filename-preserving). Rollback is a straight `mv` of the backup copy.
- Both commands tolerate missing target dirs (`data/`, `data/features/`) with a friendly stderr message and exit code `0`.
Phase 15 CLI surface (analyze + survey)

New flags and subcommands shipped during Phase 15. All are additive; prior invocations remain valid. See docs/ARCHITECTURE.md for the behavioural rationale.

```bash
# G1 — author-level parallelism (PR #60). Default 1 = serial.
uv run forensics analyze --max-workers 8

# D — survey shared-byline filter (PR #71). Default excludes group bylines
# (mediaite, mediaite-staff, ...). Pass to include them for transparency.
uv run forensics survey --include-shared-bylines

# J2 — advertorial / syndicated section exclusion (PR #76). Default
# excludes sponsored, partner-content, crosspost, etc. Both stages take
# the same flag so a single override flips the corresponding stage.
uv run forensics survey --include-advertorial
uv run forensics analyze --include-advertorial

# J3 — newsroom-wide section descriptive diagnostic (PR #75). Writes
# data/analysis/section_centroids.json, section_distance_matrix.json
# (+ .csv mirror), section_feature_ranking.json, and
# section_profile_report.md (J5 gate verdict embedded).
uv run forensics analyze section-profile
uv run forensics analyze section-profile --output /tmp/section_profile_test.md

# J6 — per-author section-contrast tests (Wave 3.3). Documented forward-
# compatibly; the flag may merge in parallel with this runbook entry.
uv run forensics analyze section-contrast
uv run forensics analyze section-contrast --author <slug>

# J5 — optional section residualization before BOCPD (Wave 3.3, gated
# on the J3 verdict against real corpus data). Off by default.
uv run forensics analyze all --residualize-sections
```

Phase 15 debug + parity recipes

```bash
# E1 — Pipeline B per-window component DEBUG logs. Useful when
# investigating drift / centroid-velocity regressions.
FORENSICS_LOG_LEVEL=DEBUG uv run forensics analyze --drift --author <slug>

# H2 — serial vs parallel JSON artifact parity check. Confirms
# author-level parallelism is byte-identical to a serial run. The
# integration test lives at tests/integration/test_parallel_parity.py
# (added by Wave 3.4).
uv run forensics analyze                    # serial baseline
mv data/analysis data/analysis_serial
uv run forensics analyze --max-workers 4    # parallel run
diff -r data/analysis_serial data/analysis  # expected: no output
```
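The `diff -r` parity check can also be done with per-file content hashes, which gives a cleaner report of exactly which artifacts diverge (illustrative sketch, not the integration test itself):

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def parity_mismatches(serial_dir: Path, parallel_dir: Path) -> list[str]:
    """Files missing from one side or differing in content."""
    a, b = tree_digests(serial_dir), tree_digests(parallel_dir)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

An empty list is the expected outcome after a byte-identical parallel run.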
```bash
# Evidence refresh — isolate each author under
# data/analysis/parallel/<run_id>/<slug>/, validate per-author artifacts,
# promote them to data/analysis/, then rebuild comparison metadata once.
# Use this when canonical per-author result hashes are stale and the serial
# refresh loop is too slow.
uv run forensics analyze --parallel-authors --max-workers 3
```

Phase 15 schema migration + benchmarks

```bash
# Storage migrations (covered above):
uv run forensics migrate                     # SQLite (Phase D1, etc.)
uv run forensics features migrate            # parquet section column
uv run forensics features migrate --dry-run  # preview only

# L1 — pre-Phase-15 wall-clock baseline + phase-by-phase benchmark.
uv run python scripts/bench_phase15.py --author mediaite-staff
```

Phase 1 — synthetic PELT null calibration (M-23)

```bash
# Writes data/provenance/synthetic_null_pelt_calibration.json (Gaussian noise,
# fixed penalty). Re-run after changing AnalysisConfig.pelt_penalty materially.
uv run python scripts/synthetic_null_pelt_calibration.py
```

Typer subcommand registration pattern (Phase 15 L6)
New CLI subcommands follow this pattern so the dispatch table in `src/forensics/cli/__init__.py` stays the single registration surface:

```python
from typing import Annotated

import typer

foo_app = typer.Typer(name="foo", help="One-line description.", no_args_is_help=True)


@foo_app.command(name="bar")
def bar(
    flag: Annotated[bool, typer.Option("--flag", help="...")] = False,
) -> None:
    """Subcommand docstring."""
    # imports inside the function body keep CLI startup fast
    ...


# Simple top-level command (no sub-app):
def my_cmd(
    arg: Annotated[str | None, typer.Option("--arg", help="...")] = None,
) -> None:
    """Docstring — this is what shows in --help."""
    ...
```

And register inside `src/forensics/cli/__init__.py`:

```python
from forensics.cli.foo import foo_app, my_cmd  # noqa: E402

app.add_typer(foo_app, name="foo")   # nested sub-app
app.command(name="mycmd")(my_cmd)    # top-level command
```

This is what Phase-15 L6 uses for `forensics features migrate` (sub-app) and `forensics migrate` (top-level).
Git workflow (GitButler)

Use GitButler CLI (`but`) for writes (commit, push, branch, merge, stash, rebase-style edits). The full command map and recipes live in the repo-local skill:

- `.claude/skills/gitbutler/SKILL.md` (Claude Code)
- `.cursor/skills/gitbutler/SKILL.md` (Cursor — same mirror)

Notion playbook add-on (parallel agents, `but status --json`, `--json --status-after`): `.claude/skills/gitbutler-workflow/SKILL.md` (mirrored under `.cursor/skills/gitbutler-workflow/`).

Project-specific notes (forge target, PRs) are in AGENTS.md under Learned Workspace Facts (GitButler bullet).

```bash
# Preflight before you commit (quality bar — run with git or but read-only)
uv run ruff format .
uv run ruff check . --fix
uv run pytest tests/ -v

# Then use but for the actual commit/push (see gitbutler skill — e.g.
# but status -fv, but commit ... --status-after; optional JSON flow in
# gitbutler-workflow skill)

# Conventional commit prefixes for messages
# feat: fix: refactor: test: docs: chore:
```

Incident Handoff

When handing off active work:

- Record the exact command(s) run.
- Capture failing tests or observed error text.
- Add status and next steps in `HANDOFF.md` (required — see CLAUDE.md Session Boundaries).
Analysis orchestrator package layout

`forensics.analysis.orchestrator` is now a package split by concern:

- `orchestrator/timings.py` — `AnalysisTimings`, `_StageTimer`
- `orchestrator/per_author.py` — per-author feature/drift/convergence/test assembly
- `orchestrator/parallel.py` — process workers + isolated refresh flow
- `orchestrator/comparison.py` — target/control resolution + compare-only path
- `orchestrator/sensitivity.py` — section-residualized rerun path
- `orchestrator/staleness.py` — stale detection + run-metadata merge
- `orchestrator/runner.py` — `run_full_analysis` entrypoint

The import surface remains `from forensics.analysis.orchestrator import ...`.
Phase 0 — preregistration lock, comparison report, AI baseline continuity

Order of operations (punch-list Phase 0):

- Amendment (M-05): Append post-hoc threshold documentation to `data/preregistration/amendment_phase15.md` when Fix-F / Fix-G (or similar) apply.
- Lock (M-01): `uv run forensics lock-preregistration` — writes `data/preregistration/preregistration_lock.json` with `locked_at`, `analysis`, and `content_hash`. Confirmatory `forensics analyze` (without `--exploratory`) requires `verify_preregistration` → `ok`.
- Comparison (M-03): With exactly one `role = "target"` in `config.toml` (M-04), run `uv run forensics analyze --compare` to populate `data/analysis/comparison_report.json`. If `validate_analysis_result_config_hashes` fails, refresh per-author `*_result.json` under the current analysis config hash first.
- AI baseline metric (M-02): Intended path: `ollama serve` locally, `uv sync --extra baseline`, then `uv run forensics analyze --ai-baseline --author <slug>` (see `[baseline]` in `config.toml`). Cell prompts append a JSON delivery contract so local Llama checkpoints return `{"headline","text","actual_word_count"}`; `forensics.baseline.agent.parse_generated_article_text` unwraps tool-shaped blobs and tolerates plain-text fallbacks. Stub continuity (local only — `data/ai_baseline/` is gitignored): `uv run python scripts/seed_phase0_ai_baseline_stubs.py` is only for environments without Ollama. After real generation, re-run `forensics analyze --drift --author <slug>` (add `--exploratory --allow-pre-phase16-embeddings` if article embedding manifests still lag the pinned HF revision).
- Embedding revision drift: If `EmbeddingRevisionGateError` appears during `--drift`, re-extract embeddings for the pinned revision or run drift exploratory with `forensics analyze --drift --exploratory --allow-pre-phase16-embeddings` (warnings only; not confirmatory).
Punch-list C/D/I — operational notes

- C-06 (analyze vs SQLite): Options and approval gate are documented in `docs/adr/ADR-009-analyze-stage-sqlite-reads.md`. No default behavior change until a path is chosen.
- Scrape coverage summary (D-03): `forensics.scraper.coverage.write_scrape_coverage_summary` can write a JSON summary next to `data/scrape_errors.jsonl` (call from a scrape completion path or a one-off script when you need coverage metrics for reports).
- Crawl summary (L-04): After `collect_article_metadata`, the crawler writes `crawl_summary.json` alongside `scrape_errors.jsonl` (per-author error buckets and top messages) via `forensics.scraper.coverage.write_crawl_summary_json`.
- Run metadata staleness (D-09): `run_metadata.json` may include `last_scraped_at` (ISO) when scrape artifacts are present; see `forensics.utils.provenance.read_latest_scraped_at_iso`.
- Parallel analyze promotion (I-06): After a successful `--parallel-authors` promotion, `data/analysis/parallel/<run>/parallel_promotion_complete.json` records completion metadata for debugging “partial promote” issues.
- Disk preflight (I-05): Helpers live in `forensics.utils.disk` (`free_disk_bytes`, `ensure_min_free_disk_bytes`); wire into preflight/CLI where you need a hard stop before large writes.
- Config fingerprint (I-01): Scraper-affecting fields and analysis seeds (LDA/UMAP/bootstrap, etc.) participate in `compute_model_config_hash` / `scraper_signal_digest`; re-lock preregistration if you change those and run confirmatory analysis.
Documentation site (Astro Starlight)

The operator + API + ADR documentation lives under `website/`
and ships as a single static site at
https://abstract-data.github.io/mediaite-ghostink/. The embedded Quarto
forensic report is rendered into website/public/report/ at build time and
served as static HTML at /mediaite-ghostink/report/index.html (the sidebar and
in-site links use that path so Vite dev and static hosting agree; a bare
/mediaite-ghostink/report/ URL may 404 in local dev but usually resolves on
GitHub Pages).
This site supersedes the prior Cloudflare Pages deploy of the Quarto book
(.github/workflows/deploy.yml was removed when the Starlight site landed).
Local commands

```bash
make docs-cli     # regenerate Typer CLI reference under website/src/content/docs/cli
make docs-python  # regenerate Python API reference via pydoc-markdown (needs pydoc-markdown on PATH)
make docs-quarto  # render the Quarto book into website/public/report
make docs-dev     # start the local Starlight dev server (default http://localhost:4321/mediaite-ghostink/ — next free port if 4321 is taken)
make docs-build   # full production build: CLI + Quarto + API + Astro
make docs-clean   # remove generated content (synced, ADRs, CLI, API, embedded report)
```

bun is the package manager (Abstract Data docs theme convention). The template-provided `scripts/build-python-docs.mjs` requires `pydoc-markdown` on PATH — install with `pipx install pydoc-markdown`.
Sync pipeline

- `website/scripts/sync-docs.mjs` copies the canonical operator markdown out of `docs/` into the Starlight content collection. Allow-listed top-level files (ARCHITECTURE.md, RUNBOOK.md, TESTING.md, GUARDRAILS.md, DEPLOYMENTS.md, EXIT_CODES.md) land under `website/src/content/docs/synced/`; every `docs/adr/*.md` lands under `website/src/content/docs/adr/`. The script injects YAML frontmatter (`title`, `description`, `editUrl`) and rewrites internal links to base-prefixed Starlight URLs (or to absolute GitHub URLs for off-list files).
- `scripts/generate_cli_docs.py` walks the Typer `forensics` app and emits one page per command/subcommand.
- All generated content is gitignored.
CI / hosting cutover

The deploy workflow is
.github/workflows/deploy-docs.yml.
It runs on every push to main and on PRs touching website/, docs/,
notebooks/, _quarto.yml, index.qmd, src/forensics/**, the CLI
docs generator, pyproject.toml, uv.lock, or the workflow itself.
PRs run the build job only; deploys only happen on main pushes
(or explicit workflow_dispatch from main).
The build job mirrors `make docs-build` end-to-end:

- Checkout with `fetch-depth: 0` — required so the per-tag worktrees (Option C versioned docs) can resolve historic release commits. Shallow clones produce `worktree add` failures.
- `uv sync --frozen --extra dev` (with uv cache).
- `pipx install pydoc-markdown` so `bun run docs:python` can shell out.
- `quarto-actions/setup@v2` then `oven-sh/setup-bun@v2`.
- `bun install --frozen-lockfile` in `website/`.
- `bun run sync-versions` — reads `.release-please-manifest.json` plus `git tag --list 'v*.*.*'`, keeps the most recent 5 by semver, marks the manifest version as `default`, and writes the resolved `versions[]` into `website/scripts/python-autodoc.json` and `website/scripts/cli-autodoc.json`.
- `bun run docs:cli` — per-version CLI reference. For each `versions[]` entry the orchestrator (`website/scripts/build-cli-docs.mjs`) creates a `git worktree` pinned at the tag, runs `uv sync --frozen --extra dev` inside it, then invokes the current main copy of `scripts/generate_cli_docs.py` from that worktree's venv (`uv run --directory <wt>`). The default version is also emitted at the un-versioned URL (`/cli/forensics-preflight/` etc.) by re-running the generator with `--version-segment ""`.
- `quarto render --output-dir website/public/report` (embedded report — evergreen, always reflects `main`, not versioned).
- `bun run docs:python` — per-version Python API reference. Same worktree-per-tag pattern as CLI, via `build-python-docs.mjs`.
- `bun run build` (`sync-docs` + `astro check` + `astro build`).
- A smoke-test step asserts the canonical entry points exist in `website/dist/`. It resolves the current default safe-tag from the manifest (`v0.1.2` → `0-1-2`) so the assertion tracks release-please without manual edits, then verifies all of:
  - Evergreen: `/`, `/getting-started/`, `/synced/architecture/`, `/synced/runbook/`, `/adr/`, `/report/`, `sitemap-index.xml`.
  - Default-version aliases: `/cli/`, `/cli/forensics/`, `/cli/forensics-preflight/`, `/api/forensics/`, `/api/forensics_pipeline/`.
  - Per-version subdirs: `/cli/<safeTag>/`, `/cli/<safeTag>/forensics-preflight/`, `/api/<safeTag>/`, `/api/<safeTag>/forensics_pipeline/`. If any generator regresses, CI hard-fails before Pages sees anything.
Concurrency: builds use `deploy-docs-${{ github.ref }}` (cancellable for PR ref churn); the deploy job uses the GitHub-recommended shared `pages` group with `cancel-in-progress: false` so a live deploy is never interrupted by a newer run.
Versioned documentation (Option C)

The Python API and Typer CLI references are versioned per release tag, while operator docs, ADRs, getting-started, the landing page, and the embedded Quarto report stay evergreen (always reflect `main`).
How versions are resolved
`website/scripts/sync-versions.mjs` is the single source of truth. It reads `.release-please-manifest.json` to discover the current package version, cross-references `git tag --list 'v*.*.*'`, sorts by semver (descending), keeps the most recent 5, and writes the array into both autodoc configs. Re-running it after a new release-please bump is the only step needed to surface a new version in the docs:
```bash
make docs-versions                    # or: cd website && bun run sync-versions
make docs-versions ARGS=--dry-run     # preview without writing
KEEP_VERSIONS=10 make docs-versions   # widen the window from 5 to 10
```
prerequisite, so a typical local rebuild just runs make docs-build and
picks up the latest manifest automatically.
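The version-selection rule (sort v-prefixed tags by semver descending, keep the newest N) can be sketched like this; the real implementation is the JavaScript `sync-versions.mjs`, and this Python version is for illustration only:

```python
import re


def keep_latest(tags: list[str], n: int = 5) -> list[str]:
    """Sort v-prefixed semver tags descending and keep the newest n."""
    semver = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")
    parsed = [
        (tuple(int(part) for part in m.groups()), tag)
        for tag in tags
        if (m := semver.match(tag))  # non-matching tags (nightlies, etc.) drop out
    ]
    return [tag for _, tag in sorted(parsed, reverse=True)[:n]]


print(keep_latest(["v0.1.0", "v0.2.0", "v0.1.2", "v0.10.0", "v0.9.1", "v0.1.1"]))
# → ['v0.10.0', 'v0.9.1', 'v0.2.0', 'v0.1.2', 'v0.1.1']
```

Note the numeric tuple comparison: a naive string sort would put `v0.9.1` above `v0.10.0`.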
How per-version pages get built
Both website/scripts/build-cli-docs.mjs
and website/scripts/build-python-docs.mjs
follow the same orchestration:
- For each `versions[]` entry, `git worktree add --detach <tmp> <tag>`.
- `uv sync --frozen --extra dev` inside the worktree (uv's global wheel cache makes subsequent syncs fast even with multiple tags).
- Invoke the generator from the worktree's venv (`uv run --directory <wt>`), pointing at the current `main` copy of the generator script. This matters because older tags don't necessarily have the `--version` flag or any of the versioning machinery — using `main`'s generator pointed at the tag's importable code gives consistent output.
- The default version is additionally emitted at the un-versioned URL (`--version-segment ""` for CLI, `version: null` re-build for API) so existing inbound links like `/api/forensics_pipeline/` keep resolving to the latest release without redirects.
- Worktrees are removed on exit (`git worktree remove --force`) so a failed build never leaves stale `.git/worktrees/*` entries.
Frontmatter contract
Each per-version page gets `version:`, `versionLabel:`, `versionDefault: true` (default only), and `sidebar.hidden: true` (so the sidebar stays clean — versioned pages are reachable only via the `<VersionPicker>` dropdown). The default-aliased copies at root carry no version frontmatter so they participate in normal Starlight sidebar autogeneration.

`<VersionPicker>` (from `@abstractdata/starlight-theme`) auto-discovers the version list at build time by walking `getCollection('docs')` for frontmatter `version:` fields, deduping by tag, and pre-selecting the entry that carries `versionDefault: true`. There's no second version list to maintain — the manifest → `versions[]` flow is the only source. The override at `website/src/components/SocialIcons.astro` renders two pickers (one for `/api`, one for `/cli`) because the theme ships with a single picker bound to one base URL.
Diagnosis quick reference
| Symptom | Likely cause |
|---|---|
| `[VersionPicker] Auto-discovery found no pages with version: frontmatter` | `bun run sync-versions` didn't run, or all generators ran in single-version mode (no `versions[]` in the configs). |
| Sidebar shows `forensics-preflight` N times (one per version) | Generator wasn't emitting `sidebar.hidden: true` on versioned pages — re-run `make docs-cli && make docs-python`. |
| `starlight-links-validator: invalid link to /api/<tag>/<page>/` | A generator emitted a versioned URL without the `/mediaite-ghostink/` base prefix. Common offender: cross-page link construction that forgot `cfg.urlBasePrefix`. |
| CI: `git worktree add` failed (tag not present) | `actions/checkout@v4` ran without `fetch-depth: 0`. The deploy workflow already pins it; if you copy the workflow elsewhere, copy that too. |
| Old tag's worktree fails `uv sync --frozen` | A pinned wheel was yanked from PyPI. The orchestrators downgrade this to a warning and skip the version; only the surviving tags ship in that build. |
Maintainer follow-ups for the Cloudflare → GitHub Pages migration:

- Enable GitHub Pages in repo settings (Settings → Pages → Source: GitHub Actions).
- Confirm the first deploy succeeds at https://abstract-data.github.io/mediaite-ghostink/.
- Retire the Cloudflare Pages project `ai-writing-forensics` in the Cloudflare dashboard once the Pages URL is confirmed live.
- Remove the `CF_API_TOKEN` and `CF_ACCOUNT_ID` repo secrets (Settings → Secrets and variables → Actions).
- The old `make deploy` target (which still runs `wrangler pages deploy`) can be retired in a follow-up change once the CF project is gone.