forensics.pipeline

End-to-end pipeline orchestration (scrape → extract → analyze → report).

forensics.cli.run_all calls run_all_pipeline here. Order of operations:

  1. Audit: AuditPipelineContext records the forensics all run in analysis_runs (best-effort; failures log a warning and the run continues).
  2. Scrape: asyncio.run(dispatch_scrape(...)) with all boolean stage flags false, which selects the same full scrape handler as a plain forensics scrape (discover → metadata → fetch → dedup → JSONL export). See forensics.cli.scrape.dispatch_scrape.
  3. Extract: extract_all_features for all authors, with embeddings enabled.
  4. Analyze: run_analyze(AnalyzeRequest(timeseries=True, convergence=True)) only (no changepoint, drift, compare-only, or AI baseline unless you edit this module).
  5. Report: run_report with ReportArgs built from get_settings().report.output_format.
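The five steps above can be sketched as a single sequential driver. Everything below is a hypothetical stand-in: the stage functions are stubs mimicking the entry points named in this docstring, and the "markdown" format is an assumed settings value, not the documented default.

```python
import asyncio

# Hypothetical stand-ins for the real stage entry points; the actual
# signatures live in forensics.cli.scrape, the feature module, etc.
def record_audit_run(command: str) -> None:
    # Best-effort bookkeeping: a failure here logs a warning, never aborts.
    pass

async def dispatch_scrape(**stage_flags: bool) -> None:
    # Full scrape handler: discover -> metadata -> fetch -> dedup -> JSONL.
    pass

def extract_all_features(*, embeddings: bool) -> None:
    pass

def run_analyze(*, timeseries: bool, convergence: bool) -> None:
    pass

def run_report(output_format: str) -> None:
    pass

def run_all_pipeline_sketch() -> int:
    record_audit_run("forensics all")                 # 1. audit record
    asyncio.run(dispatch_scrape(discover=False))      # 2. all stage flags false -> full scrape
    extract_all_features(embeddings=True)             # 3. all authors, embeddings on
    run_analyze(timeseries=True, convergence=True)    # 4. no changepoint/drift/compare
    run_report("markdown")                            # 5. format from settings (assumed value)
    return 0                                          # process exit code
```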

Operational detail and artifact layout: docs/RUNBOOK.md, docs/ARCHITECTURE.md.

def run_all_pipeline(*,
                     show_progress: bool = True,
                     observer: PipelineObserver | None = None) -> int

Run the default full pipeline; returns process exit code.

The pipeline refuses to start when preflight checks hard-fail (returns exit code 2) — this prevents cascading errors deeper in the run when the environment is known to be broken.
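The preflight gate described above can be illustrated with a minimal sketch. The predicate name and its result shape are assumptions for illustration; only the exit-code-2 behavior comes from the docstring.

```python
def run_with_preflight(preflight_hard_fail: bool) -> int:
    # Hypothetical gate: a hard preflight failure short-circuits with exit
    # code 2 before any stage runs, so a known-broken environment cannot
    # cascade into failures deeper in the pipeline.
    if preflight_hard_fail:
        return 2
    # ... scrape / extract / analyze / report stages would run here ...
    return 0
```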

Arguments:

  • show_progress - When true and observer is None, attach a forensics.progress.RichPipelineObserver for scrape and phase labels.
  • observer - Optional pre-constructed observer (e.g. a Rich session owned by the CLI). When set, show_progress only controls the feature-extract Rich bar, not observer construction.