forensics.pipeline

End-to-end pipeline orchestration (scrape → extract → analyze → report).

forensics.cli.run_all calls run_all_pipeline here. Order of operations:

  1. Audit: AuditPipelineContext records the forensics all run in analysis_runs (best-effort; failures log a warning and the run continues).
  2. Scrape: asyncio.run(dispatch_scrape(...)) with all boolean stage flags false, which selects the same full scrape handler as a plain forensics scrape (discover → metadata → fetch → dedup → JSONL export). See forensics.cli.scrape.dispatch_scrape.
  3. Extract: extract_all_features for all authors, with embeddings enabled.
  4. Analyze: run_analyze(AnalyzeRequest(timeseries=True, convergence=True)) only (no changepoint, drift, compare-only, or AI baseline unless you edit this module).
  5. Report: run_report with ReportArgs built from get_settings().report.output_format.
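The five steps above can be sketched as a single sequential driver. Everything below is a hypothetical stand-in: the stage functions are stubs mimicking the entry points named in this docstring, and the "markdown" format is an assumed settings value, not the documented default.

```python
import asyncio

# Hypothetical stand-ins for the real stage entry points; the actual
# signatures live in forensics.cli.scrape, the feature module, etc.
def record_audit_run(command: str) -> None:
    # Best-effort bookkeeping: a failure here logs a warning, never aborts.
    pass

async def dispatch_scrape(**stage_flags: bool) -> None:
    # Full scrape handler: discover -> metadata -> fetch -> dedup -> JSONL.
    pass

def extract_all_features(*, embeddings: bool) -> None:
    pass

def run_analyze(*, timeseries: bool, convergence: bool) -> None:
    pass

def run_report(output_format: str) -> None:
    pass

def run_all_pipeline_sketch() -> int:
    record_audit_run("forensics all")                 # 1. audit record
    asyncio.run(dispatch_scrape(discover=False))      # 2. all stage flags false -> full scrape
    extract_all_features(embeddings=True)             # 3. all authors, embeddings on
    run_analyze(timeseries=True, convergence=True)    # 4. no changepoint/drift/compare
    run_report("markdown")                            # 5. format from settings (assumed value)
    return 0                                          # process exit code
```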

Operational detail and artifact layout: docs/RUNBOOK.md, docs/ARCHITECTURE.md.

def run_all_pipeline(*,
                     show_progress: bool = True,
                     observer: PipelineObserver | None = None) -> int

Run the default full pipeline; returns process exit code.

The pipeline refuses to start when preflight checks hard-fail (returns exit code 2) — this prevents cascading errors deeper in the run when the environment is known to be broken.
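The preflight gate described above can be illustrated with a minimal sketch. The predicate name and its result shape are assumptions for illustration; only the exit-code-2 behavior comes from the docstring.

```python
def run_with_preflight(preflight_hard_fail: bool) -> int:
    # Hypothetical gate: a hard preflight failure short-circuits with exit
    # code 2 before any stage runs, so a known-broken environment cannot
    # cascade into failures deeper in the pipeline.
    if preflight_hard_fail:
        return 2
    # ... scrape / extract / analyze / report stages would run here ...
    return 0
```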

Arguments:

  • show_progress - When true and observer is None, attach a forensics.progress.RichPipelineObserver for scrape and phase labels.
  • observer - Optional pre-constructed observer (e.g. a Rich session owned by the CLI). When set, show_progress only controls the feature-extract Rich bar, not observer construction.