
Quickstart

This page walks a new operator from a clean checkout to a passing preflight. It mirrors the "Five-minute smoke test" section of the repository's README.md. For deeper operational guidance, see the Runbook and the CLI reference.

  • Python 3.13 managed by uv
  • Quarto on PATH (only required for forensics report and forensics all)
  • Optional: pydoc-markdown (for regenerating API pages locally) and spaCy en_core_web_md (for the full extract stage)
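
A quick way to confirm the command-line prerequisites before bootstrapping is to check that each tool resolves on PATH. The following is a minimal sketch, not part of the repository: `check_tools` is a hypothetical helper name, and it assumes the binaries are invoked as `uv` and `quarto`.

```shell
# Hypothetical helper (not shipped with the repo): report whether each
# named tool resolves on PATH, and return non-zero if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (assumed binary names): check_tools uv quarto
```

A missing `quarto` only matters for `forensics report` and `forensics all`, so a non-zero result here does not necessarily block the steps on this page.
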
  1. Clone and enter the repo.

    git clone git@github.com:Abstract-Data/mediaite-ghostink.git
    cd mediaite-ghostink
  2. Bootstrap a reviewer environment.

    make peer-setup
  3. Run preflight.

    uv run forensics preflight

    The command exits 0 when configuration, storage paths, optional model availability, and roster guardrails all pass. Non-zero exit codes are documented in Exit codes.

  4. Browse the CLI surface.

    uv run forensics --help

    Every subcommand is also documented under CLI reference on this site (auto-generated from the Typer app).
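
Because preflight signals success through its exit status, a script can gate later work on it. The sketch below assumes nothing about the specific non-zero codes (those are listed under Exit codes); `run_gate` is a hypothetical wrapper name, not a command provided by the repo.

```shell
# Hypothetical wrapper (not shipped with the repo): run any command,
# report whether it exited 0, and pass its exit status through unchanged.
run_gate() {
  status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "gate passed"
  else
    echo "gate failed with exit code $status" >&2
  fi
  return "$status"
}

# Example, using the preflight command from step 3:
#   run_gate uv run forensics preflight && echo "safe to continue"
```

Passing the status through rather than collapsing it to 0 or 1 keeps the original code available for lookup in Exit codes.
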

The pipeline writes to data/ and reads operator-facing prose from docs/. Source modules live under src/forensics/.
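
To see which artifact directories exist in a checkout, a small loop over the `data/` subdirectories named in this page's layout is enough. This is a sketch under assumptions: `list_artifact_dirs` is a hypothetical helper, and an absent directory may simply mean the corresponding stage has not run yet, which this check cannot confirm.

```shell
# Hypothetical helper (not shipped with the repo): for a base directory
# and a list of subdirectory names, print whether each one exists.
list_artifact_dirs() {
  base=$1
  shift
  for sub in "$@"; do
    if [ -d "$base/$sub" ]; then
      echo "present: $base/$sub"
    else
      echo "absent: $base/$sub"
    fi
  done
}

# Example, run from the repo root:
#   list_artifact_dirs data raw features analysis reports
```
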

  • mediaite-ghostink/
    • src/forensics/
      • scraper/ WordPress REST + HTML
      • features/ Lexical, structural, content, productivity
      • analysis/ Change-points, drift, convergence
      • reporting/ Quarto-aware report stage
      • cli/ Typer command surface
    • data/ Pipeline artifacts (gitignored)
      • raw/ Scraped documents
      • features/ Parquet feature store
      • analysis/ Per-author analysis JSON
      • reports/ Optional local Quarto output
    • docs/ Canonical operator prose (synced into this site)
      • adr/ Architecture decision records
      • ARCHITECTURE.md, RUNBOOK.md, TESTING.md, GUARDRAILS.md, …
    • notebooks/ Quarto chapters for the bound report
    • website/ This documentation site
  • Read the Architecture overview for stage contracts and data model definitions.
  • Skim the Guardrails to see the Signs (failure patterns) maintainers and agents should respect.
  • The Runbook collects every operational quick-reference command, environment knob, and recovery procedure.
  • Each architectural decision is captured under Decision records — start with ADR-001 for the hybrid methodology rationale.