11  Executive summary (full report)

Project: Independent review of writing-style change at Mediaite.com using stylometry, embedding-space drift, and statistical comparisons.

What this chapter contains: A concise reading of the per-author analysis bundles, together with the headline tables. Earlier chapters document collection, features, change points, drift, tests, and controls.

Provenance: Corpus custody and configuration fingerprints are summarized in chapter 00 and in data/analysis/corpus_custody.json. The closing section of this chapter lists the git revision, analysis-configuration hash, and render timestamp for this build.

11.1 Executive summary

This section states what the bundled analysis files show for the configured author panel. It does not substitute for reading the evidence chapters or the custody record.

Panel highlights

  1. Embedding drift (pb_max). In the current artifact set, all twelve named study authors have a positive embedding-drift summary score. The five highest scores belong to david-gilmour (0.598), colby-hall (0.589), sarah-rumpf (0.587), charlie-nash (0.537), and joe-depaolo (0.536). These figures are descriptive rankings from the stored results, not findings about intent or authorship.

  2. Adjusted hypothesis tests. Under the documented per-family Benjamini–Hochberg adjustment, a subset of tests meets the pre-registered significance and effect-size criteria; a minimal sketch of the per-family adjustment follows this list. Exact headline counts appear in the metrics table, which is recomputed from the same *_result.json inputs as the earlier chapters.

  3. Combined stylometry and drift (colby-hall). For colby-hall, stylometric convergence and embedding drift both exceed the declared thresholds across many windows, including 170 windows where both channels register simultaneously (passes_via contains ab). One representative window runs from 2026-01-08 through 2026-04-08 with five feature families flagged together: ai_markers, entropy, lexical_richness, readability, and self_similarity.

  4. Language-model probability channel. When token-level probability features are not present in the feature store, the probability trajectory score is zero for all authors. For this render, the primary quantitative story is therefore carried by stylometry, embedding drift, and the joint convergence index.
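
The per-family adjustment named in item 2 can be restated compactly. The sketch below is illustrative only: the family grouping key and the p_value / p_corrected field names are assumptions, and the authoritative adjustment is the one recorded with the analysis configuration.

    from collections import defaultdict

    def bh_adjust(pvals):
        # Benjamini-Hochberg: return adjusted p-values in the input order.
        m = len(pvals)
        order = sorted(range(m), key=lambda i: pvals[i])
        adjusted = [0.0] * m
        running_min = 1.0
        for rank in range(m, 0, -1):           # walk from the largest p down
            i = order[rank - 1]
            running_min = min(running_min, pvals[i] * m / rank)
            adjusted[i] = running_min          # enforces monotonicity
        return adjusted

    def adjust_per_family(tests):
        # Apply BH separately within each test family, as the text describes.
        by_family = defaultdict(list)
        for t in tests:
            by_family[t["family"]].append(t)
        for family_tests in by_family.values():
            corrected = bh_adjust([t["p_value"] for t in family_tests])
            for t, p in zip(family_tests, corrected):
                t["p_corrected"] = p
        return tests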

Interpretive limits

  • The metrics describe shifts in measurable text features; they do not, standing alone, establish how any article was written.
  • Whether this render is confirmatory or exploratory is determined by the pre-registration check in the next block.

11.2 Top-line metrics

The next code cell loads each per-author result.json under data/analysis/ and recomputes the headline metrics so the table matches the bundles used elsewhere in this volume.
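
As context for that cell, here is a minimal sketch of the recomputation. The one-bundle-per-author layout under data/analysis/ comes from the text above; the bundle keys (pb_max, tests, p_corrected) and the 0.05 alpha are assumptions made for illustration.

    import json
    from pathlib import Path

    def load_headline_metrics(root="data/analysis"):
        rows = []
        for result_path in sorted(Path(root).glob("*/result.json")):
            bundle = json.loads(result_path.read_text())
            rows.append({
                "author": result_path.parent.name,
                "pb_max": bundle.get("pb_max", 0.0),     # embedding-drift summary
                "n_significant": sum(
                    1 for t in bundle.get("tests", [])
                    if t.get("p_corrected", 1.0) < 0.05  # assumed alpha
                ),
            })
        # Rank descending by drift score, matching the panel highlights.
        return sorted(rows, key=lambda r: r["pb_max"], reverse=True)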

11.3 Finding strength classification

Each author’s strongest convergence window is graded with classify_finding_strength from forensics.models.report. Hypothesis tests passed to the classifier are window-scoped — only tests whose feature_name appears in the window’s features_converging list count toward the tally. This prevents an author from being credited for significant tests that don’t actually support the convergence window in question.

A strong significant test is one with corrected p < 0.01 and |Cohen’s d| ≥ 0.8.

  • STRONG (red): ≥3 strong significant tests, controls clean (editorial_vs_author_signal > 0.7), and Pipeline C ok when probability features are available.
  • MODERATE (orange): ≥3 significant tests and ≥1 strong significant test. The ≥1 strong floor separates meaningful from marginal evidence; the ≥3 significant-test count keeps the bar above the trivial 2-test threshold that previously bucketed authors with vastly different evidence strength into the same tier.
  • WEAK (yellow): ≥1 significant test.
  • NONE (green): no significant tests in this window’s features.

Probability features are unavailable in this run, so the Pipeline C clause is a no-op. The editorial_vs_author_signal is looked up per-target from data/analysis/comparison_report.json (targets.<slug>.editorial_vs_author_signal); controls have no such signal and default to 0.0, which structurally reserves STRONG for designated targets.
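
Restated as code, the tier rules read as follows. This is a sketch with assumed field names (feature_name, p_corrected, cohens_d, features_converging, as described above); the authoritative implementation remains classify_finding_strength in forensics.models.report.

    def classify_window(window, tests, editorial_signal=0.0, pipeline_c_ok=True):
        # Window scoping: only tests whose feature appears in this window's
        # converging-feature list count toward the tally.
        scoped = [t for t in tests
                  if t["feature_name"] in window["features_converging"]]
        significant = [t for t in scoped if t["p_corrected"] < 0.05]  # assumed alpha
        strong = [t for t in significant
                  if t["p_corrected"] < 0.01 and abs(t["cohens_d"]) >= 0.8]
        # pipeline_c_ok defaults to True so the Pipeline C clause is a no-op
        # when probability features are absent; controls default to an
        # editorial signal of 0.0 and therefore can never reach STRONG.
        controls_clean = editorial_signal > 0.7 and pipeline_c_ok
        if len(strong) >= 3 and controls_clean:
            return "STRONG"
        if len(significant) >= 3 and len(strong) >= 1:
            return "MODERATE"
        if len(significant) >= 1:
            return "WEAK"
        return "NONE"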

Threshold provenance caveat: the MODERATE bar (≥3 sig AND ≥1 strong) was chosen after observing that the prior ≥2 sig bar bucketed together authors spanning up to a 50× range in evidence quality. The choice is exploratory; the bar must be locked in data/preregistration/ before any confirmatory claim is made.

11.3.1 Direction concordance and volume context (exploratory)

The table in the next cell adds two diagnostic columns alongside FindingStrength:

  • direction (direction_ai, direction_mixed, direction_non_ai, direction_na): after collapsing to one hypothesis test per feature_name (keeping the largest |Cohen’s d|), each feature with a documented AI-typical direction in AI_TYPICAL_DIRECTION is scored as matching or opposing that prior. The ≥50% rule, applied among features with a non-null prior, labels the window direction_ai when at least half of those comparisons match; direction_mixed when some but fewer than half match; direction_non_ai when none match but at least one prior existed; and direction_na when no feature had a usable prior. This cutoff is exploratory until it is locked in data/preregistration/preregistration_lock.json. Both diagnostics are sketched in code after this list.

  • volume_flag (volume_stable, volume_growth, volume_ramp, volume_decline, volume_unknown): uses the ratio n_post / n_pre from the first usable row in the window-scoped test list. A ratio above 5× is flagged as volume_ramp, because a large baseline-to-window increase in article count is a common confound that can produce stylometric shifts not specific to LLM assistance. That threshold is exploratory and must be pre-registered before any confirmatory claim.
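
Both diagnostics reduce to a few lines. In this sketch the priors mapping stands in for AI_TYPICAL_DIRECTION and is assumed to map feature_name to "up", "down", or None; the sign convention for Cohen's d and every cutoff other than the stated 5× ramp threshold are illustrative assumptions.

    def direction_label(tests, priors):
        # Collapse to one test per feature_name, keeping the largest |Cohen's d|.
        best = {}
        for t in tests:
            name = t["feature_name"]
            if name not in best or abs(t["cohens_d"]) > abs(best[name]["cohens_d"]):
                best[name] = t
        scored = [t for t in best.values() if priors.get(t["feature_name"])]
        if not scored:
            return "direction_na"
        matches = sum(
            1 for t in scored
            # Assumes positive d means the feature rose in the window.
            if ("up" if t["cohens_d"] > 0 else "down") == priors[t["feature_name"]]
        )
        if matches == 0:
            return "direction_non_ai"
        return "direction_ai" if matches / len(scored) >= 0.5 else "direction_mixed"

    def volume_flag(tests, ramp_ratio=5.0):
        for t in tests:
            n_pre, n_post = t.get("n_pre"), t.get("n_post")
            if n_pre and n_post:                  # first usable row wins
                ratio = n_post / n_pre
                if ratio > ramp_ratio:
                    return "volume_ramp"          # stated 5x threshold
                if ratio > 1.5:                   # assumed growth cutoff
                    return "volume_growth"
                if ratio < 0.67:                  # assumed decline cutoff
                    return "volume_decline"
                return "volume_stable"
        return "volume_unknown"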

Apr 27 2026 qualitative contrast (illustrative, not a separate statistical test): for the designated target window discussed in the review, effects on prior-scored features pointed in the AI-typical direction, alongside a declining article-volume ratio (~0.44×). Several comparison authors graded MODERATE on strength alone showed opposing directions on most prior-scored features, together with large volume ramps (on the order of 12×–276×). The strength tier alone does not distinguish those patterns; these columns make the contrast visible.

Read §11.1 caveats on correlation, multiple testing, and confounding before interpreting any row. Nothing in this subsection overrides the exploratory status of the diagnostics or the pre-registration gate described above.

11.4 Limitations and scope

  • Outcomes depend on the corpus snapshot, exclusion rules, and the analysis thresholds recorded in configuration and, when used, the pre-registration lock.
  • The language-model probability layer contributes only when probability features are present in the feature store; otherwise the composite score reflects stylometry, embedding drift, and joint convergence.
  • Natural-author controls and pooling rules follow the methodology described in project documentation.

11.5 Recommendations

  • Lock the direction-priors registry and the 5× volume-ramp threshold in pre-registration before any external publication. The Phase 17 diagnostic columns (direction, volume_flag) in §11.3 are exploratory until the priors and thresholds are committed in data/preregistration/preregistration_lock.json.

11.6 Further documentation

  • Pre-registration and threshold history: docs/pre_registration.md and data/preregistration/.
  • Pipeline design and data contracts: docs/ARCHITECTURE.md.
  • Operator commands and environment notes: docs/RUNBOOK.md.
  • Custody record: data/analysis/corpus_custody.json.

11.7 Provenance

This cell prints the git SHA and the deterministic hash of settings.analysis so the report is auditable downstream.
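
A minimal sketch of what that cell computes, assuming settings.analysis is a plain mapping; the canonical cell in the rendered report is authoritative.

    import hashlib
    import json
    import subprocess
    from datetime import datetime, timezone

    def provenance(analysis_settings):
        git_sha = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        # Serialize with sorted keys so the hash is deterministic across runs.
        config_hash = hashlib.sha256(
            json.dumps(analysis_settings, sort_keys=True).encode("utf-8")
        ).hexdigest()
        return {
            "git_sha": git_sha,
            "analysis_config_sha256": config_hash,
            "rendered_at": datetime.now(timezone.utc).isoformat(),
        }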