7  Change-point detection

Question addressed: Where and when statistically gated shifts appear in the tracked writing features.

Inputs: data/analysis/*_result.json (PELT and BOCPD detections) and data/analysis/*_changepoints.json where present.

Outputs: per-author summary tables, timelines for selected authors, and heatmaps of change-point counts by feature.

Methods (summary):

  • PELT with an l2 cost on z-scored features for numerical stability.
  • BOCPD with MAP reset so run-length mass does not accumulate across unrelated segments.
  • Per-family Benjamini–Hochberg adjustment within each of six feature families: lexical_richness, readability, sentence_structure, entropy, self_similarity, ai_markers.

Provenance: auto-populated by the first code cell.
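
As an illustration of the PELT step listed under Methods, the sketch below z-scores one feature series and runs a plain-Python PELT search with an l2 (sum of squared deviations) segment cost. It is a minimal sketch, not the pipeline's implementation: the function names zscore and pelt_l2, the penalty value, and the toy series are all assumptions made for this example.

```python
from statistics import mean, pstdev


def zscore(series):
    """Center and scale a series; a constant series maps to all zeros."""
    mu = mean(series)
    sd = pstdev(series)
    if sd == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sd for x in series]


def pelt_l2(series, penalty):
    """Return 0-based indices at which a new segment starts.

    Hypothetical helper: minimizes the sum of per-segment l2 costs plus
    `penalty` per segment, using the PELT pruning rule with K = 0.
    """
    n = len(series)
    # Prefix sums make the l2 cost of any segment an O(1) lookup.
    s1, s2 = [0.0], [0.0]
    for x in series:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        # Sum of squared deviations from the mean of series[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: optimal penalized cost of series[:t]
    best[0] = -penalty       # so the first segment is charged exactly once
    prev = [0] * (n + 1)     # prev[t]: start of the last segment ending at t
    candidates = [0]
    for t in range(1, n + 1):
        totals = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], prev[t] = min(totals)
        # Pruning: drop starts that can never be optimal for later t.
        candidates = [s for total, s in totals if total - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts.
    change_points = []
    t = n
    while prev[t] > 0:
        change_points.append(prev[t])
        t = prev[t]
    return sorted(change_points)


if __name__ == "__main__":
    # Toy series with one level shift at index 20 (illustrative data only).
    toy = [0.0] * 20 + [5.0] * 20
    print(pelt_l2(zscore(toy), penalty=3.0))  # -> [20]
```

Because the series is z-scored first, one penalty value means roughly the same thing for every feature, whatever its original units. The penalty used here is arbitrary; the value used by the analysis stage is not stated in this section.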

7.0.1 Methodology notes

  • PELT is a batch change-point detector. Here it uses an l2 cost on z-scored feature series for numerical stability.
  • BOCPD is a Bayesian online detector; the implementation MAP-resets run-length mass at each detected change so probability does not accumulate across unrelated segments.
  • Per-family BH (Benjamini–Hochberg). Raw features roll up to six families (lexical_richness, readability, sentence_structure, entropy, self_similarity, ai_markers). Adjustment runs inside each family so correlated features share one rejection budget.
  • The figures in this section read the same *_result.json bundles written by the analysis stage; re-running the analysis and re-rendering refreshes them.
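
The per-family Benjamini–Hochberg adjustment described in the notes above can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes one p-value per feature, the feature names (ttr, mtld, flesch) are hypothetical placeholders, and the helpers bh_adjust and bh_by_family and the alpha level are invented for illustration. Only the family names come from this section.

```python
from collections import defaultdict


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bh_by_family(pvalues, family_of, alpha=0.05):
    """Adjust p-values separately inside each feature family.

    pvalues:   {feature_name: raw p-value}       (assumed input shape)
    family_of: {feature_name: family_name}       (assumed input shape)
    Returns    {feature_name: (adjusted p-value, rejected at alpha)}.
    """
    groups = defaultdict(list)
    for feature in pvalues:
        groups[family_of[feature]].append(feature)

    result = {}
    for features in groups.values():
        adjusted = bh_adjust([pvalues[f] for f in features])
        for feature, q in zip(features, adjusted):
            result[feature] = (q, q <= alpha)
    return result


if __name__ == "__main__":
    # Hypothetical features and p-values, for illustration only.
    raw = {"ttr": 0.01, "mtld": 0.04, "flesch": 0.03}
    families = {
        "ttr": "lexical_richness",
        "mtld": "lexical_richness",
        "flesch": "readability",
    }
    for name, (q, rejected) in bh_by_family(raw, families).items():
        print(name, round(q, 4), rejected)
```

Running the adjustment inside each family means the number of tests m in the correction is the family size, not the total feature count, so a large family does not tighten the threshold for a small one.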