7 Change-point detection
Question addressed: Where and when statistically gated shifts appear in the tracked writing features.
Inputs: data/analysis/*_result.json (PELT and BOCPD detections) and data/analysis/*_changepoints.json where present.
Outputs: per-author summary tables, timelines for selected authors, and heatmaps of change-point counts by feature.
Methods (summary):
- PELT with an l2 cost on z-scored features for numerical stability.
- BOCPD with MAP reset so run-length mass does not accumulate across unrelated segments.
- Per-family Benjamini–Hochberg adjustment within each of six feature families: lexical_richness, readability, sentence_structure, entropy, self_similarity, ai_markers.
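To make the PELT item concrete, here is a minimal pure-Python sketch of PELT with an l2 (squared-error) cost on a z-scored series. It is an illustration under stated assumptions, not the analysis stage's code: the function names `zscore` and `pelt_l2`, the penalty value, and the toy series are all hypothetical, and a real pipeline may well rely on a library implementation and a minimum segment length instead.

```python
import math


def zscore(xs):
    """Standardise a series to mean 0 and unit (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std == 0:
        return [0.0] * n  # constant series: nothing to scale
    return [(x - mean) / std for x in xs]


def pelt_l2(series, penalty):
    """Return change-point indices (segment start positions) for `series`.

    Hypothetical helper: minimises sum of per-segment squared error plus
    `penalty` per change point, using PELT pruning of candidate starts.
    """
    n = len(series)
    # Prefix sums give O(1) segment costs.
    s1, s2 = [0.0], [0.0]
    for x in series:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        # Squared deviation of series[a:b] from its own mean.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: optimal penalised cost of series[:t]
    best[0] = -penalty       # so the first segment carries no penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment in that optimum
    candidates = [0]
    for t in range(1, n + 1):
        scores = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(scores)
        # PELT pruning: drop starts that can never be optimal again.
        candidates = [s for score, s in scores if score - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts.
    change_points = []
    t = n
    while last[t] > 0:
        t = last[t]
        change_points.append(t)
    return sorted(change_points)


# Toy series with one level shift at index 20 (made-up data).
toy = zscore([0.0] * 20 + [5.0] * 20)
print(pelt_l2(toy, penalty=3.0))  # [20]
```

The penalty controls how readily new segments are opened; the value used here is arbitrary and would need tuning against real feature series.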
Provenance: auto-populated by the first code cell.
7.0.1 Methodology notes
- PELT is a batch change-point detector. Here it uses an l2 cost on z-scored feature series for numerical stability.
- BOCPD is a Bayesian online detector; the implementation MAP-resets run-length mass at each detected change so probability does not accumulate across unrelated segments.
- Per-family BH (Benjamini–Hochberg). Raw features roll up to six families (lexical_richness, readability, sentence_structure, entropy, self_similarity, ai_markers). Adjustment runs inside each family so correlated features share one rejection budget.
- The figures above read the same *_result.json bundles written by the analysis stage; re-running analysis and re-rendering refreshes them.
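The per-family BH note can be illustrated with a short sketch. It assumes each detection comes with a raw p-value and that a feature-to-family mapping is available; the feature names, p-values, the `alpha` level, and the helpers `bh_adjust` and `bh_by_family` are hypothetical and are not taken from the analysis code.

```python
from collections import defaultdict


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bh_by_family(pvalues, family_of, alpha=0.05):
    """Apply BH separately inside each family.

    pvalues:   {feature: raw p-value}
    family_of: {feature: family name}
    Returns    {feature: (adjusted p-value, rejected at alpha?)}
    """
    groups = defaultdict(list)
    for feature in pvalues:
        groups[family_of[feature]].append(feature)

    result = {}
    for features in groups.values():
        adjusted = bh_adjust([pvalues[f] for f in features])
        for feature, q in zip(features, adjusted):
            result[feature] = (q, q <= alpha)
    return result


# Hypothetical features and p-values, for illustration only.
family_of = {
    "type_token_ratio": "lexical_richness",
    "hapax_ratio": "lexical_richness",
    "mean_word_length": "lexical_richness",
    "reading_ease": "readability",
}
raw = {
    "type_token_ratio": 0.01,
    "hapax_ratio": 0.03,
    "mean_word_length": 0.20,
    "reading_ease": 0.04,
}
for feature, (q, rejected) in bh_by_family(raw, family_of).items():
    print(f"{feature}: adjusted={q:.3f} rejected={rejected}")
```

In this toy input the readability p-value is adjusted only against its own family (size 1), while the three lexical_richness features are corrected against each other, which is the sense in which each family has its own rejection budget.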