8 Embedding drift
Question addressed: Whether each author’s embedding-space summaries and AI-baseline distance series show persistent shifts over time, as captured by the embedding-drift score and related convergence flags.
What this chapter shows: Using the stored drift bundles and convergence windows under data/analysis/, the chapter summarizes how strongly the embedding-drift channel fires and how often it appears without simultaneous stylometric confirmation (drift_only versus ab in passes_via), and it plots velocity summaries. Rankings and counts are descriptive outputs from the artifacts present at render time.
Inputs: per-author drift JSON, baseline curve JSON, centroid archives, and the convergence_windows entries in each *_result.json.
Outputs: summary tables and figures inline with the narrative.
Provenance: filled by the first code cell.
8.1 Methodology — embedding drift score
The embedding-drift channel compares monthly centroid motion, variance trends, and distance to reference centroids (including an AI-style baseline). In the shipped configuration:
- AI-baseline distance is evaluated in percentile form per author so the score reflects each author’s own distribution rather than a single absolute cutoff.
- Drift declaration uses the pipeline_b_score threshold recorded in analysis settings (currently 0.3 in the reference configuration). drift_only in passes_via allows a window to register on embedding drift alone when the stylometric ratio test does not also pass; ab marks windows where both channels register.
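The two rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation. The function names, the percentile convention (share of the author's own history at or below the value), the boolean stylometric input, and the choice to treat a score exactly at the threshold as drift-positive are all hypothetical. Only the 0.3 threshold and the drift_only / ab labels come from the configuration described in this chapter.

```python
from bisect import bisect_right
from typing import Optional, Sequence

DRIFT_THRESHOLD = 0.3  # value quoted for the reference configuration


def percentile_rank(history: Sequence[float], value: float) -> float:
    """Share of an author's own distance history that is <= value, in [0, 1].

    Assumption: the per-author percentile is computed against that author's
    own observed distances; the real convention may differ.
    """
    if not history:
        raise ValueError("history must not be empty")
    ordered = sorted(history)
    return bisect_right(ordered, value) / len(ordered)


def classify_window(
    pipeline_b_score: float,
    stylometric_pass: bool,
    threshold: float = DRIFT_THRESHOLD,
) -> Optional[str]:
    """Return 'ab', 'drift_only', or None when the drift channel does not fire.

    Assumption: a score at or above the threshold counts as drift-positive.
    Any other passes_via labels the pipeline may use are not modeled here.
    """
    if pipeline_b_score < threshold:
        return None
    return "ab" if stylometric_pass else "drift_only"
```

The percentile form means the same absolute distance can rank differently for two authors: a value of 0.3 sits at the 75th percentile of the history [0.1, 0.2, 0.3, 0.4] but would rank much lower for an author whose distances are usually larger.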
Embedding vectors must be listed in data/embeddings/manifest.jsonl for drift summaries to load. If drift cache files are missing while embeddings exist, the analysis layer logs a warning when drift summaries are read.
8.4 Distribution of pipeline_b_score across persisted windows
Histogram of pipeline_b_score for every persisted convergence window across the study authors, overlaid by author. The configured drift threshold (0.3 in the reference settings) is drawn as a vertical reference; windows to the right qualify as drift-positive. The right-hand tail shows mass that can register through the drift_only path in passes_via as well as through joint ab windows.
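As a numeric companion to the figure, the mass to the right of the threshold line can be counted per author. The sketch below is illustrative only: the function name and the input shape (a mapping from author slug to a list of scores) are assumptions, and a score exactly at the threshold is treated as drift-positive, which may not match the pipeline's comparison.

```python
from typing import Dict, Iterable, Mapping, Tuple

DRIFT_THRESHOLD = 0.3  # value quoted for the reference settings


def drift_positive_counts(
    scores_by_author: Mapping[str, Iterable[float]],
    threshold: float = DRIFT_THRESHOLD,
) -> Dict[str, Tuple[int, int]]:
    """Map author slug -> (windows at or above threshold, total windows)."""
    counts: Dict[str, Tuple[int, int]] = {}
    for slug, scores in scores_by_author.items():
        values = list(scores)
        above = sum(1 for score in values if score >= threshold)
        counts[slug] = (above, len(values))
    return counts
```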
8.5 Diagnostic block — drift artifact presence
When drift cache files are missing but embeddings exist, the drift loader emits a warning of the form:
drift summary: missing artifact <label> for slug=<slug> but embeddings exist on disk
This cell checks artifact paths directly so missing files are visible even if the analysis log was not reviewed. A complete run should report 0 missing artifacts.
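A direct path check of this kind might look like the sketch below. The file-name patterns, the artifact labels, and the function name are hypothetical placeholders, since the actual artifact names are not listed in this chapter. Only the idea of checking each author's files on disk and reporting the missing ones is taken from the text.

```python
from pathlib import Path
from typing import Iterable, List, Mapping

# Hypothetical file-name patterns; substitute the names the pipeline writes.
ARTIFACT_PATTERNS = {
    "drift": "{slug}_drift.json",
    "baseline_curve": "{slug}_baseline_curve.json",
    "centroids": "{slug}_centroids.npz",
}


def find_missing_artifacts(
    analysis_dir: Path,
    slugs: Iterable[str],
    patterns: Mapping[str, str] = ARTIFACT_PATTERNS,
) -> List[str]:
    """Return one message per expected artifact file that is not on disk."""
    missing: List[str] = []
    for slug in slugs:
        for label, pattern in patterns.items():
            path = analysis_dir / pattern.format(slug=slug)
            if not path.is_file():
                missing.append(f"{label} for slug={slug}: {path}")
    return missing
```

An empty return value corresponds to the "0 missing artifacts" outcome expected from a complete run.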
8.6 Summary
In the current artifact set, embedding-drift windows appear for all 12 named study authors (12/12). Drift-only persisted windows total 8,042 across the panel. Among named authors, pb_max ranges from 0.520 (zachary-leeman) to 0.598 (david-gilmour). tommy-christopher has the largest count of drift-only windows (1,070). michael-luciano shows the largest drift-only volume with no accompanying ab windows (958 drift-only / 0 ab). colby-hall records 170 ab windows, where both channels register, the highest such count in this cohort.
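Counts of this kind could be tallied from the result files roughly as sketched below. The schema is assumed, not confirmed: the sketch supposes that convergence_windows is a top-level list of objects, that each object carries passes_via (as a single label or a list of labels) and pipeline_b_score, that the author slug is the file-name prefix before _result.json, and that pb_max is the maximum pipeline_b_score over an author's windows. The function name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, List

RESULT_SUFFIX = "_result.json"


def _labels(value: Any) -> List[str]:
    """Normalize passes_via to a list; accepts a string or a list of strings."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return []


def summarize_results(analysis_dir: Path) -> Dict[str, Dict[str, Any]]:
    """Per-author drift_only count, ab count, and maximum pipeline_b_score."""
    summary: Dict[str, Dict[str, Any]] = {}
    for path in sorted(analysis_dir.glob("*" + RESULT_SUFFIX)):
        slug = path.name[: -len(RESULT_SUFFIX)]
        data = json.loads(path.read_text(encoding="utf-8"))
        windows = data.get("convergence_windows", [])
        via: Counter = Counter()
        for window in windows:
            via.update(_labels(window.get("passes_via")))
        scores = [w["pipeline_b_score"] for w in windows if "pipeline_b_score" in w]
        summary[slug] = {
            "drift_only": via.get("drift_only", 0),
            "ab": via.get("ab", 0),
            "pb_max": max(scores) if scores else None,
        }
    return summary
```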
Together with the feature and hypothesis-test chapters, these results describe an embedding-space channel that can move independently of the stylometric ratio test.