ADR-016: Nested `AnalysisConfig` with flat TOML and stable analysis hash
Status
Accepted, 2026-04-27
Context
`AnalysisConfig` had grown into a single flat `BaseModel` with 40+ fields, mixing
changepoint, BOCPD, convergence, LDA, hypothesis-testing, and embedding knobs.
Operators and existing `config.toml` files already use a flat `[analysis]` table.
Preregistration locks, the `config_hash` stored in `data/analysis/*_result.json`,
and the output of `compute_model_config_hash(settings.analysis)` must stay
bit-for-bit stable for the same effective settings, unless we intentionally
version and document a bump.
Decision
- Decompose `AnalysisConfig` into nested Pydantic sub-models: `PeltConfig`, `BocpdConfig`, `ConvergenceConfig`, `ContentLdaConfig`, `HypothesisConfig`, `EmbeddingStackConfig`, plus top-level operational fields (`rolling_windows`, `max_workers`, etc.).
- Flat TOML compatibility: `AnalysisConfig` uses `model_validator(mode="before")` (`_lift_flat_analysis_dict`) to move legacy flat `[analysis]` keys into the appropriate sub-dict before validation, so existing `config.toml` files load unchanged.
- Stable analysis hash: `compute_model_config_hash` special-cases `AnalysisConfig`. It walks the nested models and builds the same flat JSON object (leaf keys identical to the pre-refactor field names) before calling `json.dumps(..., sort_keys=True)`. Preregistration snapshot keys are updated to read through the nested attributes but emit the same JSON keys as before.
- Environment variables: nested settings use the usual `FORENSICS_<SECTION>__<SUB>__<FIELD>` pattern, e.g. `FORENSICS_ANALYSIS__CONVERGENCE__CONVERGENCE_USE_PERMUTATION` instead of the former flat `FORENSICS_ANALYSIS__CONVERGENCE_USE_PERMUTATION`.
- Tests: `tests/unit/test_config_hash.py` pins the leaf hash field set via `analysis_config_hash_field_names()` and a golden 16-character hash for the default `AnalysisConfig()`.
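The flat-key lift and the hash flattening are meant to be inverses over leaf keys, which is what keeps the digest stable across the refactor. Below is a minimal stdlib sketch that works on a plain-dict view of the config. It rests on several assumptions. `FLAT_TO_GROUP` is a hypothetical three-entry subset of the real `_FLAT_TO_GROUP`. The field values are made up. The digest is assumed to be SHA-256 truncated to 16 hex characters, and the real `compute_model_config_hash` may use a different function. The names `lift_flat_analysis_dict`, `flat_hash_payload`, and `analysis_hash` are illustrative and are not the project's API.

```python
import hashlib
import json
from typing import Any

# HYPOTHETICAL subset of the flat-key -> sub-model mapping; the real
# _FLAT_TO_GROUP in compat_analysis.py covers every legacy [analysis] key.
FLAT_TO_GROUP: dict[str, str] = {
    "significance_threshold": "hypothesis",
    "pipeline_b_mode": "hypothesis",
    "convergence_use_permutation": "convergence",
}
GROUPS = set(FLAT_TO_GROUP.values())


def lift_flat_analysis_dict(data: dict[str, Any]) -> dict[str, Any]:
    """Move legacy flat keys into their sub-dict (the job of a mode="before" validator)."""
    lifted: dict[str, Any] = {}
    for key, value in data.items():
        group = FLAT_TO_GROUP.get(key)
        if group is not None:
            # Legacy flat key: file it under its sub-model.
            lifted.setdefault(group, {})[key] = value
        elif key in GROUPS and isinstance(value, dict):
            # Already-nested section: merge so flat and nested input can coexist.
            lifted.setdefault(key, {}).update(value)
        else:
            # Top-level operational field (e.g. max_workers) passes through.
            lifted[key] = value
    return lifted


def flat_hash_payload(nested: dict[str, Any]) -> dict[str, Any]:
    """Collapse sub-model dicts back to the pre-refactor leaf keys."""
    payload: dict[str, Any] = {}
    for key, value in nested.items():
        if key in GROUPS and isinstance(value, dict):
            payload.update(value)
        else:
            payload[key] = value
    return payload


def analysis_hash(nested: dict[str, Any]) -> str:
    """Digest of the flat payload. ASSUMPTION: SHA-256 truncated to 16 hex chars."""
    blob = json.dumps(flat_hash_payload(nested), sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]


# Hypothetical values, for illustration only.
legacy = {
    "significance_threshold": 0.05,
    "pipeline_b_mode": "percentile",
    "convergence_use_permutation": True,
    "max_workers": 4,
}
nested = lift_flat_analysis_dict(legacy)
flat_digest = hashlib.sha256(json.dumps(legacy, sort_keys=True).encode("utf-8")).hexdigest()[:16]
assert analysis_hash(nested) == flat_digest  # same digest for both shapes
```

The nested dict flattens back to exactly the legacy mapping. As a result, both shapes serialize to the same sorted JSON and therefore produce the same digest. In the real model the lift runs inside `model_validator(mode="before")`, so field validation only sees the nested shape.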
Amendments
- 2026-04-27: `HypothesisConfig.pipeline_b_mode` now defaults to `percentile` (was `legacy`). A minimal `config.toml` without an explicit `[analysis] pipeline_b_mode` now matches the shipped study default; set `pipeline_b_mode = "legacy"` only when reproducing pre-change convergence scoring. The golden digest in `tests/unit/test_config_hash.py` was bumped accordingly.
Consequences
- Call sites use explicit nested paths (`settings.analysis.hypothesis.significance_threshold`).
- `apply_flat_analysis_overrides` supports tests and one-off scripts that still think in flat field names.
- Operators migrating environment-variable overrides must add the sub-model segment to the key.
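Migrating existing overrides is a mechanical rename: insert the sub-model segment between the section and the field. The sketch below relies on a hypothetical `FIELD_TO_SEGMENT` table. Its `CONVERGENCE` entry comes from this ADR's environment-variable example, and its `HYPOTHESIS` entries are inferred from the `settings.analysis.hypothesis` call-site path. The `migrate_env_name` and `migrate_environ` helpers are also hypothetical and are not part of the project.

```python
# HYPOTHETICAL migration helper; not part of the forensics package.
PREFIX = "FORENSICS_ANALYSIS__"

# Hypothetical subset: flat field name -> sub-model segment.
FIELD_TO_SEGMENT: dict[str, str] = {
    "CONVERGENCE_USE_PERMUTATION": "CONVERGENCE",
    "SIGNIFICANCE_THRESHOLD": "HYPOTHESIS",
    "PIPELINE_B_MODE": "HYPOTHESIS",
}


def migrate_env_name(name: str) -> str:
    """Rewrite FORENSICS_ANALYSIS__<FIELD> to FORENSICS_ANALYSIS__<SUB>__<FIELD>."""
    if not name.startswith(PREFIX):
        return name  # not an analysis override
    field = name[len(PREFIX):]
    if "__" in field:
        return name  # already uses the nested form
    segment = FIELD_TO_SEGMENT.get(field)
    if segment is None:
        return name  # top-level operational field such as MAX_WORKERS
    return f"{PREFIX}{segment}__{field}"


def migrate_environ(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with every known flat analysis override renamed."""
    return {migrate_env_name(key): value for key, value in env.items()}


print(migrate_env_name("FORENSICS_ANALYSIS__CONVERGENCE_USE_PERMUTATION"))
# → FORENSICS_ANALYSIS__CONVERGENCE__CONVERGENCE_USE_PERMUTATION
```

Unknown or already-nested names pass through untouched, so a helper like this can be run over a whole environment without disturbing unrelated variables.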
References
- `src/forensics/config/analysis_settings.py`
- `src/forensics/config/compat_analysis.py` (flat TOML lift / `_FLAT_TO_GROUP`)
- `docs/adr/017-analysis-config-change-control.md` (field-growth governance)
- `src/forensics/utils/provenance.py` (`_build_recursive_hash_payload`, `analysis_config_hash_field_names`)
- `tests/unit/test_config_hash.py` (`test_default_analysis_config_model_hash_golden`)