feat: export and replay effective config locks (#20405)

## Why

For reproducibility.

A hand-written `config.toml` is not enough to recreate what a Codex session actually ran with because layered config, CLI overrides, defaults, feature aliases, resolved feature config, prompt setup, and model-catalog/session values can all affect the final runtime behavior.

This PR adds an effective config lockfile path: one run can export the resolved session config, and a later run can replay that lockfile and fail early if the regenerated effective config drifts.

## What Changed

- Add a dedicated `ConfigLockfileToml` wrapper with top-level lockfile metadata plus the replayable config:

  ```toml
  version = 1
  codex_version = "..."

  [config]
  # effective ConfigToml fields
  ```

- Keep lockfile metadata out of regular `ConfigToml`; replay loads `ConfigLockfileToml` and then uses its nested `config` as the authoritative config layer.
- Add `debug.config_lockfile.export_dir` to write `<thread_id>.config.lock.toml` when a root session starts.
- Add `debug.config_lockfile.load_path` to replay a saved lockfile and validate the regenerated session lockfile against it.
- Add `debug.config_lockfile.allow_codex_version_mismatch` to optionally tolerate Codex binary version drift while still comparing the rest of the lockfile.
- Add `debug.config_lockfile.save_fields_resolved_from_model_catalog` so lock creation can either save model-catalog/session-resolved fields or intentionally leave those fields dynamic.
- Build lockfiles from the effective config plus resolved runtime values such as model selection, reasoning settings, prompts, service tier, web search mode, feature states/config, memories config, skill instructions, and agent limits.
- Materialize feature aliases and custom feature config into the lockfile so replay compares canonical resolved behavior instead of user-authored alias shape.
- Strip profile/debug/file-include/environment-specific inputs from generated lockfiles so they contain replayable values rather than the inputs that produced those values.
- Surface JSON-RPC server error code/data in app-server client and TUI bootstrap errors so config-lock replay failures include the actual TOML diff.
- Regenerate the config schema for the new debug config keys.

## Review Notes

The main flow is split across these files:

- `config/src/config_toml.rs`: lockfile/debug TOML shapes.
- `core/src/config/mod.rs`: loading `debug.config_lockfile.*`, replaying a lockfile as a config layer, and preserving the expected lockfile for validation.
- `core/src/session/config_lock.rs`: exporting the current session lockfile and materializing resolved session/config values.
- `core/src/config_lock.rs`: lockfile parsing, metadata/version checks, replay comparison, and diff formatting.

## Usage

Export a lockfile from a normal session:

```sh
codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"'
```

Export a lockfile without saving model-catalog/session-resolved fields:

```sh
codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"' \
  -c 'debug.config_lockfile.save_fields_resolved_from_model_catalog=false'
```

Replay a saved lockfile in a later session:

```sh
codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"'
```

If replay resolves to a different effective config, startup fails with a TOML diff.

To tolerate Codex binary version drift during replay:

```sh
codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"' \
  -c 'debug.config_lockfile.allow_codex_version_mismatch=true'
```

## Limitations

This does not support custom rules/network policies.

## Verification

- `cargo test -p codex-core config_lock`
- `cargo test -p codex-config`
- `cargo test -p codex-thread-manager-sample`
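
For reviewers who want a mental model of the replay check before opening `core/src/config_lock.rs`, here is a minimal, std-only sketch of the behavior described above: compare lockfile metadata, optionally tolerate a Codex version mismatch, then diff the effective config. It is not the code in this PR. The `Lockfile` struct, the flattened `BTreeMap<String, String>` config, and the `diff_config` / `validate_replay` names are hypothetical simplifications; the real implementation works on `ConfigLockfileToml` and produces a TOML diff.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a config lockfile.
/// The real wrapper in this PR is `ConfigLockfileToml`.
#[derive(Debug, Clone, PartialEq)]
pub struct Lockfile {
    pub version: u32,
    pub codex_version: String,
    /// Assumption: effective config flattened to dotted key -> rendered value.
    pub config: BTreeMap<String, String>,
}

/// Produce unified-diff-style lines for every key whose value differs.
pub fn diff_config(
    expected: &BTreeMap<String, String>,
    actual: &BTreeMap<String, String>,
) -> Vec<String> {
    let mut lines = Vec::new();
    // Keys present in the saved lockfile: either changed or missing on replay.
    for (key, want) in expected {
        match actual.get(key) {
            Some(got) if got == want => {}
            Some(got) => {
                lines.push(format!("-{} = {}", key, want));
                lines.push(format!("+{} = {}", key, got));
            }
            None => lines.push(format!("-{} = {}", key, want)),
        }
    }
    // Keys that only appear in the regenerated config.
    for (key, got) in actual {
        if !expected.contains_key(key) {
            lines.push(format!("+{} = {}", key, got));
        }
    }
    lines
}

/// Fail early if the regenerated lockfile drifts from the saved one.
pub fn validate_replay(
    expected: &Lockfile,
    actual: &Lockfile,
    allow_codex_version_mismatch: bool,
) -> Result<(), String> {
    if expected.version != actual.version {
        return Err(format!(
            "lockfile version mismatch: expected {}, got {}",
            expected.version, actual.version
        ));
    }
    // Version drift is only an error when the caller has not opted out.
    if !allow_codex_version_mismatch && expected.codex_version != actual.codex_version {
        return Err(format!(
            "codex version mismatch: expected {}, got {}",
            expected.codex_version, actual.codex_version
        ));
    }
    // The rest of the lockfile is always compared.
    let diff = diff_config(&expected.config, &actual.config);
    if diff.is_empty() {
        Ok(())
    } else {
        Err(format!("effective config drifted:\n{}", diff.join("\n")))
    }
}
```

In this sketch, `validate_replay(&saved, &regenerated, true)` skips only the `codex_version` comparison; a changed config value still yields an error containing `-key = old` and `+key = new` lines, mirroring the "tolerate version drift while still comparing the rest of the lockfile" behavior.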
2026-06-01 19:02:59 +00:00 · 2026-05-01 17:46:02 +02:00
parent ff27d01676
commit 0b04d1b3cc
17 changed files with 977 additions and 16 deletions
--- a/codex-rs/features/src/tests.rs
+++ b/codex-rs/features/src/tests.rs
@@ -490,6 +490,54 @@ usage_hint_enabled = false
    );
 }

+#[test]
+fn materialize_resolved_enabled_writes_all_features_and_preserves_custom_config() {
+    let mut features = Features::with_defaults();
+    features.enable(Feature::CodeMode);
+    features.enable(Feature::MultiAgentV2);
+    features.disable(Feature::ToolSearch);
+
+    let mut features_toml = FeaturesToml {
+        multi_agent_v2: Some(FeatureToml::Config(crate::MultiAgentV2ConfigToml {
+            enabled: Some(false),
+            min_wait_timeout_ms: Some(2500),
+            ..Default::default()
+        })),
+        entries: BTreeMap::from([("include_apply_patch_tool".to_string(), true)]),
+        ..Default::default()
+    };
+
+    features_toml.materialize_resolved_enabled(&features);
+
+    let entries = features_toml.entries();
+    assert_eq!(entries.get("include_apply_patch_tool"), None);
+    for spec in crate::FEATURES {
+        assert_eq!(
+            entries.get(spec.key),
+            Some(&features.enabled(spec.id)),
+            "{}",
+            spec.key
+        );
+    }
+    assert_eq!(
+        features_toml.multi_agent_v2,
+        Some(FeatureToml::Config(crate::MultiAgentV2ConfigToml {
+            enabled: Some(true),
+            min_wait_timeout_ms: Some(2500),
+            ..Default::default()
+        }))
+    );
+    let replayed = Features::from_sources(
+        FeatureConfigSource {
+            features: Some(&features_toml),
+            ..Default::default()
+        },
+        FeatureConfigSource::default(),
+        FeatureOverrides::default(),
+    );
+    assert_eq!(replayed.enabled(Feature::ApplyPatchFreeform), false);
+}
+
 #[test]
 fn unstable_warning_event_only_mentions_enabled_under_development_features() {
    let mut configured_features = Table::new();