feat: export and replay effective config locks (#20405)

## Why

For reproducibility. A hand-written `config.toml` is not enough to
recreate what a Codex session actually ran with because layered config,
CLI overrides, defaults, feature aliases, resolved feature config,
prompt setup, and model-catalog/session values can all affect the final
runtime behavior.

This PR adds an effective config lockfile path: one run can export the
resolved session config, and a later run can replay that lockfile and
fail early if the regenerated effective config drifts.

## What Changed

- Add a dedicated `ConfigLockfileToml` wrapper with top-level lockfile
metadata plus the replayable config:

  ```toml
  version = 1
  codex_version = "..."

  [config]
  # effective ConfigToml fields
  ```

- Keep lockfile metadata out of regular `ConfigToml`; replay loads
`ConfigLockfileToml` and then uses its nested `config` as the
authoritative config layer.
- Add `debug.config_lockfile.export_dir` to write
`<thread_id>.config.lock.toml` when a root session starts.
- Add `debug.config_lockfile.load_path` to replay a saved lockfile and
validate the regenerated session lockfile against it.
- Add `debug.config_lockfile.allow_codex_version_mismatch` to optionally
tolerate Codex binary version drift while still comparing the rest of
the lockfile.
- Add `debug.config_lockfile.save_fields_resolved_from_model_catalog` so
lock creation can either save model-catalog/session-resolved fields or
intentionally leave those fields dynamic.
- Build lockfiles from the effective config plus resolved runtime values
such as model selection, reasoning settings, prompts, service tier, web
search mode, feature states/config, memories config, skill instructions,
and agent limits.
- Materialize feature aliases and custom feature config into the
lockfile so replay compares canonical resolved behavior instead of
user-authored alias shape.
- Strip profile/debug/file-include/environment-specific inputs from
generated lockfiles so they contain replayable values rather than the
inputs that produced those values.
- Surface JSON-RPC server error code/data in app-server client and TUI
bootstrap errors so config-lock replay failures include the actual TOML
diff.
- Regenerate the config schema for the new debug config keys.

## Review Notes

The main flow is split across these files:

- `config/src/config_toml.rs`: lockfile/debug TOML shapes.
- `core/src/config/mod.rs`: loading `debug.config_lockfile.*`, replaying
a lockfile as a config layer, and preserving the expected lockfile for
validation.
- `core/src/session/config_lock.rs`: exporting the current session
lockfile and materializing resolved session/config values.
- `core/src/config_lock.rs`: lockfile parsing, metadata/version checks,
replay comparison, and diff formatting.

## Usage

Export a lockfile from a normal session:

```sh
codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"'
```

Export a lockfile without saving model-catalog/session-resolved fields:

```sh
codex -c 'debug.config_lockfile.export_dir="/tmp/codex-locks"' \
  -c 'debug.config_lockfile.save_fields_resolved_from_model_catalog=false'
```

Replay a saved lockfile in a later session:

```sh
codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"'
```

If replay resolves to a different effective config, startup fails with a
TOML diff.

To tolerate Codex binary version drift during replay:

```sh
codex -c 'debug.config_lockfile.load_path="/tmp/codex-locks/<thread_id>.config.lock.toml"' \
  -c 'debug.config_lockfile.allow_codex_version_mismatch=true'
```

## Limitations

This does not support custom rules/network policies.

## Verification

- `cargo test -p codex-core config_lock`
- `cargo test -p codex-config`
- `cargo test -p codex-thread-manager-sample`
This commit is contained in:
jif-oai
2026-05-01 17:46:02 +02:00
committed by GitHub
parent ff27d01676
commit 0b04d1b3cc
17 changed files with 977 additions and 16 deletions

View File

@@ -490,6 +490,54 @@ usage_hint_enabled = false
);
}
#[test]
fn materialize_resolved_enabled_writes_all_features_and_preserves_custom_config() {
let mut features = Features::with_defaults();
features.enable(Feature::CodeMode);
features.enable(Feature::MultiAgentV2);
features.disable(Feature::ToolSearch);
let mut features_toml = FeaturesToml {
multi_agent_v2: Some(FeatureToml::Config(crate::MultiAgentV2ConfigToml {
enabled: Some(false),
min_wait_timeout_ms: Some(2500),
..Default::default()
})),
entries: BTreeMap::from([("include_apply_patch_tool".to_string(), true)]),
..Default::default()
};
features_toml.materialize_resolved_enabled(&features);
let entries = features_toml.entries();
assert_eq!(entries.get("include_apply_patch_tool"), None);
for spec in crate::FEATURES {
assert_eq!(
entries.get(spec.key),
Some(&features.enabled(spec.id)),
"{}",
spec.key
);
}
assert_eq!(
features_toml.multi_agent_v2,
Some(FeatureToml::Config(crate::MultiAgentV2ConfigToml {
enabled: Some(true),
min_wait_timeout_ms: Some(2500),
..Default::default()
}))
);
let replayed = Features::from_sources(
FeatureConfigSource {
features: Some(&features_toml),
..Default::default()
},
FeatureConfigSource::default(),
FeatureOverrides::default(),
);
assert_eq!(replayed.enabled(Feature::ApplyPatchFreeform), false);
}
#[test]
fn unstable_warning_event_only_mentions_enabled_under_development_features() {
let mut configured_features = Table::new();