codex/codex-rs/app-server/tests/suite/v2/remote_thread_store.rs
Michael Bolin 889ee018e7 config: add strict config parsing (#20559)
## Why

Codex intentionally ignores unknown `config.toml` fields by default so
older and newer config files keep working across versions. That leniency
also makes typo detection hard because misspelled or misplaced keys
disappear silently.

This change adds an opt-in strict config mode so users and tooling can
fail fast on unrecognized config fields without changing the default
permissive behavior.

This feature is possible because `serde_ignored` exposes the exact
signal Codex needs: it lets Codex run ordinary Serde deserialization
while recording fields Serde would otherwise ignore. That avoids
requiring `#[serde(deny_unknown_fields)]` across every config type and
keeps strict validation opt-in around the existing config model.
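
The idea of "deserialize leniently, but record what was skipped" can be illustrated without any dependencies. The sketch below is not `serde_ignored`'s API and not the code in this change. It is a stdlib-only stand-in in which `Value`, `Schema`, `collect_ignored`, and `example_schema` (including its `profile.name` field) are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a parsed TOML value (hypothetical, not the `toml` crate's type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Bool(bool),
    Table(BTreeMap<String, Value>),
}

/// Hypothetical schema: the keys a table accepts. `Some(schema)` marks a nested
/// table, `None` marks a leaf field.
#[derive(Debug, Default)]
pub struct Schema {
    pub fields: BTreeMap<&'static str, Option<Schema>>,
}

/// Walks `value` and records the dotted path of every key the schema does not
/// know. Nothing is rejected here; the walk only observes what a lenient
/// reader would have skipped.
pub fn collect_ignored(value: &Value, schema: &Schema, prefix: &str, out: &mut Vec<String>) {
    let Value::Table(table) = value else { return };
    for (key, child) in table {
        let path = if prefix.is_empty() {
            key.clone()
        } else {
            format!("{prefix}.{key}")
        };
        match schema.fields.get(key.as_str()) {
            // Unknown key: a permissive loader drops it, strict mode reports it.
            None => out.push(path),
            // Known nested table: keep walking with the nested schema.
            Some(Some(nested)) => collect_ignored(child, nested, &path, out),
            // Known leaf: nothing to record.
            Some(None) => {}
        }
    }
}

/// Illustrative schema only: `model`, `approval_policy`, and a made-up
/// `[profile]` table with a `name` field.
pub fn example_schema() -> Schema {
    let mut profile = Schema::default();
    profile.fields.insert("name", None);
    let mut root = Schema::default();
    root.fields.insert("model", None);
    root.fields.insert("approval_policy", None);
    root.fields.insert("profile", Some(profile));
    root
}
```

In this sketch the caller decides what to do with the collected paths, which mirrors why strict validation can stay opt-in: the permissive path simply never looks at the list.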

## What Changed

### Added strict config validation

- Added `serde_ignored`-based validation for `ConfigToml` in
`codex-rs/config/src/strict_config.rs`.
- Combined `serde_ignored` with `serde_path_to_error` so strict mode
preserves typed config error paths while also collecting fields Serde
would otherwise ignore.
- Added strict-mode validation for unknown `[features]` keys, including
keys that would otherwise be accepted by `FeaturesToml`'s flattened
boolean map.
- Kept typed config errors ahead of ignored-field reporting, so
malformed known fields are reported before unknown-field diagnostics.
- Added source-range diagnostics for top-level and nested unknown config
fields, including non-file managed preference source names.
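
The precedence rule above (typed errors first, unknown fields second) can be sketched as a small combinator. This is an illustration under assumptions, not the code in `strict_config.rs`: `StrictConfigError` and `finish_strict` are hypothetical names, and the typed error is reduced to a `(path, message)` pair.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum StrictConfigError {
    /// A known field had the wrong type or an invalid value.
    Typed { path: String, message: String },
    /// Deserialization succeeded, but these fields were not recognized.
    UnknownFields(Vec<String>),
}

/// Combines the typed deserialization result with the list of ignored fields.
/// A typed error always wins, so malformed known fields are reported before
/// any unknown-field diagnostics.
pub fn finish_strict<T>(
    typed: Result<T, (String, String)>,
    ignored: Vec<String>,
) -> Result<T, StrictConfigError> {
    let value = typed.map_err(|(path, message)| StrictConfigError::Typed { path, message })?;
    if ignored.is_empty() {
        Ok(value)
    } else {
        Err(StrictConfigError::UnknownFields(ignored))
    }
}
```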

### Kept parsing single-pass per source

- Reworked file and managed-config loading so strict validation reuses
the already parsed `TomlValue` for that source.
- For actual config files and managed config strings, the loader now
reads once, parses once, and validates that same parsed value instead of
deserializing multiple times.
- Validated `-c` / `--config` override layers with the same
base-directory context used for normal relative-path resolution, so
unknown override keys are still reported when another override contains
a relative path.
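
A rough sketch of the "read once, parse once, validate the same value" shape follows. It assumes nothing about the real loader: `ParsedSource`, `load_once`, and `consume` are hypothetical, and reading and parsing are injected as closures so the example stays independent of any file or TOML API.

```rust
/// One config source after its single read-and-parse step (hypothetical type).
pub struct ParsedSource<V> {
    pub name: String,
    pub value: V,
}

/// Reads and parses a source exactly once.
pub fn load_once<V, E>(
    name: &str,
    read: impl FnOnce() -> Result<String, E>,
    parse: impl FnOnce(&str) -> Result<V, E>,
) -> Result<ParsedSource<V>, E> {
    let text = read()?;
    let value = parse(&text)?;
    Ok(ParsedSource {
        name: name.to_string(),
        value,
    })
}

/// Builds the normal result and, only when `strict` is set, first runs the
/// validator over the same already-parsed value. No second parse happens.
pub fn consume<V, C>(
    source: &ParsedSource<V>,
    strict: bool,
    build: impl Fn(&V) -> C,
    validate: impl Fn(&V) -> Vec<String>,
) -> Result<C, Vec<String>> {
    if strict {
        let unknown = validate(&source.value);
        if !unknown.is_empty() {
            return Err(unknown);
        }
    }
    Ok(build(&source.value))
}
```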

### Scoped `--strict-config` to config-heavy entry points

- Added support for `--strict-config` on the main config-loading entry
points where it is most useful:
  - `codex`
  - `codex resume`
  - `codex fork`
  - `codex exec`
  - `codex review`
  - `codex mcp-server`
  - `codex app-server` when running the server itself
  - the standalone `codex-app-server` binary
  - the standalone `codex-exec` binary
- Commands outside that set now reject `--strict-config` early with
targeted errors instead of accepting it everywhere through shared CLI
plumbing.
- `codex app-server` subcommands such as `proxy`, `daemon`, and
`generate-*` are intentionally excluded from the first rollout.
- When app-server strict mode sees invalid config, app-server exits with
the config error instead of logging a warning and continuing with
defaults.
- Introduced a dedicated `ReviewCommand` wrapper in `codex-rs/cli`
instead of extending shared `ReviewArgs`, so `--strict-config` stays on
the outer config-loading command surface and does not become part of the
reusable review payload used by `codex exec review`.

### Coverage

- Added tests for top-level and nested unknown config fields, unknown
`[features]` keys, typed-error precedence, source-location reporting,
and non-file managed preference source names.
- Added CLI coverage showing invalid `--enable`, invalid `--disable`,
and unknown `-c` overrides still error when `--strict-config` is
present, including compound-looking feature names such as
`multi_agent_v2.subagent_usage_hint_text`.
- Added integration coverage showing both `codex app-server
--strict-config` and standalone `codex-app-server --strict-config` exit
with an error for unknown config fields instead of starting with
fallback defaults.
- Added coverage showing unsupported command surfaces reject
`--strict-config` with explicit errors.

## Example Usage

Run Codex with strict config validation enabled:

```shell
codex --strict-config
```

Strict config mode is also available on the supported config-heavy
subcommands:

```shell
codex --strict-config exec "explain this repository"
codex review --strict-config --uncommitted
codex mcp-server --strict-config
codex app-server --strict-config --listen off
codex-app-server --strict-config --listen off
```

For example, if `~/.codex/config.toml` contains a typo in a key name:

```toml
model = "gpt-5"
approval_polic = "on-request"
```

then `codex --strict-config` reports the misspelled key instead of
silently ignoring it. The path is shortened to `~` here for readability:

```text
$ codex --strict-config
Error loading config.toml:
~/.codex/config.toml:2:1: unknown configuration field `approval_polic`
  |
2 | approval_polic = "on-request"
  | ^^^^^^^^^^^^^^
```

Without `--strict-config`, Codex keeps the existing permissive behavior
and ignores the unknown key.
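
To make the diagnostic format above concrete, here is a minimal sketch that renders the same `name:line:column` header, gutter, and caret underline. It is a simplification under stated assumptions: `render_unknown_field` is a hypothetical helper that locates a top-level key by scanning the text, whereas the change itself reports source ranges for nested fields too.

```rust
/// Renders an unknown-field diagnostic for the first line whose key is `key`.
/// Returns `None` when no line of the form `key = ...` is found.
pub fn render_unknown_field(source_name: &str, text: &str, key: &str) -> Option<String> {
    for (index, line) in text.lines().enumerate() {
        // Byte length of the leading whitespace; always a char boundary.
        let indent = line.len() - line.trim_start().len();
        let rest = &line[indent..];
        // Only match when `key` is the whole key, i.e. it is followed by `=`.
        let Some(after) = rest.strip_prefix(key) else {
            continue;
        };
        if !after.trim_start().starts_with('=') {
            continue;
        }
        let line_no = index + 1;
        let column = line[..indent].chars().count() + 1;
        // The gutter is as wide as the line number so the bars line up.
        let gutter = " ".repeat(line_no.to_string().len());
        return Some(format!(
            "{source_name}:{line_no}:{column}: unknown configuration field `{key}`\n\
             {gutter} |\n\
             {line_no} | {line}\n\
             {gutter} | {pad}{carets}",
            pad = " ".repeat(column - 1),
            carets = "^".repeat(key.chars().count()),
        ));
    }
    None
}
```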

Strict config mode also validates ad-hoc `-c` / `--config` overrides:

```text
$ codex --strict-config -c foo=bar
Error: unknown configuration field `foo` in -c/--config override

$ codex --strict-config -c features.foo=true
Error: unknown configuration field `features.foo` in -c/--config override
```
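
The override check can be pictured as a lookup of the dotted key against the set of recognized paths. The sketch below is only an approximation: `check_override`, the `known` list, and the "expected key=value" message are hypothetical, while the unknown-field message copies the format shown above.

```rust
/// Validates one raw `-c key=value` override against a list of known dotted keys.
pub fn check_override(raw: &str, known: &[&str]) -> Result<(), String> {
    // Split on the first `=` only, so values may themselves contain `=`.
    let (key, _value) = raw
        .split_once('=')
        .ok_or_else(|| format!("invalid override `{raw}`: expected key=value"))?;
    let key = key.trim();
    if known.contains(&key) {
        Ok(())
    } else {
        Err(format!(
            "unknown configuration field `{key}` in -c/--config override"
        ))
    }
}
```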

Invalid feature toggles are rejected too, including values that look
like nested config paths:

```text
$ codex --strict-config --enable does_not_exist
Error: Unknown feature flag: does_not_exist

$ codex --strict-config --disable does_not_exist
Error: Unknown feature flag: does_not_exist

$ codex --strict-config --enable multi_agent_v2.subagent_usage_hint_text
Error: Unknown feature flag: multi_agent_v2.subagent_usage_hint_text
```
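
One way to see why the compound-looking name fails: the flag value is compared against feature names as a whole, before any `features.<name>` key is built from it. The sketch assumes that `--enable`/`--disable` map onto `features.<name>` overrides, which this description does not state. `KNOWN_FEATURES`, `FeatureOverride`, and `feature_toggle` are hypothetical, and the listed feature names are placeholders.

```rust
/// Hypothetical list of known feature flags; the real set lives in Codex.
pub const KNOWN_FEATURES: &[&str] = &["plugins", "multi_agent_v2"];

/// The override a valid toggle would expand to (assumed shape).
#[derive(Debug, PartialEq, Eq)]
pub struct FeatureOverride {
    pub key: String,
    pub enabled: bool,
}

/// Accepts only exact feature names. A dotted value such as
/// `multi_agent_v2.subagent_usage_hint_text` is not split into a path, so it
/// is rejected even though its prefix is a known feature in this sketch.
pub fn feature_toggle(name: &str, enabled: bool) -> Result<FeatureOverride, String> {
    if !KNOWN_FEATURES.contains(&name) {
        return Err(format!("Unknown feature flag: {name}"));
    }
    Ok(FeatureOverride {
        key: format!("features.{name}"),
        enabled,
    })
}
```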

Unsupported commands reject the flag explicitly:

```text
$ codex --strict-config cloud list
Error: `--strict-config` is not supported for `codex cloud`
```
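
The early rejection can be sketched as an allow-list over command surfaces. The enum and function names below are hypothetical and the real CLI is structured differently. The supported and unsupported sets follow the lists in this description, and the display names for anything other than `codex cloud` are assumptions.

```rust
/// Hypothetical, simplified view of the CLI's command surfaces.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    Interactive,
    Resume,
    Fork,
    Exec,
    Review,
    McpServer,
    AppServer,
    AppServerProxy,
    Cloud,
    Mcp,
    RemoteControl,
}

impl Command {
    fn display_name(self) -> &'static str {
        match self {
            Command::Interactive => "codex",
            Command::Resume => "codex resume",
            Command::Fork => "codex fork",
            Command::Exec => "codex exec",
            Command::Review => "codex review",
            Command::McpServer => "codex mcp-server",
            Command::AppServer => "codex app-server",
            Command::AppServerProxy => "codex app-server proxy",
            Command::Cloud => "codex cloud",
            Command::Mcp => "codex mcp",
            Command::RemoteControl => "codex remote-control",
        }
    }

    /// Entry points that load config and therefore accept the flag.
    fn supports_strict_config(self) -> bool {
        matches!(
            self,
            Command::Interactive
                | Command::Resume
                | Command::Fork
                | Command::Exec
                | Command::Review
                | Command::McpServer
                | Command::AppServer
        )
    }
}

/// Rejects `--strict-config` before any config loading happens.
pub fn check_strict_config(command: Command, strict_config: bool) -> Result<(), String> {
    if strict_config && !command.supports_strict_config() {
        return Err(format!(
            "`--strict-config` is not supported for `{}`",
            command.display_name()
        ));
    }
    Ok(())
}
```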

## Verification

The `codex-cli` `strict_config` tests cover invalid `--enable`, invalid
`--disable`, the compound `multi_agent_v2.subagent_usage_hint_text`
case, unknown `-c` overrides, app-server strict startup failure through
`codex app-server`, and rejection for unsupported commands such as
`codex cloud`, `codex mcp`, `codex remote-control`, and `codex
app-server proxy`.

The config and config-loader tests cover unknown top-level fields,
unknown nested fields, unknown `[features]` keys, source-location
reporting, non-file managed config sources, and `-c` validation for keys
such as `features.foo`.

The app-server test suite covers standalone `codex-app-server
--strict-config` startup failure for an unknown config field.

## Documentation

The Codex CLI docs on developers.openai.com/codex should mention
`--strict-config` as an opt-in validation mode for supported
config-heavy entry points once this ships.
2026-05-13 16:08:05 +00:00


//! Regression coverage for app-server thread operations backed by a non-local
//! `ThreadStore`.
//!
//! The app-server startup path should honor `experimental_thread_store`
//! by routing all thread persistence through the configured store. This suite uses
//! the thread-store crate's test-only in-memory store to exercise the non-local
//! config-driven selection path without touching local rollout or sqlite storage.
//!
//! The important failure mode is accidentally materializing local persistence
//! while a non-local store is configured. After `thread/start` and a simple turn,
//! the temporary `codex_home` must not contain rollout session files or sqlite
//! state files. This does not observe read-only probes that leave no artifact; it
//! is a stop-gap that prevents additional local persistence writes from slipping
//! in unnoticed.
use std::collections::BTreeSet;
use std::path::Path;
use std::sync::Arc;
use anyhow::Result;
use app_test_support::create_mock_responses_server_repeating_assistant;
use codex_app_server::in_process;
use codex_app_server::in_process::InProcessServerEvent;
use codex_app_server::in_process::InProcessStartArgs;
use codex_app_server_protocol::ClientInfo;
use codex_app_server_protocol::ClientRequest;
use codex_app_server_protocol::InitializeParams;
use codex_app_server_protocol::RequestId;
use codex_app_server_protocol::ServerNotification;
use codex_app_server_protocol::ThreadListParams;
use codex_app_server_protocol::ThreadListResponse;
use codex_app_server_protocol::ThreadStartParams;
use codex_app_server_protocol::ThreadStartResponse;
use codex_app_server_protocol::TurnStartParams;
use codex_app_server_protocol::UserInput as V2UserInput;
use codex_arg0::Arg0DispatchPaths;
use codex_config::CloudRequirementsLoader;
use codex_config::LoaderOverrides;
use codex_config::NoopThreadConfigLoader;
use codex_core::config::ConfigBuilder;
use codex_exec_server::EnvironmentManager;
use codex_feedback::CodexFeedback;
use codex_protocol::protocol::SessionSource;
use codex_thread_store::InMemoryThreadStore;
use pretty_assertions::assert_eq;
use tempfile::TempDir;
use tokio::time::timeout;
use uuid::Uuid;
const DEFAULT_READ_TIMEOUT: std::time::Duration = std::time::Duration::from_secs(10);

#[tokio::test]
async fn thread_start_with_non_local_thread_store_does_not_create_local_persistence() -> Result<()>
{
    let server = create_mock_responses_server_repeating_assistant("Done").await;
    let codex_home = TempDir::new()?;
    let store_id = Uuid::new_v4().to_string();
    // Plugin startup warmups may create `.tmp` under codex_home. Disable them
    // here so this regression stays focused on thread persistence artifacts.
    create_config_toml_with_thread_store(codex_home.path(), &server.uri(), &store_id)?;
    let loader_overrides = LoaderOverrides::without_managed_config_for_tests();
    let config = ConfigBuilder::default()
        .codex_home(codex_home.path().to_path_buf())
        .fallback_cwd(Some(codex_home.path().to_path_buf()))
        .loader_overrides(loader_overrides.clone())
        .build()
        .await?;
    let thread_store = InMemoryThreadStore::for_id(store_id.clone());
    let _in_memory_store = InMemoryThreadStoreId { store_id };

    let mut client = in_process::start(InProcessStartArgs {
        arg0_paths: Arg0DispatchPaths::default(),
        config: Arc::new(config),
        cli_overrides: Vec::new(),
        loader_overrides,
        strict_config: false,
        cloud_requirements: CloudRequirementsLoader::default(),
        thread_config_loader: Arc::new(NoopThreadConfigLoader),
        feedback: CodexFeedback::new(),
        log_db: None,
        state_db: None,
        environment_manager: Arc::new(EnvironmentManager::default_for_tests()),
        config_warnings: Vec::new(),
        session_source: SessionSource::Cli,
        enable_codex_api_key_env: false,
        initialize: InitializeParams {
            client_info: ClientInfo {
                name: "codex-app-server-tests".to_string(),
                title: None,
                version: "0.1.0".to_string(),
            },
            capabilities: None,
        },
        channel_capacity: in_process::DEFAULT_IN_PROCESS_CHANNEL_CAPACITY,
    })
    .await?;

    let response = client
        .request(ClientRequest::ThreadStart {
            request_id: RequestId::Integer(1),
            params: ThreadStartParams::default(),
        })
        .await?
        .expect("thread/start should succeed");
    let ThreadStartResponse { thread, .. } =
        serde_json::from_value(response).expect("thread/start response should parse");
    assert_eq!(thread.path, None);

    client
        .request(ClientRequest::TurnStart {
            request_id: RequestId::Integer(2),
            params: TurnStartParams {
                thread_id: thread.id.clone(),
                input: vec![V2UserInput::Text {
                    text: "Hello".to_string(),
                    text_elements: Vec::new(),
                }],
                ..Default::default()
            },
        })
        .await?
        .expect("turn/start should succeed");
    timeout(DEFAULT_READ_TIMEOUT, async {
        loop {
            let Some(event) = client.next_event().await else {
                anyhow::bail!("in-process app-server stopped before turn/completed");
            };
            if let InProcessServerEvent::ServerNotification(ServerNotification::TurnCompleted(
                completed,
            )) = event
                && completed.thread_id == thread.id
            {
                return Ok::<(), anyhow::Error>(());
            }
        }
    })
    .await??;

    let response = client
        .request(ClientRequest::ThreadList {
            request_id: RequestId::Integer(3),
            params: ThreadListParams {
                cursor: None,
                limit: Some(10),
                sort_key: None,
                sort_direction: None,
                model_providers: Some(Vec::new()),
                source_kinds: None,
                archived: None,
                cwd: None,
                use_state_db_only: false,
                search_term: None,
            },
        })
        .await?
        .expect("thread/list should succeed");
    let ThreadListResponse { data, .. } =
        serde_json::from_value(response).expect("thread/list response should parse");
    assert_eq!(data.len(), 1);
    assert_eq!(data[0].id, thread.id);
    assert_eq!(data[0].path, None);

    client.shutdown().await?;

    let calls = thread_store.calls().await;
    assert_eq!(calls.create_thread, 1);
    assert_eq!(calls.list_threads, 1);
    assert!(
        calls.append_items > 0,
        "turn/start should append rollout items through the injected store"
    );
    assert!(
        calls.flush_thread > 0,
        "turn completion should flush through the injected store"
    );
    assert_no_local_persistence_artifacts(codex_home.path())?;
    Ok(())
}

fn assert_no_local_persistence_artifacts(codex_home: &Path) -> Result<()> {
    // These are the observable tripwires for accidental local persistence. If a
    // future code path constructs a local rollout/session store or opens the
    // local thread sqlite database, it should leave one of these artifacts in
    // the isolated test codex_home.
    assert!(
        !codex_home.join("sessions").exists(),
        "non-local thread persistence should not create local rollout sessions"
    );
    assert!(
        !codex_home.join("archived_sessions").exists(),
        "non-local thread persistence should not create archived rollout sessions"
    );
    assert!(
        !codex_state::state_db_path(codex_home).exists(),
        "non-local thread persistence should not create local thread sqlite"
    );

    let sqlite_artifacts = std::fs::read_dir(codex_home)?
        .filter_map(std::result::Result::ok)
        .map(|entry| entry.path())
        .filter(|path| {
            path.file_name()
                .and_then(|name| name.to_str())
                .is_some_and(|name| {
                    name.ends_with(".sqlite")
                        || name.ends_with(".sqlite-shm")
                        || name.ends_with(".sqlite-wal")
                })
        })
        .collect::<Vec<_>>();
    assert!(
        sqlite_artifacts.is_empty(),
        "non-local thread persistence should not create sqlite artifacts: {sqlite_artifacts:?}"
    );

    let mut entries = codex_home_entries(codex_home)?;
    // Bazel test runs may initialize shell snapshot storage under codex_home.
    // That is not thread persistence; keep the assertion focused on rollout,
    // session, sqlite, and other unexpected thread-store artifacts.
    entries.remove("shell_snapshots");
    assert_eq!(
        entries,
        BTreeSet::from([
            "config.toml".to_string(),
            "installation_id".to_string(),
            "memories".to_string(),
            "skills".to_string(),
        ]),
        "non-local thread persistence should not create unexpected files in codex_home"
    );
    Ok(())
}

fn codex_home_entries(codex_home: &Path) -> Result<BTreeSet<String>> {
    Ok(std::fs::read_dir(codex_home)?
        .filter_map(|entry| {
            let entry = entry.ok()?;
            Some(entry.file_name().to_string_lossy().into_owned())
        })
        .collect())
}

struct InMemoryThreadStoreId {
    store_id: String,
}

impl Drop for InMemoryThreadStoreId {
    fn drop(&mut self) {
        InMemoryThreadStore::remove_id(&self.store_id);
    }
}

fn create_config_toml_with_thread_store(
    codex_home: &Path,
    server_uri: &str,
    store_id: &str,
) -> std::io::Result<()> {
    std::fs::write(
        codex_home.join("config.toml"),
        format!(
            r#"
model = "mock-model"
approval_policy = "never"
sandbox_mode = "read-only"
experimental_thread_store = {{ type = "in_memory", id = "{store_id}" }}
model_provider = "mock_provider"
[model_providers.mock_provider]
name = "Mock provider for test"
base_url = "{server_uri}/v1"
wire_api = "responses"
request_max_retries = 0
stream_max_retries = 0
[features]
plugins = false
"#
        ),
    )
}