codex/codex-rs/core/tests/suite/tool_harness.rs
Michael Bolin bfff0c729f config: enforce enterprise feature requirements (#13388)
## Why

Enterprises can already constrain approvals, sandboxing, and web search
through `requirements.toml` and MDM, but feature flags were still only
configurable as managed defaults. That meant an enterprise could suggest
feature values, but it could not actually pin them.

This change closes that gap and makes enterprise feature requirements
behave like the other constrained settings. The effective feature set
now stays consistent with enterprise requirements during config load,
when config writes are validated, and when runtime code mutates feature
flags later in the session.

It also tightens the runtime API for managed features. `ManagedFeatures`
now follows the same constraint-oriented shape as `Constrained<T>`
instead of exposing panic-prone mutation helpers, and production code
can no longer construct it through an unconstrained `From<Features>`
path.

The PR also hardens the `compact_resume_fork` integration coverage on
Windows. After the feature-management changes,
`compact_resume_after_second_compaction_preserves_history` was
overflowing the libtest/Tokio thread stacks on Windows, so the test now
uses an explicit larger-stack harness as a pragmatic mitigation. That
may not be the ideal root-cause fix, and it merits a parallel
investigation into whether part of the async future chain should be
boxed to reduce stack pressure instead.
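
The larger-stack mitigation described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the harness from this PR: `run_with_stack` and `recurse` are hypothetical names, and the real test also raises the Tokio worker stack size (for example through `tokio::runtime::Builder::thread_stack_size`), which is omitted here to keep the sketch dependency-free.

```rust
use std::hint::black_box;
use std::thread;

/// Hypothetical helper: run `f` on a thread with an explicit stack size
/// and hand its result back to the caller.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}

/// Stand-in for a deep call chain: every frame keeps a 1 KiB buffer alive.
fn recurse(depth: u32) -> u32 {
    let buf = black_box([0u8; 1024]);
    if depth == 0 {
        return u32::from(buf[0]);
    }
    1 + recurse(depth - 1)
}

fn main() {
    // 8 MiB, the stack size named in this PR description.
    const STACK_BYTES: usize = 8 * 1024 * 1024;
    let depth = run_with_stack(STACK_BYTES, || recurse(1000));
    println!("reached depth {depth}");
}
```

Boxing part of a future chain moves that state to the heap instead, which is why it is named above as the alternative worth investigating.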

## What Changed

Enterprises can now pin feature values in `requirements.toml` with the
requirements-side `features` table:

```toml
[features]
personality = true
unified_exec = false
```

Only canonical feature keys are allowed in the requirements `features`
table; omitted keys remain unconstrained.
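
As a rough illustration of those two rules, the sketch below validates an already-parsed `[features]` table. It is not the `codex-config` implementation: `CANONICAL_KEYS`, `validate_pins`, `is_allowed`, and the `legacy_alias` key are hypothetical names chosen for the example, and only `personality` and `unified_exec` come from the table above.

```rust
use std::collections::BTreeMap;

/// Hypothetical canonical key list; the real list lives in the feature registry.
const CANONICAL_KEYS: &[&str] = &["personality", "unified_exec"];

/// Validate a requirements-side `[features]` table that has already been
/// parsed into key/value pairs. Any key outside the canonical list,
/// including a legacy alias, is rejected.
fn validate_pins(table: &[(&str, bool)]) -> Result<BTreeMap<String, bool>, String> {
    let mut pins = BTreeMap::new();
    for (key, value) in table {
        if !CANONICAL_KEYS.contains(key) {
            return Err(format!("non-canonical feature key: {key}"));
        }
        pins.insert((*key).to_string(), *value);
    }
    Ok(pins)
}

/// A key that is absent from the pin map is unconstrained, so any requested
/// value is allowed for it.
fn is_allowed(pins: &BTreeMap<String, bool>, key: &str, requested: bool) -> bool {
    pins.get(key).map_or(true, |pinned| *pinned == requested)
}

fn main() {
    let pins = validate_pins(&[("personality", true), ("unified_exec", false)])
        .expect("canonical keys should validate");
    assert!(is_allowed(&pins, "personality", true));
    assert!(!is_allowed(&pins, "unified_exec", true));
    // Omitted keys stay unconstrained.
    assert!(is_allowed(&pins, "not_pinned", false));
    // Hypothetical legacy alias: rejected at load time.
    assert!(validate_pins(&[("legacy_alias", true)]).is_err());
}
```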

- Added a requirements-side pinned feature map to
`ConfigRequirementsToml`, threaded it through source-preserving
requirements merge and normalization in `codex-config`, and made the
TOML surface use `[features]` (while still accepting legacy
`[feature_requirements]` for compatibility).
- Exposed `featureRequirements` from `configRequirements/read`,
regenerated the JSON/TypeScript schema artifacts, and updated the
app-server README.
- Wrapped the effective feature set in `ManagedFeatures`, backed by
`ConstrainedWithSource<Features>`, and changed its API to mirror
`Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
and result-returning `enable` / `disable` / `set_enabled` helpers.
- Removed the legacy-usage and bulk-map passthroughs from
`ManagedFeatures`; callers that need those behaviors now mutate a plain
`Features` value and reapply it through `set(...)`, so the constrained
wrapper remains the enforcement boundary.
- Removed the production loophole for constructing unconstrained
`ManagedFeatures`. Non-test code now creates it through the configured
feature-loading path, and `impl From<Features> for ManagedFeatures` is
restricted to `#[cfg(test)]`.
- Rejected legacy feature aliases in enterprise feature requirements,
and returned a load error when a pinned combination cannot survive
dependency normalization.
- Validated config writes against enterprise feature requirements before
persisting changes, including explicit conflicting writes and
profile-specific feature states that normalize into invalid
combinations.
- Updated runtime and TUI feature-toggle paths to use the constrained
setter API and to persist or apply the effective post-constraint value
rather than the requested value.
- Updated the `core_test_support` Bazel target to include the bundled
core model-catalog fixtures in its runtime data, so helper code that
resolves `core/models.json` through runfiles works in remote Bazel test
environments.
- Renamed the core config test coverage to emphasize that effective
feature values are normalized at runtime, while conflicting persisted
config writes are rejected.
- Ran `compact_resume_after_second_compaction_preserves_history` inside
an explicit 8 MiB test thread and Tokio runtime worker stack, following
the existing larger-stack integration-test pattern, to keep the Windows
`compact_resume_fork` test slice from aborting while a parallel
investigation continues into whether some of the underlying async
futures should be boxed.
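
The constraint-oriented shape of `ManagedFeatures` described in the list above can be sketched roughly as follows. This is a simplified model, not the code in `codex-core`: the `Features`, `ConstraintError`, and `ManagedFeatures` definitions here are stand-ins, the pin map replaces `ConstrainedWithSource<Features>`, and the return type of `can_set` is an assumption. The point is only that every mutation goes through `set(...)`, so a rejected change leaves the effective value untouched.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the plain, unconstrained feature set.
#[derive(Clone, Debug, Default, PartialEq)]
struct Features(BTreeMap<String, bool>);

impl Features {
    fn enabled(&self, key: &str) -> bool {
        self.0.get(key).copied().unwrap_or(false)
    }
}

/// Hypothetical error type standing in for the real constraint error.
#[derive(Debug, PartialEq)]
struct ConstraintError(String);
type ConstraintResult<T> = Result<T, ConstraintError>;

/// Wrapper that only accepts feature sets consistent with the pinned values.
struct ManagedFeatures {
    value: Features,
    pins: BTreeMap<String, bool>,
}

impl ManagedFeatures {
    fn new(value: Features, pins: BTreeMap<String, bool>) -> ConstraintResult<Self> {
        let managed = Self { value, pins };
        managed.can_set(&managed.value)?;
        Ok(managed)
    }

    /// Check a candidate without applying it.
    fn can_set(&self, candidate: &Features) -> ConstraintResult<()> {
        for (key, pinned) in &self.pins {
            if candidate.enabled(key) != *pinned {
                return Err(ConstraintError(format!(
                    "feature `{key}` is pinned to {pinned}"
                )));
            }
        }
        Ok(())
    }

    /// The single enforcement boundary: validate, then replace the value.
    fn set(&mut self, candidate: Features) -> ConstraintResult<()> {
        self.can_set(&candidate)?;
        self.value = candidate;
        Ok(())
    }

    /// Result-returning helper: mutate a plain copy, then reapply via `set`.
    fn set_enabled(&mut self, key: &str, enabled: bool) -> ConstraintResult<()> {
        let mut next = self.value.clone();
        next.0.insert(key.to_string(), enabled);
        self.set(next)
    }

    fn get(&self) -> &Features {
        &self.value
    }
}

fn main() {
    let pins = BTreeMap::from([("unified_exec".to_string(), false)]);
    let mut managed = ManagedFeatures::new(Features::default(), pins)
        .expect("default features satisfy the pins");
    assert!(managed.set_enabled("personality", true).is_ok());
    // The pinned feature cannot be flipped, and the old value survives.
    assert!(managed.set_enabled("unified_exec", true).is_err());
    assert!(!managed.get().enabled("unified_exec"));
    assert!(managed.get().enabled("personality"));
}
```

The updated tests in this file follow the same pattern when they call `config.features.enable(Feature::ApplyPatchFreeform).expect(...)`: the setter returns a result instead of panicking internally.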

## Verification

- `cargo test -p codex-config`
- `cargo test -p codex-core feature_requirements_ -- --nocapture`
- `cargo test -p codex-core load_requirements_toml_produces_expected_constraints -- --nocapture`
- `cargo test -p codex-core compact_resume_after_second_compaction_preserves_history -- --nocapture`
- `cargo test -p codex-core compact_resume_fork -- --nocapture`
- Re-ran the built `codex-core` `tests/all` binary with
`RUST_MIN_STACK=262144` for
`compact_resume_after_second_compaction_preserves_history` to confirm
the explicit-stack harness fixes the deterministic low-stack repro.
- `cargo test -p codex-core`
  - This still fails locally in unrelated integration areas that expect
    the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
    wiremock mismatches.

## Docs

`developers.openai.com/codex` should document the requirements-side
`[features]` table for enterprise and MDM-managed configuration,
including that it only accepts canonical feature keys and that
conflicting config writes are rejected.
2026-03-04 04:40:22 +00:00


#![cfg(not(target_os = "windows"))]

use std::fs;

use assert_matches::assert_matches;
use codex_core::features::Feature;
use codex_protocol::plan_tool::StepStatus;
use codex_protocol::protocol::AskForApproval;
use codex_protocol::protocol::EventMsg;
use codex_protocol::protocol::Op;
use codex_protocol::protocol::SandboxPolicy;
use codex_protocol::user_input::UserInput;
use core_test_support::assert_regex_match;
use core_test_support::responses;
use core_test_support::responses::ResponsesRequest;
use core_test_support::responses::ev_apply_patch_function_call;
use core_test_support::responses::ev_assistant_message;
use core_test_support::responses::ev_completed;
use core_test_support::responses::ev_function_call;
use core_test_support::responses::ev_local_shell_call;
use core_test_support::responses::ev_response_created;
use core_test_support::responses::sse;
use core_test_support::responses::start_mock_server;
use core_test_support::skip_if_no_network;
use core_test_support::test_codex::TestCodex;
use core_test_support::test_codex::test_codex;
use core_test_support::wait_for_event;
use serde_json::Value;
use serde_json::json;

/// Returns the text content and optional success flag recorded for `call_id`
/// in the request's `function_call_output` item. Panics if the output or its
/// content is missing.
fn call_output(req: &ResponsesRequest, call_id: &str) -> (String, Option<bool>) {
    let raw = req.function_call_output(call_id);
    assert_eq!(
        raw.get("call_id").and_then(Value::as_str),
        Some(call_id),
        "mismatched call_id in function_call_output"
    );
    let (content_opt, success) = match req.function_call_output_content_and_success(call_id) {
        Some(values) => values,
        None => panic!("function_call_output present"),
    };
    let content = match content_opt {
        Some(c) => c,
        None => panic!("function_call_output content present"),
    };
    (content, success)
}

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn shell_tool_executes_command_and_streams_output() -> anyhow::Result<()> {
    skip_if_no_network!(Ok(()));
    let server = start_mock_server().await;
    let mut builder = test_codex().with_model("gpt-5");
    let TestCodex {
        codex,
        cwd,
        session_configured,
        ..
    } = builder.build(&server).await?;
    let call_id = "shell-tool-call";
    let command = vec!["/bin/echo", "tool harness"];
    let first_response = sse(vec![
        ev_response_created("resp-1"),
        ev_local_shell_call(call_id, "completed", command),
        ev_completed("resp-1"),
    ]);
    responses::mount_sse_once(&server, first_response).await;
    let second_response = sse(vec![
        ev_assistant_message("msg-1", "all done"),
        ev_completed("resp-2"),
    ]);
    let second_mock = responses::mount_sse_once(&server, second_response).await;
    let session_model = session_configured.model.clone();
    codex
        .submit(Op::UserTurn {
            items: vec![UserInput::Text {
                text: "please run the shell command".into(),
                text_elements: Vec::new(),
            }],
            final_output_json_schema: None,
            cwd: cwd.path().to_path_buf(),
            approval_policy: AskForApproval::Never,
            sandbox_policy: SandboxPolicy::DangerFullAccess,
            model: session_model,
            effort: None,
            summary: None,
            service_tier: None,
            collaboration_mode: None,
            personality: None,
        })
        .await?;
    wait_for_event(&codex, |event| matches!(event, EventMsg::TurnComplete(_))).await;
    let req = second_mock.single_request();
    let (output_text, _) = call_output(&req, call_id);
    let exec_output: Value = serde_json::from_str(&output_text)?;
    assert_eq!(exec_output["metadata"]["exit_code"], 0);
    let stdout = exec_output["output"].as_str().expect("stdout field");
    assert_regex_match(r"(?s)^tool harness\n?$", stdout);
    Ok(())
}

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn update_plan_tool_emits_plan_update_event() -> anyhow::Result<()> {
    skip_if_no_network!(Ok(()));
    let server = start_mock_server().await;
    let mut builder = test_codex();
    let TestCodex {
        codex,
        cwd,
        session_configured,
        ..
    } = builder.build(&server).await?;
    let call_id = "plan-tool-call";
    let plan_args = json!({
        "explanation": "Tool harness check",
        "plan": [
            {"step": "Inspect workspace", "status": "in_progress"},
            {"step": "Report results", "status": "pending"},
        ],
    })
    .to_string();
    let first_response = sse(vec![
        ev_response_created("resp-1"),
        ev_function_call(call_id, "update_plan", &plan_args),
        ev_completed("resp-1"),
    ]);
    responses::mount_sse_once(&server, first_response).await;
    let second_response = sse(vec![
        ev_assistant_message("msg-1", "plan acknowledged"),
        ev_completed("resp-2"),
    ]);
    let second_mock = responses::mount_sse_once(&server, second_response).await;
    let session_model = session_configured.model.clone();
    codex
        .submit(Op::UserTurn {
            items: vec![UserInput::Text {
                text: "please update the plan".into(),
                text_elements: Vec::new(),
            }],
            final_output_json_schema: None,
            cwd: cwd.path().to_path_buf(),
            approval_policy: AskForApproval::Never,
            sandbox_policy: SandboxPolicy::DangerFullAccess,
            model: session_model,
            effort: None,
            summary: None,
            service_tier: None,
            collaboration_mode: None,
            personality: None,
        })
        .await?;
    let mut saw_plan_update = false;
    wait_for_event(&codex, |event| match event {
        EventMsg::PlanUpdate(update) => {
            saw_plan_update = true;
            assert_eq!(update.explanation.as_deref(), Some("Tool harness check"));
            assert_eq!(update.plan.len(), 2);
            assert_eq!(update.plan[0].step, "Inspect workspace");
            assert_matches!(update.plan[0].status, StepStatus::InProgress);
            assert_eq!(update.plan[1].step, "Report results");
            assert_matches!(update.plan[1].status, StepStatus::Pending);
            false
        }
        EventMsg::TurnComplete(_) => true,
        _ => false,
    })
    .await;
    assert!(saw_plan_update, "expected PlanUpdate event");
    let req = second_mock.single_request();
    let (output_text, _success_flag) = call_output(&req, call_id);
    assert_eq!(output_text, "Plan updated");
    Ok(())
}

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn update_plan_tool_rejects_malformed_payload() -> anyhow::Result<()> {
    skip_if_no_network!(Ok(()));
    let server = start_mock_server().await;
    let mut builder = test_codex();
    let TestCodex {
        codex,
        cwd,
        session_configured,
        ..
    } = builder.build(&server).await?;
    let call_id = "plan-tool-invalid";
    let invalid_args = json!({
        "explanation": "Missing plan data"
    })
    .to_string();
    let first_response = sse(vec![
        ev_response_created("resp-1"),
        ev_function_call(call_id, "update_plan", &invalid_args),
        ev_completed("resp-1"),
    ]);
    responses::mount_sse_once(&server, first_response).await;
    let second_response = sse(vec![
        ev_assistant_message("msg-1", "malformed plan payload"),
        ev_completed("resp-2"),
    ]);
    let second_mock = responses::mount_sse_once(&server, second_response).await;
    let session_model = session_configured.model.clone();
    codex
        .submit(Op::UserTurn {
            items: vec![UserInput::Text {
                text: "please update the plan".into(),
                text_elements: Vec::new(),
            }],
            final_output_json_schema: None,
            cwd: cwd.path().to_path_buf(),
            approval_policy: AskForApproval::Never,
            sandbox_policy: SandboxPolicy::DangerFullAccess,
            model: session_model,
            effort: None,
            summary: None,
            service_tier: None,
            collaboration_mode: None,
            personality: None,
        })
        .await?;
    let mut saw_plan_update = false;
    wait_for_event(&codex, |event| match event {
        EventMsg::PlanUpdate(_) => {
            saw_plan_update = true;
            false
        }
        EventMsg::TurnComplete(_) => true,
        _ => false,
    })
    .await;
    assert!(
        !saw_plan_update,
        "did not expect PlanUpdate event for malformed payload"
    );
    let req = second_mock.single_request();
    let (output_text, success_flag) = call_output(&req, call_id);
    assert!(
        output_text.contains("failed to parse function arguments"),
        "expected parse error message in output text, got {output_text:?}"
    );
    if let Some(success_flag) = success_flag {
        assert!(
            !success_flag,
            "expected tool output to mark success=false for malformed payload"
        );
    }
    Ok(())
}

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn apply_patch_tool_executes_and_emits_patch_events() -> anyhow::Result<()> {
    skip_if_no_network!(Ok(()));
    let server = start_mock_server().await;
    let mut builder = test_codex().with_config(|config| {
        config
            .features
            .enable(Feature::ApplyPatchFreeform)
            .expect("test config should allow feature update");
    });
    let TestCodex {
        codex,
        cwd,
        session_configured,
        ..
    } = builder.build(&server).await?;
    let file_name = "notes.txt";
    let file_path = cwd.path().join(file_name);
    let call_id = "apply-patch-call";
    let patch_content = format!(
        r#"*** Begin Patch
*** Add File: {file_name}
+Tool harness apply patch
*** End Patch"#
    );
    let first_response = sse(vec![
        ev_response_created("resp-1"),
        ev_apply_patch_function_call(call_id, &patch_content),
        ev_completed("resp-1"),
    ]);
    responses::mount_sse_once(&server, first_response).await;
    let second_response = sse(vec![
        ev_assistant_message("msg-1", "patch complete"),
        ev_completed("resp-2"),
    ]);
    let second_mock = responses::mount_sse_once(&server, second_response).await;
    let session_model = session_configured.model.clone();
    codex
        .submit(Op::UserTurn {
            items: vec![UserInput::Text {
                text: "please apply a patch".into(),
                text_elements: Vec::new(),
            }],
            final_output_json_schema: None,
            cwd: cwd.path().to_path_buf(),
            approval_policy: AskForApproval::Never,
            sandbox_policy: SandboxPolicy::DangerFullAccess,
            model: session_model,
            effort: None,
            summary: None,
            service_tier: None,
            collaboration_mode: None,
            personality: None,
        })
        .await?;
    let mut saw_patch_begin = false;
    let mut patch_end_success = None;
    wait_for_event(&codex, |event| match event {
        EventMsg::PatchApplyBegin(begin) => {
            saw_patch_begin = true;
            assert_eq!(begin.call_id, call_id);
            false
        }
        EventMsg::PatchApplyEnd(end) => {
            assert_eq!(end.call_id, call_id);
            patch_end_success = Some(end.success);
            false
        }
        EventMsg::TurnComplete(_) => true,
        _ => false,
    })
    .await;
    assert!(saw_patch_begin, "expected PatchApplyBegin event");
    let patch_end_success =
        patch_end_success.expect("expected PatchApplyEnd event to capture success flag");
    assert!(patch_end_success);
    let req = second_mock.single_request();
    let (output_text, _success_flag) = call_output(&req, call_id);
    let expected_pattern = format!(
        r"(?s)^Exit code: 0
Wall time: [0-9]+(?:\.[0-9]+)? seconds
Output:
Success. Updated the following files:
A {file_name}
?$"
    );
    assert_regex_match(&expected_pattern, &output_text);
    let updated_contents = fs::read_to_string(file_path)?;
    assert_eq!(
        updated_contents, "Tool harness apply patch\n",
        "expected updated file content"
    );
    Ok(())
}

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn apply_patch_reports_parse_diagnostics() -> anyhow::Result<()> {
    skip_if_no_network!(Ok(()));
    let server = start_mock_server().await;
    let mut builder = test_codex().with_config(|config| {
        config
            .features
            .enable(Feature::ApplyPatchFreeform)
            .expect("test config should allow feature update");
    });
    let TestCodex {
        codex,
        cwd,
        session_configured,
        ..
    } = builder.build(&server).await?;
    let call_id = "apply-patch-parse-error";
    let patch_content = r"*** Begin Patch
*** Update File: broken.txt
*** End Patch";
    let first_response = sse(vec![
        ev_response_created("resp-1"),
        ev_apply_patch_function_call(call_id, patch_content),
        ev_completed("resp-1"),
    ]);
    responses::mount_sse_once(&server, first_response).await;
    let second_response = sse(vec![
        ev_assistant_message("msg-1", "failed"),
        ev_completed("resp-2"),
    ]);
    let second_mock = responses::mount_sse_once(&server, second_response).await;
    let session_model = session_configured.model.clone();
    codex
        .submit(Op::UserTurn {
            items: vec![UserInput::Text {
                text: "please apply a patch".into(),
                text_elements: Vec::new(),
            }],
            final_output_json_schema: None,
            cwd: cwd.path().to_path_buf(),
            approval_policy: AskForApproval::Never,
            sandbox_policy: SandboxPolicy::DangerFullAccess,
            model: session_model,
            effort: None,
            summary: None,
            service_tier: None,
            collaboration_mode: None,
            personality: None,
        })
        .await?;
    wait_for_event(&codex, |event| matches!(event, EventMsg::TurnComplete(_))).await;
    let req = second_mock.single_request();
    let (output_text, success_flag) = call_output(&req, call_id);
    assert!(
        output_text.contains("apply_patch verification failed"),
        "expected apply_patch verification failure message, got {output_text:?}"
    );
    assert!(
        output_text.contains("invalid hunk"),
        "expected parse diagnostics in output text, got {output_text:?}"
    );
    if let Some(success_flag) = success_flag {
        assert!(
            !success_flag,
            "expected tool output to mark success=false for parse failures"
        );
    }
    Ok(())
}