mirror of
https://github.com/openai/codex.git
synced 2026-04-28 00:25:56 +00:00
Stabilize Windows cmd-based shell test harnesses (#14958)
## What is flaky The Windows shell-driven integration tests in `codex-rs/core` were intermittently unstable, especially: - `apply_patch_cli_can_use_shell_command_output_as_patch_input` - `websocket_test_codex_shell_chain` - `websocket_v2_test_codex_shell_chain` ## Why it was flaky These tests were exercising real shell-tool flows through whichever shell Codex selected on Windows, and the `apply_patch` test also nested a PowerShell read inside `cmd /c`. There were multiple independent sources of nondeterminism in that setup: - The test harness depended on the model-selected Windows shell instead of pinning the shell it actually meant to exercise. - `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI that could leave the read command wrapped as a literal string instead of executing it. - Even after getting the quoting right, PowerShell could emit CLIXML progress records like module-initialization output onto stdout. - The `apply_patch` test was building a patch directly from shell stdout, so any quoting artifact or progress noise corrupted the patch input. So the failures were driven by shell startup and output-shape variance, not by the `apply_patch` or websocket logic themselves. ## How this PR fixes it - Add a test-only `user_shell_override` path so Windows integration tests can pin `cmd.exe` explicitly. - Use that override in the websocket shell-chain tests and in the `apply_patch` harness. - Change the nested Windows file read in `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8 PowerShell `-EncodedCommand` script. - Run that nested PowerShell process with `-NonInteractive`, set `$ProgressPreference = 'SilentlyContinue'`, and read the file with `[System.IO.File]::ReadAllText(...)`. ## Why this fix fixes the flakiness The outer harness now runs under a deterministic shell, and the inner PowerShell read no longer depends on fragile `cmd` quoting or on progress output staying quiet by accident. The shell tool returns only the file contents, so patch construction and websocket assertions depend on stable test inputs instead of on runner-specific shell behavior. --------- Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>
This commit is contained in:
@@ -35,7 +35,7 @@ async fn websocket_test_codex_shell_chain() -> Result<()> {
|
||||
]])
|
||||
.await;
|
||||
|
||||
let mut builder = test_codex();
|
||||
let mut builder = test_codex().with_windows_cmd_shell();
|
||||
|
||||
let test = builder.build_with_websocket_server(&server).await?;
|
||||
test.submit_turn_with_policy(
|
||||
@@ -183,7 +183,7 @@ async fn websocket_v2_test_codex_shell_chain() -> Result<()> {
|
||||
]])
|
||||
.await;
|
||||
|
||||
let mut builder = test_codex().with_config(|config| {
|
||||
let mut builder = test_codex().with_windows_cmd_shell().with_config(|config| {
|
||||
config
|
||||
.features
|
||||
.enable(Feature::ResponsesWebsocketsV2)
|
||||
|
||||
Reference in New Issue
Block a user