Stabilize Windows cmd-based shell test harnesses (#14958)

mirror of https://github.com/openai/codex.git synced 2026-04-28 00:25:56 +00:00

## What was flaky
The Windows shell-driven integration tests in `codex-rs/core` were
intermittently unstable, especially:

- `apply_patch_cli_can_use_shell_command_output_as_patch_input`
- `websocket_test_codex_shell_chain`
- `websocket_v2_test_codex_shell_chain`

## Why it was flaky
These tests were exercising real shell-tool flows through whichever
shell Codex selected on Windows, and the `apply_patch` test also nested
a PowerShell read inside `cmd /c`.

There were multiple independent sources of nondeterminism in that setup:

- The test harness depended on the model-selected Windows shell instead
of pinning the shell it actually meant to exercise.
- `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
the read command could end up wrapped as a literal string instead of
being executed.
- Even after getting the quoting right, PowerShell could emit CLIXML
progress records, such as module-initialization output, onto stdout.
- The `apply_patch` test was building a patch directly from shell
stdout, so any quoting artifact or progress noise corrupted the patch
input.

So the failures were driven by shell startup and output-shape variance,
not by the `apply_patch` or websocket logic itself.
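
The last failure mode can be made concrete with a small sketch. It is not code from this PR: the helper names `build_add_file_patch` and `has_clixml_noise` are hypothetical, and the patch envelope is a simplified one in the style of `apply_patch`, assumed here for illustration only. The sketch relies on PowerShell marking serialized records with a `#< CLIXML` header line.

```rust
/// Hypothetical helper: wraps shell stdout in a simplified,
/// apply_patch-style "add file" envelope (format assumed for illustration).
fn build_add_file_patch(path: &str, shell_stdout: &str) -> String {
    let mut patch = String::from("*** Begin Patch\n");
    patch.push_str(&format!("*** Add File: {}\n", path));
    for line in shell_stdout.lines() {
        // Every stdout line becomes an added line, noise included.
        patch.push('+');
        patch.push_str(line);
        patch.push('\n');
    }
    patch.push_str("*** End Patch\n");
    patch
}

/// Hypothetical helper: detects the header PowerShell writes before
/// serialized CLIXML records.
fn has_clixml_noise(shell_stdout: &str) -> bool {
    shell_stdout
        .lines()
        .any(|line| line.trim_start().starts_with("#< CLIXML"))
}
```

Because the patch is built line by line from stdout, a single `#< CLIXML` header ends up as a `+#< CLIXML` line in the file the patch creates, which is why the test saw corrupted input rather than a shell error.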

## How this PR fixes it
- Add a test-only `user_shell_override` path so Windows integration
tests can pin `cmd.exe` explicitly.
- Use that override in the websocket shell-chain tests and in the
`apply_patch` harness.
- Change the nested Windows file read in
`apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
PowerShell `-EncodedCommand` script.
- Run that nested PowerShell process with `-NonInteractive`, set
`$ProgressPreference = 'SilentlyContinue'`, and read the file with
`[System.IO.File]::ReadAllText(...)`.
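
As a rough sketch of why `-EncodedCommand` sidesteps `cmd` quoting, the helpers below build such a command line. They are hypothetical and not taken from the PR. The sketch assumes the standard PowerShell convention that `-EncodedCommand` takes Base64 over the UTF-16LE bytes of the script; how the PR itself handles UTF-8 is not shown here. Base64 is hand-rolled only to keep the example dependency-free.

```rust
/// Standard Base64 alphabet (RFC 4648), used with `=` padding.
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[((n >> 18) & 63) as usize] as char);
        out.push(B64[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { B64[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Encodes a script as Base64 over its UTF-16LE bytes.
fn encode_powershell_script(script: &str) -> String {
    let utf16le: Vec<u8> = script
        .encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect();
    base64_encode(&utf16le)
}

/// Hypothetical helper: builds the nested read command for `path`.
fn nested_read_command(path: &str) -> String {
    // Inside a PowerShell single-quoted string, a quote is escaped by doubling it.
    let escaped = path.replace('\'', "''");
    let script = format!(
        "$ProgressPreference = 'SilentlyContinue'; [System.IO.File]::ReadAllText('{}')",
        escaped
    );
    format!(
        "powershell.exe -NonInteractive -EncodedCommand {}",
        encode_powershell_script(&script)
    )
}
```

For example, `encode_powershell_script("dir")` yields `ZABpAHIA`. The resulting command line contains only letters, digits, `+`, `/`, and `=` after the flag, so there is nothing left for `cmd /c` to re-quote or misinterpret.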

## Why this fix fixes the flakiness
The outer harness now runs under a deterministic shell, and the inner
PowerShell read no longer depends on fragile `cmd` quoting or on
progress output staying quiet by accident. The shell tool returns only
the file contents, so patch construction and websocket assertions depend
on stable test inputs instead of on runner-specific shell behavior.
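
The shell pinning can be pictured with a minimal builder sketch. Only the names `with_windows_cmd_shell` and `user_shell_override` come from this PR; the `Shell` enum, the `TestCodexBuilder` struct, `resolve_shell`, and the choice to make the override a no-op off Windows are assumptions made for illustration, not the harness's real definitions.

```rust
/// Hypothetical stand-in for the harness's shell type.
#[derive(Debug, Clone, PartialEq)]
enum Shell {
    Cmd,
    PowerShell,
}

/// Hypothetical stand-in for the test builder.
#[derive(Debug, Default)]
struct TestCodexBuilder {
    /// Test-only override; `None` means "use whatever was detected".
    user_shell_override: Option<Shell>,
}

impl TestCodexBuilder {
    /// Pins an explicit shell for the test run.
    fn with_user_shell_override(mut self, shell: Shell) -> Self {
        self.user_shell_override = Some(shell);
        self
    }

    /// Pins cmd.exe on Windows; assumed to be a no-op on other platforms.
    fn with_windows_cmd_shell(self) -> Self {
        if cfg!(windows) {
            self.with_user_shell_override(Shell::Cmd)
        } else {
            self
        }
    }

    /// The override, when present, wins over the detected shell.
    fn resolve_shell(&self, detected: Shell) -> Shell {
        self.user_shell_override.clone().unwrap_or(detected)
    }
}
```

With an override in place, `resolve_shell` returns the pinned shell regardless of what detection reports, so a test no longer changes behavior when the runner's default shell changes.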

---------

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>

Author: Ahmed Ibrahim, 2026-03-17 13:21:46 -07:00 (committed by GitHub)

Parent: 683c37ce75

Commit: b02388672f

9 changed files with 160 additions and 13 deletions

									
codex-rs/core/tests/suite/agent_websocket.rs

    @@ -35,7 +35,7 @@ async fn websocket_test_codex_shell_chain() -> Result<()> {
         ]])
         .await;
    -    let mut builder = test_codex();
    +    let mut builder = test_codex().with_windows_cmd_shell();
         let test = builder.build_with_websocket_server(&server).await?;
         test.submit_turn_with_policy(
    @@ -183,7 +183,7 @@ async fn websocket_v2_test_codex_shell_chain() -> Result<()> {
         ]])
         .await;
    -    let mut builder = test_codex().with_config(|config| {
    +    let mut builder = test_codex().with_windows_cmd_shell().with_config(|config| {
             config
                 .features
                 .enable(Feature::ResponsesWebsocketsV2)
