codex

mirror of https://github.com/openai/codex.git synced 2026-05-19 18:52:57 +00:00

Files

starr-openai 732b12b1ef Reduce rust-ci-full Windows nextest timeout flakes (#23253 )

## Why
Recent `rust-ci-full` failures were dominated by transient Windows
timeout clusters in process-heavy tests such as `suite::resume`,
`suite::cli_stream`, `suite::auth_env`,
`start_thread_uses_all_default_environments_from_codex_home`, and
`connect_stdio_command_initializes_json_rpc_client_on_windows`.

The goal here is to make those known flaky paths less likely to fail
full CI without relaxing the global nextest timeout policy.

## What changed
- Enable one global nextest retry with `retries = 1` so a single
transient failure can recover.
- Add a `windows_process_heavy` test group with `max-threads = 2` for
the recurring Windows subprocess/session-heavy timeout families.
- Add Windows-only slow-timeout overrides for that process-heavy group.
- Add a narrower Windows-only timeout override for
`start_thread_uses_all_default_environments_from_codex_home`, which
still exceeded the broader Windows bucket in both Windows full-CI lanes.
- Increase the `rust-ci-full` nextest job timeout from `45m` to `60m` so
Windows ARM64 still has job-level headroom after retries and targeted
per-test timeout increases.
- Keep the global `slow-timeout` unchanged at `15s`.

## Validation
Validated through `rust-ci-full` GitHub Actions reruns on this PR.

Observed improvement on the tuned Windows lanes:
- Windows x64 went from `5 timed out` to `0 timed out`.
- Windows ARM64 went from `2 timed out` to `0 timed out`.
- `start_thread_uses_all_default_environments_from_codex_home` recovered
as a flaky pass on Windows ARM64 instead of timing out.

The remaining failing tests in those runs were unrelated hard failures
outside this nextest timeout tuning.

2026-05-18 13:06:39 -07:00