codex/.github/workflows
starr-openai 732b12b1ef Reduce rust-ci-full Windows nextest timeout flakes (#23253)
## Why
Recent `rust-ci-full` failures were dominated by transient Windows
timeout clusters in process-heavy tests such as `suite::resume`,
`suite::cli_stream`, `suite::auth_env`,
`start_thread_uses_all_default_environments_from_codex_home`, and
`connect_stdio_command_initializes_json_rpc_client_on_windows`.

The goal here is to make those known flaky paths less likely to fail
full CI without relaxing the global nextest timeout policy.

## What changed
- Enable one global nextest retry with `retries = 1` so a single
transient failure can recover.
- Add a `windows_process_heavy` test group with `max-threads = 2` for
the recurring Windows subprocess/session-heavy timeout families.
- Add Windows-only slow-timeout overrides for that process-heavy group.
- Add a narrower Windows-only timeout override for
`start_thread_uses_all_default_environments_from_codex_home`, which
still exceeded the broader Windows process-heavy timeout in both Windows
full-CI lanes.
- Increase the `rust-ci-full` nextest job timeout from `45m` to `60m` so
Windows ARM64 still has job-level headroom after retries and targeted
per-test timeout increases.
- Keep the global `slow-timeout` unchanged at `15s`.
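
The configuration changes listed above could look roughly like the following `.config/nextest.toml` fragment. This is a sketch, not the file from this PR: the `terminate-after` values, the override timeout periods, and the exact filter expressions are assumptions chosen for illustration. Only `retries = 1`, the `windows_process_heavy` group with `max-threads = 2`, and the global `15s` period come from the description.

```toml
# Illustrative sketch only; values marked "assumed" are not taken from the PR.

[profile.default]
# One global retry so a single transient failure can recover.
retries = 1
# Global slow-timeout period stays at 15s (terminate-after is assumed).
slow-timeout = { period = "15s", terminate-after = 2 }

[test-groups]
# At most two tests from this group run at the same time.
windows_process_heavy = { max-threads = 2 }

# Narrower override for the single test that still exceeded the group timeout.
# Listed first on the assumption that earlier overrides take precedence.
[[profile.default.overrides]]
platform = 'cfg(windows)'
filter = 'test(start_thread_uses_all_default_environments_from_codex_home)'
slow-timeout = { period = "60s", terminate-after = 2 } # assumed values

# Broader Windows-only override for the process-heavy timeout families.
[[profile.default.overrides]]
platform = 'cfg(windows)'
filter = 'test(suite::resume) | test(suite::cli_stream) | test(suite::auth_env) | test(start_thread_uses_all_default_environments_from_codex_home) | test(connect_stdio_command_initializes_json_rpc_client_on_windows)'
test-group = 'windows_process_heavy'
slow-timeout = { period = "30s", terminate-after = 2 } # assumed values
```

Scoping both overrides with `platform = 'cfg(windows)'` leaves the other platforms on the global `15s` policy, which matches the stated goal of not relaxing the global timeout.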

## Validation
Validated through `rust-ci-full` GitHub Actions reruns on this PR.

Observed improvement on the tuned Windows lanes:
- Windows x64 went from `5 timed out` to `0 timed out`.
- Windows ARM64 went from `2 timed out` to `0 timed out`.
- `start_thread_uses_all_default_environments_from_codex_home` recovered
as a flaky pass on Windows ARM64 instead of timing out.

The tests that still failed in those runs were hard failures unrelated
to this nextest timeout tuning.
2026-05-18 13:06:39 -07:00

Workflow Strategy

The workflows in this directory are split so that pull requests get fast, review-friendly signal while main still gets the full cross-platform verification pass.

Pull Requests

  • bazel.yml is the main pre-merge verification path for Rust code. It runs Bazel test and Bazel clippy on the supported Bazel targets, including the generated Rust test binaries needed to lint inline #[cfg(test)] code.
  • rust-ci.yml keeps the Cargo-native PR checks intentionally small:
    • cargo fmt --check
    • cargo shear
    • argument-comment-lint on Linux, macOS, and Windows
    • tools/argument-comment-lint package tests when the lint or its workflow wiring changes

Post-Merge On main

  • bazel.yml also runs on pushes to main. This re-verifies the merged Bazel path and helps keep the BuildBuddy caches warm.
  • rust-ci-full.yml is the full Cargo-native verification workflow. It keeps the heavier checks off the PR path while still validating them after merge:
    • the full Cargo clippy matrix
    • the full Cargo nextest matrix
    • release-profile Cargo builds
    • cross-platform argument-comment-lint
    • Linux remote-env tests

Rule Of Thumb

  • If a build/test/clippy check can be expressed in Bazel, prefer putting the PR-time version in bazel.yml.
  • Keep rust-ci.yml fast enough that it usually does not dominate PR latency.
  • Reserve rust-ci-full.yml for heavyweight Cargo-native coverage that Bazel does not replace yet.