mirror of
https://github.com/openai/codex.git
synced 2026-05-01 18:06:47 +00:00
## Summary Increase `core-all-test`'s Bazel shard count from `8` to `16`. ## Why [#19609](https://github.com/openai/codex/pull/19609) restored `bazel.yml` to a 30-minute timeout and increased `app-server-all-test`'s shard count because the bigger timeout risk was not just a cold Windows build. The more common problem was a long `rust_test()` shard failing and getting retried multiple times. Recent `main` runs show that `//codex-rs/core:core-all-test` still has the same shape of problem on Windows: - [Run 24943931330](https://github.com/openai/codex/actions/runs/24943931330) reported `//codex-rs/core:core-all-test` as flaky after first-attempt failures in shard `5/8` and shard `8/8`. - Those retries were driven by `suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override` and `suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`. - The failed shard attempts in that run took `272.61s` and `259.27s` before retrying, which is exactly the sort of wall-clock cost that burns through the 30-minute budget. - [Run 24966332583](https://github.com/openai/codex/actions/runs/24966332583) also retried `//codex-rs/tui:tui-unit-tests` after `app::tests::update_memory_settings_updates_current_thread_memory_mode` failed once on Windows. - [Run 24965527138](https://github.com/openai/codex/actions/runs/24965527138) and its linked [BuildBuddy invocation](https://app.buildbuddy.io/invocation/ac1a8265-06fa-4da5-9552-4715b7965bce) show the other half of the problem: when Windows cache reuse is weak, the `bazel test //...` step can already consume `24m11s` on its own, leaving very little headroom for flaky retries. Increasing `core-all-test` to `16` shards does not fix the flaky tests, but it does reduce the wall-clock cost when a single shard has to be retried. That matches the mitigation we already applied to `app-server-all-test` in `#19609`. ## What Changed - Update `codex-rs/core/BUILD.bazel` so `core-all-test` uses `16` shards instead of `8`. - Leave `core-unit-tests` unchanged. ## Follow-up Work This change is meant to buy back CI headroom while we fix the flaky tests themselves in subsequent commits. The recent Windows retries that look worth addressing directly include: - `suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override` - `suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request` - `app::tests::update_memory_settings_updates_current_thread_memory_mode` ## Verification - Compared `core-all-test`'s current sharding against the `app-server-all-test` precedent in [#19609](https://github.com/openai/codex/pull/19609). - Inspected recent `main` Bazel workflow logs and the linked BuildBuddy invocation to confirm that Windows retries on long shards are still consuming a meaningful fraction of the 30-minute timeout budget. - Did not run local tests for this change because it only adjusts Bazel sharding metadata.
61 lines
1.9 KiB
Python
61 lines
1.9 KiB
Python
load("//:defs.bzl", "codex_rust_crate")
|
|
|
|
filegroup(
|
|
name = "model_availability_nux_fixtures",
|
|
srcs = [
|
|
"tests/cli_responses_fixture.sse",
|
|
],
|
|
visibility = ["//visibility:public"],
|
|
)
|
|
|
|
codex_rust_crate(
|
|
name = "core",
|
|
crate_name = "codex_core",
|
|
compile_data = glob(
|
|
include = ["**"],
|
|
exclude = [
|
|
"**/* *",
|
|
"BUILD.bazel",
|
|
"Cargo.toml",
|
|
],
|
|
allow_empty = True,
|
|
),
|
|
rustc_env = {
|
|
# Keep manifest-root path lookups inside the Bazel execroot for code
|
|
# that relies on env!("CARGO_MANIFEST_DIR").
|
|
"CARGO_MANIFEST_DIR": "codex-rs/core",
|
|
},
|
|
integration_compile_data_extra = [
|
|
"//codex-rs/apply-patch:apply_patch_tool_instructions.md",
|
|
"templates/realtime/backend_prompt.md",
|
|
],
|
|
integration_test_timeout = "long",
|
|
test_data_extra = [
|
|
"config.schema.json",
|
|
] + glob([
|
|
"src/**/snapshots/**",
|
|
]) + [
|
|
# This is a bit of a hack, but empirically, some of our integration tests
|
|
# are relying on the presence of this file as a repo root marker. When
|
|
# running tests locally, this "just works," but in remote execution,
|
|
# the working directory is different and so the file is not found unless it
|
|
# is explicitly added as test data.
|
|
#
|
|
# TODO(aibrahim): Update the tests so that `just bazel-remote-test`
|
|
# succeeds without this workaround.
|
|
"//:AGENTS.md",
|
|
],
|
|
test_shard_counts = {
|
|
"core-all-test": 16,
|
|
"core-unit-tests": 8,
|
|
},
|
|
test_tags = ["no-sandbox"],
|
|
unit_test_timeout = "long",
|
|
extra_binaries = [
|
|
"//codex-rs/linux-sandbox:codex-linux-sandbox",
|
|
"//codex-rs/rmcp-client:test_stdio_server",
|
|
"//codex-rs/rmcp-client:test_streamable_http_server",
|
|
"//codex-rs/cli:codex",
|
|
],
|
|
)
|