core: box wrapper futures to reduce stack pressure (#13429)

Follow-up to [#13388](https://github.com/openai/codex/pull/13388). This
uses the same general fix pattern as
[#12421](https://github.com/openai/codex/pull/12421), but in the
`codex-core` compact/resume/fork path.

## Why

`compact_resume_after_second_compaction_preserves_history` started
overflowing the stack on Windows CI after
[#13388](https://github.com/openai/codex/pull/13388).

The important part is that this was not a compaction-recursion bug. The
test exercises a path with several thin `async fn` wrappers around much
larger thread-spawn, resume, and fork futures. When one `async fn`
awaits another inline, the outer future stores the callee future as part
of its own state machine. In a long wrapper chain, that means a caller
can accidentally inline a lot more state than the source code suggests.

That is exactly what was happening here:

- `ThreadManager` convenience methods such as `start_thread`,
`resume_thread_from_rollout`, and `fork_thread` were inlining the larger
spawn/resume futures beneath them.
- `core_test_support::test_codex` added another wrapper layer on top of
those same paths.
- `compact_resume_fork` adds a few more helpers, and this particular
test drives the resume/fork path multiple times.

On Windows, that was enough to push both the libtest thread and Tokio
worker threads over the edge. The previous 8 MiB test-thread workaround
proved the failure was stack-related, but it did not address the
underlying future size.

## How This Was Debugged

The useful debugging pattern here was to turn the CI-only failure into a
local low-stack repro.

1. Remove the explicit large-stack harness so the test runs on the
normal `#[tokio::test]` path.
2. Build the test binary normally.
3. Re-run the already-built `tests/all` binary directly with
progressively smaller `RUST_MIN_STACK` values.

Running the built binary directly matters: it keeps the reduced stack
size focused on the test process instead of also applying it to `cargo`
and `rustc`.

That made it possible to answer two questions quickly:

- Does the failure still reproduce without the workaround? Yes.
- Does boxing the wrapper futures actually buy back stack headroom? Also
yes.

After this change, the built test binary passes with
`RUST_MIN_STACK=917504` (896 KiB) and still overflows at `786432`
(768 KiB), which is enough evidence to justify removing the explicit
8 MiB override while keeping a deterministic low-stack repro for future
debugging.

If we hit a similar issue again, the first places to inspect are thin
`async fn` wrappers that mostly forward into a much larger async
implementation.
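As a mechanical guardrail, clippy's `large_futures` lint flags await
sites whose futures exceed a size threshold and suggests boxing them.
The threshold is configurable in `clippy.toml`; the value below is
illustrative, not part of this change:

```toml
# clippy.toml — illustrative threshold (clippy's default is larger)
future-size-threshold = 1024
```

Running `cargo clippy -- -W clippy::large_futures` then surfaces the
oversized futures directly instead of waiting for a CI stack overflow.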

## `Box::pin()` Primer

`async fn` compiles into a state machine. If a wrapper does this:

```rust
async fn wrapper() {
    inner().await;
}
```

then `wrapper()` stores the full `inner()` future inline as part of its
own state.

If the wrapper instead does this:

```rust
async fn wrapper() {
    Box::pin(inner()).await;
}
```

then the child future lives on the heap, and the outer future only
stores a pinned pointer to it. That usually trades one allocation for a
substantially smaller outer future, which is exactly the tradeoff we
want when the problem is stack pressure rather than raw CPU time.

Useful references:

- [`Box::pin`](https://doc.rust-lang.org/std/boxed/struct.Box.html#method.pin)
- [Async book: Pinning](https://rust-lang.github.io/async-book/04_pinning/01_chapter.html)

## What Changed

- Boxed the wrapper futures in `core/src/thread_manager.rs` around
`start_thread`, `resume_thread_from_rollout`, `fork_thread`, and the
corresponding `ThreadManagerState` spawn helpers so callers no longer
inline the full spawn/resume state machine through multiple layers.
- Boxed the matching test-only wrapper futures in
`core/tests/common/test_codex.rs` and
`core/tests/suite/compact_resume_fork.rs`, which sit directly on top of
the same path.
- Restored `compact_resume_after_second_compaction_preserves_history` in
`core/tests/suite/compact_resume_fork.rs` to a normal `#[tokio::test]`
and removed the explicit `TEST_STACK_SIZE_BYTES` thread/runtime sizing.
- Simplified a tiny helper in `compact_resume_fork` by making
`fetch_conversation_path()` synchronous, which removes one more
unnecessary future layer from the test path.

## Verification

- `cargo test -p codex-core --test all
suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
-- --exact --nocapture`
- `cargo test -p codex-core --test all suite::compact_resume_fork --
--nocapture`
- Re-ran the built `codex-core` `tests/all` binary directly with reduced
stack sizes:
  - `RUST_MIN_STACK=917504` passes
  - `RUST_MIN_STACK=786432` still overflows
- `cargo test -p codex-core`
  - Still fails locally in unrelated existing integration areas that
    expect the `codex` / `test_stdio_server` binaries or hit the
    existing `search_tool` wiremock mismatches.
---

Commit `7134220f3c` (parent `d622bff384`), authored by Michael Bolin,
2026-03-03 21:44:52 -08:00, committed by GitHub. 3 changed files with
72 additions and 96 deletions.


From `core/tests/common/test_codex.rs`:

```diff
@@ -105,7 +105,7 @@ impl TestCodexBuilder {
             Some(home) => home,
             None => Arc::new(TempDir::new()?),
         };
-        self.build_with_home(server, home, None).await
+        Box::pin(self.build_with_home(server, home, None)).await
     }

     pub async fn build_with_streaming_server(
@@ -117,8 +117,7 @@ impl TestCodexBuilder {
             Some(home) => home,
             None => Arc::new(TempDir::new()?),
         };
-        self.build_with_home_and_base_url(format!("{base_url}/v1"), home, None)
-            .await
+        Box::pin(self.build_with_home_and_base_url(format!("{base_url}/v1"), home, None)).await
     }

     pub async fn build_with_websocket_server(
@@ -139,8 +138,7 @@ impl TestCodexBuilder {
                 .enable(Feature::ResponsesWebsockets)
                 .expect("test config should allow feature update");
         }));
-        self.build_with_home_and_base_url(base_url, home, None)
-            .await
+        Box::pin(self.build_with_home_and_base_url(base_url, home, None)).await
     }

     pub async fn resume(
@@ -149,7 +147,7 @@ impl TestCodexBuilder {
         home: Arc<TempDir>,
         rollout_path: PathBuf,
     ) -> anyhow::Result<TestCodex> {
-        self.build_with_home(server, home, Some(rollout_path)).await
+        Box::pin(self.build_with_home(server, home, Some(rollout_path))).await
     }

     async fn build_with_home(
@@ -160,7 +158,7 @@ impl TestCodexBuilder {
     ) -> anyhow::Result<TestCodex> {
         let base_url = format!("{}/v1", server.uri());
         let (config, cwd) = self.prepare_config(base_url, &home).await?;
-        self.build_from_config(config, cwd, home, resume_from).await
+        Box::pin(self.build_from_config(config, cwd, home, resume_from)).await
     }

     async fn build_with_home_and_base_url(
@@ -170,7 +168,7 @@ impl TestCodexBuilder {
         resume_from: Option<PathBuf>,
     ) -> anyhow::Result<TestCodex> {
         let (config, cwd) = self.prepare_config(base_url, &home).await?;
-        self.build_from_config(config, cwd, home, resume_from).await
+        Box::pin(self.build_from_config(config, cwd, home, resume_from)).await
     }

     async fn build_from_config(
@@ -201,11 +199,14 @@ impl TestCodexBuilder {
         let new_conversation = match resume_from {
             Some(path) => {
                 let auth_manager = codex_core::test_support::auth_manager_from_auth(auth);
-                thread_manager
-                    .resume_thread_from_rollout(config.clone(), path, auth_manager)
-                    .await?
+                Box::pin(thread_manager.resume_thread_from_rollout(
+                    config.clone(),
+                    path,
+                    auth_manager,
+                ))
+                .await?
             }
-            None => thread_manager.start_thread(config.clone()).await?,
+            None => Box::pin(thread_manager.start_thread(config.clone())).await?,
         };

         Ok(TestCodex {
```