test: deflake nextest child-process leak in MCP harnesses (#11263)

## Summary
- add deterministic child-process cleanup to both test `McpProcess`
helpers
- keep Tokio `kill_on_drop(true)` but also reap via bounded `try_wait()`
polling in `Drop`
- document the failure mode and why this avoids nondeterministic `LEAK`
flakes

## Why
`cargo nextest` leak detection can intermittently report `LEAK` when a
spawned server outlives test teardown, making CI flaky.

## Testing
- `just fmt`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`


## Failing CI Reference
- Original failing job:
https://github.com/openai/codex/actions/runs/21845226299/job/63039443593?pr=11245
This commit is contained in:
Josh McKinney
2026-02-09 19:43:24 -08:00
committed by GitHub
parent 862ab63071
commit de59e550c0
2 changed files with 60 additions and 0 deletions

View File

@@ -862,3 +862,33 @@ impl McpProcess {
}
}
}
impl Drop for McpProcess {
fn drop(&mut self) {
// These tests spawn a `codex-app-server` child process.
//
// We keep that child alive for the test and rely on Tokio's `kill_on_drop(true)` when this
// helper is dropped. Tokio documents kill-on-drop as best-effort: dropping requests
// termination, but it does not guarantee the child has fully exited and been reaped before
// teardown continues.
//
// That makes cleanup timing nondeterministic. Leak detection can occasionally observe the
// child still alive at teardown and report `LEAK`, which makes the test flaky.
//
// Drop can't be async, so we do a bounded synchronous cleanup:
//
// 1. Request termination with `start_kill()`.
// 2. Poll `try_wait()` until the OS reports the child exited, with a short timeout.
let _ = self.process.start_kill();
let start = std::time::Instant::now();
let timeout = std::time::Duration::from_secs(5);
while start.elapsed() < timeout {
match self.process.try_wait() {
Ok(Some(_)) => return,
Ok(None) => std::thread::sleep(std::time::Duration::from_millis(10)),
Err(_) => return,
}
}
}
}

View File

@@ -361,3 +361,33 @@ impl McpProcess {
}
}
}
impl Drop for McpProcess {
fn drop(&mut self) {
// These tests spawn a `codex-mcp-server` child process.
//
// We keep that child alive for the test and rely on Tokio's `kill_on_drop(true)` when this
// helper is dropped. Tokio documents kill-on-drop as best-effort: dropping requests
// termination, but it does not guarantee the child has fully exited and been reaped before
// teardown continues.
//
// That makes cleanup timing nondeterministic. Leak detection can occasionally observe the
// child still alive at teardown and report `LEAK`, which makes the test flaky.
//
// Drop can't be async, so we do a bounded synchronous cleanup:
//
// 1. Request termination with `start_kill()`.
// 2. Poll `try_wait()` until the OS reports the child exited, with a short timeout.
let _ = self.process.start_kill();
let start = std::time::Instant::now();
let timeout = std::time::Duration::from_secs(5);
while start.elapsed() < timeout {
match self.process.try_wait() {
Ok(Some(_)) => return,
Ok(None) => std::thread::sleep(std::time::Duration::from_millis(10)),
Err(_) => return,
}
}
}
}