mirror of
https://github.com/openai/codex.git
synced 2026-04-24 14:45:27 +00:00
test: deflake nextest child-process leak in MCP harnesses (#11263)
## Summary - add deterministic child-process cleanup to both test `McpProcess` helpers - keep Tokio `kill_on_drop(true)` but also reap via bounded `try_wait()` polling in `Drop` - document the failure mode and why this avoids nondeterministic `LEAK` flakes ## Why `cargo nextest` leak detection can intermittently report `LEAK` when a spawned server outlives test teardown, making CI flaky. ## Testing - `just fmt` - `cargo test -p codex-app-server` - `cargo test -p codex-mcp-server` ## Failing CI Reference - Original failing job: https://github.com/openai/codex/actions/runs/21845226299/job/63039443593?pr=11245
This commit is contained in:
@@ -862,3 +862,33 @@ impl McpProcess {
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Drop for McpProcess {
|
||||
fn drop(&mut self) {
|
||||
// These tests spawn a `codex-app-server` child process.
|
||||
//
|
||||
// We keep that child alive for the test and rely on Tokio's `kill_on_drop(true)` when this
|
||||
// helper is dropped. Tokio documents kill-on-drop as best-effort: dropping requests
|
||||
// termination, but it does not guarantee the child has fully exited and been reaped before
|
||||
// teardown continues.
|
||||
//
|
||||
// That makes cleanup timing nondeterministic. Leak detection can occasionally observe the
|
||||
// child still alive at teardown and report `LEAK`, which makes the test flaky.
|
||||
//
|
||||
// Drop can't be async, so we do a bounded synchronous cleanup:
|
||||
//
|
||||
// 1. Request termination with `start_kill()`.
|
||||
// 2. Poll `try_wait()` until the OS reports the child exited, with a short timeout.
|
||||
let _ = self.process.start_kill();
|
||||
|
||||
let start = std::time::Instant::now();
|
||||
let timeout = std::time::Duration::from_secs(5);
|
||||
while start.elapsed() < timeout {
|
||||
match self.process.try_wait() {
|
||||
Ok(Some(_)) => return,
|
||||
Ok(None) => std::thread::sleep(std::time::Duration::from_millis(10)),
|
||||
Err(_) => return,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -361,3 +361,33 @@ impl McpProcess {
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Drop for McpProcess {
|
||||
fn drop(&mut self) {
|
||||
// These tests spawn a `codex-mcp-server` child process.
|
||||
//
|
||||
// We keep that child alive for the test and rely on Tokio's `kill_on_drop(true)` when this
|
||||
// helper is dropped. Tokio documents kill-on-drop as best-effort: dropping requests
|
||||
// termination, but it does not guarantee the child has fully exited and been reaped before
|
||||
// teardown continues.
|
||||
//
|
||||
// That makes cleanup timing nondeterministic. Leak detection can occasionally observe the
|
||||
// child still alive at teardown and report `LEAK`, which makes the test flaky.
|
||||
//
|
||||
// Drop can't be async, so we do a bounded synchronous cleanup:
|
||||
//
|
||||
// 1. Request termination with `start_kill()`.
|
||||
// 2. Poll `try_wait()` until the OS reports the child exited, with a short timeout.
|
||||
let _ = self.process.start_kill();
|
||||
|
||||
let start = std::time::Instant::now();
|
||||
let timeout = std::time::Duration::from_secs(5);
|
||||
while start.elapsed() < timeout {
|
||||
match self.process.try_wait() {
|
||||
Ok(Some(_)) => return,
|
||||
Ok(None) => std::thread::sleep(std::time::Duration::from_millis(10)),
|
||||
Err(_) => return,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user