refactor: narrow async lock guard lifetimes (#18211)

Follow-up to https://github.com/openai/codex/pull/18178, where we called out enabling the await-holding lint as a follow-up.

The long-term goal is to enable Clippy coverage for async guards held across awaits. This PR is intentionally only the first, low-risk cleanup pass: it narrows obvious lock guard lifetimes and leaves `codex-rs/Cargo.toml` unchanged so the lint is not enabled until the remaining cases are fixed or explicitly justified. It intentionally leaves the active-turn/turn-state locking pattern alone because those checks and mutations need to stay atomic.

## Common fixes used here

These are the main patterns reviewers should expect in this PR, and they are also the patterns to reach for when fixing future `await_holding_*` findings:

- **Scope the guard to the synchronous work.** If the code only needs data from a locked value, move the lock into a small block, clone or compute the needed values, and do the later `.await` after the block.
- **Use direct one-line mutations when there is no later await.** Cases like `map.lock().await.remove(&id)` are acceptable when the guard is only needed for that single mutation and the statement ends before any async work.
- **Drain or clone work out of the lock before notifying or awaiting.** For example, the JS REPL drains pending exec senders into a local vector and the websocket writer clones buffered envelopes before it serializes or sends them.
- **Use a `Semaphore` only when serialization is intentional across async work.** The test serialization guards intentionally span awaited setup or execution, so using a semaphore communicates "one at a time" without holding a mutex guard.
- **Remove the mutex when there is only one owner.** The PTY stdin writer task owns `stdin` directly; the old `Arc<Mutex<_>>` did not protect shared access because nothing else had access to the writer.
- **Do not split locks that protect an atomic invariant.** This PR deliberately leaves active-turn/turn-state paths alone because those checks and mutations need to stay atomic. Those cases should be fixed separately with a design change or documented with `#[expect]`.

## What changed

- Narrow scoped async mutex guards in app-server, JS REPL, network approval, remote-control websocket, and the RMCP test server.
- Replace test-only async mutex serialization guards with semaphores where the guard intentionally lives across async work.
- Let the PTY pipe writer task own stdin directly instead of wrapping it in an async mutex.

## Verification

- `just fix -p codex-core -p codex-app-server -p codex-rmcp-client -p codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness`
- `just clippy -p codex-core`
- `cargo test -p codex-core -p codex-app-server -p codex-rmcp-client -p codex-shell-escalation -p codex-utils-pty -p codex-utils-readiness` was run; the app-server suite passed, and `codex-core` failed in the local sandbox on six otel approval tests plus `suite::user_shell_cmd::user_shell_command_does_not_set_network_sandbox_env_var`, which appear to depend on local command approval/default rules and `CODEX_SANDBOX_NETWORK_DISABLED=1` in this environment.
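
## Illustrative sketch

The "scope the guard", "one-line mutation", and "drain before notifying" patterns from the common fixes can be sketched as follows. This is a minimal illustration, not code from this PR: `Pending`, `register`, `cancel_all`, and `is_unlocked` are hypothetical names, and `std::sync::Mutex` stands in for the async mutex so the sketch compiles with the standard library alone. In the async version, the notification loop is where the `.await` would sit.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;
use std::sync::Mutex;

/// Hypothetical registry of pending work, keyed by id (not a type from this PR).
pub struct Pending {
    senders: Mutex<HashMap<u64, Sender<String>>>,
}

impl Pending {
    pub fn new() -> Self {
        Self {
            senders: Mutex::new(HashMap::new()),
        }
    }

    /// One-line mutation: the temporary guard is dropped when the statement ends.
    pub fn register(&self, id: u64, tx: Sender<String>) {
        self.senders.lock().unwrap().insert(id, tx);
    }

    /// Drain the pending senders while holding the lock, then notify them
    /// only after the guard has been dropped. Returns how many were notified.
    pub fn cancel_all(&self, reason: &str) -> usize {
        let drained: HashMap<u64, Sender<String>> = {
            let mut guard = self.senders.lock().unwrap();
            // Swap the map for an empty one; the guard is only needed for this.
            std::mem::take(&mut *guard)
        }; // guard dropped here, before any slow or awaited work

        // In async code, this loop is where `.await` would appear.
        let mut notified = 0;
        for tx in drained.into_values() {
            if tx.send(reason.to_string()).is_ok() {
                notified += 1;
            }
        }
        notified
    }

    /// Reports whether the lock is currently free (useful for checking the sketch).
    pub fn is_unlocked(&self) -> bool {
        self.senders.try_lock().is_ok()
    }
}
```

Because the guard lives only inside the inner block, nothing that runs during notification can contend with it, which is the property the await-holding lints are meant to enforce.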
2026-05-03 10:56:37 +00:00 · 2026-04-17 14:06:50 -07:00
parent ecc8599c56
commit 1265df0ec2
11 changed files with 194 additions and 162 deletions
--- a/codex-rs/utils/readiness/src/lib.rs
+++ b/codex-rs/utils/readiness/src/lib.rs
@@ -277,17 +277,36 @@ mod tests {

    #[tokio::test]
    async fn subscribe_returns_error_when_lock_is_held() {
-        let flag = ReadinessFlag::new();
-        let _guard = flag
-            .tokens
-            .try_lock()
-            .expect("initial lock acquisition should succeed");
+        let flag = Arc::new(ReadinessFlag::new());
+        let (locked_tx, locked_rx) = std::sync::mpsc::channel();
+        let (release_tx, release_rx) = std::sync::mpsc::channel();
+        let lock_thread = {
+            let flag = Arc::clone(&flag);
+            std::thread::spawn(move || {
+                let _guard = flag.tokens.blocking_lock();
+                locked_tx
+                    .send(())
+                    .expect("test should receive lock acquisition notification");
+                release_rx
+                    .recv()
+                    .expect("test should release held readiness lock");
+            })
+        };
+        locked_rx
+            .recv()
+            .expect("test should observe held readiness lock");

        let err = flag
            .subscribe()
            .await
            .expect_err("contended subscribe should report a lock failure");
        assert_matches!(err, ReadinessError::TokenLockFailed);
+        release_tx
+            .send(())
+            .expect("test should release readiness lock thread");
+        lock_thread
+            .join()
+            .expect("readiness lock thread should not panic");
    }

    #[tokio::test]