Commit Graph

11 Commits

Author SHA1 Message Date
Michael Bolin
491a3058f6 fix(exec-server): retain output until streams close (#18946)
## Why

A Mac Bazel run hit a flake in
`server::handler::tests::output_and_exit_are_retained_after_notification_receiver_closes`
where the read path observed process exit but lost the expected buffered
stdout (`first\nsecond\n`). See the [GitHub Actions
job](https://github.com/openai/codex/actions/runs/24758468552/job/72436716505)
and [BuildBuddy
invocation](https://app.buildbuddy.io/invocation/37475a12-4ef2-45fb-ab8a-e49a2aba1d59).

The underlying race is that process exit is not the same thing as
stdout/stderr closure. If a child or grandchild inherits the pipe write
end, or a process duplicates it with `dup2`, the watched process can
exit while the stream is still open and more output can still arrive.
The exec-server was starting exited-process retention cleanup from the
exit event, so the process entry could be removed before the output
streams had actually closed.

While stress-testing the exec-server unit suite,
`server::handler::tests::long_poll_read_fails_after_session_resume`
exposed a separate test race: it started a short-lived command that
could exit and wake the pending long-poll read before the session-resume
assertion observed the resumed-session error. That test is intended to
cover resume eviction, not process-exit delivery, so this change keeps
the process alive and quiet while the second connection resumes the
session.

## What changed

- Keep exec-server process entries retained until stdout/stderr streams
close, then start the post-exit retention timer from the closed event.
- Wake long-poll readers when the closed event is emitted.
- Add focused `local_process` unit coverage that proves late output is
still retained after the short test retention interval has elapsed, and
that closed process entries are eventually evicted.
- Add a local and remote regression test where a parent exits while a
child keeps inherited stdout open. The child waits on an explicit
release file, so the test deterministically observes exit first,
releases the child, then requires a nonzero-wait read from the exit
sequence to receive the late output.
- In `codex-rs/exec-server/src/server/handler/tests.rs`, make
`long_poll_read_fails_after_session_resume` run a long-lived silent
command instead of a short command that prints and exits. This isolates
the test to session-resume behavior and prevents a normal process exit
from satisfying the pending long-poll read first.

## Testing

- `cargo test -p codex-exec-server
exec_process_retains_output_after_exit_until_streams_close`
- `cargo test -p codex-exec-server local_process::tests`
- `cargo test -p codex-exec-server`
- `just fix -p codex-exec-server`
- `bazel test //codex-rs/exec-server:exec-server-unit-tests
//codex-rs/exec-server:exec-server-exec_process-test
//codex-rs/exec-server:exec-server-file_system-test
//codex-rs/exec-server:exec-server-http_client-test
//codex-rs/exec-server:exec-server-initialize-test
//codex-rs/exec-server:exec-server-process-test
//codex-rs/exec-server:exec-server-websocket-test`
- `bazel test --runs_per_test=25
//codex-rs/exec-server:exec-server-unit-tests`

## Documentation

No docs update needed; this is an internal exec-server correctness fix.
2026-04-23 19:49:58 +00:00
Michael Bolin
d3dd0d759b exec-server: expose arg0 alias root to fs sandbox (#19016)
## Why

The post-merge `rust-ci-full` run for #18999 still failed the Ubuntu
remote `suite::remote_env` sandboxed filesystem tests. That run checked
out merge commit `ddde50c611e4800cb805f243ed3c50bbafe7d011`, so the arg0
guard lifetime fix was present.

The Docker-backed failure had two remaining pieces:

- The sandboxed filesystem helper needs to execute Codex through the
`codex-linux-sandbox` arg0 alias path. The helper sandbox was only
granting read access to the real Codex executable parent, so the alias
parent also has to be visible inside the helper sandbox.
- The remote-env tests were building sandbox contexts with
`FileSystemSandboxContext::new()`, which captures the local test runner
cwd. In the Docker remote exec-server, that host checkout path does not
exist, so spawning the filesystem helper failed with `No such file or
directory` before the helper could process the request.

## What Changed

- Track all helper runtime read roots instead of a single root.
- Add both the real Codex executable parent and the
`codex-linux-sandbox` alias parent to sandbox readable roots.
- Avoid sending an unused local cwd in remote filesystem sandbox
contexts when the permission profile has no cwd-dependent entries.
- Build the Docker remote-env test sandbox contexts with a cwd path that
exists inside the container.
- Add unit coverage for the alias-parent root and remote sandbox cwd
handling.

## Verification

- `cargo test -p codex-exec-server`
- `cargo test -p codex-core
remote_test_env_sandboxed_read_allows_readable_root`
- `just fix -p codex-exec-server`
- `just fix -p codex-core`
2026-04-22 21:34:22 +00:00
efrazer-oai
be75785504 fix: fully revert agent identity runtime wiring (#18757)
## Summary

This PR fully reverts the previously merged Agent Identity runtime
integration from the old stack:
https://github.com/openai/codex/pull/17387/changes

It removes the Codex-side task lifecycle wiring, rollout/session
persistence, feature flag plumbing, lazy `auth.json` mutation,
background task auth paths, and request callsite changes introduced by
that stack.

This leaves the repo in a clean pre-AgentIdentity integration state so
the follow-up PRs can reintroduce the pieces in smaller reviewable
layers.

## Stack

1. This PR: full revert
2. https://github.com/openai/codex/pull/18871: move Agent Identity
business logic into a crate
3. https://github.com/openai/codex/pull/18785: add explicit
AgentIdentity auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate auth callsites
through AuthProvider

## Testing

Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
2026-04-21 14:30:55 -07:00
Ahmed Ibrahim
2ca270d08d [2/8] Support piped stdin in exec process API (#18086)
## Summary
- Add an explicit stdin mode to process/start.
- Keep normal non-interactive exec stdin closed while allowing
pipe-backed processes.

## Stack
```text
o  #18027 [8/8] Fail exec client operations after disconnect
│
o  #18025 [7/8] Cover MCP stdio tests with executor placement
│
o  #18089 [6/8] Wire remote MCP stdio through executor
│
o  #18088 [5/8] Add executor process transport for MCP stdio
│
o  #18087 [4/8] Abstract MCP stdio server launching
│
o  #18020 [3/8] Add pushed exec process events
│
@  #18086 [2/8] Support piped stdin in exec process API
│
o  #18085 [1/8] Add MCP server environment config
│
o  main
```

Co-authored-by: Codex <noreply@openai.com>
2026-04-16 10:30:10 -07:00
jif-oai
bacb92b1d7 Build remote exec env from exec-server policy (#17216)
## Summary
- add an exec-server `envPolicy` field; when present, the server starts
from its own process env and applies the shell environment policy there
- keep `env` as the exact environment for local/embedded starts, but
make it an overlay for remote unified-exec starts
- move the shell-environment-policy builder into `codex-config` so Core
and exec-server share the inherit/filter/set/include behavior
- overlay only runtime/sandbox/network deltas from Core onto the
exec-server-derived env

## Why
Remote unified exec was materializing the shell env inside Core and
forwarding the whole map to exec-server, so remote processes could
inherit the orchestrator machine's `HOME`, `PATH`, etc. This keeps the
base env on the executor while preserving Core-owned runtime additions
like `CODEX_THREAD_ID`, unified-exec defaults, network proxy env, and
sandbox marker env.

## Validation
- `just fmt`
- `git diff --check`
- `cargo test -p codex-exec-server --lib`
- `cargo test -p codex-core --lib unified_exec::process_manager::tests`
- `cargo test -p codex-core --lib exec_env::tests`
- `cargo test -p codex-core --lib exec_env_tests` (compile-only; filter
matched 0 tests)
- `cargo test -p codex-config --lib shell_environment` (compile-only;
filter matched 0 tests)
- `just bazel-lock-update`

## Known local validation issue
- `just bazel-lock-check` is not runnable in this checkout: it invokes
`./scripts/check-module-bazel-lock.sh`, which is missing.

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: pakrym-oai <pakrym@openai.com>
2026-04-13 09:59:08 +01:00
starr-openai
d626dc3895 Run exec-server fs operations through sandbox helper (#17294)
## Summary
- run exec-server filesystem RPCs requiring sandboxing through a
`codex-fs` arg0 helper over stdin/stdout
- keep direct local filesystem execution for `DangerFullAccess` and
external sandbox policies
- remove the standalone exec-server binary path in favor of top-level
arg0 dispatch/runtime paths
- add sandbox escape regression coverage for local and remote filesystem
paths

## Validation
- `just fmt`
- `git diff --check`
- remote devbox: `cd codex-rs && bazel test --bes_backend=
--bes_results_url= //codex-rs/exec-server:all` (6/6 passed)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-12 18:36:03 -07:00
Adrian
39cc85310f Add use_agent_identity feature flag (#17385) 2026-04-11 09:52:06 -07:00
Eric Traut
824ec94eab Fix Windows exec-server output test flake (#17409)
Problem: The Windows exec-server test command could let separator
whitespace become part of `echo` output, making the exact
retained-output assertion flaky.

Solution: Tighten the Windows `cmd.exe` command by placing command
separators directly after the echoed tokens so stdout remains
deterministic while preserving the exact assertion.
2026-04-10 19:24:40 -07:00
jif-oai
085ffb4456 feat: move exec-server ownership (#16344)
This introduces session-scoped ownership for exec-server so ws
disconnects no longer immediately kill running remote exec processes,
and it prepares the protocol for reconnect-based resume.
- add session_id / resume_session_id to the exec-server initialize
handshake
  - move process ownership under a shared session registry
- detach sessions on websocket disconnect and expire them after a TTL
instead of killing processes immediately (we will resume based on this)
- allow a new connection to resume an existing session and take over
notifications/ownership
- I use UUID to make them not predictable as we don't have auth for now
- make detached-session expiry authoritative at resume time so teardown
wins at the TTL boundary
- reject long-poll process/read calls that get resumed out from under an
older attachment

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-10 14:11:47 +01:00
jif-oai
6d2f4aaafc feat: use ProcessId in exec-server (#15866)
Use a full struct for the ProcessId to increase readability and make it
easier in the future to make it evolve if needed
2026-03-26 16:45:36 +01:00
starr-openai
1d210f639e Add exec-server exec RPC implementation (#15090)
Stacked PR 2/3, based on the stub PR.

Adds the exec RPC implementation and process/event flow in exec-server
only.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-19 19:00:36 +00:00