Problem: The Windows exec-server test command could let separator
whitespace become part of `echo` output, making the exact
retained-output assertion flaky.
Solution: Tighten the Windows `cmd.exe` command by placing command
separators directly after the echoed tokens so stdout remains
deterministic while preserving the exact assertion.
This introduces session-scoped ownership for exec-server so ws
disconnects no longer immediately kill running remote exec processes,
and it prepares the protocol for reconnect-based resume.
- add session_id / resume_session_id to the exec-server initialize
handshake
- move process ownership under a shared session registry
- detach sessions on websocket disconnect and expire them after a TTL
instead of killing processes immediately (we will resume based on this)
- allow a new connection to resume an existing session and take over
notifications/ownership
- I use UUID to make them not predictable as we don't have auth for now
- make detached-session expiry authoritative at resume time so teardown
wins at the TTL boundary
- reject long-poll process/read calls that get resumed out from under an
older attachment
---------
Co-authored-by: Codex <noreply@openai.com>
When running with remote executor the cwd is the remote path. Today we
check for existence of a local directory on startup and attempt to load
config from it.
For remote executors don't do that.
## Summary
- add optional `sandboxPolicy` support to the app-server filesystem
request surface
- thread sandbox-aware filesystem options through app-server and
exec-server adapters
- enforce sandboxed read/write access in the filesystem abstraction with
focused local and remote coverage
## Validation
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-exec-server file_system`
- `cargo test -p codex-app-server suite::v2::fs`
---------
Co-authored-by: Codex <noreply@openai.com>
- Migrate apply-patch verification and application internals to use the
async `ExecutorFileSystem` abstraction from `exec-server`.
- Convert apply-patch `cwd` handling to `AbsolutePathBuf` through the
verifier/parser/handler boundary.
Doesn't change how the tool itself works.
## Summary
- make AGENTS.md discovery and loading fully FS-aware and remove the
non-FS discover helper
- migrate remote-aware codex-core tests to use TestEnv workspace setup
instead of syncing a local workspace copy
- add AGENTS.md corner-case coverage, including directory fallbacks and
remote-aware integration coverage
## Testing
- cargo test -p codex-core project_doc -- --nocapture
- cargo test -p codex-core hierarchical_agents -- --nocapture
- cargo test -p codex-core agents_md -- --nocapture
- cargo test -p codex-tui status -- --nocapture
- cargo test -p codex-tui-app-server status -- --nocapture
- just fix
- just fmt
- just bazel-lock-update
- just bazel-lock-check
- just argument-comment-lint
- remote Linux executor tests in progress via scripts/test-remote-env.sh
## Summary
- make `CODEX_EXEC_SERVER_URL=none` map to an explicit disabled
environment mode instead of inferring from a missing URL
- expose environment capabilities (`exec_enabled`, `filesystem_enabled`)
so tool building can gate behavior explicitly and future
multi-environment work has a clearer seam
- suppress env-backed tools when the relevant capability is unavailable,
including exec tools, `js_repl`, `apply_patch`, `list_dir`, and
`view_image`
- keep handler/runtime backstops so disabled environments still reject
execution if a tool path somehow bypasses registration
## Testing
- `just fmt`
- `cargo test -p codex-exec-server`
- `cargo test -p codex-tools
disabled_environment_omits_environment_backed_tools`
- `cargo test -p codex-tools
environment_capabilities_gate_exec_and_filesystem_tools_independently`
- remote devbox Bazel build via `codex-applied-devbox`:
`//codex-rs/cli:cli`
## Summary
- split `models-manager` out of `core` and add `ModelsManagerConfig`
plus `Config::to_models_manager_config()` so model metadata paths stop
depending on `core::Config`
- move login-owned/auth-owned code out of `core` into `codex-login`,
move model provider config into `codex-model-provider-info`, move API
bridge mapping into `codex-api`, move protocol-owned types/impls into
`codex-protocol`, and move response debug helpers into a dedicated
`response-debug-context` crate
- move feedback tag emission into `codex-feedback`, relocate tests to
the crates that now own the code, and keep broad temporary re-exports so
this PR avoids a giant import-only rewrite
## Major moves and decisions
- created `codex-models-manager` as the owner for model
cache/catalog/config/model info logic, including the new
`ModelsManagerConfig` struct
- created `codex-model-provider-info` as the owner for provider config
parsing/defaults and kept temporary `codex-login`/`codex-core`
re-exports for old import paths
- moved `api_bridge` error mapping + `CoreAuthProvider` into
`codex-api`, while `codex-login::api_bridge` temporarily re-exports
those symbols and keeps the `auth_provider_from_auth` wrapper
- moved `auth_env_telemetry` and `provider_auth` ownership to
`codex-login`
- moved `CodexErr` ownership to `codex-protocol::error`, plus
`StreamOutput`, `bytes_to_string_smart`, and network policy helpers to
protocol-owned modules
- created `codex-response-debug-context` for
`extract_response_debug_context`, `telemetry_transport_error_message`,
and related response-debug plumbing instead of leaving that behavior in
`core`
- moved `FeedbackRequestTags`, `emit_feedback_request_tags`, and
`emit_feedback_request_tags_with_auth_env` to `codex-feedback`
- deferred removal of temporary re-exports and the mechanical import
rewrites to a stacked follow-up PR so this PR stays reviewable
## Test moves
- moved auth refresh coverage from `core/tests/suite/auth_refresh.rs` to
`login/tests/suite/auth_refresh.rs`
- moved text encoding coverage from
`core/tests/suite/text_encoding_fix.rs` to
`protocol/src/exec_output_tests.rs`
- moved model info override coverage from
`core/tests/suite/model_info_overrides.rs` to
`models-manager/src/model_info_overrides_tests.rs`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`argument-comment-lint` was green in CI even though the repo still had
many uncommented literal arguments. The main gap was target coverage:
the repo wrapper did not force Cargo to inspect test-only call sites, so
examples like the `latest_session_lookup_params(true, ...)` tests in
`codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
This change cleans up the existing backlog, makes the default repo lint
path cover all Cargo targets, and starts rolling that stricter CI
enforcement out on the platform where it is currently validated.
## What changed
- mechanically fixed existing `argument-comment-lint` violations across
the `codex-rs` workspace, including tests, examples, and benches
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
`tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
`--all-targets` unless the caller explicitly narrows the target set
- fixed both wrappers so forwarded cargo arguments after `--` are
preserved with a single separator
- documented the new default behavior in
`tools/argument-comment-lint/README.md`
- updated `rust-ci` so the macOS lint lane keeps the plain wrapper
invocation and therefore enforces `--all-targets`, while Linux and
Windows temporarily pass `-- --lib --bins`
That temporary CI split keeps the stricter all-targets check where it is
already cleaned up, while leaving room to finish the remaining Linux-
and Windows-specific target-gated cleanup before enabling
`--all-targets` on those runners. The Linux and Windows failures on the
intermediate revision were caused by the wrapper forwarding bug, not by
additional lint findings in those lanes.
## Validation
- `bash -n tools/argument-comment-lint/run.sh`
- `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
- shell-level wrapper forwarding check for `-- --lib --bins`
- shell-level wrapper forwarding check for `-- --tests`
- `just argument-comment-lint`
- `cargo test` in `tools/argument-comment-lint`
- `cargo test -p codex-terminal-detection`
## Follow-up
- Clean up remaining Linux-only target-gated callsites, then switch the
Linux lint lane back to the plain wrapper invocation.
- Clean up remaining Windows-only target-gated callsites, then switch
the Windows lint lane back to the plain wrapper invocation.
This PR partially rebase `unified_exec` on the `exec-server` and adapt
the `exec-server` accordingly.
## What changed in `exec-server`
1. Replaced the old "broadcast-driven; process-global" event model with
process-scoped session events. The goal is to be able to have dedicated
handler for each process.
2. Add to protocol contract to support explicit lifecycle status and
stream ordering:
- `WriteResponse` now returns `WriteStatus` (Accepted, UnknownProcess,
StdinClosed, Starting) instead of a bool.
- Added seq fields to output/exited notifications.
- Added terminal process/closed notification.
3. Demultiplexed remote notifications into per-process channels. Same as
for the event sys
4. Local and remote backends now both implement ExecBackend.
5. Local backend wraps internal process ID/operations into per-process
ExecProcess objects.
6. Remote backend registers a session channel before launch and
unregisters on failed launch.
## What changed in `unified_exec`
1. Added unified process-state model and backend-neutral process
wrapper. This will probably disappear in the future, but it makes it
easier to keep the work flowing on both side.
- `UnifiedExecProcess` now handles both local PTY sessions and remote
exec-server processes through a shared `ProcessHandle`.
- Added `ProcessState` to track has_exited, exit_code, and terminal
failure message consistently across backends.
2. Routed write and lifecycle handling through process-level methods.
## Some rationals
1. The change centralizes execution transport in exec-server while
preserving policy and orchestration ownership in core, avoiding
duplicated launch approval logic. This comes from internal discussion.
2. Session-scoped events remove coupling/cross-talk between processes
and make stream ordering and terminal state explicit (seq, closed,
failed).
3. The failure-path surfacing (remote launch failures, write failures,
transport disconnects) makes command tool output and cleanup behavior
deterministic
## Follow-ups:
* Unify the concept of thread ID behind an obfuscated struct
* FD handling
* Full zsh-fork compatibility
* Full network sandboxing compatibility
* Handle ws disconnection
Add environment manager that is a singleton and is created early in
app-server (before skill manager, before config loading).
Use an environment variable to point to a running exec server.
## Summary
- match the exec-process structure to filesystem PR #15232
- expose `ExecProcess` on `Environment`
- make `LocalProcess` the real implementation and `RemoteProcess` a thin
network proxy over `ExecServerClient`
- make `ProcessHandler` a thin RPC adapter delegating to `LocalProcess`
- add a shared local/remote process test
## Validation
- `just fmt`
- `CARGO_TARGET_DIR=~/.cache/cargo-target/codex cargo test -p
codex-exec-server`
- `just fix -p codex-exec-server`
---------
Co-authored-by: Codex <noreply@openai.com>
For each feature we have:
1. Trait exposed on environment
2. **Local Implementation** of the trait
3. Remote implementation that uses the client to proxy via network
4. Handler implementation that handles PRC requests and calls into
**Local Implementation**
- Move core/src/terminal.rs and its tests into a standalone
terminal-detection workspace crate.
- Update direct consumers to depend on codex-terminal-detection and
import terminal APIs directly.
---------
Co-authored-by: Codex <noreply@openai.com>
Stacked PR 2/3, based on the stub PR.
Adds the exec RPC implementation and process/event flow in exec-server
only.
---------
Co-authored-by: Codex <noreply@openai.com>
The idea is that codex-exec exposes an Environment struct with services
on it. Each of those is a trait.
Depending on construction parameters passed to Environment they are
either backed by local or remote server but core doesn't see these
differences.
Summary
- delete the deprecated stdio transport plumbing from the exec server
stack
- add a basic `exec_server()` harness plus test utilities to start a
server, send requests, and await events
- refresh exec-server dependencies, configs, and documentation to
reflect the new flow
Testing
- Not run (not requested)
---------
Co-authored-by: starr-openai <starr@openai.com>
Co-authored-by: Codex <noreply@openai.com>
Stacked PR 1/3.
This is the initialize-only exec-server stub slice: binary/client
scaffolding and protocol docs, without exec/filesystem implementation.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
We already plan to remove the shell-tool MCP path, and doing that
cleanup first makes the follow-on `shell-escalation` work much simpler.
This change removes the last remaining reason to keep
`codex-rs/exec-server` around by moving the `codex-execve-wrapper`
binary and shared shell test fixtures to the crates/tests that now own
that functionality.
## What Changed
### Delete `codex-rs/exec-server`
- Remove the `exec-server` crate, including the MCP server binary,
MCP-specific modules, and its test support/test suite
- Remove `exec-server` from the `codex-rs` workspace and update
`Cargo.lock`
### Move `codex-execve-wrapper` into `codex-rs/shell-escalation`
- Move the wrapper implementation into `shell-escalation`
(`src/unix/execve_wrapper.rs`)
- Add the `codex-execve-wrapper` binary entrypoint under
`shell-escalation/src/bin/`
- Update `shell-escalation` exports/module layout so the wrapper
entrypoint is hosted there
- Move the wrapper README content from `exec-server` to
`shell-escalation/README.md`
### Move shared shell test fixtures to `app-server`
- Move the DotSlash `bash`/`zsh` test fixtures from
`exec-server/tests/suite/` to `app-server/tests/suite/`
- Update `app-server` zsh-fork tests to reference the new fixture paths
### Keep `shell-tool-mcp` as a shell-assets package
- Update `.github/workflows/shell-tool-mcp.yml` packaging so the npm
artifact contains only patched Bash/Zsh payloads (no Rust binaries)
- Update `shell-tool-mcp/package.json`, `shell-tool-mcp/src/index.ts`,
and docs to reflect the shell-assets-only package shape
- `shell-tool-mcp-ci.yml` does not need changes because it is already
JS-only
## Verification
- `cargo shear`
- `cargo clippy -p codex-shell-escalation --tests`
- `just clippy`
## Why
Shell execution refactoring in `exec-server` had become split between
duplicated code paths, which blocked a clean introduction of the new
reusable shell escalation flow. This commit creates a dedicated
foundation crate so later shell tooling changes can share one
implementation.
## What changed
- Added the `codex-shell-escalation` crate and moved the core escalation
pieces (`mcp` protocol/socket/session flow, policy glue) that were
previously in `exec-server` into it.
- Normalized `exec-server` Unix structure under a dedicated `unix`
module layout and kept non-Unix builds narrow.
- Wired crate/build metadata so `shell-escalation` is a first-class
workspace dependency for follow-on integration work.
## Verification
- Built and linted the stack at this commit point with `just clippy`.
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12556).
* #12584
* #12583
* __->__ #12556
## Why
The current escalate path in `codex-rs/exec-server` still had policy
creation coupled to MCP details, which makes it hard to reuse the shell
execution flow outside the MCP server. This change is part of a broader
goal to split MCP-specific behavior from shared escalation execution so
other handlers (for example a future `ShellCommandHandler`) can reuse it
without depending on MCP request context types.
## What changed
- Added a new `EscalationPolicyFactory` abstraction in `mcp.rs`:
- `crate`-relative path: `codex-rs/exec-server/src/posix/mcp.rs`
-
https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L87-L107
- Made `run_escalate_server` in `mcp.rs` accept a policy factory instead
of constructing `McpEscalationPolicy` directly.
-
https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L178-L201
- Introduced `McpEscalationPolicyFactory` that stores MCP-only state
(`RequestContext`, `preserve_program_paths`) and implements the new
trait.
-
https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L100-L117
- Updated `shell()` to pass a `McpEscalationPolicyFactory` instance into
`run_escalate_server`, so the server remains the MCP-specific wiring
layer.
-
https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L163-L170
## Verification
- Build and test execution was not re-run in this pass; changes are
limited to `mcp.rs` and preserve the existing escalation flow semantics
by only extracting policy construction behind a factory.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12555).
* #12556
* __->__ #12555
## Why
The zsh integration tests were still brittle in two ways:
- they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so
they often did not exercise the patched zsh fork that `shell-tool-mcp`
ships
- once the tests consistently used the vendored zsh fork, they exposed
real Linux-specific zsh-fork issues in CI
In particular, the Linux failures were not just test noise:
- the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux
`codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode
could receive malformed arguments
- the
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
test uses the zsh exec bridge (which talks to the parent over a Unix
socket), but Linux restricted sandbox seccomp denies `connect(2)`,
causing timeouts on `ubuntu-24.04` x86/arm
This PR makes the zsh tests consistently run against the intended
vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI
signal is meaningful.
## What Changed
- Added a single shared test-only DotSlash file for the patched zsh fork
at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing
`bash` test resource).
- Updated both app-server and exec-server zsh tests to use that shared
DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH`
dependency).
- Updated the app-server zsh-fork test helper to resolve the shared
DotSlash zsh and avoid silently falling back to host zsh.
- Kept the app-server zsh-fork tests configured via `config.toml`, using
a test wrapper path where needed to force `zsh -df` (and rewrite `-lc`
to `-c`) for the subcommand-decline test.
- Hardened the app-server subcommand-decline zsh-fork test for CI
variability:
- tolerate an extra `/responses` POST with a no-op mock response
- tolerate non-target approval ordering while remaining strict on the
two `/usr/bin/true` approvals and decline behavior
- use `DangerFullAccess` on Linux for this one test because it validates
zsh approval flow, not Linux sandbox socket restrictions
- Fixed zsh-fork process launching on Linux by preserving `req.arg0` in
`ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox`
arg0 dispatch continues to work.
- Moved `maybe_run_zsh_exec_wrapper_mode()` under
`arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode
handling coexists correctly with arg0-dispatched helper modes.
- Consolidated duplicated `dotslash -- fetch` resolution logic into
shared test support (`core/tests/common/lib.rs`).
- Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to
use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh
differences by:
- resolving an absolute `git` path
- running `git init --quiet .`
- asserting success / `.git` creation instead of relying on banner text
## Verification
- `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture`
- `cargo test -p codex-exec-server accept_elicitation -- --nocapture`
- `bazel test //codex-rs/exec-server:exec-server-all-test
--test_output=streamed --test_arg=--nocapture
--test_arg=accept_elicitation_for_prompt_rule_with_zsh`
- CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 -
x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm -
aarch64-unknown-linux-gnu` passed in [run
22291424358](https://github.com/openai/codex/actions/runs/22291424358)
## Why
`codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
from `codex-protocol` and `codex-shell-command`. That made it easy for
workspace crates to import those APIs through `codex-core`, which in
turn hides dependency edges and makes it harder to reduce compile-time
coupling over time.
This change removes those public re-exports so call sites must import
from the source crates directly. Even when a crate still depends on
`codex-core` today, this makes dependency boundaries explicit and
unblocks future work to drop `codex-core` dependencies where possible.
## What Changed
- Removed public re-exports from `codex-rs/core/src/lib.rs` for:
- `codex_protocol::protocol` and related protocol/model types (including
`InitialHistory`)
- `codex_protocol::config_types` (`protocol_config_types`)
- `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
parse_command, powershell}`
- Migrated workspace Rust call sites to import directly from:
- `codex_protocol::protocol`
- `codex_protocol::config_types`
- `codex_protocol::models`
- `codex_shell_command`
- Added explicit `Cargo.toml` dependencies (`codex-protocol` /
`codex-shell-command`) in crates that now import those crates directly.
- Kept `codex-core` internal modules compiling by using `pub(crate)`
aliases in `core/src/lib.rs` (internal-only, not part of the public
API).
- Updated the two utility crates that can already drop a `codex-core`
dependency edge entirely:
- `codex-utils-approval-presets`
- `codex-utils-cli`
## Verification
- `cargo test -p codex-utils-approval-presets`
- `cargo test -p codex-utils-cli`
- `cargo check --workspace --all-targets`
- `just clippy`
## Summary
Simplify network approvals by removing per-attempt proxy correlation and
moving to session-level approval dedupe keyed by (host, protocol, port).
Instead of encoding attempt IDs into proxy credentials/URLs, we now
treat approvals as a destination policy decision.
- Concurrent calls to the same destination share one approval prompt.
- Different destinations (or same host on different ports) get separate
prompts.
- Allow once approves the current queued request group only.
- Allow for session caches that (host, protocol, port) and auto-allows
future matching requests.
- Never policy continues to deny without prompting.
Example:
- 3 calls:
- a.com (line 443)
- b.com (line 443)
- a.com (line 443)
=> 2 prompts total (a, b), second a waits on the first decision.
- a.com:80 is treated separately from a.com line 443
## Testing
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-core tools::network_approval::tests`
- `cargo test -p codex-core` (unit tests pass; existing
integration-suite failures remain in this environment)
### Description
#### Summary
Introduces the core plumbing required for structured network approvals
#### What changed
- Added structured network policy decision modeling in core.
- Added approval payload/context types needed for network approval
semantics.
- Wired shell/unified-exec runtime plumbing to consume structured
decisions.
- Updated related core error/event surfaces for structured handling.
- Updated protocol plumbing used by core approval flow.
- Included small CLI debug sandbox compatibility updates needed by this
layer.
#### Why
establishes the minimal backend foundation for network approvals without
yet changing high-level orchestration or TUI behavior.
#### Notes
- Behavior remains constrained by existing requirements/config gating.
- Follow-up PRs in the stack handle orchestration, UX, and app-server
integration.
---------
Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
`SandboxPolicy::ReadOnly` previously implied broad read access and could
not express a narrower read surface.
This change introduces an explicit read-access model so we can support
user-configurable read restrictions in follow-up work, while preserving
current behavior today.
It also ensures unsupported backends fail closed for restricted-read
policies instead of silently granting broader access than intended.
## What
- Added `ReadOnlyAccess` in protocol with:
- `Restricted { include_platform_defaults, readable_roots }`
- `FullAccess`
- Updated `SandboxPolicy` to carry read-access configuration:
- `ReadOnly { access: ReadOnlyAccess }`
- `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
- Preserved existing behavior by defaulting current construction paths
to `ReadOnlyAccess::FullAccess`.
- Threaded the new fields through sandbox policy consumers and call
sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
related tests.
- Updated Seatbelt policy generation to honor restricted read roots by
emitting scoped read rules when full read access is not granted.
- Added fail-closed behavior on Linux and Windows backends when
restricted read access is requested but not yet implemented there
(`UnsupportedOperation`).
- Regenerated app-server protocol schema and TypeScript artifacts,
including `ReadOnlyAccess`.
## Compatibility / rollout
- Runtime behavior remains unchanged by default (`FullAccess`).
- API/schema changes are in place so future config wiring can enable
restricted read access without another policy-shape migration.
This PR adds the following field to `Config`:
```rust
pub network: Option<NetworkProxy>,
```
Though for the moment, it will always be initialized as `None` (this
will be addressed in a subsequent PR).
This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we span a process.
## Summary
This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
curent Linux sandbox path relies on in-process restrictions (including
Landlock). Bubblewrap gives us a more uniform filesystem isolation
model, especially explicit writable roots with the option to make some
directories read-only and granular network controls.
This is behind a feature flag so we can validate behavior safely before
making it the default.
- Added temporary rollout flag:
- `features.use_linux_sandbox_bwrap`
- Preserved existing default path when the flag is off.
- In Bubblewrap mode:
- Added internal retry without /proc when /proc mount is not permitted
by the host/container.
Load requirements from Codex Backend. It only does this for enterprise
customers signed in with ChatGPT.
Todo in follow-up PRs:
* Add to app-server and exec too
* Switch from fail-open to fail-closed on failure
This PR configures Codex CLI so it can be built with
[Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
includes configuration so that remote builds can be done using
[BuildBuddy](https://www.buildbuddy.io).
If you are familiar with Bazel, things should work as you expect, e.g.,
run `bazel test //... --keep-going` to run all the tests in the repo,
but we have also added some new aliases in the `justfile` for
convenience:
- `just bazel-test` to run tests locally
- `just bazel-remote-test` to run tests remotely (currently, the remote
build is for x86_64 Linux regardless of your host platform). Note we are
currently seeing the following test failures in the remote build, so we
still need to figure out what is happening here:
```
failures:
suite::compact::manual_compact_twice_preserves_latest_user_messages
suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
```
- `just build-for-release` to build release binaries for all
platforms/architectures remotely
To setup remote execution:
- [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
employees should also request org access at
https://openai.buildbuddy.io/join/ with their `@openai.com` email
address.)
- [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
`~/.bazelrc` (add the line `build
--remote_header=x-buildbuddy-api-key=YOUR_KEY`)
- Use `--config=remote` in your `bazel` invocations (or add `common
--config=remote` to your `~/.bazelrc`, or use the `just` commands)
## CI
In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
(we are working on supporting Windows, but that is not ready yet). Note
that the failures we are seeing in `just bazel-remote-test` do not occur
on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
is green right now.
The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
that macOS CI jobs build _remotely_ on Linux hosts (using the
`docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
root `BUILD.bazel`) using cross-compilation to build the macOS
artifacts. Then these artifacts are downloaded locally to GitHub's macOS
runner so the tests can be executed natively. This is the relevant
config that enables this:
```
common:macos --config=remote
common:macos --strategy=remote
common:macos --strategy=TestRunner=darwin-sandbox,local
```
Because of the remote caching benefits we get from BuildBuddy, these new
CI jobs can be extremely fast! For example, consider these two jobs that
ran all the tests on Linux x86_64:
- Bazel 1m37s
https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
- Cargo 9m20s
https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
For now, we will continue to run both the Bazel and Cargo jobs for PRs,
but once we add support for Windows and running Clippy, we should be
able to cutover to using Bazel exclusively for PRs, which should still
speed things up considerably. We will probably continue to run the Cargo
jobs post-merge for commits that land on `main` as a sanity check.
Release builds will also continue to be done by Cargo for now.
Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
Earlier attempt to add support for Buck2, now abandoned:
https://github.com/openai/codex/pull/8504
---------
Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
Co-authored-by: Michael Bolin <mbolin@openai.com>
https://github.com/openai/codex/pull/8879 introduced the
`find_resource!` macro, but now that I am about to use it in more
places, I realize that it should take care of this normalization case
for callers.
Note the `use $crate::path_absolutize::Absolutize;` line is there so
that users of `find_resource!` do not have to explicitly include
`path-absolutize` to their own `Cargo.toml`.
To support Bazelification in https://github.com/openai/codex/pull/8875,
this PR introduces a new `find_resource!` macro that we use in place of
our existing logic in tests that looks for resources relative to the
compile-time `CARGO_MANIFEST_DIR` env var.
To make this work, we plan to add the following to all `rust_library()`
and `rust_test()` Bazel rules in the project:
```
rustc_env = {
"BAZEL_PACKAGE": native.package_name(),
},
```
Our new `find_resource!` macro reads this value via
`option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
using `find_resource!`_ is injected into the code expanded from the
macro. (If `find_resource()` were a function, then
`option_env!("BAZEL_PACKAGE")` would always be
`codex-rs/utils/cargo-bin`, which is not what we want.)
Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
environment variable is set at runtime, indicating that the test is
being run by Bazel. In this case, we have to concatenate the runtime
`RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
path to the resource.
In testing this change, I discovered one funky edge case in
`codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
(but not canonicalize!) the result from `find_resource!` because the
path contains a `common/..` component that does not exist on disk when
the test is run under Bazel, so it must be semantically normalized using
the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
before it is passed to `dotslash fetch`.
Because this new behavior may be non-obvious, this PR also updates
`AGENTS.md` to make humans/Codex aware that this API is preferred.
The Bazelification work in-flight over at
https://github.com/openai/codex/pull/8832 needs this fix so that Bazel
can find the path to the DotSlash file for `bash`.
With this change, the following almost works:
```
bazel test --test_output=errors //codex-rs/exec-server:exec-server-all-test
```
That is, now the `list_tools` test passes, but
`accept_elicitation_for_prompt_rule` still fails because it runs
Seatbelt itself, so it needs to be run outside Bazel's local sandboxing.
This PR introduces a `codex-utils-cargo-bin` utility crate that
wraps/replaces our use of `assert_cmd::Command` and
`escargot::CargoBuild`.
As you can infer from the introduction of `buck_project_root()` in this
PR, I am attempting to make it possible to build Codex under
[Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
achieve faster incremental local builds (largely due to Buck2's
[dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
build strategy, as well as benefits from its local build daemon) as well
as faster CI builds if we invest in remote execution and caching.
See
https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
for more details about the performance advantages of Buck2.
Buck2 enforces stronger requirements in terms of build and test
isolation. It discourages assumptions about absolute paths (which is key
to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
variables that Cargo provides are absolute paths (which
`assert_cmd::Command` reads), this is a problem for Buck2, which is why
we need this `codex-utils-cargo-bin` utility.
My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
passed to a `rust_test()` build rule as relative paths.
`codex-utils-cargo-bin` will resolve these values to absolute paths,
when necessary.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
* #8498
* __->__ #8496
https://github.com/openai/codex/pull/8354 added support for in-repo
`.config/` files, so this PR updates the logic for loading `*.rules`
files to load `*.rules` files from all relevant layers. The main change
to the business logic is `load_exec_policy()` in
`codex-rs/core/src/exec_policy.rs`.
Note this adds a `config_folder()` method to `ConfigLayerSource` that
returns `Option<AbsolutePathBuf>` so that it is straightforward to
iterate over the sources and get the associated config folder, if any.
Historically, `accept_elicitation_for_prompt_rule()` was flaky because
we were using a notification to update the sandbox followed by a `shell`
tool request that we expected to be subject to the new sandbox config,
but because [rmcp](https://crates.io/crates/rmcp) MCP servers delegate
each incoming message to a new Tokio task, messages are not guaranteed
to be processed in order, so sometimes the `shell` tool call would run
before the notification was processed.
Prior to this PR, we relied on a generous `sleep()` between the
notification and the request to reduce the change of the test flaking
out.
This PR implements a proper fix, which is to use a _request_ instead of
a notification for the sandbox update so that we can wait for the
response to the sandbox request before sending the request to the
`shell` tool call. Previously, `rmcp` did not support custom requests,
but I fixed that in
https://github.com/modelcontextprotocol/rust-sdk/pull/590, which made it
into the `0.12.0` release (see #8288).
This PR updates `shell-tool-mcp` to expect
`"codex/sandbox-state/update"` as a _request_ instead of a notification
and sends the appropriate ack. Note this behavior is tied to our custom
`codex/sandbox-state` capability, which Codex honors as an MCP client,
which is why `core/src/mcp_connection_manager.rs` had to be updated as
part of this PR, as well.
This PR also updates the docs at `shell-tool-mcp/README.md`.
The existing version of `shell-tool-mcp/README.md` was not written in a
way that was meant to be consumed by end-users. This is now fixed.
Added `codex-rs/exec-server/README.md` for the more technical bits.
We decided that `*.rules` is a more fitting (and concise) file extension
than `*.codexpolicy`, so we are changing the file extension for the
"execpolicy" effort. We are also changing the subfolder of `$CODEX_HOME`
from `policy` to `rules` to match.
This PR updates the in-repo docs and we will update the public docs once
the next CLI release goes out.
Locally, I created `~/.codex/rules/default.rules` with the following
contents:
```
prefix_rule(pattern=["gh", "pr", "view"])
```
And then I asked Codex to run:
```
gh pr view 7888 --json title,body,comments
```
and it was able to!