Load requirements from Codex Backend. It only does this for enterprise
customers signed in with ChatGPT.
Todo in follow-up PRs:
* Add to app-server and exec too
* Switch from fail-open to fail-closed on failure
This PR configures Codex CLI so it can be built with
[Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
includes configuration so that remote builds can be done using
[BuildBuddy](https://www.buildbuddy.io).
If you are familiar with Bazel, things should work as you expect, e.g.,
run `bazel test //... --keep-going` to run all the tests in the repo,
but we have also added some new aliases in the `justfile` for
convenience:
- `just bazel-test` to run tests locally
- `just bazel-remote-test` to run tests remotely (currently, the remote
build is for x86_64 Linux regardless of your host platform). Note we are
currently seeing the following test failures in the remote build, so we
still need to figure out what is happening here:
```
failures:
suite::compact::manual_compact_twice_preserves_latest_user_messages
suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
```
- `just build-for-release` to build release binaries for all
platforms/architectures remotely
To setup remote execution:
- [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
employees should also request org access at
https://openai.buildbuddy.io/join/ with their `@openai.com` email
address.)
- [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
`~/.bazelrc` (add the line `build
--remote_header=x-buildbuddy-api-key=YOUR_KEY`)
- Use `--config=remote` in your `bazel` invocations (or add `common
--config=remote` to your `~/.bazelrc`, or use the `just` commands)
## CI
In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
(we are working on supporting Windows, but that is not ready yet). Note
that the failures we are seeing in `just bazel-remote-test` do not occur
on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
is green right now.
The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
that macOS CI jobs build _remotely_ on Linux hosts (using the
`docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
root `BUILD.bazel`) using cross-compilation to build the macOS
artifacts. Then these artifacts are downloaded locally to GitHub's macOS
runner so the tests can be executed natively. This is the relevant
config that enables this:
```
common:macos --config=remote
common:macos --strategy=remote
common:macos --strategy=TestRunner=darwin-sandbox,local
```
Because of the remote caching benefits we get from BuildBuddy, these new
CI jobs can be extremely fast! For example, consider these two jobs that
ran all the tests on Linux x86_64:
- Bazel 1m37s
https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
- Cargo 9m20s
https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
For now, we will continue to run both the Bazel and Cargo jobs for PRs,
but once we add support for Windows and running Clippy, we should be
able to cutover to using Bazel exclusively for PRs, which should still
speed things up considerably. We will probably continue to run the Cargo
jobs post-merge for commits that land on `main` as a sanity check.
Release builds will also continue to be done by Cargo for now.
Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
Earlier attempt to add support for Buck2, now abandoned:
https://github.com/openai/codex/pull/8504
---------
Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
Co-authored-by: Michael Bolin <mbolin@openai.com>
https://github.com/openai/codex/pull/8879 introduced the
`find_resource!` macro, but now that I am about to use it in more
places, I realize that it should take care of this normalization case
for callers.
Note the `use $crate::path_absolutize::Absolutize;` line is there so
that users of `find_resource!` do not have to explicitly include
`path-absolutize` to their own `Cargo.toml`.
To support Bazelification in https://github.com/openai/codex/pull/8875,
this PR introduces a new `find_resource!` macro that we use in place of
our existing logic in tests that looks for resources relative to the
compile-time `CARGO_MANIFEST_DIR` env var.
To make this work, we plan to add the following to all `rust_library()`
and `rust_test()` Bazel rules in the project:
```
rustc_env = {
"BAZEL_PACKAGE": native.package_name(),
},
```
Our new `find_resource!` macro reads this value via
`option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
using `find_resource!`_ is injected into the code expanded from the
macro. (If `find_resource()` were a function, then
`option_env!("BAZEL_PACKAGE")` would always be
`codex-rs/utils/cargo-bin`, which is not what we want.)
Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
environment variable is set at runtime, indicating that the test is
being run by Bazel. In this case, we have to concatenate the runtime
`RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
path to the resource.
In testing this change, I discovered one funky edge case in
`codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
(but not canonicalize!) the result from `find_resource!` because the
path contains a `common/..` component that does not exist on disk when
the test is run under Bazel, so it must be semantically normalized using
the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
before it is passed to `dotslash fetch`.
Because this new behavior may be non-obvious, this PR also updates
`AGENTS.md` to make humans/Codex aware that this API is preferred.
The Bazelification work in-flight over at
https://github.com/openai/codex/pull/8832 needs this fix so that Bazel
can find the path to the DotSlash file for `bash`.
With this change, the following almost works:
```
bazel test --test_output=errors //codex-rs/exec-server:exec-server-all-test
```
That is, now the `list_tools` test passes, but
`accept_elicitation_for_prompt_rule` still fails because it runs
Seatbelt itself, so it needs to be run outside Bazel's local sandboxing.
This PR introduces a `codex-utils-cargo-bin` utility crate that
wraps/replaces our use of `assert_cmd::Command` and
`escargot::CargoBuild`.
As you can infer from the introduction of `buck_project_root()` in this
PR, I am attempting to make it possible to build Codex under
[Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
achieve faster incremental local builds (largely due to Buck2's
[dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
build strategy, as well as benefits from its local build daemon) as well
as faster CI builds if we invest in remote execution and caching.
See
https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
for more details about the performance advantages of Buck2.
Buck2 enforces stronger requirements in terms of build and test
isolation. It discourages assumptions about absolute paths (which is key
to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
variables that Cargo provides are absolute paths (which
`assert_cmd::Command` reads), this is a problem for Buck2, which is why
we need this `codex-utils-cargo-bin` utility.
My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
passed to a `rust_test()` build rule as relative paths.
`codex-utils-cargo-bin` will resolve these values to absolute paths,
when necessary.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
* #8498
* __->__ #8496
https://github.com/openai/codex/pull/8354 added support for in-repo
`.config/` files, so this PR updates the logic for loading `*.rules`
files to load `*.rules` files from all relevant layers. The main change
to the business logic is `load_exec_policy()` in
`codex-rs/core/src/exec_policy.rs`.
Note this adds a `config_folder()` method to `ConfigLayerSource` that
returns `Option<AbsolutePathBuf>` so that it is straightforward to
iterate over the sources and get the associated config folder, if any.
Historically, `accept_elicitation_for_prompt_rule()` was flaky because
we were using a notification to update the sandbox followed by a `shell`
tool request that we expected to be subject to the new sandbox config,
but because [rmcp](https://crates.io/crates/rmcp) MCP servers delegate
each incoming message to a new Tokio task, messages are not guaranteed
to be processed in order, so sometimes the `shell` tool call would run
before the notification was processed.
Prior to this PR, we relied on a generous `sleep()` between the
notification and the request to reduce the change of the test flaking
out.
This PR implements a proper fix, which is to use a _request_ instead of
a notification for the sandbox update so that we can wait for the
response to the sandbox request before sending the request to the
`shell` tool call. Previously, `rmcp` did not support custom requests,
but I fixed that in
https://github.com/modelcontextprotocol/rust-sdk/pull/590, which made it
into the `0.12.0` release (see #8288).
This PR updates `shell-tool-mcp` to expect
`"codex/sandbox-state/update"` as a _request_ instead of a notification
and sends the appropriate ack. Note this behavior is tied to our custom
`codex/sandbox-state` capability, which Codex honors as an MCP client,
which is why `core/src/mcp_connection_manager.rs` had to be updated as
part of this PR, as well.
This PR also updates the docs at `shell-tool-mcp/README.md`.
The existing version of `shell-tool-mcp/README.md` was not written in a
way that was meant to be consumed by end-users. This is now fixed.
Added `codex-rs/exec-server/README.md` for the more technical bits.
We decided that `*.rules` is a more fitting (and concise) file extension
than `*.codexpolicy`, so we are changing the file extension for the
"execpolicy" effort. We are also changing the subfolder of `$CODEX_HOME`
from `policy` to `rules` to match.
This PR updates the in-repo docs and we will update the public docs once
the next CLI release goes out.
Locally, I created `~/.codex/rules/default.rules` with the following
contents:
```
prefix_rule(pattern=["gh", "pr", "view"])
```
And then I asked Codex to run:
```
gh pr view 7888 --json title,body,comments
```
and it was able to!
Let's see if this `sleep()` call is good enough to fix the test
flakiness we currently see in CI. It will take me some time to upstream
a proper fix, and I would prefer not to disable this test in the
interim.
When I originally introduced `accept_elicitation_for_prompt_rule()` in
https://github.com/openai/codex/pull/7617, it worked for me locally
because I had run `codex-rs/exec-server/tests/suite/bash` once myself,
which had the side-effect of installing the corresponding DotSlash
artifact.
In CI, I added explicit logic to do this as part of
`.github/workflows/rust-ci.yml`, which meant the test also passed in CI,
but this logic should have been done as part of the test so that it
would work locally for devs who had not installed the DotSlash artifact
for `codex-rs/exec-server/tests/suite/bash` before. This PR updates the
test to do this (and deletes the setup logic from `rust-ci.yml`),
creating a new `DOTSLASH_CACHE` in a temp directory so that this is
handled independently for each test.
While here, also added a check to ensure that the `codex` binary has
been built prior to running the test, as we have to ensure it is
symlinked as `codex-linux-sandbox` on Linux in order for the integration
test to work on that platform.
helpful in the future if we want more granularity for requesting
escalated permissions:
e.g when running in readonly sandbox, model can request to escalate to a
sandbox that allows writes
This changes our default Landlock policy to allow `sendmsg(2)` and
`recvmsg(2)` syscalls. We believe these were originally denied out of an
abundance of caution, but given that `send(2)` nor `recv(2)` are allowed
today [which provide comparable capability to the `*msg` equivalents],
we do not believe allowing them grants any privileges beyond what we
already allow.
Rather than using the syscall as the security boundary, preventing
access to the potentially hazardous file descriptor in the first place
seems like the right layer of defense.
In particular, this makes it possible for `shell-tool-mcp` to run on
Linux when using a read-only sandbox for the Bash process, as
demonstrated by `accept_elicitation_for_prompt_rule()` now succeeding in
CI.
This PR introduces integration tests that run
[codex-shell-tool-mcp](https://www.npmjs.com/package/@openai/codex-shell-tool-mcp)
as a user would. Note that this requires running our fork of Bash, so we
introduce a [DotSlash](https://dotslash-cli.com/) file for `bash` so
that we can run the integration tests on multiple platforms without
having to check the binaries into the repository. (As noted in the
DotSlash file, it is slightly more heavyweight than necessary, which may
be worth addressing as disk space in CI is limited:
https://github.com/openai/codex/pull/7678.)
To start, this PR adds two tests:
- `list_tools()` makes the `list_tools` request to the MCP server and
verifies we get the expected response
- `accept_elicitation_for_prompt_rule()` defines a `prefix_rule()` with
`decision="prompt"` and verifies the elicitation flow works as expected
Though the `accept_elicitation_for_prompt_rule()` test **only works on
Linux**, as this PR reveals that there are currently issues when running
the Bash fork in a read-only sandbox on Linux. This will have to be
fixed in a follow-up PR.
Incidentally, getting this test run to correctly on macOS also requires
a recent fix we made to `brew` that hasn't hit a mainline release yet,
so getting CI green in this PR required
https://github.com/openai/codex/pull/7680.
Previous to this change, large `EscalateRequest` payloads exceeded the
kernel send buffer, causing our single `sendmsg(2)` call (with attached
FDs) to be split and retried without proper control handling; this led
to `EINVAL`/broken pipe in the
`handle_escalate_session_respects_run_in_sandbox_decision()` test when
using an `env` with large contents.
**Before:** `AsyncSocket::send_with_fds()` called `send_json_message()`,
which called `send_message_bytes()`, which made one `socket.sendmsg()`
call followed by additional `socket.send()` calls, as necessary:
2e4a402521/codex-rs/exec-server/src/posix/socket.rs (L198-L209)
**After:** `AsyncSocket::send_with_fds()` now calls
`send_stream_frame()`, which calls `send_stream_chunk()` one or more
times. Each call to `send_stream_chunk()` calls `socket.sendmsg()`.
In the previous implementation, the subsequent `socket.send()` writes
had no control information associated with them, whereas in the new
`send_stream_chunk()` implementation, a fresh `MsgHdr` (using
`with_control()`, as appropriate) is created for `socket.sendmsg()` each
time.
Additionally, with this PR, stream sending attaches `SCM_RIGHTS` only on
the first chunk, and omits control data when there are no FDs, allowing
oversized payloads to deliver correctly while preserving FD limits and
error checks.
I find it helpful to easily verify which version is running.
Tested:
```shell
~/code/codex3/codex-rs/exec-server$ cargo run --bin codex-exec-mcp-server -- --help
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.19s
Running `/Users/mbolin/code/codex3/codex-rs/target/debug/codex-exec-mcp-server --help`
Usage: codex-exec-mcp-server [OPTIONS]
Options:
--execve <EXECVE_WRAPPER> Executable to delegate execve(2) calls to in Bash
--bash <BASH_PATH> Path to Bash that has been patched to support execve() wrapping
-h, --help Print help
-V, --version Print version
~/code/codex3/codex-rs/exec-server$ cargo run --bin codex-exec-mcp-server -- --version
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
Running `/Users/mbolin/code/codex3/codex-rs/target/debug/codex-exec-mcp-server --version`
codex-exec-server 0.0.0
```
This introduces a new feature to Codex when it operates as an MCP
_client_ where if an MCP _server_ replies that it has an entry named
`"codex/sandbox-state"` in its _server capabilities_, then Codex will
send it an MCP notification with the following structure:
```json
{
"method": "codex/sandbox-state/update",
"params": {
"sandboxPolicy": {
"type": "workspace-write",
"network-access": false,
"exclude-tmpdir-env-var": false
"exclude-slash-tmp": false
},
"codexLinuxSandboxExe": null,
"sandboxCwd": "/Users/mbolin/code/codex2"
}
}
```
or with whatever values are appropriate for the initial `sandboxPolicy`.
**NOTE:** Codex _should_ continue to send the MCP server notifications
of the same format if these things change over the lifetime of the
thread, but that isn't wired up yet.
The result is that `shell-tool-mcp` can consume these values so that
when it calls `codex_core::exec::process_exec_tool_call()` in
`codex-rs/exec-server/src/posix/escalate_server.rs`, it is now sure to
call it with the correct values (whereas previously we relied on
hardcoded values).
While I would argue this is a supported use case within the MCP
protocol, the `rmcp` crate that we are using today does not support
custom notifications. As such, I had to patch it and I submitted it for
review, so hopefully it will be accepted in some form:
https://github.com/modelcontextprotocol/rust-sdk/pull/556
To test out this change from end-to-end:
- I ran `cargo build` in `~/code/codex2/codex-rs/exec-server`
- I built the fork of Bash in `~/code/bash/bash`
- I added the following to my `~/.codex/config.toml`:
```toml
# Use with `codex --disable shell_tool`.
[mcp_servers.execshell]
args = ["--bash", "/Users/mbolin/code/bash/bash"]
command = "/Users/mbolin/code/codex2/codex-rs/target/debug/codex-exec-mcp-server"
```
- From `~/code/codex2/codex-rs`, I ran `just codex --disable shell_tool`
- When the TUI started up, I verified that the sandbox mode is
`workspace-write`
- I ran `/mcp` to verify that the shell tool from the MCP is there:
<img width="1387" height="1400" alt="image"
src="https://github.com/user-attachments/assets/1a8addcc-5005-4e16-b59f-95cfd06fd4ab"
/>
- Then I asked it:
> what is the output of `gh issue list`
because this should be auto-approved with our existing dummy policy:
af63e6eccc/codex-rs/exec-server/src/posix.rs (L157-L164)
And it worked:
<img width="1387" height="1400" alt="image"
src="https://github.com/user-attachments/assets/7568d2f7-80da-4d68-86d0-c265a6f5e6c1"
/>
`process_exec_tool_call()` was taking `SandboxType` as a param, but in
practice, the only place it was constructed was in
`codex_message_processor.rs` where it was derived from the other
`sandbox_policy` param, so this PR inlines the logic that decides the
`SandboxType` into `process_exec_tool_call()`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/7122).
* #7112
* __->__ #7122
The unified exec tool has a `login` option that defaults to `true`:
3bdcbc7292/codex-rs/core/src/tools/handlers/unified_exec.rs (L35-L36)
This updates the `ExecParams` for `shell-tool-mcp` to support the same
parameter. Note it is declared as `Option<bool>` to ensure it is marked
optional in the generated JSON schema.
Previously, we were running into an issue where we would run the `shell`
tool call with a timeout of 10s, but it fired an elicitation asking for
user approval, the time the user took to respond to the elicitation was
counted agains the 10s timeout, so the `shell` tool call would fail with
a timeout error unless the user is very fast!
This PR addresses this issue by introducing a "stopwatch" abstraction
that is used to manage the timeout. The idea is:
- `Stopwatch::new()` is called with the _real_ timeout of the `shell`
tool call.
- `process_exec_tool_call()` is called with the `Cancellation` variant
of `ExecExpiration` because it should not manage its own timeout in this
case
- the `Stopwatch` expiration is wired up to the `cancel_rx` passed to
`process_exec_tool_call()`
- when an elicitation for the `shell` tool call is received, the
`Stopwatch` pauses
- because it is possible for multiple elicitations to arrive
concurrently, it keeps track of the number of "active pauses" and does
not resume until that counter goes down to zero
I verified that I can test the MCP server using
`@modelcontextprotocol/inspector` and specify `git status` as the
`command` with a timeout of 500ms and that the elicitation pops up and I
have all the time in the world to respond whereas previous to this PR,
that would not have been possible.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6973).
* #7005
* __->__ #6973
* #6972
This updates `ExecParams` so that instead of taking `timeout_ms:
Option<u64>`, it now takes a more general cancellation mechanism,
`ExecExpiration`, which is an enum that includes a
`Cancellation(tokio_util::sync::CancellationToken)` variant.
If the cancellation token is fired, then `process_exec_tool_call()`
returns in the same way as if a timeout was exceeded.
This is necessary so that in #6973, we can manage the timeout logic
external to the `process_exec_tool_call()` because we want to "suspend"
the timeout when an elicitation from a human user is pending.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972).
* #7005
* #6973
* __->__ #6972
This PR reorganizes things slightly so that:
- Instead of a single multitool executable, `codex-exec-server`, we now
have two executables:
- `codex-exec-mcp-server` to launch the MCP server
- `codex-execve-wrapper` is the `execve(2)` wrapper to use with the
`BASH_EXEC_WRAPPER` environment variable
- `BASH_EXEC_WRAPPER` must be a single executable: it cannot be a
command string composed of an executable with args (i.e., it no longer
adds the `escalate` subcommand, as before)
- `codex-exec-mcp-server` takes `--bash` and `--execve` as options.
Though if `--execve` is not specified, the MCP server will check the
directory containing `std::env::current_exe()` and attempt to use the
file named `codex-execve-wrapper` within it. In development, this works
out since these executables are side-by-side in the `target/debug`
folder.
With respect to testing, this also fixes an important bug in
`dummy_exec_policy()`, as I was using `ends_with()` as if it applied to
a `String`, but in this case, it is used with a `&Path`, so the
semantics are slightly different.
Putting this all together, I was able to test this by running the
following:
```
~/code/codex/codex-rs$ npx @modelcontextprotocol/inspector \
./target/debug/codex-exec-mcp-server --bash ~/code/bash/bash
```
If I try to run `git status` in `/Users/mbolin/code/codex` via the
`shell` tool from the MCP server:
<img width="1589" height="1335" alt="image"
src="https://github.com/user-attachments/assets/9db6aea8-7fbc-4675-8b1f-ec446685d6c4"
/>
then I get prompted with the following elicitation, as expected:
<img width="1589" height="1335" alt="image"
src="https://github.com/user-attachments/assets/21b68fe0-494d-4562-9bad-0ddc55fc846d"
/>
Though a current limitation is that the `shell` tool defaults to a
timeout of 10s, which means I only have 10s to respond to the
elicitation. Ideally, the time spent waiting for a response from a human
should not count against the timeout for the command execution. I will
address this in a subsequent PR.
---
Note `~/code/bash/bash` was created by doing:
```
cd ~/code
git clone https://github.com/bminor/bash
cd bash
git checkout a8a1c2fac029404d3f42cd39f5a20f24b6e4fe4b
<apply the patch below>
./configure
make
```
The patch:
```
diff --git a/execute_cmd.c b/execute_cmd.c
index 070f5119..d20ad2b9 100644
--- a/execute_cmd.c
+++ b/execute_cmd.c
@@ -6129,6 +6129,19 @@ shell_execve (char *command, char **args, char **env)
char sample[HASH_BANG_BUFSIZ];
size_t larray;
+ char* exec_wrapper = getenv("BASH_EXEC_WRAPPER");
+ if (exec_wrapper && *exec_wrapper && !whitespace (*exec_wrapper))
+ {
+ char *orig_command = command;
+
+ larray = strvec_len (args);
+
+ memmove (args + 2, args, (++larray) * sizeof (char *));
+ args[0] = exec_wrapper;
+ args[1] = orig_command;
+ command = exec_wrapper;
+ }
+
```
This PR introduces an extra layer of abstraction to prepare us for the
migration to execpolicy2:
- introduces a new trait, `EscalationPolicy`, whose `determine_action()`
method is responsible for producing the `EscalateAction`
- the existing `ExecPolicy` typedef is changed to return an intermediate
`ExecPolicyOutcome` instead of `EscalateAction`
- the default implementation of `EscalationPolicy`,
`McpEscalationPolicy`, composes `ExecPolicy`
- the `ExecPolicyOutcome` includes `codex_execpolicy2::Decision`, which
has a `Prompt` variant
- when `McpEscalationPolicy` gets `Decision::Prompt` back from
`ExecPolicy`, it prompts the user via an MCP elicitation and maps the
result into an `ElicitationAction`
- now that the end user can reply to an elicitation with `Decline` or
`Cancel`, we introduce a new variant, `EscalateAction::Deny`, which the
client handles by returning exit code `1` without running anything
Note the way the elicitation is created is still not quite right, but I
will fix that once we have things running end-to-end for real in a
follow-up PR.