mirror of https://github.com/openai/codex.git synced 2026-02-01 14:44:17 +00:00

Files

zbarsky-openai 2a06d64bc9 feat: add support for building with Bazel (#8875)

This PR configures Codex CLI so it can be built with
[Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
includes configuration so that remote builds can be done using
[BuildBuddy](https://www.buildbuddy.io).

If you are familiar with Bazel, things should work as you expect, e.g.,
run `bazel test //... --keep-going` to run all the tests in the repo,
but we have also added some new aliases in the `justfile` for
convenience:

- `just bazel-test` to run tests locally
- `just bazel-remote-test` to run tests remotely (currently, the remote
build is for x86_64 Linux regardless of your host platform). Note we are
currently seeing the following test failures in the remote build, so we
still need to figure out what is happening here:

```
failures:
    suite::compact::manual_compact_twice_preserves_latest_user_messages
    suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
    suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
```

- `just build-for-release` to build release binaries for all
platforms/architectures remotely

To set up remote execution:
- [Create a BuildBuddy account](https://app.buildbuddy.io/) (OpenAI
employees should also request org access at
https://openai.buildbuddy.io/join/ with their `@openai.com` email
address.)
- [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
`~/.bazelrc` (add the line `build
--remote_header=x-buildbuddy-api-key=YOUR_KEY`)
- Use `--config=remote` in your `bazel` invocations (or add `common
--config=remote` to your `~/.bazelrc`, or use the `just` commands)

## CI

In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
(we are working on supporting Windows, but that is not ready yet). Note
that the failures we are seeing in `just bazel-remote-test` do not occur
on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
is green right now.

The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
that macOS CI jobs build _remotely_ on Linux hosts (using the
`docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
root `BUILD.bazel`) using cross-compilation to build the macOS
artifacts. Then these artifacts are downloaded locally to GitHub's macOS
runner so the tests can be executed natively. This is the relevant
config that enables this:

```
common:macos --config=remote
common:macos --strategy=remote
common:macos --strategy=TestRunner=darwin-sandbox,local
```

Because of the remote caching benefits we get from BuildBuddy, these new
CI jobs can be extremely fast! For example, consider these two jobs that
ran all the tests on Linux x86_64:

- Bazel 1m37s
https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
- Cargo 9m20s
https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875

For now, we will continue to run both the Bazel and Cargo jobs for PRs,
but once we add support for Windows and running Clippy, we should be
able to cut over to using Bazel exclusively for PRs, which should still
speed things up considerably. We will probably continue to run the Cargo
jobs post-merge for commits that land on `main` as a sanity check.

Release builds will also continue to be done by Cargo for now.

Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
Earlier attempt to add support for Buck2, now abandoned:
https://github.com/openai/codex/pull/8504

---------

Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
Co-authored-by: Michael Bolin <mbolin@openai.com>

2026-01-09 11:09:43 -08:00

src

migrating execpolicy -> execpolicy-legacy and execpolicy2 -> execpolicy (#6956)

2025-11-19 19:14:10 -08:00

tests

migrating execpolicy -> execpolicy-legacy and execpolicy2 -> execpolicy (#6956)

2025-11-19 19:14:10 -08:00

BUILD.bazel

feat: add support for building with Bazel (#8875)

2026-01-09 11:09:43 -08:00

build.rs

migrating execpolicy -> execpolicy-legacy and execpolicy2 -> execpolicy (#6956)

2025-11-19 19:14:10 -08:00

Cargo.toml

chore: add cargo-deny configuration (#7119)

2025-11-24 12:22:18 -08:00

README.md

migrating execpolicy -> execpolicy-legacy and execpolicy2 -> execpolicy (#6956)

2025-11-19 19:14:10 -08:00

README.md

# codex-execpolicy-legacy

This crate hosts the original execpolicy implementation. The newer prefix-rule engine lives in codex-execpolicy.

The goal of this library is to classify a proposed execv(3) command into one of the following states:

- `safe`: The command is safe to run (*).
- `match`: The command matched a rule in the policy, but the caller should decide whether it is safe to run based on the files it will write.
- `forbidden`: The command is not allowed to be run.
- `unverified`: The safety cannot be determined: make the user decide.

(*) Whether an execv(3) call should be considered "safe" often requires additional context beyond the arguments to execv() itself. For example, if you trust an autonomous software agent to write files in your source tree, then deciding whether /bin/cp foo bar is "safe" depends on getcwd(3) for the calling process as well as the realpath of foo and bar when resolved against getcwd(). To that end, rather than returning a boolean, the validator returns a structured result that the client is expected to use to determine the "safety" of the proposed execv() call.

For example, to check the command ls -l foo, the checker would be invoked as follows:

cargo run -p codex-execpolicy-legacy -- check ls -l foo | jq

It will exit with 0 and print the following to stdout:

{
  "result": "safe",
  "match": {
    "program": "ls",
    "flags": [
      {
        "name": "-l"
      }
    ],
    "opts": [],
    "args": [
      {
        "index": 1,
        "type": "ReadableFile",
        "value": "foo"
      }
    ],
    "system_path": ["/bin/ls", "/usr/bin/ls"]
  }
}

Of note:

- `foo` is tagged as a `ReadableFile`, so the caller should resolve `foo` relative to `getcwd()` and `realpath` it (as it may be a symlink) to determine whether `foo` is safe to read.
- While the specified executable is `ls`, `"system_path"` offers `/bin/ls` and `/usr/bin/ls` as viable alternatives to avoid using whatever `ls` happens to appear first on the user's `$PATH`. If either exists on the host, it is recommended to use it as the first argument to `execv(3)` instead of `ls`.
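
A caller might act on these two points roughly as in the following minimal Python sketch. The helper names (`resolve_readable`, `pick_executable`) and the notion of a single `allowed_root` directory are assumptions made for illustration; they are not part of this crate.

```python
import os
from typing import List, Optional


def resolve_readable(value: str, cwd: str, allowed_root: str) -> Optional[str]:
    """Return the real path of `value` if it lies under `allowed_root`, else None."""
    # Resolve relative to the caller's working directory, then follow symlinks.
    real = os.path.realpath(os.path.join(cwd, value))
    root = os.path.realpath(allowed_root)
    # commonpath compares whole path components, so "/repo-evil" is not
    # mistaken for a child of "/repo" the way a string prefix check would be.
    if os.path.commonpath([real, root]) == root:
        return real
    return None


def pick_executable(program: str, system_path: List[str]) -> str:
    """Prefer the first existing, executable entry of system_path."""
    for candidate in system_path:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Assumption: fall back to the bare name (and therefore a $PATH lookup).
    return program
```

Whether falling back to the bare program name is acceptable is a policy decision for the caller; a stricter client could refuse to run the command when no `system_path` entry exists.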

Further, "safety" in this system is not a guarantee that the command will execute successfully. As an example, cat /Users/mbolin/code/codex/README.md may be considered "safe" if the system has decided the agent is allowed to read anything under /Users/mbolin/code/codex, but it will fail at runtime if README.md does not exist. (Though this is "safe" in that the agent did not read any files that it was not authorized to read.)

## Policy

Currently, the default policy is defined in default.policy within the crate.

The system uses Starlark as the file format because, unlike something like JSON or YAML, it supports "macros" without compromising on safety or reproducibility. (Under the hood, we use starlark-rust as the specific Starlark implementation.)

This policy contains "rules" such as:

define_program(
    program="cp",
    options=[
        flag("-r"),
        flag("-R"),
        flag("--recursive"),
    ],
    args=[ARG_RFILES, ARG_WFILE],
    system_path=["/bin/cp", "/usr/bin/cp"],
    should_match=[
        ["foo", "bar"],
    ],
    should_not_match=[
        ["foo"],
    ],
)

This rule means that:

- `cp` can be used with any of the following flags (where "flag" means "an option that does not take an argument"): `-r`, `-R`, `--recursive`.
- The initial `ARG_RFILES` passed to `args` means that it expects one or more arguments that correspond to "readable files."
- The final `ARG_WFILE` passed to `args` means that it expects exactly one argument that corresponds to a "writeable file."
- As a lightweight way to include a unit test alongside the definition, `should_match` is a list of example `execv(3)` args that should match the rule, and `should_not_match` is a list of examples that should not match. These examples are verified when the `.policy` file is loaded.
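
To make these matching semantics concrete, here is a toy Python sketch of how a positional pattern of "one or more readable files followed by exactly one writeable file" could be checked against the `should_match` and `should_not_match` examples. It is not the crate's matcher: `RFILES`, `WFILE`, `match_args`, and `check_examples` are hypothetical stand-ins, and flag handling is omitted entirely.

```python
from typing import List, Optional, Tuple

RFILES = "ReadableFiles"  # hypothetical stand-in for ARG_RFILES: one or more
WFILE = "WriteableFile"   # hypothetical stand-in for ARG_WFILE: exactly one


def match_args(pattern: List[str], args: List[str]) -> Optional[List[Tuple[str, str]]]:
    """Tag each arg with a type, or return None if args do not fit the pattern.

    Supports at most one variadic RFILES entry; every other entry takes one arg.
    """
    fixed = sum(1 for p in pattern if p != RFILES)
    variadic = len(pattern) - fixed
    if variadic > 1:
        raise ValueError("toy matcher supports at most one variadic entry")
    spare = len(args) - fixed  # args left over for the variadic entry
    if spare < variadic or (variadic == 0 and spare != 0):
        return None
    tagged: List[Tuple[str, str]] = []
    i = 0
    for p in pattern:
        take = spare if p == RFILES else 1
        kind = "ReadableFile" if p == RFILES else p
        for value in args[i:i + take]:
            tagged.append((kind, value))
        i += take
    return tagged


def check_examples(pattern, should_match, should_not_match) -> None:
    """Mimic a load-time self-check: raise if any example disagrees."""
    for example in should_match:
        if match_args(pattern, example) is None:
            raise ValueError(f"expected match: {example}")
    for example in should_not_match:
        if match_args(pattern, example) is not None:
            raise ValueError(f"expected no match: {example}")


# The examples from the cp rule above pass without raising.
check_examples([RFILES, WFILE], [["foo", "bar"]], [["foo"]])
```

Checking the examples at load time means a rule whose pattern drifts away from its stated intent fails fast rather than silently misclassifying commands.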

Note that the language of the .policy file is still evolving: we must keep expanding it until it is expressive enough to accept all commands we want to consider "safe" without letting unsafe commands through.

The integrity of default.policy is verified via unit tests.

Further, the CLI supports a --policy option to specify a custom .policy file for ad-hoc testing.

## Output Type: `match`

Going back to the cp example, because the rule matches an ARG_WFILE, it will return match instead of safe:

cargo run -p codex-execpolicy-legacy -- check cp src1 src2 dest | jq

If the caller wants to consider allowing this command, it should parse the JSON to pick out the WriteableFile arguments and decide whether they are safe to write:

{
  "result": "match",
  "match": {
    "program": "cp",
    "flags": [],
    "opts": [],
    "args": [
      {
        "index": 0,
        "type": "ReadableFile",
        "value": "src1"
      },
      {
        "index": 1,
        "type": "ReadableFile",
        "value": "src2"
      },
      {
        "index": 2,
        "type": "WriteableFile",
        "value": "dest"
      }
    ],
    "system_path": ["/bin/cp", "/usr/bin/cp"]
  }
}
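
As a minimal sketch of that parsing step, the Python below extracts the `WriteableFile` values from output shaped like the JSON above. The function name `writeable_targets` is hypothetical, and the sample string simply repeats the example output; deciding whether each target is actually safe to write is left to the caller.

```python
import json
from typing import List


def writeable_targets(check_output: str) -> List[str]:
    """Return the values of all WriteableFile args in a `match` result."""
    result = json.loads(check_output)
    if result.get("result") != "match":
        return []
    return [
        arg["value"]
        for arg in result["match"]["args"]
        if arg.get("type") == "WriteableFile"
    ]


sample = '''{"result": "match", "match": {"program": "cp", "flags": [], "opts": [],
  "args": [{"index": 0, "type": "ReadableFile", "value": "src1"},
           {"index": 1, "type": "ReadableFile", "value": "src2"},
           {"index": 2, "type": "WriteableFile", "value": "dest"}],
  "system_path": ["/bin/cp", "/usr/bin/cp"]}}'''
print(writeable_targets(sample))  # ['dest']
```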

Note the exit code is still 0 for a match unless the --require-safe flag is specified, in which case the exit code is 12.

## Output Type: `forbidden`

It is also possible to define a rule that, if it matches a command, flags it as forbidden. For example, we never want agents to be able to run applied deploy, so we define the following rule:

define_program(
    program="applied",
    args=["deploy"],
    forbidden="Infrastructure Risk: command contains 'applied deploy'",
    should_match=[
        ["deploy"],
    ],
    should_not_match=[
        ["lint"],
    ],
)

Note that to mark a rule as forbidden, the forbidden keyword argument must be set to the reason the command is forbidden. This reason is included in the output:

cargo run -p codex-execpolicy-legacy -- check applied deploy | jq

{
  "result": "forbidden",
  "reason": "Infrastructure Risk: command contains 'applied deploy'",
  "cause": {
    "Exec": {
      "exec": {
        "program": "applied",
        "flags": [],
        "opts": [],
        "args": [
          {
            "index": 0,
            "type": {
              "Literal": "deploy"
            },
            "value": "deploy"
          }
        ],
        "system_path": []
      }
    }
  }
}
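
Putting the result types together, a client might dispatch on the `"result"` field along the lines of the Python sketch below. The `decide` function and its conservative choices (refusing `match` until the write targets are inspected, and treating any unrecognized result as `unverified`) are assumptions for illustration, not behavior defined by this crate.

```python
import json
from typing import Tuple


def decide(check_output: str) -> Tuple[bool, str]:
    """Map checker JSON to (allowed, explanation). A conservative sketch."""
    result = json.loads(check_output)
    kind = result.get("result")
    if kind == "safe":
        return True, "safe"
    if kind == "forbidden":
        # Surface the reason given by the rule's `forbidden` keyword argument.
        return False, result.get("reason", "forbidden")
    if kind == "match":
        # Assumption: block until the caller has vetted the files to be written.
        return False, "matched a rule: inspect the files it would write"
    # Assumption: anything else is handled like `unverified`.
    return False, "unverified: ask the user"


forbidden_sample = (
    '{"result": "forbidden", '
    '"reason": "Infrastructure Risk: command contains \'applied deploy\'"}'
)
print(decide(forbidden_sample))
```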
