## Why
This continues the `codex-tools` migration by moving another passive
tool-spec layer out of `codex-core`.
After `ToolSpec` moved into `codex-tools`, `codex-core` still owned
`ConfiguredToolSpec` and `create_tools_json_for_responses_api()`. Both
are data-model and serialization helpers rather than runtime
orchestration, so keeping them in `core/src/tools/registry.rs` and
`core/src/tools/spec.rs` left passive tool-definition code coupled to
`codex-core` longer than necessary.
## What changed
- moved `ConfiguredToolSpec` into `codex-rs/tools/src/tool_spec.rs`
- moved `create_tools_json_for_responses_api()` into
`codex-rs/tools/src/tool_spec.rs`
- re-exported the new surface from `codex-rs/tools/src/lib.rs`, which
remains exports-only
- updated `core/src/client.rs`, `core/src/tools/registry.rs`, and
`core/src/tools/router.rs` to consume the extracted types and serializer
from `codex-tools`
- moved the tool-list serialization test into
`codex-rs/tools/src/tool_spec_tests.rs`
- added focused unit coverage for `ConfiguredToolSpec::name()`
- simplified `core/src/tools/spec_tests.rs` to use the extracted
`ConfiguredToolSpec::name()` directly and removed the now-redundant
local `tool_name()` helper
- updated `codex-rs/tools/README.md` so the crate boundary reflects the
newly extracted tool-spec wrapper and serialization helper
## Test plan
- `cargo test -p codex-tools`
- `CARGO_TARGET_DIR=/tmp/codex-core-configured-spec cargo test -p
codex-core --lib tools::spec::`
- `CARGO_TARGET_DIR=/tmp/codex-core-configured-spec cargo test -p
codex-core --lib client::`
- `just fix -p codex-tools -p codex-core`
- `just argument-comment-lint`
## References
- #15923
- #15928
- #15944
- #15953
- #16031
- #16047
## Why
This continues the `codex-tools` migration by moving another passive
tool-definition layer out of `codex-core`.
After `ResponsesApiTool` and the lower-level schema adapters moved into
`codex-tools`, `core/src/client_common.rs` was still owning `ToolSpec`
and the web-search request wire types even though they are serialized
data models rather than runtime orchestration. Keeping those types in
`codex-core` makes the crate boundary look smaller than it really is and
leaves non-runtime tool-shape code coupled to core.
## What changed
- moved `ToolSpec`, `ResponsesApiWebSearchFilters`, and
`ResponsesApiWebSearchUserLocation` into
`codex-rs/tools/src/tool_spec.rs`
- added focused unit tests in `codex-rs/tools/src/tool_spec_tests.rs`
for:
- `ToolSpec::name()`
- web-search config conversions
- `ToolSpec` serialization for `web_search` and `tool_search`
- kept `codex-rs/tools/src/lib.rs` exports-only by re-exporting the new
module from `lib.rs`
- reduced `core/src/client_common.rs` to a compatibility shim that
re-exports the extracted tool-spec types for current core call sites
- updated `core/src/tools/spec_tests.rs` to consume the extracted
web-search types directly from `codex-tools`
- updated `codex-rs/tools/README.md` so the crate contract reflects that
`codex-tools` now owns the passive tool-spec request models in addition
to the lower-level Responses API structs
## Test plan
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib tools::spec::`
- `cargo test -p codex-core --lib client_common::`
- `just fix -p codex-tools -p codex-core`
- `just argument-comment-lint`
## References
- #15923
- #15928
- #15944
- #15953
- #16031
## Summary
- remove protocol and core support for discovering and listing custom
prompts
- simplify the TUI slash-command flow and command popup to built-in
commands only
- delete obsolete custom prompt tests, helpers, and docs references
- clean up downstream event handling for the removed protocol events
## Why
`argument-comment-lint` had become a PR bottleneck because the repo-wide
lane was still effectively running a `cargo dylint`-style flow across
the workspace instead of reusing Bazel's Rust dependency graph. That
kept the lint enforced, but it threw away the main benefit of moving
this job under Bazel in the first place: metadata reuse and cacheable
per-target analysis in the same shape as Clippy.
This change moves the repo-wide lint onto a native Bazel Rust aspect so
Linux and macOS can lint `codex-rs` without rebuilding the world
crate-by-crate through the wrapper path.
## What Changed
- add a nightly Rust toolchain with `rustc-dev` for Bazel and a
dedicated crate-universe repo for `tools/argument-comment-lint`
- add `tools/argument-comment-lint/driver.rs` and
`tools/argument-comment-lint/lint_aspect.bzl` so Bazel can run the lint
as a custom `rustc_driver`
- switch repo-wide `just argument-comment-lint` and the Linux/macOS
`rust-ci` lanes to `bazel build --config=argument-comment-lint
//codex-rs/...`
- keep the Python/DotSlash wrappers as the package-scoped fallback path
and as the current Windows CI path
- gate the Dylint entrypoint behind a `bazel_native` feature so the
Bazel-native library avoids the `dylint_*` packaging stack
- update the aspect runtime environment so the driver can locate
`rustc_driver` correctly under remote execution
- keep the dedicated `tools/argument-comment-lint` package tests and
wrapper unit tests in CI so the source and packaged entrypoints remain
covered
## Verification
- `python3 -m unittest discover -s tools/argument-comment-lint -p
'test_*.py'`
- `cargo test` in `tools/argument-comment-lint`
- `bazel build
//tools/argument-comment-lint:argument-comment-lint-driver
--@rules_rust//rust/toolchain/channel=nightly`
- `bazel build --config=argument-comment-lint
//codex-rs/utils/path-utils:all`
- `bazel build --config=argument-comment-lint
//codex-rs/rollout:rollout`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/16106).
* #16120
* __->__ #16106
This is a follow-up to https://github.com/openai/codex/pull/15922. That
previous PR deleted the old `tui` directory and left the new
`tui_app_server` directory in place. This PR renames `tui_app_server` to
`tui` and fixes up all references.
## Summary
- Moves status surface refresh (`refresh_status_surfaces` /
`refresh_status_line`) from `App` event handlers into `ChatWidget`
setters via a new `refresh_model_dependent_surfaces()` method
- Ensures model-dependent UI stays in sync whenever collaboration mode,
model, or reasoning effort changes, including the footer and terminal
title in both `tui` and `tui_app_server`
- Applies the fix to both `tui` and `tui_app_server` widgets
#15961
## Test plan
- [x] Added snapshot test
`status_line_model_with_reasoning_plan_mode_footer` verifying footer
renders correctly in plan mode
- [x] Added
`terminal_title_model_updates_on_model_change_without_manual_refresh` in
`tui_app_server`
- [ ] Verify switching collaboration modes updates the footer in real
TUI
- [ ] Verify model/reasoning effort changes reflect in the status bar
and terminal title
---------
Co-authored-by: Eric Traut <etraut@openai.com>
## Why
The initial `argument-comment-lint` rollout left Windows on
default-target coverage because there were still Windows-only callsites
failing under `--all-targets`. This follow-up cleans up those remaining
Windows-specific violations so the Windows CI lane can enforce the same
stricter coverage, leaving Linux as the remaining platform-specific
follow-up.
## What changed
- switched the Windows `rust-ci` argument-comment-lint step back to the
default wrapper invocation so it runs full-target coverage again
- added the required `/*param_name*/` annotations at Windows-gated
literal callsites in:
- `codex-rs/windows-sandbox-rs/src/lib.rs`
- `codex-rs/windows-sandbox-rs/src/elevated_impl.rs`
- `codex-rs/tui_app_server/src/multi_agents.rs`
- `codex-rs/network-proxy/src/proxy.rs`
## Validation
- Windows `argument comment lint` CI on this PR
## Why
`//codex-rs/shell-command:shell-command-unit-tests` became a real
bottleneck in the Windows Bazel lane because repeated calls to
`is_safe_command_windows()` were starting a fresh PowerShell parser
process for every `powershell.exe -Command ...` assertion.
PR #16056 was motivated by that same bottleneck, but its test-only
shortcut was the wrong layer to optimize because it weakened the
end-to-end guarantee that our runtime path really asks PowerShell to
parse the command the way we expect.
This PR attacks the actual cost center instead: it keeps the real
PowerShell parser in the loop, but turns that parser into a long-lived
helper process so both tests and the runtime safe-command path can reuse
it across many requests.
## What Changed
- add `shell-command/src/command_safety/powershell_parser.rs`, which
keeps one mutex-protected parser process per PowerShell executable path
and speaks a simple JSON-over-stdio request/response protocol
- turn `shell-command/src/command_safety/powershell_parser.ps1` into a
long-running parser server with comments explaining the protocol, the
AST-shape restrictions, and why unsupported constructs are rejected
conservatively
- keep request ids and a one-time respawn path so a dead or
desynchronized cached child fails closed instead of silently returning
mixed parser output
- preserve separate parser processes for `powershell.exe` and
`pwsh.exe`, since they do not accept the same language surface
- avoid a direct `PipelineChainAst` type reference in the PowerShell
script so the parser service still runs under Windows PowerShell 5.1 as
well as newer `pwsh`
- make `shell-command/src/command_safety/windows_safe_commands.rs`
delegate to the new parser utility instead of spawning a fresh
PowerShell process for every parse
- add a Windows-only unit test that exercises multiple sequential
requests against the same parser process
## Testing
- adds a Windows-only parser-reuse unit test in `powershell_parser.rs`
- the main end-to-end verification for this change is the Windows CI
lane, because the new service depends on real `powershell.exe` /
`pwsh.exe` behavior
# Summary
Claude Code supports a useful prompt-plus-stdin workflow:
```bash
echo "complex input..." | claude -p "summarize concisely"
```
Codex previously did not support the equivalent `codex exec` form. While
`codex exec` could read the prompt from stdin, it could not combine
piped input with an explicit prompt argument.
This change adds that missing workflow:
```bash
echo "complex input..." | codex exec "summarize concisely"
```
With this change, when `codex exec` receives both a positional prompt
and piped stdin, the prompt remains the instruction and stdin is passed
along as structured `<stdin>...</stdin>` context.
Example:
```bash
curl https://jsonplaceholder.typicode.com/comments \
| ./target/debug/codex exec --skip-git-repo-check "format the top 20 items into a markdown table" \
> table.md
```
This PR also adds regression coverage for:
- prompt argument + piped stdin
- legacy stdin-as-prompt behavior
- `codex exec -` forced-stdin behavior
- empty-stdin error cases
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`argument-comment-lint` was green in CI even though the repo still had
many uncommented literal arguments. The main gap was target coverage:
the repo wrapper did not force Cargo to inspect test-only call sites, so
examples like the `latest_session_lookup_params(true, ...)` tests in
`codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
This change cleans up the existing backlog, makes the default repo lint
path cover all Cargo targets, and starts rolling that stricter CI
enforcement out on the platform where it is currently validated.
## What changed
- mechanically fixed existing `argument-comment-lint` violations across
the `codex-rs` workspace, including tests, examples, and benches
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
`tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
`--all-targets` unless the caller explicitly narrows the target set
- fixed both wrappers so forwarded cargo arguments after `--` are
preserved with a single separator
- documented the new default behavior in
`tools/argument-comment-lint/README.md`
- updated `rust-ci` so the macOS lint lane keeps the plain wrapper
invocation and therefore enforces `--all-targets`, while Linux and
Windows temporarily pass `-- --lib --bins`
That temporary CI split keeps the stricter all-targets check where it is
already cleaned up, while leaving room to finish the remaining Linux-
and Windows-specific target-gated cleanup before enabling
`--all-targets` on those runners. The Linux and Windows failures on the
intermediate revision were caused by the wrapper forwarding bug, not by
additional lint findings in those lanes.
## Validation
- `bash -n tools/argument-comment-lint/run.sh`
- `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
- shell-level wrapper forwarding check for `-- --lib --bins`
- shell-level wrapper forwarding check for `-- --tests`
- `just argument-comment-lint`
- `cargo test` in `tools/argument-comment-lint`
- `cargo test -p codex-terminal-detection`
## Follow-up
- Clean up remaining Linux-only target-gated callsites, then switch the
Linux lint lane back to the plain wrapper invocation.
- Clean up remaining Windows-only target-gated callsites, then switch
the Windows lint lane back to the plain wrapper invocation.
Addresses #15992
The app-server TUI was treating tracked agent threads as closed based on
listener-task bookkeeping that does not reflect live thread state during
normal thread switching. That caused the `/agent` picker to gray out
live agents and could show a false "Agent thread ... is closed" replay
message after switching branches.
This PR fixes the picker refresh path to query the app server for each
tracked thread and derive closed vs loaded state from `thread/read`
status, while preserving cached agent metadata for replay-only threads.
Addresses #16049
`codex resume <name>` and `/resume <name>` could fail in the app-server
TUI path because name lookup pre-filtered `thread/list` with the backend
`search_term`, but saved thread names are hydrated after listing and are
not part of that search index. Resolve names by scanning listed threads
client-side instead, and add a regression test for saved sessions whose
rollout title does not match the thread name.
This is the part 1 of 2 PRs that will delete the `tui` /
`tui_app_server` split. This part simply deletes the existing `tui`
directory and marks the `tui_app_server` feature flag as removed. I left
the `tui_app_server` feature flag in place for now so its presence
doesn't result in an error. It is simply ignored.
Part 2 will rename the `tui_app_server` directory `tui`. I did this as
two parts to reduce visible code churn.
apply_patch sometimes provides additional parent dir as a writable root
when it is already writable. This is mostly a no-op on Mac/Linux but
causes actual ACL churn on Windows that is best avoided. We are also
seeing some actual failures with these ACLs in the wild, which I haven't
fully tracked down, but it's safe/best to avoid doing it altogether.
- [x] Auto / unspecified approval mode: read-only tools now skip before
guardian routing.
- [x] Approve / always-allow mode: read-only tools still skip, now via
the shared early return.
- [x] Prompt mode: read-only tools no longer skip; they continue to
approval.
Addresses #16019
`tui_app_server` renders completed assistant messages from item
notifications, but it only updated `/copy` state from `turn/completed`.
After the app-server migration, turn completion no longer repeats the
final assistant text, so `/copy` could stay unavailable even after the
first normal response.
This PR track the last completed final-answer agent message during an
active app-server turn and promote it into the `/copy` cache when the
turn completes. This restores the pre-migration behavior without
changing rollback handling.
Addresses #15984
HookStarted/HookCompleted notifications were being translated through a
fragile JSON bridge, so hook status/output never reached the renderer.
Early hook notifications could also be dropped during session refresh
before replay.
This PR fixes `tui_app_server` by mapping app-server hook notifications
into TUI hook events explicitly and preserving buffered hook
notifications across refresh, so cold-start and resumed sessions render
the same hook UI as the legacy TUI.
## Why
The previous extraction steps moved shared tool-schema parsing into
`codex-tools`, but `codex-core` still owned the generic Responses API
tool models and the last adapter layer that turned parsed tool
definitions into `ResponsesApiTool` values.
That left `core/src/tools/spec.rs` and `core/src/client_common.rs`
holding a chunk of tool-shaping code that does not need session state,
runtime plumbing, or any other `codex-core`-specific dependency. As a
result, `codex-tools` owned the parsed tool definition, but `codex-core`
still owned the generic wire model that those definitions are converted
into.
This change moves that boundary one step further. `codex-tools` now owns
the reusable Responses/tool wire structs and the shared conversion
helpers for dynamic tools, MCP tools, and deferred MCP aliases.
`codex-core` continues to own `ToolSpec` orchestration and the remaining
web-search-specific request shapes.
## What changed
- added `tools/src/responses_api.rs` to own `ResponsesApiTool`,
`FreeformTool`, `ToolSearchOutputTool`, namespace output types, and the
shared `ToolDefinition -> ResponsesApiTool` adapter helpers
- added `tools/src/responses_api_tests.rs` for deferred-loading
behavior, adapter coverage, and namespace serialization coverage
- rewired `core/src/tools/spec.rs` to use the extracted dynamic/MCP
adapter helpers instead of defining those conversions locally
- rewired `core/src/tools/handlers/tool_search.rs` to use the extracted
deferred MCP adapter and namespace output types directly
- slimmed `core/src/client_common.rs` so it now keeps `ToolSpec` and the
web-search-specific wire types, while reusing the extracted tool models
from `codex-tools`
- moved the extracted seam tests out of `core` and updated
`codex-rs/tools/README.md` plus `tools/src/lib.rs` to reflect the
expanded `codex-tools` boundary
## Test plan
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib tools::spec::`
- `cargo test -p codex-core --lib tools::handlers::tool_search::`
- `just fix -p codex-tools -p codex-core`
- `just argument-comment-lint`
## References
- [#15923](https://github.com/openai/codex/pull/15923) `codex-tools:
extract shared tool schema parsing`
- [#15928](https://github.com/openai/codex/pull/15928) `codex-tools:
extract MCP schema adapters`
- [#15944](https://github.com/openai/codex/pull/15944) `codex-tools:
extract dynamic tool adapters`
- [#15953](https://github.com/openai/codex/pull/15953) `codex-tools:
introduce named tool definitions`
## Summary
- add `self_serve_business_usage_based` and `enterprise_cbp_usage_based`
to the public/internal plan enums and regenerate the app-server + Python
SDK artifacts
- map both plans through JWT login and backend rate-limit payloads, then
bucket them with the existing Team/Business entitlement behavior in
cloud requirements, usage-limit copy, tooltips, and status display
- keep the earlier display-label remap commit on this branch so the new
Team-like and Business-like plans render consistently in the UI
## Testing
- `just write-app-server-schema`
- `uv run --project sdk/python python
sdk/python/scripts/update_sdk_artifacts.py generate-types`
- `just fix -p codex-protocol -p codex-login -p codex-core -p
codex-backend-client -p codex-cloud-requirements -p codex-tui -p
codex-tui-app-server -p codex-backend-openapi-models`
- `just fmt`
- `just argument-comment-lint`
- `cargo test -p codex-protocol
usage_based_plan_types_use_expected_wire_names`
- `cargo test -p codex-login usage_based`
- `cargo test -p codex-backend-client usage_based`
- `cargo test -p codex-cloud-requirements usage_based`
- `cargo test -p codex-core usage_limit_reached_error_formats_`
- `cargo test -p codex-tui plan_type_display_name_remaps_display_labels`
- `cargo test -p codex-tui remapped`
- `cargo test -p codex-tui-app-server
plan_type_display_name_remaps_display_labels`
- `cargo test -p codex-tui-app-server remapped`
- `cargo test -p codex-tui-app-server
preserves_usage_based_plan_type_wire_name`
## Notes
- a broader multi-crate `cargo test` run still hits unrelated existing
guardian-approval config failures in
`codex-rs/core/src/config/config_tests.rs`
1. Keep curated plugin staging directories under TempDir ownership until
activation succeeds, so failed git/HTTP sync attempts do not leak
plugins-clone-*.
2. Best-effort clean up stale plugins-clone-* directories before
creating a new staged repo, using a conservative age threshold.
3. Emit OTEL counters for curated plugin startup sync transport attempts
and final outcome across git and HTTP paths.
#15999 introduced a Windows-only `\r\n` mismatch in review-exit template
handling. This PR normalizes those template newlines and separates that
fix from [#16014](https://github.com/openai/codex/pull/16014) so it can
be reviewed independently.
## Why
This continues the `codex-tools` migration by moving one more piece of
generic tool-definition bookkeeping out of `codex-core`.
The earlier extraction steps moved shared schema parsing into
`codex-tools`, but `core/src/tools/spec.rs` still had to supply tool
names separately and perform ad hoc rewrites for deferred MCP aliases.
That meant the crate boundary was still awkward: the parsed shape coming
back from `codex-tools` was missing part of the definition that
`codex-core` ultimately needs to assemble a `ResponsesApiTool`.
This change introduces a named `ToolDefinition` in `codex-tools` so both
MCP tools and dynamic tools cross the crate boundary in the same
reusable model. `codex-core` still owns the final `ResponsesApiTool`
assembly, but less of the generic tool-definition shaping logic stays
behind in `core`.
## What changed
- replaced `ParsedToolDefinition` with a named `ToolDefinition` in
`codex-rs/tools/src/tool_definition.rs`
- added `codex-rs/tools/src/tool_definition_tests.rs` for `renamed()`
and `into_deferred()`
- updated `parse_dynamic_tool()` and `parse_mcp_tool()` to return
`ToolDefinition`
- simplified `codex-rs/core/src/tools/spec.rs` so it adapts
`ToolDefinition` into `ResponsesApiTool` instead of rewriting names and
deferred fields inline
- updated parser tests and `codex-rs/tools/README.md` to reflect the
named tool-definition model
## Test plan
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib tools::spec::`
## Why
`bazel.yml` already builds and tests the Bazel graph, but `rust-ci.yml`
still runs `cargo clippy` separately. This PR starts the transition to a
Bazel-backed lint lane for `codex-rs` so we can eventually replace the
duplicate Rust build, test, and lint work with Bazel while explicitly
keeping the V8 Bazel path out of scope for now.
To make that lane practical, the workflow also needs to look like the
Bazel job we already trust. That means sharing the common Bazel setup
and invocation logic instead of hand-copying it, and covering the arm64
macOS path in addition to Linux.
Landing the workflow green also required fixing the first lint findings
that Bazel surfaced and adding the matching local entrypoint.
## What changed
- add a reusable `build:clippy` config to `.bazelrc` and export
`codex-rs/clippy.toml` from `codex-rs/BUILD.bazel` so Bazel can run the
repository's existing Clippy policy
- add `just bazel-clippy` so the local developer entrypoint matches the
new CI lane
- extend `.github/workflows/bazel.yml` with a dedicated Bazel clippy job
for `codex-rs`, scoped to `//codex-rs/... -//codex-rs/v8-poc:all`
- run that clippy job on Linux x64 and arm64 macOS
- factor the shared Bazel workflow setup into
`.github/actions/setup-bazel-ci/action.yml` and the shared Bazel
invocation logic into `.github/scripts/run-bazel-ci.sh` so the clippy
and build/test jobs stay aligned
- fix the first Bazel-clippy findings needed to keep the lane green,
including the cross-target `cmsghdr::cmsg_len` normalization in
`codex-rs/shell-escalation/src/unix/socket.rs` and the no-`voice-input`
dead-code warnings in `codex-rs/tui` and `codex-rs/tui_app_server`
## Verification
- `just bazel-clippy`
- `RUNNER_OS=macOS ./.github/scripts/run-bazel-ci.sh -- build
--config=clippy --build_metadata=COMMIT_SHA=local-check
--build_metadata=TAG_job=clippy -- //codex-rs/...
-//codex-rs/v8-poc:all`
- `bazel build --config=clippy
//codex-rs/shell-escalation:shell-escalation`
- `CARGO_TARGET_DIR=/tmp/codex4-shell-escalation-test cargo test -p
codex-shell-escalation`
- `ruby -e 'require "yaml";
YAML.load_file(".github/workflows/bazel.yml");
YAML.load_file(".github/actions/setup-bazel-ci/action.yml")'`
## Notes
- `CARGO_TARGET_DIR=/tmp/codex4-tui-app-server-test cargo test -p
codex-tui-app-server` still hits existing guardian-approvals test and
snapshot failures unrelated to this PR's Bazel-clippy changes.
Related: #15954
## Why
`codex-tools` already owned the shared JSON schema parser and the MCP
tool schema adapter, but `core/src/tools/spec.rs` still parsed dynamic
tools directly.
That left the tool-schema boundary split in two different ways:
- MCP tools flowed through `codex-tools`, while dynamic tools were still
parsed in `codex-core`
- the extracted dynamic-tool path initially introduced a
dynamic-specific parsed shape even though `codex-tools` already had very
similar MCP adapter output
This change finishes that extraction boundary in one step. `codex-core`
still owns `ResponsesApiTool` assembly, but both MCP tools and dynamic
tools now enter that layer through `codex-tools` using the same parsed
tool-definition shape.
## What changed
- added `tools/src/dynamic_tool.rs` and sibling
`tools/src/dynamic_tool_tests.rs`
- introduced `parse_dynamic_tool()` in `codex-tools` and switched
`core/src/tools/spec.rs` to use it for dynamic tools
- added `tools/src/parsed_tool_definition.rs` so both MCP and dynamic
adapters return the same `ParsedToolDefinition`
- updated `core/src/tools/spec.rs` to build `ResponsesApiTool` through a
shared local adapter helper instead of separate MCP and dynamic assembly
paths
- expanded `core/src/tools/spec_tests.rs` so the dynamic-tool adapter
test asserts the full converted `ResponsesApiTool`, including
`defer_loading`
- updated `codex-rs/tools/README.md` to reflect the shared parsed
tool-definition boundary
## Test plan
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib tools::spec::`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/15944).
* #15953
* __->__ #15944
## Summary
- split the joined `PATH` before running system `bwrap` lookup
- keep the existing workspace-local `bwrap` skip behavior intact
- add regression tests that exercise real multi-entry search paths
## Why
The PATH-based lookup added in #15791 still wrapped the raw `PATH`
environment value as a single `PathBuf` before passing it through
`join_paths()`. On Unix, a normal multi-entry `PATH` contains `:`, so
that wrapper path is invalid as one path element and the lookup returns
`None`.
That made Codex behave as if no system `bwrap` was installed even when
`bwrap` was available on `PATH`, which is what users in #15340 were
still hitting on `0.117.0-alpha.25`.
## Impact
System `bwrap` discovery now works with normal multi-entry `PATH` values
instead of silently falling back to the vendored binary.
Fixes#15340.
## Validation
- `just fmt`
- `cargo test -p codex-sandboxing`
- `cargo test -p codex-linux-sandbox`
- `just fix -p codex-sandboxing`
- `just argument-comment-lint`
## Why
`codex-utils-pty` and `codex-windows-sandbox` were the remaining crates
in `codex-rs` that still overrode the workspace's Rust 2024 edition.
Moving them forward in a separate PR keeps the baseline edition update
isolated from the follow-on Bazel clippy workflow in #15955, while
making linting and formatting behavior consistent with the rest of the
workspace.
This PR also needs Cargo and Bazel to agree on the edition for
`codex-windows-sandbox`. Without the Bazel-side sync, the experimental
Bazel app-server builds fail once they compile `windows-sandbox-rs`.
## What changed
- switch `codex-rs/utils/pty` and `codex-rs/windows-sandbox-rs` to
`edition = "2024"`
- update `codex-utils-pty` callsites and tests to use the collapsed `if
let` form that Clippy expects under the new edition
- fix the Rust 2024 fallout in `windows-sandbox-rs`, including the
reserved `gen` identifier, `unsafe extern` requirements, and new Clippy
findings that surfaced under the edition bump
- keep the edition bump separate from a larger unsafe cleanup by
temporarily allowing `unsafe_op_in_unsafe_fn` in the Windows entrypoint
modules that now report it under Rust 2024
- update `codex-rs/windows-sandbox-rs/BUILD.bazel` to `crate_edition =
"2024"` so Bazel compiles the crate with the same edition as Cargo
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/15954).
* #15976
* #15955
* __->__ #15954
## Problem
App-server clients could only initiate ChatGPT login through the browser
callback flow, even though the shared login crate already supports
device-code auth. That left VS Code, Codex App, and other app-server
clients without a first-class way to use the existing device-code
backend when browser redirects are brittle or when the client UX wants
to own the login ceremony.
## Mental model
This change adds a second ChatGPT login start path to app-server:
clients can now call `account/login/start` with `type:
"chatgptDeviceCode"`. App-server immediately returns a `loginId` plus
the device-code UX payload (`verificationUrl` and `userCode`), then
completes the login asynchronously in the background using the existing
`codex_login` polling flow. Successful device-code login still resolves
to ordinary `chatgpt` auth, and completion continues to flow through the
existing `account/login/completed` and `account/updated` notifications.
## Non-goals
This does not introduce a new auth mode, a new account shape, or a
device-code eligibility discovery API. It also does not add automatic
fallback to browser login in core; clients remain responsible for
choosing when to request device code and whether to retry with a
different UX if the backend/admin policy rejects it.
## Tradeoffs
We intentionally keep `login_chatgpt_common` as a local validation
helper instead of turning it into a capability probe. Device-code
eligibility is checked by actually calling `request_device_code`, which
means policy-disabled cases surface as an immediate request error rather
than an async completion event. We also keep the active-login state
machine minimal: browser and device-code logins share the same public
cancel contract, but device-code cancellation is implemented with a
local cancel token rather than a larger cross-crate refactor.
## Architecture
The protocol grows a new `chatgptDeviceCode` request/response variant in
app-server v2. On the server side, the new handler reuses the existing
ChatGPT login precondition checks, calls `request_device_code`, returns
the device-code payload, and then spawns a background task that waits on
either cancellation or `complete_device_code_login`. On success, it
reuses the existing auth reload and cloud-requirements refresh path
before emitting `account/login/completed` success and `account/updated`.
On failure or cancellation, it emits only `account/login/completed`
failure. The existing `account/login/cancel { loginId }` contract
remains unchanged and now works for both browser and device-code
attempts.
## Tests
Added protocol serialization coverage for the new request/response
variant, plus app-server tests for device-code success, failure, cancel,
and start-time rejection behavior. Existing browser ChatGPT login
coverage remains in place to show that the callback-based flow is
unchanged.
## Summary
This PR replaces the legacy network allow/deny list model with explicit
rule maps for domains and unix sockets across managed requirements,
permissions profiles, the network proxy config, and the app server
protocol.
Concretely, it:
- introduces typed domain (`allow` / `deny`) and unix socket permission
(`allow` / `none`) entries instead of separate `allowed_domains`,
`denied_domains`, and `allow_unix_sockets` lists
- updates config loading, managed requirements merging, and exec-policy
overlays to read and upsert rule entries consistently
- exposes the new shape through protocol/schema outputs, debug surfaces,
and app-server config APIs
- rejects the legacy list-based keys and updates docs/tests to reflect
the new config format
## Why
The previous representation split related network policy across multiple
parallel lists, which made merging and overriding rules harder to reason
about. Moving to explicit keyed permission maps gives us a single source
of truth per host/socket entry, makes allow/deny precedence clearer, and
gives protocol consumers access to the full rule state instead of
derived projections only.
## Backward Compatibility
### Backward compatible
- Managed requirements still accept the legacy
`experimental_network.allowed_domains`,
`experimental_network.denied_domains`, and
`experimental_network.allow_unix_sockets` fields. They are normalized
into the new canonical `domains` and `unix_sockets` maps internally.
- App-server v2 still deserializes legacy `allowedDomains`,
`deniedDomains`, and `allowUnixSockets` payloads, so older clients can
continue reading managed network requirements.
- App-server v2 responses still populate `allowedDomains`,
`deniedDomains`, and `allowUnixSockets` as legacy compatibility views
derived from the canonical maps.
- `managed_allowed_domains_only` keeps the same behavior after
normalization. Legacy managed allowlists still participate in the same
enforcement path as canonical `domains` entries.
### Not backward compatible
- Permissions profiles under `[permissions.<profile>.network]` no longer
accept the legacy list-based keys. Those configs must use the canonical
`[domains]` and `[unix_sockets]` tables instead of `allowed_domains`,
`denied_domains`, or `allow_unix_sockets`.
- Managed `experimental_network` config cannot mix canonical and legacy
forms in the same block. For example, `domains` cannot be combined with
`allowed_domains` or `denied_domains`, and `unix_sockets` cannot be
combined with `allow_unix_sockets`.
- The canonical format can express explicit `"none"` entries for unix
sockets, but those entries do not round-trip through the legacy
compatibility fields because the legacy fields only represent allow/deny
lists.
## Testing
`/target/debug/codex sandbox macos --log-denials /bin/zsh -c 'curl
https://www.example.com' ` gives 200 with config
```
[permissions.workspace.network.domains]
"www.example.com" = "allow"
```
and fails when set to deny: `curl: (56) CONNECT tunnel failed, response
403`.
Also tested backward compatibility path by verifying that adding the
following to `/etc/codex/requirements.toml` works:
```
[experimental_network]
allowed_domains = ["www.example.com"]
```
- introduces `ClientResponse` as the symmetrical typed response union to
`ClientRequest` for app-server-protocol
- enables scalable event stream ingestion for use cases such as
analytics
- no runtime behavior changes, protocol/schema plumbing only
## Why
`codex-tools` already owns the shared tool input schema model and parser
from the first extraction step, but `core/src/tools/spec.rs` still owned
the MCP-specific adapter that normalizes `rmcp::model::Tool` schemas and
wraps `structuredContent` into the call result output schema.
Keeping that adapter in `codex-core` means the reusable MCP schema path
is still split across crates, and the unit tests for that logic stay
anchored in `codex-core` even though the runtime orchestration does not
need to move yet.
This change takes the next small step by moving the reusable MCP schema
adapter into `codex-tools` while leaving `ResponsesApiTool` assembly in
`codex-core`.
## What changed
- added `tools/src/mcp_tool.rs` and sibling
`tools/src/mcp_tool_tests.rs`
- introduced `ParsedMcpTool`, `parse_mcp_tool()`, and
`mcp_call_tool_result_output_schema()` in `codex-tools`
- updated `core/src/tools/spec.rs` to consume parsed MCP tool parts from
`codex-tools`
- removed the now-redundant MCP schema unit tests from
`core/src/tools/spec_tests.rs`
- expanded `codex-rs/tools/README.md` to describe this second migration
step
## Test plan
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib tools::spec::`
## Summary
This PR makes Windows sandbox proxying enforceable by routing proxy-only
runs through the existing `offline` sandbox user and reserving direct
network access for the existing `online` sandbox user.
In brief:
- if a Windows sandbox run should be proxy-enforced, we run it as the
`offline` user
- the `offline` user gets firewall rules that block direct outbound
traffic and only permit the configured localhost proxy path
- if a Windows sandbox run should have true direct network access, we
run it as the `online` user
- no new sandbox identity is introduced
This brings Windows in line with the intended model: proxy use is not
just env-based, it is backed by OS-level egress controls. Windows
already has two sandbox identities:
- `offline`: intended to have no direct network egress
- `online`: intended to have full network access
This PR makes proxy-enforced runs use that model directly.
### Proxy-enforced runs
When proxy enforcement is active:
- the run is assigned to the `offline` identity
- setup extracts the loopback proxy ports from the sandbox env
- Windows setup programs firewall rules for the `offline` user that:
- block all non-loopback outbound traffic
- block loopback UDP
- block loopback TCP except for the configured proxy ports
- optionally allow broader localhost access when `allow_local_binding=1`
So the sandboxed process can only talk to the local proxy. It cannot
open direct outbound sockets or do local UDP-based DNS on its own.The
proxy then performs the real outbound network access outside that
restricted sandbox identity.
### Direct-network runs
When proxy enforcement is not active and full network access is allowed:
- the run is assigned to the `online` identity
- no proxy-only firewall restrictions are applied
- the process gets normal direct network access
### Unelevated vs elevated
The restricted-token / unelevated path cannot enforce per-identity
firewall policy by itself.
So for Windows proxy-enforced runs, we transparently use the logon-user
sandbox path under the hood, even if the caller started from the
unelevated mode. That keeps enforcement real instead of best-effort.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`PermissionProfile` should only describe the per-command permissions we
still want to grant dynamically. Keeping
`MacOsSeatbeltProfileExtensions` in that surface forced extra macOS-only
approval, protocol, schema, and TUI branches for a capability we no
longer want to expose.
## What changed
- Removed the macOS-specific permission-profile types from
`codex-protocol`, the app-server v2 API, and the generated
schema/TypeScript artifacts.
- Deleted the core and sandboxing plumbing that threaded
`MacOsSeatbeltProfileExtensions` through execution requests and seatbelt
construction.
- Simplified macOS seatbelt generation so it always includes the fixed
read-only preferences allowlist instead of carrying a configurable
profile extension.
- Removed the macOS additional-permissions UI/docs/test coverage and
deleted the obsolete macOS permission modules.
- Tightened `request_permissions` intersection handling so explicitly
empty requested read lists are preserved only when that field was
actually granted, avoiding zero-grant responses being stored as active
permissions.
## Why
`parse_tool_input_schema` and the supporting `JsonSchema` model were
living in `core/src/tools/spec.rs`, but they already serve callers
outside `codex-core`.
Keeping that shared schema parsing logic inside `codex-core` makes the
crate boundary harder to reason about and works against the guidance in
`AGENTS.md` to avoid growing `codex-core` when reusable code can live
elsewhere.
This change takes the first extraction step by moving the schema parsing
primitive into its own crate while keeping the rest of the tool-spec
assembly in `codex-core`.
## What changed
- added a new `codex-tools` crate under `codex-rs/tools`
- moved the shared tool input schema model and sanitizer/parser into
`tools/src/json_schema.rs`
- kept `tools/src/lib.rs` exports-only, with the module-level unit tests
split into `json_schema_tests.rs`
- updated `codex-core` to use `codex-tools::JsonSchema` and re-export
`parse_tool_input_schema`
- updated `codex-app-server` dynamic tool validation to depend on
`codex-tools` directly instead of reaching through `codex-core`
- wired the new crate into the Cargo workspace and Bazel build graph
## Summary
Fail closed when the network proxy's local/private IP pre-check hits a
DNS lookup error or timeout, instead of treating the hostname as public
and allowing the request.
## Root cause
`host_resolves_to_non_public_ip()` returned `false` on resolver failure,
which created a fail-open path in the `allow_local_binding = false`
boundary. The eventual connect path performs its own DNS resolution
later, so a transient pre-check failure is not evidence that the
destination is public.
## Changes
- Treat DNS lookup errors/timeouts as local/private for blocking
purposes
- Add a regression test for an allowlisted hostname that fails DNS
resolution
## Validation
- `cargo test -p codex-network-proxy`
- `cargo clippy -p codex-network-proxy --all-targets -- -D warnings`
- `just fmt`
- `just argument-comment-lint`
## Why
This is effectively a follow-up to
[#15812](https://github.com/openai/codex/pull/15812). That change
removed the special skill-script exec path, but `skill_metadata` was
still being threaded through command-approval payloads even though the
approval flow no longer uses it to render prompts or resolve decisions.
Keeping it around added extra protocol, schema, and client surface area
without changing behavior.
Removing it keeps the command-approval contract smaller and avoids
carrying a dead field through app-server, TUI, and MCP boundaries.
## What changed
- removed `ExecApprovalRequestSkillMetadata` and the corresponding
`skillMetadata` field from core approval events and the v2 app-server
protocol
- removed the generated JSON and TypeScript schema output for that field
- updated app-server, MCP server, TUI, and TUI app-server approval
plumbing to stop forwarding the field
- cleaned up tests that previously constructed or asserted
`skillMetadata`
## Testing
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `cargo test -p codex-app-server-test-client`
- `cargo test -p codex-mcp-server`
- `just argument-comment-lint`
## Summary
- move the bwrap PATH lookup and warning helpers out of config/mod.rs
- move the related tests into a dedicated bwrap_tests.rs file
## Validation
- git diff --check
- skipped heavier local tests per request
Follow-up to #15791.
## Why
`SandboxCommand.program` represents an executable path, but keeping it
as `String` forced path-backed callers to run `to_string_lossy()` before
the sandbox layer ever touched the command. That loses fidelity earlier
than necessary and adds avoidable conversions in runtimes that already
have a `PathBuf`.
## What changed
- Changed `SandboxCommand.program` to `OsString`.
- Updated `SandboxManager::transform` to keep the program and argv in
`OsString` form until the `SandboxExecRequest` conversion boundary.
- Switched the path-backed `apply_patch` and `js_repl` runtimes to pass
`into_os_string()` instead of `to_string_lossy()`.
- Updated the remaining string-backed builders and tests to match the
new type while preserving the existing Linux helper `arg0` behavior.
## Verification
- `cargo test -p codex-sandboxing`
- `just argument-comment-lint -p codex-core -p codex-sandboxing`
- `cargo test -p codex-core` currently fails in unrelated existing
config tests: `config::tests::approvals_reviewer_*` and
`config::tests::smart_approvals_alias_*`