CXC-410 Emit Env Var Status with `/feedback` report
Add more observability on top of #14611
[Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream)
[Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream)
<img width="1063" height="610" alt="image"
src="https://github.com/user-attachments/assets/937ab026-1c2d-4757-81d5-5f31b853113e"
/>
###### Summary
- Adds auth-env telemetry that records whether key auth-related env
overrides were present on session start and request paths.
- Threads those auth-env fields through `/responses`, websocket, and
`/models` telemetry and feedback metadata.
- Buckets custom provider `env_key` configuration to a safe
`"configured"` value instead of emitting raw config text.
- Keeps the slice observability-only: no raw token values or raw URLs
are emitted.
###### Rationale (from spec findings)
- 401 and auth-path debugging needs a way to distinguish env-driven auth
paths from sessions with no auth env override.
- Startup and model-refresh failures need the same auth-env diagnostics
as normal request failures.
- Feedback and Sentry tags need the same auth-env signal as OTel events
so reports can be triaged consistently.
- Custom provider config is user-controlled text, so the telemetry
contract must stay presence-only / bucketed.
###### Scope
- Adds a small `AuthEnvTelemetry` bundle for env presence collection and
threads it through the main request/session telemetry paths.
- Does not add endpoint/base-url/provider-header/geo routing attribution
or broader telemetry API redesign.
###### Trade-offs
- `provider_env_key_name` is bucketed to `"configured"` instead of
preserving the literal configured env var name.
- `/models` is included because startup/model-refresh auth failures need
the same diagnostics, but broader parity work remains out of scope.
- This slice keeps the existing telemetry APIs and layers auth-env
fields onto them rather than redesigning the metadata model.
###### Client follow-up
- Add the separate endpoint/base-url attribution slice if routing-source
diagnosis is still needed.
- Add provider-header or residency attribution only if auth-env presence
proves insufficient in real reports.
- Revisit whether any additional auth-related env inputs need safe
bucketing after more 401 triage data.
###### Testing
- `cargo test -p codex-core emit_feedback_request_tags -- --nocapture`
- `cargo test -p codex-core
collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture`
- `cargo test -p codex-core
models_request_telemetry_emits_auth_env_feedback_tags_on_failure --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_api_request_auth_observability --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_connect_auth_observability
-- --nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_request_transport_observability
-- --nocapture`
- `cargo test -p codex-core --no-run --message-format short`
- `cargo test -p codex-otel --no-run --message-format short`
---------
Co-authored-by: Codex <noreply@openai.com>
Summary
- document that code mode only exposes `exec` and the renamed `wait`
tool
- update code mode tool spec and descriptions to match the new tool name
- rename tests and helper references from `exec_wait` to `wait`
Testing
- Not run (not requested)
## What is flaky
The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in
CI even though the approval logic is unchanged, because the test
delegates the file write and readback to shell parsing instead of
deterministic file I/O.
## Why it was flaky
The test generated a command shaped like `printf ... > file && cat
file`. That means the scenario depended on shell quoting, redirection,
newline handling, and encoding behavior in addition to the approval
system it was actually trying to validate. If the shell interpreted the
payload differently, the test would report an approval failure even
though the product logic was fine.
That also made failures hard to diagnose, because the test did not log
the exact generated command or the parsed result payload.
## How this PR fixes it
This PR replaces the shell-redirection path with a deterministic
`python3 -c` script that writes the file with `Path.write_text(...,
encoding='utf-8')` and then reads it back with the same UTF-8 path. It
also logs the generated command and the resulting exit code/stdout for
the approval scenario so any future failure is directly attributable.
## Why this fix fixes the flakiness
The scenario no longer depends on shell parsing and redirection
semantics. The file contents are produced and read through explicit
UTF-8 file I/O, so the approval test is measuring approval behavior
instead of shell behavior. The added diagnostics mean a future failure
will show the exact command/result pair instead of looking like a
generic intermittent mismatch.
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## What is flaky
The Windows shell-driven integration tests in `codex-rs/core` were
intermittently unstable, especially:
- `apply_patch_cli_can_use_shell_command_output_as_patch_input`
- `websocket_test_codex_shell_chain`
- `websocket_v2_test_codex_shell_chain`
## Why it was flaky
These tests were exercising real shell-tool flows through whichever
shell Codex selected on Windows, and the `apply_patch` test also nested
a PowerShell read inside `cmd /c`.
There were multiple independent sources of nondeterminism in that setup:
- The test harness depended on the model-selected Windows shell instead
of pinning the shell it actually meant to exercise.
- `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
that could leave the read command wrapped as a literal string instead of
executing it.
- Even after getting the quoting right, PowerShell could emit CLIXML
progress records like module-initialization output onto stdout.
- The `apply_patch` test was building a patch directly from shell
stdout, so any quoting artifact or progress noise corrupted the patch
input.
So the failures were driven by shell startup and output-shape variance,
not by the `apply_patch` or websocket logic themselves.
## How this PR fixes it
- Add a test-only `user_shell_override` path so Windows integration
tests can pin `cmd.exe` explicitly.
- Use that override in the websocket shell-chain tests and in the
`apply_patch` harness.
- Change the nested Windows file read in
`apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
PowerShell `-EncodedCommand` script.
- Run that nested PowerShell process with `-NonInteractive`, set
`$ProgressPreference = 'SilentlyContinue'`, and read the file with
`[System.IO.File]::ReadAllText(...)`.
## Why this fix fixes the flakiness
The outer harness now runs under a deterministic shell, and the inner
PowerShell read no longer depends on fragile `cmd` quoting or on
progress output staying quiet by accident. The shell tool returns only
the file contents, so patch construction and websocket assertions depend
on stable test inputs instead of on runner-specific shell behavior.
---------
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
It now supports:
- Connectors that are from installed and enabled plugins that are not
installed yet
- Plugins that are on the allowlist that are not installed yet.
### Summary
The goal is for us to get the latest turn model and reasoning effort on
thread/resume is no override is provided on the thread/resume func call.
This is the part 1 which we write the model and reasoning effort for a
thread to the sqlite db and there will be a followup PR to consume the
two new fields on thread/resume.
[part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888)
and this one can be merged independently.
## Description
This PR fixes a bad first-turn failure mode in app-server when the
startup websocket prewarm hangs. Before this change, `initialize ->
thread/start -> turn/start` could sit behind the prewarm for up to five
minutes, so the client would not see `turn/started`, and even
`turn/interrupt` would block because the turn had not actually started
yet.
Now, we:
- set a (configurable) timeout of 15s for websocket startup time,
exposed as `websocket_startup_timeout_ms` in config.toml
- `turn/started` is sent immediately on `turn/start` even if the
websocket is still connecting
- `turn/interrupt` can be used to cancel a turn that is still waiting on
the websocket warmup
- the turn task will wait for the full 15s websocket warming timeout
before falling back
## Why
The old behavior made app-server feel stuck at exactly the moment the
client expects turn lifecycle events to start flowing. That was
especially painful for external clients, because from their point of
view the server had accepted the request but then went silent for
minutes.
## Configuring the websocket startup timeout
Can set it in config.toml like this:
```
[model_providers.openai]
supports_websockets = true
websocket_connect_timeout_ms = 15000
```
## Summary
- make `report_agent_job_result` atomically transition an item from
running to completed while storing `result_json`
- remove brittle finalization grace-sleep logic and make finished-item
cleanup idempotent
- replace blind fixed-interval waiting with status-subscription-based
waiting for active worker threads
- add state runtime tests for atomic completion and late-report
rejection
## Why
This addresses the race and polling concerns in #13948 by removing
timing-based correctness assumptions and reducing unnecessary status
polling churn.
## Validation
- `cd codex-rs && just fmt`
- `cd codex-rs && cargo test -p codex-state`
- `cd codex-rs && cargo test -p codex-core --test all suite::agent_jobs`
- `cd codex-rs && cargo test`
- fails in an unrelated app-server tracing test:
`message_processor::tracing_tests::thread_start_jsonrpc_span_exports_server_span_and_parents_children`
timed out waiting for response
## Notes
- This PR supersedes #14129 with the same agent-jobs fix on a clean
branch from `main`.
- The earlier PR branch was stacked on unrelated history, which made the
review diff include unrelated commits.
Fixes#13948
## Problem
On Linux, Codex can be launched from a workspace path that is a symlink
(for example, a symlinked checkout or a symlinked parent directory).
Our sandbox policy intentionally canonicalizes writable/readable roots
to the real filesystem path before building the bubblewrap mounts. That
part is correct and needed for safety.
The remaining bug was that bubblewrap could still inherit the helper
process's logical cwd, which might be the symlinked alias instead of the
mounted canonical path. In that case, the sandbox starts in a cwd that
does not exist inside the sandbox namespace even though the real
workspace is mounted. This can cause sandboxed commands to fail in
symlinked workspaces.
## Fix
This PR keeps the sandbox policy behavior the same, but separates two
concepts that were previously conflated:
- the canonical cwd used to define sandbox mounts and permissions
- the caller's logical cwd used when launching the command
On the Linux bubblewrap path, we now thread the logical command cwd
through the helper explicitly and only add `--chdir <canonical path>`
when the logical cwd differs from the mounted canonical path.
That means:
- permissions are still computed from canonical paths
- bubblewrap starts the command from a cwd that definitely exists inside
the sandbox
- we do not widen filesystem access or undo the earlier symlink
hardening
## Why This Is Safe
This is a narrow Linux-only launch fix, not a policy change.
- Writable/readable root canonicalization stays intact.
- Protected metadata carveouts still operate on canonical roots.
- We only override bubblewrap's inherited cwd when the logical path
would otherwise point at a symlink alias that is not mounted in the
sandbox.
## Tests
- kept the existing protocol/core regression coverage for symlink
canonicalization
- added regression coverage for symlinked cwd handling in the Linux
bubblewrap builder/helper path
Local validation:
- `just fmt`
- `cargo test -p codex-protocol`
- `cargo test -p codex-core
normalize_additional_permissions_canonicalizes_symlinked_write_paths`
- `cargo clippy -p codex-linux-sandbox -p codex-protocol -p codex-core
--tests -- -D warnings`
- `cargo build --bin codex`
## Context
This is related to #14694. The earlier writable-root symlink fix
addressed the mount/permission side; this PR fixes the remaining
symlinked-cwd launch mismatch in the Linux sandbox path.
## Stack Position
3/4. Top-of-stack sibling built on #14830.
## Base
- #14830
## Sibling
- #14827
## Scope
- Extend the realtime startup context with a bounded summary of the
latest thread turns for continuity.
---------
Co-authored-by: Codex <noreply@openai.com>
This change adds Jason to codex-core's built-in subagent nickname pool
so spawned agents can pick it without any custom role configuration. The
default list was simply missing that predefined name (a grave mistake).
## Stack Position
2/4. Built on top of #14828.
## Base
- #14828
## Unblocks
- #14829
- #14827
## Scope
- Port the realtime v2 wire parsing, session, app-server, and
conversation runtime behavior onto the split websocket-method base.
- Branch runtime behavior directly on the current realtime session kind
instead of parser-derived flow flags.
- Keep regression coverage in the existing e2e suites.
---------
Co-authored-by: Codex <noreply@openai.com>
- Added forceRemoteSync to plugin/install and plugin/uninstall.
- With forceRemoteSync=true, we update the remote plugin status first,
then apply the local change only if the backend call succeeds.
- Kept plugin/list(forceRemoteSync=true) as the main recon path, and for
now it treats remote enabled=false as uninstall. We
will eventually migrate to plugin/installed for more precise state
handling.
## Why
Once the repo-local lint exists, `codex-rs` needs to follow the
checked-in convention and CI needs to keep it from drifting. This commit
applies the fallback `/*param*/` style consistently across existing
positional literal call sites without changing those APIs.
The longer-term preference is still to avoid APIs that require comments
by choosing clearer parameter types and call shapes. This PR is
intentionally the mechanical follow-through for the places where the
existing signatures stay in place.
After rebasing onto newer `main`, the rollout also had to cover newly
introduced `tui_app_server` call sites. That made it clear the first cut
of the CI job was too expensive for the common path: it was spending
almost as much time installing `cargo-dylint` and re-testing the lint
crate as a representative test job spends running product tests. The CI
update keeps the full workspace enforcement but trims that extra
overhead from ordinary `codex-rs` PRs.
## What changed
- keep a dedicated `argument_comment_lint` job in `rust-ci`
- mechanically annotate remaining opaque positional literals across
`codex-rs` with exact `/*param*/` comments, including the rebased
`tui_app_server` call sites that now fall under the lint
- keep the checked-in style aligned with the lint policy by using
`/*param*/` and leaving string and char literals uncommented
- cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
registry/git metadata in the lint job
- split changed-path detection so the lint crate's own `cargo test` step
runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
- continue to run the repo wrapper over the `codex-rs` workspace, so
product-code enforcement is unchanged
Most of the code changes in this commit are intentionally mechanical
comment rewrites or insertions driven by the lint itself.
## Verification
- `./tools/argument-comment-lint/run.sh --workspace`
- `cargo test -p codex-tui-app-server -p codex-tui`
- parsed `.github/workflows/rust-ci.yml` locally with PyYAML
---
* -> #14652
* #14651
- **Summary**
- expose `exit` through the code mode bridge and module so scripts can
stop mid-flight
- surface the helper in the description documentation
- add a regression test ensuring `exit()` terminates execution cleanly
- **Testing**
- Not run (not requested)
###### Why/Context/Summary
- Exclude injected AGENTS.md instructions and standalone skill payloads
from memory stage 1 inputs so memory generation focuses on conversation
content instead of prompt scaffolding.
- Strip only the AGENTS fragment from mixed contextual user messages
during stage-1 serialization, which preserves environment context in the
same message.
- Keep subagent notifications in the memory input, and add focused unit
coverage for the fragment classifier, rollout policy, and stage-1
serialization path.
###### Test plan
- `just fmt`
- `cargo test -p codex-core --lib contextual_user_message`
- `cargo test -p codex-core --lib rollout::policy`
- `cargo test -p codex-core --lib memories::phase1`
This PR replicates the `tui` code directory and creates a temporary
parallel `tui_app_server` directory. It also implements a new feature
flag `tui_app_server` to select between the two tui implementations.
Once the new app-server-based TUI is stabilized, we'll delete the old
`tui` directory and feature flag.
Make `interrupted` an agent state and make it not final. As a result, a
`wait` won't return on an interrupted agent and no notification will be
send to the parent agent.
The rationals are:
* If a user interrupt a sub-agent for any reason, you don't want the
parent agent to instantaneously ask the sub-agent to restart
* If a parent agent interrupt a sub-agent, no need to add a noisy
notification in the parent agen
Fix https://github.com/openai/codex/issues/14161
This fixes sub-agent [[skills.config]] overrides being ignored when
parent and child share the same cwd. The root cause was that turn skill
loading rebuilt from cwd-only state and reused a cwd-scoped cache, so
role-local skill enable/disable overrides did not reliably affect the
spawned agent's effective skill set.
This change switches turn construction to use the effective per-turn
config and adds a config-aware skills cache keyed by skill roots plus
final disabled paths.
## Summary
- reuse a guardian subagent session across approvals so reviews keep a
stable prompt cache key and avoid one-shot startup overhead
- clear the guardian child history before each review so prior guardian
decisions do not leak into later approvals
- include the `smart_approvals` -> `guardian_approval` feature flag
rename in the same PR to minimize release latency on a very tight
timeline
- add regression coverage for prompt-cache-key reuse without
prior-review prompt bleed
## Request
- Bug/enhancement request: internal guardian prompt-cache and latency
improvement request
---------
Co-authored-by: Codex <noreply@openai.com>
### Motivation
- Interrupting a running turn (Ctrl+C / Esc) currently also terminates
long‑running background shells, which is surprising for workflows like
local dev servers or file watchers.
- The existing cleanup command name was confusing; callers expect an
explicit command to stop background terminals rather than a UI clear
action.
- Make background‑shell termination explicit and surface a clearer
command name while preserving backward compatibility.
### Description
- Renamed the background‑terminal cleanup slash command from `Clean`
(`/clean`) to `Stop` (`/stop`) and kept `clean` as an alias in the
command parsing/visibility layer, updated the user descriptions and
command popup wiring accordingly.
- Updated the unified‑exec footer text and snapshots to point to `/stop`
(and trimmed corresponding snapshot output to match the new label).
- Changed interrupt behavior so `Op::Interrupt` (Ctrl+C / Esc interrupt)
no longer closes or clears tracked unified exec / background terminal
processes in the TUI or core cleanup path; background shells are now
preserved after an interrupt.
- Updated protocol/docs to clarify that `turn/interrupt` (or
`Op::Interrupt`) interrupts the active turn but does not terminate
background terminals, and that `thread/backgroundTerminals/clean` is the
explicit API to stop those shells.
- Updated unit/integration tests and insta snapshots in the TUI and core
unified‑exec suites to reflect the new semantics and command name.
### Testing
- Ran formatting with `just fmt` in `codex-rs` (succeeded).
- Ran `cargo test -p codex-protocol` (succeeded).
- Attempted `cargo test -p codex-tui` but the build could not complete
in this environment due to a native build dependency that requires
`libcap` development headers (the `codex-linux-sandbox` vendored build
step); install `libcap-dev` / make `libcap.pc` available in
`PKG_CONFIG_PATH` to run the TUI test suite locally.
- Updated and accepted the affected `insta` snapshots for the TUI
changes so visual diffs reflect the new `/stop` wording and preserved
interrupt behavior.
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69b39c44b6dc8323bd133ae206310fae)
- [x] Bypass tool search and stuff tool specs directly into model
context when either a. Tool search is not available for the model or b.
There are not that many tools to search for.
CXC-392
[With
401](https://openai.sentry.io/issues/7333870443/?project=4510195390611458&query=019ce8f8-560c-7f10-a00a-c59553740674&referrer=issue-stream)
<img width="1909" height="555" alt="401 auth tags in Sentry"
src="https://github.com/user-attachments/assets/412ea950-61c4-4780-9697-15c270971ee3"
/>
- auth_401_*: preserved facts from the latest unauthorized response snapshot
- auth_*: latest auth-related facts from the latest request attempt
- auth_recovery_*: unauthorized recovery state and follow-up result
Without 401
<img width="1917" height="522" alt="happy-path auth tags in Sentry"
src="https://github.com/user-attachments/assets/3381ed28-8022-43b0-b6c0-623a630e679f"
/>
###### Summary
- Add client-visible 401 diagnostics for auth attachment, upstream auth classification, and 401 request id / cf-ray correlation.
- Record unauthorized recovery mode, phase, outcome, and retry/follow-up status without changing auth behavior.
- Surface the highest-signal auth and recovery fields on uploaded client bug reports so they are usable in Sentry.
- Preserve original unauthorized evidence under `auth_401_*` while keeping follow-up result tags separate.
###### Rationale (from spec findings)
- The dominant bucket needed proof of whether the client attached auth before send or upstream still classified the request as missing auth.
- Client uploads needed to show whether unauthorized recovery ran and what the client tried next.
- Request id and cf-ray needed to be preserved on the unauthorized response so server-side correlation is immediate.
- The bug-report path needed the same auth evidence as the request telemetry path, otherwise the observability would not be operationally useful.
###### Scope
- Add auth 401 and unauthorized-recovery observability in `codex-rs/core`, `codex-rs/codex-api`, and `codex-rs/otel`, including feedback-tag surfacing.
- Keep auth semantics, refresh behavior, retry behavior, endpoint classification, and geo-denial follow-up work out of this PR.
###### Trade-offs
- This exports only safe auth evidence: header presence/name, upstream auth classification, request ids, and recovery state. It does not export token values or raw upstream bodies.
- This keeps websocket connection reuse as a transport clue because it can help distinguish stale reused sessions from fresh reconnects.
- Misroute/base-url classification and geo-denial are intentionally deferred to a separate follow-up PR so this review stays focused on the dominant auth 401 bucket.
###### Client follow-up
- PR 2 will add misroute/provider and geo-denial observability plus the matching feedback-tag surfacing.
- A separate host/app-server PR should log auth-decision inputs so pre-send host auth state can be correlated with client request evidence.
- `device_id` remains intentionally separate until there is a safe existing source on the feedback upload path.
###### Testing
- `cargo test -p codex-core refresh_available_models_sorts_by_priority`
- `cargo test -p codex-core emit_feedback_request_tags_`
- `cargo test -p codex-core emit_feedback_auth_recovery_tags_`
- `cargo test -p codex-core auth_request_telemetry_context_tracks_attached_auth_and_retry_phase`
- `cargo test -p codex-core extract_response_debug_context_decodes_identity_headers`
- `cargo test -p codex-core identity_auth_details`
- `cargo test -p codex-core telemetry_error_messages_preserve_non_http_details`
- `cargo test -p codex-core --all-features --no-run`
- `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability`
- `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability`
- `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability`
## Summary
- normalize effective readable, writable, and unreadable sandbox roots
after resolving special paths so symlinked roots use canonical runtime
paths
- add a protocol regression test for a symlinked writable root with a
denied child and update protocol expectations to canonicalized effective
paths
- update macOS seatbelt tests to assert against effective normalized
roots produced by the shared policy helpers
## Testing
- just fmt
- cargo test -p codex-protocol
- cargo test -p codex-core explicit_unreadable_paths_are_excluded_
- cargo clippy -p codex-protocol -p codex-core --tests -- -D warnings
## Notes
- This is intended to fix the symlinked TMPDIR bind failure in
bubblewrap described in #14672.
Fixes#14672
This extends dynamic_tool_calls to allow us to hide a tool from the
model context but still use it as part of the general tool calling
runtime (for ex from js_repl/code_mode)
make plugins' `defaultPrompt` an array, but keep backcompat for strings.
the array is limited by app-server to 3 entries of up to 128 chars
(drops extra entries, `None`s-out ones that are too long) without
erroring if those invariants are violating.
added tests, tested locally.
We receive bug reports from users who attempt to override one of the
three built-in model providers (openai, ollama, or lmstuio). Currently,
these overrides are silently ignored. This PR makes it an error to
override them.
## Summary
- add validation for `model_providers` so `openai`, `ollama`, and
`lmstudio` keys now produce clear configuration errors instead of being
silently ignored
Move the general `Apps`, `Skills` and `Plugins` instructions blocks out
of `user_instructions` and into the developer message, with new `Apps ->
Skills -> Plugins` order for better clarity.
Also wrap those sections in stable XML-style instruction tags (like
other sections) and update prompt-layout tests/snapshots. This makes the
tests less brittle in snapshot output (we can parse the sections), and
it consolidates the capability instructions in one place.
#### Tests
Updated snapshots, added tests.
`<AGENTS_MD>` disappearing in snapshots is expected: before this change,
the wrapped user-instructions message was kept alive by `Skills`
content. Now that `Skills` and `Plugins` are in the developer message,
that wrapper only appears when there is real
project-doc/user-instructions content.
---------
Co-authored-by: Charley Cunningham <ccunningham@openai.com>
## Summary
- reapply the live split filesystem and network sandbox policies when
building spawned subagent configs
- keep spawned child sessions aligned with the parent turn after
role-layer config reloads
- add regression coverage for both config construction and spawned
child-turn inheritance
## Summary
- apply persisted execpolicy network rules when booting the managed
network proxy
- pass the current execpolicy into managed proxy startup so host
approvals selected with "allow this host in the future" survive new
sessions
## Summary
- reuse rollout reconstruction when applying a backtrack rollback so
`reference_context_item` is restored from persisted rollout state
- build rollback replay from the flushed rollout items plus the rollback
marker, avoiding the extra reread/fallback path
- add regression coverage for rollback after compaction so turn-context
diffing stays aligned after backtracking
Co-authored-by: Codex <noreply@openai.com>
- Normalize guardian assessment path serialization to use forward
slashes for cross-platform stability.
- Seed workspace-write defaults in the Smart Approvals
override-turn-context test so Windows and non-Windows selection flows
are consistent.
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charles Cunningham <ccunningham@openai.com>
We regularly get bug reports from users who mistakenly have the
`OPENAI_BASE_URL` environment variable set. This PR deprecates this
environment variable in favor of a top-level config key
`openai_base_url` that is used for the same purpose. By making it a
config key, it will be more visible to users. It will also participate
in all of the infrastructure we've added for layered and managed
configs.
Summary
- introduce the `openai_base_url` top-level config key, update
schema/tests, and route the built-in openai provider through it while
- fall back to deprecated `OPENAI_BASE_URL` env var but warn user of
deprecation when no `openai_base_url` config key is present
- update CLI, SDK, and TUI code to prefer the new config path (with a
deprecated env-var fallback) and document the SDK behavior change
## Why
The unified-exec path was carrying zsh-fork state in a partially
flattened way.
First, the decision about whether zsh-fork was active came from feature
selection in `ToolsConfig`, while the real prerequisites lived in
session state. That left the handler and runtime defending against
partially configured cases later.
Second, once zsh-fork was active, its two runtime-only paths were
threaded through the runtime as separate arguments even though they form
one coherent piece of configuration.
This change keeps unified-exec on a single session-derived source of
truth and bundles the zsh-fork-specific paths into a named config type
so the runtime can pass them around as one unit.
In particular, this PR introduces this enum so the `ZshFork` variant can
carry the appropriate state with it:
```rust
#[derive(Debug, Clone, Eq, PartialEq)]
pub enum UnifiedExecShellMode {
Direct,
ZshFork(ZshForkConfig),
}
#[derive(Debug, Clone, Eq, PartialEq)]
pub struct ZshForkConfig {
pub(crate) shell_zsh_path: AbsolutePathBuf,
pub(crate) main_execve_wrapper_exe: AbsolutePathBuf,
}
```
This cleanup was done in preparation for
https://github.com/openai/codex/pull/13432.
## What Changed
- Replaced the feature-only `UnifiedExecBackendConfig` split with
`UnifiedExecShellMode` in `codex-rs/core/src/tools/spec.rs`.
- Derived the unified-exec mode from session-backed inputs when building
turn `ToolsConfig`, and preserved that mode across model switches and
review turns.
- Introduced `ZshForkConfig`, which stores the resolved zsh-fork
`AbsolutePathBuf` values for the configured `zsh` binary and `execve`
wrapper.
- Threaded `ZshForkConfig` through unified-exec command construction and
the zsh-fork preparation path so zsh-fork-specific runtime code consumes
a single config object instead of separate path arguments.
- Added focused tests for constructing zsh-fork mode only when session
prerequisites are available, and updated the zsh-fork expectations to be
target-platform aware.
## Testing
- `cargo test -p codex-core zsh_fork --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/14633).
* #13432
* __->__ #14633
Stop hooks now receive `stop_hook_active` and enable stop hooks to loop
forever if they'd like to. In the initial hooks PR, we implemented a
simpler mechanic that the stop-blocking could only happen once in a row
- support stop hook adding a continuation prompt to add a further task
- if multiple stop-blocks happen that have continuation prompts, they
are concatenated
example run:
```
› hey :)
• Running SessionStart hook: lighting the observatory
SessionStart hook (completed)
warning: Hi, I'm a session start hook for wizard-tower (startup).
hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace
• Aloha :) Happy to jam with you. What are we building today?
• Running Stop hook: updating the guards
Stop hook (blocked)
warning: Wizard Tower Stop hook continuing conversation
feedback: cook the stonpet
• Aloha, here’s the hyperspace move for cooking a wimboltine stonpet:
1. Sear the stonpet in a hot pan with moon-oil until the edges shimmer.
2. Add star-lime, black salt, and a little fermented nebula paste.
3. Lower the heat and let it braise for 8 cosmic minutes with a splash of comet broth.
4. Finish with sky herbs and serve over warm asteroid rice.
The vibe: crispy outside, tender center, deep interdimensional savor.
• Running Stop hook: updating the guards
Stop hook (completed)
warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
```
.codex/config.toml
```
[features]
codex_hooks = true
```
.codex/hooks.json
```
{
"hooks": {
"SessionStart": [
{
"matcher": "startup|resume",
"hooks": [
{
"type": "command",
"command": "/usr/bin/python3 .codex/hooks/session_start_demo.py",
"timeoutSec": 10,
"statusMessage": "lighting the observatory"
}
]
}
],
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "/usr/bin/python3 .codex/hooks/stop_demo_block.py",
"timeoutSec": 10,
"statusMessage": "updating the guards"
}
]
}
]
}
}
```
.codex/hooks/session_start_demo.py
```
#!/usr/bin/env python3
import json
import sys
from pathlib import Path
def main() -> int:
payload = json.load(sys.stdin)
cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"
source = payload.get("source", "startup")
source_label = "resume" if source == "resume" else "startup"
source_prefix = (
"Resume context:"
if source == "resume"
else "Startup context:"
)
output = {
"systemMessage": (
f"Hi, I'm a session start hook for {cwd} ({source_label})."
),
"hookSpecificOutput": {
"hookEventName": "SessionStart",
"additionalContext": (
f"{source_prefix} A wimboltine stonpet is an exotic cuisine from hyperspace"
),
},
}
print(json.dumps(output))
return 0
if __name__ == "__main__":
raise SystemExit(main())
```
.codex/hooks/stop_demo_block.py
```
#!/usr/bin/env python3
import json
import sys
def main() -> int:
payload = json.load(sys.stdin)
stop_hook_active = payload.get("stop_hook_active", False)
last_assistant_message = payload.get("last_assistant_message") or ""
char_count = len(last_assistant_message.strip())
if stop_hook_active:
system_message = (
"Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop."
)
print(json.dumps({"systemMessage": system_message}))
else:
system_message = (
f"Wizard Tower Stop hook continuing conversation"
)
print(json.dumps({"systemMessage": system_message, "decision": "block", "reason": "cook the stonpet"}))
return 0
if __name__ == "__main__":
raise SystemExit(main())
```
## Summary
- replace the stale `create_wait_tool()` reference in `spec_tests.rs`
- use `create_wait_agent_tool()` to match the actual multi-agent tool
rename from `#14631`
- fix the resulting `codex-core` spec-test compile failure on current
`main`
## Context
`#14631` renamed the model-facing multi-agent tool from `wait` to
`wait_agent` and renamed the corresponding spec helper to
`create_wait_agent_tool()`.
One `spec_tests.rs` call site was left behind, so current `main` fails
to compile `codex-core` tests with:
- `cannot find function create_wait_tool`
Using `create_wait_agent_tool()` is the correct fix here;
`create_exec_wait_tool()` would point at the separate exec wait tool and
would not match the renamed multi-agent toolset.
## Testing
- not rerun locally after the rebase
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
control for who reviews approval requests
- route Smart Approvals guardian review through core for command
execution, file changes, managed-network approvals, MCP approvals, and
delegated/subagent approval flows
- expose guardian review in app-server with temporary unstable
`item/autoApprovalReview/{started,completed}` notifications carrying
`targetItemId`, `review`, and `action`
- update the TUI so Smart Approvals can be enabled from `/experimental`,
aligned with the matching `/approvals` mode, and surfaced clearly while
reviews are pending or resolved
## Runtime model
This PR does not introduce a new `approval_policy`.
Instead:
- `approval_policy` still controls when approval is needed
- `approvals_reviewer` controls who reviewable approval requests are
routed to:
- `user`
- `guardian_subagent`
`guardian_subagent` is a carefully prompted reviewer subagent that
gathers relevant context and applies a risk-based decision framework
before approving or denying the request.
The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
behavior keys off `approvals_reviewer`.
When Smart Approvals is enabled from the TUI, it also switches the
current `/approvals` settings to the matching Smart Approvals mode so
users immediately see guardian review in the active thread:
- `approval_policy = on-request`
- `approvals_reviewer = guardian_subagent`
- `sandbox_mode = workspace-write`
Users can still change `/approvals` afterward.
Config-load behavior stays intentionally narrow:
- plain `smart_approvals = true` in `config.toml` remains just the
rollout/UI gate and does not auto-set `approvals_reviewer`
- the deprecated `guardian_approval = true` alias migration does
backfill `approvals_reviewer = "guardian_subagent"` in the same scope
when that reviewer is not already configured there, so old configs
preserve their original guardian-enabled behavior
ARC remains a separate safety check. For MCP tool approvals, ARC
escalations now flow into the configured reviewer instead of always
bypassing guardian and forcing manual review.
## Config stability
The runtime reviewer override is stable, but the config-backed
app-server protocol shape is still settling.
- `thread/start`, `thread/resume`, and `turn/start` keep stable
`approvalsReviewer` overrides
- the config-backed `approvals_reviewer` exposure returned via
`config/read` (including profile-level config) is now marked
`[UNSTABLE]` / experimental in the app-server protocol until we are more
confident in that config surface
## App-server surface
This PR intentionally keeps the guardian app-server shape narrow and
temporary.
It adds generic unstable lifecycle notifications:
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`
with payloads of the form:
- `{ threadId, turnId, targetItemId, review, action? }`
`review` is currently:
- `{ status, riskScore?, riskLevel?, rationale? }`
- where `status` is one of `inProgress`, `approved`, `denied`, or
`aborted`
`action` carries the guardian action summary payload from core when
available. This lets clients render temporary standalone pending-review
UI, including parallel reviews, even when the underlying tool item has
not been emitted yet.
These notifications are explicitly documented as `[UNSTABLE]` and
expected to change soon.
This PR does **not** persist guardian review state onto `thread/read`
tool items. The intended follow-up is to attach guardian review state to
the reviewed tool item lifecycle instead, which would improve
consistency with manual approvals and allow thread history / reconnect
flows to replay guardian review state directly.
## TUI behavior
- `/experimental` exposes the rollout gate as `Smart Approvals`
- enabling it in the TUI enables the feature and switches the current
session to the matching Smart Approvals `/approvals` mode
- disabling it in the TUI clears the persisted `approvals_reviewer`
override when appropriate and returns the session to default manual
review when the effective reviewer changes
- `/approvals` still exposes the reviewer choice directly
- the TUI renders:
- pending guardian review state in the live status footer, including
parallel review aggregation
- resolved approval/denial state in history
## Scope notes
This PR includes the supporting core/runtime work needed to make Smart
Approvals usable end-to-end:
- shell / unified-exec / apply_patch / managed-network / MCP guardian
review
- delegated/subagent approval routing into guardian review
- guardian review risk metadata and action summaries for app-server/TUI
- config/profile/TUI handling for `smart_approvals`, `guardian_approval`
alias migration, and `approvals_reviewer`
- a small internal cleanup of delegated approval forwarding to dedupe
fallback paths and simplify guardian-vs-parent approval waiting (no
intended behavior change)
Out of scope for this PR:
- redesigning the existing manual approval protocol shapes
- persisting guardian review state onto app-server `ThreadItem`s
- delegated MCP elicitation auto-review (the current delegated MCP
guardian shim only covers the legacy `RequestUserInput` path)
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- update stale core tool-spec expectations from `wait` to `wait_agent`
- update the prompt-caching tool-name assertion to match the renamed
tool
- fix the Bazel regressions introduced after #14631 renamed the
multi-agent wait tool
## Testing
- cargo test -p codex-core tools::spec::tests
- cargo test -p codex-core
suite::prompt_caching::prompt_tools_are_consistent_across_requests
Co-authored-by: Codex <noreply@openai.com>