Use the existing model_info::with_config_overrides path for alias-specific context window and auto-compact overrides instead of mutating those fields inline in ModelsManager.
The custom alias path now folds alias-level override values into a temporary config and reuses the centralized override helper, which keeps the precedence behavior unchanged while removing duplicated override logic.
List custom model aliases before bundled models in the picker while keeping default-model selection anchored to the bundled priority order when bundled presets exist.
Also add a regression test covering the new ordering.
Accept custom model aliases from [[custom_models]] entries so user config matches the documented TOML shape.
Also add explicit alias names plus duplicate-alias validation and refresh the generated schema/docs to match.
This adds config.toml-defined model aliases that map to provider model slugs while applying alias-specific context settings for the active session.
- added custom_models config entries plus schema and docs coverage
- taught ModelsManager to resolve aliases to a provider-facing request_model while preserving the user-facing alias slug
- applied alias-specific context_window and model_auto_compact_token_limit overrides during model info resolution
- updated session/test plumbing and added regression coverage for alias resolution with local and remote model catalogs
Model selection and per-session context overrides already flow through ModelsManager and Config. Resolving aliases there keeps the provider slug separate from the user-facing alias while reusing the existing override plumbing.
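A minimal sketch of the alias resolution described above, under stated assumptions: the struct and function names (`CustomModel`, `ResolvedModel`, `index_aliases`, `resolve`) and the `model` field are hypothetical and not taken from the real schema or from `ModelsManager`. It only illustrates keeping the user-facing alias slug separate from the provider-facing `request_model`, carrying the alias-level overrides along, and rejecting duplicate aliases.

```rust
use std::collections::HashMap;

/// Hypothetical shape of one `[[custom_models]]` entry (field names are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct CustomModel {
    pub alias: String,
    pub model: String,
    pub context_window: Option<u64>,
    pub model_auto_compact_token_limit: Option<u64>,
}

/// Outcome of resolving a user-facing slug (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ResolvedModel {
    /// What the user selected; stays visible in the UI.
    pub slug: String,
    /// What is sent to the provider.
    pub request_model: String,
    pub context_window: Option<u64>,
    pub model_auto_compact_token_limit: Option<u64>,
}

/// Index entries by alias, failing on the first duplicate alias.
pub fn index_aliases(entries: &[CustomModel]) -> Result<HashMap<String, CustomModel>, String> {
    let mut by_alias = HashMap::new();
    for entry in entries {
        if by_alias.insert(entry.alias.clone(), entry.clone()).is_some() {
            return Err(format!("duplicate custom model alias: {}", entry.alias));
        }
    }
    Ok(by_alias)
}

/// Resolve a slug: aliases map to their provider model plus overrides,
/// anything else passes through unchanged with no overrides.
pub fn resolve(slug: &str, aliases: &HashMap<String, CustomModel>) -> ResolvedModel {
    match aliases.get(slug) {
        Some(entry) => ResolvedModel {
            slug: slug.to_string(),
            request_model: entry.model.clone(),
            context_window: entry.context_window,
            model_auto_compact_token_limit: entry.model_auto_compact_token_limit,
        },
        None => ResolvedModel {
            slug: slug.to_string(),
            request_model: slug.to_string(),
            context_window: None,
            model_auto_compact_token_limit: None,
        },
    }
}
```

In this sketch the overrides are simply carried on the resolved value; the change itself folds them into a temporary config and reuses the existing override helper.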
- just write-config-schema
- just fmt
- cargo test -p codex-app-server -p codex-api -p codex-exec -p codex-mcp-server
- cargo test -p codex-tui
- cargo test -p codex-protocol -p codex-core (same two existing seatbelt failures remained in this environment: create_seatbelt_args_with_read_only_git_pointer_file and create_seatbelt_args_with_read_only_git_and_codex_subpaths)
- just fix -p codex-core -p codex-protocol -p codex-app-server -p codex-api -p codex-exec -p codex-mcp-server -p codex-tui
## Summary
Clarify the `js_repl` prompt guidance around persistent bindings and
redeclaration recovery.
This updates the generated `js_repl` instructions in
`core/src/project_doc.rs` to prefer this order when a name is already
bound:
1. Reuse the existing binding
2. Reassign a previously declared `let`
3. Pick a new descriptive name
4. Use `{ ... }` only for short-lived scratch scope
5. Reset the kernel only when a clean state is actually needed
The prompt now also explicitly warns against wrapping an entire cell in
block scope when the goal is to reuse names across later cells.
## Why
The previous wording still left too much room for low-value workarounds
like whole-cell block wrapping. In downstream browser rollouts, that
pattern was adding tokens and preventing useful state reuse across
`js_repl` cells.
This change makes the preferred behavior more explicit without changing
runtime semantics.
## Scope
- Prompt/documentation change only
- No runtime behavior changes
- Updates the matching string-backed `project_doc` tests
Enhance pty utils:
* Support closing stdin
* Separate stderr and stdout streams so consumers can differentiate them
* Provide a compatibility helper to merge both streams back into a combined one
* Support specifying terminal size for pty, including on-demand resizes while process is already running
* Support terminating the process while still consuming its outputs
## Problem
Browser login failures historically leave support with an incomplete
picture. HARs can show that the browser completed OAuth and reached the
localhost callback, but they do not explain why the native client failed
on the final `/oauth/token` exchange. Direct `codex login` also relied
mostly on terminal stderr and the browser error page, so even when the
login crate emitted better sign-in diagnostics through TUI or app-server
flows, the one-shot CLI path still did not leave behind an easy artifact
to collect.
## Mental model
This implementation treats the browser page, the returned `io::Error`,
and the normal structured log as separate surfaces with different safety
requirements. The browser page and returned error preserve the detail
that operators need to diagnose failures. The structured log stays
narrower: it records reviewed lifecycle events, parsed safe fields, and
redacted transport errors without becoming a sink for secrets or
arbitrary backend bodies.
Direct `codex login` now adds a fourth support surface: a small
file-backed log at `codex-login.log` under the configured `log_dir`.
That artifact carries the same login-target events as the other
entrypoints without changing the existing stderr/browser UX.
## Non-goals
This does not add auth logging to normal runtime requests, and it does
not try to infer precise transport root causes from brittle string
matching. The scope remains the browser-login callback flow in the
`login` crate plus a direct-CLI wrapper that persists those events to
disk.
This also does not try to reuse the TUI logging stack wholesale. The TUI
path initializes feedback, OpenTelemetry, and other session-oriented
layers that are useful for an interactive app but unnecessary for a
one-shot login command.
## Tradeoffs
The implementation favors fidelity for caller-visible errors and
restraint for persistent logs. Parsed JSON token-endpoint errors are
logged safely by field. Non-JSON token-endpoint bodies remain available
to the returned error so CLI and browser surfaces still show backend
detail. Transport errors keep their real `reqwest` message, but attached
URLs are surgically redacted. Custom issuer URLs are sanitized before
logging.
On the CLI side, the code intentionally duplicates a narrow slice of the
TUI file-logging setup instead of sharing the full initializer. That
keeps `codex login` easy to reason about and avoids coupling it to
interactive-session layers that the command does not need.
## Architecture
The core auth behavior lives in `codex-rs/login/src/server.rs`. The
callback path now logs callback receipt, callback validation,
token-exchange start, token-exchange success, token-endpoint non-2xx
responses, and transport failures. App-server consumers still use this
same login-server path via `run_login_server(...)`, so the same
instrumentation benefits TUI, Electron, and VS Code extension flows.
The direct CLI path in `codex-rs/cli/src/login.rs` now installs a small
file-backed tracing layer for login commands only. That writes
`codex-login.log` under `log_dir` with login-specific targets such as
`codex_cli::login` and `codex_login::server`.
## Observability
The main signals come from the `login` crate target and are
intentionally scoped to sign-in. Structured logs include redacted issuer
URLs, redacted transport errors, HTTP status, and parsed token-endpoint
fields when available. The callback-layer log intentionally avoids
`%err` on token-endpoint failures so arbitrary backend bodies do not get
copied into the normal log file.
Direct `codex login` now leaves a durable artifact for both failure and
success cases. Example output from the new file-backed CLI path:
Failing callback:
```text
2026-03-06T22:08:54.143612Z INFO codex_cli::login: starting browser login flow
2026-03-06T22:09:03.431699Z INFO codex_login::server: received login callback path=/auth/callback has_code=false has_state=true has_error=true state_valid=true
2026-03-06T22:09:03.431745Z WARN codex_login::server: oauth callback returned error error_code="access_denied" has_error_description=true
```
Successful callback and token exchange:
```text
2026-03-06T22:09:14.065559Z INFO codex_cli::login: starting browser login flow
2026-03-06T22:09:36.431678Z INFO codex_login::server: received login callback path=/auth/callback has_code=true has_state=true has_error=false state_valid=true
2026-03-06T22:09:36.436977Z INFO codex_login::server: starting oauth token exchange issuer=https://auth.openai.com/ redirect_uri=http://localhost:1455/auth/callback
2026-03-06T22:09:36.685438Z INFO codex_login::server: oauth token exchange succeeded status=200 OK
```
## Tests
- `cargo test -p codex-login`
- `cargo clippy -p codex-login --tests -- -D warnings`
- `cargo test -p codex-cli`
- `just bazel-lock-update`
- `just bazel-lock-check`
- manual direct `codex login` smoke tests for both a failing callback
and a successful browser login
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
This is a structural cleanup of `codex-otel` to make the ownership
boundaries a lot clearer.
For example, previously it was quite confusing that `OtelManager` which
emits log + trace event telemetry lived under
`codex-rs/otel/src/traces/`. Also, there were two places that defined
methods on OtelManager via `impl OtelManager` (`lib.rs` and
`otel_manager.rs`).
What changed:
- move the `OtelProvider` implementation into `src/provider.rs`
- move `OtelManager` and session-scoped event emission into
`src/events/otel_manager.rs`
- collapse the shared log/trace event helpers into
`src/events/shared.rs`
- pull target classification into `src/targets.rs`
- move `traceparent_context_from_env()` into `src/trace_context.rs`
- keep `src/otel_provider.rs` as a compatibility shim for existing
imports
- update the `codex-otel` README to reflect the new layout
## Why
`lib.rs` and `otel_provider.rs` were doing too many different jobs at
once: provider setup, export routing, trace-context helpers, and session
event emission all lived together.
This refactor separates those concerns without trying to change the
behavior of the crate. The goal is to make future OTEL work easier to
reason about and easier to review.
## Notes
- no intended behavior change
- `OtelManager` remains the session-scoped event emitter in this PR
- the `otel_provider` shim keeps downstream churn low while the
internals move around
## Validation
- `just fmt`
- `cargo test -p codex-otel`
- `just fix -p codex-otel`
Restore the branch-local root/subagent prompt injection in codex-core and update the rollback snapshot to match. This keeps the watchdog branch aligned with its own role-prompt behavior instead of silently adopting main's prompt layout.
Stop prepending root and subagent prompt files into session developer instructions during startup. Main no longer does this, and keeping the branch-local wrapping caused compaction and rollback snapshot drift in core tests.

Keep the watchdog-specific prompt loader for watchdog check-ins, but leave ordinary session developer instructions sourced directly from config like upstream.
Align the branch ambient role spawn behavior with upstream by making spawn the default when neither the tool call nor the role config specifies a spawn mode.
This keeps the branch delta smaller while letting local config carry role-specific fork defaults where needed.
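The fallback described above can be sketched as follows. The `SpawnMode` enum and `effective_spawn_mode` function are hypothetical names, and letting the tool call win over the role config is an assumption; the description only states that spawn is the default when neither source specifies a mode.

```rust
/// Hypothetical spawn modes; the description mentions spawn and fork.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpawnMode {
    Spawn,
    Fork,
}

/// Pick the effective mode: tool call first (assumed precedence), then the
/// role config, then the `Spawn` default when neither specifies one.
pub fn effective_spawn_mode(
    tool_call: Option<SpawnMode>,
    role_config: Option<SpawnMode>,
) -> SpawnMode {
    tool_call.or(role_config).unwrap_or(SpawnMode::Spawn)
}
```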
Stop hardcoding model and spawn-mode defaults for the built-in explorer, fast-worker, and awaiter roles.
That behavior can be expressed through local config instead, which keeps the branch feature focused on per-role override support rather than policy.
Set the built-in role to use , matching the existing lightweight worker-style roles.
This keeps long-running helper agents on the cheaper/faster model path by default while preserving per-role model override support in config.
Allow subagent spawning at the configured max depth by only disabling the agents feature when child depth exceeds the limit.
Update the affected codex-core tests to match the current role spec output and context-free spawn config behavior, and keep the depth-boundary spawn test on the plain spawn path while asserting that the spawned thread is actually registered.
Adjust the watchdog compaction regression test to match the new duplicate-blocking behavior instead of pre-seeding the in-progress set.
Also relax the MCP codex-tool assertion so it checks for developer instructions by substring, which keeps the test valid when branch-added root/subagent prompt text is coalesced into the same developer message.
Rename the branch-local collab inbox payload, constants, helper names,
and prompt text to agent inbox terminology without touching upstream
collaboration mode surfaces.
This keeps the watchdog/runtime behavior intact while removing the
branch-added collab naming that leaked into the stack.
Preserve internal subagent handoffs as injected response items instead of degrading them into synthetic user messages.
When the destination root thread is idle, prepend an empty user message before the function-call/function-call-output pair so injection starts a valid turn. Keep active-turn behavior and subagent routing unchanged, and retain the regression coverage for the idle-root path.
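A minimal sketch of the idle-root case, assuming a simplified item type: `ResponseItem` here is a hypothetical stand-in with only the three shapes named above, and `build_injection` is an illustrative helper, not the function in codex-core.

```rust
/// Simplified, hypothetical stand-in for the injected item types.
#[derive(Debug, Clone, PartialEq)]
pub enum ResponseItem {
    UserMessage(String),
    FunctionCall { call_id: String, name: String },
    FunctionCallOutput { call_id: String, output: String },
}

/// Build the items to inject for a subagent handoff. When the root thread is
/// idle, an empty user message goes first so the injection starts a valid
/// turn; during an active turn the pair is injected as-is.
pub fn build_injection(
    root_is_idle: bool,
    call: ResponseItem,
    output: ResponseItem,
) -> Vec<ResponseItem> {
    let mut items = Vec::with_capacity(3);
    if root_is_idle {
        items.push(ResponseItem::UserMessage(String::new()));
    }
    items.push(call);
    items.push(output);
    items
}
```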
Reconcile the combined branch after merging the tracked feature branches.
- restore branch-owned config surface that was dropped during merge resolution
- handle ForkReference rollout items in codex-state metadata extraction
- materialize fork-reference history before building SessionConfigured so forked sessions keep parent lineage and startup scrollback
- cover the fork startup contract with a focused core regression test
## Summary
- reject the global `*` domain pattern in proxy allow/deny lists and
managed constraints introduced for testing earlier
- keep exact hosts plus scoped wildcards like `*.example.com` and
`**.example.com`
- update docs and regression tests for the new invalid-config behavior
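A rough sketch of the validation rule summarized above. `validate_domain_pattern` is a hypothetical name, and rejecting any other stray `*` inside the host is an assumption; the summary itself only states that the global `*` is rejected while exact hosts and the `*.` / `**.` scoped wildcards remain valid.

```rust
/// Validate one allow/deny list entry (illustrative helper, not the real API).
pub fn validate_domain_pattern(pattern: &str) -> Result<(), String> {
    let pattern = pattern.trim();
    if pattern == "*" {
        return Err(
            "the global `*` pattern is not allowed; use a scoped wildcard such as `*.example.com`"
                .to_string(),
        );
    }
    // Strip an optional scoped-wildcard prefix, longest first.
    let host = pattern
        .strip_prefix("**.")
        .or_else(|| pattern.strip_prefix("*."))
        .unwrap_or(pattern);
    // Assumption: after the prefix, the host must be non-empty and wildcard-free.
    if host.is_empty() || host.contains('*') {
        return Err(format!("invalid domain pattern: {}", pattern));
    }
    Ok(())
}
```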
Reject stale or corrupt archived rollout candidates before moving files
back into sessions. The unarchive path now requires the candidate to
live under archived_sessions and to have a rollout filename whose UUID
matches the requested thread id.
Add a regression test covering a stale archived DB rollout_path so an
unrelated active-session file is never renamed during lookup.
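The two checks can be sketched like this. The function name is hypothetical, and the `rollout-<timestamp>-<uuid>.jsonl` filename layout is an assumption made for the example; the sketch also compares path components only and does not canonicalize paths, which a real implementation may need to do.

```rust
use std::path::Path;

/// Return true only if `candidate` lives under `archived_root` and its
/// filename ends in the requested thread id (assumed filename layout:
/// `rollout-<timestamp>-<uuid>.jsonl`).
pub fn is_valid_archived_candidate(candidate: &Path, archived_root: &Path, thread_id: &str) -> bool {
    // Check 1: the candidate must be inside the archived sessions directory.
    if !candidate.starts_with(archived_root) {
        return false;
    }
    let Some(name) = candidate.file_name().and_then(|n| n.to_str()) else {
        return false;
    };
    let Some(stem) = name
        .strip_prefix("rollout-")
        .and_then(|rest| rest.strip_suffix(".jsonl"))
    else {
        return false;
    };
    // Check 2: the trailing UUID must match the requested thread id exactly,
    // and must be preceded by the `-` that separates it from the timestamp.
    !thread_id.is_empty()
        && stem.len() > thread_id.len()
        && stem.ends_with(thread_id)
        && stem[..stem.len() - thread_id.len()].ends_with('-')
}
```

With these checks, a stale `rollout_path` that points at an active-session file fails the first test and is never considered for a rename.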
Restore archived rollout files back into sessions when resolving a thread by id. This lets resume, fork, and resume_agent paths that rely on find_thread_path_by_id_str recover archived sessions automatically instead of reporting them missing.
Also adds a regression test covering archived->sessions restoration and lookup behavior.
(cherry picked from commit fe31d1a911)
At over 7,000 lines, `codex-rs/core/src/config/mod.rs` was getting a bit
unwieldy.
This PR does the same type of move as
https://github.com/openai/codex/pull/12957 to put unit tests in their
own file, though I decided `config_tests.rs` is a more intuitive name
than `mod_tests.rs`.
Ultimately, I'll codemod the rest of the codebase to follow suit, but I
want to do it in stages to reduce merge conflicts for people.
## Summary
- reduce the SQLite-backed log retention window from 90 days to 10 days
## Testing
- just fmt
- cargo test -p codex-state
Co-authored-by: Codex <noreply@openai.com>
#### What
Add structured `@plugin` parsing and TUI support for plugin mentions.
- Core: switch from plain-text `@display_name` parsing to structured
`plugin://...` mentions via `UserInput::Mention` and
`[$...](plugin://...)` links in text, same pattern as apps/skills.
- TUI: add plugin mention popup, autocomplete, and chips when typing
`$`. Load plugin capability summaries and feed them into the composer;
plugin mentions appear alongside skills and apps.
- Generalize mention parsing to take a sigil parameter, which still defaults to `$`
<img width="797" height="119" alt="image"
src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb"
/>
Builds on #13510. Currently clients have to build their own `id` via
`plugin@marketplace` and filter plugins to show by `enabled`, but we
will add `id` and `available` as fields returned from `plugin/list`
soon.
#### Tests
Added tests, verified locally.
This branch:
* Avoids flushing the DB when not necessary
* Filters the events for which we perform an `upsert` into the DB
* Adds a dedicated, lighter update function for `thread:updated_at`
This should significantly reduce DB lock contention. If it is not
sufficient, we can de-sync the DB flush for `updated_at`.
## Summary
- move sqlite log reads and writes onto a dedicated `logs_1.sqlite`
database to reduce lock contention with the main state DB
- add a dedicated logs migrator and route `codex-state-logs` to the new
database path
- leave the old `logs` table in the existing state DB untouched for now
## Testing
- just fmt
- cargo test -p codex-state
---------
Co-authored-by: Codex <noreply@openai.com>
### Summary
This adds turn-level latency metrics for the first model output and the
first completed agent message.
- `codex.turn.ttft.duration_ms` starts at turn start and records on the
first output signal we see from the model. That includes normal
assistant text, reasoning deltas, and non-text outputs like tool-call
items.
- `codex.turn.ttfm.duration_ms` also starts at turn start, but it
records when the first agent message finishes streaming rather than when
its first delta arrives.
### Implementation notes
The timing is tracked in codex-core, not app-server, so the definition
stays consistent across CLI, TUI, and app-server clients.
I reused the existing turn lifecycle boundary that already drives
`codex.turn.e2e_duration_ms`, stored the turn start timestamp in turn
state, and record each metric once per turn.
I also wired the new metric names into the OTEL runtime metrics summary
so they show up in the same in-memory/debug snapshot path as the
existing timing metrics.
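A minimal sketch of the record-once-per-turn behavior, assuming a small helper type: `TurnTimer` and its methods are hypothetical names, and the real code stores the start timestamp in turn state rather than in a dedicated struct. Each method returns the duration to record the first time it fires and `None` afterwards.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-turn timer covering both metrics.
pub struct TurnTimer {
    started_at: Instant,
    ttft: Option<Duration>,
    ttfm: Option<Duration>,
}

/// Fill `slot` on the first call only and report the elapsed time once.
fn record_once(slot: &mut Option<Duration>, started_at: Instant, now: Instant) -> Option<Duration> {
    if slot.is_some() {
        return None;
    }
    let elapsed = now.saturating_duration_since(started_at);
    *slot = Some(elapsed);
    Some(elapsed)
}

impl TurnTimer {
    pub fn start(now: Instant) -> Self {
        Self { started_at: now, ttft: None, ttfm: None }
    }

    /// First output signal of any kind (text, reasoning delta, tool call).
    pub fn on_first_output(&mut self, now: Instant) -> Option<Duration> {
        record_once(&mut self.ttft, self.started_at, now)
    }

    /// First agent message that has finished streaming.
    pub fn on_agent_message_completed(&mut self, now: Instant) -> Option<Duration> {
        record_once(&mut self.ttfm, self.started_at, now)
    }
}
```

Passing `now` in explicitly keeps the sketch deterministic to test; a caller would emit the returned value as `codex.turn.ttft.duration_ms` or `codex.turn.ttfm.duration_ms` when it is `Some`.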
This fixes a flaky `turn_start_shell_zsh_fork_executes_command_v2` test.
The interrupt path can race with the follow-up `/responses` request that
reports the aborted tool call, so the test now allows that extra no-op
response instead of assuming there will only ever be one request. The
assertions still stay focused on the behavior the test actually cares
about: starting the zsh-forked command correctly.
Testing:
- `just fmt`
- `cargo test -p codex-app-server --test all
suite::v2::turn_start_zsh_fork::turn_start_shell_zsh_fork_executes_command_v2
-- --exact --nocapture`
## Summary
Today `SandboxPermissions::requires_additional_permissions()` does not
actually mean "is `WithAdditionalPermissions`". It returns `true` for
any non-default sandbox override, including `RequireEscalated`. That
broad behavior is relied on in multiple `main` callsites.
The naming is security-sensitive because `SandboxPermissions` is used on
shell-like tool calls to tell the executor how a single command should
relate to the turn sandbox:
- `UseDefault`: run with the turn sandbox unchanged
- `RequireEscalated`: request execution outside the sandbox
- `WithAdditionalPermissions`: stay sandboxed but widen permissions for
that command only
## Problem
The old helper name reads as if it only applies to the
`WithAdditionalPermissions` variant. In practice it means "this command
requested any explicit sandbox override."
That ambiguity made it easy to read production checks incorrectly and
made the guardian change look like a standalone `main` fix when it is
not.
On `main` today:
- `shell` and `unified_exec` intentionally reject any explicit
`sandbox_permissions` request unless approval policy is `OnRequest`
- `exec_policy` intentionally treats any explicit sandbox override as
prompt-worthy in restricted sandboxes
- tests intentionally serialize both `RequireEscalated` and
`WithAdditionalPermissions` as explicit sandbox override requests
So changing those callsites from the broad helper to a narrow
`WithAdditionalPermissions` check would be a behavior change, not a pure
cleanup.
## What This PR Does
- documents `SandboxPermissions` as a per-command sandbox override, not
a generic permissions bag
- adds `requests_sandbox_override()` for the broad meaning: anything
except `UseDefault`
- adds `uses_additional_permissions()` for the narrow meaning: only
`WithAdditionalPermissions`
- keeps `requires_additional_permissions()` as a compatibility alias to
the broad meaning for now
- updates the current broad callsites to use the accurately named broad
helper
- adds unit coverage that locks in the semantics of all three helpers
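As a rough illustration of the three helpers listed above: the variant and method names come from this description, while the derive list, the by-value receivers, and the method bodies are assumptions rather than the actual code in the crate.

```rust
/// Per-command sandbox override (sketch; variants as named in this description).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxPermissions {
    /// Run with the turn sandbox unchanged.
    UseDefault,
    /// Request execution outside the sandbox.
    RequireEscalated,
    /// Stay sandboxed but widen permissions for this command only.
    WithAdditionalPermissions,
}

impl SandboxPermissions {
    /// Broad meaning: anything except `UseDefault`.
    pub fn requests_sandbox_override(self) -> bool {
        !matches!(self, Self::UseDefault)
    }

    /// Narrow meaning: only `WithAdditionalPermissions`.
    pub fn uses_additional_permissions(self) -> bool {
        matches!(self, Self::WithAdditionalPermissions)
    }

    /// Compatibility alias that keeps the old, broad behavior.
    pub fn requires_additional_permissions(self) -> bool {
        self.requests_sandbox_override()
    }
}
```

The case that motivates the split is `RequireEscalated`: the broad helper and the compatibility alias return `true` for it, while the narrow helper returns `false`.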
## What This PR Does Not Do
This PR does not change runtime behavior. That is intentional.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add one-time session recovery in `RmcpClient` for streamable HTTP MCP
`404` session expiry
- rebuild the transport and retry the failed operation once after
reinitializing the client state
- extend the test server and integration coverage for `404`, `401`,
single-retry, and non-session failure scenarios
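The retry-once shape can be sketched synchronously as below. `McpError` and `with_session_recovery` are hypothetical names, and the real `RmcpClient` path is presumably async and rebuilds a transport rather than calling a closure; this only illustrates that a session-expiry failure triggers one reinitialization and one retry, while other failures are returned as-is.

```rust
/// Hypothetical error classification for the sketch.
#[derive(Debug, PartialEq)]
pub enum McpError {
    /// Streamable HTTP `404` interpreted as session expiry.
    SessionExpired,
    /// For example a `401`; not recoverable by reinitializing.
    Unauthorized,
    Other(String),
}

/// Run `op`; on session expiry, reinitialize once and retry once.
pub fn with_session_recovery<T>(
    mut op: impl FnMut() -> Result<T, McpError>,
    mut reinitialize: impl FnMut() -> Result<(), McpError>,
) -> Result<T, McpError> {
    match op() {
        Err(McpError::SessionExpired) => {
            reinitialize()?;
            // Single retry: a second expiry is returned to the caller.
            op()
        }
        other => other,
    }
}
```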
## Testing
- just fmt
- cargo test -p codex-rmcp-client (the post-rebase run lost its final
summary in the terminal; the suite had passed earlier before the rebase)
- just fix -p codex-rmcp-client
`/feedback` uploads can include `codex-logs.log` from the in-memory
feedback logger path. That logger was emitting level + message without a
timestamp, which made some uploaded logs much harder to inspect. This
change makes the feedback logger use an explicit timer so
feedback-captured log lines include timestamps consistently.
This is not Windows-specific code. The bug showed up in Windows reports
because those uploads were hitting the feedback-buffer path more often,
while Linux/macOS reports were typically coming from the SQLite feedback
export, which already prefixes timestamps.
Here's an example of a log that is missing the timestamps:
```
TRACE app-server request: getAuthStatus
TRACE app-server request: model/list
INFO models cache: evaluating cache eligibility
INFO models cache: attempting load_fresh
INFO models cache: loaded cache file
INFO models cache: cache version mismatch
INFO models cache: no usable cache entry
DEBUG
INFO models cache: cache miss, fetching remote models
TRACE windows::current_platform is called
TRACE Returning Info { os_type: Windows, version: Semantic(10, 0, 26200), edition: Some("Windows 11 Professional"), codename: None, bitness: X64, architecture: Some("x86_64") }
```
#### What
On `plugin/install`, check whether installed apps are already authed on
ChatGPT, and return a list of all apps that are not. Clients can use this
list to trigger auth workflows as needed.
Checks are best effort, based on `codex_apps` loading, much like
`app/list`.
#### Tests
Added integration tests, tested locally.
- Replay thread rollback from the persisted rollout history instead of
truncating in-memory state.
- Add rollback coverage, including
rollback-behind-compaction snapshot coverage.
## Summary
Simplify the trusted directory flow. This logic was originally designed
several months ago to determine whether codex should start in read-only or
workspace-write mode. That is no longer the purpose of directory
trust, so we should get rid of this logic.
## Testing
- [x] Unit tests pass
We do this for codex-command-runner.exe as well for the same reason.
Windows sandbox users cannot execute binaries in the WindowsApp/
installed directory for the Codex App. This causes apply-patch to fail
because it tries to execute codex.exe as the sandbox user.
## Summary
- delete the network proxy admin server and its runtime listener/task
plumbing
- remove the admin endpoint config, runtime, requirement, protocol,
schema, and debug-surface fields
- update proxy docs to reflect the remaining HTTP and SOCKS listeners
only
## Summary
This PR:
1. fixes a deserialization mismatch for macOS automation permissions in
approval payloads by making core parsing accept both supported wire
shapes for bundle IDs.
2. adds `#[serde(default)]` to `MacOsSeatbeltProfileExtensions` so
omitted fields deserialize to secure defaults.
## Why this change is needed
`MacOsAutomationPermission` uses `#[serde(try_from =
"MacOsAutomationPermissionDe")]`, so deserialization is controlled by
`MacOsAutomationPermissionDe`. After we aligned v2
`additionalPermissions.macos.automations` to the core shape, approval
payloads started including `{ "bundle_ids": [...] }` in some paths.
`MacOsAutomationPermissionDe` previously accepted only `"none" | "all"`
or a plain array, so object-shaped bundle IDs failed with `data did not
match any variant of untagged enum MacOsAutomationPermissionDe`. This
change restores compatibility by accepting both forms while preserving
existing normalization behavior (trim values and map empty bundle lists
to `None`).
## Validation
Confirmed that the error shown further down no longer occurs when running:
```
cargo run -p codex-app-server-test-client -- \
--codex-bin ./target/debug/codex \
-c 'approval_policy="on-request"' \
-c 'features.shell_zsh_fork=true' \
-c 'zsh_path="/tmp/codex-zsh-fork/package/vendor/aarch64-apple-darwin/zsh/macos-15/zsh"' \
send-message-v2 --experimental-api \
'Use $apple-notes and run scripts/notes_info now.'
```
Error seen before this change:
```
Error: failed to deserialize ServerRequest from JSONRPCRequest
Caused by:
data did not match any variant of untagged enum MacOsAutomationPermissionDe
```
## Summary
This PR removes legacy macOS permission model types from
`codex-rs/protocol/src/models.rs`:
- `MacOsPermissions`
- `MacOsPreferencesValue`
- `MacOsAutomationValue`
The protocol now relies on the current `MacOsSeatbeltProfileExtensions`
model for macOS permission data.
## Summary
- default the resume picker sort key to UpdatedAt instead of CreatedAt
- keep Tab sort toggling behavior and update the test expectation for
the new default
## Testing
- just fmt
- cargo test -p codex-tui
Co-authored-by: Codex <noreply@openai.com>
This adds config.toml-defined model aliases that map to provider model slugs while applying alias-specific context settings for the active session.
### What changed
- added custom_models config entries plus schema and docs coverage
- taught ModelsManager to resolve aliases to a provider-facing request_model while preserving the user-facing alias slug
- applied alias-specific context_window and model_auto_compact_token_limit overrides during model info resolution
- updated session/test plumbing and added regression coverage for alias resolution with local and remote model catalogs
### Why this approach
Model selection and per-session context overrides already flow through ModelsManager and Config. Resolving aliases there keeps the provider slug separate from the user-facing alias while reusing the existing override plumbing.
### Testing
- just write-config-schema
- just fmt
- cargo test -p codex-app-server -p codex-api -p codex-exec -p codex-mcp-server
- cargo test -p codex-tui
- cargo test -p codex-protocol -p codex-core (same two existing seatbelt failures remained in this environment: create_seatbelt_args_with_read_only_git_pointer_file and create_seatbelt_args_with_read_only_git_and_codex_subpaths)
- just fix -p codex-core -p codex-protocol -p codex-app-server -p codex-api -p codex-exec -p codex-mcp-server -p codex-tui
## Summary
- keep the SQLite schema unchanged (no migrations)
- add timestamps to SQLite-backed `/feedback` log exports
- keep the existing SQL-side byte cap behavior and newline handling
- document the remaining fidelity gap (span prefixes + structured
fields) with TODOs
## Details
- update `query_feedback_logs` to format each exported line as:
  - `YYYY-MM-DDTHH:MM:SS.ffffffZ {level} {message}`
- continue scoping rows to requested-thread + same-process threadless
logs
- continue capping in SQL before returning rows
- keep the existing fallback behavior unchanged when SQLite returns no
rows
- update parity tests to normalize away the new timestamp prefix while
we still only store `message`
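A dependency-free sketch of the exported line format listed above. It assumes the stored timestamp is available as microseconds since the Unix epoch, which the description does not state; `format_feedback_line` and `civil_from_days` are hypothetical helper names, not functions from `codex-state`.

```rust
/// Convert days since 1970-01-01 into a (year, month, day) civil date
/// in the proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // [0, 365]
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Format one line as `YYYY-MM-DDTHH:MM:SS.ffffffZ {level} {message}`.
/// Assumption: `unix_micros` is microseconds since the Unix epoch, in UTC.
fn format_feedback_line(unix_micros: i64, level: &str, message: &str) -> String {
    let secs = unix_micros.div_euclid(1_000_000);
    let micros = unix_micros.rem_euclid(1_000_000);
    let days = secs.div_euclid(86_400);
    let second_of_day = secs.rem_euclid(86_400);
    let (y, m, d) = civil_from_days(days);
    let h = second_of_day / 3_600;
    let min = (second_of_day % 3_600) / 60;
    let s = second_of_day % 60;
    format!("{y:04}-{m:02}-{d:02}T{h:02}:{min:02}:{s:02}.{micros:06}Z {level} {message}")
}
```

In the actual change the formatting happens on the SQL side of `query_feedback_logs`; the Rust version here only illustrates the target shape of each line.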
## Follow-up
- TODO already in code: persist enough span/event metadata in SQLite to
reproduce span prefixes and structured fields in `/feedback` exports
## Testing
- `cargo test -p codex-state`
- `just fmt`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- render pending steer previews with a single `pending steer:` prefix
instead of repeating it for each source line
- reuse the same truncation path for pending steers and queued drafts so
multiline previews behave consistently
- add snapshot coverage for the multiline pending steer case
Before
<img width="969" height="219" alt="Screenshot 2026-03-05 at 3 55 11 PM"
src="https://github.com/user-attachments/assets/b062c9c8-43d3-4a52-98e0-3c7643d1697b"
/>
After
<img width="965" height="203" alt="Screenshot 2026-03-05 at 3 56 08 PM"
src="https://github.com/user-attachments/assets/40935863-55b3-444f-9e14-1ac63126b2e1"
/>
## Codex author
`codex resume 019cc054-385e-79a3-bb85-ec9499623bd8`
Co-authored-by: Codex <noreply@openai.com>
- Replay thread rollback from the persisted rollout history instead of
truncating in-memory state.
- Add rollback coverage, including rollback-behind-compaction snapshot
coverage.
Hydrate fork-reference chains before truncating forked histories and
before building forked spawn histories, then use the hydrated items
for forked-session startup replay while still persisting only the
compact fork-reference suffix.
This fixes fork-of-fork boundary calculations, preserves inherited
context on forked startup, and updates the regression tests to compare
logical materialized history instead of raw compact storage bytes.
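The hydration step can be pictured with a small sketch. Everything here is an assumption made for illustration: `StoredItem`, the `prefix_len` field, and `materialize` are hypothetical and do not mirror the real rollout types. The point is only that a fork-of-fork boundary has to be computed on the logical (materialized) history, not on the compact stored items.

```rust
use std::collections::HashMap;

/// Hypothetical compact storage item for a thread's history.
#[derive(Debug, Clone, PartialEq)]
enum StoredItem {
    Message(String),
    /// Compact pointer meaning "the first `prefix_len` logical items of `parent`".
    ForkReference { parent: String, prefix_len: usize },
}

/// Expand a thread's compact history into its logical history.
/// Assumption: fork references never form a cycle.
fn materialize(thread: &str, store: &HashMap<String, Vec<StoredItem>>) -> Vec<String> {
    let mut out = Vec::new();
    if let Some(items) = store.get(thread) {
        for item in items {
            match item {
                StoredItem::Message(text) => out.push(text.clone()),
                StoredItem::ForkReference { parent, prefix_len } => {
                    // Hydrate the parent first, so that a fork of a fork
                    // truncates logical items rather than raw stored ones.
                    let parent_items = materialize(parent, store);
                    out.extend(parent_items.into_iter().take(*prefix_len));
                }
            }
        }
    }
    out
}
```

With a root `[a, b, c]`, a child `[fork(root, 2), d]`, and a grandchild `[fork(child, 2), e]`, the grandchild's logical history is `[a, b, e]` even though the child stores only two raw items.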
## Summary
- group recent work by git repo when available, otherwise by directory
- render recent work as bounded user asks with per-thread cwd context
- exclude hidden files and directories from workspace trees
### Motivation
Today config.toml has three different OTEL knobs under `[otel]`:
- `exporter` controls where OTEL logs go
- `trace_exporter` controls where OTEL traces go
- `metrics_exporter` controls where metrics go
Those often (pretty much always?) serve different purposes.
For example, for OpenAI internal usage, the **log exporter** is already
being used for IT/security telemetry, and that use case is intentionally
content-rich: tool calls, arguments, outputs, MCP payloads, and in some
cases user content are all useful there. `log_user_prompt` is a good
example of that distinction. When it’s enabled, we include raw prompt
text in OTEL logs, which is acceptable for the security use case.
The **trace exporter** is a different story. The goal there is to give
OpenAI engineers visibility into latency and request behavior when they
run Codex locally, without sending sensitive prompt or tool data as
trace event data. In other words, traces should help answer “what was
slow?” or “where did time go?”, not “what did the user say?” or “what
did the tool return?”
The complication is that Rust’s `tracing` crate does not make a hard
distinction between “logs” and “trace events.” It gives us one
instrumentation API for logs and trace events (via `tracing::event!`),
and subscribers decide what gets treated as logs, trace events, or both.
Before this change, our OTEL trace layer was effectively attached to the
general tracing stream, which meant turning on `trace_exporter` could
pick up content-rich events that were originally written with logging
(and the `log_exporter`) in mind. That made it too easy for sensitive
data to end up in exported traces by accident.
### Concrete example
In `otel_manager.rs`, this `tracing::event!` call would be exported in
both logs AND traces (as a trace event).
```
pub fn user_prompt(&self, items: &[UserInput]) {
    let prompt = items
        .iter()
        .flat_map(|item| match item {
            UserInput::Text { text, .. } => Some(text.as_str()),
            _ => None,
        })
        .collect::<String>();

    let prompt_to_log = if self.metadata.log_user_prompts {
        prompt.as_str()
    } else {
        "[REDACTED]"
    };

    tracing::event!(
        tracing::Level::INFO,
        event.name = "codex.user_prompt",
        event.timestamp = %timestamp(),
        // ...
        prompt = %prompt_to_log,
    );
}
```
Instead of `tracing::event!`, we should now use `log_event!` and
`trace_event!` to indicate more clearly which sink (logs vs. traces) an
event should be exported to.
### What changed
This PR makes the log and trace export distinct instead of treating them
as two sinks for the same data.
On the provider side, OTEL logs and traces now have separate
routing/filtering policy. The log exporter keeps receiving the existing
`codex_otel` events, while trace export is limited to spans and trace
events.
On the event side, `OtelManager` now emits two flavors of telemetry
where needed:
- a log-only event with the current rich payloads
- a tracing-safe event with summaries only
It also has a convenience `log_and_trace_event!` macro for emitting to
both logs and traces when it's safe to do so, as well as log- and
trace-specific fields.
That means prompts, tool args, tool output, account email, MCP metadata,
and similar content stay in the log lane, while traces get the pieces
that are actually useful for performance work: durations, counts, sizes,
status, token counts, tool origin, and normalized error classes.
This preserves current IT/security logging behavior while making it safe
to turn on trace export for employees.
### Full list of things removed from trace export
- raw user prompt text from `codex.user_prompt`
- raw tool arguments and output from `codex.tool_result`
- MCP server metadata from `codex.tool_result` (mcp_server,
mcp_server_origin)
- account identity fields like `user.email` and `user.account_id` from
trace-safe OTEL events
- `host.name` from trace resources
- generic `codex.tool_decision` events from traces
- generic `codex.sse_event` events from traces
- the full ToolCall debug payload from the `handle_tool_call` span
What traces now keep instead is mostly:
- spans
- trace-safe OTEL events
- counts, lengths, durations, status, token counts, and tool origin
summaries
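The split between the content-rich log lane and the summary-only trace lane can be sketched without the `tracing` crate. This is an illustration under assumptions, not the `OtelManager` implementation: `Sink`, `ToolResult`, `fields_for`, and the exact field names are hypothetical, and which fields count as trace-safe is chosen here only for the example.

```rust
/// Hypothetical export destinations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Sink {
    Logs,
    Traces,
}

/// Hypothetical tool result with both rich and summary-friendly data.
struct ToolResult<'a> {
    tool_name: &'a str,
    arguments: &'a str,
    output: &'a str,
    duration_ms: u64,
    success: bool,
}

/// Build the field list for one sink: logs keep raw content,
/// traces get only lengths, durations, and status.
fn fields_for(sink: Sink, result: &ToolResult<'_>) -> Vec<(&'static str, String)> {
    let mut fields = vec![
        ("tool_name", result.tool_name.to_string()),
        ("duration_ms", result.duration_ms.to_string()),
        ("success", result.success.to_string()),
    ];
    match sink {
        Sink::Logs => {
            fields.push(("arguments", result.arguments.to_string()));
            fields.push(("output", result.output.to_string()));
        }
        Sink::Traces => {
            // Summaries only: sizes instead of content.
            fields.push(("arguments_len", result.arguments.len().to_string()));
            fields.push(("output_len", result.output.len().to_string()));
        }
    }
    fields
}
```

Keeping the decision in one function makes it harder for a new field to reach the trace lane by accident, which is the failure mode the PR is addressing.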
Mark the subagent panel helpers that are only exercised by the later behavior branch as dead-code-tolerant in the standalone foundation branch so codex-tui tests stay warning-free.
- Update `models.json` to surface the new model entry.
- Refresh the TUI model picker snapshot to match the updated catalog
ordering.
---------
Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
## Note: added plugin mentions via `@`, but that conflicts with file mentions
Depends on and builds upon #13433.
- Introduces explicit `@plugin` mentions. A mention injects the plugin's
MCP servers, app names, and skill name format into turn context as a dev
message.
- We do not yet have UI for these mentions, so we currently parse raw
text (as opposed to skills and apps, which have UI chips, autocomplete,
etc.). A UI depends on a `plugins/list` app-server endpoint we can feed
it with, which is upcoming.
- Also annotates MCP and app tool descriptions with the plugin(s) they
come from. This gives the model a first-class way of understanding which
tools come from which plugins, which will help implicit invocation.
### Tests
Added and updated tests, unit and integration. Also confirmed locally a
raw `@plugin` injects the dev message, and the model knows about its
apps, mcps, and skills.
## Summary
This updates the `js_repl` prompt and docs to make the image guidance
less confusing.
## What changed
- Clarified that `codex.emitImage(...)` adds one image per call and can
be called multiple times to emit multiple images.
- Reworded the image-encoding guidance to be general `js_repl` advice
instead of `ImageDetailOriginal`-specific behavior.
- Updated the guidance to recommend JPEG at about quality 85 when lossy
compression is acceptable, and PNG when transparency or lossless detail
matters.
- Mirrored the same wording in the public `js_repl` docs.
Rename the branch-local collab inbox payload, constants, helper names,
and prompt text to agent inbox terminology without touching upstream
collaboration mode surfaces.
This keeps the watchdog/runtime behavior intact while removing the
branch-added collab naming that leaked into the stack.
This improves macOS Seatbelt handling for sandboxed tool processes.
## Changes
- Allow dual-stack local binding in proxy-managed sessions, while still
keeping traffic limited to loopback and configured proxy endpoints.
- Replace the old generic unix-socket path rule with explicit AF_UNIX
permissions for socket creation, bind, and outbound connect.
- Keep explicitly approved wrapper sockets connect-only.
Local helper servers are less likely to fail when binding on macOS.
Tools using local unix-socket IPC should work more reliably under the
sandbox.
Full-network sessions, proxy fail-closed behavior, and proxy lifecycle
are unchanged.
Rename the branch-added collab inbox payload, constants, helpers, snapshot, and spawn-mode type to agent terminology while leaving upstream-established collaboration surfaces unchanged.
Regenerate the app-server schema outputs and update the TUI replay snapshot to match the renamed AgentSpawnMode and agent inbox compatibility coverage.
Introduce the base TUI plumbing needed to render collaboration events and subagent state.
Add the foundational app events, history cells, chatwidget handling, and text formatting used by collaboration surfaces, and keep the replay dedupe coverage that prevents duplicate collab inbox rows when compatibility encodings are replayed from snapshots.
Keep forked sessions compact by recording fork references instead of duplicating full parent history.
Repair stale parent rollout paths by resolving the referenced thread id back to the current active or archived rollout location during replay, and materialize fork references before app-server derives thread summaries. Retain the core and app-server regressions that cover archive/unarchive and thread-read behavior.
Preserve internal subagent handoffs as injected response items instead of degrading them into synthetic user messages.
When the destination root thread is idle, prepend an empty user message before the function-call/function-call-output pair so injection starts a valid turn. Keep active-turn behavior and subagent routing unchanged, and retain the regression coverage for the idle-root path.
## Summary
- always pass `--unshare-user` in the Linux bubblewrap argv builders
- stop relying on bubblewrap's auto-userns behavior, which is skipped
for `uid 0`
- update argv expectations in tests and document the explicit user
namespace behavior
The installed Codex binary reproduced the same issue with:
- `codex -c features.use_linux_sandbox_bwrap=true sandbox linux -- true`
- `bwrap: Creating new namespace failed: Operation not permitted`
This happens because Codex asked bubblewrap for mount/pid/network
namespaces without explicitly asking for a user namespace. In a
root-inside-container environment without ambient `CAP_SYS_ADMIN`, that
fails. Adding `--unshare-user` makes bubblewrap create the user
namespace first and then the remaining namespaces succeed.
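A rough sketch of an argv builder that always requests a user namespace. `build_bwrap_argv` and its parameters are hypothetical and much simpler than the real Codex builders; only the bubblewrap flags themselves (`--unshare-user`, `--unshare-pid`, `--unshare-net`) are real.

```rust
/// Build a bubblewrap command line that always passes `--unshare-user`,
/// instead of relying on bubblewrap's automatic user-namespace behavior.
/// Hypothetical helper for illustration only.
fn build_bwrap_argv(unshare_net: bool, command: &[&str]) -> Vec<String> {
    let mut argv = vec![
        "bwrap".to_string(),
        // Explicit user namespace first, so the remaining namespaces can be
        // created even for uid 0 without ambient CAP_SYS_ADMIN.
        "--unshare-user".to_string(),
        "--unshare-pid".to_string(),
    ];
    if unshare_net {
        argv.push("--unshare-net".to_string());
    }
    // Everything after `--` is the sandboxed command.
    argv.push("--".to_string());
    argv.extend(command.iter().map(|part| part.to_string()));
    argv
}
```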
### Motivation
- Ensure the multitool `codex` wrapper enforces the same `ExecCli`
validation rules as the standalone `codex-exec` binary so `codex exec
--fork <id> resume` is rejected consistently and the related unit test
no longer fails.
### Description
- Call `ExecCli::validate()` in the multitool dispatch path before
forwarding to `codex_exec::run_main` and handle validation errors by
exiting appropriately; change located in `codex-rs/cli/src/main.rs`.
- Update the unit test `exec_fork_conflicts_with_resume_subcommand` to
parse the CLI and assert that `exec.validate()` returns an error (parse
succeeds but validation fails) to reflect where the conflict is
enforced.
### Testing
- Ran formatting with `just fmt` in the workspace and it completed
successfully.
- Executed the failing unit case with:
`PKG_CONFIG_PATH=/tmp/libcap-shim/lib/pkgconfig
NO_PROXY=127.0.0.1,localhost cargo test -p codex-cli
tests::exec_fork_conflicts_with_resume_subcommand`, and the test passed.
- Ran the `codex-exec` and `codex-cli` test suites with the same
environment shim (`PKG_CONFIG_PATH=/tmp/libcap-shim/lib/pkgconfig
NO_PROXY=127.0.0.1,localhost`) and they completed successfully.
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69a9e5053e448323965a3c030a90e154)
This PR adds a durable trace linkage for each turn by storing the active
trace ID on the rollout TurnContext record stored in session rollout
files.
Before this change, we propagated trace context at runtime but didn’t
persist a stable per-turn trace key in rollout history. That made
after-the-fact debugging harder (for example, mapping a historical turn
to the corresponding trace in Datadog). This sets us up for much easier
debugging in the future.
### What changed
- Added an optional `trace_id` to TurnContextItem (rollout schema).
- Added a small OTEL helper to read the current span trace ID.
- Captured `trace_id` when creating `TurnContext` and included it in
`to_turn_context_item()`.
- Updated tests and fixtures that construct TurnContextItem so
older/no-trace cases still work.
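As a sketch of the optional field, assuming a heavily simplified record: `TurnContextRecord` and `trace_id_hex` are hypothetical stand-ins, not the real `TurnContextItem` or the OTEL helper. The only external fact relied on is that an OpenTelemetry trace ID is 16 bytes, conventionally rendered as 32 lowercase hex characters, with the all-zero value treated as invalid.

```rust
/// Render a raw 128-bit trace ID as 32 lowercase hex characters.
/// Returns `None` for the all-zero ID, which OpenTelemetry treats as invalid.
fn trace_id_hex(raw: u128) -> Option<String> {
    if raw == 0 {
        None
    } else {
        Some(format!("{raw:032x}"))
    }
}

/// Hypothetical, simplified per-turn record. `trace_id` is optional so
/// that older rollouts without it remain representable.
#[derive(Debug, PartialEq)]
struct TurnContextRecord {
    cwd: String,
    trace_id: Option<String>,
}

/// Build a record, attaching the trace ID only when one is active.
fn to_record(cwd: &str, active_trace: u128) -> TurnContextRecord {
    TurnContextRecord {
        cwd: cwd.to_string(),
        trace_id: trace_id_hex(active_trace),
    }
}
```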
### Why this approach
TurnContext is already the canonical durable per-turn metadata in
rollout. This keeps ownership clean: trace linkage lives with other
persisted turn metadata.
### Motivation
- Prevent untrusted js_repl code from supplying arbitrary external URLs
that the host would forward into model input and cause external fetches
/ data exfiltration. This change narrows the emitImage contract to safe,
self-contained data URLs.
### Description
- Kernel: added `normalizeEmitImageUrl` and enforce that string-valued
`codex.emitImage(...)` inputs and `input_image`/content-item paths only
accept non-empty `data:` URLs; byte-based paths still produce data URLs
as before (`kernel.js`).
- Host: added `validate_emitted_image_url` and check `EmitImage`
requests before creating `FunctionCallOutputContentItem::InputImage`,
returning an error to the kernel if the URL is not a `data:` URL
(`mod.rs`).
- Tests/docs: added a runtime test
`js_repl_emit_image_rejects_non_data_url` to assert rejection of
non-data URLs and updated user-facing docs/instruction text to state
`data URL` support instead of generic direct image URLs (`mod.rs`,
`docs/js_repl.md`, `project_doc.rs`).
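A minimal sketch of a host-side check in the spirit of `validate_emitted_image_url`. The function name comes from the description above, but the body, signature, and error strings below are assumptions; the real implementation in `mod.rs` may differ, for example in how it treats whitespace or scheme casing.

```rust
/// Accept only non-empty `data:` URLs; reject everything else.
/// Sketch only: signature and error messages are assumptions.
fn validate_emitted_image_url(url: &str) -> Result<&str, String> {
    let trimmed = url.trim();
    if trimmed.is_empty() {
        return Err("image URL must not be empty".to_string());
    }
    // URL schemes are case-insensitive, so compare the prefix loosely.
    let is_data_url = trimmed
        .get(..5)
        .map_or(false, |prefix| prefix.eq_ignore_ascii_case("data:"));
    if is_data_url {
        Ok(trimmed)
    } else {
        Err("only data: URLs are supported for emitted images".to_string())
    }
}
```

Validating on the host as well as in the kernel means untrusted `js_repl` code cannot bypass the rule by skipping the kernel-side normalization.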
### Testing
- Ran `just fmt` in `codex-rs`; it completed successfully.
- Added a runtime test (`cargo test -p codex-core
js_repl_emit_image_rejects_non_data_url`) but executing the test in this
environment failed due to a missing system dependency required by
`codex-linux-sandbox` (the vendored `bubblewrap` build requires
`libcap.pc` via `pkg-config`), so the test could not be run here.
- Attempted a focused `cargo test` invocation with and without default
features; both compile/test attempts were blocked by the same missing
system `libcap` dependency in this environment.
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69a7837bce98832d91db92d5f76d6cbe)
## Summary
This changes the Unix shell escalation path for skill-matched
executables to apply a skill's `PermissionProfile` as additive
permissions on top of the existing turn/request sandbox policy.
Previously, skill-matched executables compiled the skill permission
profile into a standalone sandbox policy and executed against that
replacement policy. Now they go through the same
`additional_permissions` merge path used elsewhere in shell sandbox
preparation.
## What Changed
- Changed `skill_escalation_execution()` to return
`EscalationPermissions::PermissionProfile(...)` for non-empty skill
permission profiles.
- Kept empty or missing skill permission profiles on the `TurnDefault`
path.
- Added tests covering the new additive skill-permission behavior.
- Added inline comments in `prepare_escalated_exec()` clarifying the
difference between additive permission merging and fully specified
replacement sandbox policies.
- Removed the now-unused skill permission compiler module after
switching this path away from standalone compiled skill sandbox
policies.
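The difference between additive merging and a replacement policy can be sketched as follows. The enum variant names `TurnDefault` and `PermissionProfile` come from the description above; the `Permissions` struct, its fields, and `effective_permissions` are hypothetical simplifications and not the real sandbox policy types.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified permission set.
#[derive(Debug, Clone, Default, PartialEq)]
struct Permissions {
    writable_roots: BTreeSet<String>,
    network: bool,
}

/// Simplified stand-in for the escalation decision.
enum EscalationPermissions {
    /// Run under the turn's existing policy, unchanged.
    TurnDefault,
    /// Add the skill's permissions on top of the turn policy.
    PermissionProfile(Permissions),
}

/// Compute the effective permissions for a skill-matched executable.
fn effective_permissions(turn: &Permissions, escalation: &EscalationPermissions) -> Permissions {
    match escalation {
        EscalationPermissions::TurnDefault => turn.clone(),
        EscalationPermissions::PermissionProfile(extra) => Permissions {
            // Additive: union of roots and logical OR of network access,
            // never a replacement of the turn policy.
            writable_roots: turn
                .writable_roots
                .union(&extra.writable_roots)
                .cloned()
                .collect(),
            network: turn.network || extra.network,
        },
    }
}
```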
## Testing
- Ran `just fmt` in `codex-rs`
- Ran `cargo test -p codex-core`
`cargo test -p codex-core` still hits an unrelated existing failure:
`shell_snapshot::tests::snapshot_shell_does_not_inherit_stdin`
## Follow-up
This change intentionally does not merge skill-specific macOS seatbelt
profile extensions through the `additional_permissions` path yet.
Filesystem and network permissions now follow the additive merge path,
but seatbelt extension permissions still need separate handling in a
follow-up PR.
### Motivation
- A recent `--fork` addition used `conflicts_with = "command"` in the
`clap` attribute which references a non-existent id and caused a runtime
panic (`Argument or group 'command' ... does not exist`) during command
construction, breaking CI and tooling that invokes `codex exec`.
- The intent was to disallow `--fork` together with any subcommand; this
must be enforced without using an invalid clap conflict target.
### Description
- Removed the invalid `conflicts_with = "command"` attribute from the
`--fork` flag in `codex-rs/exec/src/cli.rs` and instead added an
explicit `Cli::validate()` method that returns a `clap::Error` with
`ErrorKind::ArgumentConflict` when `fork_session_id` and a subcommand
are both present.
- Wired `Cli::validate()` into the binary entrypoint in
`codex-rs/exec/src/main.rs` so the parsed CLI is validated before
execution and the standard clap error handling is used (`err.exit()` on
validation failure).
- Updated the existing unit test to call `Cli::validate()` after parse
so the test still asserts the intended conflict behavior.
- Ran formatting (`just fmt`) after changes.
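A dependency-free sketch of the post-parse validation pattern. The real `Cli::validate()` returns a `clap::Error` with `ErrorKind::ArgumentConflict`; the `CliError` enum, the `Command` variants, and the message text below are stand-ins used only to keep the example self-contained.

```rust
/// Hypothetical subcommand set.
#[derive(Debug, PartialEq)]
enum Command {
    Resume,
}

/// Stand-in for `clap::Error` so the sketch needs no external crates.
#[derive(Debug, PartialEq)]
enum CliError {
    ArgumentConflict(String),
}

/// Simplified parsed CLI.
struct Cli {
    fork_session_id: Option<String>,
    command: Option<Command>,
}

impl Cli {
    /// Enforce the `--fork` vs. subcommand conflict after parsing,
    /// rather than through a `conflicts_with` attribute.
    fn validate(&self) -> Result<(), CliError> {
        if self.fork_session_id.is_some() && self.command.is_some() {
            return Err(CliError::ArgumentConflict(
                "--fork cannot be combined with a subcommand".to_string(),
            ));
        }
        Ok(())
    }
}
```

Because parsing succeeds and only validation fails, callers such as the multitool wrapper have to call `validate()` themselves before dispatching.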
### Testing
- Ran `just fmt` in `codex-rs` successfully.
- Ran unit/integration tests for the modified crate with environment
stubs: `PKG_CONFIG_PATH=/tmp/libcap-stub/lib/pkgconfig
NO_PROXY=127.0.0.1,localhost no_proxy=127.0.0.1,localhost cargo test -p
codex-exec` and `cargo test -p codex-exec --test all` which completed
with the `codex-exec` tests passing locally.
- Verified a targeted failing originator test after the change and it
passed when run with the same env overrides.
- Attempted to run `sdk/typescript` tests but the `pnpm`/`corepack`
install failed due to network/proxy download errors (403 from proxy)
when fetching `pnpm`, which is environmental and unrelated to this
change.
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69a9d1a216ec8323a648cfa001def5ae)
## Summary
- Change `js_repl` failed-cell persistence so later cells keep prior
bindings plus only the current-cell bindings whose initialization
definitely completed before the throw.
- Preserve initialized lexical bindings across failed cells via
module-namespace readability, including top-level destructuring that
partially succeeds before a later throw.
- Preserve hoisted `var` and `function` bindings only when execution
clearly reached their declaration site, and preserve direct top-level
pre-declaration `var` writes and updates through explicit write-site
markers.
- Preserve top-level `for...in` / `for...of` `var` bindings when the
loop body executes at least once, using a first-iteration guard to avoid
per-iteration bookkeeping overhead.
- Keep prior module state intact across link-time failures and
evaluation failures before the prelude runs, while still allowing failed
cells that already recreated prior bindings to persist updates to those
existing bindings.
- Hide internal commit hooks from user `js_repl` code after the prelude
aliases them, so snippets cannot spoof committed bindings by calling the
raw `import.meta` hooks directly.
- Add focused regression coverage for the supported failed-cell
behaviors and the intentionally unsupported boundaries.
- Update `js_repl` docs and generated instructions to describe the new,
narrower failed-cell persistence model.
## Motivation
We saw `js_repl` drop bindings that had already been initialized
successfully when a later statement in the same cell threw, for example:
    const { context: liveContext, session } =
      await initializeGoogleSheetsLiveForTab(tab);
    // later statement throws
That was surprising in practice because successful earlier work
disappeared from the next cell.
This change makes failed-cell persistence more useful without trying to
model every possible partially executed JavaScript edge case. The
resulting behavior is narrower and easier to reason about:
- prior bindings are always preserved
- lexical bindings persist when their initialization completed before
the throw
- hoisted `var` / `function` bindings persist only when execution
clearly reached their declaration or a supported top-level `var` write
site
- failed cells that already recreated prior bindings can persist writes
to those existing bindings even if they introduce no new bindings
The detailed edge-case matrix stays in `docs/js_repl.md`. The
model-facing `project_doc` guidance is intentionally shorter and focused
on generation-relevant behavior.
## Supported Failed-Cell Behavior
- Prior bindings remain available after a failed cell.
- Initialized lexical bindings remain available after a failed cell.
- Top-level destructuring like `const { a, b } = ...` preserves names
whose initialization completed before a later throw.
- Hoisted `function` bindings persist when execution reached the
declaration statement before the throw.
- Direct top-level pre-declaration `var` writes and updates persist, for
example:
  - `x = 1`
  - `x += 1`
  - `x++`
  - short-circuiting logical assignments only persist when the write
  branch actually runs
- Non-empty top-level `for...in` / `for...of` `var` loops persist their
loop bindings.
- Failed cells can persist updates to existing carried bindings after
the prelude has run, even when the cell commits no new bindings.
- Link failures and eval failures before the prelude do not poison
`@prev`.
## Intentionally Unsupported Failed-Cell Cases
- Hoisted function reads before the declaration, such as `foo(); ...;
function foo() {}`
- Aliasing or inference-based recovery from reads before declaration
- Nested writes inside already-instrumented assignment RHS expressions
- Destructuring-assignment recovery for hoisted `var`
- Partial `var` destructuring recovery
- Pre-declaration `undefined` reads for hoisted `var`
- Empty top-level `for...in` / `for...of` loop vars
- Nested or scope-sensitive pre-declaration `var` writes outside direct
top-level expression statements
This adds a first-class server request for MCP server elicitations:
`mcpServer/elicitation/request`.
Until now, MCP elicitation requests only showed up as a raw
`codex/event/elicitation_request` event from core. That made it hard for
v2 clients to handle elicitations using the same request/response flow
as other server-driven interactions (like shell and `apply_patch`
tools).
This also updates the underlying MCP elicitation request handling in
core to pass through the full MCP request (including URL and form data)
so we can expose it properly in app-server.
### Why not `item/mcpToolCall/elicitationRequest`?
This is because MCP elicitations are related to MCP servers first, and
only optionally to a specific MCP tool call.
In the MCP protocol, elicitation is a server-to-client capability: the
server sends `elicitation/create`, and the client replies with an
elicitation result. RMCP models it that way as well.
In practice an elicitation is often triggered by an MCP tool call, but
not always.
### What changed
- add `mcpServer/elicitation/request` to the v2 app-server API
- translate core `codex/event/elicitation_request` events into the new
v2 server request
- map client responses back into `Op::ResolveElicitation` so the MCP
server can continue
- update app-server docs and generated protocol schema
- add an end-to-end app-server test that covers the full round trip
through a real RMCP elicitation flow
- The new test exercises a realistic case where an MCP tool call
triggers an elicitation, the app-server emits
mcpServer/elicitation/request, the client accepts it, and the tool call
resumes and completes successfully.
### app-server API flow
- Client starts a thread with `thread/start`.
- Client starts a turn with `turn/start`.
- App-server sends `item/started` for the `mcpToolCall`.
- While that tool call is in progress, app-server sends
`mcpServer/elicitation/request`.
- Client responds to that request with `{ action: "accept" | "decline" |
"cancel" }`.
- App-server sends `serverRequest/resolved`.
- App-server sends `item/completed` for the mcpToolCall.
- App-server sends `turn/completed`.
- If the turn is interrupted while the elicitation is pending,
app-server still sends `serverRequest/resolved` before the turn
finishes.
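The client's reply in the flow above can be sketched as a small enum. Only the three action strings come from the description; `ElicitationAction`, `parse`, and `response_json` are hypothetical, and a real response may carry more fields than `action`.

```rust
/// The three actions a client can answer an elicitation request with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElicitationAction {
    Accept,
    Decline,
    Cancel,
}

impl ElicitationAction {
    /// Wire representation of the action.
    fn as_str(self) -> &'static str {
        match self {
            ElicitationAction::Accept => "accept",
            ElicitationAction::Decline => "decline",
            ElicitationAction::Cancel => "cancel",
        }
    }

    /// Parse a wire string; unknown values are rejected.
    fn parse(value: &str) -> Option<Self> {
        match value {
            "accept" => Some(ElicitationAction::Accept),
            "decline" => Some(ElicitationAction::Decline),
            "cancel" => Some(ElicitationAction::Cancel),
            _ => None,
        }
    }
}

/// Minimal JSON body for the response (assumption: `action` only).
fn response_json(action: ElicitationAction) -> String {
    format!(r#"{{"action":"{}"}}"#, action.as_str())
}
```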
## Why
`shell_zsh_fork` already provides stronger guarantees around which
executables receive elevated permissions. To reuse that machinery from
unified exec without pushing Unix-specific escalation details through
generic runtime code, the escalation bootstrap and session lifetime
handling need a cleaner boundary.
That boundary also needs to be safe for long-lived sessions: when an
intercepted shell session is closed or pruned, any in-flight approval
workers and any already-approved escalated child they spawned must be
torn down with the session, and the inherited escalation socket must not
leak into unrelated subprocesses.
## What Changed
- Extracted a reusable `EscalationSession` and
`EscalateServer::start_session(...)` in `shell-escalation` so callers
can get the wrapper/socket env overlay and keep the escalation server
alive without immediately running a one-shot command.
- Documented that `EscalationSession::env()` and
`ShellCommandExecutor::run(...)` exchange only that env overlay, which
callers must merge into their own base shell environment.
- Clarified the prepared-exec helper boundary in `core` by naming the
new helper APIs around `ExecRequest`, while keeping the legacy
`execute_env(...)` entrypoints as thin compatibility wrappers for
existing callers that still use the older naming.
- Added a small post-spawn hook on the prepared execution path so the
parent copy of the inheritable escalation socket is closed immediately
after both the existing one-shot shell-command spawn and the
unified-exec spawn.
- Made session teardown explicit with session-scoped cancellation:
dropping an `EscalationSession` or canceling its parent request now
stops intercept workers, and the server-spawned escalated child uses
`kill_on_drop(true)` so teardown cannot orphan an already-approved
child.
- Added `UnifiedExecBackendConfig` plumbing through `ToolsConfig`, a
`shell::zsh_fork_backend` facade, and an opaque unified-exec
spawn-lifecycle hook so unified exec can prepare a wrapped `zsh -c/-lc`
request without storing `EscalationSession` directly in generic
process/runtime code.
- Kept the existing `shell_command` zsh-fork behavior intact on top of
the new bootstrap path. Tool selection is unchanged in this PR: when
`shell_zsh_fork` is enabled, `ShellCommand` still wins over
`exec_command`.
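Session-scoped cancellation of the kind described above can be sketched with a shared flag that is set when the session is dropped. `SketchSession` and `CancelToken` are hypothetical names; the real `EscalationSession` also owns the escalation server and relies on `kill_on_drop(true)` for spawned children, neither of which is modeled here.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Handle given to intercept workers so they can observe teardown.
struct CancelToken(Arc<AtomicBool>);

impl CancelToken {
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical stand-in for a session that owns background workers.
struct SketchSession {
    cancelled: Arc<AtomicBool>,
}

impl SketchSession {
    fn new() -> Self {
        SketchSession {
            cancelled: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Each worker gets a token tied to this session's lifetime.
    fn worker_token(&self) -> CancelToken {
        CancelToken(Arc::clone(&self.cancelled))
    }
}

impl Drop for SketchSession {
    /// Dropping the session signals every outstanding worker to stop.
    fn drop(&mut self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }
}
```

Tying cancellation to `Drop` means a session that is closed or pruned cannot leave workers running simply because a caller forgot an explicit shutdown call.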
## Verification
- `cargo test -p codex-shell-escalation`
  - includes coverage for `start_session_exposes_wrapper_env_overlay`
  - includes coverage for `exec_closes_parent_socket_after_shell_spawn`
  - includes coverage for
  `dropping_session_aborts_intercept_workers_and_kills_spawned_child`
- `cargo test -p codex-core
shell_zsh_fork_prefers_shell_command_over_unified_exec`
- `cargo test -p codex-core --test all
shell_zsh_fork_prompts_for_skill_script_execution`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13392).
* #13432
* __->__ #13392
- add a speed row to the startup/session header under the model row
- render the speed row with the same styling pattern as the model row,
using /fast to change
- show only Fast or Standard to users and update the affected snapshots
---------
Co-authored-by: Codex <noreply@openai.com>
Add `web_search_tool_type` on model_info that can be populated from the
backend. It will be used to filter which models can use `web_search` with
images and which can't.
Added a small unit test.
- lower `submission_dispatch` span logging to debug for realtime audio
submissions only
- keep other submission spans at info and add a targeted test for the
level selection
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add `js_repl` support for dynamic imports of relative and absolute
local ESM `.js` / `.mjs` files
- keep bare package imports on the native Node path and resolved from
REPL-global search roots (`CODEX_JS_REPL_NODE_MODULE_DIRS`, then `cwd`),
even when they originate from imported local files
- restrict static imports inside imported local files to other local
relative/absolute `.js` / `.mjs` files, and surface a clear error for
unsupported top-level static imports in the REPL cell
- run imported local files inside the REPL VM context so they can access
`codex.tmpDir`, `codex.tool`, captured `console`, and Node-like
`import.meta` helpers
- reload local files between execs so later `await import("./file.js")`
calls pick up edits and fixed failures, while preserving package/builtin
caching and persistent top-level REPL bindings
- make `import.meta.resolve()` self-consistent by allowing the returned
`file://...` URLs to round-trip through `await import(...)`
- update both public and injected `js_repl` docs to clarify the narrowed
contract, including global bare-import resolution behavior for local
absolute files
## Testing
- `cargo test -p codex-core js_repl_`
- built codex binary and verified behavior
---------
Co-authored-by: Codex <noreply@openai.com>
### Motivation
- Fix a type mismatch that caused `codex-exec` to fail compiling when
resolving a fork/resume target path because underlying `std::io::Error`
results were returned from helpers while the caller expected
`anyhow::Error`.
### Description
- Update `resolve_thread_path_by_id_or_name` in
`codex-rs/exec/src/lib.rs` to convert the `std::io::Error` returned by
`find_thread_path_by_id_str` and `find_thread_path_by_name_str` into
`anyhow::Error` using `.map_err(anyhow::Error::from)`, so the function
returns `anyhow::Result<Option<PathBuf>>` consistently.
### Testing
- Ran `just fmt` in `codex-rs`, which completed successfully.
- Attempted `cargo test -p codex-exec`, but the build could not complete
in this environment because `codex-linux-sandbox` requires the system
`libcap` library (pkg-config could not find `libcap.pc`), so the test
suite did not finish.
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69a92230d0c08323b0417fbae810e6e0)
- rotate the paid-plan startup promo slot 50/50 between the existing
Codex App promo and a new Fast mode promo
- keep the Fast mode call to action platform-neutral so Windows can show
the same tip
- add a focused unit test to ensure the paid promo pool actually rotates
---------
Co-authored-by: Codex <noreply@openai.com>
### first half of changes, followed by #13510
Track plugin capabilities as derived summaries on `PluginLoadOutcome`
for enabled plugins with at least one skill/app/mcp.
Also add `Plugins` section to `user_instructions` injected on session
start. These introduce the plugins concept and list enabled plugins, but
do NOT currently include paths to enabled plugins or details on what
apps/mcps the plugins contain (current plan is to inject this on
@-mention). That can be adjusted in a follow-up based on evals.
### tests
Added/updated tests, confirmed locally that new `Plugins` section +
currently enabled plugins show up in `user_instructions`.
## Summary
- ensure `thread.resume` reuses the stored `gitInfo` instead of
rebuilding it from the live working tree
- persist and apply thread git metadata through the resume flow and add
a regression test covering branch mismatch cases
## Testing
- Not run (not requested)
Bumps [serde_with](https://github.com/jonasbb/serde_with) from 3.16.1 to
3.17.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jonasbb/serde_with/releases">serde_with's
releases</a>.</em></p>
<blockquote>
<h2>serde_with v3.17.0</h2>
<h3>Added</h3>
<ul>
<li>Support <code>OneOrMany</code> with <code>smallvec</code> v1 (<a
href="https://redirect.github.com/jonasbb/serde_with/issues/920">#920</a>,
<a
href="https://redirect.github.com/jonasbb/serde_with/issues/922">#922</a>)</li>
</ul>
<h3>Changed</h3>
<ul>
<li>Switch to <code>yaml_serde</code> for a maintained yaml dependency
by <a href="https://github.com/kazan417"><code>@kazan417</code></a> (<a
href="https://redirect.github.com/jonasbb/serde_with/issues/921">#921</a>)</li>
<li>Bump MSRV to 1.82, since that is required for
<code>yaml_serde</code> dev-dependency.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4031878a4c"><code>4031878</code></a>
Bump version to v3.17.0 (<a
href="https://redirect.github.com/jonasbb/serde_with/issues/924">#924</a>)</li>
<li><a
href="204ae56f8b"><code>204ae56</code></a>
Bump version to v3.17.0</li>
<li><a
href="7812b5a006"><code>7812b5a</code></a>
serde_yaml 0.9 to yaml_serde 0.10 (<a
href="https://redirect.github.com/jonasbb/serde_with/issues/921">#921</a>)</li>
<li><a
href="614bd8950b"><code>614bd89</code></a>
Bump MSRV to 1.82 as required by yaml_serde</li>
<li><a
href="518d0ed787"><code>518d0ed</code></a>
Suppress RUSTSEC-2026-0009 since we don't have untrusted time input in
tests ...</li>
<li><a
href="a6579a8984"><code>a6579a8</code></a>
Suppress RUSTSEC-2026-0009 since we don't have untrusted time input in
tests</li>
<li><a
href="9d4d0696e6"><code>9d4d069</code></a>
Implement OneOrMany for smallvec_1::SmallVec (<a
href="https://redirect.github.com/jonasbb/serde_with/issues/922">#922</a>)</li>
<li><a
href="fc78243e8c"><code>fc78243</code></a>
Add changelog</li>
<li><a
href="2b8c30bf67"><code>2b8c30b</code></a>
Implement OneOrMany for smallvec_1::SmallVec</li>
<li><a
href="2d9b9a1815"><code>2d9b9a1</code></a>
Carg.lock update</li>
<li>Additional commits viewable in <a
href="https://github.com/jonasbb/serde_with/compare/v3.16.1...v3.17.0">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Eric Traut <etraut@openai.com>
Bumps [strum_macros](https://github.com/Peternator7/strum) from 0.27.2
to 0.28.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/Peternator7/strum/blob/master/CHANGELOG.md">strum_macros's
changelog</a>.</em></p>
<blockquote>
<h2>0.28.0</h2>
<ul>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/461">#461</a>:
Allow any kind of passthrough attributes on
<code>EnumDiscriminants</code>.</p>
<ul>
<li>Previously only list-style attributes (e.g.
<code>#[strum_discriminants(derive(...))]</code>) were supported. Now
path-only
(e.g. <code>#[strum_discriminants(non_exhaustive)]</code>) and
name/value (e.g. <code>#[strum_discriminants(doc =
"foo")]</code>)
attributes are also supported.</li>
</ul>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/462">#462</a>:
Add missing <code>#[automatically_derived]</code> to generated impls not
covered by <a
href="https://redirect.github.com/Peternator7/strum/pull/444">#444</a>.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/466">#466</a>:
Bump MSRV to 1.71, required to keep up with updated <code>syn</code> and
<code>windows-sys</code> dependencies. This is a breaking change if
you're on an old version of rust.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/469">#469</a>:
Use absolute paths in generated proc macro code to avoid
potential name conflicts.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/465">#465</a>:
Upgrade <code>phf</code> dependency to v0.13.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/473">#473</a>:
Fix <code>cargo fmt</code> / <code>clippy</code> issues and add GitHub
Actions CI.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/477">#477</a>:
<code>strum::ParseError</code> now implements
<code>core::fmt::Display</code> instead
<code>std::fmt::Display</code> to make it <code>#[no_std]</code>
compatible. Note the <code>Error</code> trait wasn't available in core
until <code>1.81</code>
so <code>strum::ParseError</code> still only implements that in std.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/476">#476</a>:
<strong>Breaking Change</strong> - <code>EnumString</code> now
implements <code>From&lt;&amp;str&gt;</code>
(infallible) instead of <code>TryFrom&lt;&amp;str&gt;</code> when the
enum has a <code>#[strum(default)]</code> variant. This more accurately
reflects that parsing cannot fail in that case. If you need the old
<code>TryFrom</code> behavior, you can opt back in using
<code>parse_error_ty</code> and <code>parse_error_fn</code>:</p>
<pre lang="rust"><code>#[derive(EnumString)]
#[strum(parse_error_ty = strum::ParseError, parse_error_fn = make_error)]
pub enum Color {
    Red,
    #[strum(default)]
    Other(String),
}
fn make_error(x: &amp;str) -&gt; strum::ParseError {
    strum::ParseError::VariantNotFound
}
</code></pre>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/431">#431</a>:
Fix bug where <code>EnumString</code> ignored the
<code>parse_err_ty</code>
attribute when the enum had a <code>#[strum(default)]</code>
variant.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/Peternator7/strum/pull/474">#474</a>:
EnumDiscriminants will now copy <code>default</code> over from the
original enum to the Discriminant enum.</p>
<pre lang="rust"><code>#[derive(Debug, Default, EnumDiscriminants)]
#[strum_discriminants(derive(Default))] // <- Remove this in 0.28.
enum MyEnum {
#[default] // <- Will be the #[default] on the MyEnumDiscriminant
#[strum_discriminants(default)] // <- Remove this in 0.28
Variant0,
Variant1 { a: NonDefault },
}
</code></pre>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7376771128"><code>7376771</code></a>
Peternator7/0.28 (<a
href="https://redirect.github.com/Peternator7/strum/issues/475">#475</a>)</li>
<li><a
href="26e63cd964"><code>26e63cd</code></a>
Display exists in core (<a
href="https://redirect.github.com/Peternator7/strum/issues/477">#477</a>)</li>
<li><a
href="9334c728ee"><code>9334c72</code></a>
Make TryFrom and FromStr infallible if there's a default (<a
href="https://redirect.github.com/Peternator7/strum/issues/476">#476</a>)</li>
<li><a
href="0ccbbf823c"><code>0ccbbf8</code></a>
Honor parse_err_ty attribute when the enum has a default variant (<a
href="https://redirect.github.com/Peternator7/strum/issues/431">#431</a>)</li>
<li><a
href="2c9e5a9259"><code>2c9e5a9</code></a>
Automatically add Default implementation to EnumDiscriminant if it
exists on ...</li>
<li><a
href="e241243e48"><code>e241243</code></a>
Fix existing cargo fmt + clippy issues and add GH actions (<a
href="https://redirect.github.com/Peternator7/strum/issues/473">#473</a>)</li>
<li><a
href="639b67fefd"><code>639b67f</code></a>
feat: allow any kind of passthrough attributes on
<code>EnumDiscriminants</code> (<a
href="https://redirect.github.com/Peternator7/strum/issues/461">#461</a>)</li>
<li><a
href="0ea1e2d0fd"><code>0ea1e2d</code></a>
docs: Fix typo (<a
href="https://redirect.github.com/Peternator7/strum/issues/463">#463</a>)</li>
<li><a
href="36c051b910"><code>36c051b</code></a>
Upgrade <code>phf</code> to v0.13 (<a
href="https://redirect.github.com/Peternator7/strum/issues/465">#465</a>)</li>
<li><a
href="9328b38617"><code>9328b38</code></a>
Use absolute paths in proc macro (<a
href="https://redirect.github.com/Peternator7/strum/issues/469">#469</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/Peternator7/strum/compare/v0.27.2...v0.28.0">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Eric Traut <etraut@openai.com>
Bumps
[actions/download-artifact](https://github.com/actions/download-artifact)
from 7 to 8.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/download-artifact/releases">actions/download-artifact's
releases</a>.</em></p>
<blockquote>
<h2>v8.0.0</h2>
<h2>v8 - What's new</h2>
<h3>Direct downloads</h3>
<p>To support direct uploads in <code>actions/upload-artifact</code>,
the action will no longer attempt to unzip all downloaded files.
Instead, the action checks the <code>Content-Type</code> header ahead of
unzipping and skips non-zipped files. Callers wishing to download a
zipped file as-is can also set the new <code>skip-decompress</code>
parameter to <code>false</code>.</p>
<h3>Enforced checks (breaking)</h3>
<p>A previous release introduced digest checks on the download. If a
download hash didn't match the expected hash from the server, the action
would log a warning. Callers can now configure the behavior on mismatch
with the <code>digest-mismatch</code> parameter. To be secure by
default, we are now defaulting the behavior to <code>error</code> which
will fail the workflow run.</p>
<h3>ESM</h3>
<p>To support new versions of the @actions/* packages, we've upgraded
the package to ESM.</p>
<h2>What's Changed</h2>
<ul>
<li>Don't attempt to un-zip non-zipped downloads by <a
href="https://github.com/danwkennedy"><code>@danwkennedy</code></a> in
<a
href="https://redirect.github.com/actions/download-artifact/pull/460">actions/download-artifact#460</a></li>
<li>Add a setting to specify what to do on hash mismatch and default it
to <code>error</code> by <a
href="https://github.com/danwkennedy"><code>@danwkennedy</code></a> in
<a
href="https://redirect.github.com/actions/download-artifact/pull/461">actions/download-artifact#461</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/download-artifact/compare/v7...v8.0.0">https://github.com/actions/download-artifact/compare/v7...v8.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="70fc10c6e5"><code>70fc10c</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/download-artifact/issues/461">#461</a>
from actions/danwkennedy/digest-mismatch-behavior</li>
<li><a
href="f258da9a50"><code>f258da9</code></a>
Add change docs</li>
<li><a
href="ccc058e5fb"><code>ccc058e</code></a>
Fix linting issues</li>
<li><a
href="bd7976ba57"><code>bd7976b</code></a>
Add a setting to specify what to do on hash mismatch and default it to
<code>error</code></li>
<li><a
href="ac21fcf45e"><code>ac21fcf</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/download-artifact/issues/460">#460</a>
from actions/danwkennedy/download-no-unzip</li>
<li><a
href="15999bff51"><code>15999bf</code></a>
Add note about package bumps</li>
<li><a
href="974686ed50"><code>974686e</code></a>
Bump the version to <code>v8</code> and add release notes</li>
<li><a
href="fbe48b1d27"><code>fbe48b1</code></a>
Update test names to make it clearer what they do</li>
<li><a
href="96bf374a61"><code>96bf374</code></a>
One more test fix</li>
<li><a
href="b8c4819ef5"><code>b8c4819</code></a>
Fix skip decompress test</li>
<li>Additional commits viewable in <a
href="https://github.com/actions/download-artifact/compare/v7...v8">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Improve observability of realtime conversation event handling by logging
non-audio events with payload details in the event loop, while skipping
audio-out events to reduce noise.
Support marketplace.json that points to a local file, with
```
"source": {
  "source": "local",
  "path": "./plugin-1"
},
```
Add a new `plugin/install` endpoint that adds the plugin to the cache folder and enables it in `config.toml`.
Addresses #13478
Summary
- Add two new scopes for `tui.notifications` config: `plan-mode-prompt`
and `user-input-requested`.
- Add Plan Mode prompt and user-input-requested notifications to the TUI
so these events surface consistently outside of plan mode
- Add helpers and tests to ensure the new notification types publish the
right titles, summaries, and type tags for filtering
- Add prioritization mechanism to fix an existing bug where one
notification event could arbitrarily overwrite others
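One way to picture the prioritization fix is a single pending slot that only accepts a new notification when it is at least as important as the one already waiting. The sketch below is hypothetical: the `Kind` variants, their relative order, and the `Pending` type are assumptions made for illustration, not the TUI's actual types.

```rust
// Hypothetical notification kinds; the derived `Ord` makes later variants higher priority.
// The ordering chosen here is an assumption, not taken from the change itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    AgentTurnComplete,
    PlanModePrompt,
    UserInputRequested,
}

// A single pending slot, as a stand-in for wherever the TUI queues a notification.
#[derive(Debug, Default)]
struct Pending {
    slot: Option<Kind>,
}

impl Pending {
    // Store `kind` unless a strictly higher-priority notification is already waiting.
    // Returns true when the slot was updated.
    fn offer(&mut self, kind: Kind) -> bool {
        match self.slot {
            Some(current) if current > kind => false,
            _ => {
                self.slot = Some(kind);
                true
            }
        }
    }

    // Hand the pending notification to the caller and clear the slot.
    fn take(&mut self) -> Option<Kind> {
        self.slot.take()
    }
}
```

With a rule like this, a lower-priority event arriving later cannot displace a higher-priority one that has not been shown yet.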
Testing
- Manually tested plan mode to ensure that notification appeared
### Overview
This PR:
- Updates `app-server-test-client` to load OTEL settings from
`$CODEX_HOME/config.toml` and initializes its own OTEL provider.
- Adds real client root spans to app-server test client traces.
This updates `codex-app-server-test-client` so its Datadog traces
reflect the full client-driven flow instead of a set of server spans
stitched together under a synthetic parent.
Before this change, the test client generated a fake `traceparent` once
and reused it for every JSON-RPC request. That kept the requests in one
trace, but there was no real client span at the top, so Datadog ended up
showing the sequence in a slightly misleading way, where all RPCs were
anchored under `initialize`.
Now the test client:
- loads OTEL settings from the normal Codex config path, including
`$CODEX_HOME/config.toml` and existing --config overrides
- initializes tracing the same way other Codex binaries do when trace
export is enabled
- creates a real client root span for each scripted command
- creates per-request client spans for JSON-RPC methods like
`initialize`, `thread/start`, and `turn/start`
- injects W3C trace context from the current client span into
request.trace instead of reusing a fabricated carrier
This gives us a cleaner trace shape in Datadog:
- one trace URL for the whole scripted flow
- a visible client root span
- proper client/server parent-child relationships for each app-server
request
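To make the parent-child relationship concrete, here is a small std-only sketch of how a W3C `traceparent` value is laid out (`version-traceid-parentid-flags`) and how a per-request span shares the root span's trace ID. `SpanContext`, `child_of`, and `to_traceparent` are hypothetical names for illustration; the test client relies on its OTEL tooling for this rather than hand-rolled code.

```rust
// Hypothetical minimal span context: just the fields that appear in `traceparent`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SpanContext {
    trace_id: u128,
    span_id: u64,
    sampled: bool,
}

// A child span keeps the parent's trace ID and sampling flag but gets its own span ID.
fn child_of(parent: &SpanContext, span_id: u64) -> SpanContext {
    SpanContext {
        trace_id: parent.trace_id,
        span_id,
        sampled: parent.sampled,
    }
}

// Render the W3C Trace Context header value: version 00, 32 hex digits of trace ID,
// 16 hex digits of span ID, and a 2-digit flags byte (01 = sampled).
fn to_traceparent(ctx: &SpanContext) -> String {
    format!(
        "00-{:032x}-{:016x}-{:02x}",
        ctx.trace_id,
        ctx.span_id,
        u8::from(ctx.sampled)
    )
}
```

Because every request span is derived from the real root span, all requests land in one trace, and each server span can attach to the specific client span that issued it.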
## Problem
The `ansi`, `base16`, and `base16-256` syntax themes are designed to
emit ANSI palette colors so that highlighted code respects the user's
terminal color scheme. Syntect encodes this intent in the alpha channel
of its `Color` struct — a convention shared with `bat` — but
`convert_style` was ignoring it entirely, treating every foreground
color as raw RGB. This caused ANSI-family themes to produce hard-coded
RGB values (e.g. `Rgb(0x02, 0, 0)` instead of `Green`), defeating their
purpose and rendering them as near-invisible dark colors on most
terminals.
Reported in #12890.
## Mental model
Syntect themes use a compact encoding in their `Color` struct:
| `alpha` | Meaning of `r` | Mapped to |
|---------|----------------|-----------|
| `0x00` | ANSI palette index (0–255) | `RtColor::Black`…`Gray` for 0–7, `Indexed(n)` for 8–255 |
| `0x01` | Unused (sentinel) | `None` — inherit terminal default fg/bg |
| `0xFF` | True RGB red channel | `RtColor::Rgb(r, g, b)` |
| other | Unexpected | `RtColor::Rgb(r, g, b)` (silent fallback) |
This encoding is a bat convention that three bundled themes rely on. The
new `convert_syntect_color` function decodes it; `ansi_palette_color`
maps indices 0–7 to ratatui's named ANSI variants.
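The decoding table above can be sketched in a few lines. This is an illustrative, std-only version: `SynColor` and `TermColor` are local stand-ins for `syntect::highlighting::Color` and ratatui's color type, and the function bodies are an assumption about how the dispatch might look, not the code in `highlight.rs`.

```rust
// Stand-in for `syntect::highlighting::Color` (assumption: same four u8 channels).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SynColor {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

// Stand-in for the subset of ratatui's color enum that the table mentions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TermColor {
    Black,
    Red,
    Green,
    Yellow,
    Blue,
    Magenta,
    Cyan,
    Gray,
    Indexed(u8),
    Rgb(u8, u8, u8),
}

// Indices 0-7 become named ANSI colors; everything else stays an indexed color.
fn ansi_palette_color(index: u8) -> TermColor {
    match index {
        0 => TermColor::Black,
        1 => TermColor::Red,
        2 => TermColor::Green,
        3 => TermColor::Yellow,
        4 => TermColor::Blue,
        5 => TermColor::Magenta,
        6 => TermColor::Cyan,
        7 => TermColor::Gray,
        n => TermColor::Indexed(n),
    }
}

// Dispatch on the alpha channel: 0x00 = palette index in `r`,
// 0x01 = use the terminal default (None), anything else = plain RGB.
fn convert_syntect_color(color: SynColor) -> Option<TermColor> {
    match color.a {
        0x00 => Some(ansi_palette_color(color.r)),
        0x01 => None,
        _ => Some(TermColor::Rgb(color.r, color.g, color.b)),
    }
}
```

Returning `Option` lets the caller leave the foreground unset for the `0x01` sentinel, so the terminal's own default color applies.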
| macOS - Dark | macOS - Light | Windows - ansi | Windows - base16 |
|---|---|---|---|
| <img width="1064" height="1205" alt="macos-dark" src="https://github.com/user-attachments/assets/f03d92fb-b44b-4939-b2b9-503fde133811" /> | <img width="1073" height="1227" alt="macos-light" src="https://github.com/user-attachments/assets/2ecb2089-73b5-4676-bed8-e4e6794250b4" /> | | |
## Non-goals
- Background color decoding — we intentionally skip backgrounds to
preserve the terminal's own background. The decoder supports it, but
`convert_style` does not apply it.
- Italic/underline changes — those remain suppressed as before.
- Custom `.tmTheme` support for ANSI encoding — only the bundled themes
use this convention.
## Tradeoffs
- The alpha-channel encoding is an undocumented bat/syntect convention,
not a formal spec. We match bat's behavior exactly, trading formality
for ecosystem compatibility.
- Indices 0–7 are mapped to ratatui's named variants (`Black`, `Red`, …,
`Gray`) rather than `Indexed(0)`…`Indexed(7)`. This lets terminals apply
bold/bright semantics to named colors, which is the expected behavior
for ANSI themes, but means the two representations are not perfectly
round-trippable.
## Architecture
All changes are in `codex-rs/tui/src/render/highlight.rs`, within the
style-conversion layer between syntect and ratatui:
```
syntect::highlighting::Color
  └─ convert_syntect_color(color)      [NEW — alpha-dispatch]
       ├─ a=0x00 → ansi_palette_color()  [NEW — index→named/indexed]
       ├─ a=0x01 → None                  (terminal default)
       ├─ a=0xFF → Rgb(r,g,b)            (standard opaque path)
       └─ other  → Rgb(r,g,b)            (silent fallback)
```
`convert_style` delegates foreground mapping to `convert_syntect_color`
instead of inlining the `Rgb(r,g,b)` conversion. The core highlighter is
refactored into `highlight_to_line_spans_with_theme` (accepts an
explicit theme reference) so tests can highlight against specific themes
without mutating process-global state.
### ANSI-family theme contract
The ANSI-family themes (`ansi`, `base16`, `base16-256`) rely on upstream
alpha-channel encoding from two_face/syntect. We intentionally do
**not** validate this contract at runtime — if the upstream format
changes, the `ansi_themes_use_only_ansi_palette_colors` test catches it
at build time, long before it reaches users. A runtime warning would be
unactionable noise.
### Warning copy cleanup
User-facing warning messages were rewritten for clarity:
- Removed internal jargon ("alpha-encoded ANSI color markers", "RGB
fallback semantics", "persisted override config")
- Dropped "syntax" prefix from "syntax theme" — users just think "theme"
- Downgraded developer-only diagnostics (duplicate override, resolve
fallback) from `warn` to `debug`
## Observability
- The `ansi_themes_use_only_ansi_palette_colors` test enforces the
ANSI-family contract at build time.
- The snapshot test provides a regression tripwire for palette color
output.
- User-facing warnings are limited to actionable issues: unknown theme
names and invalid custom `.tmTheme` files.
## Tests
- **Unit tests for each alpha branch:** `alpha=0x00` with low index
(named color), `alpha=0x00` with high index (`Indexed`), `alpha=0x01`
(terminal default), unexpected alpha (falls back to RGB), ANSI white →
Gray mapping.
- **Integration test:**
`ansi_family_themes_use_terminal_palette_colors_not_rgb` — highlights a
Rust snippet with each ANSI-family theme and asserts zero `Rgb`
foreground colors appear.
- **Snapshot test:** `ansi_family_foreground_palette_snapshot` — records
the exact set of unique foreground colors each ANSI-family theme
produces, guarding against regressions.
- **Warning validation tests:** verify user-facing warnings for missing
custom themes, invalid `.tmTheme` files, and bundled theme resolution.
## Test plan
- [ ] `cargo test -p codex-tui` passes all new and existing tests
- [ ] Select `ansi`, `base16`, or `base16-256` theme and verify code
blocks render with terminal palette colors (not near-black RGB)
- [ ] Select a standard RGB theme (e.g. `dracula`) and verify no
regression in color output
## Summary
- update the /fast slash command description to mention fastest
inference
- mention the 3X plan usage tradeoff in the help copy
## Testing
- cargo test -p codex-tui slash_command (currently blocked by an
unrelated latest-main codex-tui compile error in chatwidget.rs:
refresh_queued_user_messages missing)
---------
Co-authored-by: Codex <noreply@openai.com>
This is PR 3 of the app-server tracing rollout.
PRs https://github.com/openai/codex/pull/13285 and
https://github.com/openai/codex/pull/13368 gave us inbound request spans
in app-server and propagated trace context through Submission. This
change finishes the next piece in core: when a request actually starts a
turn, we now create a core-owned long-lived span that stays open for the
real lifetime of the turn.
What changed:
- `Session::spawn_task` can now optionally create a long-lived turn span
and run the spawned task inside it
- `turn/start` uses that path, so normal turn execution stays under a
single core-owned span after the async handoff
- `review/start` uses the same pattern
- added a unit test that verifies the spawned turn task inherits the
submission dispatch trace ancestry
**Why**
The app-server request span is intentionally short-lived. Once work
crosses into core, we still want one span that covers the actual
execution window until completion or interruption. This keeps that
ownership where it belongs: in the layer that owns the runtime
lifecycle.
The Electron app doesn't start the app-server in a particular
workspace directory, so sandbox setup happens in the app-installed
directory instead of the project workspace.
This allows the app to specify the workspace cwd so that sandbox
setup actually sets up the ACLs instead of exiting fast and leaving
the first shell command slow.
Validated login + refresh flows. Removing scopes from the refresh
request until we have an upgrade flow in place. Confirmed that tokens
refresh with existing scopes.
@@ -51,6 +51,7 @@ You can enable notifications by configuring a script that is run whenever the ag
### `codex exec` to run Codex programmatically/non-interactively
To run Codex non-interactively, run `codex exec PROMPT` (you can also pass the prompt via `stdin`) and Codex will work on your task until it decides that it is done and exits. Output is printed to the terminal directly. You can set the `RUST_LOG` environment variable to see more about what's going on.
Use `codex exec --fork <SESSION_ID> PROMPT` to fork an existing session without launching the interactive picker/UI.
Use `codex exec --ephemeral ...` to run without persisting session rollout files to disk.
"description":"Explicit mention selected by the user (name + app://connectorid).",
"description":"Explicit structured mention selected by the user.\n\n`path` identifies the exact mention target, for example `app://<connector-id>` or `plugin://<plugin-name>@<marketplace-name>`.",
"properties":{
"name":{
"type":"string"
@@ -7092,6 +7268,66 @@
"title":"WebSearchEndEventMsg",
"type":"object"
},
{
"properties":{
"call_id":{
"type":"string"
},
"type":{
"enum":[
"image_generation_begin"
],
"title":"ImageGenerationBeginEventMsgType",
"type":"string"
}
},
"required":[
"call_id",
"type"
],
"title":"ImageGenerationBeginEventMsg",
"type":"object"
},
{
"properties":{
"call_id":{
"type":"string"
},
"result":{
"type":"string"
},
"revised_prompt":{
"type":[
"string",
"null"
]
},
"saved_path":{
"type":[
"string",
"null"
]
},
"status":{
"type":"string"
},
"type":{
"enum":[
"image_generation_end"
],
"title":"ImageGenerationEndEventMsgType",
"type":"string"
}
},
"required":[
"call_id",
"result",
"status",
"type"
],
"title":"ImageGenerationEndEventMsg",
"type":"object"
},
{
"description":"Notification that the server is about to execute a command.",
"properties":{
@@ -7614,12 +7850,19 @@
"id":{
"$ref":"#/definitions/RequestId"
},
"message":{
"type":"string"
"request":{
"$ref":"#/definitions/ElicitationRequest"
},
"server_name":{
"type":"string"
},
"turn_id":{
"description":"Turn ID that this elicitation belongs to, when known.",
"description":"Typed form schema for MCP `elicitation/create` requests.\n\nThis matches the `requestedSchema` shape from the MCP 2025-11-25 `ElicitRequestFormParams` schema.",
"description":"Active Codex turn when this elicitation was observed, if app-server could correlate one.\n\nThis is nullable because MCP models elicitation as a standalone server-to-client request identified by the MCP server request id. It may be triggered during a turn, but turn context is app-server correlation rather than part of the protocol identity of the elicitation itself.",
"description":"Optional client metadata for form-mode action handling."
},
"action":{
"$ref":"#/definitions/McpServerElicitationAction"
},
"content":{
"description":"Structured user input for accepted elicitations, mirroring RMCP `CreateElicitationResult`.\n\nThis is nullable because decline/cancel responses have no content."
"description":"Typed form schema for MCP `elicitation/create` requests.\n\nThis matches the `requestedSchema` shape from the MCP 2025-11-25 `ElicitRequestFormParams` schema.",
"description":"Active Codex turn when this elicitation was observed, if app-server could correlate one.\n\nThis is nullable because MCP models elicitation as a standalone server-to-client request identified by the MCP server request id. It may be triggered during a turn, but turn context is app-server correlation rather than part of the protocol identity of the elicitation itself.",
"type":[
"string",
"null"
]
}
},
"required":[
"serverName",
"threadId"
],
"type":"object"
},
"NetworkApprovalContext":{
"properties":{
"host":{
@@ -981,6 +1584,31 @@
"title":"Item/tool/requestUserInputRequest",
"type":"object"
},
{
"description":"Request input for an MCP server elicitation.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"description":"A path that is guaranteed to be absolute and normalized (though it is not guaranteed to be canonicalized or exist on the filesystem).\n\nIMPORTANT: When deserializing an `AbsolutePathBuf`, a base path must be set using [AbsolutePathBufGuard::new]. If no base path is set, the deserialization will fail unless the path being deserialized is already absolute.",
"type":"string"
}
},
"properties":{
"cwds":{
"description":"Optional working directories used to discover repo marketplaces. When omitted, only home-scoped marketplaces are considered.",