Commit Graph

2602 Commits

Author SHA1 Message Date
Eric Traut
6f87eb0479 Hide unsupported MCP bearer_token from config schema (#19294)
## Summary

Fixes #19275.

Codex runtime rejects inline MCP `bearer_token` config entries and asks
users to configure `bearer_token_env_var` instead, but the generated
config schema still advertised `mcp_servers.<name>.bearer_token` as a
supported field. That made editor/schema validation disagree with
runtime validation.

This keeps `bearer_token` in `RawMcpServerConfig` so Codex can continue
producing the targeted runtime error for recent or existing configs, but
skips the field during schemars generation. The checked-in
`core/config.schema.json` fixture now exposes `bearer_token_env_var`
without exposing unsupported inline `bearer_token`.

## Verification

- Added `config_schema_hides_unsupported_inline_mcp_bearer_token` to
assert the generated schema hides `bearer_token` while preserving
`bearer_token_env_var`.
- Ran `cargo test -p codex-config`.
- Ran `cargo test -p codex-core config_schema`.
2026-04-24 00:17:43 -07:00
sayan-oai
e083b6c757 chore: apply truncation policy to unified_exec (#19247)
we were not respecting turn's `truncation_policy` to clamp output tokens
for `unified_exec` and `write_stdin`.

this meant truncation was only being applied by `ContextManager` before
the output was stored in-memory (so it _was_ being truncated from
model-visible context), but the full output was persisted to rollout on
disk.

now we respect that `truncation_policy` and `ContextManager`-level
truncation remains a backup.

### Tests
added tests, tested locally.
2026-04-24 00:17:39 -07:00
Eric Traut
ac8c9fc49c Reject unsupported js_repl image MIME types (#19292)
## Summary

`codex.emitImage` accepted arbitrary image MIME types for byte payloads
and data URLs. That allowed a value like `image/rgba` to be wrapped as
an `input_image`, even though it is not a supported encoded image
format, so the invalid image could reach the model-input path and
trigger output sanitization.

This results in a panic in debug builds because the output sanitization
is meant as a final safety net, not a primary means of rejecting invalid
image types. I've hit this case multiple times when executing certain
long-running tasks.

This PR rejects unsupported image MIME types before they are emitted
from `js_repl`.

## Changes

- Validate `codex.emitImage({ bytes, mimeType })` in the JS kernel so
only encoded PNG, JPEG, WebP, or GIF payloads are accepted.
- Apply the same MIME allowlist to direct image data URLs, including the
Rust host-side validation path.
- Clarify the JS REPL instructions so agents know byte payloads must
already be encoded as PNG/JPEG/WebP/GIF.
2026-04-24 00:14:51 -07:00
Eric Traut
d87d918716 Resolve relative agent role config paths from layers (#19261)
Fixes #19257.

## Summary

Agent roles declared in config layers can set `config_file` to a
relative path, but deserializing the layer-local `[agents.*]` table
happened without an `AbsolutePathBuf` base path. That caused configs
like `config_file = "agents/my-role.toml"` to fail with `AbsolutePathBuf
deserialized without a base path`.

This updates agent role layer loading to deserialize `[agents.*]` while
the layer config folder is active as the path base, matching the
behavior documented for `AgentRoleToml.config_file`. It also adds
coverage for a user config layer with a relative agent role
`config_file`.
2026-04-23 23:23:11 -07:00
Michael Bolin
4816b89204 permissions: make profiles represent enforcement (#19231)
## Why

`PermissionProfile` is becoming the canonical permissions abstraction,
but the old shape only carried optional filesystem and network fields.
It could describe allowed access, but not who is responsible for
enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy
when profiles were exported, cached, or round-tripped through app-server
APIs.

The important model change is that active permissions are now a disjoint
union over the enforcement mode. Conceptually:

```rust
pub enum PermissionProfile {
    Managed {
        file_system: FileSystemSandboxPolicy,
        network: NetworkSandboxPolicy,
    },
    Disabled,
    External {
        network: NetworkSandboxPolicy,
    },
}
```

This distinction matters because `Disabled` means Codex should apply no
outer sandbox at all, while `External` means filesystem isolation is
owned by an outside caller. Those are not equivalent to a broad managed
sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an
inner sandbox may require the outer Codex layer to use no sandbox rather
than a permissive one.

## How Existing Modeling Maps

Legacy `SandboxPolicy` remains a boundary projection, but it now maps
into the higher-fidelity profile model:

- `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed`
with restricted filesystem entries plus the corresponding network
policy.
- `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving
the “no outer sandbox” intent instead of treating it as a lax managed
sandbox.
- `ExternalSandbox { network_access }` maps to
`PermissionProfile::External { network }`, preserving external
filesystem enforcement while still carrying the active network policy.
- Split runtime policies that legacy `SandboxPolicy` cannot faithfully
express, such as managed unrestricted filesystem plus restricted
network, stay `Managed` instead of being collapsed into
`ExternalSandbox`.
- Per-command/session/turn grants remain partial overlays via
`AdditionalPermissionProfile`; full `PermissionProfile` is reserved for
complete active runtime permissions.

## What Changed

- Change active `PermissionProfile` into a tagged union: `managed`,
`disabled`, and `external`.
- Keep partial permission grants separate with
`AdditionalPermissionProfile` for command/session/turn overlays.
- Represent managed filesystem permissions as either `restricted`
entries or `unrestricted`; `glob_scan_max_depth` is non-zero when
present.
- Preserve old rollout compatibility by accepting the pre-tagged `{
network, file_system }` profile shape during deserialization.
- Preserve fidelity for important edge cases: `DangerFullAccess`
round-trips as `disabled`, `ExternalSandbox` round-trips as `external`,
and managed unrestricted filesystem + restricted network stays managed
instead of being mistaken for external enforcement.
- Preserve configured deny-read entries and bounded glob scan depth when
full profiles are projected back into runtime policies, including
unrestricted replacements that now become `:root = write` plus deny
entries.
- Regenerate the experimental app-server v2 JSON/TypeScript schema and
update the `command/exec` README example for the tagged
`permissionProfile` shape.

## Compatibility

Legacy `SandboxPolicy` remains available at config/API boundaries as the
compatibility projection. Existing rollout lines with the old
`PermissionProfile` shape continue to load. The app-server
`permissionProfile` field is experimental, so its v2 wire shape is
intentionally updated to match the higher-fidelity model.

## Verification

- `just write-app-server-schema`
- `cargo check --tests`
- `cargo test -p codex-protocol permission_profile`
- `cargo test -p codex-protocol
preserving_deny_entries_keeps_unrestricted_policy_enforceable`
- `cargo test -p codex-app-server-protocol
permission_profile_file_system_permissions`
- `cargo test -p codex-app-server-protocol serialize_client_response`
- `cargo test -p codex-core
session_configured_reports_permission_profile_for_external_sandbox`
- `just fix`
- `just fix -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
2026-04-23 23:02:18 -07:00
Celia Chen
e8d8080818 feat: let model providers own model discovery (#18950)
## Why

`codex-models-manager` had grown to own provider-specific concerns:
constructing OpenAI-compatible `/models` requests, resolving provider
auth, emitting request telemetry, and deciding how provider catalogs
should be sourced. That made the manager harder to reuse for providers
whose model catalog is not fetched from the OpenAI `/models` endpoint,
such as Amazon Bedrock.

This change moves provider-specific model discovery behind
provider-owned implementations, so the models manager can focus on
refresh policy, cache behavior, picker ordering, and model metadata
merging.

## What Changed

- Introduced a `ModelsManager` trait with separate `OpenAiModelsManager`
and `StaticModelsManager` implementations.
- Added `ModelsEndpointClient` so OpenAI-compatible HTTP fetching lives
outside `codex-models-manager`.
- Moved `/models` request construction, provider auth resolution,
timeout handling, and request telemetry into `codex-model-provider` via
`OpenAiModelsEndpoint`.
- Added provider-owned `models_manager(...)` construction so configured
OpenAI-compatible providers use `OpenAiModelsManager`, while
static/catalog-backed providers can return `StaticModelsManager`.
- Added an Amazon Bedrock static model catalog for the GPT OSS Bedrock
model IDs.
- Updated core/session/thread manager code and tests to depend on
`Arc<dyn ModelsManager>`.
- Moved offline model test helpers into
`codex_models_manager::test_support`.
## Metadata References

The Bedrock catalog metadata is based on the official Amazon Bedrock
OpenAI model documentation:

- [Amazon Bedrock OpenAI
models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html)
lists the Bedrock model IDs, text input/output modalities, and `128,000`
token context window for `gpt-oss-20b` and `gpt-oss-120b`.
- [Amazon Bedrock `gpt-oss-120b` model
card](https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-openai-gpt-oss-120b.html)
lists the `bedrock-runtime` model ID `openai.gpt-oss-120b-1:0`, the
`bedrock-mantle` model ID `openai.gpt-oss-120b`, text-only modalities,
and `128K` context window.
- [OpenAI `gpt-oss-120b` model
docs](https://developers.openai.com/api/docs/models/gpt-oss-120b)
document configurable reasoning effort with `low`, `medium`, and `high`,
plus text input/output modality.

The display names, default reasoning effort, and priority ordering are
Codex-local catalog choices.

## Test Plan
- Manually verified app-server model listing with an AWS profile:

```shell
CODEX_HOME="$(mktemp -d)" cargo run -p codex-app-server-test-client -- \
  --codex-bin ./target/debug/codex \
  -c 'model_provider="amazon-bedrock"' \
  -c 'model_providers.amazon-bedrock.aws.profile="codex-bedrock"' \
  -c 'model_providers.amazon-bedrock.aws.region="us-west-2"' \
  model-list
```

The response returned the Bedrock catalog with `openai.gpt-oss-120b-1:0`
as the default model and `openai.gpt-oss-20b-1:0` as the second listed
model, both text-only and supporting low/medium/high reasoning effort.
2026-04-24 04:28:25 +00:00
xl-openai
53be451673 feat: Use short SHA versions for curated plugin cache entries (#19095)
Curated plugin cache entries now use an 8-character SHA prefix, instead
of the full SHA, as the cache folder version number.
2026-04-23 21:15:03 -07:00
starr-openai
49fb25997f Add sticky environment API and thread state (#18897)
## Summary
- add sticky environment selections to app-server v2 thread/start and
turn/start request flow
- carry thread-level selections through core session/thread state
- add app-server coverage for sticky selections and turn overrides

## Stack
1. This PR: API and thread persistence
2. #18898: config.toml named environment loading
3. #18899: downstream tool/runtime consumers

## Validation
- Not run locally; split only.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-23 18:57:13 -07:00
cassirer-openai
e3c8720a99 [rollout_trace] Add debug trace reduction command (#18880)
## Summary

Adds the debug CLI entry point for reducing recorded rollout traces.
This gives developers a direct way to inspect whether the emitted trace
stream reduces into the expected conversation/runtime model.

## Stack

This is PR 5/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR is intentionally last: it depends on the trace crate, core
recorder, runtime/tool events, and session/agent edge data all existing.
The command should remain a debug/developer tool and avoid adding new
runtime behavior.

The useful review question is whether the CLI exposes the reducer in the
smallest practical way for local inspection without turning the debug
command into a supported user-facing workflow.
2026-04-24 01:56:48 +00:00
efrazer-oai
5882f3f95e refactor: route Codex auth through AuthProvider (#18811)
## Summary

This PR moves Codex backend request authentication from direct
bearer-token handling to `AuthProvider`.

The new `codex-auth-provider` crate defines the shared request-auth
trait. `CodexAuth::provider()` returns a provider that can apply all
headers needed for the selected auth mode.

This lets ChatGPT token auth and AgentIdentity auth share the same
callsite path:
- ChatGPT token auth applies bearer auth plus account/FedRAMP headers
where needed.
- AgentIdentity auth applies AgentAssertion plus account/FedRAMP headers
where needed.

Reference old stack: https://github.com/openai/codex/pull/17387/changes

## Callsite Migration

| Area | Change |
| --- | --- |
| backend-client | accepts an `AuthProvider` instead of a raw
token/header |
| chatgpt client/connectors | applies auth through
`CodexAuth::provider()` |
| cloud tasks | keeps Codex-backend gating, applies auth through
provider |
| cloud requirements | uses Codex-backend auth checks and provider
headers |
| app-server remote control | applies provider headers for backend calls
|
| MCP Apps/connectors | gates on `uses_codex_backend()` and keys caches
from generic account getters |
| model refresh | treats AgentIdentity as Codex-backend auth |
| OpenAI file upload path | rejects non-Codex-backend auth before
applying headers |
| core client setup | keeps model-provider auth flow and allows
AgentIdentity through provider-backed OpenAI auth |

## Stack

1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
crate
3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
auth mode and startup task allocation
4. This PR: migrate Codex backend auth callsites through AuthProvider
5. https://github.com/openai/codex/pull/18904: accept AgentIdentity JWTs
and load `CODEX_AGENT_IDENTITY`

## Testing

Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.
2026-04-23 17:14:02 -07:00
Eric Traut
3f8c06e457 Fix /review interrupt and TUI exit wedges (#18921)
Addresses #11267

## Summary
`/review` can be interrupted while it is still spawning the review
sub-agent. That spawn path lives in `codex-core` and did not observe the
task cancellation token until after `Codex::spawn` returned, so an
interrupted review could keep building a child session and leave the TUI
in a wedged state.

The TUI exit path also waited indefinitely for app-server
`thread/unsubscribe`, which made Ctrl+C look broken if the app-server
was already stuck. This makes interactive delegate startup
cancellation-aware and bounds the TUI shutdown-first unsubscribe wait
with a short UI escape-hatch timeout.

## Testing
I reproed the hang using the steps in the bug report. Confirmed hang no
longer exists after fix.
2026-04-23 13:28:12 -07:00
Michael Bolin
9c0eced391 shell-escalation: carry resolved permission profiles (#18287)
## Why

Shell escalation still has adapter code that expects a legacy sandbox
policy, but command approvals should carry the resolved
`PermissionProfile` so callers can reason about the granted permissions
canonically.

## What changed

This introduces profile-shaped resolved escalation permissions while
retaining the derived legacy sandbox policy for the Unix escalation
adapter. It updates approval types, the escalation server protocol, and
tests that inspect escalated command permissions.

## Verification

- `cargo test -p codex-core --test all handle_container_exec_ --
--nocapture`
- `cargo test -p codex-core --test all handle_sandbox_ -- --nocapture`

























































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18287).
* #18288
* __->__ #18287
2026-04-23 12:46:19 -07:00
cassirer-openai
6d09b6752d [rollout_trace] Trace tool and code-mode boundaries (#18878)
## Summary

Extends rollout tracing across tool dispatch and code-mode runtime
boundaries. This records canonical tool-call lifecycle events and links
code-mode execution/wait operations back to the model-visible calls that
caused them.

## Stack

This is PR 3/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This PR is about attribution. Reviewers should focus on whether direct
tool calls, code-mode-originated tool calls, waits, outputs, and
cancellation boundaries are recorded with enough source information for
deterministic reduction without coupling the reducer to live runtime
internals.

The stack remains valid after this layer: tool and code-mode traces
reduce through the existing crate model, while the broader session and
multi-agent relationships are added in the next PR.
2026-04-23 12:22:11 -07:00
Michael Bolin
ff22982d75 mcp: include permission profiles in sandbox state (#18286)
## Why

MCP tool calls can receive a serialized `SandboxState` when a server
declares the sandbox-state capability. That state is one of the places
MCP runtimes learn what permissions Codex is operating under. As the
permissions migration makes `PermissionProfile` the canonical
representation, MCP consumers should be able to read that profile
directly instead of reconstructing permissions from the legacy
`SandboxPolicy`.

## What changed

- Adds optional `permissionProfile` to `codex_mcp::SandboxState`, while
keeping `sandboxPolicy` for existing MCP consumers.
- Populates `permissionProfile` from the current `TurnContext` when
serializing sandbox state for MCP tool calls.

## Verification

- Current GitHub Actions for this PR are passing.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18286).
* #18288
* #18287
* __->__ #18286
2026-04-23 12:21:26 -07:00
Michael Bolin
f90cc0ee64 tui: carry permission profiles on user turns (#18285)
## Why

Per-turn permission overrides should use the same canonical profile
abstraction as session configuration. That lets TUI submissions preserve
exact configured permissions without round-tripping through legacy
sandbox fields.

## What changed

This adds `permission_profile` to user-turn operations, threads it
through TUI/app-server submission paths, fills the new field in existing
test fixtures, and adds coverage that composer submission includes the
configured profile.

## Verification

- `cargo test -p codex-tui permissions -- --nocapture`
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`























































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18285).
* #18288
* #18287
* #18286
* __->__ #18285
2026-04-23 11:54:17 -07:00
Rasmus Rygaard
f11583b8f6 Add remote thread config endpoint (#18908)
## Why

App-server needs a way to fetch thread-scoped config from the remote
thread config service when the user config opts into that behavior. This
mirrors the existing experimental remote thread store endpoint while
keeping local/noop behavior as the default.

Startup paths also need to avoid silently dropping the remote config
endpoint after the first config load. The stdio app-server path
discovers the endpoint from the initial config and installs the real
thread config loader for later config builds, while in-process clients
used by TUI/exec now select the same remote loader directly from their
provided config.

## What changed

- Added `experimental_thread_config_endpoint` to `ConfigToml`, `Config`,
and `core/config.schema.json`.
- Added config parsing coverage for the new setting.
- Updated app-server startup to select `RemoteThreadConfigLoader` from
the initially loaded config, falling back to `NoopThreadConfigLoader`
when unset.
- Let `ConfigManager` replace its thread config loader after startup
discovery so later config loads use the selected loader.
- Updated in-process app-server client startup to pass
`RemoteThreadConfigLoader` when its config has
`experimental_thread_config_endpoint` set.

## Verification

- Added `experimental_thread_config_endpoint_loads_from_config_toml`.
- Added
`runtime_start_args_use_remote_thread_config_loader_when_configured`.
- Ran `cargo check -p codex-app-server --lib`.
- Ran `cargo test -p codex-app-server-client`.
2026-04-23 11:46:06 -07:00
maja-openai
cff337e4e3 Use Auto-review wording for fallback rationale (#19168)
## Why

PR #18797 currently surfaces fallback rationale text that names Guardian
directly.

## What changed

- Updated the bare allow and bare deny fallback rationales in
`codex-rs/core/src/guardian/prompt.rs` from Guardian to Auto-review.
- Updated the existing bare allow parser test and added explicit bare
deny parser coverage.

## Verification

- `cargo test -p codex-core parse_guardian_assessment_treats_bare`
2026-04-23 11:42:43 -07:00
xl-openai
198eddd25d Move marketplace add/remove and startup sync out of core. (#19099)
Move more things to core-plugins.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-23 11:27:17 -07:00
Tom
f1061d9d07 [codex] Implement remote thread store methods (#19008) 2026-04-23 17:49:28 +00:00
Tom
f1923a38b1 [codex] Route live thread writes through ThreadStore (#18882)
Begin migrating the thread write codepaths to ThreadStore.

This starts using ThreadStore inside of core session code, not only in
the app server code.

Rework the interfaces around thread recording/persistence. We're left
with the following:

* `ThreadManager`: owns the process-level registry of loaded threads and
handles cross-thread orchestration: start, resume, fork, lookup, remove,
and route ops to running CodexThreads.
* `CodexThread`: represents one loaded/running thread from the outside.
It is the handle app-server and callers use to submit ops, inspect
session metadata, and shut the thread down.
* `LiveThread`: session-owned persistence lifecycle handle for one
active thread. Core session code uses it to append rollout items,
materialize lazy persistence, flush, shutdown, discard init-failed
writers, and load that thread’s persisted history.
* `ThreadStore`: storage backend abstraction. It answers “how are
threads persisted, read, listed, updated, archived?” Local and remote
implementations live behind this trait.
* `LocalThreadStore`: local ThreadStore implementation. It owns the
file/sqlite-specific details and keeps RolloutRecorder as a local
implementation detail.

This is a few too many Thread abstractions for my liking, but they do
all represent different concepts / needs / layers.

Migration note: in places where the core code explicitly requires a
path, rather than a thread ID, throw an error if we're running with a
remote store.

Cover the new local live-writer lifecycle with focused tests and
preserve app-server thread-start behavior, including ephemeral pathless
sessions.
2026-04-23 10:17:09 -07:00
jif-oai
a2f868c9d6 feat: drop spawned-agent context instructions (#19127)
## Why

MultiAgentV2 children should not receive an extra model-visible
developer fragment just because they were spawned. The parent/configured
developer instructions should carry through normally, but the dedicated
`<spawned_agent_context>` block is no longer desired.

## What changed

- Removed the `SpawnAgentInstructions` context fragment and its
`<spawned_agent_context>` wrapper.
- Stopped appending spawned-agent instructions in
`codex-rs/core/src/tools/handlers/multi_agents_v2/spawn.rs`.
- Updated subagent notification coverage to assert inherited parent
developer instructions without expecting the spawned-agent wrapper.

## Verification

- `cargo test -p codex-core --test all
spawned_multi_agent_v2_child_inherits_parent_developer_context --
--nocapture`
- `cargo test -p codex-core --test all
skills_toggle_skips_instructions_for_parent_and_spawned_child --
--nocapture`
- `cargo test -p codex-core --test all subagent_notifications --
--nocapture`
2026-04-23 18:54:45 +02:00
jif-oai
d3b044938d Reject agents.max_threads with multi_agent_v2 (#19129)
## Why

`multi_agent_v2` uses the v2 agent lifecycle, so accepting the legacy
`agents.max_threads` limit alongside it creates conflicting
configuration semantics. Config load should fail early with a clear
error instead of allowing both knobs to be set.

## What Changed

- During config load, detect when the effective `multi_agent_v2` feature
is enabled and `agents.max_threads` is explicitly set.
- Return an `InvalidInput` error: `agents.max_threads cannot be set when
multi_agent_v2 is enabled`.

## Verification

- `cargo test -p codex-core multi_agent_v2_rejects_agents_max_threads`
passed locally with a temporary focused test for this behavior.
- `cargo test -p codex-core` was also run; the new focused path passed,
but the crate suite has unrelated pre-existing failures in managed
config/proxy/request-permissions tests.
2026-04-23 13:31:54 +02:00
Won Park
17ae906048 Fix auto-review config compatibility across protocol and SDK (#19113)
## Why

This keeps the partial Guardian subagent -> Auto-review rename
forward-compatible across mixed Codex installations. Newer binaries need
to understand the new `auto_review` spelling, but they cannot write it
to shared `~/.codex/config.toml` yet because older CLI/app-server
bundles only know `user` and `guardian_subagent` and can fail during
config load before recovering.

The Python SDK had the opposite compatibility gap: app-server responses
can contain `approvalsReviewer: "auto_review"`, but the checked-in
generated SDK enum did not accept that value.

## What Changed

- Keep `ApprovalsReviewer::AutoReview` readable from both
`guardian_subagent` and `auto_review`, while serializing it as
`guardian_subagent` in both protocol crates.
- Update TUI Auto-review persistence tests so enabling Auto-review
writes `approvals_reviewer = "guardian_subagent"` while UI copy still
says Auto-review.
- Map managed/cloud `feature_requirements.auto_review` to the existing
`Feature::GuardianApproval` gate without adding a broad local
`[features].auto_review` key or changing config writes.
- Add `auto_review` to the Python SDK `ApprovalsReviewer` enum and cover
`ThreadResumeResponse` validation.

## Testing

- `cargo test -p codex-protocol approvals_reviewer`
- `cargo test -p codex-app-server-protocol approvals_reviewer`
- `cargo test -p codex-tui
update_feature_flags_enabling_guardian_selects_auto_review`
- `cargo test -p codex-tui
update_feature_flags_enabling_guardian_in_profile_sets_profile_auto_review_policy`
- `cargo test -p codex-core
feature_requirements_auto_review_disables_guardian_approval`
- `pytest
sdk/python/tests/test_client_rpc_methods.py::test_thread_resume_response_accepts_auto_review_reviewer`
- `git diff --check`
2026-04-23 03:12:56 -07:00
Abhinav
305825abd9 Support MCP tools in hooks (#18385)
## Summary

Lifecycle hooks currently treat `PreToolUse`, `PostToolUse`, and
`PermissionRequest` as Bash-only flows
- hook schema constrains `tool_name` to `Bash`
- hook input assumes a command-shaped `tool_input`
- core hook dispatch path passes only shell command strings

That means hooks cannot target MCP tools even though MCP tool names are
model-visible and stable

This change generalizes those hook paths so they can match and receive
payloads for MCP tools while preserving the existing Bash behavior.

## Reviewer Notes

I think these are the key files
- `codex-rs/core/src/tools/handlers/mcp.rs`
- `codex-rs/core/src/mcp_tool_call.rs`

Otherwise the changes across apply_patch, shell, and unified_exec are
mainly to rewire everything to be `tool_input` based instead of just
`command` so that it'll make sense for MCP tools.

## Changes

- Allow `PreToolUse`, `PostToolUse`, and `PermissionRequest` hook inputs
to carry arbitrary `tool_name` and `tool_input` values instead of
hard-coding `Bash` and command-only payloads.
- Add MCP hook payload support through `McpHandler`, using the
model-visible tool name from `ToolInvocation` and the raw MCP arguments
as `tool_input`.
- Include MCP tool responses in `PostToolUse` by serializing
`McpToolOutput` into the hook response payload.
- Run `PermissionRequest` hooks for MCP approval requests after
remembered approval checks and before falling back to user-facing MCP
elicitation.
- Preserve exact matching for literal hook matchers like `Bash` and
`mcp__memory__create_entities`, while keeping regex matcher support for
patterns like `mcp__memory__.*` and `mcp__.*__write.*`.

---------

Co-authored-by: Andrei Eternal <eternal@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-04-23 07:33:57 +00:00
xl-openai
951be1a8a1 feat: Warn and continue on unknown feature requirements (#19038)
Requirements feature flags now fail open like config feature flags, but
with a startup warning.

<img width="443" height="68" alt="image"
src="https://github.com/user-attachments/assets/76767fa7-8ce8-4fc7-8a09-902fcdda6298"
/>
2026-04-22 22:50:44 -07:00
Eric Traut
bbff4ee61a Add safety check notification and error handling (#19055)
Adds a new app-server notification that fires when a user account has
been flagged for potential safety reasons.
2026-04-22 22:24:12 -07:00
Shijie Rao
02170996e6 Default Fast service tier for eligible ChatGPT plans (#19053)
## Why

Enterprise and business-like ChatGPT plans should get Codex's Fast
service tier by default when the user or caller has not made an explicit
service-tier choice. At the same time, callers need a durable way to
choose standard routing without adding a new persisted `standard`
service tier value. This keeps existing config compatibility while
letting core own the managed default policy.

## What changed

- Resolve the effective service tier in core at session creation:
explicit `fast` or `flex` wins, explicit null/clear or
`[notice].fast_default_opt_out = true` resolves to standard routing, and
otherwise eligible ChatGPT plans resolve to Fast when FastMode is
enabled.
- Add `[notice].fast_default_opt_out` as the persisted opt-out marker
for managed Fast defaults.
- Treat app-server/TUI `service_tier: null` as an explicit
standard/clear choice by preserving that intent through config loading.
- Update TUI rendering to use core's effective service tier for startup
and status surfaces while still keeping `config.service_tier` as the
explicit configured choice.
- Update `/fast off` to clear `service_tier`, persist the opt-out
marker, and send explicit standard for subsequent turns.

## Verification

- Added unit coverage for config override/notice handling, service-tier
resolution, runtime null clearing, and `/fast off` turn propagation.
- `cargo build -p codex-cli`

Full test suite was not run locally per author request.
2026-04-22 21:54:44 -07:00
Michael Bolin
082fc4f632 protocol: report session permission profiles (#18282)
## Why

Clients that observe `SessionConfigured` need the same canonical
permission view that app-server thread responses provide. Reporting the
profile in protocol events lets clients keep their local state
synchronized without reinterpreting legacy sandbox fields.

## What changed

This adds `permission_profile` to `SessionConfigured` and propagates it
through core, exec JSON output, MCP server messages, and TUI
history/widget handling.

## Verification

- `cargo test -p codex-tui permissions -- --nocapture`
- `cargo test -p codex-core --test all permissions_messages --
--nocapture`















































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18282).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* __->__ #18282
2026-04-22 21:29:32 -07:00
Andrei Eternal
2b2de3f38b codex: support hooks in config.toml and requirements.toml (#18893)
## Summary

Support the existing hooks schema in inline TOML so hooks can be
configured from both `config.toml` and enterprise-managed
`requirements.toml` without requiring a separate `hooks.json` payload.

This gives enterprise admins a way to ship managed hook policy through
the existing requirements channel while still leaving script delivery to
MDM or other device-management tooling, and it keeps `hooks.json`
working unchanged for existing users.

This also lays the groundwork for follow-on managed filtering work such
as #15937, while continuing to respect project trust gating from #14718.
It does **not** implement `allow_managed_hooks_only` itself.

NOTE: yes, it's a bit unfortunate that the toml isn't formatted as
closely as normal to our default styling. This is because we're trying
to stay compatible with the spec for plugins/hooks that we'll need to
support & the main usecase here is embedding into requirements.toml

## What changed

- moved the shared hook serde model out of `codex-rs/hooks` into
`codex-rs/config` so the same schema can power `hooks.json`, inline
`config.toml` hooks, and managed `requirements.toml` hooks
- added `hooks` support to both `ConfigToml` and
`ConfigRequirementsToml`, including requirements-side `managed_dir` /
`windows_managed_dir`
- treated requirements-managed hooks as one constrained value via
`Constrained`, so managed hook policy is merged atomically and cannot
drift across requirement sources
- updated hook discovery to load requirements-managed hooks first, then
per-layer `hooks.json`, then per-layer inline TOML hooks, with a warning
when a single layer defines both representations
- threaded managed hook metadata through discovered handlers and exposed
requirements hooks in app-server responses, generated schemas, and
`/debug-config`
- added hook/config coverage in `codex-rs/config`, `codex-rs/hooks`,
`codex-rs/core/src/config_loader/tests.rs`, and
`codex-rs/core/tests/suite/hooks.rs`

## Testing

- `cargo test -p codex-config`
- `cargo test -p codex-hooks`
- `cargo test -p codex-app-server config_api`

## Documentation

Companion updates are needed in the developers website repo for:

- the hooks guide
- the config reference, sample, basic, and advanced pages
- the enterprise managed configuration guide

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>
2026-04-22 21:20:09 -07:00
Dylan Hurd
5e71da1424 feat(request-permissions) approve with strict review (#19050)
## Summary
Allow the user to approve a request_permissions_tool request with the
condition that all commands in the rest of the turn are reviewed by
guardian, regardless of sandbox status.

## Testing
- [x] Added unit tests
- [x] Ran locally
2026-04-23 01:56:32 +00:00
Dylan Hurd
c6ab601824 chore(auto-review) feature => stable (#19063)
## Summary
Turn on Auto Review

## Testing
- [x] Update unit tests
2026-04-22 18:51:39 -07:00
Michael Bolin
3cc3763e6c core: box multi-agent wrapper futures (#19059)
## Why

While debugging the Windows stack overflows we saw in
[#13429](https://github.com/openai/codex/pull/13429) and then again in
[#18893](https://github.com/openai/codex/pull/18893), I hit another
overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.

That test drives the legacy multi-agent spawn / close / resume path. The
behavior was fine, but several thin async wrappers were still inlining
much larger `AgentControl` futures into their callers, which was enough
to overflow the default Windows stack.

## What

- Box the thin `AgentControl` wrappers around `spawn_agent_internal`,
`resume_single_agent_from_rollout`, and `shutdown_agent_tree`.
- Box the corresponding legacy `multi_agents` handler calls in `spawn`,
`resume_agent`, and `close_agent`.
- Keep behavior unchanged while reducing future size on this call path
so the Windows test no longer overflows its stack.

## Testing

- `cargo test -p codex-core --lib
tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed
-- --exact --nocapture`
- `cargo test -p codex-core` (this still hit unrelated local
integration-test failures because `codex.exe` / `test_stdio_server.exe`
were not present in this shell; the relevant unit tests passed)
2026-04-22 17:48:13 -07:00
Won Park
83ec1eb5d6 Rename approvals reviewer variant to auto-review (#19056)
## Why

`approvals_reviewer` now uses `auto_review` as the canonical config/API
value after #18504, but the Rust enum variant and nearby helper/test
names still used `GuardianSubagent` / guardian approval wording. That
made follow-up code and reviews confusing even though the external value
had already moved to Auto-review.

## What changed

- Renamed `ApprovalsReviewer::GuardianSubagent` to
`ApprovalsReviewer::AutoReview`.
- Updated protocol, app-server, config, core, TUI, exec, and analytics
test callsites.
- Renamed nearby helper/test names from guardian approval wording to
Auto-review wording where they refer to the approvals reviewer mode.
- Preserved wire compatibility:
  - `auto_review` remains the canonical serialized value.
  - `guardian_subagent` remains accepted as a legacy alias.

This intentionally does not rename the `[features].guardian_approval`
key, `Feature::GuardianApproval`, `core/src/guardian`, analytics event
names, or app-server Guardian review event types.

## Verification

- `cargo test -p codex-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent`
- `cargo test -p codex-app-server-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent`
- `cargo test -p codex-config approvals_reviewer`
- `cargo test -p codex-tui update_feature_flags`
- `cargo test -p codex-core permissions_instructions`
- `cargo test -p codex-tui permissions_selection`
2026-04-22 17:22:35 -07:00
Andrei Eternal
eed0e07825 hooks: emit Bash PostToolUse when exec_command completes via write_stdin (#18888)
Fixes #16246.

## Why

`exec_command` already emits `PreToolUse`, but long-running unified exec
commands that finish on a later `write_stdin` poll could miss the
matching `PostToolUse`. That left the Bash hook lifecycle inconsistent,
broke expectations around `tool_use_id` and `tool_input.command`, and
meant `PostToolUse` block/replacement feedback could fail to replace the
final session output before it reached model context.

This keeps the fix scoped to the `exec_command` / `write_stdin`
lifecycle. Broader non-Bash hook expansion is still out of scope here
and remains tracked separately in #16732.

## What changed

- Compute and store `PostToolUsePayload` while handlers still have
access to their concrete output type, and carry `tool_use_id` through
that payload.
- Preserve the original hook-facing `exec_command` string through
unified exec state (`ExecCommandRequest`, `ProcessEntry`,
`PreparedProcessHandles`, and `ExecCommandToolOutput`) via
`hook_command`, and remove the now-unused `session_command` output
metadata.
- Emit exactly one Bash `PostToolUse` for long-running `exec_command`
sessions when a later `write_stdin` poll observes final completion,
using the original `exec_command` call id and hook-facing command.
- Keep one-shot `exec_command` behavior aligned with the same payload
construction, including interactive completions that return a final
result directly.
- Apply `PostToolUse` block/replacement feedback before the final
`write_stdin` completion output is sent back to the model.
- Keep `write_stdin` itself out of `PreToolUse` matching so it continues
to act as transport/polling for the original Bash tool call.
- Restore plain matcher behavior for tool-name matchers such as `Bash`
and `Edit|Write`, while still treating patterns with regex characters
(for example `mcp__.*`) as regexes.
- Add unit coverage for unified exec payload construction and parallel
session separation, plus a core integration regression that verifies a
blocked `PostToolUse` replaces the final `write_stdin` output in model
context.

## Testing

- `cargo test -p codex-hooks`
- `cargo test -p codex-core post_tool_use_payload`
- `cargo test -p codex-core
post_tool_use_blocks_when_exec_session_completes_via_write_stdin`
2026-04-22 17:14:22 -07:00
Michael Bolin
6ca038bbd1 rollout: persist turn permission profiles (#18281)
## Why

Resume and reconstruction need to preserve the permissions that were
active for each user turn. If rollouts only keep legacy sandbox fields,
replay cannot faithfully represent profile-shaped overrides introduced
earlier in the stack.

## What changed

This records `permission_profile` on user-turn rollout events,
reconstructs it through history/state extraction, and updates rollout
reconstruction and related fixtures to keep the field explicit.

## Verification

- `cargo test -p codex-core --test all permissions_messages --
--nocapture`
- `cargo test -p codex-core --test all request_permissions --
--nocapture`











































---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18281).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* __->__ #18281
2026-04-22 17:00:29 -07:00
Won Park
46142c3cb0 Rebrand approvals reviewer config to auto-review (#18504)
### Why

Auto-review is the user-facing name for the approvals reviewer, but the
config/API value still exposed the old `guardian_subagent` name. That
made new configs and generated schemas point users at Guardian
terminology even though the intended product surface is Auto-review.

This PR updates the external `approvals_reviewer` value while preserving
compatibility for existing configs and clients.

### What changed

- Makes `auto_review` the canonical serialized value for
`approvals_reviewer`.
- Keeps `guardian_subagent` accepted as a legacy alias.
- Keeps `user` accepted and serialized as `user`.
- Updates generated config and app-server schemas so
`approvals_reviewer` includes:
  - `user`
  - `auto_review`
  - `guardian_subagent`
- Updates app-server README docs for the reviewer value.
- Updates analytics and config requirements tests for the canonical
auto_review value.


### Compatibility

Existing configs and API payloads using:

```toml
approvals_reviewer = "guardian_subagent"
```

continue to load and map to the Auto-review reviewer behavior. 

New serialization emits: 
```toml
approvals_reviewer = "auto_review" 
```

This PR intentionally does not rename the [features].guardian_approval
key or broad internal Guardian symbols. Those are split out for a
follow-up PR to keep this migration small and avoid touching large
TUI/internal surfaces.

**Verification**
cargo test -p codex-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent
cargo test -p codex-app-server-protocol
approvals_reviewer_serializes_auto_review_and_accepts_legacy_guardian_subagent
2026-04-22 15:45:35 -07:00
khoi
568cdacc7e [Codex] Register browser requirements feature keys (#18956)
## Summary
- register `in_app_browser` and `browser_use` as stable feature keys
- allow requirements/MDM feature requirements to pin those desktop
browser controls
- add coverage for browser requirements being accepted by config loading

## Testing
- `cargo fmt --all` (`just fmt` unavailable locally; rustfmt warned
about nightly-only `imports_granularity` config)
- `cargo test -p codex-features`
- `cargo test -p codex-core browser_feature_requirements_are_valid`
- Tested manually by setting in `requirements.toml` and seeing after app
restart state to reflect the setting was correct (at the time hiding the
`Browser Use` setting when the enterprise setting was set to false
2026-04-22 15:27:15 -07:00
Leo Shimonaka
16eeeb534a Fix MCP permission policy sync (#19033)
###### Why/Context/Summary

Repro: start a session outside Full Access, switch permissions to Full
Access, then submit a new turn that triggers MCP/CUA permission
handling.

The turn used the live Full Access `SessionConfiguration`, but the MCP
coordinator was still synced from the stale `original_config_do_not_use`
/ per-turn config copy. That left the coordinator with an old sandbox
policy, so empty MCP permission elicitations could be denied instead of
auto-accepted.

Fix: update/rebuild the MCP connection manager from the live
turn/session approval and sandbox policy fields.

###### Test plan

```sh
just fmt
cargo test -p codex-core --lib
cargo test -p codex-core --lib mcp_tool_call::tests
```
2026-04-22 14:30:29 -07:00
viyatb-oai
2d73bac45f feat: add guardian network approval trigger context (#18197)
## Summary

Give guardian network-access reviews the command context that triggered
a managed-network approval. The prompt JSON now includes the originating
tool call id, tool name, command argv, cwd, sandbox permissions,
additional permissions, justification, and tty state when a single
active tool call can be attributed.

The implementation keeps the trigger shape canonical by serializing
`GuardianNetworkAccessTrigger` directly and lets each runtime build that
trigger from its `ToolCtx`. Non-guardian approval prompts avoid cloning
the full trigger payload.

## UX changes

Guardian network-access reviews now include a `trigger` object that
explains what command caused the network approval. Instead of seeing
only the requested host, the guardian reviewer can also see the
originating tool call, argv, working directory, sandbox mode,
justification, and tty state.

Example payload the guardian reviewer can see:

```json
{
  "tool": "network_access",
  "target": "https://api.github.com:443",
  "host": "api.github.com",
  "protocol": "https",
  "port": 443,
  "trigger": {
    "callId": "call_abc123",
    "toolName": "shell",
    "command": ["gh", "api", "/repos/openai/codex/pulls/18197"],
    "cwd": "/workspace/codex",
    "sandboxPermissions": "require_escalated",
    "justification": "Fetch PR metadata from GitHub.",
    "tty": false
  }
}
```

The network review itself remains scoped to the network decision:
`target_item_id` stays `null`. `trigger.callId` is attribution context
only, so clients can still distinguish network reviews from
item-targeted command reviews.

## Verification

- Added coverage for serializing network trigger context in guardian
approval JSON.
- Added regression coverage that network guardian reviews do not reuse
`trigger.callId` as `target_item_id`.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-22 14:00:53 -07:00
Michael Bolin
18a26d7bbc app-server: accept permission profile overrides (#18279)
## Why

`PermissionProfile` is becoming the canonical permissions shape shared
by core and app-server. After app-server responses expose the active
profile, clients need to be able to send that same shape back when
starting, resuming, forking, or overriding a turn instead of translating
through the legacy `sandbox`/`sandboxPolicy` shorthands.

This still needs to preserve the existing requirements/platform
enforcement model. A profile-shaped request can be downgraded or
rejected by constraints, but the server should keep the user's
elevated-access intent for project trust decisions. Turn-level profile
overrides also need to retain existing read protections, including
deny-read entries and bounded glob-scan metadata, so a permission
override cannot accidentally drop configured protections such as
`**/*.env = deny`.

## What changed

- Adds optional `permissionProfile` request fields to `thread/start`,
`thread/resume`, `thread/fork`, and `turn/start`.
- Rejects ambiguous requests that specify both `permissionProfile` and
the legacy `sandbox`/`sandboxPolicy` fields, including running-thread
resume requests.
- Converts profile-shaped overrides into core runtime filesystem/network
permissions while continuing to derive the constrained legacy sandbox
projection used by existing execution paths.
- Preserves project-trust intent for profile overrides that are
equivalent to workspace-write or full-access sandbox requests.
- Preserves existing deny-read entries and `globScanMaxDepth` when
applying turn-level `permissionProfile` overrides.
- Updates app-server docs plus generated JSON/TypeScript schema fixtures
and regression coverage.

## Verification

- `cargo test -p codex-app-server-protocol schema_fixtures`
- `cargo test -p codex-core
session_configuration_apply_permission_profile_preserves_existing_deny_read_entries`







---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18279).
* #18288
* #18287
* #18286
* #18285
* #18284
* #18283
* #18282
* #18281
* #18280
* __->__ #18279
2026-04-22 13:34:33 -07:00
Dylan Hurd
ed4def8286 feat(auto-review) short-circuit (#18890)
## Summary
Short circuit the convo if auto-review hits too many denials

## Testing
- [x] Added unit tests

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-22 20:34:15 +00:00
xl-openai
b77791c228 feat: Fairly trim skill descriptions within context budget (#18925)
Preserve skill name/path entries whenever possible and trim descriptions
first, using round-robin character allocation so short descriptions do
not waste budget.
2026-04-22 12:33:29 -07:00
Won Park
11e5af53c4 Add plumbing to approve stored Auto-Review denials (#18955)
## Summary

This adds the structural plumbing needed for an app-server client to
approve a previously denied Guardian review and carry that approval
context into the next model turn.

This PR does not add the actual `/auto-review-denials` tool 

## What Changed

- Added app-server v2 RPC `thread/approveGuardianDeniedAction`.
- Added generated JSON schema and TypeScript fixtures for
`ThreadApproveGuardianDeniedAction*`.
- Added core `Op::ApproveGuardianDeniedAction`.
- Added a core handler that validates the event is a denied Guardian
assessment and injects a developer message containing the stored denial
event JSON.
- Queues the approval context for the next turn if there is no active
turn yet.
- Added the TUI app-server bridge so `Op::ApproveGuardianDeniedAction {
event }` is routed to the app-server request.

## What This Does Not Do

- Does not add `/auto-review-denials`.
- Does not add chat widget recent-denial state.
- Does not add popup/list UI.
- Does not add a product-facing denial lookup/store.
- Does not change where Guardian denials are originally emitted or
persisted.

## Verification

- `cargo test -p codex-tui thread_approve_guardian_denied_action`
2026-04-22 10:38:19 -07:00
Dylan Hurd
78593d72ea feat(auto-review) policy config (#18959)
## Summary
Allow users to customize their own auto-review policy config.

## Testing
- [x] added config_tests
2026-04-22 10:33:02 -07:00
cassirer-openai
f67383bcba [rollout_trace] Record core session rollout traces (#18877)
## Summary

Wires rollout trace recording into `codex-core` session and turn
execution. This records the core model request/response, compaction, and
session lifecycle boundaries needed for replay without yet tracing every
nested runtime/tool boundary.

## Stack

This is PR 2/5 in the rollout trace stack.

- [#18876](https://github.com/openai/codex/pull/18876): Add rollout
trace crate
- [#18877](https://github.com/openai/codex/pull/18877): Record core
session rollout traces
- [#18878](https://github.com/openai/codex/pull/18878): Trace tool and
code-mode boundaries
- [#18879](https://github.com/openai/codex/pull/18879): Trace sessions
and multi-agent edges
- [#18880](https://github.com/openai/codex/pull/18880): Add debug trace
reduction command

## Review Notes

This layer is the first live integration point. The important review
question is whether trace recording is isolated from normal session
behavior: trace failures should not become user-visible execution
failures, and recording should preserve the existing turn/session
lifecycle semantics.

The PR depends on the reducer/data model from the first stack entry and
only introduces the core recorder surface that later PRs use for richer
runtime and relationship events.
2026-04-22 17:00:48 +00:00
jif-oai
639382609f fix: wait_agent timeout for queued mailbox mail (#18968)
## Why

`wait_agent` can be called while mailbox mail is already pending. The
previous implementation subscribed for future mailbox sequence changes
and then waited for the next notification. If the mail was queued before
that wait started, no new notification arrived, so the tool could sit
until `timeout_ms` even though mail was ready to deliver.

## What Changed

- Added `Session::has_pending_mailbox_items()` for checking pending
mailbox mail through the session API.
- Updated `multi_agents_v2::wait` to return immediately when pending
mailbox mail already exists before sleeping on a new mailbox sequence
update.
- Reworked the regression coverage in `multi_agents_tests.rs` so already
queued mailbox mail must wake `wait_agent` promptly.

Relevant code:
- [`wait_agent` pending-mail
check](aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_v2/wait.rs (L55-L60))
-
[`Session::has_pending_mailbox_items`](aa8ca06e83/codex-rs/core/src/session/mod.rs (L2979-L2981))
-
[`multi_agent_v2_wait_agent_returns_for_already_queued_mail`](aa8ca06e83/codex-rs/core/src/tools/handlers/multi_agents_tests.rs (L2854))

## Verification

- `cargo test -p codex-core
multi_agent_v2_wait_agent_returns_for_already_queued_mail`
2026-04-22 11:16:17 +01:00
acrognale-oai
4f8c58f737 Support multiple cwd filters for thread list (#18502)
## Summary

- Teach app-server `thread/list` to accept either a single `cwd` or an
array of cwd filters, returning threads whose recorded session cwd
matches any requested path
- Add `useStateDbOnly` as an explicit opt-in fast path for callers that
want to answer `thread/list` from SQLite without scanning JSONL rollout
files
- Preserve backwards compatibility: by default, `thread/list` still
scans JSONL rollouts and repairs SQLite state
- Wire the new cwd array and SQLite-only options through app-server,
local/remote thread-store, rollout listing, generated TypeScript/schema
fixtures, proto output, and docs

## Test Plan

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-rollout`
- `cargo test -p codex-thread-store`
- `cargo test -p codex-app-server thread_list`
- `just fmt`
- `just fix -p codex-app-server-protocol -p codex-rollout -p
codex-thread-store -p codex-app-server`
- `cargo build -p codex-cli --bin codex`
2026-04-22 06:10:09 -04:00
rhan-oai
213b17b7a3 [codex-analytics] guardian review TTFT plumbing and emission (#17696)
## Why

Guardian analytics includes time-to-first-token, but the Guardian
reviewer runs as a normal Codex session and `TurnCompleteEvent` did not
expose TTFT. The timing needs to flow through the standard
turn-completion protocol so Guardian review analytics can consume the
same value as the rest of the session machinery.

## What changed

Adds optional `time_to_first_token_ms` to `TurnCompleteEvent` and
populates it from `TurnTiming`. The value is carried through app-server
thread history, rollout reconstruction, TUI/app-server adapters, and
Guardian review session handling.

Guardian review analytics now captures TTFT from the reviewer
turn-complete event when available. Existing tests and fixtures are
updated to set the new optional field to `None` where TTFT is not
relevant.

## Verification

- `cargo clippy -p codex-tui --tests -- -D warnings`
- `cargo clippy -p codex-core --lib --tests -- -D warnings`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17696).
* __->__ #17696
* #17695
* #17693
* #18278
* #18953
2026-04-22 01:52:48 -07:00
rhan-oai
37aadeaa13 [codex-analytics] guardian review truncation (#17695)
## Why

The Guardian review event needs to report whether the action shown to
Guardian was truncated. That field should come from the same truncation
path used to build the Guardian prompt, rather than being inferred after
the fact.

## What changed

Plumbs truncation metadata through Guardian action formatting, prompt
construction, review session execution, and analytics emission.
`guardian_truncate_text` now reports both the rendered text and whether
it inserted the truncation marker, and `reviewed_action_truncated` is
set from that prompt-building result.

This keeps the analytics field aligned with the model-visible reviewed
action while preserving the existing Guardian prompt behavior.

## Verification

- Guardian truncation tests cover both truncated and non-truncated
action payloads.
- Guardian review tests assert the review session metadata and
truncation field are propagated.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17695).
* #17696
* __->__ #17695
* #17693
* #18278
* #18953
2026-04-22 08:35:29 +00:00
rhan-oai
4e7399c6b9 [codex-analytics] guardian review analytics events emission (#17693)
## Why

Guardian approvals now run as review sessions, but Codex analytics did
not have a terminal event for those reviews. That made it hard to
measure approval outcomes, failure modes, Guardian session reuse, model
metadata, token usage, and timing separately from the parent turn.

## What changed

Adds `codex_guardian_review` analytics emission for Guardian approval
reviews. The event is emitted from the Guardian review path with review
identity, target item id, approval request source, a PII-minimized
reviewed-action shape, terminal decision/status, failure reason,
Guardian assessment fields, Guardian session metadata, token usage, and
timing metadata.

The reviewed-action payload intentionally omits high-risk fields such as
shell commands, working directories, argv, file paths, network
targets/hosts, rationale, retry reason, and permission justifications.
It also classifies prompt-build failures separately from Guardian
session/runtime failures so fail-closed cases are distinguishable in
analytics.

## Verification

- Guardian review analytics tests cover terminal success,
timeout/cancel/fail-closed paths, session metadata, and token usage
plumbing.
- `cargo clippy -p codex-core --lib --tests -- -D warnings`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17693).
* #17696
* #17695
* __->__ #17693
2026-04-22 01:02:47 -07:00