Commit Graph

3385 Commits

Author SHA1 Message Date
Abhinav Vedmala
f6f6721555 Update plan hook coverage 2026-05-20 17:18:10 -07:00
Abhinav Vedmala
26c2104698 Default hooks for function tools 2026-05-20 17:18:10 -07:00
Abhinav Vedmala
08f7c920dd Fix subagent hook fixture arg comments 2026-05-20 15:43:55 -07:00
Abhinav Vedmala
fd3a862118 Merge remote-tracking branch 'origin/main' into abhinav/subagent-hook-context-stack
# Conflicts:
#	codex-rs/core/src/hook_runtime.rs
#	codex-rs/core/tests/suite/subagent_notifications.rs
2026-05-20 15:24:00 -07:00
Eric Traut
0e9d222178 Make goals feature on by default and no longer experimental (#23732)
## Why

The `goals` feature is ready to be available without requiring users to
opt into experimental features. Keeping it behind the beta flag leaves
persisted thread goals and automatic goal continuation disabled by
default.

This PR also marks the goal-related app server APIs and events as no
longer experimental.

## What changed

- Mark `goals` as `Stage::Stable`.
- Enable `goals` by default in `codex-rs/features/src/lib.rs`.
2026-05-20 15:07:35 -07:00
Abhinav
eee3e60db3 Add SubagentStop hook (#22873)
# What

<img width="1792" height="1024" alt="image"
src="https://github.com/user-attachments/assets/8f81d232-5813-4994-a61d-e42a05a93a3e"
/>

`SubagentStop` runs when a thread-spawned subagent turn is about to
finish. Thread-spawned subagents use `SubagentStop` instead of the
normal root-agent `Stop` hook.

Configured handlers match on `agent_type`. Hook input includes the
normal stop fields plus:

- `agent_id`: the child thread id.
- `agent_type`: the resolved subagent type.
- `agent_transcript_path`: the child subagent transcript path.
- `transcript_path`: the parent thread transcript path.
- `last_assistant_message`: the final assistant message from the child
turn, when available.
- `stop_hook_active`: `true` when the child is already continuing
because an earlier stop-like hook blocked completion.

`SubagentStop` shares the same completion-control semantics as `Stop`,
scoped to the child turn:

- No decision allows the child turn to finish.
- `decision: "block"` with a non-empty `reason` records that reason as
hook feedback and continues the child with that prompt.
- `continue: false` stops the child turn. If `stopReason` is present,
Codex surfaces it as the stop reason.

# Lifecycle Scope

Only thread-spawned subagents run `SubagentStop`.

Internal/system subagents such as Review, Compact, MemoryConsolidation,
and Other do not run normal `Stop` hooks and do not run `SubagentStop`.
This avoids exposing synthetic matcher labels for internal
implementation paths.

# Stack

1. #22782: add `SubagentStart`.
2. This PR: add `SubagentStop`.
3. #22882: add subagent identity to normal hook inputs.
2026-05-20 14:59:41 -07:00
viyatb-oai
40ad7be2b5 core: refresh active permission profiles at runtime (#22931)
## Why

Once a named permission profile is selected, runtime state has to keep
that profile identity intact instead of collapsing back to anonymous
effective permissions. The session refresh path also needs to rebuild
profile-derived network proxy state so active profile switches take
effect consistently.

## What changed

- Preserve the active permission profile through session updates.
- Rebuild profile-derived runtime/network configuration when the active
profile changes.
- Keep the runtime path aligned with the current session configuration
APIs.
- Tighten the affected tests, including the Windows delete-pending
memory-file case that was intermittently tripping CI.

## Stack

1. **This PR**: runtime/session/network propagation for active
permission profiles.
2. [#23708](https://github.com/openai/codex/pull/23708): TUI selection
plumbing and guardrail flow.
3. [#21559](https://github.com/openai/codex/pull/21559): profile-aware
`/permissions` menu and custom profile display.

<img width="1296" height="906" alt="image"
src="https://github.com/user-attachments/assets/077fa3a7-80cb-4925-80b1-d2395018d90a"
/>
2026-05-20 21:55:21 +00:00
Michael Bolin
896ee672cc windows-sandbox: feed setup from resolved permissions (#23167)
## Why

This is the next step in the Windows sandbox migration away from the
legacy `SandboxPolicy` abstraction. #22923 moved write-root and token
decisions onto `ResolvedWindowsSandboxPermissions`, but setup and
identity still accepted `SandboxPolicy` and converted internally. This
PR pushes that conversion outward so the setup path consumes the
resolved Windows permission view directly.

## What Changed

- Changed `SandboxSetupRequest` to carry
`ResolvedWindowsSandboxPermissions` instead of `SandboxPolicy` plus
policy cwd.
- Updated setup refresh/elevation and identity credential preparation to
use resolved permissions for read roots, write roots, network identity,
and deny-write payload planning.
- Removed the production `allow.rs` legacy wrapper; allow-path
computation now takes resolved permissions directly.
- Added a permissions-based world-writable audit entry point while
keeping the existing legacy wrapper for compatibility.
- Updated legacy ACL setup and the core Windows setup bridge to
construct resolved permissions at the boundary.
- Hardened the Windows sandbox integration test helper staging so Bazel
retries can reuse an already-staged helper if a prior sandbox helper
process still has the executable open.

## Verification

- `cargo test -p codex-windows-sandbox`
- `cargo test -p codex-core --test all --no-run`
- `just fix -p codex-windows-sandbox`
- `just fix -p codex-core`
- Attempted `cargo check -p codex-windows-sandbox --target
x86_64-pc-windows-gnullvm`, but the local machine is missing
`x86_64-w64-mingw32-clang`; Windows CI should cover that target.











---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23167).
* #23715
* #23714
* __->__ #23167
2026-05-20 14:52:38 -07:00
Michael Bolin
e1ec0eee5f windows-sandbox: drive write roots from resolved permissions (#22923)
## Why

This is the third PR in the Windows sandbox `SandboxPolicy` ->
`PermissionProfile` migration stack.

#22896 introduced `ResolvedWindowsSandboxPermissions`, and #22918 moved
elevated runner IPC to carry `PermissionProfile`. This PR starts moving
the remaining setup/spawn helpers away from asking legacy enum questions
like “is this `WorkspaceWrite`?” and toward resolved runtime permission
questions like “does this profile require write capability roots?”

## What changed

- Added resolved-permissions helpers for network identity and
write-capability detection.
- Moved setup write-root gathering to operate on
`ResolvedWindowsSandboxPermissions`, with the legacy `SandboxPolicy`
wrapper left in place for existing call sites.
- Updated identity setup, elevated capture setup, and world-writable
audit denies to use resolved write roots.
- Updated spawn preparation to carry resolved permissions in
`SpawnContext` and use them for network blocking, setup write roots,
elevated capability SID selection, and legacy capability roots.
- Removed a now-unused legacy write-root helper.

## Verification

- `cargo test -p codex-windows-sandbox`
- `just fix -p codex-windows-sandbox`
- Existing stack checks are green on #22896 and #22918; CI has started
for this PR.
















---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22923).
* #23715
* #23714
* #23167
* __->__ #22923
2026-05-20 14:30:42 -07:00
Abhinav
af49d38373 Support compact SessionStart hooks (#21272)
# Why

Compaction replaces the live conversation history, so hooks that use
`SessionStart` to re-inject durable model context need a way to run
again after that rewrite.

Related - #19905 adds dedicated compact lifecycle hooks

# What

- add `compact` as a supported `SessionStart` source and matcher value
- change pending `SessionStart` state from a single slot to a small FIFO
queue so `resume` / `startup` / `clear` can be preserved alongside a
later `compact`
- drain all queued `SessionStart` sources before the next model request,
preserving their original order

# Testing

The new integration coverage verifies both the basic `compact` matcher
path and the stacked `resume` -> `compact` case where both hooks
contribute `additionalContext` to the next model turn.
2026-05-20 20:46:19 +00:00
viyatb-oai
fe7c069fe6 feat(permissions): resolve permission profile inheritance (#22270)
## Stack

This is the foundation PR for the permission-profile inheritance stack.

- This PR adds config-level `extends` resolution and merge semantics.
- Follow-up: #23705 applies resolved profiles at runtime and updates the
active-profile protocol surfaces.

## Why

Permission profiles are starting to carry enough policy that
copy-pasting near-identical definitions becomes hard to review and easy
to drift. Before the runtime can consume inherited profiles, the config
layer needs one explicit resolver that can merge parent chains and
reject unsafe or invalid inheritance shapes.

## What changed

- Add `extends` to permission-profile TOML and resolve parent chains in
inheritance order.
- Merge inherited profile TOML with the existing config merge behavior
while preserving the permission-specific normalization needed for
network domain keys.
- Keep parent descriptions out of resolved child profiles and record
inherited profile names separately for downstream consumers.
- Reject undefined parents, unsupported built-in parents, and
inheritance cycles with targeted errors.
- Cover resolver behavior with TOML fixture tests and refresh the
generated config schema.

## Validation

- `cargo test -p codex-config`
- `cargo test -p codex-core permissions_profiles_`
2026-05-20 20:12:07 +00:00
evawong-oai
3d94e24a3d Add MITM hook config model (#18868)
## Stack
1. This PR adds MITM hook config and model only.
2. Runtime follow up: #20659 wires hook enforcement into the proxy
request path.
3. User facing config follow up: #18240 moves MITM policy into the
PermissionProfile network tree.

## Why
1. Viyat asked for the original parent PR to be split so reviewers can
inspect the policy model before request behavior changes.
2. This PR gives the proxy a typed MITM hook model, validation, matcher
compilation, permissions TOML plumbing, schema support, and config
tests.
3. This PR deliberately does not change CONNECT or MITM request
handling.
4. Keeping runtime behavior out of this PR makes the review boundary
simple: does the policy model parse, validate, compile, and lower
correctly.

## Summary
1. Add the MITM hook config model and matcher compilation.
2. Validate hosts, methods, paths, query matchers, header matchers,
secret sources, and reserved body matching.
3. Add wildcard matcher support for path, query value, and header value
matching.
4. Add permissions TOML and schema support for flat runtime hook config.
5. Add config loader tests for MITM hook overlay behavior.

## Validation
1. Regenerated the config schema.
2. Ran the network proxy MITM hook unit tests.
3. Ran the core permission profile MITM hook parsing tests.
4. Ran the core config schema fixture test.
5. Ran the scoped Clippy fixer for the network proxy crate.
6. Ran the scoped Clippy fixer for the core crate.

## Notes
1. Runtime enforcement moved to #20659.
2. User facing PermissionProfile TOML shape remains in #18240.
2026-05-20 12:51:12 -07:00
jif-oai
c5bd131567 feat: add turn_id and truncation_policy to extension tool calls (#23666)
## Why

Extension-owned tools currently receive a stripped `ToolCall` with only
`call_id`, `tool_name`, and `payload`.
That makes extension work that needs turn-local execution context
awkward, especially web-search extension work that needs the active
`truncation_policy` at tool invocation time.

Reconstructing that value from config or `ExtensionData` would be
indirect and could drift from the actual turn context, so the cleaner
fix is to pass the needed turn metadata directly on the extension-facing
invocation type.

## What changed

- added `turn_id` and `truncation_policy` to `codex_tools::ToolCall`
- populated those fields when core adapts `ToolInvocation` into an
extension tool call
- added a focused adapter test that verifies extension executors receive
the forwarded turn metadata
- updated the memories extension tests to construct the richer
`ToolCall`
- added the `codex-utils-output-truncation` dependency to `codex-tools`
and refreshed lockfiles

## Testing

- `cargo test -p codex-tools`
- `cargo test -p codex-memories-extension`
- `cargo test -p codex-core passes_turn_fields_to_extension_call`
- `just bazel-lock-update`
- `just bazel-lock-check`
2026-05-20 20:14:41 +02:00
Abhinav Vedmala
15fed46b20 Merge remote-tracking branch 'origin/abhinav/subagent-stop-stack' into abhinav/subagent-hook-context-stack
# Conflicts:
#	codex-rs/core/src/hook_runtime.rs
#	codex-rs/core/tests/suite/subagent_notifications.rs
2026-05-20 10:14:44 -07:00
jif-oai
d4f842f3b3 feat: account active goal progress in the goal extension (#23696)
## Why

The goal extension can create and surface goals, but the live
turn-accounting path still stopped short of persisting active-goal
progress. That leaves token and wall-clock usage, plus
`ThreadGoalUpdated` events, out of sync with the extension boundary once
work actually advances or a goal transitions out of active state.

## What changed

- Teach `GoalAccountingState` to track the current turn, active goal,
token deltas, and wall-clock progress snapshots against the persisted
goal id.
- Flush active-goal accounting from tool-finish, turn-stop, and
turn-abort lifecycle hooks, and emit `ThreadGoalUpdated` events when
persisted progress changes.
- Route `create_goal` and `update_goal` through the same accounting
state so new goals start from the right baseline, final progress is
flushed before status changes, and `update_goal` can mark a goal
`blocked` as well as `complete`.
- Keep budget-limited goals accruing through the end of the turn while
clearing local active-goal state once a turn or explicit update is
finished.
- Expand backend and lifecycle coverage around store ids, baseline
reset, tool-finish accounting, budget-limited carry-through, and
blocked-goal updates.

## Testing

- Added focused backend coverage in
`codex-rs/ext/goal/tests/goal_extension_backend.rs` for baseline reset,
tool-finish accounting, budget-limited turns, and blocked-goal updates.
- Extended `codex-rs/core/src/session/tests.rs` to assert that lifecycle
inputs expose the expected session, thread, and turn store ids.
2026-05-20 18:36:37 +02:00
pakrym-oai
a52c91d8b5 [codex] Hide deferred tools from code mode prompt (#23605)
## Why

`code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`
was failing because code-mode prompt generation used the same nested
tool spec list for both the model-visible `exec` guide and the runtime
`ALL_TOOLS` surface. That allowed deferred MCP/app tools, such as
`calendar_timezone_option_99`, to leak into the `exec` description even
though they should only be discoverable through `ALL_TOOLS` at runtime.

## What changed

Split code-mode nested tool planning into two sets in
`core/src/tools/spec_plan.rs`:

- runtime nested tool specs still include deferred tools, so
`tools[...]` and `ALL_TOOLS` can call them
- `exec` prompt docs only render non-deferred tools, so deferred app
tools stay out of the model-visible guide

## Validation

- `cargo test -p codex-core --test all
code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools --
--nocapture`
- looped the same focused test 5 additional times with `cargo test -q -p
codex-core --test all
code_mode_only_guides_all_tools_search_and_calls_deferred_app_tools`
2026-05-20 08:09:45 -07:00
jif-oai
59507b8491 feat: expose turn-start metadata to extensions (#23688)
## Why

The goal extension needs more context when a turn starts than
`turn_store` alone provides.

In particular, goal accounting needs the stable turn id, the effective
collaboration mode, and the cumulative token-usage baseline captured at
turn start so it can:

- suppress goal accounting for plan-mode turns
- compute exact per-turn deltas from cumulative `total_token_usage`
snapshots instead of relying on the most recent usage event alone
- keep the extension-owned accounting path aligned with the host turn
lifecycle

## What

- extend `codex_extension_api::TurnStartInput` to expose `turn_id`,
`collaboration_mode`, and `token_usage_at_turn_start`
- pass the full `TurnContext` plus the captured token-usage baseline
through the turn-start lifecycle emission path
- initialize goal turn accounting from the turn-start baseline and
collaboration mode
- switch goal token accounting to compute deltas from cumulative
`total_token_usage` snapshots
- add coverage for the new turn-start lifecycle fields and for
goal-accounting baseline behavior

## Testing

- added `turn_start_lifecycle_exposes_turn_metadata_and_token_baseline`
in `codex-rs/core/src/session/tests.rs`
- added `ext/goal/tests/accounting.rs` coverage for baseline-aware goal
accounting and plan-mode suppression
2026-05-20 15:54:29 +02:00
jif-oai
1392a2a770 feat: async turn item process (#23692)
Mechanical change
2026-05-20 15:30:01 +02:00
jif-oai
9483b09ea4 feat: rename 2 (#23668)
Just a mechanical renaming
2026-05-20 12:11:44 +02:00
jif-oai
66d5edf825 feat: rename 3 (#23669)
Just a mechanical renaming
2026-05-20 12:07:06 +02:00
jif-oai
93456320ef feat: rename 1 (#23667)
Just a mechanical renaming
2026-05-20 12:05:58 +02:00
jif-oai
18cefba922 Add timeout for remote compaction requests (#23451)
## Why

Remote compaction currently sends a unary `POST /responses/compact` and
waits for the full response before replacing history or emitting the
completed `ContextCompaction` item. Unlike normal `/responses` streaming
requests, this unary compact request had no timeout boundary. If the
backend accepts the request and then stalls before returning a body, the
existing request retry policy never sees a transport error, so the
compact turn can remain stuck after the started item with no completion
or actionable error.

That matches the reported hang shape in issues such as #18363, where
logs show `responses/compact` was posted but no corresponding compact
completion followed. A bounded request timeout gives the existing retry
policy a concrete timeout error to retry instead of letting the user sit
indefinitely on automatic context compaction.

## What

- Add a request timeout to legacy `/responses/compact` calls.
- Size that timeout from the provider stream idle timeout with a
conservative multiplier, so the default compact attempt gets 20 minutes
rather than the 5 minute stream idle window.
- Map API transport timeouts to a request timeout error instead of the
child-process timeout message.

## Testing

- Not run (per request; CI will cover).
2026-05-20 11:56:00 +02:00
richardopenai
000bf5ce6d Migrate exec-server remote registration to environments (#23633)
## Summary
- migrate exec-server remote registration naming from executor to
environment
- align CLI, public Rust exports, registry error messages, and relay
test fixtures with the environment registry contract
- keep the live registration path and response model consistent with
`/cloud/environment/{environment_id}/register`

## Verification
- `cargo test -p codex-exec-server
remote::tests::register_environment_posts_with_auth_provider_headers
--manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml`
- `cargo test -p codex-exec-server --test relay
multiplexed_remote_environment_routes_independent_virtual_streams
--manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml`
- `cargo check -p codex-cli --manifest-path
/Users/richardlee/code/codex/codex-rs/Cargo.toml` (still running when PR
opened; will update after completion if needed)
2026-05-20 00:25:04 -07:00
sayan-oai
34aad43684 add encryptedcontent to functioncalloutput (#23500)
add new `EncryptedContent` variant to `FunctionCallOutputContentItem`
ahead of standalone websearch.

we need to be able to receive and pass encrypted function call output
from the new web search endpoint back to responsesapi, as we cannot
expose direct search results.
2026-05-19 23:47:48 -07:00
Eric Traut
9dda71dbae Warn on invalid UTF-8 in AGENTS.md files (#23232)
Fixes #23223.

## Why

Malformed AGENTS instructions should not fail silently. The reported
issue had invalid UTF-8 in a global `AGENTS.md`; before this change,
Codex treated that decode failure like a missing file, so the personal
instructions disappeared without a user-visible explanation and the
rollout had no `# AGENTS.md instructions` block.

Project-level AGENTS files already used lossy decoding, so their
instructions still appeared, but invalid bytes were replaced without
telling the user. Global and project AGENTS files should behave
consistently: keep usable instruction text when possible, and surface a
diagnostic when bytes had to be replaced.

## What changed

Global `AGENTS.override.md` and `AGENTS.md` loading now reads bytes and
decodes with replacement characters on invalid UTF-8, matching
project-level AGENTS behavior. Both global and project AGENTS loading
now emit a startup warning when invalid UTF-8 is found, and both keep
the instruction text with invalid byte sequences replaced.

Missing files, non-file candidates, empty files, and the existing
`AGENTS.override.md` before `AGENTS.md` precedence keep their current
behavior.

## How users see it

The warnings flow through the existing startup warning surface.
App-server clients receive config-time startup warnings as
`configWarning` notifications during initialization, and thread startup
emits startup warnings as thread-scoped `warning` notifications.

Global AGENTS invalid UTF-8 warnings can appear on both surfaces.
Project-level AGENTS invalid UTF-8 warnings are discovered while
building thread instructions, so they appear as thread-scoped `warning`
notifications. Clients that render warning notifications in the
conversation surface show the message as a visible diagnostic instead of
silently hiding or altering instructions.
2026-05-19 21:56:46 -07:00
Ahmed Ibrahim
5a4202ad90 [codex] Preserve raw code-mode exec output by default (#23564)
## Why
Code mode can use nested unified exec calls as data sources. When those
calls omit `max_output_tokens`, code mode should receive raw command
output so the script can parse or summarize it itself. When code mode
does provide `max_output_tokens`, that explicit nested budget should be
respected, including values above the default unified exec limit, rather
than being capped before code mode sees the result.

## What
- Preserve direct unified exec truncation behavior, while letting
code-mode exec/write_stdin keep `max_output_tokens` as `None` unless
explicitly supplied.
- Make code-mode tool results use raw output when no explicit limit is
present, and use the explicit nested limit directly when one is
specified.
- Refactor unified exec output formatting so `truncated_output` takes
the caller-selected token budget.
- Add e2e integration coverage for explicit nested exec limits, omitted
nested exec limits, outer exec limit propagation, omitted-limit outputs
that exceed both the default and a small truncation policy, explicit
nested limits above those caps, and high explicit limits that still
compact larger command output.
- Reuse the code-mode turn setup helper while directly asserting the
exact exec output item in each test.

## Testing
- `just fmt`
- `git diff --check`
- Not run locally per repo guidance; CI should validate the e2e
integration tests.
2026-05-20 04:02:14 +00:00
Eric Traut
e43a2e297f Fix stale background terminal poll events (#23231)
## Why

Issue #23214 reports `/ps` showing no background terminals while the
status line still says it is waiting for a background terminal. The race
is in core: `write_stdin` can poll a process that exits before the
response returns. The process manager correctly returns `process_id:
None`, but the handler still emitted a `TerminalInteraction` event using
the requested session id, causing clients to believe a dead process was
still being polled.

Fixes #23214.

## What changed

- Suppress `TerminalInteraction` events for empty `write_stdin` polls
once `response.process_id` is `None`.
- Continue emitting interactions for non-empty stdin, even if that input
causes the process to exit before the response returns.
- Extend the unified exec integration test to assert completed empty
polls do not emit terminal interactions.

## Verification

- `cargo test -p codex-core --test all
unified_exec_emits_one_begin_and_one_end_event`
- `cargo test -p codex-core --test all
unified_exec_emits_terminal_interaction_for_write_stdin`

`cargo test -p codex-core` currently aborts in unrelated
`agent::control::tests::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale`
with a reproducible stack overflow.
2026-05-19 20:48:37 -07:00
Ahmed Ibrahim
532b9c83ae Move plugin and skill warmup into session startup (#23535)
## Why

Plugin and skill loading is useful as warmup and early validation, but
session startup does not need to wait for that work before it can
continue building the session. Keeping it on the serial startup path
adds avoidable latency to every fresh thread start.

We still want invalid skill configurations to show up quickly, and we
want the warmup to exercise the same plugin and skill manager caches
that the normal turn path uses.

## What changed

- moved plugin and skill warmup into the session startup async path
instead of eagerly awaiting it on the serial setup path
- kept the warmup using the session's resolved filesystem/environment
context so skill loading still sees the right roots
- preserved early skill-load error logging so broken skill
configurations still surface during startup
- left the per-turn plugin and skill loading path unchanged, so turns
still use the normal cached managers

## Testing

- Not run locally; relying on CI for validation.
2026-05-19 20:05:52 -07:00
viyatb-oai
c3faea0b09 feat: add permission profile list api (#23412)
## Why

Clients need a typed permission-profile catalog instead of
reconstructing that state from config internals.

## What changed

- Added `permissionProfile/list` to the app-server v2 protocol with
cursor pagination and optional `cwd`.
- The list response includes built-in permission profiles plus
config-defined `[permissions.<id>]` profiles from the effective config
for the request context.
- Permission profiles keep optional `description` metadata for display
purposes.
- App-server docs and schema fixtures are updated for the new RPC.
2026-05-20 02:42:56 +00:00
Michael Bolin
c58c84d6ee test: fix multi-agent service tier assertion (#23576)
## Why

`openai/codex#22169` added a regression test that expects an invalid
child `service_tier` to be rejected, but the test used
`Result::expect_err` on `SpawnAgentHandler::handle`. That requires the
`Ok` type to implement `Debug`, and this handler returns `Box<dyn
ToolOutput>`, so Bazel failed while compiling `codex-core` tests before
it could run them.

## What changed

- Capture the handler result and assert on `result.err()` instead of
calling `expect_err`.
- Keep the same `FunctionCallError::RespondToModel` assertion for the
rejected service tier.

## Verification

- `cargo test -p codex-core
spawn_agent_role_service_tier_does_not_hide_invalid_spawn_request`
2026-05-19 16:47:20 -07:00
Matthew Zeng
b019a678d8 Remove unused ARC monitor path (#23573)
## Summary
- remove the unreachable ARC monitor path from MCP tool approval
handling
- delete the unused ARC monitor module/tests and trim the orphaned
safety-monitor decision plumbing
- keep `always allow` approvals on the existing auto-approval
short-circuit without a dead monitor hop

## Testing
- `cargo test -p codex-core mcp_tool_call`
- `just fmt`
- `just fix -p codex-core`
- `git diff --check`

## Additional validation
- Attempted `cargo test -p codex-core`; the library test target passed,
then the integration target failed in this local environment.
- The narrower MCP-focused rerun passed its unit coverage and only hit
missing local `test_stdio_server` binaries in filtered integration
cases.
2026-05-19 16:23:25 -07:00
adams-oai
d86352d520 Add CUA requirements subsection for locked computer use (#23555)
Adds a new top-level section for "CUA" requirements that can allow for
disablement of specific features as needed for enterprises.
2026-05-19 15:41:44 -07:00
Ahmed Ibrahim
c53da029bc [codex] Honor role-defined spawn service tiers (#22169)
## Why
Custom agent roles are ordinary config layers, so a role file can
already express `service_tier` just like other config values. The
spawned-agent tier path needs to preserve that effective role config and
follow the same precedence pattern as model/reasoning.

## What changed
- Apply an explicit spawn-time `service_tier` onto the child config
before role application, so a role config layer can override it just
like role-defined model/reasoning settings do.
- Validate the final effective child tier after the final child model is
known, while still falling back to the parent tier when no child tier
survives.
- Add focused integration coverage for both v1 and v2 proving role TOML
loads a service tier, spawned children keep that role-configured tier,
and a role tier wins over a conflicting spawn-time tier.

## Validation
- `just fmt`
- `git diff --check`
- Local Rust tests not run, per repo guidance; CI should exercise the
new coverage.
2026-05-19 22:40:41 +00:00
Matthew Zeng
8335b56c33 Split plugin install discovery into list and request tools (#23372)
## Summary
- Add `list_available_plugins_to_install` as the inventory step for
plugin and connector install suggestions.
- Slim `request_plugin_install` so it only handles the actual
elicitation, instead of carrying the full discoverable list in its
prompt.
- Emit send-time telemetry when an install elicitation is dispatched,
including requested tool identity in the event payload.
- Emit install-result telemetry through `SessionTelemetry`, including
tool type, user response action, and completion status.
- Update registration and tests to cover the new two-step flow while
keeping the existing `tool_suggest` feature gate unchanged.

## Testing
- `just fmt`
- `cargo test -p codex-tools`
- `cargo test -p codex-core request_plugin_install`
- `cargo test -p codex-core list_available_plugins_to_install`
- `cargo test -p codex-core
install_suggestion_tools_can_be_registered_without_search_tool`
- `cargo test -p codex-otel
manager_records_plugin_install_suggestion_metric`
- `cargo test -p codex-otel
manager_records_plugin_install_elicitation_sent_metric`
- `just fix -p codex-core`
- `just fix -p codex-tools`
- `just fix -p codex-otel`
- `cargo check -p codex-core`
2026-05-19 14:45:37 -07:00
Abhinav Vedmala
a2c6e1b6b0 Merge main into abhinav/subagent-stop-stack 2026-05-19 13:40:03 -07:00
starr-openai
5c43a64e2b Make local environment optional in EnvironmentManager (#23369)
## Summary
- make `EnvironmentManager` local environment/runtime paths optional
- simplify constructor surface around snapshot materialization
- rename local env accessors to `require_local_environment` /
`try_local_environment`

## Validation
- devbox Bazel build for touched crate surfaces
- `//codex-rs/exec-server:exec-server-unit-tests`
- `//codex-rs/app-server-client:app-server-client-unit-tests`
- filtered touched `//codex-rs/core:core-unit-tests` cases
2026-05-19 12:55:34 -07:00
Abhinav
d661ab70ed Add SubagentStart hook (#22782)
# What

`SubagentStart` runs once when Codex creates a thread-spawned subagent,
before that child sends its first model request. Thread-spawned
subagents use `SubagentStart` instead of the normal root-agent
`SessionStart` hook.

Configured handlers match on the subagent `agent_type`, using the same
value passed to `spawn_agent`. When no agent type is specified, Codex
uses the default agent type.

Hook input includes the normal session-start fields plus:

- `agent_id`: the child thread id.
- `agent_type`: the resolved subagent type.

`SubagentStart` may return `hookSpecificOutput.additionalContext`. That
context is added to the child conversation before the first model
request.

# Lifecycle Scope

Only thread-spawned subagents run `SubagentStart`.

Internal/system subagents such as Review, Compact, MemoryConsolidation,
and Other do not run normal `SessionStart` hooks and do not run
`SubagentStart`. This avoids exposing synthetic matcher labels for
internal implementation paths.

Also the `SessionStart` hook no longer fires for subagents, this matches
behavior with other coding agents' implementation

# Stack

1. This PR: add `SubagentStart`.
2. #22873: add `SubagentStop`.
3. #22882: add subagent identity to normal hook inputs.
2026-05-19 12:45:08 -07:00
viyatb-oai
3c76081876 Make deny canonical for filesystem permission entries (#23493)
## Why
Filesystem permission profiles used `none` for deny-read entries, which
is less direct than the action the entry actually represents. This
change makes `deny` the canonical filesystem permission spelling while
preserving compatibility for older configs that still send `none`.

## What changed
- rename `FileSystemAccessMode::None` to `Deny`
- serialize and generate schemas with `deny` as the canonical value
- retain `none` only as a legacy input alias for temporary config
compatibility
- update filesystem glob diagnostics and regression coverage to use the
canonical spelling
- refresh config and app-server schema fixtures to match the new wire
shape

## Validation
- `cargo test -p codex-protocol`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core config_toml_deserializes_permission_profiles
--lib`
- `cargo test -p codex-core
read_write_glob_patterns_still_reject_non_subpath_globs --lib`

Earlier in the session, a broad `cargo test -p codex-core` run reached
unrelated pre-existing failures in timing/snapshot/git-info tests under
this environment; the targeted surfaces touched by this PR passed
cleanly.
2026-05-19 11:03:47 -07:00
jif-oai
05b8ce4354 chore: namespace v1 sub-agent tools (#23475)
## Why

The v1 sub-agent tools are a single tool family, but they were exposed
as separate flat function tools. This makes the model-visible surface
less clearly grouped and leaves the legacy names in the same flat
namespace as newer agent tooling.

## What

- Wraps the v1 `spawn_agent`, `send_input`, `resume_agent`,
`wait_agent`, and `close_agent` specs in the `multi_agent_v1` namespace.
- Registers the corresponding handlers with namespaced runtime tool
names.
- Updates tool-planning, deferred tool search, and sub-agent
notification tests to assert the namespace shape and child `spawn_agent`
lookup.

## Verification

- Updated `codex-core` coverage for the v1 multi-agent tool plan,
deferred tool search output, and sub-agent tool descriptions.
2026-05-19 19:46:17 +02:00
pakrym-oai
ccbf0137db [codex] Make contextual user fragments dyn-renderable (#23397)
## Why
`ContextualUserFragment` needs to be usable behind `dyn` for render-only
paths, but associated constants made the trait non-object-safe.

## What changed
- Replaced associated constants with trait methods so `dyn
ContextualUserFragment` can render fragments.
- Preserved the existing typed `T::matches_text(text)` registration
pattern via `type_markers()`.
- Kept default `render()` on the main trait so implementations only
provide role, markers, and body.
- Added unit coverage for rendering a `Box<dyn ContextualUserFragment>`.

## Verification
- `cargo test -p codex-core contextual_user_fragment_is_dyn_compatible`
- `just fix -p codex-core`
2026-05-19 10:42:54 -07:00
pakrym-oai
f0663fd4fd [codex] Preserve steer input as user input (#23405)
## Why

Steered input was queued as a `ResponseInputItem`, then parsed back into
a user message before recording. That path loses information that only
exists on `UserInput`, such as UI text elements.

This change keeps turn-local pending input typed as either original
`UserInput` or existing response items, so steered user input reaches
user-message recording without being reconstructed from a response item.

## What changed

- Add `TurnInput` for active-turn pending input.
- Queue `Session::steer_input` as `TurnInput::UserInput`.
- Run pending-input hook inspection only for `TurnInput::UserInput`.
- Process drained pending input item by item: accepted items are
recorded, blocked items append hook context and are skipped.
- Remove the pending-input prepend/requeue path.

## Validation

- `just fmt`
- `just fix -p codex-core`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib
session::tests::task_finish_emits_turn_item_lifecycle_for_leftover_pending_user_input
-- --nocapture`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib steer_input`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib pending_input`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --test all
pending_input`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core` (unit tests passed:
1835 passed, 0 failed, 4 ignored; integration `all` target failed due
missing helper binaries such as `codex`/`test_stdio_server` plus
unrelated MCP/search/code-mode expectations)
2026-05-19 09:47:43 -07:00
pakrym-oai
9289b7cea8 [codex] Move hook request plumbing into hook runtime (#23388)
## Why

`run_turn` was still hand-building hook payloads and lifecycle events
for a couple of hook paths. Most hook call sites already delegate
request construction and event emission to `hook_runtime`, which keeps
turn orchestration focused on model-flow decisions rather than hook
plumbing.

This also keeps the legacy `after_agent` message extraction next to the
legacy hook dispatch instead of leaving response-item walking in
`run_turn`.

## What changed

- Added `run_stop_hooks` in `hook_runtime` to build `StopRequest`, emit
preview start events, run the hook, and emit completion events.
- Added `run_legacy_after_agent_hook` in `hook_runtime` to build and
dispatch the legacy `AfterAgent` hook payload, including extracting
input messages from response items.
- Updated `run_turn` to call the hook runtime helpers and keep only the
resulting continuation/block/stop decisions inline.
- Removed the repeated pending session-start hook check from the run
loop.

## Validation

- `cargo test -p codex-core hook_runtime`
2026-05-19 08:41:26 -07:00
pakrym-oai
ef24ef127f [codex] Allow empty turn/start requests (#23409)
## Why

`turn/start` already accepts an input array on the wire, including an
empty array, but core treated empty input as a no-op before the turn
could reach the model. App-server clients need to be able to start a
real turn even when there is no new user message, for example to let the
model proceed from existing thread context.

## What changed

- Removed the `run_turn` early return that skipped empty-input turns
when there was no pending input.
- Kept empty active-turn steering rejected by moving the `steer_input`
empty-input check until after core has determined whether there is an
active regular turn.
- Empty regular turns now refresh `previous_turn_settings` like other
regular turns, so follow-up context injection state advances
consistently.
- Added an app-server v2 integration test proving `turn/start` with
`input: []` emits started/completed notifications, sends one Responses
request, and does not synthesize an empty user message.

## Validation

- `cargo test -p codex-app-server --test all
turn_start_with_empty_input_runs_model_request`
2026-05-19 08:39:45 -07:00
jif-oai
b3ae3de405 Defer v1 multi-agent tools behind tool search (#23144)
Summary: defer v1 multi-agent tools when tool_search and namespace tools
are available; keep concise searchable descriptions and move the v1
usage guidance into developer instructions; add targeted coverage.
Testing: not run per request; ran just fmt.
2026-05-19 15:04:35 +02:00
jif-oai
80fdd4688f Add body_after_prefix auto-compact token limit scope (#22870)
## Why

`model_auto_compact_token_limit` has only been able to budget the full
active context. That makes it hard to set a small "growth since
compaction" budget for sessions that preserve a large carried window
prefix: the preserved prefix can consume the whole budget and force
immediate repeated compaction.

This PR adds an opt-in `body_after_prefix` scope so callers can apply
`model_auto_compact_token_limit` to sampled output and later growth
after the current carried prefix, while still forcing compaction before
the full model context window is exhausted.

## What changed

- Adds `AutoCompactTokenLimitScope` with the existing `total` behavior
as the default and a new `body_after_prefix` mode:
[`config_types.rs`](973806b1cb/codex-rs/protocol/src/config_types.rs (L24-L37)).
- Threads `model_auto_compact_token_limit_scope` through config loading,
`Config`, `core-api`, and app-server v2 schema/TypeScript generation.
- Records the first observed input-token count for a `body_after_prefix`
compaction window and uses it as the baseline when deciding whether the
scoped auto-compaction budget is exhausted:
[`turn.rs`](973806b1cb/codex-rs/core/src/session/turn.rs (L743-L781)).
- Keeps a hard context-window cap in `body_after_prefix`, so scoped
budgeting cannot let the active context overrun the usable window.

## Verification

Added compact-suite coverage for the two key behaviors:
`body_after_prefix` does not re-compact just because the carried prefix
is larger than the scoped budget, and it still compacts when the total
active context reaches the configured context window:
[`compact.rs`](973806b1cb/codex-rs/core/tests/suite/compact.rs (L3003-L3128)).
2026-05-19 10:19:46 +00:00
jif-oai
05e171094d Remove ToolsConfig from tool planning (#22835)
## Why

`codex-tools` is meant to hold reusable tool primitives, but
`ToolsConfig` had become a second copy of core runtime decisions instead
of a small shared contract. It carried provider capabilities, auth/model
gates, permission and environment state, web/search/image feature gates,
multi-agent settings, and goal availability from core into `codex-tools`
([definition](22dd9ad392/codex-rs/tools/src/tool_config.rs (L97)),
[stored on each
`TurnContext`](22dd9ad392/codex-rs/core/src/session/turn_context.rs (L87))).
Every session/context variant then had to build and mutate that snapshot
before assembling tools.

This PR removes that master object instead of renaming it. Tool planning
now reads the live `TurnContext`, where `codex-core` already owns those
decisions, while `codex-tools` keeps only reusable primitives and a
generic `ToolSetBuilder`/`ToolSet` accumulator.

## What Changed

- Removed `ToolsConfig` / `ToolsConfigParams` from `codex-tools`; the
crate keeps the shared helpers that still belong there, including
request-user-input mode selection, shell backend/type resolution,
`UnifiedExecShellMode`, and `ToolEnvironmentMode`.
- Replaced config-snapshot planning with `ToolRouter::from_turn_context`
and a `spec_plan` pipeline over `CoreToolPlanContext`, deriving provider
capabilities, auth gates, model support, feature gates, environment
count, goal support, multi-agent options, web search, and image
generation from the authoritative turn state.
- Added generic `codex_tools::ToolSetBuilder` / `ToolSet`, plus the
small core adapter needed to accumulate `CoreToolRuntime` values and
hosted model specs.
- Added the `tool_family::shell` registration module and moved
shell/unified-exec/memory accounting call sites to read the narrow
per-turn fields directly.
- Narrowed `TurnContext` to the remaining explicit per-turn fields
needed by planning: `available_models`, `unified_exec_shell_mode`, and
`goal_tools_supported`.
- Reworked MCP exposure and tool-search setup so deferred/direct MCP
behavior is driven by the current turn rather than a precomputed config
snapshot.
- Replaced the large expected-spec fixture tests with focused
behavior-level coverage for shell tools, environments, goal and
agent-job gates, MCP direct/deferred exposure, tool search,
request-plugin-install, code mode, multi-agent mode, hosted tools, and
extension executor dispatch.

## Verification

- `cargo check -p codex-tools`
- `cargo check -p codex-core --lib`
- `cargo test -p codex-tools`
- `cargo test -p codex-core spec_plan --lib`
- `cargo test -p codex-core router --lib`
2026-05-19 11:24:09 +02:00
jif-oai
ba57aab13a feat: dedicated goal DB (#23300)
## Why

Thread goals are moving toward extension-owned runtime behavior, but
their persisted state was still stored in the shared state database.
This makes the goal store harder to isolate and keeps future storage
splits tied to ad hoc runtime plumbing.

This PR gives goals their own SQLite database while keeping the existing
`StateRuntime` entry point. The goal is to make this the pattern for
adding more dedicated runtime databases later.

This also reduce load on existing DB and reduce contention

## Limitation
Thread preview from goal is not supported anymore. I'm looking into this
[EDIT]: solved

## What changed

- Added a dedicated `goals_1.sqlite` database with its own
`goals_migrations` directory.
- Moved `thread_goals` creation into the goals DB migration set.
- Dropped the old `thread_goals` table from the main state DB with a
normal state migration. There is intentionally no backfill for existing
goal rows.
- Changed `GoalStore` to be backed only by the goals DB pool.
- Removed the old goal-write side effect that filled empty
`threads.preview` values from the goal objective.
- Added shared runtime DB path metadata so startup, telemetry, `codex
doctor`, and repair handling can include future DBs without bespoke path
lists.
- Updated Bazel compile data so the new goals migration directory is
available to `sqlx::migrate!`.

## Verification

- `cargo check --tests -p codex-state -p codex-cli -p codex-core -p
codex-app-server`
- `just fix -p codex-state`
- `just fix -p codex-cli`
- `just fix -p codex-app-server`
2026-05-19 11:11:41 +02:00
jif-oai
826b2182ed Preserve context baselines for full-history agent forks (#23352)
## Why

Full-history agent forks should continue from the same prompt prefix as
the parent. Dropping the stored `TurnContext` baseline forced the child
to rebuild startup context on its first turn, which can duplicate
developer instructions and also loses the cache continuity that a
full-history fork is supposed to preserve.

Truncated forks are different: once we keep only the last N turns, the
original prompt prefix is no longer intact, so the child must establish
a fresh context baseline.

## What changed

- Preserve `RolloutItem::TurnContext` when forking with
`SpawnAgentForkMode::FullHistory`, and keep dropping it for truncated
forks:
4090717d94/codex-rs/core/src/agent/control.rs (L98-L126)
and
4090717d94/codex-rs/core/src/agent/control.rs (L399-L401)
- Remove the special-case MultiAgentV2 usage-hint filtering path.
Full-history fork now preserves the cached developer prefix instead of
trying to reconstruct part of it.
- Extend the fork coverage to assert both sides of the contract:
full-history forks keep the parent reference baseline, while last-N
forks rebuild context after truncation:
4090717d94/codex-rs/core/src/agent/control_tests.rs (L603-L759)
and
4090717d94/codex-rs/core/src/agent/control_tests.rs (L854-L977)

## Verification

- `cargo test -p codex-core
spawn_agent_can_fork_parent_thread_history_with_sanitized_items --
--nocapture`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core
spawn_agent_fork_last_n_turns_keeps_only_recent_turns -- --nocapture`
2026-05-19 10:34:24 +02:00
viyatb-oai
3009e23644 core: expose permission profile picker metadata (#22928)
## Why

The `/permissions` picker needs a config-level way to distinguish legacy
anonymous presets from named permission-profile mode. That signal cannot
be inferred reliably in the TUI, especially for the edge case where
`default_permissions = ":workspace"` is present without a
`[permissions]` table.

## What changed

- Expose whether the merged config is explicitly in permission-profile
mode.
- Expose the configured custom permission profile IDs alongside the
built-in profile semantics.
- Add regression coverage for profile mode detection and custom profile
metadata, including the `default_permissions = ":workspace"` case.
- Update the thread-manager sample config literal to match the expanded
config shape.

## Stack

1. **This PR**: config metadata needed by downstream permission-profile
consumers.
2. [#22931](https://github.com/openai/codex/pull/22931): refresh active
permission profiles through runtime/session/network state.
3. [#21559](https://github.com/openai/codex/pull/21559): switch
`/permissions` to the profile-aware TUI picker.

## Verification

- `cargo check -p codex-thread-manager-sample`
- `cargo test -p codex-core
default_permissions_can_select_builtin_profile_without_permissions_table`
- `cargo test -p codex-core
permissions_profiles_allow_direct_write_roots_outside_workspace_root`
2026-05-18 23:26:17 -07:00
sayan-oai
1dd9bf9a74 Remove explicit connector tool undeferral (#23390)
## Summary
- remove the explicit-connector carveout that kept mentioned app tools
directly exposed instead of deferred
- keep the surviving explicit-mention reconstruction only for analytics,
preserving `codex_app_mentioned` and `codex_app_used.invoke_type`
- trim the now-unused prompt/tool-exposure plumbing and refresh coverage
around always-defer behavior

## Verification
- `just fmt`
- `cargo test -p codex-analytics`
- `cargo test -p codex-core` *(one transient timeout in
`shell_snapshot::tests::macos_zsh_snapshot_includes_sections`; isolated
rerun passed)*
- `cargo test -p codex-core --lib
shell_snapshot::tests::macos_zsh_snapshot_includes_sections`
- `cargo test -p codex-core --test all
explicit_app_mentions_respect_always_defer`
- `cargo test -p codex-core --lib
mcp_tool_exposure::tests::always_defer_feature_defers_apps_too`
- `just fix -p codex-analytics`
- `just fix -p codex-core`
2026-05-18 21:33:46 -07:00