Commit Graph

1162 Commits

Author SHA1 Message Date
pakrym-oai
46391f7efa [codex] remove plain image wrapper spans (#24652)
## Why

Remote image submissions currently wrap native `input_image` spans in
literal `<image>` and `</image>` text spans. Those extra prompt tokens
add structure without providing label or routing information.

## What Changed

- Serialize `UserInput::Image` directly as an `input_image` content
span.
- Preserve named local-image framing and legacy wrapper parsing for
labeled attachments and existing histories.
- Update existing request-shape expectations for drag-and-drop images,
model switching, and compaction.

## Validation

- `just test -p codex-protocol`
- Focused `codex-core` run covering
`drag_drop_image_persists_rollout_request_shape`,
`model_change_from_image_to_text_strips_prior_image_content`, and
`snapshot_request_shape_pre_turn_compaction_including_incoming_user_message`

## Notes

- A broader `just test -p codex-core` run was attempted; the affected
tests passed, while the overall run failed in unrelated CLI, MCP, and
tooling tests plus a `thread_manager` timeout.
2026-05-26 15:49:37 -07:00
Curtis 'Fjord' Hawthorne
675cb1afbd Clarify view_image tool description (#23949) 2026-05-26 14:17:43 -07:00
Owen Lin
1911021c0e Add forked_from_thread_id turn metadata (#24160)
## Why

When Codex calls the Responses API, it sends `session_id`, `thread_id`,
and `turn_id`, among other fields, in
`client_metadata["x-codex-turn-metadata"]`. This PR adds
`forked_from_thread_id`, which records the "lineage" of a forked
thread.

## What's changed

- Track the immediate history source copied into a forked thread through
thread/session creation, including subagent and review turn metadata
paths.
- Include `forked_from_thread_id` in Codex turn metadata while
preventing turn-scoped Responses API client metadata from overwriting
Codex-owned lineage fields.
- Add coverage for fork lineage in turn metadata and the app-server
Responses API request path.
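A minimal sketch of the precedence rule, assuming a flat string map (the
real value is serialized under `x-codex-turn-metadata`) and an assumed
set of Codex-owned keys. `merge_turn_metadata` and `CODEX_OWNED_KEYS` are
hypothetical names. Caller entries are applied first, caller attempts to
set a Codex-owned key are dropped, and Codex-owned fields are written last.

```rust
use std::collections::BTreeMap;

/// Assumed set of Codex-owned keys (illustrative, not the real list).
const CODEX_OWNED_KEYS: &[&str] = &["session_id", "thread_id", "turn_id", "forked_from_thread_id"];

/// Hypothetical helper: merge caller metadata with Codex-owned lineage
/// fields so that the Codex-owned values always win.
fn merge_turn_metadata(
    caller: &BTreeMap<String, String>,
    codex_owned: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut merged: BTreeMap<String, String> = caller
        .iter()
        // Drop any caller attempt to set a Codex-owned key, even when Codex
        // has no value for it (for example, a thread that was not forked).
        .filter(|(key, _)| !CODEX_OWNED_KEYS.contains(&key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect();
    // Codex-owned fields are written last, so they cannot be overwritten.
    for (key, value) in codex_owned {
        merged.insert(key.clone(), value.clone());
    }
    merged
}
```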
2026-05-26 14:05:28 -07:00
pakrym-oai
768848ab6f Add experimental turn additional context (#24154)
## Summary

Adds experimental `additionalContext` support to `turn/start` and
`turn/steer` so clients can provide ephemeral external context, such as
browser or automation state, without turning that plumbing into a
visible user prompt or triggering user-prompt lifecycle behavior.

## API Shape

The parameter shape is:

```ts
additionalContext?: Record<string, {
  value: string
  kind: "untrusted" | "application"
}> | null
```

Example:

```json
{
  "additionalContext": {
    "browser_info": {
      "value": "Active tab is CI failures.",
      "kind": "untrusted"
    },
    "automation_info": {
      "value": "CI rerun is in progress.",
      "kind": "application"
    }
  }
}
```

The keys are opaque and caller-defined.

## Context Injection

When provided, accepted entries are inserted into model context as
hidden contextual message items, not as visible thread user-message
items.

`kind: "untrusted"` entries are inserted with role `user`:

```text
<external_${key}>${value}</external_${key}>
```

`kind: "application"` entries are inserted with role `developer`:

```text
<${key}>${value}</${key}>
```

Values are not escaped. Each value is truncated to approximately 1k
tokens before wrapping.
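The routing and wrapping rules above can be sketched as follows.
`ContextKind`, `render_context`, and the 4-bytes-per-token truncation
heuristic are assumptions for illustration; the real token estimate may
differ.

```rust
/// Trust level of an entry, mirroring the `kind` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContextKind {
    Untrusted,
    Application,
}

/// Assumed budget: ~1k tokens at roughly 4 bytes per token (a heuristic).
const MAX_CONTEXT_BYTES: usize = 1_000 * 4;

/// Truncate to at most `max_bytes` without splitting a UTF-8 character.
fn truncate_utf8(value: &str, max_bytes: usize) -> &str {
    if value.len() <= max_bytes {
        return value;
    }
    let mut end = max_bytes;
    while !value.is_char_boundary(end) {
        end -= 1;
    }
    &value[..end]
}

/// Returns the message role and wrapped text for one entry.
/// Values are intentionally not escaped.
fn render_context(key: &str, value: &str, kind: ContextKind) -> (&'static str, String) {
    let value = truncate_utf8(value, MAX_CONTEXT_BYTES);
    match kind {
        ContextKind::Untrusted => ("user", format!("<external_{key}>{value}</external_{key}>")),
        ContextKind::Application => ("developer", format!("<{key}>{value}</{key}>")),
    }
}
```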

For `turn/start`, accepted additional context is inserted before normal
user input. For `turn/steer`, additional context is merged only when the
steer includes non-empty user input; context-only steers are still
rejected as empty input.

## Dedupe Strategy

`AdditionalContextStore` lives on session state and stores the latest
complete additional-context map.

Each `turn/start` or non-empty `turn/steer` treats its
`additionalContext` as the current complete set of values. Entries are
injected only when the key is new or the exact entry for that key
changed, including `value` or `kind`. After merging, the store is
replaced with the provided map, so omitted keys are removed from the
retained set and can be injected again later if reintroduced.

Omitting `additionalContext`, passing `null`, or passing an empty object
resets the store to empty and injects nothing.
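A simplified model of these dedupe rules, assuming string-typed `value`
and `kind` fields. This is a sketch, not the actual
`AdditionalContextStore` API.

```rust
use std::collections::BTreeMap;

/// One additional-context entry; `kind` is a plain string in this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ContextEntry {
    value: String,
    kind: String,
}

/// Remembers the latest complete map and reports what needs injecting.
#[derive(Default)]
struct AdditionalContextStore {
    latest: BTreeMap<String, ContextEntry>,
}

impl AdditionalContextStore {
    /// Treats `incoming` as the complete current set. Returns entries whose
    /// key is new or whose value/kind changed, then replaces the retained
    /// map. `None` or an empty map resets the store and injects nothing.
    fn merge(&mut self, incoming: Option<BTreeMap<String, ContextEntry>>) -> Vec<(String, ContextEntry)> {
        let incoming = incoming.unwrap_or_default();
        let to_inject: Vec<(String, ContextEntry)> = incoming
            .iter()
            .filter(|(key, entry)| self.latest.get(*key) != Some(*entry))
            .map(|(key, entry)| (key.clone(), entry.clone()))
            .collect();
        // Omitted keys are dropped here, so re-adding them later injects again.
        self.latest = incoming;
        to_inject
    }
}
```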

## What Changed

- Threads experimental v2 `additionalContext` through app-server into
core turn start and steer handling.
- Adds separate contextual fragment types for untrusted user-role
context and application developer-role context.
- Uses pending response input items so additional context can be
combined with normal user input without treating it as prompt text.
- Adds integration coverage for start/steer flow, role routing,
dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
behavior, empty context-only steer rejection, external-fragment marker
matching, and truncation.
2026-05-26 13:02:34 -07:00
jif-oai
9f47e19b21 test: clean up apply_patch allow-session artifact (#24611)
## Why

The
`approving_apply_patch_for_session_skips_future_prompts_for_same_file`
integration test writes `apply_patch_allow_session.txt` under the
process cwd while exercising outside-workspace patch approval behavior.
With `just test` now being the normal validation path, that file can be
left behind in the checkout when the test runs or fails, creating
confusing untracked state.

## What changed

- Registers the resolved `apply_patch_allow_session.txt` path with
`tempfile::TempPath` before the test removes and recreates it through
`apply_patch`.
- Preserves the existing outside-workspace path shape so the approval
behavior under test does not change.
- Lets `TempPath` remove the generated file when the test exits,
including panic paths.
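`tempfile::TempPath` removes its file when dropped, and `Drop` also runs
while a panicking test unwinds. The std-only `RemoveOnDrop` guard below is
a hypothetical analogue that illustrates this behavior; it is not the
`tempfile` crate itself.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical std-only analogue of `tempfile::TempPath`: deletes the file
/// when dropped, including while a panicking test unwinds.
struct RemoveOnDrop {
    path: PathBuf,
}

impl RemoveOnDrop {
    fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into() }
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for RemoveOnDrop {
    fn drop(&mut self) {
        // Ignore errors: the file may never have been created.
        let _ = fs::remove_file(&self.path);
    }
}
```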

## Verification

- `just test -p codex-core --test all
approving_apply_patch_for_session_skips_future_prompts_for_same_file`
2026-05-26 18:54:59 +02:00
pakrym-oai
ff7513cd83 Move MCP tool naming mode into manager (#21576)
## Why

The `non_prefixed_mcp_tool_names` feature should be applied where MCP
tools become model-visible, not by remapping names later in core.
Keeping the decision in `McpConnectionManager` construction makes
`ToolInfo` the single shaped view that spec building, deferred tool
search, routing, and unavailable-tool placeholders can consume directly.

This also preserves the existing external behavior while the feature is
off, and keeps the feature-on behavior for code mode and hooks explicit
at the manager boundary.

## What Changed

- Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
into `McpConnectionManager::new`.
- Normalize MCP `ToolInfo` names in the manager using either
legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
adds `mcp__` without restoring the old trailing namespace suffix.
- Remove the core-side MCP name remapping path so specs, tool search,
session resolution, and unavailable-tool placeholder construction use
the manager-provided `ToolName` values directly.
- Keep code mode flattening on the `__` namespace separator.
- Preserve hook compatibility by giving non-prefixed MCP hook names
legacy `mcp__...` matcher aliases.
- Add/adjust integration and unit coverage for non-prefixed code-mode
behavior, hook matching with the feature on and off, and manager-level
legacy prefixing.
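A rough sketch of shaping names once at the manager boundary. The variant
names, the `ToolName` layout, and the hook alias format are assumptions
inferred from the description above, not the real `codex-mcp` types.

```rust
/// Two naming modes (variant names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum McpToolNameMode {
    LegacyPrefixed,
    NonPrefixed,
}

/// Hypothetical model-visible name: a namespace plus a tool name.
#[derive(Debug, PartialEq, Eq)]
struct ToolName {
    namespace: String,
    tool: String,
}

/// Decide the namespace once, when the manager builds its tool list.
fn shape_tool_name(mode: McpToolNameMode, server: &str, tool: &str) -> ToolName {
    let namespace = match mode {
        McpToolNameMode::LegacyPrefixed => format!("mcp__{server}"),
        McpToolNameMode::NonPrefixed => server.to_string(),
    };
    ToolName { namespace, tool: tool.to_string() }
}

/// Code mode flattens a namespaced name with the `__` separator.
fn flatten_for_code_mode(name: &ToolName) -> String {
    format!("{}__{}", name.namespace, name.tool)
}

/// Hook matchers for a non-prefixed name also accept a legacy `mcp__` alias.
fn hook_matcher_aliases(server: &str, tool: &str) -> Vec<String> {
    vec![format!("{server}__{tool}"), format!("mcp__{server}__{tool}")]
}
```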

## Testing

- `cargo test -p codex-mcp --lib`
- `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
- `cargo test -p codex-core --lib mcp_tools -- --nocapture`
- `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
- `cargo test -p codex-core --test all mcp_tool -- --nocapture`
- `cargo test -p codex-core --test all search_tool -- --nocapture`
- `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
- `cargo test -p codex-core --test all
code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
--nocapture`
- `cargo test -p codex-tools`
- `cargo test -p codex-features`
2026-05-26 08:21:15 -07:00
jif-oai
b77be36896 fix: drop flake (#24588)
Drop code that was already commented out.
2026-05-26 15:07:26 +02:00
jif-oai
4f7d6b4ef7 chore: stop consuming legacy config profiles (#24076)
## Why

The old config-profile mechanism should no longer influence runtime
behavior now that profile selection has moved to file-based `--profile`
config files. Core already rejects a selected legacy `profile = "..."`
with a migration error in
[`core/src/config/mod.rs`](d6451fcb79/codex-rs/core/src/config/mod.rs (L2521-L2529)),
but a few residual consumers still read legacy `[profiles.*]` data while
performing managed-feature checks and personality migration.

That kept dead legacy profile state relevant after selection had been
removed, and could make personality migration depend on a stale or
missing old profile.

## What changed

- Stop scanning legacy `[profiles.*]` feature settings when validating
managed feature requirements.
- Make personality migration consider only top-level `personality` and
`model_provider` settings.
- Remove the now-unused `ConfigToml::get_config_profile` helper.
- Update personality migration coverage to verify that legacy profile
personality fields and missing legacy profile names no longer affect
that migration path.

This keeps the legacy `profile` / `profiles` config shape available for
the remaining compatibility and migration diagnostics; it only removes
these behavior consumers.

## Verification

- Updated `core/tests/suite/personality_migration.rs` for the new
legacy-profile behavior.
- Focused test command: `cargo test -p codex-core
personality_migration`.
2026-05-26 10:34:43 +02:00
Channing Conger
f94157a4b2 code-mode: merge stored values by key (#24159)
## Summary

Change code-mode stored value updates to merge writes by key instead of
replacing the session's complete stored-value map after each cell
completes.

Previously, each cell received a snapshot of stored values and returned
the complete resulting map. When multiple cells ran concurrently, a
later completion could overwrite values written by another cell because
it committed an older snapshot.

This change moves stored-value ownership into `CodeModeService`:

- Each runtime starts from the service's current stored values.
- Runtime completion reports only keys written by that cell.
- The service merges those writes into the current stored-value map on
successful completion.
- Core no longer replaces its stored-value state from a cell result.

As a result, concurrently executing cells can update different stored
keys without clobbering one another.
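A minimal sketch of merge-by-key, assuming string values and a
`Mutex`-guarded map; the real service stores structured values and its
API differs. `StoredValues` and `merge_writes` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Simplified stand-in for the stored-value state owned by the service.
#[derive(Default)]
struct StoredValues {
    values: Mutex<HashMap<String, String>>,
}

impl StoredValues {
    /// Each new cell runtime starts from the service's current values.
    fn snapshot(&self) -> HashMap<String, String> {
        self.values.lock().unwrap().clone()
    }

    /// On successful completion, merge only the keys this cell wrote;
    /// keys written by other cells in the meantime are left untouched.
    fn merge_writes(&self, writes: HashMap<String, String>) {
        self.values.lock().unwrap().extend(writes);
    }
}
```

Under the old replace-the-whole-map behavior, a second cell that started
from the same empty snapshot and wrote `b` would have committed `{b}` and
dropped the `a` written by the first cell.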

Moving ownership into `CodeModeService` prepares for a follow-up PR that
ties this state's lifetime to a new lifetime object in that service.
2026-05-22 19:09:02 -07:00
Abhinav
5c20513a1b Default function tools into tool hooks (#23757)
# Why

`PreToolUse`, `PostToolUse`, and `updatedInput` coverage for local
function tools currently depends on each handler remembering to wire up
the hook contract itself. That makes coverage easy to miss as new
function tools are added, even though most of them share the same basic
shape: a model-facing function call with JSON arguments.

# What

This makes `CoreToolRuntime` provide the default hook contract for
ordinary local function tools:

- build generic `PreToolUse` and `PostToolUse` payloads from the
function tool name and arguments
- apply `updatedInput` rewrites back into function-tool arguments
through the same default path
- let tool outputs override the post-hook input or response when they
have a more stable hook-facing contract

The exceptions stay explicit:

- hosted tools remain outside the generic local function path
- code-mode `wait` and `write_stdin` opt out for now
- `PostToolUse` feedback replaces only the model-visible response, so
code mode keeps its typed tool result

With the generic path in place, the MCP and extension-tool adapters no
longer need their own duplicate pre/post hook plumbing. The new coverage
exercises the registry default plus end-to-end local function behavior
for pre-hook blocking, `updatedInput` rewriting, and post-hook context.
2026-05-23 00:56:58 +00:00
mchen-oai
3c83e57bfa Add trace_id to TurnStartedEvent (#23980)
## Why
[Recent PR](https://github.com/openai/codex/pull/22709) removed
`trace_id` from `TurnContextItem`.

## What changed
- Add `trace_id` to `TurnStartedEvent` so rollout consumers can correlate
turns with telemetry traces.
- The branch name is out of date: I originally re-added `trace_id` to
`TurnContextItem`, but we decided to move it to `TurnStartedEvent`.

## Verification
- `cargo test -p codex-protocol`
- `cargo test -p codex-core --lib
regular_turn_emits_turn_started_without_waiting_for_startup_prewarm`
- `cargo test -p codex-core --test all
emits_warning_when_resumed_model_differs`
- `cargo test -p codex-rollout`
- `cargo test -p codex-state`
2026-05-22 13:10:56 -07:00
rhan-oai
dac98cb635 retry remote compaction v2 requests (#23951)
## Why

Remote compaction v2 sends a normal `/responses` request with a
compaction trigger. It should follow the retry semantics used by normal
Responses streaming calls for transient stream/request failures, while
keeping a smaller per-transport retry budget because compact attempts
can run much longer than normal turns.

## What changed

- Add a v2 compaction retry loop that uses `stream_max_retries`,
matching normal Responses turn retry mechanics.
- Cap the compact v2 retry budget at 2 retries per transport with
`min(stream_max_retries, 2)`.
- Retry retryable request-open and post-open stream collection failures
through the same loop.
- Use the existing 200ms exponential backoff and requested retry delay
handling used by normal turn retries.
- Emit the same `Reconnecting... n/max` stream-error notification
pattern.
- Fall back from WebSockets to HTTPS after the compact v2 stream retry
budget is exhausted, then reset the retry counter for HTTPS.
- Keep final remote-compaction failure logging after retries/fallback
are exhausted.
- Treat compact stream EOF before `response.completed` as a retryable
stream failure.
- Add compact v2 regression coverage with `request_max_retries = 0` and
`stream_max_retries = 2`, covering both request-open failure and
opened-stream EOF in one end-to-end test.
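A sketch of the budget arithmetic, assuming the backoff starts at 200ms
and doubles per attempt (the real helper may add jitter) and that each
transport gets one initial attempt plus its retries. All function names
here are hypothetical.

```rust
use std::time::Duration;

/// Per-transport retry budget for compact v2: never more than 2.
fn compact_retry_budget(stream_max_retries: u64) -> u64 {
    stream_max_retries.min(2)
}

/// Assumed backoff: 200ms doubled per attempt (attempt starts at 1),
/// unless the server requested a specific retry delay.
fn retry_delay(attempt: u32, requested: Option<Duration>) -> Duration {
    requested.unwrap_or_else(|| Duration::from_millis(200) * 2u32.pow(attempt.saturating_sub(1)))
}

/// Text of the stream-error notification emitted on each retry.
fn reconnect_message(attempt: u64, max: u64) -> String {
    format!("Reconnecting... {attempt}/{max}")
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transport {
    WebSocket,
    Https,
}

/// Attempt plan: WebSockets until the budget is exhausted, then HTTPS
/// with the retry counter reset to zero.
fn attempt_plan(stream_max_retries: u64) -> Vec<(Transport, u64)> {
    let budget = compact_retry_budget(stream_max_retries);
    let mut plan = Vec::new();
    for transport in [Transport::WebSocket, Transport::Https] {
        for retry in 0..=budget {
            plan.push((transport, retry));
        }
    }
    plan
}
```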

## Tests

- `just fmt`
- `cargo test -p codex-core remote_compact_v2`
- `just fix -p codex-core`
2026-05-22 10:14:14 -07:00
anp-oai
d53e68954a Prefer just test over cargo test in docs (#23910)
`cargo test` for the core crate and other crates fails on a fresh macOS
checkout unless the right stack-size environment variable is set. This
change recommends `just test`, which sets up the environment correctly.

As a bonus, this should encourage agents to get more benefit out of
nextest's parallel execution.
2026-05-22 16:58:14 +00:00
anp-oai
c83ba22359 Allow parallel MCP tool calls when annotated readOnly (#23750)
## Summary
- Treat MCP tools with `readOnlyHint: true` as parallel-safe even when
`supports_parallel_tool_calls` is unset or `false`.
- Keep server-level `supports_parallel_tool_calls` as an additive
override for non-read-only tools.
- Add focused unit coverage for the MCP handler eligibility decision.
- Update RMCP integration coverage to keep the serial baseline on a
mutable tool, verify read-only concurrency without server opt-in, and
preserve the server opt-in concurrency path separately.
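The eligibility rule reduces to an OR of the two signals. `ToolAnnotations`
and `parallel_safe` are hypothetical stand-ins; `readOnlyHint` is the MCP
tool annotation field.

```rust
/// Subset of MCP tool annotations relevant here (`readOnlyHint`).
#[derive(Debug, Default, Clone, Copy)]
struct ToolAnnotations {
    read_only_hint: Option<bool>,
}

/// A call may run in parallel if the tool is annotated read-only, or if
/// the server opted in via `supports_parallel_tool_calls`.
fn parallel_safe(annotations: Option<ToolAnnotations>, supports_parallel_tool_calls: Option<bool>) -> bool {
    let read_only = annotations.and_then(|a| a.read_only_hint).unwrap_or(false);
    read_only || supports_parallel_tool_calls.unwrap_or(false)
}
```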

## Testing
- `just fmt`
- `cargo test -p codex-core --lib tools::handlers::mcp::tests::`
- `cargo test -p codex-core --test all
stdio_mcp_read_only_tool_calls_run_concurrently_without_server_opt_in`
- `cargo test -p codex-core --test all
stdio_mcp_parallel_tool_calls_opt_in_runs_concurrently`
- `cargo test -p codex-rmcp-client`
2026-05-21 20:40:34 -07:00
Abhinav
16d85e2708 Add subagent identity to hook inputs (#22882)
# What

When a normal hook fires inside a thread-spawned subagent, Codex now
includes these optional top-level fields in the hook input:

- `agent_id`: the child thread id
- `agent_type`: the subagent role

Root-agent hook inputs omit these fields. `SubagentStart` and
`SubagentStop` keep their existing required `agent_id` and `agent_type`
fields because those events are inherently subagent-scoped.

This does not change matcher behavior. Tool hooks still match on tool
name, compact hooks still match on trigger, and `UserPromptSubmit` still
ignores matchers. Only `SubagentStart` and `SubagentStop` match on
`agent_type`.
2026-05-21 14:54:01 -07:00
Abhinav
24faf49b2a Remove plugin hooks feature flag (#22552)
# Why

This is a follow-up stacked on top of the `plugin_hooks` default-on
change. Once we are comfortable making plugin hooks part of the normal
plugin behavior, the separate feature flag stops buying us much and
leaves extra branching/cache state behind.

# What

- remove the `PluginHooks` feature and generated config-schema entries
- make plugin hook loading/listing follow plugin enablement directly
- drop plugin-manager cache/state that only existed to distinguish
hook-flag toggles
- remove tests and fixtures that modeled `plugin_hooks = true/false`
2026-05-21 19:15:18 +00:00
starr-openai
298e5cfce1 Route MCP servers through explicit environments (#23583)
## Summary
- route each configured MCP server through an explicit per-server
`environment_id` instead of a manager-wide remote toggle
- default omitted `environment_id` to `local`, resolve named ids through
`EnvironmentManager`, and fail only the affected MCP server when an
explicit id is unknown
- keep local stdio on the existing local launcher path for now, while
named-environment stdio uses the selected environment backend and
requires an absolute `cwd`
- allow local HTTP MCP servers to keep using the ambient HTTP client
when no local `Environment` is configured; named-environment HTTP MCPs
use that environment's HTTP client
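A per-server resolution sketch under stated assumptions: `McpServerConfig`,
`Route`, and `resolve_route` are hypothetical, and only the stdio `cwd`
check is modeled. An `Err` disables just that server rather than the
whole manager.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Hypothetical per-server settings; only the fields relevant here.
struct McpServerConfig<'a> {
    environment_id: Option<&'a str>,
    is_stdio: bool,
    cwd: Option<&'a Path>,
}

#[derive(Debug, PartialEq, Eq)]
enum Route {
    /// Local servers keep the existing local launcher path.
    Local,
    /// Named environment resolved through the environment manager.
    Named(String),
}

/// Resolve one server's route; an omitted id defaults to `local`.
fn resolve_route(cfg: &McpServerConfig<'_>, known: &HashSet<String>) -> Result<Route, String> {
    let id = cfg.environment_id.unwrap_or("local");
    if id == "local" {
        return Ok(Route::Local);
    }
    if !known.contains(id) {
        return Err(format!("unknown environment_id `{id}`"));
    }
    // Named-environment stdio servers need an absolute working directory.
    if cfg.is_stdio && !cfg.cwd.is_some_and(|cwd| cwd.is_absolute()) {
        return Err(format!("stdio server in environment `{id}` needs an absolute cwd"));
    }
    Ok(Route::Named(id.to_string()))
}
```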

## Validation
- devbox Bazel build: `bazel build --bes_backend= --bes_results_url=
//codex-rs/cli:codex //codex-rs/rmcp-client:test_stdio_server
//codex-rs/rmcp-client:test_streamable_http_server`
- devbox app-server config matrix with real `config.toml` /
`environments.toml` files covering omitted local, explicit local,
omitted local under remote default, explicit remote stdio, local HTTP
without local env, explicit remote HTTP, local stdio without local env,
unknown explicit env, and remote stdio without `cwd`
2026-05-21 17:19:54 +02:00
jif-oai
8a511d5881 cli: rename profile v2 flag to --profile (#23883)
## Why

Profile v2 is taking over the user-facing profile selection path, so the
CLI no longer needs to expose the transitional `--profile-v2` spelling.
This switches the public args surface to `--profile` before the
remaining legacy profile plumbing is removed separately.

## What

- Rebind `--profile` and `-p` to the v2 profile name argument that
selects `$CODEX_HOME/<name>.config.toml`.
- Stop parsing the legacy shared CLI profile argument while keeping its
implementation path in place for follow-up cleanup.
- Update CLI validation, profile-name parse errors, and the
legacy-profile collision message/tests to refer to `--profile`.
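A sketch of the path mapping. `profile_config_path` is a hypothetical
helper, and the allowed-character rule for profile names is an
assumption; the real parser's rules may differ.

```rust
use std::path::{Path, PathBuf};

/// `--profile <name>` / `-p <name>` selects `$CODEX_HOME/<name>.config.toml`.
/// The name validation rule below is assumed for illustration.
fn profile_config_path(codex_home: &Path, name: &str) -> Result<PathBuf, String> {
    let valid = !name.is_empty()
        && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_');
    if !valid {
        return Err(format!("invalid --profile name `{name}`"));
    }
    Ok(codex_home.join(format!("{name}.config.toml")))
}
```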

## Testing

- `cargo test -p codex-cli -p codex-config -p codex-protocol -p
codex-utils-cli`
2026-05-21 16:45:27 +02:00
jif-oai
2a25602783 [codex] Stabilize subagent start hook test (#23882)
## What

Remove the exact captured request-count assertion from the
`SubagentStart` hook integration test while still waiting for the child
request that matches the injected hook context.

## Why

The test owns the start-hook behavior and already verifies that the
child request reaches the context matcher plus that the start/session
hook logs have the expected invocations. Counting every request captured
by the response mock makes the test sensitive to lifecycle timing
outside that contract and has been flaky in CI.

## Testing

- `cargo test -p codex-core --test all
suite::subagent_notifications::subagent_start_replaces_session_start_and_injects_context
-- --exact`
2026-05-21 15:54:23 +02:00
jif-oai
20fedafff8 Trace logical websocket request after untraced warmup (#23581)
## Why

`prewarm_websocket` intentionally stays out of rollout inference
tracing, but the next traced websocket request can still reuse the
warmup `response_id` and send an empty `input` delta. If tracing records
that wire payload verbatim, replay sees an incremental request whose
parent was never traced and cannot reconstruct the conversation.

This fixes that at the producer boundary instead of relaxing
`rollout-trace` replay semantics around unresolved
`previous_response_id` values.

## What

- track whether the last websocket response came from an untraced warmup
and clear that state when the websocket session is reset or reconnected
- when a traced websocket request reuses that warmup parent, keep
sending the compressed websocket request on the wire but record the
logical `ResponsesApiRequest` in the rollout trace
- add a regression test that proves replay reconstructs the logical user
message even though the websocket follow-up carries
`previous_response_id = warm-1` with empty `input`
- update `InferenceTraceAttempt::record_started` docs to reflect that
callers may record a logical request rather than the exact transport
payload
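A sketch of the producer-side decision, assuming the session remembers the
last response id and whether it came from an untraced warmup.
`WebsocketSession`, `WireRequest`, and `request_to_trace` are hypothetical
names; the wire request is sent unchanged either way.

```rust
/// Simplified request shape (hypothetical).
#[derive(Debug, PartialEq)]
struct WireRequest {
    previous_response_id: Option<String>,
    input: Vec<String>,
}

#[derive(Default)]
struct WebsocketSession {
    /// Last response id and whether it came from an untraced warmup.
    last_response: Option<(String, bool)>,
}

impl WebsocketSession {
    fn record_response(&mut self, id: &str, untraced_warmup: bool) {
        self.last_response = Some((id.to_string(), untraced_warmup));
    }

    /// Reset or reconnect forgets the warmup parent.
    fn reset(&mut self) {
        self.last_response = None;
    }

    /// Choose what to record in the trace: the logical request when the
    /// wire request chains onto an untraced warmup, else the wire request.
    fn request_to_trace<'a>(&self, wire: &'a WireRequest, logical: &'a WireRequest) -> &'a WireRequest {
        let parent_is_untraced_warmup = matches!(
            (&self.last_response, &wire.previous_response_id),
            (Some((last_id, true)), Some(parent)) if last_id == parent
        );
        if parent_is_untraced_warmup { logical } else { wire }
    }
}
```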

## Testing

- `cargo test -p codex-core --test all
responses_websocket_request_prewarm_traces_logical_request`
2026-05-21 11:13:23 +02:00
Matthew Zeng
0a4179bb19 [codex] Add plugin id to MCP tool call items (#23737)
Add the owning plugin ID to MCP tool-call items so they can be filtered
at the plugin level.

## Summary
- add optional `plugin_id` to MCP tool-call items and legacy begin/end
events
- propagate plugin metadata into emitted core items and app-server v2
`ThreadItem::McpToolCall`
- preserve plugin ids through app-server replay/redaction paths and
regenerate v2 schema fixtures

## Testing
- `just write-app-server-schema`
- `just fmt`
- `just fix -p codex-core`
- `cargo test -p codex-protocol -p codex-app-server-protocol`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core mcp_tool_call_item_includes_plugin_id --lib`
- `cargo check -p codex-tui --tests`
- `cargo check -p codex-app-server --tests`
- `git diff --check`

## Notes
- `just fix -p codex-core` completed with two non-fatal
`too_many_arguments` warnings on the touched MCP notification helpers.
- A broader `cargo test -p codex-core` run passed core unit tests, then
hit shell/sandbox/snapshot failures in the integration target.
- A broader app-server downstream run hit the existing
`in_process::tests::in_process_start_clamps_zero_channel_capacity` stack
overflow; `cargo test -p codex-exec` also hit the existing sandbox
expectation mismatch in
`thread_lifecycle_params_include_legacy_sandbox_when_no_active_profile`.
2026-05-20 17:02:10 -07:00
Shijie Rao
370b13afc9 Honor client-resolved service tier defaults (#23537)
## Why

Model catalog responses can now advertise a nullable
`default_service_tier` for each model. Codex needs to preserve three
distinct states all the way from config/app-server inputs to inference:

- no explicit service tier, so the client may apply the current model
catalog default when FastMode is enabled
- explicit `default`, meaning the user intentionally wants standard
routing
- explicit catalog tier ids such as `priority`, `flex`, or future tiers

Keeping those states distinct prevents the UI from showing one tier
while core sends another, especially after model switches or app-server
`thread/start` / `turn/start` updates.

## What Changed

- Plumbed `default_service_tier` through model catalog protocol types,
app-server model responses, generated schemas, model cache fixtures, and
provider/model-manager conversions.
- Added the request-only `default` service tier sentinel and normalized
legacy config spelling so `fast` in `config.toml` still materializes as
the runtime/request id `priority`.
- Moved catalog default resolution to the TUI/client side, including
recomputing the effective service tier when model/FastMode-dependent
surfaces change.
- Updated app-server thread lifecycle config construction so
`serviceTier: null` preserves explicit standard-routing intent by
mapping to `default` instead of internal `None`.
- Kept core responsible for validating explicit tiers against the
current model and stripping `default` before `/v1/responses`, without
applying catalog defaults itself.
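A sketch of how the three states stay distinct from config, through the
client, to the request. The enum and function names are hypothetical,
and core's validation of explicit tiers against the current model is
omitted.

```rust
/// Three distinct states for the requested service tier.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ServiceTierChoice {
    /// Nothing explicit; the client may apply the catalog default.
    Unset,
    /// Explicit `default`: the user wants standard routing.
    ExplicitDefault,
    /// Explicit catalog tier id such as `priority` or `flex`.
    Tier(String),
}

/// Parse a config value, normalizing the legacy `fast` spelling to `priority`.
fn parse_config_tier(raw: Option<&str>) -> ServiceTierChoice {
    match raw {
        None => ServiceTierChoice::Unset,
        Some("default") => ServiceTierChoice::ExplicitDefault,
        Some("fast") => ServiceTierChoice::Tier("priority".to_string()),
        Some(other) => ServiceTierChoice::Tier(other.to_string()),
    }
}

/// Client side: only `Unset` picks up the catalog default, and only with FastMode.
fn resolve_on_client(choice: ServiceTierChoice, catalog_default: Option<&str>, fast_mode: bool) -> ServiceTierChoice {
    match (choice, catalog_default) {
        (ServiceTierChoice::Unset, Some(tier)) if fast_mode => ServiceTierChoice::Tier(tier.to_string()),
        (other, _) => other,
    }
}

/// Core side: `default` is a request-only sentinel, stripped before `/v1/responses`.
fn tier_for_request(choice: &ServiceTierChoice) -> Option<&str> {
    match choice {
        ServiceTierChoice::Tier(id) => Some(id.as_str()),
        ServiceTierChoice::Unset | ServiceTierChoice::ExplicitDefault => None,
    }
}
```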

## Validation

- `CARGO_INCREMENTAL=0 cargo build -p codex-cli`
- `CARGO_INCREMENTAL=0 cargo test -p codex-app-server model_list`
- `cargo test -p codex-tui service_tier`
- `cargo test -p codex-protocol service_tier_for_request`
- `cargo test -p codex-core get_service_tier`
- `RUST_MIN_STACK=8388608 CARGO_INCREMENTAL=0 cargo test -p codex-core
service_tier`
2026-05-20 15:57:50 -07:00
Eric Traut
0e9d222178 Make goals feature on by default and no longer experimental (#23732)
## Why

The `goals` feature is ready to be available without requiring users to
opt into experimental features. Keeping it behind the beta flag leaves
persisted thread goals and automatic goal continuation disabled by
default.

This PR also marks the goal-related app server APIs and events as no
longer experimental.

## What changed

- Mark `goals` as `Stage::Stable`.
- Enable `goals` by default in `codex-rs/features/src/lib.rs`.
2026-05-20 15:07:35 -07:00
Abhinav
eee3e60db3 Add SubagentStop hook (#22873)
# What

<img width="1792" height="1024" alt="image"
src="https://github.com/user-attachments/assets/8f81d232-5813-4994-a61d-e42a05a93a3e"
/>

`SubagentStop` runs when a thread-spawned subagent turn is about to
finish. Thread-spawned subagents use `SubagentStop` instead of the
normal root-agent `Stop` hook.

Configured handlers match on `agent_type`. Hook input includes the
normal stop fields plus:

- `agent_id`: the child thread id.
- `agent_type`: the resolved subagent type.
- `agent_transcript_path`: the child subagent transcript path.
- `transcript_path`: the parent thread transcript path.
- `last_assistant_message`: the final assistant message from the child
turn, when available.
- `stop_hook_active`: `true` when the child is already continuing
because an earlier stop-like hook blocked completion.

`SubagentStop` shares the same completion-control semantics as `Stop`,
scoped to the child turn:

- No decision allows the child turn to finish.
- `decision: "block"` with a non-empty `reason` records that reason as
hook feedback and continues the child with that prompt.
- `continue: false` stops the child turn. If `stopReason` is present,
Codex surfaces it as the stop reason.
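The completion-control rules can be sketched as a small decision function.
Field and type names are hypothetical, and two details are assumptions:
`continue: false` is checked first, and `block` with an empty `reason` is
treated as no decision.

```rust
/// Subset of a stop-hook result; `r#continue` mirrors the `continue` field.
#[derive(Debug, Default)]
struct StopHookOutput {
    decision: Option<String>,
    reason: Option<String>,
    r#continue: Option<bool>,
    stop_reason: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
enum ChildTurnOutcome {
    Finish,
    ContinueWithPrompt(String),
    Stop { reason: Option<String> },
}

fn apply_subagent_stop(output: &StopHookOutput) -> ChildTurnOutcome {
    // `continue: false` stops the child turn, surfacing `stopReason` if present.
    if output.r#continue == Some(false) {
        return ChildTurnOutcome::Stop { reason: output.stop_reason.clone() };
    }
    match (output.decision.as_deref(), output.reason.as_deref()) {
        // `block` with a non-empty reason continues the child with that prompt.
        (Some("block"), Some(reason)) if !reason.is_empty() => {
            ChildTurnOutcome::ContinueWithPrompt(reason.to_string())
        }
        // No decision lets the child turn finish.
        _ => ChildTurnOutcome::Finish,
    }
}
```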

# Lifecycle Scope

Only thread-spawned subagents run `SubagentStop`.

Internal/system subagents such as Review, Compact, MemoryConsolidation,
and Other do not run normal `Stop` hooks and do not run `SubagentStop`.
This avoids exposing synthetic matcher labels for internal
implementation paths.

# Stack

1. #22782: add `SubagentStart`.
2. This PR: add `SubagentStop`.
3. #22882: add subagent identity to normal hook inputs.
2026-05-20 14:59:41 -07:00
Michael Bolin
896ee672cc windows-sandbox: feed setup from resolved permissions (#23167)
## Why

This is the next step in the Windows sandbox migration away from the
legacy `SandboxPolicy` abstraction. #22923 moved write-root and token
decisions onto `ResolvedWindowsSandboxPermissions`, but setup and
identity still accepted `SandboxPolicy` and converted internally. This
PR pushes that conversion outward so the setup path consumes the
resolved Windows permission view directly.

## What Changed

- Changed `SandboxSetupRequest` to carry
`ResolvedWindowsSandboxPermissions` instead of `SandboxPolicy` plus
policy cwd.
- Updated setup refresh/elevation and identity credential preparation to
use resolved permissions for read roots, write roots, network identity,
and deny-write payload planning.
- Removed the production `allow.rs` legacy wrapper; allow-path
computation now takes resolved permissions directly.
- Added a permissions-based world-writable audit entry point while
keeping the existing legacy wrapper for compatibility.
- Updated legacy ACL setup and the core Windows setup bridge to
construct resolved permissions at the boundary.
- Hardened the Windows sandbox integration test helper staging so Bazel
retries can reuse an already-staged helper if a prior sandbox helper
process still has the executable open.

## Verification

- `cargo test -p codex-windows-sandbox`
- `cargo test -p codex-core --test all --no-run`
- `just fix -p codex-windows-sandbox`
- `just fix -p codex-core`
- Attempted `cargo check -p codex-windows-sandbox --target
x86_64-pc-windows-gnullvm`, but the local machine is missing
`x86_64-w64-mingw32-clang`; Windows CI should cover that target.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23167).
* #23715
* #23714
* __->__ #23167
2026-05-20 14:52:38 -07:00
Michael Bolin
e1ec0eee5f windows-sandbox: drive write roots from resolved permissions (#22923)
## Why

This is the third PR in the Windows sandbox `SandboxPolicy` ->
`PermissionProfile` migration stack.

#22896 introduced `ResolvedWindowsSandboxPermissions`, and #22918 moved
elevated runner IPC to carry `PermissionProfile`. This PR starts moving
the remaining setup/spawn helpers away from asking legacy enum questions
like “is this `WorkspaceWrite`?” and toward resolved runtime permission
questions like “does this profile require write capability roots?”

## What changed

- Added resolved-permissions helpers for network identity and
write-capability detection.
- Moved setup write-root gathering to operate on
`ResolvedWindowsSandboxPermissions`, with the legacy `SandboxPolicy`
wrapper left in place for existing call sites.
- Updated identity setup, elevated capture setup, and world-writable
audit denies to use resolved write roots.
- Updated spawn preparation to carry resolved permissions in
`SpawnContext` and use them for network blocking, setup write roots,
elevated capability SID selection, and legacy capability roots.
- Removed a now-unused legacy write-root helper.

## Verification

- `cargo test -p codex-windows-sandbox`
- `just fix -p codex-windows-sandbox`
- Existing stack checks are green on #22896 and #22918; CI has started
for this PR.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22923).
* #23715
* #23714
* #23167
* __->__ #22923
2026-05-20 14:30:42 -07:00
Abhinav
af49d38373 Support compact SessionStart hooks (#21272)
# Why

Compaction replaces the live conversation history, so hooks that use
`SessionStart` to re-inject durable model context need a way to run
again after that rewrite.

Related: #19905 adds dedicated compact lifecycle hooks.

# What

- add `compact` as a supported `SessionStart` source and matcher value
- change pending `SessionStart` state from a single slot to a small FIFO
queue so `resume` / `startup` / `clear` can be preserved alongside a
later `compact`
- drain all queued `SessionStart` sources before the next model request,
preserving their original order
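A minimal sketch of the pending queue, using `VecDeque` so sources drain
in arrival order; the type and method names are hypothetical.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionStartSource {
    Startup,
    Resume,
    Clear,
    Compact,
}

impl SessionStartSource {
    /// Matcher value a `SessionStart` hook configuration would match on.
    fn matcher_value(self) -> &'static str {
        match self {
            SessionStartSource::Startup => "startup",
            SessionStartSource::Resume => "resume",
            SessionStartSource::Clear => "clear",
            SessionStartSource::Compact => "compact",
        }
    }
}

/// Pending `SessionStart` sources, kept in arrival order.
#[derive(Default)]
struct PendingSessionStart {
    queue: VecDeque<SessionStartSource>,
}

impl PendingSessionStart {
    fn push(&mut self, source: SessionStartSource) {
        self.queue.push_back(source);
    }

    /// Called before the next model request: take every queued source in order.
    fn drain(&mut self) -> Vec<SessionStartSource> {
        self.queue.drain(..).collect()
    }
}
```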

# Testing

The new integration coverage verifies both the basic `compact` matcher
path and the stacked `resume` -> `compact` case where both hooks
contribute `additionalContext` to the next model turn.
2026-05-20 20:46:19 +00:00
richardopenai
000bf5ce6d Migrate exec-server remote registration to environments (#23633)
## Summary
- migrate exec-server remote registration naming from executor to
environment
- align CLI, public Rust exports, registry error messages, and relay
test fixtures with the environment registry contract
- keep the live registration path and response model consistent with
`/cloud/environment/{environment_id}/register`

## Verification
- `cargo test -p codex-exec-server
remote::tests::register_environment_posts_with_auth_provider_headers
--manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml`
- `cargo test -p codex-exec-server --test relay
multiplexed_remote_environment_routes_independent_virtual_streams
--manifest-path /Users/richardlee/code/codex/codex-rs/Cargo.toml`
- `cargo check -p codex-cli --manifest-path
/Users/richardlee/code/codex/codex-rs/Cargo.toml` (still running when PR
opened; will update after completion if needed)
2026-05-20 00:25:04 -07:00
Ahmed Ibrahim
5a4202ad90 [codex] Preserve raw code-mode exec output by default (#23564)
## Why
Code mode can use nested unified exec calls as data sources. When those
calls omit `max_output_tokens`, code mode should receive raw command
output so the script can parse or summarize it itself. When code mode
does provide `max_output_tokens`, that explicit nested budget should be
respected, including values above the default unified exec limit, rather
than being capped before code mode sees the result.

## What
- Preserve direct unified exec truncation behavior, while letting
code-mode exec/write_stdin keep `max_output_tokens` as `None` unless
explicitly supplied.
- Make code-mode tool results use raw output when no explicit limit is
present, and use the explicit nested limit directly when one is
specified.
- Refactor unified exec output formatting so `truncated_output` takes
the caller-selected token budget.
- Add e2e integration coverage for explicit nested exec limits, omitted
nested exec limits, outer exec limit propagation, omitted-limit outputs
that exceed both the default and a small truncation policy, explicit
nested limits above those caps, and high explicit limits that still
compact larger command output.
- Reuse the code-mode turn setup helper while directly asserting the
exact exec output item in each test.
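
The budget selection above can be sketched as follows. This is an illustrative sketch only: `DEFAULT_MAX_OUTPUT_TOKENS`, the 4-bytes-per-token estimate, and the `[truncated]` marker are assumptions, and `format_exec_output` is a hypothetical helper, not the real `truncated_output` call site.

```rust
/// Hypothetical default budget for direct unified exec calls (illustrative value).
pub const DEFAULT_MAX_OUTPUT_TOKENS: usize = 10_000;

/// Keep roughly `budget` tokens of `text`, assuming ~4 bytes per token.
pub fn truncated_output(text: &str, budget: usize) -> String {
    let max_bytes = budget.saturating_mul(4);
    if text.len() <= max_bytes {
        return text.to_string();
    }
    // Back off to a char boundary so a UTF-8 sequence is never split.
    let mut end = max_bytes;
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}[truncated]", &text[..end])
}

/// Choose the token budget the caller asked for.
pub fn format_exec_output(raw: &str, max_output_tokens: Option<usize>, code_mode: bool) -> String {
    match (code_mode, max_output_tokens) {
        // Code mode without an explicit limit: return raw output for the script to process.
        (true, None) => raw.to_string(),
        // Code mode with an explicit limit: use it directly, even above the default.
        (true, Some(limit)) => truncated_output(raw, limit),
        // Direct unified exec keeps its default when no limit is given.
        (false, limit) => truncated_output(raw, limit.unwrap_or(DEFAULT_MAX_OUTPUT_TOKENS)),
    }
}
```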

## Testing
- `just fmt`
- `git diff --check`
- Not run locally per repo guidance; CI should validate the e2e
integration tests.
2026-05-20 04:02:14 +00:00
Eric Traut
e43a2e297f Fix stale background terminal poll events (#23231)
## Why

Issue #23214 reports `/ps` showing no background terminals while the
status line still says it is waiting for a background terminal. The race
is in core: `write_stdin` can poll a process that exits before the
response returns. The process manager correctly returns `process_id:
None`, but the handler still emitted a `TerminalInteraction` event using
the requested session id, causing clients to believe a dead process was
still being polled.

Fixes #23214.

## What changed

- Suppress `TerminalInteraction` events for empty `write_stdin` polls
once `response.process_id` is `None`.
- Continue emitting interactions for non-empty stdin, even if that input
causes the process to exit before the response returns.
- Extend the unified exec integration test to assert completed empty
polls do not emit terminal interactions.
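
The rule above reduces to a small predicate. In this sketch, `WriteStdinResponse` is a simplified, hypothetical shape (the real process id type may differ); only the `process_id: None` meaning comes from the description.

```rust
/// Simplified, hypothetical view of a `write_stdin` response.
pub struct WriteStdinResponse {
    /// `None` once the process has exited.
    pub process_id: Option<u32>,
}

/// Decide whether a `TerminalInteraction` event should be emitted.
pub fn should_emit_terminal_interaction(stdin: &str, response: &WriteStdinResponse) -> bool {
    // Non-empty input is a real interaction, even if it made the process exit.
    if !stdin.is_empty() {
        return true;
    }
    // An empty poll only counts while the process is still alive.
    response.process_id.is_some()
}
```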

## Verification

- `cargo test -p codex-core --test all
unified_exec_emits_one_begin_and_one_end_event`
- `cargo test -p codex-core --test all
unified_exec_emits_terminal_interaction_for_write_stdin`

`cargo test -p codex-core` currently aborts in unrelated
`agent::control::tests::resume_agent_from_rollout_uses_edge_data_when_descendant_metadata_source_is_stale`
with a reproducible stack overflow.
2026-05-19 20:48:37 -07:00
Matthew Zeng
8335b56c33 Split plugin install discovery into list and request tools (#23372)
## Summary
- Add `list_available_plugins_to_install` as the inventory step for
plugin and connector install suggestions.
- Slim `request_plugin_install` so it only handles the actual
elicitation, instead of carrying the full discoverable list in its
prompt.
- Emit send-time telemetry when an install elicitation is dispatched,
including requested tool identity in the event payload.
- Emit install-result telemetry through `SessionTelemetry`, including
tool type, user response action, and completion status.
- Update registration and tests to cover the new two-step flow while
keeping the existing `tool_suggest` feature gate unchanged.

## Testing
- `just fmt`
- `cargo test -p codex-tools`
- `cargo test -p codex-core request_plugin_install`
- `cargo test -p codex-core list_available_plugins_to_install`
- `cargo test -p codex-core
install_suggestion_tools_can_be_registered_without_search_tool`
- `cargo test -p codex-otel
manager_records_plugin_install_suggestion_metric`
- `cargo test -p codex-otel
manager_records_plugin_install_elicitation_sent_metric`
- `just fix -p codex-core`
- `just fix -p codex-tools`
- `just fix -p codex-otel`
- `cargo check -p codex-core`
2026-05-19 14:45:37 -07:00
Abhinav
d661ab70ed Add SubagentStart hook (#22782)
# What

`SubagentStart` runs once when Codex creates a thread-spawned subagent,
before that child sends its first model request. Thread-spawned
subagents use `SubagentStart` instead of the normal root-agent
`SessionStart` hook.

Configured handlers match on the subagent `agent_type`, using the same
value passed to `spawn_agent`. When no agent type is specified, Codex
uses the default agent type.

Hook input includes the normal session-start fields plus:

- `agent_id`: the child thread id.
- `agent_type`: the resolved subagent type.

`SubagentStart` may return `hookSpecificOutput.additionalContext`. That
context is added to the child conversation before the first model
request.
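
A rough sketch of the hook input and matching described above, assuming exact string matching on `agent_type` and a hypothetical `DEFAULT_AGENT_TYPE` value. Real hook matchers may support richer patterns, and the actual default agent type name is not stated here.

```rust
/// Hypothetical name used when `spawn_agent` omits an agent type.
pub const DEFAULT_AGENT_TYPE: &str = "default";

/// Extra fields a `SubagentStart` hook receives on top of session-start input.
#[derive(Debug, PartialEq)]
pub struct SubagentStartInput {
    pub agent_id: String,
    pub agent_type: String,
}

/// Build the input for a freshly spawned child thread.
pub fn subagent_start_input(child_thread_id: &str, requested_type: Option<&str>) -> SubagentStartInput {
    SubagentStartInput {
        agent_id: child_thread_id.to_string(),
        agent_type: requested_type.unwrap_or(DEFAULT_AGENT_TYPE).to_string(),
    }
}

/// Simplified exact-match matcher; `None` matches every subagent.
pub fn matcher_applies(matcher: Option<&str>, input: &SubagentStartInput) -> bool {
    matcher.map_or(true, |m| m == input.agent_type)
}
```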

# Lifecycle Scope

Only thread-spawned subagents run `SubagentStart`.

Internal/system subagents such as Review, Compact, MemoryConsolidation,
and Other do not run normal `SessionStart` hooks and do not run
`SubagentStart`. This avoids exposing synthetic matcher labels for
internal implementation paths.

The `SessionStart` hook also no longer fires for subagents, which matches
how other coding agents implement this behavior.

# Stack

1. This PR: add `SubagentStart`.
2. #22873: add `SubagentStop`.
3. #22882: add subagent identity to normal hook inputs.
2026-05-19 12:45:08 -07:00
viyatb-oai
3c76081876 Make deny canonical for filesystem permission entries (#23493)
## Why
Filesystem permission profiles used `none` for deny-read entries, which
is less direct than the action the entry actually represents. This
change makes `deny` the canonical filesystem permission spelling while
preserving compatibility for older configs that still send `none`.

## What changed
- rename `FileSystemAccessMode::None` to `Deny`
- serialize and generate schemas with `deny` as the canonical value
- retain `none` only as a legacy input alias for temporary config
compatibility
- update filesystem glob diagnostics and regression coverage to use the
canonical spelling
- refresh config and app-server schema fixtures to match the new wire
shape
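
A std-only sketch of "canonical `deny`, legacy `none` alias". Only the `Deny` variant comes from the description; `Read` and `Write` are assumed for illustration. The real type presumably does this through its serde derives (for example with serde's `alias` attribute), which this sketch imitates with `FromStr`/`Display`.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical mirror of `FileSystemAccessMode`; `Read`/`Write` are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FileSystemAccessMode {
    Read,
    Write,
    Deny,
}

impl FromStr for FileSystemAccessMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "read" => Ok(Self::Read),
            "write" => Ok(Self::Write),
            // `deny` is canonical; `none` is accepted only as a legacy input alias.
            "deny" | "none" => Ok(Self::Deny),
            other => Err(format!("unknown filesystem access mode: {other}")),
        }
    }
}

impl fmt::Display for FileSystemAccessMode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Output always uses the canonical spelling.
        let s = match self {
            Self::Read => "read",
            Self::Write => "write",
            Self::Deny => "deny",
        };
        f.write_str(s)
    }
}
```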

## Validation
- `cargo test -p codex-protocol`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core config_toml_deserializes_permission_profiles
--lib`
- `cargo test -p codex-core
read_write_glob_patterns_still_reject_non_subpath_globs --lib`

Earlier in the session, a broad `cargo test -p codex-core` run reached
unrelated pre-existing failures in timing/snapshot/git-info tests under
this environment; the targeted surfaces touched by this PR passed
cleanly.
2026-05-19 11:03:47 -07:00
jif-oai
05b8ce4354 chore: namespace v1 sub-agent tools (#23475)
## Why

The v1 sub-agent tools are a single tool family, but they were exposed
as separate flat function tools. This makes the model-visible surface
less clearly grouped and leaves the legacy names in the same flat
namespace as newer agent tooling.

## What

- Wraps the v1 `spawn_agent`, `send_input`, `resume_agent`,
`wait_agent`, and `close_agent` specs in the `multi_agent_v1` namespace.
- Registers the corresponding handlers with namespaced runtime tool
names.
- Updates tool-planning, deferred tool search, and sub-agent
notification tests to assert the namespace shape and child `spawn_agent`
lookup.

## Verification

- Updated `codex-core` coverage for the v1 multi-agent tool plan,
deferred tool search output, and sub-agent tool descriptions.
2026-05-19 19:46:17 +02:00
jif-oai
b3ae3de405 Defer v1 multi-agent tools behind tool search (#23144)
Summary: defer v1 multi-agent tools when tool_search and namespace tools
are available; keep concise searchable descriptions and move the v1
usage guidance into developer instructions; add targeted coverage.
Testing: not run per request; ran `just fmt`.
2026-05-19 15:04:35 +02:00
jif-oai
80fdd4688f Add body_after_prefix auto-compact token limit scope (#22870)
## Why

`model_auto_compact_token_limit` has only been able to budget the full
active context. That makes it hard to set a small "growth since
compaction" budget for sessions that preserve a large carried window
prefix: the preserved prefix can consume the whole budget and force
immediate repeated compaction.

This PR adds an opt-in `body_after_prefix` scope so callers can apply
`model_auto_compact_token_limit` to sampled output and later growth
after the current carried prefix, while still forcing compaction before
the full model context window is exhausted.

## What changed

- Adds `AutoCompactTokenLimitScope` with the existing `total` behavior
as the default and a new `body_after_prefix` mode:
[`config_types.rs`](973806b1cb/codex-rs/protocol/src/config_types.rs (L24-L37)).
- Threads `model_auto_compact_token_limit_scope` through config loading,
`Config`, `core-api`, and app-server v2 schema/TypeScript generation.
- Records the first observed input-token count for a `body_after_prefix`
compaction window and uses it as the baseline when deciding whether the
scoped auto-compaction budget is exhausted:
[`turn.rs`](973806b1cb/codex-rs/core/src/session/turn.rs (L743-L781)).
- Keeps a hard context-window cap in `body_after_prefix`, so scoped
budgeting cannot let the active context overrun the usable window.
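
A simplified sketch of the two scopes. `AutoCompactTokenLimitScope` comes from the description; `CompactionWindow` and `should_auto_compact` are hypothetical, and the hard cap is modeled as the raw context window rather than whatever "usable window" calculation the real code applies.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum AutoCompactTokenLimitScope {
    /// Existing behavior: the limit budgets the full active context.
    #[default]
    Total,
    /// The limit budgets growth after the carried prefix.
    BodyAfterPrefix,
}

/// Hypothetical per-window state reset after each compaction.
pub struct CompactionWindow {
    /// First input-token count observed in this window (the carried prefix).
    pub baseline_input_tokens: Option<u64>,
}

pub fn should_auto_compact(
    scope: AutoCompactTokenLimitScope,
    limit: u64,
    context_window: u64,
    current_input_tokens: u64,
    window: &mut CompactionWindow,
) -> bool {
    match scope {
        AutoCompactTokenLimitScope::Total => current_input_tokens >= limit,
        AutoCompactTokenLimitScope::BodyAfterPrefix => {
            // Record the first observation as the baseline for this window.
            let baseline = *window
                .baseline_input_tokens
                .get_or_insert(current_input_tokens);
            let growth = current_input_tokens.saturating_sub(baseline);
            // Hard cap: still compact before the context window is exhausted.
            growth >= limit || current_input_tokens >= context_window
        }
    }
}
```

With a 90k-token carried prefix and a 10k scoped budget, compaction waits until the context reaches about 100k instead of firing immediately.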

## Verification

Added compact-suite coverage for the two key behaviors:
`body_after_prefix` does not re-compact just because the carried prefix
is larger than the scoped budget, and it still compacts when the total
active context reaches the configured context window:
[`compact.rs`](973806b1cb/codex-rs/core/tests/suite/compact.rs (L3003-L3128)).
2026-05-19 10:19:46 +00:00
sayan-oai
1dd9bf9a74 Remove explicit connector tool undeferral (#23390)
## Summary
- remove the explicit-connector carveout that kept mentioned app tools
directly exposed instead of deferred
- keep the surviving explicit-mention reconstruction only for analytics,
preserving `codex_app_mentioned` and `codex_app_used.invoke_type`
- trim the now-unused prompt/tool-exposure plumbing and refresh coverage
around always-defer behavior

## Verification
- `just fmt`
- `cargo test -p codex-analytics`
- `cargo test -p codex-core` *(one transient timeout in
`shell_snapshot::tests::macos_zsh_snapshot_includes_sections`; isolated
rerun passed)*
- `cargo test -p codex-core --lib
shell_snapshot::tests::macos_zsh_snapshot_includes_sections`
- `cargo test -p codex-core --test all
explicit_app_mentions_respect_always_defer`
- `cargo test -p codex-core --lib
mcp_tool_exposure::tests::always_defer_feature_defers_apps_too`
- `just fix -p codex-analytics`
- `just fix -p codex-core`
2026-05-18 21:33:46 -07:00
Eric Traut
a668379abf [5 of 7] Replace OverrideTurnContext with ThreadSettings (#22508)
**Stack position:** [5 of 7]

## Summary

This PR adds `Op::ThreadSettings`, a queued settings-only update
mechanism for changing stored thread settings without starting a new
turn. It also removes the legacy `Op::OverrideTurnContext` in the same
layer, so reviewers can see the replacement and deletion together.

## Changes

- Add `Op::ThreadSettings` for settings-only queued updates.
- Emit `ThreadSettingsApplied` with the effective thread settings
snapshot after core applies an update.
- Route settings-only updates through the same submission queue as user
input.
- Migrate remaining `OverrideTurnContext` tests and callers to the
queued `Op::ThreadSettings` path.
- Delete `Op::OverrideTurnContext` from the core protocol and submission
loop.
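
A minimal sketch of routing settings-only updates through the same queue as user input. The `Op` variant names follow the description; the settings fields (`model`, `cwd`) and the `TurnStarted` event are hypothetical placeholders. Because both ops are processed in submission order, a settings update queued before a turn is visible to that turn.

```rust
use std::collections::VecDeque;

/// Hypothetical subset of overridable thread settings.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ThreadSettingsOverrides {
    pub model: Option<String>,
    pub cwd: Option<String>,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct ThreadSettings {
    pub model: String,
    pub cwd: String,
}

pub enum Op {
    UserInput { text: String, settings: ThreadSettingsOverrides },
    ThreadSettings(ThreadSettingsOverrides),
}

#[derive(Debug, PartialEq)]
pub enum Event {
    /// Effective settings snapshot after an update is applied.
    ThreadSettingsApplied(ThreadSettings),
    /// Hypothetical event for a turn started with the effective settings.
    TurnStarted { text: String, settings: ThreadSettings },
}

fn apply(current: &mut ThreadSettings, overrides: ThreadSettingsOverrides) {
    if let Some(model) = overrides.model {
        current.model = model;
    }
    if let Some(cwd) = overrides.cwd {
        current.cwd = cwd;
    }
}

/// Process submissions strictly in queue order.
pub fn run_queue(settings: &mut ThreadSettings, queue: &mut VecDeque<Op>) -> Vec<Event> {
    let mut events = Vec::new();
    while let Some(op) = queue.pop_front() {
        match op {
            Op::ThreadSettings(overrides) => {
                apply(settings, overrides);
                events.push(Event::ThreadSettingsApplied(settings.clone()));
            }
            Op::UserInput { text, settings: overrides } => {
                // Empty overrides leave the stored defaults untouched.
                apply(settings, overrides);
                events.push(Event::TurnStarted { text, settings: settings.clone() });
            }
        }
    }
    events
}
```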

This stack addresses #20656 and #22090.

## Stack

1. [1 of 7] [Add thread settings to
UserInput](https://github.com/openai/codex/pull/23080)
2. [2 of 7] [Remove
UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
3. [3 of 7] [Remove
UserTurn](https://github.com/openai/codex/pull/23075)
4. [4 of 7] [Placeholder for OverrideTurnContext
cleanup](https://github.com/openai/codex/pull/23087)
5. [5 of 7] [Replace OverrideTurnContext with
ThreadSettings](https://github.com/openai/codex/pull/22508) (this PR)
6. [6 of 7] [Add app-server thread settings
API](https://github.com/openai/codex/pull/22509)
7. [7 of 7] [Sync TUI thread
settings](https://github.com/openai/codex/pull/22510)
2026-05-18 21:03:51 -07:00
Eric Traut
1a25d8b6e5 [3 of 7] Remove UserTurn (#23075)
**Stack position:** [3 of 7]

## Summary

This PR finishes the input-op consolidation by moving the remaining
`Op::UserTurn` callers onto `Op::UserInput` and deleting `Op::UserTurn`.
This touches a lot of files, but it is a low-risk mechanical migration.

## Stack

1. [1 of 7] [Add thread settings to
UserInput](https://github.com/openai/codex/pull/23080)
2. [2 of 7] [Remove
UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
3. [3 of 7] [Remove
UserTurn](https://github.com/openai/codex/pull/23075) (this PR)
4. [4 of 7] [Placeholder for OverrideTurnContext
cleanup](https://github.com/openai/codex/pull/23087)
5. [5 of 7] [Replace OverrideTurnContext with
ThreadSettings](https://github.com/openai/codex/pull/22508)
6. [6 of 7] [Add app-server thread settings
API](https://github.com/openai/codex/pull/22509)
7. [7 of 7] [Sync TUI thread
settings](https://github.com/openai/codex/pull/22510)
2026-05-18 19:56:00 -07:00
Eric Traut
84d941d07f [1 of 7] Add thread settings to UserInput (#23080)
**Stack position:** [1 of 7]

## Summary

The first three PRs in this stack are a cleanup pass before the actual
thread settings API work.

Today, core has several overlapping "user input" ops: `UserInput`,
`UserInputWithTurnContext`, and `UserTurn`. They differ mostly in how
much next-turn state they carry, which makes the later queued thread
settings update harder to reason about and review.

This PR starts that cleanup by adding the shared
`ThreadSettingsOverrides` payload and allowing `Op::UserInput` to carry
it. Existing variants remain in place here, so this layer is mostly a
behavior-preserving API shape change plus mechanical constructor
updates.

## End State After PR3

By the end of PR3, `Op::UserInput` is the only "user input" core op. It
can carry optional thread settings overrides for callers that need to
update stored defaults with a turn, while callers without updates use
empty settings. `Op::UserInputWithTurnContext` and `Op::UserTurn` are
deleted.

## End State After PR5

By the end of PR5, core will have only two ops for this area:

- `Op::UserInput` for user-input-bearing submissions.
- `Op::ThreadSettings` for settings-only updates.

## Stack

1. [1 of 7] [Add thread settings to
UserInput](https://github.com/openai/codex/pull/23080) (this PR)
2. [2 of 7] [Remove
UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
3. [3 of 7] [Remove
UserTurn](https://github.com/openai/codex/pull/23075)
4. [4 of 7] [Placeholder for OverrideTurnContext
cleanup](https://github.com/openai/codex/pull/23087)
5. [5 of 7] [Replace OverrideTurnContext with
ThreadSettings](https://github.com/openai/codex/pull/22508)
6. [6 of 7] [Add app-server thread settings
API](https://github.com/openai/codex/pull/22509)
7. [7 of 7] [Sync TUI thread
settings](https://github.com/openai/codex/pull/22510)
2026-05-18 18:48:35 -07:00
sayan-oai
daa11820b0 Remove ToolSearch feature toggle (#23389)
## Summary
- mark `ToolSearch` as removed and ignore stale config writes for its
legacy key
- make search tool exposure depend only on model capability, not a
feature toggle
- remove app-server enablement support and prune now-obsolete test
coverage/setup

## Verification
- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core search_tool_requires_model_capability`
- `cargo test -p codex-app-server experimental_feature_enablement_set_`

## Notes
- This keeps the legacy config key as a no-op for compatibility while
removing the ability to toggle the behavior off cleanly.
- No developer-facing docs update outside the touched app-server README
was needed.
2026-05-19 01:24:39 +00:00
pakrym-oai
f2368b7de6 [codex] Trim unused TurnContextItem fields (#22709)
## Why

`TurnContextItem` is the durable baseline used to reconstruct context
diffs across resume/fork. Most of the old persisted-only fields on it
are no longer read, so keeping them in rollout snapshots adds schema
surface and state that can drift without affecting reconstruction.

`summary` is the exception: older Codex versions require it to
deserialize `turn_context` records, so keep writing a default
compatibility value until that schema surface can be removed safely.

## What changed

- Removed the unused persisted fields from `TurnContextItem`: trace ids,
user/developer instructions, output schema, and truncation policy.
- Kept `summary` with a compatibility comment and made
`TurnContext::to_turn_context_item` write `ReasoningSummary::Auto`
instead of live turn state.
- Updated rollout/context reconstruction fixtures for the retained
summary field.

## Verification

- `cargo test -p codex-protocol --lib turn_context_item`
- `cargo test -p codex-rollout
resume_candidate_matches_cwd_reads_latest_turn_context`
- `cargo test -p codex-state turn_context`
- `cargo test -p codex-core --lib
new_default_turn_captures_current_span_trace_id`
- `cargo test -p codex-core --lib
record_initial_history_resumed_turn_context_after_compaction_reestablishes_reference_context_item`
- `cargo test -p codex-core --test all
emits_warning_when_resumed_model_differs`
- `git diff --check`
2026-05-18 21:54:36 +00:00
pakrym-oai
82061660ae [codex] Remove legacy shell output formatting paths (#22706)
## Why

The client and tool pipeline still carried compatibility code for legacy
structured shell output. Current shell and apply_patch responses are
already plain text for model consumption, so keeping a
JSON-serialization path plus shell-item rewrite logic makes the request
formatter and tests preserve a format we do not need anymore.

## What Changed

- Removed the client-side shell output rewrite from
`core/src/client_common.rs`.
- Removed the structured exec-output formatter and the shell `freeform`
switch so tool emitters use one model-facing formatter.
- Collapsed apply_patch/shell serialization tests around the remaining
plain-text output expectations and removed duplicate one-variant
parameterized cases.
- Kept the `ApplyPatchModelOutput::ShellCommandViaHeredoc` compatibility
input shape, but no longer treats it as a separate output-format mode.

## Validation

- `cargo test -p codex-core client_common`
- `cargo test -p codex-core shell_serialization`
- `cargo test -p codex-core apply_patch_cli`
- `just fix -p codex-core`

## Documentation

No external Codex documentation update is needed.
2026-05-18 09:57:54 -07:00
Michael Bolin
0a83353ca3 test: reduce core sandbox policy test setup (#23036)
## Why

`SandboxPolicy` is a legacy compatibility shape, but several core tests
still used it for ordinary turn setup even when the runtime path now
carries `PermissionProfile`. With the first cleanup PR merged, this
follow-up trims more core test scaffolding so remaining `SandboxPolicy`
matches are easier to classify as production compatibility,
legacy-boundary coverage, or explicit conversion tests.

## What Changed

- Updated apply-patch handler and runtime tests to pass
`PermissionProfile` directly.
- Changed sandboxing test helpers to build permission profiles without
first creating `SandboxPolicy` values.
- Converted request-permissions integration turns to pass
`PermissionProfile` through the test helper, leaving legacy sandbox
projection at the `Op::UserTurn` boundary.
- Converted unified exec integration helpers and direct turn submissions
to use `PermissionProfile` values instead of `SandboxPolicy` setup.
- Removed now-unused `SandboxPolicy` imports from the touched core
tests.

## Test Plan

- `just fmt`
- `cargo test -p codex-core --lib tools::sandboxing::tests`
- `cargo test -p codex-core --lib tools::runtimes::apply_patch::tests`
- `cargo test -p codex-core --lib tools::handlers::apply_patch::tests`
- `cargo test -p codex-core --lib unified_exec::process_manager::tests`
- `cargo test -p codex-core --test all request_permissions::`
- `cargo test -p codex-core --test all unified_exec::`
- `just fix -p codex-core`
2026-05-17 08:39:41 -07:00
sayan-oai
061a614d85 multiagent: trim model-visible description, cap to 5 models (#23069)
## Why

The `spawn_agent` model override guidance is uncapped and bloating
context. We need to trim down each entry and cap total entries.

We picked 5 as the cap; it can be adjusted later.

## What changed

- Cap the model override summaries shown in `spawn_agent` to the first 5
picker-visible models, preserving the existing priority ordering from
the models manager.
- Condense each rendered entry to the actionable pieces the model needs:
  - use the model slug as the label
  - render compact reasoning effort lists with the default marked inline
  - render only service tier IDs, and omit the clause when no tiers are
    available
- Update coverage so the compact formatter shape and the top-5 cap are
exercised, and keep the end-to-end request assertion aligned with real
model metadata.

## Example

Before:

`- gpt-5.4 ('gpt-5.4\'): Strong model for everyday coding. Default
reasoning effort: medium. Supported reasoning efforts: low (Fast
responses with lighter reasoning), medium (Balances speed and reasoning
depth for everyday tasks), high (Greater reasoning depth for complex
problems), xhigh (Extra high reasoning depth for complex problems).
Supported service tiers: priority (Fast: 1.5x speed, increased usage).`

After:

`- 'gpt-5.4': Strong model for everyday coding. Reasoning efforts: low,
medium (default), high, xhigh. Service tiers: priority.`
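
A sketch of a formatter that reproduces the "After" line, assuming a hypothetical `ModelSummary` input that is already filtered to picker-visible models and sorted by priority.

```rust
/// Hypothetical model metadata; field names are illustrative.
pub struct ModelSummary<'a> {
    pub slug: &'a str,
    pub description: &'a str,
    pub reasoning_efforts: &'a [&'a str],
    pub default_effort: &'a str,
    pub service_tiers: &'a [&'a str],
}

/// Cap on entries shown in the `spawn_agent` description.
pub const MAX_MODEL_SUMMARIES: usize = 5;

/// Render at most five entries, preserving the incoming priority order.
pub fn render_model_overrides(models: &[ModelSummary]) -> String {
    models
        .iter()
        .take(MAX_MODEL_SUMMARIES)
        .map(render_entry)
        .collect::<Vec<_>>()
        .join("\n")
}

fn render_entry(m: &ModelSummary) -> String {
    // Mark the default effort inline instead of a separate sentence.
    let efforts = m
        .reasoning_efforts
        .iter()
        .map(|e| if *e == m.default_effort { format!("{e} (default)") } else { e.to_string() })
        .collect::<Vec<_>>()
        .join(", ");
    let mut line = format!("- '{}': {} Reasoning efforts: {}.", m.slug, m.description, efforts);
    // Omit the service-tier clause entirely when no tiers are available.
    if !m.service_tiers.is_empty() {
        line.push_str(&format!(" Service tiers: {}.", m.service_tiers.join(", ")));
    }
    line
}
```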
2026-05-16 13:43:30 -07:00
Michael Bolin
d91bc15618 test: construct permission profiles directly (#23030)
## Why

`SandboxPolicy` is now a legacy compatibility shape, but several tests
still built a `SandboxPolicy` only to immediately convert it into
`PermissionProfile` for APIs that already accept canonical runtime
permissions. Those detours make it harder to audit where legacy sandbox
policy is still required, because boundary-only usages are mixed
together with ordinary test setup.

## What Changed

- Updated tests in `codex-core`, `codex-exec`, `codex-analytics`, and
`codex-config` to construct `PermissionProfile` values directly when the
code under test takes a permission profile.
- Changed exec-policy, request-permissions, session, and sandbox test
helpers to pass `PermissionProfile` through instead of converting from
`SandboxPolicy` internally.
- Left `SandboxPolicy` in place where tests are explicitly exercising
legacy compatibility or request/response boundaries.

## Test Plan

- `cargo test -p codex-analytics -p codex-config`
- `cargo test -p codex-core --lib safety::tests`
- `cargo test -p codex-core --lib exec_policy::tests::`
- `cargo test -p codex-core --lib exec::tests`
- `cargo test -p codex-core --lib guardian_review_session_config`
- `cargo test -p codex-core --lib tools::network_approval::tests`
- `cargo test -p codex-core --lib
tools::runtimes::shell::unix_escalation::tests`
- `cargo test -p codex-core --lib managed_network`
- `cargo test -p codex-core --test all request_permissions::`
- `cargo test -p codex-exec sandbox`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23030).
* #23036
* __->__ #23030
2026-05-16 12:12:37 -07:00
Curtis 'Fjord' Hawthorne
8543e39885 Preserve image detail in app-server inputs (#20693)
## Summary

- Add optional image detail to user image inputs across core, app-server
v2, thread history/event mapping, and the generated app-server
schemas/types.
- Preserve requested detail when serializing Responses image inputs:
omitted detail stays on the existing `high` default, while explicit
`original` keeps local images on the original-resolution path.
- Support `high`/`original` consistently for tool image outputs,
including MCP `codex/imageDetail`, code-mode image helpers, and
`view_image`.
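
The defaulting rule can be sketched as below. Only `high` and `original` appear in the description; the real enum may carry more values, so `ImageDetail` here is a hypothetical subset.

```rust
/// Hypothetical subset of image detail levels.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ImageDetail {
    High,
    Original,
}

impl ImageDetail {
    /// Wire spelling sent with the Responses image input.
    pub fn as_wire_str(self) -> &'static str {
        match self {
            Self::High => "high",
            Self::Original => "original",
        }
    }
}

/// Omitted detail falls back to the existing `high` default.
pub fn effective_detail(requested: Option<ImageDetail>) -> ImageDetail {
    requested.unwrap_or(ImageDetail::High)
}

/// Only an explicit `original` keeps a local image on the original-resolution path.
pub fn keep_original_resolution(requested: Option<ImageDetail>) -> bool {
    effective_detail(requested) == ImageDetail::Original
}
```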
2026-05-15 15:04:04 -07:00
jif-oai
5d30764fe9 Run compact hooks for remote compaction v2 (#22828)
## Why

Remote compaction v2 is the `/responses` implementation of
session-history compaction, but it still needs to preserve the
observable contract of the legacy `/responses/compact` path. In
particular, users and integrations that rely on `PreCompact` and
`PostCompact` hooks should not see different behavior when
`remote_compaction_v2` is enabled.

## What Changed

- Runs `PreCompact` before issuing the remote compaction v2 request,
including `Interrupted` analytics when a pre-hook stops execution.
- Runs `PostCompact` after a successful v2 compaction and aborts the
turn if the post-hook stops execution.
- Adds `compact_remote_parity` coverage that compares legacy and v2
compaction across manual transcript shapes, automatic pre-turn
compaction, automatic mid-turn compaction, hook payloads, replacement
history, follow-up request payloads, and API-key `service_tier=fast`
behavior.
- Registers the new parity suite under `core/tests/suite`.
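
The hook ordering above can be sketched with closures standing in for the hooks and the remote request. `HookOutcome`, `CompactionResult`, and `compact_with_hooks` are hypothetical names, and analytics reporting is left out.

```rust
#[derive(Debug, PartialEq)]
pub enum HookOutcome {
    Continue,
    Stop,
}

#[derive(Debug, PartialEq)]
pub enum CompactionResult {
    /// Replacement history returned by the remote compaction request.
    Compacted(String),
    /// `PreCompact` stopped execution before any request was sent.
    Interrupted,
    /// `PostCompact` stopped execution, so the turn is aborted.
    TurnAborted,
}

pub fn compact_with_hooks<P, R, Q>(pre_hook: P, request: R, post_hook: Q) -> Result<CompactionResult, String>
where
    P: FnOnce() -> HookOutcome,
    R: FnOnce() -> Result<String, String>,
    Q: FnOnce(&str) -> HookOutcome,
{
    // `PreCompact` runs before the remote request is issued.
    if pre_hook() == HookOutcome::Stop {
        return Ok(CompactionResult::Interrupted);
    }
    let replacement = request()?;
    // `PostCompact` runs only after a successful compaction.
    match post_hook(&replacement) {
        HookOutcome::Continue => Ok(CompactionResult::Compacted(replacement)),
        HookOutcome::Stop => Ok(CompactionResult::TurnAborted),
    }
}
```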

Relevant code:

- [`compact_remote_v2.rs`](af63745cb5/codex-rs/core/src/compact_remote_v2.rs)
- [`compact_remote_parity.rs`](af63745cb5/codex-rs/core/tests/suite/compact_remote_parity.rs)

## Verification

- Added `core/tests/suite/compact_remote_parity.rs` to assert parity
between legacy remote compaction and remote compaction v2 for the
affected request, hook, rollout-history, and follow-up paths.
- Existing `compact_remote_v2` unit coverage still exercises v2
replacement-history retention and compaction-output collection.
2026-05-15 15:26:21 +02:00
jif-oai
0322ac3df8 [codex] Use compaction_trigger item for remote compaction v2 (#22809)
## Why

Remote compaction v2 was still using `context_compaction` as both the
request trigger and the compacted output shape. The Responses API now
has the landed contract for this flow: Codex sends a dedicated `{
"type": "compaction_trigger" }` input item, and the backend returns the
standard `compaction` output item with encrypted content.

This aligns the v2 path with that wire contract while preserving the
existing local compacted-history post-processing behavior.

## What changed

- Add `ResponseItem::CompactionTrigger` and regenerate the app-server
protocol schema fixtures.
- Send `compaction_trigger` from `remote_compaction_v2` instead of a
payload-less `context_compaction`.
- Collect exactly one backend `compaction` output item, then reuse the
existing compacted-history rebuilding path.
- Treat the trigger item as a transient request marker rather than model
output or persisted rollout/memory content.
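
A sketch of the "exactly one `compaction` item" rule and the transient trigger marker, using a hypothetical, trimmed-down `ResponseItem` rather than the real protocol type.

```rust
/// Hypothetical, trimmed-down response item.
#[derive(Clone, Debug, PartialEq)]
pub enum ResponseItem {
    /// Transient request marker, sent as `{ "type": "compaction_trigger" }`.
    CompactionTrigger,
    /// Backend output carrying the encrypted compacted history.
    Compaction { encrypted_content: String },
    Message { text: String },
}

impl ResponseItem {
    /// The trigger is never written to rollout or memory.
    pub fn is_persisted(&self) -> bool {
        !matches!(self, ResponseItem::CompactionTrigger)
    }
}

/// Require exactly one `compaction` item in the backend output.
pub fn collect_single_compaction(output: &[ResponseItem]) -> Result<&str, String> {
    let mut found = output.iter().filter_map(|item| match item {
        ResponseItem::Compaction { encrypted_content } => Some(encrypted_content.as_str()),
        _ => None,
    });
    match (found.next(), found.next()) {
        (Some(content), None) => Ok(content),
        (None, _) => Err("expected one compaction item, got none".to_string()),
        (Some(_), Some(_)) => Err("expected one compaction item, got several".to_string()),
    }
}
```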

## Verification

- `cargo test -p codex-protocol compaction_trigger`
- `cargo test -p codex-core remote_compact_v2`
- `cargo test -p codex-core compact_remote_v2`
- `cargo test -p codex-core
responses_websocket_sends_response_processed_after_remote_compaction_v2`
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol schema_fixtures`
2026-05-15 11:40:35 +02:00
Michael Bolin
8a5306ff88 app-server: use permission ids and runtime workspace roots (#22611)
## Why

This PR builds on [#22610](https://github.com/openai/codex/pull/22610)
and is the app-server side of the migration from mutable per-turn
`SandboxPolicy` replacement toward selecting immutable permission
profiles by id plus mutable runtime workspace roots.

Once permission profiles can carry their own immutable
`workspace_roots`, app-server no longer needs to mutate the selected
`PermissionProfile` just to represent thread-specific filesystem
context. The mutable part now lives on the thread as explicit
`runtimeWorkspaceRoots`, while `:workspace_roots` remains symbolic until
the sandbox is realized for a turn.

## What Changed

- Replaced the v2 permission-selection wrapper surface with plain
profile ids for `thread/start`, `thread/resume`, `thread/fork`, and
`turn/start`.
- Removed the API surface for profile modifications
(`PermissionProfileSelectionParams`,
`PermissionProfileModificationParams`,
`ActivePermissionProfileModification`).
- Added experimental `runtimeWorkspaceRoots` fields to the thread
lifecycle and turn-start APIs.
- Threaded runtime workspace roots through core session/thread
snapshots, turn overrides, app-server request handling, and command
execution permission resolution.
- Kept session permission state symbolic so later runtime root updates
and cwd-only implicit-root retargeting rebind `:workspace_roots`
correctly.
- Updated the embedded clients just enough to send and restore the new
thread state.
- Refreshed the generated schema/TypeScript artifacts and the app-server
README to match the new contract.

## Verification

Targeted coverage for this layer lives in:

- `codex-rs/app-server-protocol/src/protocol/v2/tests.rs`
- `codex-rs/app-server/tests/suite/v2/thread_start.rs`
- `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
- `codex-rs/app-server/tests/suite/v2/turn_start.rs`
- `codex-rs/core/src/session/tests.rs`

The key regression checks exercise that:

- `runtimeWorkspaceRoots` resolve against the effective cwd on thread
start.
- Profile-declared workspace roots are excluded from the runtime
workspace roots returned by app-server.
- A turn-level runtime workspace-root update persists onto the thread
and is returned by `thread/resume`.
- A named permission profile selected on one turn remains symbolic so a
later runtime-root-only turn update changes the actual sandbox writes.
- A cwd-only turn update retargets the implicit runtime cwd root while
preserving additional runtime roots.
- The protocol fixtures and generated client artifacts stay in sync with
the string-based permission selection contract.
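
A rough sketch of keeping `:workspace_roots` symbolic until a turn's sandbox is realized. `RuntimeWorkspaceRoots` and `realize_writable_roots` are hypothetical names, and this covers only cwd resolution, cwd-only retargeting, and late expansion (not the exclusion of profile-declared roots).

```rust
use std::path::{Path, PathBuf};

/// Symbolic entry kept in the stored permission profile until a turn runs.
pub const WORKSPACE_ROOTS_TOKEN: &str = ":workspace_roots";

/// Hypothetical thread-level state: implicit cwd root plus additional roots.
#[derive(Clone, Debug, PartialEq)]
pub struct RuntimeWorkspaceRoots {
    pub cwd: PathBuf,
    pub additional: Vec<PathBuf>,
}

impl RuntimeWorkspaceRoots {
    /// Relative runtime roots resolve against the effective cwd.
    pub fn new(cwd: &Path, requested: &[&str]) -> Self {
        let additional = requested.iter().map(|r| cwd.join(*r)).collect();
        Self { cwd: cwd.to_path_buf(), additional }
    }

    /// A cwd-only update retargets the implicit root and keeps the others.
    pub fn with_cwd(&self, cwd: &Path) -> Self {
        Self { cwd: cwd.to_path_buf(), additional: self.additional.clone() }
    }

    /// Implicit cwd root first, then the additional roots.
    pub fn all(&self) -> Vec<PathBuf> {
        std::iter::once(self.cwd.clone())
            .chain(self.additional.iter().cloned())
            .collect()
    }
}

/// Expand the symbolic token only when the sandbox is realized for a turn.
pub fn realize_writable_roots(profile_entries: &[&str], runtime: &RuntimeWorkspaceRoots) -> Vec<PathBuf> {
    let mut out = Vec::new();
    for entry in profile_entries {
        if *entry == WORKSPACE_ROOTS_TOKEN {
            out.extend(runtime.all());
        } else {
            out.push(PathBuf::from(*entry));
        }
    }
    out
}
```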

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/22611).
* #22612
* __->__ #22611
2026-05-14 23:00:05 -07:00