## Why
Steered input was queued as a `ResponseInputItem`, then parsed back into
a user message before recording. That path loses information that only
exists on `UserInput`, such as UI text elements.
This change keeps turn-local pending input typed as either original
`UserInput` or existing response items, so steered user input reaches
user-message recording without being reconstructed from a response item.
## What changed
- Add `TurnInput` for active-turn pending input.
- Queue `Session::steer_input` as `TurnInput::UserInput`.
- Run pending-input hook inspection only for `TurnInput::UserInput`.
- Process drained pending input item by item: accepted items are
recorded, blocked items append hook context and are skipped.
- Remove the pending-input prepend/requeue path.
## Validation
- `just fmt`
- `just fix -p codex-core`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib
session::tests::task_finish_emits_turn_item_lifecycle_for_leftover_pending_user_input
-- --nocapture`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib steer_input`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --lib pending_input`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core --test all
pending_input`
- `RUST_MIN_STACK=16777216 cargo test -p codex-core` (unit tests passed:
1835 passed, 0 failed, 4 ignored; integration `all` target failed due
missing helper binaries such as `codex`/`test_stdio_server` plus
unrelated MCP/search/code-mode expectations)
## Why
Pending model input was split across `Session`, `TurnState`, and the
agent mailbox. That made it easy for new paths to manage queued user
input or mailbox delivery outside the intended ownership boundary.
This PR consolidates the model-facing input lifecycle behind the session
input queue so turn-local pending input, next-turn queued items, and
mailbox delivery coordination are owned in one place.
## What Changed
- Added `session/input_queue.rs` to own pending input queues and mailbox
delivery coordination.
- Removed the standalone `agent/mailbox.rs` channel wrapper and store
mailbox items directly in the input queue.
- Moved pending-input mutations off `TurnState`; `TurnState` now exposes
the queue-owned storage directly for now.
- Routed abort cleanup, mailbox delivery phase changes, next-turn queued
items, and active-turn pending input through `InputQueue`.
- Boxed stack-heavy agent resume/fork startup futures that the refactor
pushed over the default test stack.
- Updated session, task, goal, stream-event, and multi-agent call sites
and tests to use the new queue ownership.
## Verification
- `cargo test -p codex-core --lib agent::control::tests`
- `cargo test -p codex-core --lib
agent::control::tests::resume_closed_child_reopens_open_descendants --
--exact`
- `cargo test -p codex-core --lib
agent::control::tests::spawn_agent_fork_last_n_turns_keeps_only_recent_turns
-- --exact`
- `cargo test -p codex-core --lib
agent::control::tests::resume_thread_subagent_restores_stored_nickname_and_role
-- --exact`
- `cargo test -p codex-core` was also run; it completed with 1814
passed, 4 ignored, and one timeout in
`agent::control::tests::resume_thread_subagent_restores_stored_nickname_and_role`,
which passed when rerun in isolation.
## Problem
This addresses several user-reported cases where active goals were
paused even though the user had not explicitly asked for that
transition:
- the guardian approval-review circuit breaker interrupted a turn and
implicitly paused the goal
- a shutdown in one app-server instance could pause a goal while a
second instance was still actively running the same thread
- steering-style interrupts could also pause the goal even though they
are meant to redirect work, not stop the goal lifecycle
The common problem was that core treated `TurnAbortReason::Interrupted`
as an implicit request to transition the persisted goal to `paused`.
That made unrelated interrupt paths mutate goal state as a side effect,
and in the multi-app-server case it allowed stale process teardown to
pause a live goal owned by another running client.
After this change, transitioning a goal to `paused` is always an
explicit action performed by a client or another intentional goal-state
mutation. It is never an implicit transition triggered by generic
interrupt handling.
Refs #22884.
## What changed
- Remove the goal runtime path that paused active goals after
interrupted task aborts.
- Drop the now-unused abort reason from `GoalRuntimeEvent::TaskAborted`.
- Update the focused regression coverage so an interrupted active goal
still accounts usage but remains `active`.
## Why
Extension lifecycle hooks sit on the host/extension boundary, but the
current trait surface only allows synchronous callbacks. That forces
extensions that need to seed, rehydrate, observe, or flush
extension-owned state during thread and turn transitions to either block
inside the callback or move async work into separate host plumbing.
This PR makes those lifecycle callbacks awaitable so extension
implementations can perform async work directly at the lifecycle point
where the host already has the relevant session, thread, or turn stores
available.
## What changed
- Makes `ThreadLifecycleContributor` and `TurnLifecycleContributor`
async in `codex-extension-api`.
- Awaits thread start/resume/stop and turn start/stop/abort lifecycle
callbacks from `codex-core`.
- Updates the guardian and memories extensions to implement the async
lifecycle trait surface.
- Updates the existing lifecycle tests to use async contributor
implementations.
- Adds `async-trait` to the crates that now expose or implement these
async object-safe lifecycle traits.
## Testing
- Existing `codex-core` lifecycle tests were updated to cover async
implementations for thread stop and turn abort ordering.
## Summary
- carry the per-turn extension data through RunningTask so abort
handling can rebuild SessionTaskContext
- update stale test ExtensionData::new() callsites to pass the turn id
## Testing
- Not run after PR branch creation; CI will cover.
## Summary
- run registered TurnItemContributor hooks for parsed stream output
items
- plumb the active turn extension store into stream item handling
- preserve existing memory citation parsing as fallback after
contributors run
## Tests
- cargo test -p codex-core stream_events_utils -- --nocapture
- just fmt
- just fix -p codex-core
- git diff --check
## Why
`TurnContext::cwd` and `TurnContext::resolve_path` are being phased out
in favor of using the selected turn environment cwd directly.
Deprecating both APIs makes any new direct dependency visible while
preserving the existing migration path for current callers.
## What Changed
- Marked `TurnContext::cwd` and `TurnContext::resolve_path` as
deprecated with guidance to use the selected turn environment cwd
instead.
- Added exact `#[allow(deprecated)]` suppressions at each existing
direct usage site, including tests, rather than adding crate-wide
suppression.
- Kept the change behavior-preserving: current cwd reads, writes, and
path resolution continue to use the same values.
## Verification
- `just fmt`
- `cargo check -p codex-core`
- `cargo check -p codex-core --tests`
- `git diff --check`
## Summary
- add a scoped level_id to ExtensionData and expose it through
level_id()
- remove thread_id/turn_id parameters from extension contributor inputs
where the scoped ExtensionData already carries that identity
- move turn-scoped extension data onto TurnContext so token usage and
lifecycle contributors can share the same turn store
## Testing
- cargo check -p codex-extension-api -p codex-core --tests
- cargo test -p codex-extension-api
- cargo test -p codex-guardian
- cargo test -p codex-core --lib
record_token_usage_info_notifies_extension_contributors
- cargo test -p codex-core --lib
submission_loop_channel_close_emits_thread_stop_lifecycle
- cargo test -p codex-core --lib
submission_loop_channel_close_aborts_active_turn_before_thread_stop_lifecycle
- just fix -p codex-extension-api
- just fix -p codex-guardian
- just fix -p codex-core
- just fmt
## Note
- Attempted cargo test -p codex-core; it aborted in
agent::control::tests::spawn_agent_fork_last_n_turns_keeps_only_recent_turns
with the existing stack overflow before the full suite completed.
## Why
Extensions can already contribute prompt, tool, turn-item, and
thread-lifecycle behavior, but there was no explicit host-owned hook for
per-turn setup and cleanup. That makes extension-private turn state
awkward: an extension either has to stash it outside the turn lifecycle
or depend on core runtime objects.
This adds a small turn lifecycle boundary. Extensions receive stable
identifiers plus the existing session, thread, and turn `ExtensionData`
stores, while core keeps owning task scheduling, cancellation, and turn
teardown.
## What Changed
- Added `TurnLifecycleContributor` with `on_turn_start`, `on_turn_stop`,
and `on_turn_abort` callbacks in `codex-rs/ext/extension-api`.
- Added typed `TurnStartInput`, `TurnStopInput`, and `TurnAbortInput`
payloads that expose `thread_id`, `turn_id`, `session_store`,
`thread_store`, and `turn_store`.
- Registered and re-exported turn lifecycle contributors through
`ExtensionRegistry` and `ExtensionRegistryBuilder`.
- Wired `Session` to emit turn start, stop, and abort callbacks from the
existing turn/task lifecycle paths.
- Carried the turn-scoped `ExtensionData` through `RunningTask` and
`RemovedTask` so stop/abort callbacks receive the same turn store
created at turn start.
## Verification
- Not run locally.
## Why
Tool families already disagree on what their existing `duration` fields
mean, so lifecycle latency should live on the shared item envelope
instead of being inferred from per-tool execution fields. Carrying that
envelope through app-server notifications gives downstream consumers one
reusable timing signal without pretending every tool has the same
execution semantics.
## What changed
- Adds `started_at_ms` to core `ItemStartedEvent` values and
`completed_at_ms` to core `ItemCompletedEvent` values.
- Populates those timestamps in the shared session lifecycle emitters,
so protocol-native items get timing without each producer tracking its
own clock state.
- Exposes `startedAtMs` on app-server `item/started` notifications and
`completedAtMs` on `item/completed` notifications.
- Maps the lifecycle timestamps through the app-server boundary while
leaving legacy-converted notifications nullable when no lifecycle
timestamp exists.
- Regenerates the app-server JSON schema and TypeScript fixtures for the
notification-envelope change and updates downstream fixtures that
construct those notifications directly.
- Extends the existing web-search and image-generation integration flows
to assert the new lifecycle timestamps on the native item events.
## Verification
- `cargo check -p codex-protocol -p codex-core -p
codex-app-server-protocol -p codex-app-server -p codex-tui -p codex-exec
-p codex-app-server-client`
- `cargo test -p codex-core --test all web_search_item_is_emitted`
- `cargo test -p codex-core --test all
image_generation_call_event_is_emitted`
- `cargo test -p codex-app-server-protocol`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/20514).
* #18748
* #18747
* #17090
* #17089
* __->__ #20514
## Why
This adds the `remote_compaction_v2` client path so remote compaction
can run through the normal Responses stream and install a
`context_compaction` item that trigger a compaction.
The goal is to migrate some of the compaction logic on the client side
We keeps the v2 transport behind a feature flag while letting follow-up
requests reuse the compacted context instead of falling back to the
legacy compaction item shape.
## What changed
- add `ResponseItem::ContextCompaction` and refresh the generated
app-server / schema / TypeScript fixtures that expose response items on
the wire
- add `core/src/compact_remote_v2.rs` to send compaction through the
standard streamed Responses client, require exactly one
`context_compaction` output item, and install that item into compacted
history
- route manual compact and auto-compaction through the v2 path when
`remote_compaction_v2` is enabled, while keeping the existing remote
compaction path as the fallback
- preserve the new item type across history retention, follow-up request
construction, telemetry, rollout persistence, and rollout-trace
normalization
- add targeted coverage for the feature flag, `context_compaction`
serialization, rollout-trace normalization, and remote-compaction
follow-up behavior
## Verification
- added protocol tests for `context_compaction`
serialization/deserialization in `protocol/src/models.rs`
- added rollout-trace coverage for `context_compaction` normalization in
`rollout-trace/src/reducer/conversation_tests.rs`
- added remote compaction integration coverage for v2 follow-up reuse
and mixed compaction output streams in
`core/tests/suite/compact_remote.rs`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`/goal` is supposed to keep Codex working until the goal is actually
done. The previous continuation logic had two ways to stop early: the
continuation prompt told the model to wait for new input when it felt
blocked, and the runtime suppressed another continuation turn after a
continuation finished without any tool calls.
That made goals stop short even when the agent could still keep making
progress (I received a few reports of this from users). It also relied
on a brittle heuristic that treated "no registry tool calls" as
equivalent to "should stop."
## What changed
- removed the continuation prompt sentence that told the model to stop
and wait for new input when it could not continue productively
- removed the goal runtime suppression heuristic that stopped
auto-continuation after a no-tool continuation turn
- deleted the continuation-activity bookkeeping and left `tool_calls` as
telemetry only
- added focused regressions for the two intended behaviors: completed
no-tool continuation turns still continue, while `request_user_input`
keeps the existing turn open instead of spawning a new continuation
## Why
Several legacy `EventMsg` variants were still emitted or mapped even
though clients either ignored them or had moved to item/lifecycle
events. `Op::Undo` had also degraded to an unavailable shim, so this
removes that dead task path instead of preserving a command that cannot
do useful work.
`McpStartupComplete`, `WebSearchBegin`, and `ImageGenerationBegin` are
intentionally kept because useful consumers still depend on them: MCP
startup completion drives readiness behavior, and the begin events let
app-server/core consumers surface in-progress web-search and
image-generation items before the final payload arrives.
## What Changed
- Removed weak legacy event variants and payloads from `codex-protocol`,
including legacy agent deltas, background events, and undo lifecycle
events.
- Kept/restored `EventMsg::McpStartupComplete`,
`EventMsg::WebSearchBegin`, and `EventMsg::ImageGenerationBegin` with
serializer and emission coverage.
- Updated core, rollout, MCP server, app-server thread history,
review/delegate filtering, and tests to rely on the useful replacement
events that remain.
- Removed `Op::Undo`, `UndoTask`, the undo test module, and stale TUI
slash-command comments.
- Stopped agent job/background progress and compaction retry notices
from emitting `BackgroundEvent` payloads.
## Verification
- `cargo check -p codex-protocol -p codex-app-server-protocol -p
codex-core -p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-protocol -p codex-app-server-protocol -p
codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-core --test all suite::items`
- `just fix -p codex-protocol -p codex-app-server-protocol -p codex-core
-p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- Earlier coverage on this PR also included `codex-mcp`, `codex-tui`,
core library tests, MCP/plugin/delegate/review/agent job tests, and MCP
startup TUI tests.
## Why
`multi_agents_v2` is meant to be independently gated from the older
`collab` feature. The tool registry still treated the
collaboration-style agent tools as `collab`-only, so enabling
`multi_agents_v2` without `collab` omitted the v2 agent tools. Review
and guardian sub-sessions also need to keep agent spawning disabled even
when the outer session has `multi_agents_v2` enabled.
## What changed
- Include the collab-backed agent tools when either `multi_agents_v2` or
`collab` is enabled.
- Explicitly disable `multi_agents_v2` for review and guardian review
sub-sessions, matching the existing `spawn_csv` and `collab`
restrictions.
- Add a registry test that enables `multi_agents_v2`, disables `collab`,
and verifies the v2 agent tools are present while legacy `send_input`
and `resume_agent` remain hidden.
## Testing
- Added
`test_build_specs_multi_agent_v2_does_not_require_collab_feature`.
## Why
Slow Codex turns are easier to debug when token usage is visible in the
trace itself, without joining against separate analytics. This adds
token usage to existing turn-handling spans for regular user turns only.
[Example
turn](https://openai.datadoghq.com/apm/trace/9d353efa2cb5de1f4c5b93dc33c3df04?colorBy=service&graphType=flamegraph&shouldShowLegend=true&sort=time&spanID=3555541504891512675&spanViewType=metadata&traceQuery=)
<img width="1447" height="967" alt="Screenshot 2026-04-24 at 3 03 07 PM"
src="https://github.com/user-attachments/assets/ab7bb187-e7fc-41f0-a366-6c44610b2b2c"
/>
## What Changed
Added response-level token fields on completed handle_responses spans:
gen_ai.usage.input_tokens
gen_ai.usage.cache_read.input_tokens
gen_ai.usage.output_tokens
codex.usage.reasoning_output_tokens
codex.usage.total_tokens
Added aggregate token fields on regular turn spans:
codex.turn.token_usage.*
Added an explicit regular-turn opt-in via
SessionTask::records_turn_token_usage_on_span() so this is not coupled
to span-name strings.
## Testing
- `cargo test -p codex-otel`
- `cargo test -p codex-core
turn_and_completed_response_spans_record_token_usage`
- `just fmt`
- `just fix -p codex-core`
- `just fix -p codex-otel`
- Manual local Electron/app-server smoke test: regular user turn emits
the new span fields
Known status: `cargo test -p codex-core` was attempted and failed in
unrelated existing areas: config approvals, request-permissions,
git-info ordering, and subagent metadata persistence.
## Why
- Without change: MCP tool calls receive
`_meta["x-codex-turn-metadata"]` with `session_id` and `turn_id`.
- Issue: MCP servers may want the turn start timestamp to measure
internal latency relative to turn start.
## What Changed
- With change: turn metadata now includes `turn_started_at_unix_ms`,
which is propagated to MCP tool calls in
`_meta["x-codex-turn-metadata"]`.
## Verification
- `codex-rs/core/src/mcp_tool_call_tests.rs`
- `codex-rs/core/src/turn_metadata_tests.rs`
- `codex-rs/core/src/turn_timing_tests.rs`
- `codex-rs/core/tests/responses_headers.rs`
- `codex-rs/core/tests/suite/search_tool.rs`
## Summary
- Remove `ghost_snapshot` / `GhostCommit` from the Responses API surface
and generated SDK/schema artifacts.
- Keep legacy config loading compatible, but make undo a no-op that
reports the feature is unavailable.
- Clean up core history, compaction, telemetry, rollout, and tests to
stop carrying ghost snapshot items.
## Testing
- Unit tests passed for `codex-protocol`, `codex-core` targeted undo and
compaction flows, `codex-rollout`, and `codex-app-server-protocol`.
- Regenerated config and app-server schemas plus Python SDK artifacts
and verified they match the checked-in outputs.
## Why
After config and requirements store canonical profiles, exec requests
should not cache a derived `SandboxPolicy`. The cached legacy value can
drift from the richer profile state, and most execution paths already
have the filesystem and network runtime policies they need.
## What Changed
- Removes `sandbox_policy` from `codex_sandboxing::SandboxExecRequest`
and `codex_core::sandboxing::ExecRequest`.
- Adds an on-demand `ExecRequest::compatibility_sandbox_policy()` helper
for the Windows and legacy call sites that still need a `SandboxPolicy`
projection.
- Updates Windows filesystem override setup and unified exec policy
serialization to derive that compatibility policy at the boundary.
- Updates Unix escalation reruns and direct shell requests to
reconstruct exec requests from `PermissionProfile` plus runtime
filesystem/network policy, without carrying a cached legacy policy.
- Adjusts sandboxing manager tests to assert the effective profile
rather than the removed legacy field.
## Verification
- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-sandboxing manager`
- `cargo test -p codex-core
exec_server_params_use_env_policy_overlay_contract`
- `cargo test -p codex-core unix_escalation`
- `cargo test -p codex-core exec::tests`
- `cargo test -p codex-core sandboxing::tests`
## Why
Runtime decisions should not infer permissions from the lossy legacy
sandbox projection once `PermissionProfile` is available. In particular,
`Disabled` and `External` need to remain distinct, and managed profiles
with split filesystem or deny-read rules should not be collapsed before
approval, network, safety, or analytics code makes decisions.
## What Changed
- Changes managed network proxy setup and network approval logic to use
`PermissionProfile` when deciding whether a managed sandbox is active.
- Migrates patch safety, Guardian/user-shell approval paths, Landlock
helper setup, analytics sandbox classification, and selected
turn/session code to profile-backed permissions.
- Validates command-level profile overrides against the constrained
`PermissionProfile` rather than a strict `SandboxPolicy` round trip.
- Preserves configured deny-read restrictions when command profiles are
narrowed.
- Adds coverage for profile-backed trust, network proxy/approval
behavior, patch safety, analytics classification, and command-profile
narrowing.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393).
* #19395
* #19394
* __->__ #19393
## Why
This supersedes #19391. During stack repair, GitHub marked #19391 as
merged into a temporary stack branch rather than into `main`, so the
runtime-config change needed a fresh PR.
`PermissionProfile` is now the canonical permissions shape after #19231
because it can distinguish `Managed`, `Disabled`, and `External`
enforcement while also carrying filesystem rules that legacy
`SandboxPolicy` cannot represent cleanly. Core config and session state
still needed to accept profile-backed permissions without forcing every
profile through the strict legacy bridge, which rejected valid runtime
profiles such as direct write roots.
The unrelated CI/test hardening that previously rode along with this PR
has been split into #19683 so this PR stays focused on the permissions
model migration.
## What Changed
- Adds `Permissions.permission_profile` and
`SessionConfiguration.permission_profile` as constrained runtime state,
while keeping `sandbox_policy` as a legacy compatibility projection.
- Introduces profile setters that keep `PermissionProfile`, split
filesystem/network policies, and legacy `SandboxPolicy` projections
synchronized.
- Uses a compatibility projection for requirement checks and legacy
consumers instead of rejecting profiles that cannot round-trip through
`SandboxPolicy` exactly.
- Updates config loading, config overrides, session updates, turn
context plumbing, prompt permission text, sandbox tags, and exec request
construction to carry profile-backed runtime permissions.
- Preserves configured deny-read entries and `glob_scan_max_depth` when
command/session profiles are narrowed.
- Adds `PermissionProfile::read_only()` and
`PermissionProfile::workspace_write()` presets that match legacy
defaults.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606).
* #19395
* #19394
* #19393
* #19392
* __->__ #19606
Adds the core runtime behavior for active goals on top of the model
tools from PR 3.
## Why
A long-running goal should be a core runtime concern, not something
every client has to implement. Core owns the turn lifecycle, tool
completion boundaries, interruptions, resume behavior, and token usage,
so it is the right place to account progress, enforce budgets, and
decide when to continue work.
## What changed
- Centralized goal lifecycle side effects behind
`Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
- Starts goal continuation turns only when the session is idle; pending
user input and mailbox work take priority.
- Accounts token and wall-clock usage at turn, tool, mutation,
interrupt, and resume boundaries; `get_thread_goal` remains read-only.
- Preserves sub-second wall-clock remainder across accounting boundaries
so long-running goals do not drift downward over time.
- Treats token budget exhaustion as a soft stop by marking the goal
`budget_limited` and injecting wrap-up steering instead of aborting the
active turn.
- Suppresses budget steering when `update_goal` marks a goal complete.
- Pauses active goals on interrupt and auto-reactivates paused goals
when a thread resumes outside plan mode.
- Suppresses repeated automatic continuation when a continuation turn
makes no tool calls.
- Added continuation and budget-limit prompt templates.
## Verification
- Added focused core coverage for continuation scheduling, accounting
boundaries, budget-limit steering, completion accounting, interrupt
pause behavior, resume auto-activation, and wall-clock remainder
accounting.
## Why
Agent interruptions currently always persist a model-visible
interrupted-turn marker before emitting `TurnAborted`. That marker is
useful by default because it gives the next model turn context about a
deliberately interrupted task, but some deployments need to suppress
that history injection entirely while still keeping the client-visible
interruption event.
## What changed
- Add `[agents] interrupt_message = false` to disable the model-visible
interrupted-turn marker.
- Resolve the setting into `Config::agent_interrupt_message_enabled`,
defaulting to `true` so existing behavior is unchanged.
- Apply the setting to both live interrupted turns and interrupted fork
snapshots.
- Keep emitting `TurnAborted` even when the history marker is disabled.
- Regenerate `core/config.schema.json` for the new
`agents.interrupt_message` field.
## Testing
- `cargo test -p codex-core load_config_resolves_agent_interrupt_message
-- --nocapture`
- `cargo test -p codex-core
disabled_interrupted_fork_snapshot_appends_only_interrupt_event --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_developer_input_message --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_can_disable_interrupted_marker --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo check -p codex-core`
## Why
`MultiAgentV2` follow-up messages are delivered to agents as
assistant-authored `InterAgentCommunication` envelopes. When
`followup_task` used `interrupt: true`, the interrupted-turn guidance
was still persisted as a contextual user message, so model-visible
history made a system-generated interruption boundary look
user-authored.
This keeps interruption guidance consistent with the rest of the v2
inter-agent message stream while preserving the legacy marker shape for
non-v2 sessions.
## What changed
- Make `interrupted_turn_history_marker` feature-aware.
- Record the interrupted-turn marker as an assistant `OutputText`
message when `Feature::MultiAgentV2` is enabled.
- Keep the existing user contextual fragment for non-v2 sessions.
- Apply the same feature-aware marker to interrupted fork snapshots.
- Add coverage for the live `followup_task` interrupt path and the
helper-level v2 marker shape.
## Testing
- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_assistant_output_message --
--nocapture`
- `cargo test -p codex-core interrupted_fork_snapshot -- --nocapture`
## Why
`codex-models-manager` had grown to own provider-specific concerns:
constructing OpenAI-compatible `/models` requests, resolving provider
auth, emitting request telemetry, and deciding how provider catalogs
should be sourced. That made the manager harder to reuse for providers
whose model catalog is not fetched from the OpenAI `/models` endpoint,
such as Amazon Bedrock.
This change moves provider-specific model discovery behind
provider-owned implementations, so the models manager can focus on
refresh policy, cache behavior, picker ordering, and model metadata
merging.
## What Changed
- Introduced a `ModelsManager` trait with separate `OpenAiModelsManager`
and `StaticModelsManager` implementations.
- Added `ModelsEndpointClient` so OpenAI-compatible HTTP fetching lives
outside `codex-models-manager`.
- Moved `/models` request construction, provider auth resolution,
timeout handling, and request telemetry into `codex-model-provider` via
`OpenAiModelsEndpoint`.
- Added provider-owned `models_manager(...)` construction so configured
OpenAI-compatible providers use `OpenAiModelsManager`, while
static/catalog-backed providers can return `StaticModelsManager`.
- Added an Amazon Bedrock static model catalog for the GPT OSS Bedrock
model IDs.
- Updated core/session/thread manager code and tests to depend on
`Arc<dyn ModelsManager>`.
- Moved offline model test helpers into
`codex_models_manager::test_support`.
## Metadata References
The Bedrock catalog metadata is based on the official Amazon Bedrock
OpenAI model documentation:
- [Amazon Bedrock OpenAI
models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html)
lists the Bedrock model IDs, text input/output modalities, and `128,000`
token context window for `gpt-oss-20b` and `gpt-oss-120b`.
- [Amazon Bedrock `gpt-oss-120b` model
card](https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-openai-gpt-oss-120b.html)
lists the `bedrock-runtime` model ID `openai.gpt-oss-120b-1:0`, the
`bedrock-mantle` model ID `openai.gpt-oss-120b`, text-only modalities,
and `128K` context window.
- [OpenAI `gpt-oss-120b` model
docs](https://developers.openai.com/api/docs/models/gpt-oss-120b)
document configurable reasoning effort with `low`, `medium`, and `high`,
plus text input/output modality.
The display names, default reasoning effort, and priority ordering are
Codex-local catalog choices.
## Test Plan
- Manually verified app-server model listing with an AWS profile:
```shell
CODEX_HOME="$(mktemp -d)" cargo run -p codex-app-server-test-client -- \
--codex-bin ./target/debug/codex \
-c 'model_provider="amazon-bedrock"' \
-c 'model_providers.amazon-bedrock.aws.profile="codex-bedrock"' \
-c 'model_providers.amazon-bedrock.aws.region="us-west-2"' \
model-list
```
The response returned the Bedrock catalog with `openai.gpt-oss-120b-1:0`
as the default model and `openai.gpt-oss-20b-1:0` as the second listed
model, both text-only and supporting low/medium/high reasoning effort.
## Summary
Short circuit the convo if auto-review hits too many denials
## Testing
- [x] Added unit tests
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Guardian analytics includes time-to-first-token, but the Guardian
reviewer runs as a normal Codex session and `TurnCompleteEvent` did not
expose TTFT. The timing needs to flow through the standard
turn-completion protocol so Guardian review analytics can consume the
same value as the rest of the session machinery.
## What changed
Adds optional `time_to_first_token_ms` to `TurnCompleteEvent` and
populates it from `TurnTiming`. The value is carried through app-server
thread history, rollout reconstruction, TUI/app-server adapters, and
Guardian review session handling.
Guardian review analytics now captures TTFT from the reviewer
turn-complete event when available. Existing tests and fixtures are
updated to set the new optional field to `None` where TTFT is not
relevant.
## Verification
- `cargo clippy -p codex-tui --tests -- -D warnings`
- `cargo clippy -p codex-core --lib --tests -- -D warnings`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17696).
* __->__ #17696
* #17695
* #17693
* #18278
* #18953
## Summary
This is the runtime/foundation half of the Windows sandbox unified-exec
work.
- add Windows sandbox `unified_exec` session support in
`windows-sandbox-rs` for both:
- the legacy restricted-token backend
- the elevated runner backend
- extend the PTY/process runtime so driver-backed sessions can support:
- stdin streaming
- stdout/stderr separation
- exit propagation
- PTY resize hooks
- add Windows sandbox runtime coverage in `codex-windows-sandbox` /
`codex-utils-pty`
This PR does **not** enable Windows sandbox `UnifiedExec` for product
callers yet because hooking this up to app-server comes in the next PR.
Windows sandbox advertising is intentionally kept aligned with `main`,
so sandboxed Windows callers still fall back to `ShellCommand`.
This PR isolates the runtime/session layer so it can be reviewed
independently from product-surface enablement.
---------
Co-authored-by: jif-oai <jif@openai.com>
Co-authored-by: Codex <noreply@openai.com>
- Replace the active models-manager catalog with the deleted core
catalog contents.
- Replace stale hardcoded test model slugs with current bundled model
slugs.
- Keep this as a stacked change on top of the cleanup PR.
## Summary
- Add `codex-model-provider` as the runtime home for model-provider
behavior that does not belong in `codex-core`, `codex-login`, or
`codex-api`.
- The new crate wraps configured `ModelProviderInfo` in a
`ModelProvider` trait object that can resolve the API provider config,
provider-scoped auth manager, and request auth provider for each call.
- This centralizes provider auth behavior in one place today, and gives
us an extension point for future provider-specific auth, model listing,
request setup, and related runtime behavior.
## Tests
Ran tests manually to make sure that provider auth under different
configs still work as expected.
---------
Co-authored-by: pakrym-oai <pakrym@openai.com>
**Summary**
- prevent managed requirements.toml network settings from leaking into
DangerFullAccess / yolo turns by gating managed proxy attachment on
sandbox mode
- keep guardian/sandboxed modes on the managed proxy path, while making
true yolo bypass the proxy entirely, including /shell full-access
commands
## Summary
- add an exec-server `envPolicy` field; when present, the server starts
from its own process env and applies the shell environment policy there
- keep `env` as the exact environment for local/embedded starts, but
make it an overlay for remote unified-exec starts
- move the shell-environment-policy builder into `codex-config` so Core
and exec-server share the inherit/filter/set/include behavior
- overlay only runtime/sandbox/network deltas from Core onto the
exec-server-derived env
## Why
Remote unified exec was materializing the shell env inside Core and
forwarding the whole map to exec-server, so remote processes could
inherit the orchestrator machine's `HOME`, `PATH`, etc. This keeps the
base env on the executor while preserving Core-owned runtime additions
like `CODEX_THREAD_ID`, unified-exec defaults, network proxy env, and
sandbox marker env.
## Validation
- `just fmt`
- `git diff --check`
- `cargo test -p codex-exec-server --lib`
- `cargo test -p codex-core --lib unified_exec::process_manager::tests`
- `cargo test -p codex-core --lib exec_env::tests`
- `cargo test -p codex-core --lib exec_env_tests` (compile-only; filter
matched 0 tests)
- `cargo test -p codex-config --lib shell_environment` (compile-only;
filter matched 0 tests)
- `just bazel-lock-update`
## Known local validation issue
- `just bazel-lock-check` is not runnable in this checkout: it invokes
`./scripts/check-module-bazel-lock.sh`, which is missing.
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: pakrym-oai <pakrym@openai.com>
The rollout writer now keeps an owned/monitored task handle, returns
real Result acks for flush/persist/shutdown, retries failed flushes by
reopening the rollout file, and keeps buffered items until they are
successfully written. Session flushes are now real durability barriers
for fork/rollback/read-after-write paths, while turn completion surfaces
a warning if the rollout still cannot be saved after recovery.
## Summary
- preserve legacy Windows elevated sandbox behavior for existing
policies
- add elevated-only support for split filesystem policies that can be
represented as readable-root overrides, writable-root overrides, and
extra deny-write carveouts
- resolve those elevated filesystem overrides during sandbox transform
and thread them through setup and policy refresh
- keep failing closed for explicit unreadable (`none`) carveouts and
reopened writable descendants under read-only carveouts
- for explicit read-only-under-writable-root carveouts, materialize
missing carveout directories during elevated setup before applying the
deny-write ACL
- document the elevated vs restricted-token support split in the core
README
## Example
Given a split filesystem policy like:
```toml
":root" = "read"
":cwd" = "write"
"./docs" = "read"
"C:/scratch" = "write"
```
the elevated backend now provisions the readable-root overrides,
writable-root overrides, and extra deny-write carveouts during setup and
refresh instead of collapsing back to the legacy workspace-only shape.
If a read-only carveout under a writable root is missing at setup time,
elevated setup creates that carveout as an empty directory before
applying its deny-write ACE; otherwise the sandboxed command could
create it later and bypass the carveout. This is only for explicit
policy carveouts. Best-effort workspace protections like `.codex/` and
`.agents/` still skip missing directories.
A policy like:
```toml
"/workspace" = "write"
"/workspace/docs" = "read"
"/workspace/docs/tmp" = "write"
```
still fails closed, because the elevated backend does not reopen
writable descendants under read-only carveouts yet.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Carry `AbsolutePathBuf` through tool cwd parsing/resolution instead of
resolving workdirs to raw `PathBuf`s.
- Type exec/sandbox request cwd fields as `AbsolutePathBuf` through
`ExecParams`, `ExecRequest`, `SandboxCommand`, and unified exec runtime
requests.
- Keep `PathBuf` conversions at external/event boundaries and update
existing tests/fixtures for the typed cwd.
## Validation
- `cargo check -p codex-core --tests`
- `cargo check -p codex-sandboxing --tests`
- `cargo test -p codex-sandboxing`
- `cargo test -p codex-core --lib tools::handlers::`
- `just fix -p codex-sandboxing`
- `just fix -p codex-core`
- `just fmt`
Full `codex-core` test suite was not run locally; per repo guidance I
kept local validation targeted.
Addresses #15527
Problem: Nested `codex exec` commands could source a shell snapshot that
re-exported the parent `CODEX_THREAD_ID`, so commands inside the nested
session were attributed to the wrong thread.
Solution: Reapply the live command env's `CODEX_THREAD_ID` after
sourcing the snapshot.
## Summary
- reduce public module visibility across Rust crates, preferring private
or crate-private modules with explicit crate-root public exports
- update external call sites and tests to use the intended public crate
APIs instead of reaching through module trees
- add the module visibility guideline to AGENTS.md
## Validation
- `cargo check --workspace --all-targets --message-format=short` passed
before the final fix/format pass
- `just fix` completed successfully
- `just fmt` completed successfully
- `git diff --check` passed
Stacked on #16508.
This removes the temporary `codex-core` / `codex-login` re-export shims
from the ownership split and rewrites callsites to import directly from
`codex-model-provider-info`, `codex-models-manager`, `codex-api`,
`codex-protocol`, `codex-feedback`, and `codex-response-debug-context`.
No behavior change intended; this is the mechanical import cleanup layer
split out from the ownership move.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
This continues the compile-time cleanup from #16630. `SessionTask`
implementations are monomorphized, but `Session` stores the task behind
a `dyn` boundary so it can drive and abort heterogenous turn tasks
uniformly. That means we can move the `#[async_trait]` expansion off the
implementation trait, keep a small boxed adapter only at the storage
boundary, and preserve the existing task lifecycle semantics while
reducing the amount of generated async-trait glue in `codex-core`.
One measurement caveat showed up while exploring this: a warm
incremental benchmark based on `touch core/src/tasks/mod.rs && cargo
check -p codex-core --lib` was basically flat, but that was the wrong
benchmark for this change. Using package-clean `codex-core` rebuilds,
like #16630, shows the real win.
Relevant pre-change code:
- [`SessionTask` with
`#[async_trait]`](3c7f013f97/codex-rs/core/src/tasks/mod.rs (L129-L182))
- [`RunningTask` storing `Arc<dyn
SessionTask>`](3c7f013f97/codex-rs/core/src/state/turn.rs (L69-L77))
## What changed
- Switched `SessionTask::{run, abort}` to native RPITIT futures with
explicit `Send` bounds.
- Added a private `AnySessionTask` adapter that boxes those futures only
at the `Arc<dyn ...>` storage boundary.
- Updated `RunningTask` to store `Arc<dyn AnySessionTask>` and removed
`#[async_trait]` from the concrete task impls plus test-only
`SessionTask` impls.
## Timing
Benchmarked package-clean `codex-core` rebuilds with dependencies left
warm:
```shell
cargo check -p codex-core --lib >/dev/null
cargo clean -p codex-core >/dev/null
/usr/bin/time -p cargo +nightly rustc -p codex-core --lib -- \
-Z time-passes \
-Z time-passes-format=json >/dev/null
```
| revision | rustc `total` | process `real` | `generate_crate_metadata`
| `MIR_borrow_checking` | `monomorphization_collector_graph_walk` |
| --- | ---: | ---: | ---: | ---: | ---: |
| parent `3c7f013f9735` | 67.21s | 67.71s | 24.61s | 23.43s | 22.43s |
| this PR `2cafd783ac22` | 35.08s | 35.60s | 8.01s | 7.25s | 7.15s |
| delta | -47.8% | -47.4% | -67.5% | -69.1% | -68.1% |
For completeness, the warm touched-file benchmark stayed flat (`1.96s`
parent vs `1.97s` this PR), which is why that benchmark should not be
used to evaluate this refactor.
## Verification
- Ran `cargo test -p codex-core`; this change compiled and task-related
tests passed before hitting the same unrelated 5
`config::tests::*guardian*` failures already present on the parent
stack.