## Why
SQLite state was still being opened from consumer paths, including lazy
`OnceCell`-backed thread-store call sites. That let one process
construct multiple state DB connections for the same Codex home, which
makes SQLite lock contention and `database is locked` failures much
easier to hit.
State DB lifetime should be chosen by main-like entrypoints and tests,
then passed through explicitly. Consumers should use the supplied
`Option<StateDbHandle>` or `StateDbHandle` and keep their existing
filesystem fallback or error behavior when no handle is available.
The startup path also needs to keep the rollout crate in charge of
SQLite state initialization. Opening `codex_state::StateRuntime`
directly bypasses rollout metadata backfill, so entrypoints should
initialize through `codex_rollout::state_db` and receive a handle only
after required rollout backfills have completed.
## What Changed
- Initialize the state DB in main-like entrypoints for CLI, TUI,
app-server, exec, MCP server, and the thread-manager sample.
- Pass `Option<StateDbHandle>` through `ThreadManager`,
`LocalThreadStore`, app-server processors, TUI app wiring, rollout
listing/recording, personality migration, shell snapshot cleanup,
session-name lookup, and memory/device-key consumers.
- Remove the lazy local state DB wrapper from the thread store so
non-test consumers use only the supplied handle or their existing
fallback path.
- Make `codex_rollout::state_db::init` the local state startup path: it
opens/migrates SQLite, runs rollout metadata backfill when needed, waits
for concurrent backfill workers up to a bounded timeout, verifies
completion, and then returns the initialized handle.
- Keep optional/non-owning SQLite helpers, such as remote TUI local
reads, as open-only paths that do not run startup backfill.
- Switch app-server startup from direct
`codex_state::StateRuntime::init` to the rollout state initializer so
app-server cannot skip rollout backfill.
- Collapse split rollout lookup/list APIs so callers use the normal
methods with an optional state handle instead of `_with_state_db`
variants.
- Restore `getConversationSummary(ThreadId)` to delegate through
`ThreadStore::read_thread` instead of a LocalThreadStore-specific
rollout path special case.
- Keep DB-backed rollout path lookup keyed on the DB row and file
existence, without imposing the filesystem filename convention on
existing DB rows.
- Verify readable DB-backed rollout paths against `session_meta.id`
before returning them, so a stale SQLite row that points at another
thread's JSONL falls back to filesystem search and read-repairs the DB
row.
- Keep `debug prompt-input` filesystem-only so a one-off debug command
does not initialize or backfill SQLite state just to print prompt input.
- Keep goal-session test Codex homes alive only in the goal-specific
helper, rather than leaking tempdirs from the shared session test
helper.
- Update tests and call sites to pass explicit state handles where DB
behavior is expected and explicit `None` where filesystem-only behavior
is intended.
## Validation
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p
codex-rollout -p codex-thread-store -p codex-app-server -p codex-core -p
codex-tui -p codex-exec -p codex-cli --tests`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout state_db_`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout find_thread_path`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout find_thread_path -- --nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout try_init_ -- --nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo clippy -p
codex-rollout --lib -- -D warnings`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-thread-store
read_thread_falls_back_when_sqlite_path_points_to_another_thread --
--nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-thread-store`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
shell_snapshot`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all personality_migration`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all rollout_list_find`
- `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all rollout_list_find::find_prefers_sqlite_path_by_id --
--nocapture`
- `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all rollout_list_find -- --nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
interrupt_accounts_active_goal_before_pausing`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-app-server get_auth_status -- --test-threads=1`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-app-server --lib`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout
-p codex-app-server --tests`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout
-p codex-thread-store -p codex-core -p codex-app-server -p codex-tui -p
codex-exec -p codex-cli`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p
codex-app-server`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p
codex-rollout`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-core`
- `just argument-comment-lint -p codex-core`
- `just argument-comment-lint -p codex-rollout`
Focused coverage added in `codex-rollout`:
- `recorder::tests::state_db_init_backfills_before_returning` verifies
the rollout metadata row exists before startup init returns.
- `state_db::tests::try_init_waits_for_concurrent_startup_backfill`
verifies startup waits for another worker to finish backfill instead of
disabling the handle for the process.
-
`state_db::tests::try_init_times_out_waiting_for_stuck_startup_backfill`
verifies startup does not hang indefinitely on a stuck backfill lease.
-
`tests::find_thread_path_accepts_existing_state_db_path_without_canonical_filename`
verifies DB-backed lookup accepts valid existing rollout paths even when
the filename does not include the thread UUID.
-
`tests::find_thread_path_falls_back_when_db_path_points_to_another_thread`
verifies DB-backed lookup ignores a stale row whose existing path
belongs to another thread and read-repairs the row after filesystem
fallback.
Focused coverage updated in `codex-core`:
- `rollout_list_find::find_prefers_sqlite_path_by_id` now uses a
DB-preferred rollout file with matching `session_meta.id`, so it still
verifies that valid SQLite paths win without depending on stale/empty
rollout contents.
`cargo test -p codex-app-server thread_list_respects_search_term_filter
-- --test-threads=1 --nocapture` was attempted locally but timed out
waiting for the app-server test harness `initialize` response before
reaching the changed thread-list code path.
`bazel test //codex-rs/thread-store:thread-store-unit-tests
--test_output=errors` was attempted locally after the thread-store fix,
but this container failed before target analysis while fetching `v8+`
through BuildBuddy/direct GitHub. The equivalent local crate coverage,
including `cargo test -p codex-thread-store`, passes.
A plain local `cargo check -p codex-rollout -p codex-app-server --tests`
also requires system `libcap.pc` for `codex-linux-sandbox`; the
follow-up app-server check above used `CODEX_SKIP_VENDORED_BWRAP=1` in
this container.
## Why
This is a prep PR in the multi-environment process-tool stack. It
separates ownership/config cleanup from the behavior change that teaches
process tools to route by selected environment, so the follow-up PR can
focus on model-facing `environment_id` behavior.
## Stack
1. https://github.com/openai/codex/pull/20646 - `EnvironmentContext`
rendering for selected environments
2. https://github.com/openai/codex/pull/20669 - selected-environment
ownership and tool config prep (this PR)
3. https://github.com/openai/codex/pull/20647 - process-tool
`environment_id` routing
## What Changed
- keep the resolved turn environment list wrapped in
`ResolvedTurnEnvironments` through `TurnContext` instead of unwrapping
it back to a raw `Vec`
- add `TurnContext::resolve_path_against` so cwd-relative path
resolution has one shared helper
- replace the old tool config boolean with `ToolEnvironmentMode::{None,
Single, Multiple}`
## Testing
- Tests not run locally; this prep refactor is covered by GitHub CI for
the stack.
Co-authored-by: Codex <noreply@openai.com>
## Why
`McpToolCall` was still an app-server item synthesized from deprecated
legacy begin/end events. Recent item migrations moved this ownership
into core `TurnItem`s, so MCP tool calls now follow the same canonical
lifecycle and leave legacy events as compatibility fanout.
Keeping the core item close to the v2 `ThreadItem::McpToolCall` shape
also avoids spreading MCP result semantics across app-server conversion
code. Core now owns whether a completed call is `completed` or `failed`,
and whether the payload is a tool result or an error.
## What changed
- Added core `TurnItem::McpToolCall` with flattened `server`, `tool`,
`arguments`, `status`, `result`, and `error` fields.
- Updated MCP tool call emitters, including MCP resource tools, to emit
`ItemStarted`/`ItemCompleted` around directly constructed core MCP
items.
- Updated app-server v2 conversion to project the core MCP item into
`ThreadItem::McpToolCall` without deriving status or splitting `Result`
locally.
- Ignored live deprecated MCP legacy fanout in app-server v2 to avoid
duplicate item notifications, while keeping thread history replay on the
legacy event path.
## Verification
- `cargo test -p codex-protocol`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core --lib mcp_tool_call`
- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server
mcp_tool_call_completion_notification_contains_truncated_large_result`
Tool suggest still misfires when model needs tool_search, updating the
prompts to further disambiguate it:
- [x] rename it from `tool_suggest` to `request_plugin_install`
- [x] rephrase "suggestion" to "install" in the tool descriptions.
- [x] disambiguate "the tool" vs "the plugin/connector".
Tested with the Codex App and verified it still works.
## Why
Image-view results should be represented as a core-produced turn item
instead of being reconstructed by app-server. At the same time, existing
rollout/history paths still understand the legacy `ViewImageToolCall`
event, so this keeps that event as compatibility output generated from
the new item lifecycle.
## What changed
- Added `TurnItem::ImageView` to `codex-protocol`.
- Emitted image-view item start/completion directly from the core
`view_image` handler.
- Kept `ViewImageToolCall` as a legacy event and generate it from
completed `TurnItem::ImageView` items.
- Kept `thread_history.rs` on the legacy `ViewImageToolCall` replay
path, with `ImageView` item lifecycle events ignored there.
- Updated app-server protocol conversion, rollout persistence, and
affected exhaustive event matches for the new item plus legacy fan-out
shape.
## Verification
- `cargo test -p codex-protocol -p codex-app-server-protocol -p
codex-rollout -p codex-rollout-trace -p codex-mcp-server -p
codex-app-server --lib`
- `cargo test -p codex-core --test all
view_image_tool_attaches_local_image`
- `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
-p codex-app-server -p codex-rollout -p codex-rollout-trace -p
codex-mcp-server`
- `git diff --check`
## Summary
- make selected turn environments the source of truth for session
runtime cwd and MCP runtime environment selection
- keep local/no-selection fallback behavior intact
- add coverage for duplicate selected environments, cwd resolution, and
MCP runtime environment selection
## Validation
- git diff --check
- rustfmt was run on touched Rust files during the implementation
workflow
CI should provide the full Bazel/test signal.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
Apply-patch file changes are now part of the core turn item stream, so
v2 clients can consume the same first-class item lifecycle path used by
other turn items instead of relying on app-server-specific remapping
from legacy patch events.
## What changed
- Added a core `TurnItem::FileChange` carrying apply-patch changes and
completion metadata.
- Updated the apply-patch tool emitter to send `ItemStarted` /
`ItemCompleted` with the new `FileChange` item while preserving legacy
`PatchApplyBegin` / `PatchApplyEnd` fan-out.
- Updated app-server v2 conversion to render the new core item directly
and stopped `event_mapping` from remapping old patch begin/end events
into item notifications.
- Kept thread history reconstruction based on the existing old
apply-patch events for rollout compatibility.
## Verification
- `cargo test -p codex-protocol -p codex-app-server-protocol`
- `cargo test -p codex-core --test all
apply_patch_tool_executes_and_emits_patch_events`
- `cargo test -p codex-app-server bespoke_event_handling`
- Build one app-server process ThreadStore from startup config and share
it with ThreadManager and CodexMessageProcessor.
- Remove per-thread/fork store reconstruction so effective thread config
cannot switch the persistence backend.
- Add params to ThreadStore create/resume for specifying thread
metadata, since otherwise the metadata from store creation would be used
(incorrectly).
## Why
Several legacy `EventMsg` variants were still emitted or mapped even
though clients either ignored them or had moved to item/lifecycle
events. `Op::Undo` had also degraded to an unavailable shim, so this
removes that dead task path instead of preserving a command that cannot
do useful work.
`McpStartupComplete`, `WebSearchBegin`, and `ImageGenerationBegin` are
intentionally kept because useful consumers still depend on them: MCP
startup completion drives readiness behavior, and the begin events let
app-server/core consumers surface in-progress web-search and
image-generation items before the final payload arrives.
## What Changed
- Removed weak legacy event variants and payloads from `codex-protocol`,
including legacy agent deltas, background events, and undo lifecycle
events.
- Kept/restored `EventMsg::McpStartupComplete`,
`EventMsg::WebSearchBegin`, and `EventMsg::ImageGenerationBegin` with
serializer and emission coverage.
- Updated core, rollout, MCP server, app-server thread history,
review/delegate filtering, and tests to rely on the useful replacement
events that remain.
- Removed `Op::Undo`, `UndoTask`, the undo test module, and stale TUI
slash-command comments.
- Stopped agent job/background progress and compaction retry notices
from emitting `BackgroundEvent` payloads.
## Verification
- `cargo check -p codex-protocol -p codex-app-server-protocol -p
codex-core -p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-protocol -p codex-app-server-protocol -p
codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- `cargo test -p codex-core --test all suite::items`
- `just fix -p codex-protocol -p codex-app-server-protocol -p codex-core
-p codex-rollout -p codex-rollout-trace -p codex-mcp-server`
- Earlier coverage on this PR also included `codex-mcp`, `codex-tui`,
core library tests, MCP/plugin/delegate/review/agent job tests, and MCP
startup TUI tests.
## Why
On Windows, Codex runs shell commands through a top-level
`powershell.exe -NoProfile -Command ...` wrapper. `execpolicy` was
matching that wrapper instead of the inner command, so prefix rules like
`["git", "push"]` did not fire for PowerShell-wrapped commands even
though the same normalization already happens for `bash -lc` on Unix.
This change makes the Windows shell wrapper transparent to rule matching
while preserving the existing Windows unmatched-command safelist and
dangerous-command heuristics.
## What changed
- add `parse_powershell_command_plain_commands()` in
`shell-command/src/powershell.rs` to unwrap the top-level PowerShell
`-Command` body with `extract_powershell_command()` and parse it with
the existing PowerShell AST parser
- update `core/src/exec_policy.rs` so `commands_for_exec_policy()`
treats top-level PowerShell wrappers like `bash -lc` and evaluates rules
against the parsed inner commands
- carry a small `ExecPolicyCommandOrigin` through unmatched-command
evaluation and expose `is_safe_powershell_words()` /
`is_dangerous_powershell_words()` so Windows safelist and
dangerous-command checks still work after unwrap
- add Windows-focused tests for wrapped PowerShell prompt/allow matches,
wrapper parsing, and unmatched safe/dangerous inner commands, and
re-enable the end-to-end `execpolicy_blocks_shell_invocation` test on
Windows
## Testing
- `cargo test -p codex-shell-command`
## Why
`item/fileChange/outputDelta` text output was only the tool's summary or
error text and not used by client surfaces.
We keep `item/fileChange/outputDelta` in the app-server protocol as a
deprecated compatibility entry, but the server no longer emits it.
## What changed
- stop the `apply_patch` runtime from emitting `ExecCommandOutputDelta`
events
- simplify `item_event_to_server_notification` so command output deltas
always map to `item/commandExecution/outputDelta`
- remove the app-server bookkeeping that tried to detect whether an
output delta belonged to a file change
- mark `item/fileChange/outputDelta` as a deprecated legacy protocol
entry in the v2 types, schema, and README
- simplify the file-change approval tests so they only wait for
completion instead of expecting output-delta notifications
## Testing
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-thread-manager-sample`
- `cargo test -p codex-app-server-protocol
protocol::event_mapping::tests::exec_command_output_delta_maps_to_command_execution_output_delta
-- --exact`
- `cargo test -p codex-app-server
turn_start_file_change_approval_accept_for_session_persists_v2 --
--exact` *(failed before the test assertions because the wiremock
`/responses` mock received 0 requests in setup)*
## Why
Large MCP tool call outputs can make rollout JSONL files enormous. In
the session that motivated this change, the biggest JSONL records were:
- `event_msg/mcp_tool_call_end`
- `response_item/function_call_output`
both containing the same unbounded MCP payloads - just 3 MCP tool calls
that each were multi-hundred MBs 😱
This PR truncates both of those JSONL records.
## How
#### For `response_item/function_call_output`
Unified exec already bounds tool output before it is injected into
model-facing history, which also keeps the corresponding rollout
`response_item/function_call_output` records small.
MCP should follow the same pattern: truncate the model-facing tool
output at the tool-output boundary, while leaving code-mode/raw hook
consumers alone.
#### For `event_msg/mcp_tool_call_end`
`McpToolCallEnd` also needs its own bounded event copy because it is the
app-server/replay/UI event shape that backs `ThreadItem::McpToolCall`.
Unfortunately this is _not_ downstream of the `ToolOutput` trait.
## Model behavior
Model behavior is actually unchanged as a result of this PR.
Before this PR, MCP output was:
1. Converted to `FunctionCallOutput`.
2. Recorded into in-memory history.
3. Truncated by `ContextManager::record_items()` before later model
turns saw it.
After this branch, MCP output is truncated earlier, in
`McpToolOutput::response_payload()`, using the same helper. Then
`ContextManager::record_items()` sees an already-truncated output and
effectively has little/no additional work to do.
So the model should still see the same kind of truncated function-call
output. The practical difference is where truncation happens: earlier,
before rollout persistence/app-server emission can see the giant
payload.
## Verification
- `cargo test -p codex-core mcp_tool_output`
- `cargo test -p codex-core
mcp_tool_call::tests::truncate_mcp_tool_result_for_event`
- `cargo test -p codex-core
mcp_post_tool_use_payload_uses_model_tool_name_args_and_result`
- `just fmt`
- `just fix -p codex-core`
- `git diff --check`
## Why
After `hooks/list` exposes the hook inventory, clients need a way to
persist user hook preferences, make those changes effective in
already-open sessions, and distinguish user-controllable hooks from
managed requirements without adding another bespoke app-server write
API.
## What
- Extends `hooks/list` entries with effective `enabled` state.
- Persists user-level hook state under `hooks.state.<hook-id>` so the
model can grow beyond a single boolean over time.
- Uses the existing `config/batchWrite` path for hook state updates
instead of introducing a dedicated hook write RPC.
- Refreshes live session hook engines after config writes so
already-open threads observe updated enablement without a restart.
## Stack
1. openai/codex#19705
2. openai/codex#19778
3. This PR - openai/codex#19840
4. openai/codex#19882
## Reviewer Notes
The generated schema files account for much of the raw diff. The core
behavior is in:
- `hooks/src/config_rules.rs`, which resolves per-hook user state from
the config layer stack.
- `hooks/src/engine/discovery.rs`, which projects effective enablement
into `hooks/list` from source-derived managedness.
- `config/src/hook_config.rs`, which defines the new `hooks.state`
representation.
- `core/src/session/mod.rs`, which rebuilds live hook state after user
config reloads.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- remove the Windows-specific unified-exec environment block from tool
selection
- keep `unified_exec` default-off on Windows unless the feature is
explicitly enabled
- normalize model-provided `shell_type = unified_exec` to
`shell_command` when the feature is disabled
- drop obsolete tests tied to the removed environment gate and keep the
feature-flag regression coverage
## Why
Now that the session/long-lived process backend is implemented for the
Windows sandbox, we don't need to hard disable it anymore. We will be
rolling out slowly using a feature gate.
## Impact
This allows manual Windows opt-in in CLI and app-backed flows while
preserving the existing default-off behavior for Windows users.
---------
Co-authored-by: canvrno-oai <kbond@openai.com>
Co-authored-by: Codex <noreply@openai.com>
## Why
This bug is exposed by Guardian/auto-review approvals. With the managed
network proxy enabled, a blocked network request can be reported back
through the network approval service as an approval denial after the
command has already started. Before this change, the shell and unified
exec runtimes registered those network approval calls, but did not have
a way to observe an async proxy denial as a cancellation/failure signal
for the running process.
The result was confusing: Guardian/auto-review could correctly deny
network access, but the command path could keep running or unregister
the approval without surfacing the denial as the command failure.
## What Changed
- `NetworkApprovalService` now attaches a cancellation token to active
and deferred network approvals.
- Proxy-denial outcomes are recorded only for active registrations,
cancel the owning token, and are consumed when the approval is
finalized.
- The shell runtime combines the normal command timeout with the
network-denial cancellation token.
- Unified exec stores the deferred network approval object, terminates
tracked processes when the proxy denial arrives, and returns the denial
as a process failure while polling or completing the process.
- Tool orchestration passes the active network approval cancellation
token into the sandbox attempt and preserves deferred approval errors
instead of silently unregistering them.
- App-server `command/exec` now handles the combined
timeout-or-cancellation expiration variant used by the runtime.
## Verification
- `cargo test -p codex-core network_approval --lib`
- `cargo clippy -p codex-app-server --all-targets -- -D warnings`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
---------
Co-authored-by: Codex <noreply@openai.com>
Summary:
- Add codex-thread-manager-sample, a one-shot binary that starts a
ThreadManager thread, submits a prompt, and prints the final assistant
output.
- Pass ThreadStore into ThreadManager::new and expose
thread_store_from_config for existing callsites.
- Build the sample Config directly with only --model and prompt inputs.
Verification:
- just fmt
- cargo check -p codex-thread-manager-sample -p codex-app-server -p
codex-mcp-server
- git diff --check
Tests: Not run per request.
## Why
`agents.max_depth` is a legacy multi-agent v1 guard. Multi-agent v2 uses
task-path routing and its own session/thread limits, so v2 should not
reject nested `spawn_agent` calls just because the thread-spawn depth
has reached the v1 maximum.
Keeping the v1 depth guard active in v2 prevents deeper task trees even
though the v2 path still needs the depth value only for lineage and
task-path metadata.
## What Changed
- Removed the depth-limit rejection from the multi-agent v2
`spawn_agent` handler while still computing child depth for lineage/path
metadata.
- Made the depth-based disabling of legacy `SpawnCsv`/`Collab` tools
apply only when `Feature::MultiAgentV2` is disabled.
- Added `multi_agent_v2_spawn_agent_ignores_configured_max_depth` to
cover a v2 child spawning another agent when `agent_max_depth = 1`,
while the existing v1 depth-limit tests continue to enforce the legacy
behavior.
## Verification
- `cargo test -p codex-core
multi_agent_v2_spawn_agent_ignores_configured_max_depth -- --nocapture`
- `cargo test -p codex-core depth_limit -- --nocapture`
- `cargo test -p codex-core tools::handlers::multi_agents::tests --
--nocapture`
Messages sent with `followup_task` already arrive at their target
recipient promptly (at message boundaries while sampling, or after the
pending tool call completes) -- having `interrupt` is not worth the
added complexity.
## Why
Unsupported features must fail closed and Codex must not expose
OpenAI-hosted fallback paths when the active provider cannot support
them. In practice, Bedrock should not surface app connectors, MCP
servers, tool search/suggestions, image generation, web search, or JS
REPL until those paths are explicitly supported for that provider.
This PR moves that decision into provider-owned capability metadata
instead of scattering Bedrock-specific checks across callers.
## What changed
- Adds `ProviderCapabilities` to `codex-model-provider`, with default
support for existing providers and a Bedrock override that disables
unsupported launch surfaces.
- Adds `ToolCapabilityBounds` to `codex-tools` so provider capability
limits can clamp otherwise-enabled tool config.
- Applies capability bounds when building session and review-thread tool
config.
- Routes MCP/app connector configuration through
`McpManager::mcp_config`, which filters configured MCP servers and app
connectors based on the active provider.
- Updates app-server MCP list/read paths to use the filtered MCP config.
- Adds coverage for default provider capabilities, Bedrock disabled
capabilities, and optional tool-surface clamping.
## Testing
built locally and verified that bedrock responses api now return without
errors calling unsupported tools.
## Summary
- Add `disable_tool_suggest` to app and plugin config, schema, and
TypeScript output
- Exclude disabled connectors and plugins from tool suggestion discovery
- Persist "never show again" tool-suggestion choices back into
`config.toml`
- Update config docs and add coverage for connector and plugin
suppression
## Testing
- Added and updated unit tests for config persistence and tool-suggest
filtering
- Not run (not requested)
## Why
MultiAgentV2 `wait_agent` currently clamps short waits to a fixed 10
second minimum. That default is still useful for preventing tight
polling loops, but it is too rigid for environments that need faster
mailbox wake-up checks or a larger minimum to discourage frequent
polling.
This PR makes the minimum wait timeout configurable from the existing
MultiAgentV2 feature config section, so operators can tune the behavior
without changing the legacy multi-agent tool surface.
## What Changed
- Added `features.multi_agent_v2.min_wait_timeout_ms`.
- Defaulted the new setting to the existing 10 second floor.
- Validated the configured value as `1..=3600000`, matching the existing
one hour maximum wait bound.
- Applied the configured minimum to MultiAgentV2 `wait_agent` runtime
clamping.
- Plumbed the configured minimum into the `wait_agent` tool schema,
including the effective default when the minimum is above the normal 30
second default.
- Regenerated `core/config.schema.json`.
## Verification
- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib multi_agent_v2`
- `just fix -p codex-core`
Keep extracting memories out of core and moving the write trigger in the
app-server
This is temporary and it should move at the client level as a follow-up
This makes core fully independant from `codex-memories-write`
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`ThreadConfigSnapshot` is used by app-server and thread metadata code as
a stable view of active runtime settings. Keeping both `sandbox_policy`
and `permission_profile` in the snapshot duplicates permission state and
makes it possible for the legacy projection to drift from the canonical
profile.
The legacy `sandbox` value is still needed at app-server compatibility
boundaries, so this PR derives it on demand from the snapshot profile
and cwd instead of storing it.
## What Changed
- Removes `ThreadConfigSnapshot.sandbox_policy`.
- Adds `ThreadConfigSnapshot::sandbox_policy()` as a compatibility
projection from `permission_profile` plus `cwd`.
- Updates app-server response/metadata code and tests to call the
projection only where legacy fields still exist.
- Keeps snapshot construction profile-only so split filesystem rules,
disabled enforcement, and external enforcement remain represented by the
canonical profile.
## Verification
- `cargo test -p codex-app-server
thread_response_permission_profile_preserves_enforcement --lib`
- `cargo test -p codex-core
dispatch_reclaims_stale_global_lock_and_starts_consolidation --lib`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19775).
* #19900
* #19899
* #19776
* __->__ #19775
## Why
The experimental `PermissionProfile` API had both `:cwd` and
`:project_roots` special filesystem paths, which made the permission
root ambiguous. This PR removes the unstable `current_working_directory`
special path before the permissions API is stabilized, so callers use
`:project_roots` for symbolic project-root access.
## What changed
- Removes `FileSystemSpecialPath::CurrentWorkingDirectory` from protocol
and app-server protocol models, plus regenerated app-server
JSON/TypeScript schemas.
- Replaces internal `:cwd` permission entries with `:project_roots`
entries.
- Keeps the existing cwd-update behavior for legacy-shaped
workspace-write profiles, while removing the deleted
`CurrentWorkingDirectory` case from that compatibility path.
- Keeps `PermissionProfile::workspace_write()` as the reusable symbolic
workspace-write helper, with docs noting that `:project_roots` entries
resolve at enforcement time.
- Updates app-server docs/examples and approval UI labeling to stop
advertising `:cwd` as a permission token.
## Compatibility
Persisted rollout items may contain the old
`{"kind":"current_working_directory"}` tag from earlier experimental
`permissionProfile` snapshots. This PR keeps that tag as a
deserialize-only alias for `ProjectRoots { subpath: None }`, while
continuing to serialize only the new `project_roots` tag.
## Follow-up
This PR intentionally does not introduce an explicit project-root set on
`SessionConfiguration` or runtime sandbox resolution. Today, the
resolver still uses the active cwd as the single implicit project root.
A follow-up should model project roots separately from tool cwd so
`:project_roots` entries can resolve against the configured project
roots, and resolve to no entries when there are no project roots.
## Verification
- `cargo test -p codex-protocol permissions:: --lib`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-sandboxing -p codex-exec-server --lib`
- `cargo test -p codex-core session_configuration_apply_ --lib`
- `cargo test -p codex-app-server
command_exec_permission_profile_project_roots_use_command_cwd --test
all`
- `cargo test -p codex-tui
thread_read_session_state_does_not_reuse_primary_permission_profile
--lib`
- `cargo test -p codex-tui
preset_matching_accepts_workspace_write_with_extra_roots --lib`
- `cargo test -p codex-config --lib`
fixes#19486
### Problem
Right now dynamic deferred tools are filtered at normal-turn prompt
building time, rather than upstream while building the `ToolRouter`
itself. This causes issues because dynamic deferred tools are then
wrongly included in the router's `model_visible_specs`, which is what
the compaction request-building flow relies on.
### Fix
Move the dynamic deferred tool filtering to `ToolRouter` creation time
to solve this problem for every request that relies on `ToolRouter` for
`model_visible_specs`, which solves the issue generically.
### Tests
Added unit + integration tests to ensure dynamic deferred tools are
omitted from `model_visible_specs` and compaction request respectively.
Tested against live `/compact` endpoint; raw deferred dynamic tools
without `tool_search` returned `400` (current bug), while the filtered
payload (this fix) returns `200`.
## Why
`features.multi_agent_v2.max_concurrent_threads_per_session` is meant to
be the MultiAgentV2-specific session thread cap: it counts the root
thread and all open subagent threads. The previous implementation kept
this surface tied to `agents.max_threads`, which made it a global
subagent-only cap and allowed the legacy setting to coexist with
MultiAgentV2.
## What Changed
- Added `max_concurrent_threads_per_session` to
`[features.multi_agent_v2]` with default `4`.
- Removed the `[agents] max_concurrent_threads_per_session` alias to
`agents.max_threads`.
- When MultiAgentV2 is enabled, reject `agents.max_threads` and derive
the existing internal subagent slot limit as
`max_concurrent_threads_per_session - 1`.
- Regenerated `core/config.schema.json` and added coverage for the new
config semantics.
## Result
```
➜ codex git:(jif/clean-multi-agent-v2-config) codex -c features.multi_agent_v2.enabled=true -c features.multi_agent_v2.max_concurrent_threads_per_session=3
╭────────────────────────────────────────────────────╮
│ >_ OpenAI Codex (v0.0.0) │
│ │
│ model: gpt-5.5 xhigh fast /model to change │
│ directory: ~/code/codex │
╰────────────────────────────────────────────────────╯
Tip: Update Required - This version will no longer be supported starting May 8th. Please upgrade to the latest version (https://github.com/openai/codex/releases/latest) using your preferred package manager.
› Can you try to spawn 4 agents
• I’ll try to start four lightweight agents at once and report exactly what the runtime accepts.
• Spawned Russell [no-apps] (gpt-5.5 xhigh)
└ Spawn probe 1: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• Spawned Descartes [no-apps] (gpt-5.5 xhigh)
└ Spawn probe 2: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• Agent spawn failed
└ Spawn probe 3: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• Agent spawn failed
└ Spawn probe 4: reply briefly that you started, then wait for further instructions. Do not do any repo work.
• The runtime accepted the first two and rejected the next two with agent thread limit reached. I’m checking whether the two accepted probes have returned cleanly, then I’ll close them if needed.
```
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
After config and requirements store canonical profiles, exec requests
should not cache a derived `SandboxPolicy`. The cached legacy value can
drift from the richer profile state, and most execution paths already
have the filesystem and network runtime policies they need.
## What Changed
- Removes `sandbox_policy` from `codex_sandboxing::SandboxExecRequest`
and `codex_core::sandboxing::ExecRequest`.
- Adds an on-demand `ExecRequest::compatibility_sandbox_policy()` helper
for the Windows and legacy call sites that still need a `SandboxPolicy`
projection.
- Updates Windows filesystem override setup and unified exec policy
serialization to derive that compatibility policy at the boundary.
- Updates Unix escalation reruns and direct shell requests to
reconstruct exec requests from `PermissionProfile` plus runtime
filesystem/network policy, without carrying a cached legacy policy.
- Adjusts sandboxing manager tests to assert the effective profile
rather than the removed legacy field.
## Verification
- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-sandboxing manager`
- `cargo test -p codex-core
exec_server_params_use_env_policy_overlay_contract`
- `cargo test -p codex-core unix_escalation`
- `cargo test -p codex-core exec::tests`
- `cargo test -p codex-core sandboxing::tests`
## Why
Several execution paths still converted profile-backed permissions into
`SandboxPolicy` and then rebuilt runtime permissions from that legacy
shape. Those round trips are unnecessary after the preceding PRs and can
lose split filesystem semantics. Core approval and escalation should
carry the resolved profile directly.
## What Changed
- Removes `sandbox_policy` from `ResolvedPermissionProfile`; the
resolved permission object now carries the canonical `PermissionProfile`
directly.
- Updates exec-policy fallback, shell/unified-exec interception,
escalation reruns, and related tests to pass profiles instead of legacy
policies.
- Removes legacy additional-permission merge helpers that built an
effective `SandboxPolicy` before rebuilding runtime permissions.
- Keeps legacy projections only at compatibility boundaries that still
require `SandboxPolicy`, not in core permission computation.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19394).
* #19737
* #19736
* #19735
* #19734
* #19395
* __->__ #19394
## Why
Runtime decisions should not infer permissions from the lossy legacy
sandbox projection once `PermissionProfile` is available. In particular,
`Disabled` and `External` need to remain distinct, and managed profiles
with split filesystem or deny-read rules should not be collapsed before
approval, network, safety, or analytics code makes decisions.
## What Changed
- Changes managed network proxy setup and network approval logic to use
`PermissionProfile` when deciding whether a managed sandbox is active.
- Migrates patch safety, Guardian/user-shell approval paths, Landlock
helper setup, analytics sandbox classification, and selected
turn/session code to profile-backed permissions.
- Validates command-level profile overrides against the constrained
`PermissionProfile` rather than a strict `SandboxPolicy` round trip.
- Preserves configured deny-read restrictions when command profiles are
narrowed.
- Adds coverage for profile-backed trust, network proxy/approval
behavior, patch safety, analytics classification, and command-profile
narrowing.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393).
* #19395
* #19394
* __->__ #19393
## Why
Config loading had become split across crates: `codex-config` owned the
config types and merge logic, while `codex-core` still owned the loader
that assembled the layer stack. This change consolidates that
responsibility in `codex-config`, so the crate that defines config
behavior also owns how configs are discovered and loaded.
To make that move possible without reintroducing the old dependency
cycle, the shell-environment policy types and helpers that
`codex-exec-server` needs now live in `codex-protocol` instead of
flowing through `codex-config`.
This also makes the migrated loader tests more deterministic on machines
that already have managed or system Codex config installed by letting
tests override the system config and requirements paths instead of
reading the host's `/etc/codex`.
## What Changed
- moved the config loader implementation from `codex-core` into
`codex-config::loader` and deleted the old `core::config_loader` module
instead of leaving a compatibility shim
- moved shell-environment policy types and helpers into
`codex-protocol`, then updated `codex-exec-server` and other downstream
crates to import them from their new home
- updated downstream callers to use loader/config APIs from
`codex-config`
- added test-only loader overrides for system config and requirements
paths so loader-focused tests do not depend on host-managed config state
- cleaned up now-unused dependency entries and platform-specific cfgs
that were surfaced by post-push CI
## Testing
- `cargo test -p codex-config`
- `cargo test -p codex-core config_loader_tests::`
- `cargo test -p codex-protocol -p codex-exec-server -p
codex-cloud-requirements -p codex-rmcp-client --lib`
- `cargo test --lib -p codex-app-server-client -p codex-exec`
- `cargo test --no-run --lib -p codex-app-server`
- `cargo test -p codex-linux-sandbox --lib`
- `cargo shear`
- `just bazel-lock-check`
## Notes
- I did not chase unrelated full-suite failures outside the migrated
loader surface.
- `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive
failures on this machine, and Windows CI still shows unrelated
long-running/timeouting test noise outside the loader migration itself.
## Why
After #19391, `PermissionProfile` and the split filesystem/network
policies could still be stored in parallel. That creates drift risk: a
profile can preserve deny globs, external enforcement, or split
filesystem entries while a cached projection silently loses those
details. This PR makes the profile the runtime source and derives
compatibility views from it.
## What Changed
- Removes stored filesystem/network sandbox projections from
`Permissions` and `SessionConfiguration`; their accessors now derive
from the canonical `PermissionProfile`.
- Derives legacy `SandboxPolicy` snapshots from profiles only where an
older API still needs that field.
- Updates MCP connection and elicitation state to track
`PermissionProfile` instead of `SandboxPolicy` for auto-approval
decisions.
- Adds semantic filesystem-policy comparison so cwd changes can preserve
richer profiles while still recognizing equivalent legacy projections
independent of entry ordering.
- Updates config/session tests to assert profile-derived projections
instead of parallel stored fields.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392).
* #19395
* #19394
* #19393
* __->__ #19392
## Why
This supersedes #19391. During stack repair, GitHub marked #19391 as
merged into a temporary stack branch rather than into `main`, so the
runtime-config change needed a fresh PR.
`PermissionProfile` is now the canonical permissions shape after #19231
because it can distinguish `Managed`, `Disabled`, and `External`
enforcement while also carrying filesystem rules that legacy
`SandboxPolicy` cannot represent cleanly. Core config and session state
still needed to accept profile-backed permissions without forcing every
profile through the strict legacy bridge, which rejected valid runtime
profiles such as direct write roots.
The unrelated CI/test hardening that previously rode along with this PR
has been split into #19683 so this PR stays focused on the permissions
model migration.
## What Changed
- Adds `Permissions.permission_profile` and
`SessionConfiguration.permission_profile` as constrained runtime state,
while keeping `sandbox_policy` as a legacy compatibility projection.
- Introduces profile setters that keep `PermissionProfile`, split
filesystem/network policies, and legacy `SandboxPolicy` projections
synchronized.
- Uses a compatibility projection for requirement checks and legacy
consumers instead of rejecting profiles that cannot round-trip through
`SandboxPolicy` exactly.
- Updates config loading, config overrides, session updates, turn
context plumbing, prompt permission text, sandbox tags, and exec request
construction to carry profile-backed runtime permissions.
- Preserves configured deny-read entries and `glob_scan_max_depth` when
command/session profiles are narrowed.
- Adds `PermissionProfile::read_only()` and
`PermissionProfile::workspace_write()` presets that match legacy
defaults.
## Verification
- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606).
* #19395
* #19394
* #19393
* #19392
* __->__ #19606
## Why
`sandbox_permissions = "require_escalated"` is treated as an explicit
request to approve the command and run it outside the
filesystem/platform sandbox. Before this change, shell and unified exec
still registered managed network approval context and could inject
Codex-managed proxy state into the child process, which meant an
approved escalated command could still hit a second network approval
path.
This PR makes that escalation boundary consistent: once a command is
explicitly approved to run outside the sandbox, Codex does not also
route that process through the managed network proxy.
## Security impact
Command/filesystem sandbox approval now implies network approval for
that command. If an untrusted command or script is allowed to run with
`require_escalated`, its network calls are unsandboxed: Codex-managed
network allowlists and denylists are not respected for that process, so
the command can exfiltrate any data it can read.
## What changed
- Skip managed network approval specs for
`SandboxPermissions::RequireEscalated`.
- Pass `network: None` into shell, zsh-fork shell, and unified exec
sandbox preparation for explicitly escalated requests.
- Strip Codex-managed proxy environment variables when
`CODEX_NETWORK_PROXY_ACTIVE` is present, while preserving user proxy env
when the Codex marker is absent.
- Add regression coverage for the prepared exec request so the old
behavior cannot silently reappear.
## Verification
- `cargo test -p codex-core explicit_escalation`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
Adds the core runtime behavior for active goals on top of the model
tools from PR 3.
## Why
A long-running goal should be a core runtime concern, not something
every client has to implement. Core owns the turn lifecycle, tool
completion boundaries, interruptions, resume behavior, and token usage,
so it is the right place to account progress, enforce budgets, and
decide when to continue work.
## What changed
- Centralized goal lifecycle side effects behind
`Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
- Starts goal continuation turns only when the session is idle; pending
user input and mailbox work take priority.
- Accounts token and wall-clock usage at turn, tool, mutation,
interrupt, and resume boundaries; `get_thread_goal` remains read-only.
- Preserves sub-second wall-clock remainder across accounting boundaries
so long-running goals do not drift downward over time.
- Treats token budget exhaustion as a soft stop by marking the goal
`budget_limited` and injecting wrap-up steering instead of aborting the
active turn.
- Suppresses budget steering when `update_goal` marks a goal complete.
- Pauses active goals on interrupt and auto-reactivates paused goals
when a thread resumes outside plan mode.
- Suppresses repeated automatic continuation when a continuation turn
makes no tool calls.
- Added continuation and budget-limit prompt templates.
## Verification
- Added focused core coverage for continuation scheduling, accounting
boundaries, budget-limit steering, completion accounting, interrupt
pause behavior, resume auto-activation, and wall-clock remainder
accounting.
Adds the model-facing goal tools on top of the app-server API from PR 2.
## Why
Once goals are persisted and exposed to clients, the model needs a
small, constrained tool surface for goal workflows. The tool contract
should let the model inspect goals, create them only when explicitly
requested, and mark them complete without giving it broad control over
user/runtime-owned state.
## What changed
- Added `get_goal`, `create_goal`, and `update_goal` tool specs behind
the `goals` feature flag.
- Added core goal tool handlers that validate objectives and token
budgets before mutating persisted state.
- Constrained `create_goal` to create only when no goal exists, with
optional `token_budget` only when a budget is explicitly provided.
- Tightened the `create_goal` instructions so the model does not infer
goals from ordinary task requests.
- Constrained `update_goal` to expose only goal completion; pause,
resume, clear, and budget-limited transitions remain user- or
runtime-controlled.
- Registered the goal tools in the tool registry and kept them out of
review contexts where they should not appear.
## Verification
- Added tool-registry coverage for feature gating and tool availability.
- Added core session tests for create/get/update behavior, duplicate
goal rejection, budget validation, and completion-only updates.
## Why
`ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`:
`FullAccess` meant the historical read-only/workspace-write modes could
read the full filesystem, while `Restricted` tried to carry partial
readable roots. The partial-read model now belongs in
`FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on
`SandboxPolicy` makes every legacy projection reintroduce lossy
read-root bookkeeping and creates unnecessary noise in the rest of the
permissions migration.
This PR makes the legacy policy model narrower and explicit:
`SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent
the old full-read sandbox modes only. Split readable roots, deny-read
globs, and platform-default/minimal read behavior stay in the runtime
permissions model.
## What changed
- Removes `ReadOnlyAccess` from
`codex_protocol::protocol::SandboxPolicy`, including the generated
`access` and `readOnlyAccess` API fields.
- Updates legacy policy/profile conversions so restricted filesystem
reads are represented only by `FileSystemSandboxPolicy` /
`PermissionProfile` entries.
- Keeps app-server v2 compatible with legacy `fullAccess` read-access
payloads by accepting and ignoring that no-op shape, while rejecting
legacy `restricted` read-access payloads instead of silently widening
them to full-read legacy policies.
- Carries Windows sandbox platform-default read behavior with an
explicit override flag instead of depending on
`ReadOnlyAccess::Restricted`.
- Refreshes generated app-server schema/types and updates tests/docs for
the simplified legacy policy shape.
## Verification
- `cargo check -p codex-app-server-protocol --tests`
- `cargo check -p codex-windows-sandbox --tests`
- `cargo test -p codex-app-server-protocol sandbox_policy_`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19449
## Why
The profile conversion path still required a `cwd` even when it was only
translating a legacy `SandboxPolicy` into a `PermissionProfile`. That
made profile producers invent an ambient `cwd`, which is exactly the
anchoring we are trying to remove from permission-profile data. A legacy
workspace-write policy can be represented symbolically instead: `:cwd =
write` plus read-only `:project_roots` metadata subpaths.
This PR creates that cwd-free base so the rest of the stack can stop
threading cwd through profile construction. Callers that actually need a
concrete runtime filesystem policy for a specific cwd still have an
explicitly named cwd-bound conversion.
## What Changed
- `PermissionProfile::from_legacy_sandbox_policy` now takes only
`&SandboxPolicy`.
- `FileSystemSandboxPolicy::from_legacy_sandbox_policy` is now the
symbolic, cwd-free projection for profiles.
- The old concrete projection is retained as
`FileSystemSandboxPolicy::from_legacy_sandbox_policy_for_cwd` for
runtime/boundary code that must materialize legacy cwd behavior.
- Workspace-write profiles preserve `CurrentWorkingDirectory` and
`ProjectRoots` special entries instead of materializing cwd into
absolute paths.
## Verification
- `cargo check -p codex-protocol -p codex-core -p
codex-app-server-protocol -p codex-app-server -p codex-exec -p
codex-exec-server -p codex-tui -p codex-sandboxing -p
codex-linux-sandbox -p codex-analytics --tests`
- `just fix -p codex-protocol -p codex-core -p codex-app-server-protocol
-p codex-app-server -p codex-exec -p codex-exec-server -p codex-tui -p
codex-sandboxing -p codex-linux-sandbox -p codex-analytics`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19414).
* #19395
* #19394
* #19393
* #19392
* #19391
* __->__ #19414
## Why
Agent interruptions currently always persist a model-visible
interrupted-turn marker before emitting `TurnAborted`. That marker is
useful by default because it gives the next model turn context about a
deliberately interrupted task, but some deployments need to suppress
that history injection entirely while still keeping the client-visible
interruption event.
## What changed
- Add `[agents] interrupt_message = false` to disable the model-visible
interrupted-turn marker.
- Resolve the setting into `Config::agent_interrupt_message_enabled`,
defaulting to `true` so existing behavior is unchanged.
- Apply the setting to both live interrupted turns and interrupted fork
snapshots.
- Keep emitting `TurnAborted` even when the history marker is disabled.
- Regenerate `core/config.schema.json` for the new
`agents.interrupt_message` field.
## Testing
- `cargo test -p codex-core load_config_resolves_agent_interrupt_message
-- --nocapture`
- `cargo test -p codex-core
disabled_interrupted_fork_snapshot_appends_only_interrupt_event --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_developer_input_message --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_can_disable_interrupted_marker --
--nocapture`
- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo check -p codex-core`
## Summary
- Thread `agent_max_threads` into `ToolsConfig` and
`SpawnAgentToolOptions`.
- Render the configured `max_concurrent_threads_per_session` value in
the MultiAgentV2 `spawn_agent` description.
- Cover the description text in `codex-tools` unit tests and
`codex-core` tool spec tests.
## Validation
- `just fmt`
- `cargo test -p codex-tools`
- `cargo test -p codex-core spawn_agent_description`
- `git diff --check`
## Notes
- `cargo test -p codex-core` was also attempted, but unrelated
environment-sensitive tests failed with the active local environment.
Examples: approvals reviewer defaults observed `AutoReview` instead of
`User`, request-permissions event tests did not emit events, and
proxy-env tests saw `http://127.0.0.1:50604` from the active proxy
environment.
Co-authored-by: Codex <noreply@openai.com>
## Why
`MultiAgentV2` follow-up messages are delivered to agents as
assistant-authored `InterAgentCommunication` envelopes. When
`followup_task` used `interrupt: true`, the interrupted-turn guidance
was still persisted as a contextual user message, so model-visible
history made a system-generated interruption boundary look
user-authored.
This keeps interruption guidance consistent with the rest of the v2
inter-agent message stream while preserving the legacy marker shape for
non-v2 sessions.
## What changed
- Make `interrupted_turn_history_marker` feature-aware.
- Record the interrupted-turn marker as an assistant `OutputText`
message when `Feature::MultiAgentV2` is enabled.
- Keep the existing user contextual fragment for non-v2 sessions.
- Apply the same feature-aware marker to interrupted fork snapshots.
- Add coverage for the live `followup_task` interrupt path and the
helper-level v2 marker shape.
## Testing
- `cargo test -p codex-core
multi_agent_v2_followup_task_interrupts_busy_child_without_losing_message
-- --nocapture`
- `cargo test -p codex-core
multi_agent_v2_interrupted_marker_uses_assistant_output_message --
--nocapture`
- `cargo test -p codex-core interrupted_fork_snapshot -- --nocapture`
we were not respecting turn's `truncation_policy` to clamp output tokens
for `unified_exec` and `write_stdin`.
this meant truncation was only being applied by `ContextManager` before
the output was stored in-memory (so it _was_ being truncated from
model-visible context), but the full output was persisted to rollout on
disk.
now we respect that `truncation_policy` and `ContextManager`-level
truncation remains a backup.
### Tests
added tests, tested locally.