Add thread/start environment_id plumbing and explicit local/remote selection support while preserving the current default environment behavior. Include focused exec-server and app-server coverage for the new thread environment selection paths.
Co-authored-by: Codex <noreply@openai.com>
# Why
Add product analytics for hook handler executions so we can understand
which hooks are running, where they came from, and whether they
completed, failed, stopped, or blocked work.
# What
- add the new `codex_hook_run` analytics event and payload plumbing in
`codex-rs/analytics`
- emit hook-run analytics from the shared hook completion path in
`codex-rs/core`
- classify hook source from the loaded hook path as `system`, `user`,
`project`, or `unknown`
```
{
"event_type": "codex_hook_run",
"event_params": {
"thread_id": "string",
"turn_id": "string",
"model_slug": "string",
"hook_name": "string, // any HookEventName
"hook_source": "system | user | project | unknown",
"status": "completed | failed | stopped | blocked"
}
}
```
---------
Co-authored-by: Codex <noreply@openai.com>
This was flagged by the Codex Security tool: the `state` lock was held
longer than necessary, which included being held across an `async` call,
increasing the potential for deadlock.
While this was flagged by the Codex Security tool, I will look into
enabling
https://rust-lang.github.io/rust-clippy/stable/index.html#await_holding_lock
in a follow-up PR (though unfortunately, that Clippy rule claims it
reports false positives when `drop()` is used to drop a guard instead of
using the end of block scope to drop). Though I can't seem to find a
Clippy rule that checks for opportunities to drop a guard as soon as it
is no longer referenced, in general.
**Summary**
- prevent managed requirements.toml network settings from leaking into
DangerFullAccess / yolo turns by gating managed proxy attachment on
sandbox mode
- keep guardian/sandboxed modes on the managed proxy path, while making
true yolo bypass the proxy entirely, including /shell full-access
commands
## Summary
- wrap realtime startup context in
`<startup_context>...</startup_context>` tags
- prefix V2 mirrored user text and relayed backend text with `[USER]` /
`[BACKEND]`
- remove the V2 progress suffix and replace the final V2 handoff output
with a short completion acknowledgement while preserving the existing V1
wrapper
## Testing
- cargo test -p codex-api
realtime_v2_session_update_includes_background_agent_tool_and_handoff_output_item
-- --exact
- cargo test -p codex-app-server webrtc_v2_background_agent_
- cargo test -p codex-app-server webrtc_v2_text_input_is_
- cargo test -p codex-core conversation_user_text_turn_is_
## Why
#17763 moved sandbox-state delivery for MCP tool calls to request
`_meta` via the `codex/sandbox-state-meta` experimental capability.
Keeping the older `codex/sandbox-state` capability meant Codex still
maintained a second transport that pushed updates with the custom
`codex/sandbox-state/update` request at server startup and when the
session sandbox policy changed.
That duplicate MCP path is redundant with the per-tool-call metadata
path and makes the sandbox-state contract larger than needed. The
existing managed network proxy refresh on sandbox-policy changes is
still needed, so this keeps that behavior separate from the removed MCP
notification.
## What Changed
- Removed the exported `MCP_SANDBOX_STATE_CAPABILITY` and
`MCP_SANDBOX_STATE_METHOD` constants.
- Removed detection of `codex/sandbox-state` during MCP initialization
and stopped sending `codex/sandbox-state/update` at server startup.
- Removed the `McpConnectionManager::notify_sandbox_state_change`
plumbing while preserving the managed network proxy refresh when a user
turn changes sandbox policy.
- Slimmed `McpConnectionManager::new` so startup paths pass only the
initial `SandboxPolicy` needed for MCP elicitation state.
- Kept `codex/sandbox-state-meta` support intact; servers that opt in
still receive the current `SandboxState` on tool-call request `_meta`
([remaining call
path](ff2d3c1e72/codex-rs/core/src/mcp_tool_call.rs (L487-L526))).
- Added regression coverage for refreshing the live managed network
proxy on a per-turn sandbox-policy change.
## Verification
- `cargo test -p codex-core
new_turn_refreshes_managed_network_proxy_for_sandbox_change`
- `cargo test -p codex-mcp`
Builds on top of #17659
Move the filesystem + sqlite thread listing-related operations inside of
a local ThreadStore implementation and call ThreadStore from the places
that used to perform these filesystem/sqlite operations.
This is the first of a series of PRs that will implement the rest of the
local ThreadStore.
Testing:
- added unit tests for the thread store implementation
- adjusted some unit tests in the realtime + personality packages whose
callsites changed. Specifically I'm trying to hide ThreadMetadata inside
of the local implementation and make ThreadMetadata a sqlite
implementation detail concern rather than a public interface, preferring
the more generate StoredThread interface instead
- added a corner case test for the personality migration package that
wasn't covered by the existing test suite
- adjust the behavior of searched thread listing to run the existing
local rollout repair/backfill pass _before_ querying SQLite results, so
callers using ThreadStore::list_threads do not miss matches after a
partial metadata warm-up
## Summary
Stack PR 2 of 4 for feature-gated agent identity support.
This PR adds agent identity registration behind
`features.use_agent_identity`. It keeps the app-server protocol
unchanged and starts registration after ChatGPT auth exists rather than
requiring a client restart.
## Stack
- PR1: https://github.com/openai/codex/pull/17385 - add
`features.use_agent_identity`
- PR2: https://github.com/openai/codex/pull/17386 - this PR
- PR3: https://github.com/openai/codex/pull/17387 - register agent tasks
when enabled
- PR4: https://github.com/openai/codex/pull/17388 - use `AgentAssertion`
downstream when enabled
## Validation
Covered as part of the local stack validation pass:
- `just fmt`
- `cargo test -p codex-core --lib agent_identity`
- `cargo test -p codex-core --lib agent_assertion`
- `cargo test -p codex-core --lib websocket_agent_task`
- `cargo test -p codex-api api_bridge`
- `cargo build -p codex-cli --bin codex`
## Notes
The full local app-server E2E path is still being debugged after PR
creation. The current branch stack is directionally ready for review
while that follow-up continues.
stacked on #17402.
MCP tools returned by `tool_search` (deferred tools) get registered in
our `ToolRegistry` with a different format than directly available
tools. this leads to two different ways of accessing MCP tools from our
tool catalog, only one of which works for each. fix this by registering
all MCP tools with the namespace format, since this info is already
available.
also, direct MCP tools are registered to responsesapi without a
namespace, while deferred MCP tools have a namespace. this means we can
receive MCP `FunctionCall`s in both formats from namespaces. fix this by
always registering MCP tools with namespace, regardless of deferral
status.
make code mode track `ToolName` provenance of tools so it can map the
literal JS function name string to the correct `ToolName` for
invocation, rather than supporting both in core.
this lets us unify to a single canonical `ToolName` representation for
each MCP tool and force everywhere to use that one, without supporting
fallbacks.
## Summary
Adds `thread_source` field to the existing Codex turn metadata sent to
Responses API
- Sends `thread_source: "user"` for user-initiated sessions: CLI, VS
Code, and Exec
- Sends `thread_source: "subagent"` for subagent sessions
- Omits `thread_source` for MCP, custom, and unknown session sources
- Uses the existing turn metadata transport:
- HTTP requests send through the `x-codex-turn-metadata` header
- WebSocket `response.create` requests send through
`client_metadata["x-codex-turn-metadata"]`
## Testing
- `cargo test -p codex-protocol
session_source_thread_source_name_classifies_user_and_subagent_sources`
- `cargo test -p codex-core turn_metadata_state`
- `cargo test -p codex-core --test responses_headers
responses_stream_includes_turn_metadata_header_for_git_workspace_e2e --
--nocapture`
This changes multi-agent v2 mailbox handling so incoming inter-agent
messages no longer preempt an in-flight sampling stream at reasoning or
commentary output-item boundaries.
## Why
For more advanced MCP usage, we want the model to be able to emit
parallel MCP tool calls and have Codex execute eligible ones
concurrently, instead of forcing all MCP calls through the serial block.
The main design choice was where to thread the config. I made this
server-level because parallel safety depends on the MCP server
implementation. Codex reads the flag from `mcp_servers`, threads the
opted-in server names into `ToolRouter`, and checks the parsed
`ToolPayload::Mcp { server, .. }` at execution time. That avoids relying
on model-visible tool names, which can be incomplete in
deferred/search-tool paths or ambiguous for similarly named
servers/tools.
## What was added
Added `supports_parallel_tool_calls` for MCP servers.
Before:
```toml
[mcp_servers.docs]
command = "docs-server"
```
After:
```toml
[mcp_servers.docs]
command = "docs-server"
supports_parallel_tool_calls = true
```
MCP calls remain serial by default. Only tools from opted-in servers are
eligible to run in parallel. Docs also now warn to enable this only when
the server’s tools are safe to run concurrently, especially around
shared state or read/write races.
## Testing
Tested with a local stdio MCP server exposing real delay tools. The
model/Responses side was mocked only to deterministically emit two MCP
calls in the same turn.
Each test called `query_with_delay` and `query_with_delay_2` with `{
"seconds": 25 }`.
| Build/config | Observed | Wall time |
| --- | --- | --- |
| main with flag enabled | serial | `58.79s` |
| PR with flag enabled | parallel | `31.73s` |
| PR without flag | serial | `56.70s` |
PR with flag enabled showed both tools start before either completed;
main and PR-without-flag completed the first delay before starting the
second.
Also added an integration test.
Additional checks:
- `cargo test -p codex-tools` passed
- `cargo test -p codex-core
mcp_parallel_support_uses_exact_payload_server` passed
- `git diff --check` passed
Cap mirrored user text sent to realtime with the existing 300-token turn
budget while preserving the full model turn.
Adds integration coverage for capped realtime mirror payloads.
---------
Co-authored-by: Codex <noreply@openai.com>
Addresses #16255
Problem: Incomplete Responses streams could leave completed custom tool
outputs out of cleanup and retry prompts, making persisted history
inconsistent and retries stale.
Solution: Route stream and output-item errors through shared cleanup,
and rebuild retry prompts from fresh session history after the first
attempt.
- Let typed user messages submit while realtime is active and mirror
accepted text into the realtime text stream.
- Add integration coverage and snapshot for outbound realtime text.
## Description
This PR introduces `review_id` as the stable identifier for guardian
reviews and exposes it in app-server `item/autoApprovalReview/started`
and `item/autoApprovalReview/completed` events.
Internally, guardian rejection state is now keyed by `review_id` instead
of the reviewed tool item ID. `target_item_id` is still included when a
review maps to a concrete thread item, but it is no longer overloaded as
the review lifecycle identifier.
## Motivation
We'd like to give users the ability to preempt a guardian review while
it's running (approve or decline).
However, we can't implement the API that allows the user to override a
running guardian review because we didn't have a unique `review_id` per
guardian review. Using `target_item_id` is not correct since:
- with execve reviews, there can be multiple execve calls (and therefore
guardian reviews) per shell command
- with network policy reviews, there is no target item ID
The PR that actually implements user overrides will use `review_id` as
the stable identifier.
## Motivation
The `SessionStart` hook already receives `startup` and `resume` sources,
but sessions created from `/clear` previously looked like normal startup
sessions. This makes it impossible for hook authors to distinguish
between these with the matcher.
## Summary
- Add `InitialHistory::Cleared` so `/clear`-created sessions can be
distinguished from ordinary startup sessions.
- Add `SessionStartSource::Clear` and wire it through core, app-server
thread start params, and TUI clear-session flow.
- Update app-server protocol schemas, generated TypeScript, docs, and
related tests.
https://github.com/user-attachments/assets/9cae3cb4-41c7-4d06-b34f-966252442e5c
The rollout writer now keeps an owned/monitored task handle, returns
real Result acks for flush/persist/shutdown, retries failed flushes by
reopening the rollout file, and keeps buffered items until they are
successfully written. Session flushes are now real durability barriers
for fork/rollback/read-after-write paths, while turn completion surfaces
a warning if the rollout still cannot be saved after recovery.
## Summary
This PR adds the parent conversation/session id to the subagent-start
analytics event for Guardian subagents.
Previously, Guardian sessions were emitted as subagent
thread-initialized events, but their `parent_thread_id` was serialized
as `null`. After this change, the `codex_thread_initialized` analytics
event for a Guardian child session includes the parent user conversation
id.
## Summary
- Replace the manual `/notify-owner` flow with an inline confirmation
prompt when a usage-based workspace member hits a credits-depleted
limit.
- Fetch the current workspace role from the live ChatGPT
`accounts/check/v4-2023-04-27` endpoint so owner/member behavior matches
the desktop and web clients.
- Keep owner, member, and spend-cap messaging distinct so we only offer
the owner nudge when the workspace is actually out of credits.
## What Changed
- `backend-client`
- Added a typed fetch for the current account role from
`accounts/check`.
- Mapped backend role values into a Rust workspace-role enum.
- `app-server` and protocol
- Added `workspaceRole` to `account/read` and `account/updated`.
- Derived `isWorkspaceOwner` from the live role, with a fallback to the
cached token claim when the role fetch is unavailable.
- `tui`
- Removed the explicit `/notify-owner` slash command.
- When a member is blocked because the workspace is out of credits, the
error now prompts:
- `Your workspace is out of credits. Request more from your workspace
owner? [y/N]`
- Choosing `y` sends the existing owner-notification request.
- Choosing `n`, pressing `Esc`, or accepting the default selection
dismisses the prompt without sending anything.
- Selection popups now honor explicit item shortcuts, which is how the
`y` / `n` interaction is wired.
## Reviewer Notes
- The main behavior change is scoped to usage-based workspace members
whose workspace credits are depleted.
- Spend-cap reached should not show the owner-notification prompt.
- Owners and admins should continue to see `/usage` guidance instead of
the member prompt.
- The live role fetch is best-effort; if it fails, we fall back to the
existing token-derived ownership signal.
## Testing
- Manual verification
- Workspace owner does not see the member prompt.
- Workspace member with depleted credits sees the confirmation prompt
and can send the nudge with `y`.
- Workspace member with spend cap reached does not see the
owner-notification prompt.
### Workspace member out of usage
https://github.com/user-attachments/assets/341ac396-eff4-4a7f-bf0c-60660becbea1
### Workspace owner
<img width="1728" height="1086" alt="Screenshot 2026-04-09 at 11 48
22 AM"
src="https://github.com/user-attachments/assets/06262a45-e3fc-4cc4-8326-1cbedad46ed6"
/>
## Summary
- omit serialized Responses instructions when an app-server base
instruction override is empty
- skip empty developer instruction messages and add v2 coverage for the
empty-override request shape
## Validation
- just fmt
- git diff --check
- [x] Expand tool search to custom MCPs.
- [x] Rename several variables/fields to be more generic.
Updated tool & server name lifecycles:
**Raw Identity**
ToolInfo.server_name is raw MCP server name.
ToolInfo.tool.name is raw MCP tool name.
MCP calls route back to raw via parse_tool_name() returning
(tool.server_name, tool.tool.name).
mcpServerStatus/list now groups by raw server and keys tools by
Tool.name: mod.rs:599
App-server just forwards that grouped raw snapshot:
codex_message_processor.rs:5245
**Callable Names**
On list-tools, we create provisional callable_namespace / callable_name:
mcp_connection_manager.rs:1556
For non-app MCP, provisional callable name starts as raw tool name.
For codex-apps, provisional callable name is sanitized and strips
connector name/id prefix; namespace includes connector name.
Then qualify_tools() sanitizes callable namespace + name to ASCII alnum
/ _ only: mcp_tool_names.rs:128
Note: this is stricter than Responses API. Hyphen is currently replaced
with _ for code-mode compatibility.
**Collision Handling**
We do initially collapse example-server and example_server to the same
base.
Then qualify_tools() detects distinct raw namespace identities behind
the same sanitized namespace and appends a hash to the callable
namespace: mcp_tool_names.rs:137
Same idea for tool-name collisions: hash suffix goes on callable tool
name.
Final list_all_tools() map key is callable_namespace + callable_name:
mcp_connection_manager.rs:769
**Direct Model Tools**
Direct MCP tool declarations use the full qualified sanitized key as the
Responses function name.
The raw rmcp Tool is converted but renamed for model exposure.
**Tool Search / Deferred**
Tool search result namespace = final ToolInfo.callable_namespace:
tool_search.rs:85
Tool search result nested name = final ToolInfo.callable_name:
tool_search.rs:86
Deferred tool handler is registered as "{namespace}:{name}":
tool_registry_plan.rs:248
When a function call comes back, core recombines namespace + name, looks
up the full qualified key, and gets the raw server/tool for MCP
execution: codex.rs:4353
**Separate Legacy Snapshot**
collect_mcp_snapshot_from_manager_with_detail() still returns a map
keyed by qualified callable name.
mcpServerStatus/list no longer uses that; it uses
McpServerStatusSnapshot, which is raw-inventory shaped.
## Summary
App-server v2 already receives turn-scoped `clientMetadata`, but the
Rust app-server was dropping it before the outbound Responses request.
This change keeps the fix lightweight by threading that metadata through
the existing turn-metadata path rather than inventing a new transport.
## What we're trying to do and why
We want turn-scoped metadata from the app-server protocol layer,
especially fields like Hermes/GAAS run IDs, to survive all the way to
the actual Responses API request so it is visible in downstream
websocket request logging and analytics.
The specific bug was:
- app-server protocol uses camelCase `clientMetadata`
- Responses transport already has an existing turn metadata carrier:
`x-codex-turn-metadata`
- websocket transport already rewrites that header into
`request.request_body.client_metadata["x-codex-turn-metadata"]`
- but the Rust app-server never parsed or stored `clientMetadata`, so
nothing from the app-server request was making it into that existing
path
This PR fixes that without adding a new header or a second metadata
channel.
## How we did it
### Protocol surface
- Add optional `clientMetadata` to v2 `TurnStartParams` and
`TurnSteerParams`
- Regenerate the JSON schema / TypeScript fixtures
- Update app-server docs to describe the field and its behavior
### Runtime plumbing
- Add a dedicated core op for app-server user input carrying turn-scoped
metadata: `Op::UserInputWithClientMetadata`
- Wire `turn/start` and `turn/steer` through that op / signature path
instead of dropping the metadata at the message-processor boundary
- Store the metadata in `TurnMetadataState`
### Transport behavior
- Reuse the existing serialized `x-codex-turn-metadata` payload
- Merge the new app-server `clientMetadata` into that JSON additively
- Do **not** replace built-in reserved fields already present in the
turn metadata payload
- Keep websocket behavior unchanged at the outer shape level: it still
sends only `client_metadata["x-codex-turn-metadata"]`, but that JSON
string now contains the merged fields
- Keep HTTP fallback behavior unchanged except that the existing
`x-codex-turn-metadata` header now includes the merged fields too
### Request shape before / after
Before, a websocket `response.create` looked like:
```json
{
"type": "response.create",
"client_metadata": {
"x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\"}"
}
}
```
Even if the app-server caller supplied `clientMetadata`, it was not
represented there.
After, the same request shape is preserved, but the serialized payload
now includes the new turn-scoped fields:
```json
{
"type": "response.create",
"client_metadata": {
"x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\",\"fiber_run_id\":\"fiber-start-123\",\"origin\":\"gaas\"}"
}
}
```
## Validation
### Targeted tests added / updated
- protocol round-trip coverage for `clientMetadata` on `turn/start` and
`turn/steer`
- protocol round-trip coverage for `Op::UserInputWithClientMetadata`
- `TurnMetadataState` merge test proving client metadata is added
without overwriting reserved built-in fields
- websocket request-shape test proving outbound `response.create`
contains merged metadata inside
`client_metadata["x-codex-turn-metadata"]`
- app-server integration tests proving:
- `turn/start` forwards `clientMetadata` into the outbound Responses
request path
- websocket warmup + real turn request both behave correctly
- `turn/steer` updates the follow-up request metadata
### Commands run
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `cargo test -p codex-core
turn_metadata_state_merges_client_metadata_without_replacing_reserved_fields
--lib`
- `cargo test -p codex-core --test all
responses_websocket_preserves_custom_turn_metadata_fields`
- `cargo test -p codex-app-server --test all client_metadata`
- `cargo test -p codex-app-server --test all
turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2
-- --nocapture`
- `just fmt`
- `just fix -p codex-core -p codex-protocol -p codex-app-server-protocol
-p codex-app-server`
- `just fix -p codex-exec -p codex-tui-app-server`
- `just argument-comment-lint`
### Full suite note
`cargo test` in `codex-rs` still fails in:
-
`suite::v2::turn_interrupt::turn_interrupt_resolves_pending_command_approval_request`
I verified that same failure on a clean detached `HEAD` worktree with an
isolated `CARGO_TARGET_DIR`, so it is not caused by this patch.
## Summary
- keep pending steered input buffered until the active user prompt has
received a model response
- keep steering pending across auto-compact when there is real
model/tool continuation to resume
- allow queued steering to follow compaction immediately when the prior
model response was already final
- keep pending-input follow-up owned by `run_turn` instead of folding it
into `SamplingRequestResult`
- add regression coverage for mid-turn compaction, final-response
compaction, and compaction triggered before the next request after tool
output
## Root Cause
Steered input was drained at the top of every `run_turn` loop. After
auto-compaction, the loop continued and immediately appended any pending
steer after the compact summary, making a queued prompt look like the
newest task instead of letting the model first resume interrupted
model/tool work.
## Implementation Notes
This patch keeps the follow-up signals separated:
- `SamplingRequestResult.needs_follow_up` means model/tool continuation
is needed
- `sess.has_pending_input().await` means queued user steering exists
- `run_turn` computes the combined loop condition from those two signals
In `run_turn`:
```rust
let has_pending_input = sess.has_pending_input().await;
let needs_follow_up = model_needs_follow_up || has_pending_input;
```
After auto-compact we choose whether the next request may drain
steering:
```rust
can_drain_pending_input = !model_needs_follow_up;
```
That means:
- model/tool continuation + pending steer: compact -> resume once
without draining steer
- completed model answer + pending steer: compact -> drain/send the
steer immediately
- fresh user prompt: do not drain steering before the model has answered
the prompt once
The drain is still only `sess.get_pending_input().await`; when
`can_drain_pending_input` is false, core uses an empty local vec and
leaves the steer pending in session state.
## Validation
- PASS `cargo test -p codex-core --test all steered_user_input --
--nocapture`
- PASS `just fmt`
- PASS `git diff --check`
- NOT PASSING HERE `just fix -p codex-core` currently stops before
linting this change on an unrelated mainline test-build error:
`core/src/tools/spec_tests.rs` initializes `ToolsConfigParams` without
`image_generation_tool_auth_allowed`; this PR does not touch that file.
Addresses #15943
Problem: Name-based resume could stop on a newer session_index entry
whose rollout was never persisted, shadowing an older saved thread with
the same name.
Solution: Materialize rollouts before indexing thread names and make
name lookup skip unresolved entries until it finds a persisted rollout.
Currently, when a MCP server sends an elicitation to Codex running in
Full Access (`sandbox_policy: DangerFullAccess` + `approval_policy:
Never`), the elicitations are auto-cancelled.
This PR updates the automatic handling of MCP elicitations to be
consistent with other approvals in full-access, where they are
auto-approved. Because MCP elicitations may actually require user input,
this mechanism is limited to empty form elicitations.
## Changeset
- Add policy helper shared with existing MCP tool call approval
auto-approve
- Update `ElicitationRequestManager` to auto-approve elicitations in
full access when `can_auto_accept_elicitation` is true.
- Add tests
Co-authored-by: Codex <noreply@openai.com>
**Disabling Image-Gen for Non-SIWC Codex Users**
We are only enabling image-gen feature for SIWC Codex users until there
comes a fix in ResponsesAPI to omit output from responses.completed, to
prevent the following issues:
1. websocket blows up due to heavier load (images) than before (text)
2. http parser streams through n^2 of n-base64 bytes (sum of base64s of
all images generated in turn) that causes long delays in
turn_completion.
## Summary
- Carry `AbsolutePathBuf` through tool cwd parsing/resolution instead of
resolving workdirs to raw `PathBuf`s.
- Type exec/sandbox request cwd fields as `AbsolutePathBuf` through
`ExecParams`, `ExecRequest`, `SandboxCommand`, and unified exec runtime
requests.
- Keep `PathBuf` conversions at external/event boundaries and update
existing tests/fixtures for the typed cwd.
## Validation
- `cargo check -p codex-core --tests`
- `cargo check -p codex-sandboxing --tests`
- `cargo test -p codex-sandboxing`
- `cargo test -p codex-core --lib tools::handlers::`
- `just fix -p codex-sandboxing`
- `just fix -p codex-core`
- `just fmt`
Full `codex-core` test suite was not run locally; per repo guidance I
kept local validation targeted.
Allow multi_agent_v2 features to have its own temporary configuration
under `[features.multi_agent_v2]`
```
[features.multi_agent_v2]
enabled = true
usage_hint_enabled = false
usage_hint_text = "Custom delegation guidance."
hide_spawn_agent_metadata = true
```
Absent `usage_hint_text` means use the default hint.
```
[features]
multi_agent_v2 = true
```
still works as the boolean shorthand.