Commit Graph

5767 Commits

Author SHA1 Message Date
pakrym-oai
17d552fb4d [codex] Remove external websocket session resets (#23384)
## Why

Compaction now installs replacement history inside the session, but the
turn and compaction callers were still reaching into
`ModelClientSession` to reset websocket transport state after that
install. That made a transport-level reset part of the compaction API
even though websocket incremental request selection already checks
whether the next request is a strict extension of the previous one and
falls back to a full `response.create` when it is not.
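
A minimal sketch of that selection rule, assuming a request can be modeled as an ordered list of input items. `RequestPlan` and `plan_request` are hypothetical names, not the actual `ModelClientSession` API:

```rust
/// Which websocket message the client would send for the next request
/// (hypothetical model, not the real client types).
#[derive(Debug, PartialEq)]
pub enum RequestPlan<'a> {
    /// Send only the items appended since the previous request.
    Incremental(&'a [String]),
    /// History diverged, or nothing was sent yet: send a full `response.create`.
    FullCreate(&'a [String]),
}

/// Chooses an incremental request only when `next` strictly extends `previous`.
pub fn plan_request<'a>(previous: Option<&[String]>, next: &'a [String]) -> RequestPlan<'a> {
    match previous {
        // Strict extension: identical prefix plus at least one new item.
        Some(prev) if next.len() > prev.len() && next.starts_with(prev) => {
            RequestPlan::Incremental(&next[prev.len()..])
        }
        // Anything else, including history replaced by compaction, falls back.
        _ => RequestPlan::FullCreate(next),
    }
}
```

Because compaction replaces the history prefix, the next request fails the `starts_with` check and takes the full `response.create` path on its own, which is why an explicit transport reset is redundant.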

## What changed

- Removed the compaction-side calls to `reset_websocket_session` from
`compact.rs` and `session/turn.rs`.
- Simplified pre-sampling and mid-turn compaction helpers so they return
`CodexResult<()>` instead of carrying a reset flag.
- Made `ModelClientSession::reset_websocket_session` private to
`client.rs`, leaving only the websocket timeout recovery path inside the
client as a caller.

## Validation

- `cargo test -p codex-core --test all
responses_websocket_creates_on_non_prefix`
- `cargo test -p codex-core --test all
steered_user_input_waits_for_model_continuation_after_mid_turn_compact`
- `cargo test -p codex-core --test all
pre_sampling_compact_runs_on_switch_to_smaller_context_model`
2026-05-19 01:13:38 +00:00
Michael Bolin
3fd79b7986 app-server: use profile ids in v2 permission params (#23360)
## Why

The v2 app-server permission profile fields are experimental, but the
previous migration kept a legacy object payload for profile selection.
That made clients aware of server-owned `activePermissionProfile`
metadata such as `extends`, and it kept a
`legacy_additional_writable_roots` path even though
`runtimeWorkspaceRoots` now owns runtime workspace-root selection.

This PR makes the client contract match the intended model: clients
select a permission profile by id, and the server resolves and reports
active profile provenance in response payloads.

Follow-up to #22611.

## What Changed

- Changed `thread/start`, `thread/resume`, `thread/fork`, and
`turn/start` permission profile selection to plain profile id strings.
- Changed `command/exec.permissionProfile` to a plain profile id string
for the same client/server ownership split.
- Removed `PermissionProfileSelectionParams` and the legacy `{ type:
"profile", modifications: [...] }` compatibility deserializer.
- Updated app-server, TUI, and `codex exec` call sites to send only ids,
while keeping `activePermissionProfile` as server response metadata.
- Updated app-server docs and schema fixtures for the revised
`command/exec.permissionProfile` shape.
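
A sketch of the resulting ownership split, using hypothetical types (`PermissionProfile`, `TurnStartParams`, an `ActivePermissionProfile` struct, and `resolve_profile` are illustrative, not the app-server protocol definitions). The client sends only an id; the server looks up the definition and reports provenance such as `extends`. Falling back to a `default_id` when no id is sent is an assumption of this sketch:

```rust
use std::collections::HashMap;

/// Server-owned profile definition; `extends` never appears in client params.
#[derive(Debug, Clone, PartialEq)]
pub struct PermissionProfile {
    pub id: String,
    pub extends: Option<String>,
}

/// Client params carry only a plain profile id string.
pub struct TurnStartParams {
    pub permission_profile: Option<String>,
}

/// Response metadata the server reports back to the client.
#[derive(Debug, PartialEq)]
pub struct ActivePermissionProfile {
    pub id: String,
    pub extends: Option<String>,
}

/// Resolves the selected id against server-side definitions.
pub fn resolve_profile(
    defs: &HashMap<String, PermissionProfile>,
    params: &TurnStartParams,
    default_id: &str,
) -> Result<ActivePermissionProfile, String> {
    let id = params.permission_profile.as_deref().unwrap_or(default_id);
    let def = defs
        .get(id)
        .ok_or_else(|| format!("unknown permission profile `{id}`"))?;
    Ok(ActivePermissionProfile { id: def.id.clone(), extends: def.extends.clone() })
}
```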

## Verification

- `cargo test -p codex-app-server-protocol`
- `RUST_MIN_STACK=8388608 cargo test -p codex-app-server`
- `cargo test -p codex-exec`
- `RUST_MIN_STACK=8388608 cargo test -p codex-tui`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23360).
* #23368
* __->__ #23360
2026-05-18 17:28:50 -07:00
marksteinbrick-oai
5696167fe8 [codex-analytics] preserve user thread source for exec threads (#23376)
## Why
- Follows #20949.
- That PR moved `thread_source` attribution from the reducer to
explicit caller-provided metadata.
- The `codex exec` path still omitted this metadata, leaving
exec-created threads without `thread_source`.


## What Changed
- Ensures exec threads are marked as user created (`thread_source =
"user"`)
- Preserves thread-source metadata in exec’s startup session event


## Verification
- Updated unit tests to validate exec `thread_source` propagation.
- `cargo +1.93.0 test -p codex-exec --manifest-path codex-rs/Cargo.toml`
- `cargo +1.93.1 build -p codex-cli --manifest-path codex-rs/Cargo.toml`
- Validated locally with a freshly built `codex exec` run:
  - Startup logs showed `thread_source: Some(User)`.
  - Rollout metadata recorded `"thread_source":"user"`.
2026-05-18 17:13:49 -07:00
Felipe Coury
a66712c95d fix(tui): warn on unsupported iTerm2 pet versions (#23371)
## Why

Older iTerm2 builds can be detected as supporting the image transport
that terminal pets use, but in practice they fail to render the pet flow
correctly. Instead of silently attempting image rendering, Codex should
tell the user that their iTerm2 version is too old and that upgrading is
the fix.

## What Changed

- gate iTerm2 pet auto-detection on version `3.6.0` or newer
- show a dedicated upgrade message for older or unknown iTerm2 versions
instead of the generic unsupported-terminal warning
- keep the existing generic unsupported-terminal path for non-iTerm
terminals
- add regression coverage for iTerm2 version parsing and the old-iTerm
warning path
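
A minimal sketch of the version gate, assuming the iTerm2 version has already been obtained as a dotted string such as `3.6.0`. `PetSupport`, `parse_version`, and `pet_support` are hypothetical names, and the sketch only distinguishes iTerm2 from "some other unsupported terminal":

```rust
/// Minimum iTerm2 version that renders terminal pets correctly (3.6.0 per this PR).
const MIN_ITERM2: (u32, u32, u32) = (3, 6, 0);

#[derive(Debug, PartialEq)]
pub enum PetSupport {
    Supported,
    /// iTerm2 detected, but the version is older than 3.6.0 or unknown.
    UpgradeITerm2,
    /// Not iTerm2: keep the generic unsupported-terminal message.
    UnsupportedTerminal,
}

/// Parses `3.6.0` or `3.5`; a non-numeric suffix on the patch component
/// (e.g. `3.6.0beta1`) is ignored, and a missing patch counts as 0.
pub fn parse_version(raw: &str) -> Option<(u32, u32, u32)> {
    let mut parts = raw.trim().split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts
        .next()
        .map(|p| p.chars().take_while(|c| c.is_ascii_digit()).collect::<String>())
        .and_then(|digits| digits.parse().ok())
        .unwrap_or(0);
    Some((major, minor, patch))
}

/// Unknown or unparsable iTerm2 versions get the upgrade message, not image rendering.
pub fn pet_support(is_iterm2: bool, iterm2_version: Option<&str>) -> PetSupport {
    if !is_iterm2 {
        return PetSupport::UnsupportedTerminal;
    }
    match iterm2_version.and_then(parse_version) {
        // Tuples compare lexicographically: major, then minor, then patch.
        Some(v) if v >= MIN_ITERM2 => PetSupport::Supported,
        _ => PetSupport::UpgradeITerm2,
    }
}
```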

## How to Test

1. Start Codex in iTerm2 3.6 or newer.
2. Run `/pets`.
3. Confirm the pets picker opens instead of showing a warning.
4. Start Codex in an older iTerm2 build, or exercise the equivalent test
path.
5. Run `/pets`.
6. Confirm Codex warns that pets require iTerm2 3.6 or newer and tells
the user to upgrade.
7. Also verify that a non-iTerm unsupported terminal still shows the
generic unsupported-terminal message.

Targeted tests:
- `cargo test -p codex-terminal-detection`
- `cargo test -p codex-tui pets::`
- `cargo test -p codex-tui slash_pets_on_unsupported_terminal`
- `cargo test -p codex-tui slash_pets_on_old_iterm2`
2026-05-18 20:24:09 -03:00
pakrym-oai
afa0101ae2 [codex] Move pending input into input queue (#22728)
## Why

Pending model input was split across `Session`, `TurnState`, and the
agent mailbox. That made it easy for new paths to manage queued user
input or mailbox delivery outside the intended ownership boundary.

This PR consolidates the model-facing input lifecycle behind the session
input queue so turn-local pending input, next-turn queued items, and
mailbox delivery coordination are owned in one place.

## What Changed

- Added `session/input_queue.rs` to own pending input queues and mailbox
delivery coordination.
- Removed the standalone `agent/mailbox.rs` channel wrapper and store
mailbox items directly in the input queue.
- Moved pending-input mutations off `TurnState`; for now, `TurnState`
exposes the queue-owned storage directly.
- Routed abort cleanup, mailbox delivery phase changes, next-turn queued
items, and active-turn pending input through `InputQueue`.
- Boxed stack-heavy agent resume/fork startup futures that the refactor
pushed over the default test stack.
- Updated session, task, goal, stream-event, and multi-agent call sites
and tests to use the new queue ownership.
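
A rough sketch of a single owner for the three kinds of pending input. The struct layout, the `mailbox_open` delivery flag, and the abort policy (drop turn-local input, keep next-turn and mailbox items) are assumptions for illustration, not the behavior of `session/input_queue.rs`:

```rust
use std::collections::VecDeque;

/// Hypothetical model input item.
#[derive(Debug, Clone, PartialEq)]
pub enum InputItem {
    User(String),
    Mailbox(String),
}

/// One owner for model-facing pending input (illustrative field names).
#[derive(Default)]
pub struct InputQueue {
    /// Input injected into the currently running turn.
    active_turn: VecDeque<InputItem>,
    /// Items queued to start the next turn.
    next_turn: VecDeque<InputItem>,
    /// Messages from other agents, stored directly in the queue.
    mailbox: VecDeque<InputItem>,
    /// Delivery phase: whether mailbox items may enter the active turn.
    mailbox_open: bool,
}

impl InputQueue {
    pub fn push_active(&mut self, item: InputItem) {
        self.active_turn.push_back(item);
    }

    pub fn queue_next_turn(&mut self, item: InputItem) {
        self.next_turn.push_back(item);
    }

    pub fn deliver_mailbox(&mut self, text: &str) {
        self.mailbox.push_back(InputItem::Mailbox(text.to_string()));
    }

    pub fn set_mailbox_open(&mut self, open: bool) {
        self.mailbox_open = open;
    }

    /// Drains what the model should see in the active turn.
    pub fn take_pending_for_turn(&mut self) -> Vec<InputItem> {
        let mut out: Vec<InputItem> = self.active_turn.drain(..).collect();
        if self.mailbox_open {
            out.extend(self.mailbox.drain(..));
        }
        out
    }

    /// Hypothetical abort cleanup: only turn-local input is discarded.
    pub fn on_abort(&mut self) {
        self.active_turn.clear();
    }

    pub fn take_next_turn(&mut self) -> Vec<InputItem> {
        self.next_turn.drain(..).collect()
    }
}
```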

## Verification

- `cargo test -p codex-core --lib agent::control::tests`
- `cargo test -p codex-core --lib
agent::control::tests::resume_closed_child_reopens_open_descendants --
--exact`
- `cargo test -p codex-core --lib
agent::control::tests::spawn_agent_fork_last_n_turns_keeps_only_recent_turns
-- --exact`
- `cargo test -p codex-core --lib
agent::control::tests::resume_thread_subagent_restores_stored_nickname_and_role
-- --exact`
- `cargo test -p codex-core` was also run; it completed with 1814
passed, 4 ignored, and one timeout in
`agent::control::tests::resume_thread_subagent_restores_stored_nickname_and_role`,
which passed when rerun in isolation.
2026-05-18 15:43:01 -07:00
Matthew Zeng
a66e0e9c4b Include plugin id in plugin MCP tool metadata (#23353)
Adds the id of the plugin that contains the MCP server (if any) so that
filters can be applied at the plugin level.

## Summary
- carry the plugin owner into MCP runtime provenance
- attach `plugin_id` to outbound plugin-backed MCP tool-call `_meta`
- avoid misattributing user-configured MCP servers that shadow plugin
server names
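
A minimal sketch of the shadowing rule, assuming plugin and user servers are merged by server name. `ServerOrigin`, `resolve_origins`, and `tool_call_meta` are hypothetical, and `_meta` is modeled as a flat string map:

```rust
use std::collections::HashMap;

/// Where an MCP server definition came from (hypothetical provenance model).
#[derive(Debug, Clone, PartialEq)]
pub enum ServerOrigin {
    UserConfig,
    Plugin { plugin_id: String },
}

/// A user-configured server shadows a plugin server with the same name.
pub fn resolve_origins(
    plugin_servers: &[(&str, &str)], // (server name, plugin id)
    user_servers: &[&str],
) -> HashMap<String, ServerOrigin> {
    let mut origins = HashMap::new();
    for (server, plugin_id) in plugin_servers {
        origins.insert(server.to_string(), ServerOrigin::Plugin { plugin_id: plugin_id.to_string() });
    }
    for server in user_servers {
        // User config wins, so the shadowed server must not inherit the plugin id.
        origins.insert(server.to_string(), ServerOrigin::UserConfig);
    }
    origins
}

/// Builds the `_meta` entries for an outbound tool call.
pub fn tool_call_meta(origin: &ServerOrigin) -> HashMap<String, String> {
    let mut meta = HashMap::new();
    if let ServerOrigin::Plugin { plugin_id } = origin {
        meta.insert("plugin_id".to_string(), plugin_id.clone());
    }
    meta
}
```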

## Testing
- `just fmt`
- `just fix -p codex-mcp`
- `just fix -p codex-core`
- `cargo test -p codex-mcp`
- `cargo test -p codex-core
plugin_mcp_tool_call_request_meta_includes_plugin_id`
- `cargo test -p codex-core
to_mcp_config_omits_plugin_id_when_user_server_shadows_plugin_mcp`
- `cargo test -p codex-core
rebuild_preserving_session_layers_refreshes_plugin_derived_mcp_config`
- `git diff --check`

## Notes
- Attempted `cargo test -p codex-core`; it aborted in
`agent::control::tests::resume_agent_from_rollout_skips_descendants_when_parent_resume_fails`
with a stack overflow before the full suite completed.
2026-05-18 15:33:33 -07:00
pakrym-oai
f2368b7de6 [codex] Trim unused TurnContextItem fields (#22709)
## Why

`TurnContextItem` is the durable baseline used to reconstruct context
diffs across resume/fork. Most of the old persisted-only fields on it
are no longer read, so keeping them in rollout snapshots adds schema
surface and state that can drift without affecting reconstruction.

`summary` is the exception: older Codex versions require it to
deserialize `turn_context` records, so keep writing a default
compatibility value until that schema surface can be removed safely.

## What changed

- Removed the unused persisted fields from `TurnContextItem`: trace ids,
user/developer instructions, output schema, and truncation policy.
- Kept `summary` with a compatibility comment and made
`TurnContext::to_turn_context_item` write `ReasoningSummary::Auto`
instead of live turn state.
- Updated rollout/context reconstruction fixtures for the retained
summary field.

## Verification

- `cargo test -p codex-protocol --lib turn_context_item`
- `cargo test -p codex-rollout
resume_candidate_matches_cwd_reads_latest_turn_context`
- `cargo test -p codex-state turn_context`
- `cargo test -p codex-core --lib
new_default_turn_captures_current_span_trace_id`
- `cargo test -p codex-core --lib
record_initial_history_resumed_turn_context_after_compaction_reestablishes_reference_context_item`
- `cargo test -p codex-core --test all
emits_warning_when_resumed_model_differs`
- `git diff --check`
2026-05-18 21:54:36 +00:00
Owen Lin
1752f374a8 Improve codex remote-control CLI UX (#22878)
## Description

This PR makes `codex remote-control` behave like a foreground CLI
command by default. Running it now starts remote control, waits for
readiness, prints a clear status message with the machine name, and
stays alive until Ctrl-C.

Users who want daemon behavior can use `codex remote-control start`, and
`codex remote-control stop` now prints concise human-readable output.
`--json` remains available for scripts.

Implementation-wise, this now verifies the real app-server state instead
of just assuming startup worked. The CLI starts or connects to
app-server, probes its control socket, calls the `remoteControl/enable`
API, and waits for the remote-control status response/notification
before printing success.

For daemon mode, `codex remote-control start` also reports which managed
app-server binary was used, including its path and best-effort `codex
--version`, so failures are easier to diagnose.

## Examples

Example output:
```
> codex remote-control
Starting app-server with remote control enabled...
This machine is available for remote control as com-97826.
Press Ctrl-C to stop.
```

Error case using the daemon (currently expected with our publicly
released CLI version, whose `app-server` rejects `--remote-control`):
```
> ./target/debug/codex remote-control start
Starting app-server daemon with remote control enabled...
Error: app server did not become ready on /Users/owen/.codex/app-server-control/app-server-control.sock

Daemon used app-server:
  path: /Users/owen/.codex/packages/standalone/current/codex
  version: 0.130.0

Managed app-server stderr (/Users/owen/.codex/app-server-daemon/app-server.stderr.log):
  error: unexpected argument '--remote-control' found
  
  Usage: codex app-server [OPTIONS] [COMMAND]
  
  For more information, try '--help'.

Caused by:
    0: failed to connect to /Users/owen/.codex/app-server-control/app-server-control.sock
    1: No such file or directory (os error 2)
```

## What changed

- `codex remote-control` now runs remote control in the foreground and
prints a Ctrl-C stop hint.
- `codex remote-control start` starts the daemon and waits for remote
control readiness before reporting success.
- `codex remote-control stop` reports stopped/not-running status in
plain language.
- Startup failures now include recent managed app-server stderr to make
daemon issues easier to diagnose.
- Added coverage for CLI output, readiness waiting, foreground shutdown,
and stderr log tailing.
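
A small sketch of the stderr tailing, assuming the log file has already been read into a string. `tail_lines` and `format_stderr_section` are hypothetical helpers that reproduce the two-space indented layout of the example output above:

```rust
/// Returns up to `max_lines` trailing lines of a log.
pub fn tail_lines(log: &str, max_lines: usize) -> Vec<&str> {
    let lines: Vec<&str> = log.lines().collect();
    let start = lines.len().saturating_sub(max_lines);
    lines[start..].to_vec()
}

/// Formats the tail for a startup error, indenting each line by two spaces.
pub fn format_stderr_section(path: &str, log: &str, max_lines: usize) -> String {
    let mut out = format!("Managed app-server stderr ({path}):\n");
    for line in tail_lines(log, max_lines) {
        out.push_str("  ");
        out.push_str(line);
        out.push('\n');
    }
    out
}
```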
2026-05-18 13:39:02 -07:00
starr-openai
732b12b1ef Reduce rust-ci-full Windows nextest timeout flakes (#23253)
## Why
Recent `rust-ci-full` failures were dominated by transient Windows
timeout clusters in process-heavy tests such as `suite::resume`,
`suite::cli_stream`, `suite::auth_env`,
`start_thread_uses_all_default_environments_from_codex_home`, and
`connect_stdio_command_initializes_json_rpc_client_on_windows`.

The goal here is to make those known flaky paths less likely to fail
full CI without relaxing the global nextest timeout policy.

## What changed
- Enable one global nextest retry with `retries = 1` so a single
transient failure can recover.
- Add a `windows_process_heavy` test group with `max-threads = 2` for
the recurring Windows subprocess/session-heavy timeout families.
- Add Windows-only slow-timeout overrides for that process-heavy group.
- Add a narrower Windows-only timeout override for
`start_thread_uses_all_default_environments_from_codex_home`, which
still exceeded the broader Windows bucket in both Windows full-CI lanes.
- Increase the `rust-ci-full` nextest job timeout from `45m` to `60m` so
Windows ARM64 still has job-level headroom after retries and targeted
per-test timeout increases.
- Keep the global `slow-timeout` unchanged at `15s`.

## Validation
Validated through `rust-ci-full` GitHub Actions reruns on this PR.

Observed improvement on the tuned Windows lanes:
- Windows x64 went from `5 timed out` to `0 timed out`.
- Windows ARM64 went from `2 timed out` to `0 timed out`.
- `start_thread_uses_all_default_environments_from_codex_home` recovered
as a flaky pass on Windows ARM64 instead of timing out.

The remaining failing tests in those runs were unrelated hard failures
outside this nextest timeout tuning.
2026-05-18 13:06:39 -07:00
jif-oai
c69cde3547 Add tool lifecycle extension contributor (#23309)
## Why

Extensions that need to track runtime progress currently have no typed
host signal for tool execution. The goal extension in particular needs
to observe tool attempts without inspecting tool payloads, owning tool
implementations, or staying coupled to core-only runtime plumbing.

This adds a narrow lifecycle contributor API for host-owned tool
execution: extensions can observe when an accepted tool call starts and
how it finishes, while policy hooks and tool handlers continue to own
payload rewriting, blocking, and execution.

Relevant code:

- [`ToolLifecycleContributor`](3ad2850ffc/codex-rs/ext/extension-api/src/contributors.rs (L119))
defines the extension-facing observer contract.
- [`tool_lifecycle.rs`](3ad2850ffc/codex-rs/ext/extension-api/src/contributors/tool_lifecycle.rs)
defines the typed start/finish inputs, source, and outcome enums.
- [`notify_tool_start` / `notify_tool_finish`](3ad2850ffc/codex-rs/core/src/tools/lifecycle.rs)
bridge core tool dispatch into the extension registry.

## What Changed

- Added `ToolLifecycleContributor` to `codex-extension-api`, including:
  - `ToolStartInput`
  - `ToolFinishInput`
  - `ToolCallSource`
  - `ToolCallOutcome`
- Added registration and lookup support on `ExtensionRegistryBuilder` /
`ExtensionRegistry`.
- Wired core tool dispatch to notify lifecycle contributors for:
  - accepted tool starts
  - completed tool calls, including the tool output success marker
  - pre-tool-use blocks
  - failures before or after the handler runs
  - cancellation/abort in the parallel tool path
- Registered the goal extension as a lifecycle contributor and added the
outcome filter it will use for goal progress accounting.
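
A sketch of the observer shape, assuming simplified input structs. The type names follow the PR, but every field, variant, and method signature, plus the `AttemptCounter` contributor, is hypothetical; `ToolCallSource` is omitted for brevity:

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, PartialEq)]
pub enum ToolCallOutcome {
    Completed { success: bool },
    BlockedByPreToolUse,
    Failed,
    Aborted,
}

pub struct ToolStartInput<'a> {
    pub call_id: &'a str,
    pub tool_name: &'a str,
}

pub struct ToolFinishInput<'a> {
    pub call_id: &'a str,
    pub outcome: ToolCallOutcome,
}

/// Observer-only contract: contributors see starts and finishes but cannot
/// rewrite payloads, block calls, or run tools.
pub trait ToolLifecycleContributor: Send + Sync {
    fn on_tool_start(&self, _input: &ToolStartInput<'_>) {}
    fn on_tool_finish(&self, _input: &ToolFinishInput<'_>) {}
}

#[derive(Default)]
pub struct ExtensionRegistry {
    tool_lifecycle: Vec<Arc<dyn ToolLifecycleContributor>>,
}

impl ExtensionRegistry {
    pub fn register(&mut self, contributor: Arc<dyn ToolLifecycleContributor>) {
        self.tool_lifecycle.push(contributor);
    }

    pub fn notify_tool_start(&self, input: &ToolStartInput<'_>) {
        for c in &self.tool_lifecycle {
            c.on_tool_start(input);
        }
    }

    pub fn notify_tool_finish(&self, input: &ToolFinishInput<'_>) {
        for c in &self.tool_lifecycle {
            c.on_tool_finish(input);
        }
    }
}

/// Example contributor in the spirit of goal accounting: records events.
#[derive(Default)]
pub struct AttemptCounter {
    pub events: Mutex<Vec<String>>,
}

impl ToolLifecycleContributor for AttemptCounter {
    fn on_tool_start(&self, input: &ToolStartInput<'_>) {
        self.events.lock().unwrap().push(format!("start {}", input.tool_name));
    }

    fn on_tool_finish(&self, input: &ToolFinishInput<'_>) {
        self.events.lock().unwrap().push(format!("finish {} {:?}", input.call_id, input.outcome));
    }
}
```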

## Test Coverage

- Added `dispatch_notifies_tool_lifecycle_contributors` to cover
lifecycle notification ordering and outcomes for successful and
handler-failed tool calls.
2026-05-18 21:55:57 +02:00
Celia Chen
4dbca61e20 fix: default unknown tool schemas to empty schemas (#22380)
## Why

Some tool providers, especially MCP servers and dynamic tool sources,
can supply schema nodes that omit `type` and have no recognized JSON
Schema shape hints. Previously, `sanitize_json_schema` filled those
unknown nodes in as `string`, which made the schema parseable but
invented a scalar constraint that the provider did not specify. For
description-only fields, that could incorrectly steer tool arguments
away from the provider's actual accepted shape.

The Responses API accepts permissive empty schemas such as `{}` at
nested property positions, so Codex should preserve that permissive
meaning instead of coercing unknown schema nodes into a misleading
scalar type.

## What Changed

- Changed the no-hints fallback in `codex-rs/tools/src/json_schema.rs`
to clear unrecognized object schema nodes to `{}`.
- Empty schemas now remain `{}` rather than becoming `type: "string"`.
- Description-only or otherwise metadata-only nested property schemas
now become `{}` while surrounding object/array/string/number inference
still applies when recognized hints are present.
- Updated `codex-tools` and `codex-core` tests to cover top-level empty
schemas, nested empty schemas, metadata-only malformed schemas, dynamic
tools, and MCP tool specs.
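
A dependency-free sketch of the no-hints fallback. It uses a tiny hypothetical `Json` enum instead of `serde_json`, and recognizes only `properties` (object) and `items` (array) as shape hints; the real sanitizer in `codex-rs/tools/src/json_schema.rs` covers more cases, including string and number inference:

```rust
use std::collections::BTreeMap;

/// Just enough JSON for this sketch (a real implementation would use `serde_json::Value`).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Str(String),
    Obj(BTreeMap<String, Json>),
}

pub fn json_str(v: &str) -> Json {
    Json::Str(v.to_string())
}

/// Sanitizes one schema node in place, recursing into nested schemas first.
pub fn sanitize(node: &mut BTreeMap<String, Json>) {
    if let Some(Json::Obj(props)) = node.get_mut("properties") {
        for child in props.values_mut() {
            if let Json::Obj(child_schema) = child {
                sanitize(child_schema);
            }
        }
    }
    if let Some(Json::Obj(items)) = node.get_mut("items") {
        sanitize(items);
    }
    if node.contains_key("type") {
        return; // An explicit type is kept as-is.
    }
    if node.contains_key("properties") {
        node.insert("type".to_string(), json_str("object"));
    } else if node.contains_key("items") {
        node.insert("type".to_string(), json_str("array"));
    } else {
        // No type and no recognized hint: use the permissive `{}` instead of
        // inventing `"type": "string"`. Metadata such as `description` goes too.
        node.clear();
    }
}
```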

## Verification

- `cargo test -p codex-tools`
- `cargo test -p codex-core
test_mcp_tool_property_missing_type_defaults_to_empty_schema`
- Manually verified the real Responses API behavior for both
empty-schema positions:
- Top-level function `parameters: {}` is accepted and echoed back as
`{"type":"object","properties":{}}`; when forced to call the tool,
Responses emitted empty object arguments: `"arguments": "{}"`.
- Nested property schema `{}` is accepted and preserved as `{}`; when
forced to call a tool with `metadata.extra`, Responses emitted
`"arguments": "{\"metadata\":{\"extra\":\"codex schema sanitizer
behavior\"}}"`.
2026-05-18 12:41:10 -07:00
starr-openai
10f7dc6eb5 codex: route global AGENTS reads through LOCAL_FS (#23343)
## Summary
- make `load_global_instructions` read through an `ExecutorFileSystem`
- call global AGENTS reads with explicit `LOCAL_FS` so they stay tied to
local codex-home state
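
A sketch of threading a filesystem through the loader. `ExecutorFs` is a hypothetical, trimmed-down stand-in for `ExecutorFileSystem`, `MemFs` is an illustrative in-memory implementation, and the `<codex_home>/AGENTS.md` location is an assumption of the sketch:

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical, trimmed-down stand-in for `ExecutorFileSystem`.
pub trait ExecutorFs {
    fn read_to_string(&self, path: &Path) -> io::Result<String>;
}

/// The host filesystem.
pub struct LocalFs;

impl ExecutorFs for LocalFs {
    fn read_to_string(&self, path: &Path) -> io::Result<String> {
        std::fs::read_to_string(path)
    }
}

/// Shared handle mirroring the `LOCAL_FS` name used in the PR.
pub static LOCAL_FS: LocalFs = LocalFs;

/// Reads global instructions through `fs`; in this sketch any read error or
/// empty file means "no global instructions".
pub fn load_global_instructions(fs: &dyn ExecutorFs, codex_home: &Path) -> Option<String> {
    let text = fs.read_to_string(&codex_home.join("AGENTS.md")).ok()?;
    let trimmed = text.trim();
    (!trimmed.is_empty()).then(|| trimmed.to_string())
}

/// In-memory filesystem, e.g. to stand in for a remote environment in tests.
pub struct MemFs(pub HashMap<PathBuf, String>);

impl ExecutorFs for MemFs {
    fn read_to_string(&self, path: &Path) -> io::Result<String> {
        self.0
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no such file"))
    }
}
```

Passing `&LOCAL_FS` explicitly at the global-instructions call site keeps that read on host-side codex-home state even when the turn's selected environment is remote.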

## Validation
- `bazel test --bes_backend= --bes_results_url=
--test_filter=instruction_sources_include_global_before_agents_md_docs
//codex-rs/core:core-unit-tests` on `dev`
2026-05-18 19:26:10 +00:00
Owen Lin
139365a4bb feat(app-server): add optional thread_id to experimentalFeature/list (#23335)
## Why

`experimentalFeature/list` reports effective feature enablement, but it
currently does not resolve that against a working directory. Project-local
`config.toml` files can toggle features on or off once they are merged
into the effective config with the other config layers, so the endpoint
effectively (and incorrectly) ignores features set in project-local config.

To address that, this PR exposes an optional `thread_id` param, which
lets the server load the thread's `cwd` and resolve features against it.
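
A simplified sketch of why the `cwd` matters, assuming feature toggles are plain name-to-bool maps, later layers win, and a project layer is keyed by the exact `cwd` (real config discovery is more involved). All names here are hypothetical:

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// One config layer's feature toggles (e.g. parsed from a `config.toml`).
pub type FeatureLayer = HashMap<String, bool>;

/// Merges layers in precedence order: later layers override earlier ones.
pub fn effective_features(layers: &[FeatureLayer]) -> HashMap<String, bool> {
    let mut merged = HashMap::new();
    for layer in layers {
        for (name, enabled) in layer {
            merged.insert(name.clone(), *enabled);
        }
    }
    merged
}

/// With a `thread_id`, also apply the project-local layer for that thread's cwd.
pub fn features_for_request(
    user_layer: &FeatureLayer,
    thread_cwds: &HashMap<String, PathBuf>,
    project_layers: &HashMap<PathBuf, FeatureLayer>,
    thread_id: Option<&str>,
) -> HashMap<String, bool> {
    let mut layers = vec![user_layer.clone()];
    let cwd: Option<&Path> = thread_id
        .and_then(|id| thread_cwds.get(id))
        .map(PathBuf::as_path);
    if let Some(project) = cwd.and_then(|cwd| project_layers.get(cwd)) {
        layers.push(project.clone());
    }
    effective_features(&layers)
}
```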

## Testing

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server experimental_feature_list`
2026-05-18 12:12:14 -07:00
Felipe Coury
8e52578e66 feat(tui): handle paste in session picker (#23338)
## Why

The session picker already supports typed search, but it ignored
bracketed paste events entirely. On macOS terminals this makes pasted
text look like a no-op on the resume screen, which is especially
noticeable when a user wants to paste part of a thread name, branch, or
path into the search field.

## What Changed

- route `TuiEvent::Paste(String)` into the session picker instead of
dropping it
- normalize pasted search text into a single-line query by collapsing
whitespace
- ignore whitespace-only pastes
- reuse the existing `set_query(...)` path so pasted searches keep the
same filtering and pagination behavior as typed input
- add focused tests for append behavior, whitespace normalization,
whitespace-only paste, and the existing search-loading path
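
A minimal sketch of the normalization, assuming pasted text is appended to the current query exactly as typed characters would be. `normalize_paste` and `query_after_paste` are hypothetical names; the resulting string would then go through the existing `set_query(...)` path:

```rust
/// Collapses every whitespace run (spaces, tabs, newlines) into one space.
/// Returns `None` for whitespace-only pastes so they can be ignored.
pub fn normalize_paste(pasted: &str) -> Option<String> {
    let collapsed = pasted.split_whitespace().collect::<Vec<_>>().join(" ");
    if collapsed.is_empty() {
        None
    } else {
        Some(collapsed)
    }
}

/// Appends a normalized paste to the current query, as typing would.
pub fn query_after_paste(current: &str, pasted: &str) -> Option<String> {
    let addition = normalize_paste(pasted)?;
    Some(format!("{current}{addition}"))
}
```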

This PR is stacked on top of #23234 and contains only the net change
relative to `etraut/clarify-resume-hints`.

## How to Test

1. Start Codex in a terminal that emits bracketed paste, for example
iTerm2 on macOS.
2. Open the resume picker so the search UI is visible.
3. Copy a term that should match one of the visible sessions, then paste
it into the picker.
4. Confirm the query updates immediately and the list filters as if the
text had been typed.
5. Also verify that pasting text with newlines or tabs still produces a
usable single-line search query.
6. Also verify that normal typed search still works and that `Esc` still
clears the query / exits as before.

Targeted tests:
- `cargo test -p codex-tui`

---------

Co-authored-by: Eric Traut <etraut@openai.com>
2026-05-18 19:04:41 +00:00
Eric Traut
55f6bbc667 goals: keep pause transitions explicit (#23088)
## Problem

This addresses several user-reported cases where active goals were
paused even though the user had not explicitly asked for that
transition:

- the guardian approval-review circuit breaker interrupted a turn and
implicitly paused the goal
- a shutdown in one app-server instance could pause a goal while a
second instance was still actively running the same thread
- steering-style interrupts could also pause the goal even though they
are meant to redirect work, not stop the goal lifecycle

The common problem was that core treated `TurnAbortReason::Interrupted`
as an implicit request to transition the persisted goal to `paused`.
That made unrelated interrupt paths mutate goal state as a side effect,
and in the multi-app-server case it allowed stale process teardown to
pause a live goal owned by another running client.

After this change, transitioning a goal to `paused` is always an
explicit action performed by a client or another intentional goal-state
mutation. It is never an implicit transition triggered by generic
interrupt handling.

Refs #22884.

## What changed

- Remove the goal runtime path that paused active goals after
interrupted task aborts.
- Drop the now-unused abort reason from `GoalRuntimeEvent::TaskAborted`.
- Update the focused regression coverage so an interrupted active goal
still accounts usage but remains `active`.
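
A sketch of the resulting rule with simplified, hypothetical types (`Goal`, a two-variant `GoalStatus`, and a `PauseRequested` event standing in for an explicit client action). An aborted task only accounts usage; only an explicit request changes status:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GoalStatus {
    Active,
    Paused,
}

#[derive(Debug)]
pub enum GoalRuntimeEvent {
    /// A task was aborted; the abort reason is no longer carried.
    TaskAborted { tokens_used: u64 },
    /// An explicit, intentional pause from a client.
    PauseRequested,
}

#[derive(Debug)]
pub struct Goal {
    pub status: GoalStatus,
    pub tokens_used: u64,
}

impl Goal {
    pub fn apply(&mut self, event: GoalRuntimeEvent) {
        match event {
            // Account usage, but never change status as a side effect of an abort.
            GoalRuntimeEvent::TaskAborted { tokens_used } => self.tokens_used += tokens_used,
            GoalRuntimeEvent::PauseRequested => self.status = GoalStatus::Paused,
        }
    }
}
```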
2026-05-18 11:58:40 -07:00
Eric Traut
ae03d073b3 TUI: replay in-progress MCP calls as started (#23236)
Fixes #22300.

## Summary
MCP tool calls can appear in thread history while still in progress.
During replay, `handle_thread_item` routed every
`ThreadItem::McpToolCall` to the completion handler, so an in-progress
item with no result or error was rendered as `MCP tool call completed
without a result`.

This updates replay handling to mirror command executions: `InProgress`
MCP calls go through `on_mcp_tool_call_started`, while completed and
failed calls continue through the completion path.
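
The routing rule as a tiny sketch; `McpToolCallStatus`, `ReplayAction`, and `replay_action` are hypothetical stand-ins for the `ThreadItem::McpToolCall` status and the two TUI handlers:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum McpToolCallStatus {
    InProgress,
    Completed,
    Failed,
}

#[derive(Debug, PartialEq)]
pub enum ReplayAction {
    /// Route to the "started" handler (`on_mcp_tool_call_started`).
    Started,
    /// Route to the completion handler.
    Finished,
}

/// Mirrors command-execution replay: only finished calls are rendered as completed.
pub fn replay_action(status: McpToolCallStatus) -> ReplayAction {
    match status {
        McpToolCallStatus::InProgress => ReplayAction::Started,
        McpToolCallStatus::Completed | McpToolCallStatus::Failed => ReplayAction::Finished,
    }
}
```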

## Validation
- `cargo test -p codex-tui
replayed_in_progress_mcp_tool_call_stays_active`
2026-05-18 11:34:31 -07:00
Eric Traut
53a1f4c29e TUI: route elicitation responses to request thread (#23241)
## Why

Fixes #21894.

When the TUI handles an MCP elicitation, the request payload already
includes the thread that generated the elicitation.
`ChatWidget::handle_elicitation_request_now` was ignoring that value and
using the currently visible chat thread instead. In a multi-session TUI,
that can send `resolve_elicitation` to an older visible thread rather
than the session that owns the pending elicitation, producing
`elicitation request not found` and leaving the prompt unresolved.

## What changed

- Parse `McpServerElicitationRequestParams.thread_id` in the ChatWidget
elicitation handler and use it for app-link, form, fallback approval,
and auto-decline resolution paths.
- Keep the existing visible-thread fallback only for malformed request
payloads with an invalid thread id.
- Update the invalid URL elicitation regression test so the visible
thread and request thread intentionally differ.
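
A sketch of the target selection, assuming thread ids are canonical UUID strings. `ThreadId::parse` and `elicitation_target` are hypothetical helpers, not the `ChatWidget` code:

```rust
/// Hypothetical thread id newtype holding a lowercase UUID string.
#[derive(Debug, Clone, PartialEq)]
pub struct ThreadId(String);

impl ThreadId {
    /// Accepts the 8-4-4-4-12 hex UUID layout.
    pub fn parse(raw: &str) -> Option<ThreadId> {
        let groups: Vec<&str> = raw.split('-').collect();
        let lengths = [8, 4, 4, 4, 12];
        let valid = groups.len() == lengths.len()
            && groups
                .iter()
                .zip(lengths)
                .all(|(g, len)| g.len() == len && g.chars().all(|c| c.is_ascii_hexdigit()));
        valid.then(|| ThreadId(raw.to_ascii_lowercase()))
    }
}

/// Picks the thread that should receive `resolve_elicitation`: the thread that
/// raised the request, or the visible thread only if the request id is malformed.
pub fn elicitation_target(request_thread_id: &str, visible: &ThreadId) -> ThreadId {
    ThreadId::parse(request_thread_id).unwrap_or_else(|| visible.clone())
}
```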
2026-05-18 11:33:13 -07:00
Eric Traut
4ac3ea20a2 Clarify resume hints for renamed threads (#23234)
Addresses #23181

## Why
Renamed threads can share names, so hints that suggest resuming directly
by name are ambiguous. Issue #23181 asks for the picker hint to include
the thread name and thread ID in parens so users can disambiguate
safely.

## What
- Adds a shared resume hint formatter for named threads: run `codex
resume`, then select `<name> (<thread-id>)`.
- Uses that hint for /rename confirmations, TUI session summaries, and
CLI/TUI exit messages.
- Keeps direct `codex resume <thread-id>` guidance for unnamed threads.
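
A sketch of such a formatter; the function name and exact wording are hypothetical:

```rust
/// Builds the resume hint shown after `/rename`, in session summaries, and on exit.
pub fn resume_hint(thread_id: &str, name: Option<&str>) -> String {
    match name {
        // Names can collide, so point at the picker and include the id.
        Some(name) => format!("run `codex resume`, then select `{name} ({thread_id})`"),
        // Unnamed threads keep the direct form.
        None => format!("run `codex resume {thread_id}`"),
    }
}
```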

## Verification
Manually verified that the messages shown after `/rename` and after `/exit`
include the session ID in parens.

---------

Co-authored-by: Felipe Coury <felipe.coury@openai.com>
2026-05-18 11:32:02 -07:00
Eric Traut
0d344aca9b goal: pause continuation loops on usage limits and blockers (#23094)
Addresses #22833, #22245, #23067

## Why
`/goal` can keep synthesizing turns even when the next turn cannot make
meaningful progress. Hard usage exhaustion can replay failing turns, and
repeated permission or external-resource blockers can keep burning
tokens while waiting for user or system intervention.

## What changed
- Add resumable `blocked` and `usageLimited` goal states. As with
`paused`, goal continuation stops in these states.
- Move to `usageLimited` after usage-limit failures.
- Allow the built-in `update_goal` tool to set `blocked` only under
explicit repeated-impasse guidance. Updated the goal continuation prompt to
specify that the agent should use `blocked` only when it has made at least
three attempts to get past an impasse.
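
A sketch of how the new states gate continuation, using a hypothetical `GoalStatus` enum and failure classification (other failures leaving status unchanged is an assumption). The three-attempt threshold for `blocked` is prompt guidance in this PR, so it is not modeled as a code check:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GoalStatus {
    Active,
    Paused,
    Blocked,
    UsageLimited,
}

/// Hypothetical classification of why a goal turn failed.
pub enum TurnFailure {
    UsageLimit,
    Other,
}

/// Continuation turns are synthesized only while the goal is active;
/// `paused`, `blocked`, and `usageLimited` are all resumable stops.
pub fn should_continue(status: GoalStatus) -> bool {
    matches!(status, GoalStatus::Active)
}

/// A usage-limit failure moves the goal to `usageLimited` instead of replaying the turn.
pub fn status_after_failure(current: GoalStatus, failure: &TurnFailure) -> GoalStatus {
    match failure {
        TurnFailure::UsageLimit => GoalStatus::UsageLimited,
        TurnFailure::Other => current,
    }
}
```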

Most of the files touched by this PR change only because of the small
app-server protocol update.

## Validation

I manually reproduced a number of situations where an agent can run into
a true impasse and verified that it properly enters `blocked` state. I
then resumed the goal and verified that it once again entered `blocked`
state several turns later when the impasse persisted.

I also manually reproduced the usage-limit condition by creating a
simulated responses API endpoint that returns 429 errors with the
appropriate error message. Verified that the goal runtime properly moves
the goal into `usageLimited` state and TUI UI updates appropriately.
Verified that `/goal resume` resumes the goal (and that it immediately goes
back into `usageLimited` state if appropriate).


## Follow-up PRs

Small changes will be needed to the GUI clients to properly handle the
two new states.
2026-05-18 11:28:53 -07:00
efrazer-oai
d32cb2c6ac fix: harden plugin creator sharing validation (#22893)
# Summary

Before this change, the sample plugin creator could emit
placeholder-heavy manifests that fail workspace sharing, and it chose a
repo-local marketplace implicitly whenever it ran from inside a git
checkout.

This PR makes generated plugins share-ready by default. It switches
creation to the personal marketplace unless the caller explicitly opts
into repo-local paths, adds a validator that mirrors the workspace
plugin ingestion contract, and updates the skill prompt and docs to
describe the real flow.

The goal is to stop malformed generated plugins before they reach
sharing and to make the default placement match the personal marketplace
behavior users expect.

## Changes

- Generate share-safe plugin manifests instead of `[TODO: ...]`
placeholder payloads.
- Default plugin and marketplace creation to `~/plugins` and
`~/.agents/plugins/marketplace.json`.
- Keep repo-local marketplace creation available through explicit
`--path` and `--marketplace-path` arguments.
- Add `validate_plugin.py` to check manifests, companion files, skill
frontmatter, skill agent YAML, asset paths, and backend-shaped contracts
before sharing.
- Refresh the plugin creator skill text, reference docs, and default
prompt to describe validation and the personal default.

## Design decisions

- The validator tracks the workspace ingestion schema directly,
including the required `defaultPrompt` alias handling and skill
`agents/openai.yaml` checks.
- The validator keeps one intentional extra preflight rule: leftover
`[TODO: ...]` placeholders are rejected before sharing even when a
single placeholder would not independently violate backend type
validation.
- Repo-local creation stays possible, but it is now explicit instead of
cwd-sensitive.
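
The PR's validator is Python (`validate_plugin.py`); to stay in one language, this Rust sketch only illustrates the extra placeholder rule. `find_todo_placeholders` and `check_share_ready` are hypothetical names, and matching a line that contains `[TODO:` followed later by `]` is an assumed approximation of the real check:

```rust
/// Returns the 1-based line numbers that still contain a `[TODO: ...]` placeholder.
pub fn find_todo_placeholders(text: &str) -> Vec<usize> {
    text.lines()
        .enumerate()
        .filter(|&(_, line)| {
            line.find("[TODO:")
                .map_or(false, |start| line[start..].contains(']'))
        })
        .map(|(i, _)| i + 1)
        .collect()
}

/// Rejects a file before sharing if any placeholder survives.
pub fn check_share_ready(file_name: &str, text: &str) -> Result<(), String> {
    let hits = find_todo_placeholders(text);
    if hits.is_empty() {
        Ok(())
    } else {
        Err(format!("{file_name}: leftover [TODO: ...] placeholder on line(s) {hits:?}"))
    }
}
```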

## Testing

Tests: targeted Python syntax checks, plugin skill validation, staged
diff whitespace validation, 15 generated plugin smoke runs, backend
manifest-schema acceptance for all 15 generated bundles, and a git-repo
cwd regression proving the creator still writes to the personal
marketplace by default.
2026-05-18 11:22:42 -07:00
starr-openai
8c14b08dd1 Upload rust full CI JUnit reports (#23273)
## Why

`rust-ci-full` failures currently leave downstream investigation
reconstructing basic test facts from raw logs. `cargo nextest` can emit
standard JUnit XML for each lane, which gives us a small structured
artifact for post-run failure analysis without changing the test
execution model.

## What changed

- enable nextest JUnit output in `codex-rs/.config/nextest.toml`
- upload the lane-scoped JUnit XML artifact from each `rust-ci-full`
test lane

## Verification

- `rust-ci-full` run `26018931531` on head
`52d77c60e79b36859d944ef28a36b014055c5c48` produced JUnit artifacts for
macOS, Linux x64 remote, Windows x64, and Windows ARM64 test lanes
- `rust-ci-full` run `26021241006` on the same head produced the missing
Linux ARM JUnit artifact after the first run lost that runner before
export
- downloaded all five lane JUnit artifacts and verified each contains
non-empty test counters and failure data
2026-05-18 11:10:37 -07:00
iceweasel-oai
b1c13b6fe5 Simplify legacy Windows sandbox ACL persistence (#22569)
## Why

The legacy Windows sandbox still carried a `persist_aces` mode switch,
even though the only path that meaningfully applies filesystem ACEs
today is `workspace-write`, which already uses the persistent behavior.
Legacy read-only sessions rely on the read-only capability SID rather
than per-command filesystem ACE mutation, so the temporary cleanup
branch had become conceptual overhead without a corresponding behavioral
need.

Removing that split makes the ACL lifecycle match the current sandbox
model more directly and trims the guard/revocation plumbing from the
legacy launcher paths.

## What changed

- Removed the `persist_aces` parameter from legacy ACL preparation.
- Made legacy deny-read handling always use the persistent
reconciliation path.
- Dropped guard tracking and post-exit ACE revocation from both capture
and unified-exec legacy flows.
- Kept workspace `.codex` / `.agents` protection tied directly to
`WorkspaceWrite` instead of an intermediate persistence flag.

## Verification

- `cargo fmt -p codex-windows-sandbox`
- `git diff --check`
- `cargo test -p codex-windows-sandbox`
  - 85 passed, 2 ignored, 2 (unrelated) failed locally.
2026-05-18 11:00:03 -07:00
starr-openai
9286ff2805 Fix remote turn diff display roots (#23261)
## Why

`TurnDiffTracker` computes a display root so turn diffs can be rendered
repo-relative. For remote exec-server turns, the selected turn `cwd` may
exist only inside the selected environment, but `run_turn` was
discovering the git root through the local host filesystem. When that
lookup failed, nested remote-session diffs fell back to the nested `cwd`
and showed `/tmp/...`-prefixed paths instead of repo-relative paths.

## What changed

- Resolve the diff display root from the primary selected turn
environment when one exists, using that environment's filesystem and
`cwd`.
- Add `codex_git_utils::get_git_repo_root_with_fs(...)` so git-root
discovery can run against an `ExecutorFileSystem`, including remote
environments.
- Reuse that helper from `resolve_root_git_project_for_trust(...)` and
add coverage for `.git` gitdir-pointer detection.
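
A sketch of filesystem-parameterized git-root discovery, assuming a hypothetical `FsView` trait with a single `exists` check. Treating any `.git` entry (directory or gitdir-pointer file) as the repo marker matches what the PR's coverage exercises, but `find_git_root`, `RemoteFs`, and `display_path` are illustrative, not `codex_git_utils`:

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Minimal filesystem view; a hypothetical stand-in for `ExecutorFileSystem`.
pub trait FsView {
    /// True if `path` exists, whether as a directory or a file.
    fn exists(&self, path: &Path) -> bool;
}

/// Walks up from `cwd` until a directory containing `.git` is found.
pub fn find_git_root(fs: &dyn FsView, cwd: &Path) -> Option<PathBuf> {
    cwd.ancestors()
        .find(|dir| fs.exists(&dir.join(".git")))
        .map(Path::to_path_buf)
}

/// In-memory view standing in for a remote environment's filesystem.
pub struct RemoteFs(pub HashSet<PathBuf>);

impl FsView for RemoteFs {
    fn exists(&self, path: &Path) -> bool {
        self.0.contains(path)
    }
}

/// Renders `path` relative to the display root, else as-is.
pub fn display_path(root: Option<&Path>, path: &Path) -> String {
    root.and_then(|r| path.strip_prefix(r).ok())
        .unwrap_or(path)
        .display()
        .to_string()
}
```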

## Validation

- Devbox Bazel: `//codex-rs/core:core-unit-tests
--test_filter=get_git_repo_root_with_fs_detects_gitdir_pointer`
- Devbox Docker-backed remote-env repro: `//codex-rs/core:core-all-test
--test_filter=apply_patch_turn_diff_paths_stay_repo_relative_when_session_cwd_is_nested`
2026-05-18 10:53:49 -07:00
Felipe Coury
bb43044cba fix(tui): show shutdown feedback on exit (#23323)
## Why

Ctrl+C can take a noticeable amount of time to finish when the TUI is
waiting for the app-server thread shutdown path to complete. Before this
change, the UI could look like it had not accepted the shutdown request
because the composer and cursor remained in their normal interactive
state during that wait.

This PR makes the accepted shutdown visible immediately. It does not add
an artificial sleep or change the shutdown timeout; it only draws one
final feedback frame before continuing through the existing shutdown
flow.

## What Changed

- On `ExitMode::ShutdownFirst`, the TUI now renders shutdown feedback
before awaiting the existing thread shutdown future.
- The bottom pane disables composer input, which hides the cursor
through the existing disabled-input cursor path.
- The composer shows `Shutting down...` as the disabled input hint and
suppresses footer content so the shutdown acknowledgement is not
competing with shortcut/status text.
- The logout path uses the same feedback path before shutting down.
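
The composer behavior listed above can be modeled as a small state transition. This is a minimal sketch, assuming a hypothetical `Composer` struct with only the fields needed here; the real bottom pane carries far more state. Shutdown disables input, which hides the cursor, swaps the visible row to the hint, suppresses the footer, and leaves the draft untouched.

```rust
/// Hypothetical, simplified composer state.
struct Composer {
    draft: String,
    input_enabled: bool,
    disabled_hint: Option<&'static str>,
    show_footer: bool,
}

impl Composer {
    fn new(draft: &str) -> Self {
        Self {
            draft: draft.to_string(),
            input_enabled: true,
            disabled_hint: None,
            show_footer: true,
        }
    }

    /// Disable input, show the shutdown hint, and suppress the footer.
    /// The draft is intentionally left as-is.
    fn begin_shutdown(&mut self) {
        self.input_enabled = false;
        self.disabled_hint = Some("Shutting down...");
        self.show_footer = false;
    }

    /// The row that would be rendered: the hint while disabled, else the draft.
    fn visible_row(&self) -> String {
        match (self.input_enabled, self.disabled_hint) {
            (false, Some(hint)) => format!("› {hint}"),
            _ => format!("› {}", self.draft),
        }
    }

    /// The cursor is only drawn while input is enabled.
    fn cursor_visible(&self) -> bool {
        self.input_enabled
    }
}
```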

## How to Test

1. Start Codex from this branch.
2. Press `Ctrl+C` to request shutdown.
3. If shutdown takes long enough to observe, confirm the composer
changes to `› Shutting down...`, the cursor disappears, and no footer
hint is rendered below it.
4. Regression check: repeat with text already typed in the composer and
confirm the visible row still switches to `Shutting down...` while the
draft remains preserved internally until the process exits.

Targeted tests:

- `cargo test -p codex-tui
shutdown_in_progress_disables_input_and_uses_hint_without_footer`
- `cargo test -p codex-tui bottom_pane::footer::tests::`

## Local Validation Note

`cargo test -p codex-tui` still aborts in
`app::tests::discard_side_thread_removes_agent_navigation_entry` with a
stack overflow. That same test also failed when run alone locally, and
the failure appears unrelated to this shutdown feedback path.
2026-05-18 14:41:14 -03:00
iceweasel-oai
d335b00212 windows: link MSVC release binaries with static CRT (#22905)
## Why

Windows release artifacts currently import `VCRUNTIME140.dll` and
`VCRUNTIME140_1.dll`. That becomes observable on clean Windows machines
that do not already have the VC++ runtime available globally:

- Desktop Store launches can fail after the app relocates `codex.exe`
out of `WindowsApps`, which means an MSIX-level VCLibs dependency does
not protect the relocated CLI/app-server process.
- The npm CLI path reproduces the same missing-DLL startup failure when
`System32\vcruntime140_1.dll` is hidden and `PATH` is stripped of
incidental fallback copies.

In that setup, the existing Windows binary exits with `0xC0000135` /
`-1073741515` before Codex code runs.
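
The two numbers above are the same value. `0xC0000135` is the NTSTATUS `STATUS_DLL_NOT_FOUND`, and reading those 32 bits as a signed integer yields `-1073741515`. The sketch below shows that reinterpretation. It also uses `cfg!(target_feature = "crt-static")`, which reports at compile time whether the binary was built with a static C runtime. The function names are illustrative only.

```rust
/// NTSTATUS returned when a required DLL cannot be located at process start.
const STATUS_DLL_NOT_FOUND: u32 = 0xC000_0135;

/// Reinterpret an NTSTATUS as the signed 32-bit exit code that launchers
/// such as shells and Node child processes commonly display.
fn as_signed_exit_code(status: u32) -> i32 {
    status as i32
}

/// True when this binary was compiled with `-C target-feature=+crt-static`.
fn built_with_static_crt() -> bool {
    cfg!(target_feature = "crt-static")
}
```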

## What changed

- Add `-C target-feature=+crt-static` to the existing MSVC-only Cargo
rustflags in `codex-rs/.cargo/config.toml`.
- Preserve the existing `/STACK:8388608` linker setting in the same
target block.

This keeps the change scoped to Windows MSVC builds and avoids altering
non-Windows or GNU target behavior.

## Verification

I built an x64 Windows release probe with static CRT linkage and the
normal 8 MiB stack reserve, then verified:

- `dumpbin /dependents codex.exe` no longer reports `VCRUNTIME140.dll`
or `VCRUNTIME140_1.dll`.
- `dumpbin /headers codex.exe` reports `800000 size of stack reserve`.
- With `System32\vcruntime140_1.dll` hidden and `PATH` stripped to
Windows system directories only:
  - the old npm CLI path exits `-1073741515`
  - the rebuilt static-CRT `codex.exe --version` succeeds with exit code `0`
  - the rebuilt TUI starts successfully

I also confirmed `codex.exe app-server --listen ws://127.0.0.1:0` starts
and binds normally with the static-CRT artifact.
2026-05-18 10:32:33 -07:00
jif-oai
3f2b7ede0b nit: read prompt (#23332) 2026-05-18 19:25:27 +02:00
pakrym-oai
82061660ae [codex] Remove legacy shell output formatting paths (#22706)
## Why

The client and tool pipeline still carried compatibility code for legacy
structured shell output. Current shell and apply_patch responses are
already plain text for model consumption, so keeping a
JSON-serialization path plus shell-item rewrite logic makes the request
formatter and tests preserve a format we do not need anymore.

## What Changed

- Removed the client-side shell output rewrite from
`core/src/client_common.rs`.
- Removed the structured exec-output formatter and the shell `freeform`
switch so tool emitters use one model-facing formatter.
- Collapsed apply_patch/shell serialization tests around the remaining
plain-text output expectations and removed duplicate one-variant
parameterized cases.
- Kept the `ApplyPatchModelOutput::ShellCommandViaHeredoc` compatibility
input shape, but no longer treats it as a separate output-format mode.

## Validation

- `cargo test -p codex-core client_common`
- `cargo test -p codex-core shell_serialization`
- `cargo test -p codex-core apply_patch_cli`
- `just fix -p codex-core`

## Documentation

No external Codex documentation update is needed.
2026-05-18 09:57:54 -07:00
Eric Traut
adca1b643f [1 of 2] Optimize TUI startup terminal probes (#23175)
## Why

Codex TUI startup still feels slower than 0.117.0 after the app-server
move in 0.118.0. A visible chunk of launch-to-input latency comes from
serial terminal startup probes: cursor position, keyboard enhancement
support, and default foreground/background color queries can each wait
on terminal responses before the first usable frame.

Refs #16335.

## What

This PR batches the terminal startup probes into one bounded probe. It
also reuses the probed cursor position and default colors during TUI
setup, fast-paths the primary-device-attributes fallback as keyboard
enhancement unsupported, and keeps lightweight startup timing logs for
future tuning.

The startup telemetry is intentionally left in production: it records
phase timings for terminal probes and initial-frame scheduling so future
startup regressions can be diagnosed from normal logs rather than
re-adding one-off debug instrumentation.
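
One way to batch the probes is to write every query at once and end with DA1 (primary device attributes), which nearly every terminal answers. When the DA1 reply arrives without a preceding keyboard-flags reply, keyboard enhancement can be treated as unsupported without waiting on a timeout. This is a minimal sketch of that technique, not the Codex implementation. `BATCHED_QUERY`, `ProbeResult`, and `parse_probe_reply` are hypothetical names, and the parser only handles the four reply shapes shown.

```rust
/// All probes in one write; DA1 goes last so its reply marks the end of the batch.
const BATCHED_QUERY: &str = concat!(
    "\x1b[6n",         // DSR: report cursor position
    "\x1b[?u",         // query progressive keyboard enhancement flags
    "\x1b]10;?\x1b\\", // OSC 10: default foreground color
    "\x1b]11;?\x1b\\", // OSC 11: default background color
    "\x1b[c",          // DA1: primary device attributes
);

#[derive(Debug, Default, PartialEq)]
struct ProbeResult {
    cursor: Option<(u16, u16)>,
    keyboard_enhancement: Option<bool>,
    fg: Option<String>,
    bg: Option<String>,
}

fn parse_probe_reply(reply: &str) -> ProbeResult {
    let mut result = ProbeResult::default();
    let mut saw_keyboard_flags = false;
    let mut rest = reply;
    while let Some(start) = rest.find('\x1b') {
        rest = &rest[start + 1..];
        if let Some(csi) = rest.strip_prefix('[') {
            // CSI reply: parameter bytes, then one final byte in '@'..='~'.
            let Some(end) = csi.find(|c: char| ('@'..='~').contains(&c)) else {
                break;
            };
            let params = &csi[..end];
            match &csi[end..end + 1] {
                "R" => {
                    if let Some((row, col)) = params.split_once(';') {
                        if let (Ok(row), Ok(col)) = (row.parse::<u16>(), col.parse::<u16>()) {
                            result.cursor = Some((row, col));
                        }
                    }
                }
                "u" if params.starts_with('?') => saw_keyboard_flags = true,
                "c" if params.starts_with('?') => {
                    // DA1 closes the batch: no `?u` reply before it means
                    // keyboard enhancement is unsupported.
                    result.keyboard_enhancement = Some(saw_keyboard_flags);
                    break;
                }
                _ => {}
            }
            rest = &csi[end + 1..];
        } else if let Some(osc) = rest.strip_prefix(']') {
            // OSC reply: body terminated by BEL or by ST (ESC \).
            let end = osc.find(|c: char| c == '\x07' || c == '\x1b').unwrap_or(osc.len());
            let body = &osc[..end];
            if let Some(color) = body.strip_prefix("10;") {
                result.fg = Some(color.to_string());
            } else if let Some(color) = body.strip_prefix("11;") {
                result.bg = Some(color.to_string());
            }
            rest = &osc[end..];
        }
    }
    result
}
```

A single bounded read can then feed `parse_probe_reply`, and the cursor position and colors it returns can be reused during setup instead of being queried again.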

## Benchmark

In the local pty startup benchmark, the pre-optimization `main` baseline
was about 250.5ms median from launch to accepted chat input. This
probe-only branch measured about 152ms median, for an approximate
savings of 95-100ms.

## Stack

1. [#23175: [1 of 2] Optimize TUI startup terminal
probes](https://github.com/openai/codex/pull/23175) — this PR
2. [#23176: [2 of 2] Start fresh TUI thread in
background](https://github.com/openai/codex/pull/23176) — layered on
this PR

## Verification

- `cargo test -p codex-tui`
2026-05-18 09:04:02 -07:00
Eric Traut
e734cb5713 Hide ChatGPT usage link for non-OpenAI status (#23127)
Addresses #22778

## Summary

Provider deployments such as Bedrock manage rate limits and billing
outside ChatGPT, so the `/status` link to the ChatGPT usage page is
irrelevant and confusing for those users. Custom providers that are
explicitly configured to use OpenAI/ChatGPT auth still point at
OpenAI-backed usage, so they should keep the link.

## Changes

- Render the ChatGPT usage note only when the configured provider uses
OpenAI auth.
- Keep the note hidden when `/status` displays a provider such as
Bedrock that manages limits elsewhere.
- Add regression coverage for both Bedrock and a custom OpenAI-auth
proxy provider.
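
The rule reduces to a single predicate on the provider. This is a minimal sketch, assuming a hypothetical `ProviderInfo` with a `requires_openai_auth` flag that marks providers authenticating through OpenAI/ChatGPT; the real provider configuration type may name or derive this differently.

```rust
/// Hypothetical subset of provider configuration.
struct ProviderInfo {
    /// Assumed flag: true when the provider uses OpenAI/ChatGPT auth.
    requires_openai_auth: bool,
}

const CHATGPT_USAGE_URL: &str = "https://chatgpt.com/codex/settings/usage";

/// The `/status` usage note is only relevant for OpenAI-auth providers.
fn usage_link(provider: &ProviderInfo) -> Option<&'static str> {
    provider.requires_openai_auth.then_some(CHATGPT_USAGE_URL)
}
```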

## Manual Repro

1. Configure Codex with a non-OpenAI-auth provider, for example
`model_provider = "amazon-bedrock"`.
2. Start the TUI and run `/status`.
3. Confirm the status card shows the custom provider, for example `Model
provider: Amazon Bedrock`, and does not show
`https://chatgpt.com/codex/settings/usage`.
4. Configure a custom provider that proxies to OpenAI and has
OpenAI/ChatGPT auth enabled.
5. Run `/status` again and confirm the ChatGPT usage link appears for
that OpenAI-auth provider.
2026-05-18 09:02:38 -07:00
Eric Traut
deb159d9ff Fix TUI stream cleanup after turn errors (#23128)
## Summary

Fixes #22726.

After a Responses stream disconnect, the live TUI could keep accepting
prompts while leaving partially streamed assistant output in its
transient streaming-cell form. That made fenced diffs or SVG/XML-like
content appear as raw transcript text until the user closed the TUI and
resumed the same session, which rebuilt the transcript from saved
history.

This change finalizes the active answer stream before generic
failed-turn cleanup clears the stream controller, so the live transcript
takes the same source-backed markdown consolidation path as a successful
turn.
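
The fix is an ordering change: consolidate the partial stream first, then run the generic error cleanup. This is a minimal sketch of that ordering with a hypothetical `Transcript` type. The `markdown:` and `error:` prefixes stand in for rendered history cells and are not real Codex output.

```rust
/// Hypothetical, simplified transcript model.
#[derive(Default)]
struct Transcript {
    /// Finalized cells (stand-ins for rendered history cells).
    history: Vec<String>,
    /// Raw text of an answer that is still streaming, if any.
    streaming: Option<String>,
}

impl Transcript {
    fn push_delta(&mut self, delta: &str) {
        self.streaming.get_or_insert_with(String::new).push_str(delta);
    }

    /// The same consolidation a successful turn would use.
    fn finalize_stream(&mut self) {
        if let Some(raw) = self.streaming.take() {
            self.history.push(format!("markdown:{raw}"));
        }
    }

    /// Failed-turn cleanup: finalize the partial answer before recording the error,
    /// so no transient streaming cell is left behind.
    fn on_turn_error(&mut self, err: &str) {
        self.finalize_stream();
        self.history.push(format!("error:{err}"));
    }
}
```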

## Reviewer repro

1. Start a local Codex TUI session.
2. Trigger an assistant turn that streams markdown content, especially a
fenced diff or SVG/XML-like block.
3. Force or encounter a non-retry stream disconnect before the turn
completes.
4. Continue using the same still-open TUI session.
5. Before this fix, the live history can stay raw/plain even though
`codex resume` renders the same session normally.
6. After this fix, the failed-turn path consolidates the partial stream
before rendering the error, so the live TUI keeps normal transcript
rendering.
2026-05-18 09:00:57 -07:00
Eric Traut
af6ffb6ebb Support --output-schema for exec resume (#23123)
## Why

`codex exec resume` should have the same structured-output support as
top-level `codex exec`. Without `--output-schema`, multi-turn automation
has to choose between resumed session context and schema-validated JSON
output.

Fixes #22998.

## What changed

- Marked `--output-schema` as a global `codex exec` flag so it can be
passed after `resume`.
- Reused the existing output schema plumbing so resumed turns attach the
schema to the final response request while preserving session context.
2026-05-18 08:55:22 -07:00
Eric Traut
fce10e009d tui: keep cleared Fast tier from reappearing after side-thread resume (#23121)
## Why

After turning Fast mode off in the TUI, returning from a side thread
could make `Fast` appear again in the main chat widget. The opt-out
itself was still persisted; the display was being rebuilt from stale
cached `ThreadSessionState` data, which made it look like Fast had been
re-enabled.

Fixes #23104.

## What changed

- Keep the active thread's cached `service_tier` in sync whenever the
user persists a service-tier selection.
- Update both the primary-thread snapshot and the thread event store so
restored TUI state reflects the current tier.
- Add a focused regression test for clearing a cached Fast tier.
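
The sync can be sketched as one write path that updates every cached copy for the active thread. This is a minimal sketch under assumptions: `App`, `primary_snapshot`, and `thread_store` are hypothetical simplifications of the primary-thread snapshot and thread event store, and the tier id `priority` is only illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical cached per-thread state used to rebuild the chat widget.
#[derive(Clone, Debug, PartialEq)]
struct ThreadSessionState {
    service_tier: Option<String>,
}

struct App {
    active_thread: String,
    primary_snapshot: ThreadSessionState,
    thread_store: HashMap<String, ThreadSessionState>,
}

impl App {
    /// Persisting a selection also refreshes every cached copy
    /// (writing to config is elided in this sketch).
    fn persist_service_tier(&mut self, tier: Option<String>) {
        self.primary_snapshot.service_tier = tier.clone();
        if let Some(state) = self.thread_store.get_mut(&self.active_thread) {
            state.service_tier = tier;
        }
    }

    /// What the widget would show after returning from a side thread.
    fn restored_tier(&self) -> Option<&str> {
        self.thread_store
            .get(&self.active_thread)
            .and_then(|s| s.service_tier.as_deref())
    }
}
```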

## Manual repro

1. Start a TUI session where `Fast` is enabled by default.
2. Run `/fast` and turn Fast mode off. Confirm `Fast` disappears from
the chat widget display.
3. Re-enter thread navigation via either path:
   - Run `/side test`, then return to the main thread.
   - Run `/agent`, enter a child thread, then return to the main thread.
4. Before this fix, `Fast` reappears in the main chat widget display
even though the opt-out was already persisted.
5. After this fix, `Fast` stays cleared.

## Verification

- `cargo test -p codex-tui
app::thread_session_state::tests::service_tier_sync_updates_active_cached_session
-- --exact`
2026-05-18 08:52:18 -07:00
jif-oai
4ca60ef9ff Emit goal update events from goal extension tools (#23306)
## Why

Goal creation and completion are moving through the goal extension, but
the rest of Codex still observes goal state through `ThreadGoalUpdated`
events. Without an event from the extension-owned tool path, a
model-initiated `create_goal` or `update_goal` can mutate the backend
and return a tool result while app-server and TUI listeners miss the
goal state transition.

## What changed

- Added `GoalEventEmitter` as a small wrapper around the host
`ExtensionEventSink` to build `EventMsg::ThreadGoalUpdated` events for
goal updates.
- Threaded the registry event sink into `GoalExtension` and the
`GoalToolExecutor`s created by the extension. The public
`GoalExtension::new` constructor keeps a `NoopExtensionEventSink`
fallback for standalone use.
- Emitted a goal update after successful `create_goal` and `update_goal`
tool calls. Until `ToolCall` exposes the current turn submission id,
these events use the tool call id as the event id and leave `turn_id`
unset.
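
The emitter pattern above can be sketched with a fire-and-forget sink trait. The names `ExtensionEventSink`, `NoopExtensionEventSink`, `GoalEventEmitter`, and `thread_goal_updated` come from the description, but every signature and the `Event`/`EventMsg` shapes below are simplified assumptions, not the real `codex_protocol` types. As described, the tool call id is used as the event id and `turn_id` stays unset.

```rust
use std::sync::{Arc, Mutex};

/// Simplified, hypothetical event shapes.
#[derive(Clone, Debug, PartialEq)]
struct Event {
    id: String,
    msg: EventMsg,
}

#[derive(Clone, Debug, PartialEq)]
enum EventMsg {
    ThreadGoalUpdated { turn_id: Option<String>, objective: String },
}

/// Host-provided, fire-and-forget sink (signature assumed).
trait ExtensionEventSink: Send + Sync {
    fn emit(&self, event: Event);
}

/// Fallback used when no host sink is wired in.
struct NoopExtensionEventSink;
impl ExtensionEventSink for NoopExtensionEventSink {
    fn emit(&self, _event: Event) {}
}

/// Thin wrapper that knows how to build goal events.
struct GoalEventEmitter {
    sink: Arc<dyn ExtensionEventSink>,
}

impl GoalEventEmitter {
    /// Until the turn submission id is available, reuse the tool call id.
    fn thread_goal_updated(&self, tool_call_id: &str, objective: &str) {
        self.sink.emit(Event {
            id: tool_call_id.to_string(),
            msg: EventMsg::ThreadGoalUpdated {
                turn_id: None,
                objective: objective.to_string(),
            },
        });
    }
}

/// Test sink that records everything it receives.
#[derive(Default)]
struct RecordingSink(Mutex<Vec<Event>>);
impl ExtensionEventSink for RecordingSink {
    fn emit(&self, event: Event) {
        self.0.lock().unwrap().push(event);
    }
}
```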

Relevant code:

- [`GoalEventEmitter::thread_goal_updated`](1fe2d73890/codex-rs/ext/goal/src/events.rs (L19-L32))
- [`GoalToolExecutor` emission
points](1fe2d73890/codex-rs/ext/goal/src/tool.rs (L161-L190))

## Testing

- `cargo test -p codex-goal-extension`
2026-05-18 16:14:37 +02:00
jif-oai
b631d92170 chore: make token usage async (#23305)
Make the `TokenUsageContributor` async. This will be required for future
extensions, and the change is basically free.
2026-05-18 15:59:06 +02:00
jif-oai
500ef67ed1 chore: goal resumed metrics (#23301)
Add metrics for goal resume
2026-05-18 15:19:23 +02:00
jif-oai
7ee7fe239f chore: isolate thread goal storage behind GoalStore (#23295)
## Why

Thread goal persistence is being prepared for a dedicated storage
boundary. Before that split, goal-specific reads, writes, accounting,
and cleanup were exposed directly on `StateRuntime`, so core and
app-server callsites stayed coupled to the full runtime instead of a
goal-specific store.

This PR introduces that boundary without changing the goal wire API or
current persistence behavior. Callers now go through
`StateRuntime::thread_goals()` and the new `GoalStore`, while
`GoalStore` still uses the existing state DB pool underneath.

## What changed

- Added `GoalStore` in `state/src/runtime/goals.rs` and exposed it from
`StateRuntime` via `thread_goals()`.
- Moved thread-goal reads, writes, status updates, pause, delete, and
usage accounting onto `GoalStore`.
- Updated core session goal handling, app-server goal RPCs, resume
snapshots, and goal tests to use the store boundary.
- Kept thread deletion responsible for cascading goal cleanup by
deleting the goal through the store only after a thread row is removed.
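
The boundary can be sketched as a narrow handle that shares the runtime's pool, with thread deletion cascading to the goal only after a row is actually removed. This is a minimal sketch under assumptions: the in-memory `Pool`, the `Goal` fields, and every method name other than `thread_goals()` are hypothetical and do not reflect the real state DB schema.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Clone, Debug, PartialEq)]
struct Goal {
    objective: String,
    tokens_used: u64,
}

/// In-memory stand-in for the shared state DB pool.
type Pool = Arc<Mutex<HashMap<String, Goal>>>;

struct StateRuntime {
    pool: Pool,
    threads: Mutex<Vec<String>>,
}

impl StateRuntime {
    /// Goal-specific callers go through this narrower handle.
    fn thread_goals(&self) -> GoalStore {
        GoalStore { pool: Arc::clone(&self.pool) }
    }

    /// Cascade goal cleanup only after the thread row is removed.
    fn delete_thread(&self, thread_id: &str) -> bool {
        let removed = {
            let mut threads = self.threads.lock().unwrap();
            let before = threads.len();
            threads.retain(|id| id.as_str() != thread_id);
            threads.len() != before
        };
        if removed {
            self.thread_goals().delete(thread_id);
        }
        removed
    }
}

/// Hypothetical goal store; still backed by the same pool underneath.
struct GoalStore {
    pool: Pool,
}

impl GoalStore {
    fn upsert(&self, thread_id: &str, goal: Goal) {
        self.pool.lock().unwrap().insert(thread_id.to_string(), goal);
    }
    fn get(&self, thread_id: &str) -> Option<Goal> {
        self.pool.lock().unwrap().get(thread_id).cloned()
    }
    fn add_usage(&self, thread_id: &str, tokens: u64) {
        if let Some(goal) = self.pool.lock().unwrap().get_mut(thread_id) {
            goal.tokens_used += tokens;
        }
    }
    fn delete(&self, thread_id: &str) {
        self.pool.lock().unwrap().remove(thread_id);
    }
}
```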

## Testing

- Existing goal persistence, resume, and accounting tests were updated
to exercise the new `GoalStore` access path.
2026-05-18 14:47:05 +02:00
jif-oai
6a8173588c feat: add extension event sink capability (#23293)
## Why

Extensions can already expose typed contributions and receive host
capabilities such as `AgentSpawner`, but they do not have a typed way to
send protocol events back through the host. Extensions that need to
surface progress or status should not have to own persistence, ordering,
transport fanout, or logging decisions themselves.

## What

- Add `ExtensionEventSink`, a host-provided fire-and-forget sink for
`codex_protocol::protocol::Event`.
- Add `NoopExtensionEventSink` so hosts that do not expose extension
event emission keep the existing empty-registry behavior.
- Store the sink on `ExtensionRegistryBuilder` / `ExtensionRegistry`,
with `with_event_sink(...)` and `event_sink()` accessors, and re-export
the new capability from `codex-extension-api`.

## Testing

- Not run locally; PR metadata/body update only.
2026-05-18 14:08:56 +02:00
jif-oai
9531e932ef Make extension lifecycle hooks async (#23291)
## Why

Extension lifecycle hooks sit on the host/extension boundary, but the
current trait surface only allows synchronous callbacks. That forces
extensions that need to seed, rehydrate, observe, or flush
extension-owned state during thread and turn transitions to either block
inside the callback or move async work into separate host plumbing.

This PR makes those lifecycle callbacks awaitable so extension
implementations can perform async work directly at the lifecycle point
where the host already has the relevant session, thread, or turn stores
available.

## What changed

- Makes `ThreadLifecycleContributor` and `TurnLifecycleContributor`
async in `codex-extension-api`.
- Awaits thread start/resume/stop and turn start/stop/abort lifecycle
callbacks from `codex-core`.
- Updates the guardian and memories extensions to implement the async
lifecycle trait surface.
- Updates the existing lifecycle tests to use async contributor
implementations.
- Adds `async-trait` to the crates that now expose or implement these
async object-safe lifecycle traits.
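
An async method on a trait that must stay usable as `dyn` is usually written with `#[async_trait]`, which rewrites it to return a boxed future. The sketch below writes that boxed form by hand so it needs only `std`. It is an illustration of the pattern, not the real `TurnLifecycleContributor` signature. `FlushingExtension`, `stop_turn`, and the tiny `block_on` executor are hypothetical helpers for the example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Boxed `Send` future, roughly what `#[async_trait]` produces for an `async fn`.
type BoxFuture<'a> = Pin<Box<dyn Future<Output = ()> + Send + 'a>>;

/// Hypothetical, trimmed-down lifecycle trait; the boxed return keeps it object-safe.
trait TurnLifecycleContributor: Send + Sync {
    fn on_turn_stop<'a>(&'a self, turn_id: &'a str) -> BoxFuture<'a>;
}

/// Example extension that flushes extension-owned state when a turn stops.
struct FlushingExtension {
    flushed: Mutex<Vec<String>>,
}

impl TurnLifecycleContributor for FlushingExtension {
    fn on_turn_stop<'a>(&'a self, turn_id: &'a str) -> BoxFuture<'a> {
        Box::pin(async move {
            // Real async work (I/O, other awaits) could happen here.
            self.flushed.lock().unwrap().push(turn_id.to_string());
        })
    }
}

/// Host side: await each contributor in order before moving on.
async fn stop_turn(contributors: &[Arc<dyn TurnLifecycleContributor>], turn_id: &str) {
    for contributor in contributors {
        contributor.on_turn_stop(turn_id).await;
    }
}

/// Waker that does nothing; enough for futures that never pend.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny test-only executor: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

In a real host the futures would run on the existing async runtime rather than a spin-polling `block_on`.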

## Testing

- Existing `codex-core` lifecycle tests were updated to cover async
implementations for thread stop and turn abort ordering.
2026-05-18 13:53:58 +02:00
jif-oai
a80f07ec4a chore: goal ext skeleton (#23288)
Skeleton of `/goal` as an extension.
Lots of follow-ups are coming.
2026-05-18 13:32:21 +02:00
xli-oai
da14dd2add [codex] Add installed-plugin mention API (#22448)
## Summary
- add app-server `plugin/installed` for mention-oriented plugin loading
- return installed plugins plus explicitly requested install-suggestion
rows
- keep remote handling on installed-state data instead of the broad
catalog listing path

## Why
The `@` mention surface only needs plugins that are usable now, plus a
small product-approved set of install suggestions. It does not need the
full catalog-shaped `plugin/list` payload that the Plugins page uses.
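
The mention-oriented result can be sketched as installed rows plus any explicitly requested suggestions that are not already installed, with no catalog scan involved. This is a minimal sketch: `PluginRow`, `installed_for_mentions`, and the plugin ids are hypothetical and do not reflect the real `plugin/installed` response shape.

```rust
/// Hypothetical row shape for the `@` mention surface.
#[derive(Debug, Clone, PartialEq)]
struct PluginRow {
    id: String,
    install_suggestion: bool,
}

/// Installed plugins first, then requested suggestions that are not installed yet.
fn installed_for_mentions(installed: &[&str], requested_suggestions: &[&str]) -> Vec<PluginRow> {
    let mut rows: Vec<PluginRow> = installed
        .iter()
        .map(|id| PluginRow { id: id.to_string(), install_suggestion: false })
        .collect();
    for id in requested_suggestions {
        if !installed.contains(id) {
            rows.push(PluginRow { id: id.to_string(), install_suggestion: true });
        }
    }
    rows
}
```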

## Validation
- `just write-app-server-schema`
- `just fmt`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-core-plugins`
- `cargo test -p codex-app-server --test all plugin_installed_`

## Notes
- The package-wide `cargo test -p codex-app-server` run still hits an
existing unrelated stack overflow in
`in_process::tests::in_process_start_clamps_zero_channel_capacity`.
- Companion webview PR: https://github.com/openai/openai/pull/915672
2026-05-18 03:11:54 -07:00
jif-oai
22dd9ad392 Densify and version memory summaries (#23148)
## Why

`memory_summary.md` is injected into every session, so its value depends
on staying compact, navigational, and easy to regenerate when the
expected shape changes. The previous consolidation prompt encouraged a
broad actionable inventory and allowed older summary structures to be
patched in place, which makes it easier for stale or overly verbose
summaries to keep accumulating.

This change makes the summary format explicitly versioned and biases
Phase 2 memory consolidation toward denser prompt-loaded context.

## What changed

- Require `memory_summary.md` to begin with an exact `v1` header.
- Teach consolidation to regenerate `memory_summary.md` from scratch
when the header is missing or incompatible, while still allowing
incremental updates to `MEMORY.md`.
- Tighten the `memory_summary.md` instructions so it acts as a compact
routing/index layer instead of a second handbook.
- Lower `MEMORY_TOOL_DEVELOPER_INSTRUCTIONS_SUMMARY_TOKEN_LIMIT` from
`5_000` to `2_500` so the runtime prompt budget matches the denser
summary target.
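
The versioning rule is enforced through the consolidation prompt, not through code. The check below only illustrates the decision that the prompt asks for. The header text `# memory_summary v1` is a hypothetical placeholder because the description does not quote the exact header, and `summary_action` is an illustrative name.

```rust
/// Placeholder header; the exact `v1` header text is not given in the description.
const SUMMARY_V1_HEADER: &str = "# memory_summary v1";

#[derive(Debug, PartialEq)]
enum SummaryAction {
    RegenerateFromScratch,
    UpdateIncrementally,
}

/// Missing file, missing header, or an incompatible header: rebuild from scratch.
fn summary_action(existing: Option<&str>) -> SummaryAction {
    match existing {
        Some(text) if text.lines().next() == Some(SUMMARY_V1_HEADER) => {
            SummaryAction::UpdateIncrementally
        }
        _ => SummaryAction::RegenerateFromScratch,
    }
}
```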

## Verification

Not run; this is a prompt/template update plus a prompt budget constant
change.
2026-05-18 09:59:34 +02:00
starr-openai
64ead6a83a Add exec-server websocket keepalive (#23226)
## Summary
- send periodic websocket Ping frames from outbound exec-server
websocket clients
- cover direct exec-server websocket clients plus rendezvous
harness/executor websocket connections
- keep inbound axum-accepted exec-server websocket connections passive
- add focused keepalive coverage for direct and relay websocket paths
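
The outbound keepalive reduces to a timing decision: send a Ping once the interval has elapsed since the last one. This is a minimal, runtime-agnostic sketch of that decision. The `Keepalive` type and the 30-second interval are assumptions, not the values the exec-server uses. Async code would typically drive the same check from a timer such as `tokio::time::interval`. Inbound, server-accepted connections would simply not run it.

```rust
use std::time::{Duration, Instant};

/// Decides when an outbound websocket client should send a Ping.
struct Keepalive {
    interval: Duration,
    last_sent: Instant,
}

impl Keepalive {
    fn new(interval: Duration, now: Instant) -> Self {
        Self { interval, last_sent: now }
    }

    /// Returns true, and records the send, when a Ping is due at `now`.
    fn poll_ping(&mut self, now: Instant) -> bool {
        if now.duration_since(self.last_sent) >= self.interval {
            self.last_sent = now;
            true
        } else {
            false
        }
    }
}
```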

## Validation
- /Users/starr/code/openai/project/dotslash-gen/bin/bazel test
//codex-rs/exec-server:exec-server-unit-tests
--test_filter='websocket_connection_sends_keepalive_ping|harness_connection_sends_keepalive_ping|multiplexed_executor_sends_keepalive_ping'
- /Users/starr/code/openai/project/dotslash-gen/bin/bazel test
//codex-rs/exec-server:exec-server-relay-test
--test_filter=multiplexed_remote_executor_routes_independent_virtual_streams
2026-05-18 03:07:32 +00:00
Michael Bolin
0a83353ca3 test: reduce core sandbox policy test setup (#23036)
## Why

`SandboxPolicy` is a legacy compatibility shape, but several core tests
still used it for ordinary turn setup even when the runtime path now
carries `PermissionProfile`. With the first cleanup PR merged, this
follow-up trims more core test scaffolding so remaining `SandboxPolicy`
matches are easier to classify as production compatibility,
legacy-boundary coverage, or explicit conversion tests.

## What Changed

- Updated apply-patch handler and runtime tests to pass
`PermissionProfile` directly.
- Changed sandboxing test helpers to build permission profiles without
first creating `SandboxPolicy` values.
- Converted request-permissions integration turns to pass
`PermissionProfile` through the test helper, leaving legacy sandbox
projection at the `Op::UserTurn` boundary.
- Converted unified exec integration helpers and direct turn submissions
to use `PermissionProfile` values instead of `SandboxPolicy` setup.
- Removed now-unused `SandboxPolicy` imports from the touched core
tests.

## Test Plan

- `just fmt`
- `cargo test -p codex-core --lib tools::sandboxing::tests`
- `cargo test -p codex-core --lib tools::runtimes::apply_patch::tests`
- `cargo test -p codex-core --lib tools::handlers::apply_patch::tests`
- `cargo test -p codex-core --lib unified_exec::process_manager::tests`
- `cargo test -p codex-core --test all request_permissions::`
- `cargo test -p codex-core --test all unified_exec::`
- `just fix -p codex-core`
2026-05-17 08:39:41 -07:00
jif-oai
545ede569c Make multi-agent v2 tool namespace configurable (#23147)
## Summary
- Add `features.multi_agent_v2.tool_namespace` with config/schema
validation for Responses-compatible namespace values.
- Thread the resolved namespace into `ToolsConfig` for normal turns and
review turns.
- Wrap MultiAgentV2 tool specs and registry names in the configured
namespace when namespace tools are supported, while falling back to the
plain tool names when they are not.
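
The fallback logic can be sketched as follows. Two parts are assumptions: the validation rule (ASCII letters, digits, `_` and `-`, at most 64 characters), since the real "Responses-compatible" rule is not spelled out, and the `.` separator in `registry_name`, which only illustrates namespacing. Both function names are hypothetical.

```rust
/// Assumed validation rule; the actual Responses-compatible rule may differ.
fn validate_namespace(ns: &str) -> Result<(), String> {
    let ok = !ns.is_empty()
        && ns.len() <= 64
        && ns.chars().all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    if ok {
        Ok(())
    } else {
        Err(format!("invalid tool namespace: {ns:?}"))
    }
}

/// Namespaced name when namespaces are supported, plain tool name otherwise.
/// The `.` separator is illustrative only.
fn registry_name(namespace: Option<&str>, namespaces_supported: bool, tool: &str) -> String {
    match namespace {
        Some(ns) if namespaces_supported => format!("{ns}.{tool}"),
        _ => tool.to_string(),
    }
}
```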

## Validation
- `just fmt`
- `just write-config-schema`
- `cargo test -p codex-features multi_agent_v2_feature_config --
--nocapture`
- `cargo test -p codex-core test_build_specs_multi_agent_v2 --
--nocapture`
- `cargo test -p codex-core multi_agent_v2_config -- --nocapture`
- `cargo test -p codex-core
multi_agent_v2_rejects_invalid_tool_namespace -- --nocapture`
- `cargo test -p codex-tools`
- `git diff --check`
2026-05-17 15:27:43 +02:00
Eric Traut
0445b290fe [1 of 4] tui: route primary settings writes through app server (#22913)
## Why
The TUI can run against a remote app server, but several high-traffic
settings still persisted by editing the local config file. That sends
remote sessions' preference writes to the wrong machine and lets local
disk state drift from the app-server-owned config.

This is **[1 of 4]** in a stacked series that moves TUI-owned config
mutations onto app-server APIs.

## What changed
- Added a small TUI helper for typed app-server config writes.
- Routed primary interactive preference writes through
`config/batchWrite`.
- Preserved existing profile scoping for settings that already support
`profiles.<profile>.*` overrides.

## Config keys affected
- `model`
- `model_reasoning_effort`
- `personality`
- `service_tier`
- `plan_mode_reasoning_effort`
- `approvals_reviewer`
- `notice.fast_default_opt_out`
- Profile-scoped equivalents under `profiles.<profile>.*`
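
Profile scoping amounts to choosing the key path before building the batch write. This is a minimal sketch: `config_key_path` and `clear_service_tier_edits` are hypothetical helpers, the `(key_path, value)` tuple only stands in for a `config/batchWrite` edit (with `None` meaning "clear"), and treating `notice.fast_default_opt_out` as not profile-scoped is an assumption.

```rust
/// Dotted key path for a write, scoped under the active profile when the
/// setting supports `profiles.<profile>.*` overrides.
fn config_key_path(key: &str, active_profile: Option<&str>, profile_scoped: bool) -> String {
    match active_profile {
        Some(profile) if profile_scoped => format!("profiles.{profile}.{key}"),
        _ => key.to_string(),
    }
}

/// Clearing the tier: remove `service_tier` and persist the opt-out notice.
/// Assumes the notice key is global rather than profile-scoped.
fn clear_service_tier_edits(profile: Option<&str>) -> Vec<(String, Option<&'static str>)> {
    vec![
        (config_key_path("service_tier", profile, true), None),
        (config_key_path("notice.fast_default_opt_out", profile, false), Some("true")),
    ]
}
```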

## Suggested manual validation
- Connect the TUI to a remote app server, change `model` and
`model_reasoning_effort`, reconnect, and confirm the remote config
retained both values while the local `config.toml` did not change.
- Change `personality`, `plan_mode_reasoning_effort`, and the explicit
auto-review selection, then reconnect and confirm those choices persist
through the app server.
- Clear the service tier back to default and confirm `service_tier` is
cleared while `notice.fast_default_opt_out = true` is persisted
remotely.
- Repeat one setting change with an active profile and confirm the write
lands under `profiles.<profile>.*`.

## Stack
1. [#22913](https://github.com/openai/codex/pull/22913) `[1 of 4]`
primary settings writes
2. [#22914](https://github.com/openai/codex/pull/22914) `[2 of 4]` app
and skill enablement
3. [#22915](https://github.com/openai/codex/pull/22915) `[3 of 4]`
feature and memory toggles
4. [#22916](https://github.com/openai/codex/pull/22916) `[4 of 4]`
startup and onboarding bookkeeping
2026-05-16 14:27:02 -07:00
sayan-oai
061a614d85 multiagent: trim model-visible description, cap to 5 models (#23069)
## Why

The `spawn_agent` model override guidance is uncapped and bloating
context. We need to trim down each entry and cap total entries.

We picked 5 as the cap; it can be changed.

## What changed

- Cap the model override summaries shown in `spawn_agent` to the first 5
picker-visible models, preserving the existing priority ordering from
the models manager.
- Condense each rendered entry to the actionable pieces the model needs:
  - use the model slug as the label
  - render compact reasoning effort lists with the default marked inline
  - render only service tier IDs, and omit the clause when no tiers are
    available
- Update coverage so the compact formatter shape and the top-5 cap are
exercised, and keep the end-to-end request assertion aligned with real
model metadata.

## Example

Before:

`- gpt-5.4 ('gpt-5.4'): Strong model for everyday coding. Default
reasoning effort: medium. Supported reasoning efforts: low (Fast
responses with lighter reasoning), medium (Balances speed and reasoning
depth for everyday tasks), high (Greater reasoning depth for complex
problems), xhigh (Extra high reasoning depth for complex problems).
Supported service tiers: priority (Fast: 1.5x speed, increased usage).`

After:

`- 'gpt-5.4': Strong model for everyday coding. Reasoning efforts: low,
medium (default), high, xhigh. Service tiers: priority.`
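
The compact shape in the "After" example can be reproduced by a small formatter that caps the list at five picker-visible models. This is a minimal sketch: `ModelPreset`, its fields, `format_entry`, and `format_model_summaries` are hypothetical, and the input order is assumed to already reflect the models manager's priority ordering.

```rust
/// Hypothetical model metadata used by the formatter.
struct ModelPreset {
    slug: &'static str,
    description: &'static str,
    reasoning_efforts: Vec<&'static str>,
    default_effort: &'static str,
    service_tiers: Vec<&'static str>,
    show_in_picker: bool,
}

const MAX_MODEL_SUMMARIES: usize = 5;

/// One compact line; the service-tier clause is omitted when there are no tiers.
fn format_entry(m: &ModelPreset) -> String {
    let efforts: Vec<String> = m
        .reasoning_efforts
        .iter()
        .map(|e| if *e == m.default_effort { format!("{e} (default)") } else { e.to_string() })
        .collect();
    let mut line = format!("- '{}': {} Reasoning efforts: {}.", m.slug, m.description, efforts.join(", "));
    if !m.service_tiers.is_empty() {
        line.push_str(&format!(" Service tiers: {}.", m.service_tiers.join(", ")));
    }
    line
}

/// First five picker-visible models, preserving the input order.
fn format_model_summaries(models: &[ModelPreset]) -> String {
    models
        .iter()
        .filter(|m| m.show_in_picker)
        .take(MAX_MODEL_SUMMARIES)
        .map(format_entry)
        .collect::<Vec<_>>()
        .join("\n")
}
```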
2026-05-16 13:43:30 -07:00
Miaolin Min
6941f5c2c5 [codex] preserve MCP result meta in McpToolCallItemResult (#22946)
## Summary

https://openai.slack.com/archives/C0ARA9UAQEA/p1778890981647319?thread_ts=1778888537.934319&cid=C0ARA9UAQEA


- Add `_meta` to exec JSONL MCP tool call result events.
- Copy MCP result metadata through the JSONL event conversion.
- Add a focused test that verifies `_meta` is serialized as `_meta` and
not `meta`.


## Verification

https://www.notion.so/openai/Miaolin-0516-_meta-population-debug-3628e50b62b08074b365e0ce1ffb8f74
2026-05-16 13:27:44 -07:00
Michael Zeng
b200dd1b6f exec-server: support auth-backed remote executor registration (#22769)
This updates remote `exec-server` registration to use normal Codex auth
instead of a registry-issued credential. The registry request is built
from the existing auth-provider path, which preserves the biscuit-only
registry contract introduced in
[openai/openai#924101](https://github.com/openai/openai/pull/924101)
while removing the old remote registry bearer env var and its direct
transport assumptions.

The default remote flow uses persisted ChatGPT auth from the normal
Codex config/storage path. This PR also includes the containerized Agent
Identity path needed by
[openai/openai#924260](https://github.com/openai/openai/pull/924260):
remote `exec-server` accepts `--allow-agent-identity-auth`, permits
Agent Identity auth loaded from `CODEX_ACCESS_TOKEN` only when that flag
is present, and reuses the existing Agent task registration plus derived
`AgentAssertion` header generation. API-key auth remains unsupported,
and Agent Identity stays opt-in.
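
The auth policy described above can be summarized as a small decision table. This is a minimal sketch: the enums and `registration_auth` are hypothetical names, and only `--allow-agent-identity-auth` and `AgentAssertion` come from the description. ChatGPT auth is the default, Agent Identity requires the opt-in flag, and API-key auth is rejected.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AuthMode {
    ChatGpt,
    AgentIdentity,
    ApiKey,
}

/// Hypothetical labels for the credential attached to the registry request.
#[derive(Debug, PartialEq)]
enum RegistrationAuth {
    ChatGptToken,
    AgentAssertion,
}

fn registration_auth(mode: AuthMode, allow_agent_identity: bool) -> Result<RegistrationAuth, String> {
    match mode {
        AuthMode::ChatGpt => Ok(RegistrationAuth::ChatGptToken),
        AuthMode::AgentIdentity if allow_agent_identity => Ok(RegistrationAuth::AgentAssertion),
        AuthMode::AgentIdentity => {
            Err("Agent Identity auth requires --allow-agent-identity-auth".to_string())
        }
        AuthMode::ApiKey => {
            Err("API-key auth is not supported for remote exec-server registration".to_string())
        }
    }
}
```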

Validation performed beyond normal presubmit coverage:
- `cargo fmt --all --check`
- `cargo check -p codex-cli`
- `cargo test -p codex-exec-server`
- `cargo test -p codex-cli exec_server_agent_identity_auth_flag_`
- `cargo test -p codex-cli remote_exec_server_auth_mode_`

I also attempted `cargo test -p codex-cli`. The new CLI tests passed
inside that run, but the suite ended on an unrelated local
marketplace-state failure in
`plugin_list_excludes_unconfigured_repo_local_marketplaces`.
2026-05-16 12:48:28 -07:00
Michael Bolin
d91bc15618 test: construct permission profiles directly (#23030)
## Why

`SandboxPolicy` is now a legacy compatibility shape, but several tests
still built a `SandboxPolicy` only to immediately convert it into
`PermissionProfile` for APIs that already accept canonical runtime
permissions. Those detours make it harder to audit where legacy sandbox
policy is still required, because boundary-only usages are mixed
together with ordinary test setup.

## What Changed

- Updated tests in `codex-core`, `codex-exec`, `codex-analytics`, and
`codex-config` to construct `PermissionProfile` values directly when the
code under test takes a permission profile.
- Changed exec-policy, request-permissions, session, and sandbox test
helpers to pass `PermissionProfile` through instead of converting from
`SandboxPolicy` internally.
- Left `SandboxPolicy` in place where tests are explicitly exercising
legacy compatibility or request/response boundaries.

## Test Plan

- `cargo test -p codex-analytics -p codex-config`
- `cargo test -p codex-core --lib safety::tests`
- `cargo test -p codex-core --lib exec_policy::tests::`
- `cargo test -p codex-core --lib exec::tests`
- `cargo test -p codex-core --lib guardian_review_session_config`
- `cargo test -p codex-core --lib tools::network_approval::tests`
- `cargo test -p codex-core --lib
tools::runtimes::shell::unix_escalation::tests`
- `cargo test -p codex-core --lib managed_network`
- `cargo test -p codex-core --test all request_permissions::`
- `cargo test -p codex-exec sandbox`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23030).
* #23036
* __->__ #23030
2026-05-16 12:12:37 -07:00
Eric Traut
941e7f825e Improve goal completion usage reporting (#22907)
## Why

Goal completion follow-up turns currently receive a preformatted English
usage sentence such as `time used: 2586 seconds`. That nudges the model
to echo an awkward raw seconds count in the final reply, even though the
tool result already exposes structured usage fields like
`goal.timeUsedSeconds`, `goal.tokensUsed`, and `goal.tokenBudget`.

## What changed

- Replace the preformatted completion usage sentence with guidance to
read the structured goal fields from the tool result.
- Preserve token-budget reporting while allowing the model to phrase
elapsed time in a concise, human-friendly way that fits the response
language.
- Update core coverage for both the generated completion guidance and
the session flow that forwards it back to the model.

## Verification

Previously, it would have output a final message indicating that it
"worked for 303 seconds". Now it shows the following:

<img width="286" height="35" alt="image"
src="https://github.com/user-attachments/assets/d7011880-9449-46a7-856f-4e50ae00eb45"
/>
2026-05-16 11:49:40 -07:00