**Stack position:** [1 of 7]
## Summary
The first three PRs in this stack are a cleanup pass before the actual
thread settings API work.
Today, core has several overlapping "user input" ops: `UserInput`,
`UserInputWithTurnContext`, and `UserTurn`. They differ mostly in how
much next-turn state they carry, which makes the later queued thread
settings update harder to reason about and review.
This PR starts that cleanup by adding the shared
`ThreadSettingsOverrides` payload and allowing `Op::UserInput` to carry
it. Existing variants remain in place here, so this layer is mostly a
behavior-preserving API shape change plus mechanical constructor
updates.
## End State After PR3
By the end of PR3, `Op::UserInput` is the only "user input" core op. It
can carry optional thread settings overrides for callers that need to
update stored defaults with a turn, while callers without updates use
empty settings. `Op::UserInputWithTurnContext` and `Op::UserTurn` are
deleted.
## End State After PR5
By the end of PR5, core will have only two ops for this area:
- `Op::UserInput` for user-input-bearing submissions.
- `Op::ThreadSettings` for settings-only updates.
## Stack
1. [1 of 7] [Add thread settings to
UserInput](https://github.com/openai/codex/pull/23080) (this PR)
2. [2 of 7] [Remove
UserInputWithTurnContext](https://github.com/openai/codex/pull/23081)
3. [3 of 7] [Remove
UserTurn](https://github.com/openai/codex/pull/23075)
4. [4 of 7] [Placeholder for OverrideTurnContext
cleanup](https://github.com/openai/codex/pull/23087)
5. [5 of 7] [Replace OverrideTurnContext with
ThreadSettings](https://github.com/openai/codex/pull/22508)
6. [6 of 7] [Add app-server thread settings
API](https://github.com/openai/codex/pull/22509)
7. [7 of 7] [Sync TUI thread
settings](https://github.com/openai/codex/pull/22510)
## Summary
- mark `ToolSearch` as removed and ignore stale config writes for its
legacy key
- make search tool exposure depend only on model capability, not a
feature toggle
- remove app-server enablement support and prune now-obsolete test
coverage/setup
## Verification
- `cargo test -p codex-features`
- `cargo test -p codex-tools`
- `cargo test -p codex-core search_tool_requires_model_capability`
- `cargo test -p codex-app-server experimental_feature_enablement_set_`
## Notes
- This keeps the legacy config key as a no-op for compatibility while
removing the ability to toggle the behavior off cleanly.
- No developer-facing docs update outside the touched app-server README
was needed.
Deletes the skill env var dependency prompt feature and its runtime
path. env_var entries in skill dependency metadata are now silently
ignored during skill loading.
## Why
Compaction now installs replacement history inside the session, but the
turn and compaction callers were still reaching into
`ModelClientSession` to reset websocket transport state after that
install. That made a transport-level reset part of the compaction API
even though websocket incremental request selection already checks
whether the next request is a strict extension of the previous one and
falls back to a full `response.create` when it is not.
## What changed
- Removed the compaction-side calls to `reset_websocket_session` from
`compact.rs` and `session/turn.rs`.
- Simplified pre-sampling and mid-turn compaction helpers so they return
`CodexResult<()>` instead of carrying a reset flag.
- Made `ModelClientSession::reset_websocket_session` private to
`client.rs`, leaving only the websocket timeout recovery path inside the
client as a caller.
## Validation
- `cargo test -p codex-core --test all
responses_websocket_creates_on_non_prefix`
- `cargo test -p codex-core --test all
steered_user_input_waits_for_model_continuation_after_mid_turn_compact`
- `cargo test -p codex-core --test all
pre_sampling_compact_runs_on_switch_to_smaller_context_model`
## Why
The v2 app-server permission profile fields are experimental, but the
previous migration kept a legacy object payload for profile selection.
That made clients aware of server-owned `activePermissionProfile`
metadata such as `extends`, and it kept a
`legacy_additional_writable_roots` path even though
`runtimeWorkspaceRoots` now owns runtime workspace-root selection.
This PR makes the client contract match the intended model: clients
select a permission profile by id, and the server resolves and reports
active profile provenance in response payloads.
Follow-up to #22611.
## What Changed
- Changed `thread/start`, `thread/resume`, `thread/fork`, and
`turn/start` permission profile selection to plain profile id strings.
- Changed `command/exec.permissionProfile` to a plain profile id string
for the same client/server ownership split.
- Removed `PermissionProfileSelectionParams` and the legacy `{ type:
"profile", modifications: [...] }` compatibility deserializer.
- Updated app-server, TUI, and `codex exec` call sites to send only ids,
while keeping `activePermissionProfile` as server response metadata.
- Updated app-server docs and schema fixtures for the revised
`command/exec.permissionProfile` shape.
## Verification
- `cargo test -p codex-app-server-protocol`
- `RUST_MIN_STACK=8388608 cargo test -p codex-app-server`
- `cargo test -p codex-exec`
- `RUST_MIN_STACK=8388608 cargo test -p codex-tui`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23360).
* #23368
* __->__ #23360
## Why
- Follows #20949.
- The above moved `thread_source` attribution from the reducer to
explicit caller provided metadata
- The `codex exec` path still omitted this metadata, leaving
exec-created threads without `thread_source`
## What Changed
- Ensures exec threads are marked as user created (`thread_source =
"user"`)
- Preserves thread-source metadata in exec’s startup session event
## Verification
- Updated unit tests to validate exec `thread_source` propagation.
- `cargo +1.93.0 test -p codex-exec --manifest-path codex-rs/Cargo.toml`
- `cargo +1.93.1 build -p codex-cli --manifest-path codex-rs/Cargo.toml`
- Validated locally with a freshly built `codex exec` run:
- Startup logs showed `thread_source: Some(User)`.
- Rollout metadata recorded `"thread_source":"user"`.
## Why
Older iTerm2 builds can be detected as supporting the image transport
that terminal pets use, but in practice they fail to render the pet flow
correctly. Instead of silently attempting image rendering, Codex should
tell the user that their iTerm2 version is too old and that upgrading is
the fix.
## What Changed
- gate iTerm2 pet auto-detection on version `3.6.0` or newer
- show a dedicated upgrade message for older or unknown iTerm2 versions
instead of the generic unsupported-terminal warning
- keep the existing generic unsupported-terminal path for non-iTerm
terminals
- add regression coverage for iTerm2 version parsing and the old-iTerm
warning path
## How to Test
1. Start Codex in iTerm2 3.6 or newer.
2. Run `/pets`.
3. Confirm the pets picker opens instead of showing a warning.
4. Start Codex in an older iTerm2 build, or exercise the equivalent test
path.
5. Run `/pets`.
6. Confirm Codex warns that pets require iTerm2 3.6 or newer and tells
the user to upgrade.
7. Also verify that a non-iTerm unsupported terminal still shows the
generic unsupported-terminal message.
Targeted tests:
- `cargo test -p codex-terminal-detection`
- `cargo test -p codex-tui pets::`
- `cargo test -p codex-tui slash_pets_on_unsupported_terminal`
- `cargo test -p codex-tui slash_pets_on_old_iterm2`
## Why
Pending model input was split across `Session`, `TurnState`, and the
agent mailbox. That made it easy for new paths to manage queued user
input or mailbox delivery outside the intended ownership boundary.
This PR consolidates the model-facing input lifecycle behind the session
input queue so turn-local pending input, next-turn queued items, and
mailbox delivery coordination are owned in one place.
## What Changed
- Added `session/input_queue.rs` to own pending input queues and mailbox
delivery coordination.
- Removed the standalone `agent/mailbox.rs` channel wrapper and store
mailbox items directly in the input queue.
- Moved pending-input mutations off `TurnState`; `TurnState` now exposes
the queue-owned storage directly for now.
- Routed abort cleanup, mailbox delivery phase changes, next-turn queued
items, and active-turn pending input through `InputQueue`.
- Boxed stack-heavy agent resume/fork startup futures that the refactor
pushed over the default test stack.
- Updated session, task, goal, stream-event, and multi-agent call sites
and tests to use the new queue ownership.
## Verification
- `cargo test -p codex-core --lib agent::control::tests`
- `cargo test -p codex-core --lib
agent::control::tests::resume_closed_child_reopens_open_descendants --
--exact`
- `cargo test -p codex-core --lib
agent::control::tests::spawn_agent_fork_last_n_turns_keeps_only_recent_turns
-- --exact`
- `cargo test -p codex-core --lib
agent::control::tests::resume_thread_subagent_restores_stored_nickname_and_role
-- --exact`
- `cargo test -p codex-core` was also run; it completed with 1814
passed, 4 ignored, and one timeout in
`agent::control::tests::resume_thread_subagent_restores_stored_nickname_and_role`,
which passed when rerun in isolation.
Adding the id of the plugin that contains the MCP (if any) so we can
apply filters at plugin level.
## Summary
- carry the plugin owner into MCP runtime provenance
- attach `plugin_id` to outbound plugin-backed MCP tool-call `_meta`
- avoid misattributing user-configured MCP servers that shadow plugin
server names
## Testing
- `just fmt`
- `just fix -p codex-mcp`
- `just fix -p codex-core`
- `cargo test -p codex-mcp`
- `cargo test -p codex-core
plugin_mcp_tool_call_request_meta_includes_plugin_id`
- `cargo test -p codex-core
to_mcp_config_omits_plugin_id_when_user_server_shadows_plugin_mcp`
- `cargo test -p codex-core
rebuild_preserving_session_layers_refreshes_plugin_derived_mcp_config`
- `git diff --check`
## Notes
- Attempted `cargo test -p codex-core`; it aborted in
`agent::control::tests::resume_agent_from_rollout_skips_descendants_when_parent_resume_fails`
with a stack overflow before the full suite completed.
## Why
`TurnContextItem` is the durable baseline used to reconstruct context
diffs across resume/fork. Most of the old persisted-only fields on it
are no longer read, so keeping them in rollout snapshots adds schema
surface and state that can drift without affecting reconstruction.
`summary` is the exception: older Codex versions require it to
deserialize `turn_context` records, so keep writing a default
compatibility value until that schema surface can be removed safely.
## What changed
- Removed the unused persisted fields from `TurnContextItem`: trace ids,
user/developer instructions, output schema, and truncation policy.
- Kept `summary` with a compatibility comment and made
`TurnContext::to_turn_context_item` write `ReasoningSummary::Auto`
instead of live turn state.
- Updated rollout/context reconstruction fixtures for the retained
summary field.
## Verification
- `cargo test -p codex-protocol --lib turn_context_item`
- `cargo test -p codex-rollout
resume_candidate_matches_cwd_reads_latest_turn_context`
- `cargo test -p codex-state turn_context`
- `cargo test -p codex-core --lib
new_default_turn_captures_current_span_trace_id`
- `cargo test -p codex-core --lib
record_initial_history_resumed_turn_context_after_compaction_reestablishes_reference_context_item`
- `cargo test -p codex-core --test all
emits_warning_when_resumed_model_differs`
- `git diff --check`
## Description
This PR makes `codex remote-control` behave like a foreground CLI
command by default. Running it now starts remote control, waits for
readiness, prints a clear status message with the machine name, and
stays alive until Ctrl-C.
Users who want daemon behavior can use `codex remote-control start`, and
`codex remote-control stop` now prints concise human-readable output.
`--json` remains available for scripts.
Implementation-wise, this now verifies the real app-server state instead
of just assuming startup worked. The CLI starts or connects to
app-server, probes its control socket, calls the `remoteControl/enable`
API, and waits for the remote-control status response/notification
before printing success.
For daemon mode, `codex remote-control start` also reports which managed
app-server binary was used, including its path and best-effort `codex
--version`, so failures are easier to diagnose.
## Examples
Example output:
```
> codex remote-control
Starting app-server with remote control enabled...
This machine is available for remote control as com-97826.
Press Ctrl-C to stop.
```
Error case using daemon (currently expected based on our publicly
released CLI version):
```
> ./target/debug/codex remote-control start
Starting app-server daemon with remote control enabled...
Error: app server did not become ready on /Users/owen/.codex/app-server-control/app-server-control.sock
Daemon used app-server:
path: /Users/owen/.codex/packages/standalone/current/codex
version: 0.130.0
Managed app-server stderr (/Users/owen/.codex/app-server-daemon/app-server.stderr.log):
error: unexpected argument '--remote-control' found
Usage: codex app-server [OPTIONS] [COMMAND]
For more information, try '--help'.
Caused by:
0: failed to connect to /Users/owen/.codex/app-server-control/app-server-control.sock
1: No such file or directory (os error 2)
```
## What changed
- `codex remote-control` now runs remote control in the foreground and
prints a Ctrl-C stop hint.
- `codex remote-control start` starts the daemon and waits for remote
control readiness before reporting success.
- `codex remote-control stop` reports stopped/not-running status in
plain language.
- Startup failures now include recent managed app-server stderr to make
daemon issues easier to diagnose.
- Added coverage for CLI output, readiness waiting, foreground shutdown,
and stderr log tailing.
## Why
Recent `rust-ci-full` failures were dominated by transient Windows
timeout clusters in process-heavy tests such as `suite::resume`,
`suite::cli_stream`, `suite::auth_env`,
`start_thread_uses_all_default_environments_from_codex_home`, and
`connect_stdio_command_initializes_json_rpc_client_on_windows`.
The goal here is to make those known flaky paths less likely to fail
full CI without relaxing the global nextest timeout policy.
## What changed
- Enable one global nextest retry with `retries = 1` so a single
transient failure can recover.
- Add a `windows_process_heavy` test group with `max-threads = 2` for
the recurring Windows subprocess/session-heavy timeout families.
- Add Windows-only slow-timeout overrides for that process-heavy group.
- Add a narrower Windows-only timeout override for
`start_thread_uses_all_default_environments_from_codex_home`, which
still exceeded the broader Windows bucket in both Windows full-CI lanes.
- Increase the `rust-ci-full` nextest job timeout from `45m` to `60m` so
Windows ARM64 still has job-level headroom after retries and targeted
per-test timeout increases.
- Keep the global `slow-timeout` unchanged at `15s`.
## Validation
Validated through `rust-ci-full` GitHub Actions reruns on this PR.
Observed improvement on the tuned Windows lanes:
- Windows x64 went from `5 timed out` to `0 timed out`.
- Windows ARM64 went from `2 timed out` to `0 timed out`.
- `start_thread_uses_all_default_environments_from_codex_home` recovered
as a flaky pass on Windows ARM64 instead of timing out.
The remaining failing tests in those runs were unrelated hard failures
outside this nextest timeout tuning.
## Why
Extensions that need to track runtime progress currently have no typed
host signal for tool execution. The goal extension in particular needs
to observe tool attempts without inspecting tool payloads, owning tool
implementations, or staying coupled to core-only runtime plumbing.
This adds a narrow lifecycle contributor API for host-owned tool
execution: extensions can observe when an accepted tool call starts and
how it finishes, while policy hooks and tool handlers continue to own
payload rewriting, blocking, and execution.
Relevant code:
-
[`ToolLifecycleContributor`](3ad2850ffc/codex-rs/ext/extension-api/src/contributors.rs (L119))
defines the extension-facing observer contract.
-
[`tool_lifecycle.rs`](3ad2850ffc/codex-rs/ext/extension-api/src/contributors/tool_lifecycle.rs)
defines the typed start/finish inputs, source, and outcome enums.
- [`notify_tool_start` /
`notify_tool_finish`](3ad2850ffc/codex-rs/core/src/tools/lifecycle.rs)
bridges core tool dispatch into the extension registry.
## What Changed
- Added `ToolLifecycleContributor` to `codex-extension-api`, including:
- `ToolStartInput`
- `ToolFinishInput`
- `ToolCallSource`
- `ToolCallOutcome`
- Added registration and lookup support on `ExtensionRegistryBuilder` /
`ExtensionRegistry`.
- Wired core tool dispatch to notify lifecycle contributors for:
- accepted tool starts
- completed tool calls, including the tool output success marker
- pre-tool-use blocks
- failures before or after the handler runs
- cancellation/abort in the parallel tool path
- Registered the goal extension as a lifecycle contributor and added the
outcome filter it will use for goal progress accounting.
## Test Coverage
- Added `dispatch_notifies_tool_lifecycle_contributors` to cover
lifecycle notification ordering and outcomes for successful and
handler-failed tool calls.
## Why
Some tool providers, especially MCP servers and dynamic tool sources,
can supply schema nodes that omit `type` and have no recognized JSON
Schema shape hints. Previously, `sanitize_json_schema` filled those
unknown nodes in as `string`, which made the schema parseable but
invented a scalar constraint that the provider did not specify. For
description-only fields, that could incorrectly steer tool arguments
away from the provider's actual accepted shape.
The Responses API accepts permissive empty schemas such as `{}` at
nested property positions, so Codex should preserve that permissive
meaning instead of coercing unknown schema nodes into a misleading
scalar type.
## What Changed
- Changed the no-hints fallback in `codex-rs/tools/src/json_schema.rs`
to clear unrecognized object schema nodes to `{}`.
- Empty schemas now remain `{}` rather than becoming `type: "string"`.
- Description-only or otherwise metadata-only nested property schemas
now become `{}` while surrounding object/array/string/number inference
still applies when recognized hints are present.
- Updated `codex-tools` and `codex-core` tests to cover top-level empty
schemas, nested empty schemas, metadata-only malformed schemas, dynamic
tools, and MCP tool specs.
## Verification
- `cargo test -p codex-tools`
- `cargo test -p codex-core
test_mcp_tool_property_missing_type_defaults_to_empty_schema`
- Manually verified the real Responses API behavior for both
empty-schema positions:
- Top-level function `parameters: {}` is accepted and echoed back as
`{"type":"object","properties":{}}`; when forced to call the tool,
Responses emitted empty object arguments: `"arguments": "{}"`.
- Nested property schema `{}` is accepted and preserved as `{}`; when
forced to call a tool with `metadata.extra`, Responses emitted
`"arguments": "{\"metadata\":{\"extra\":\"codex schema sanitizer
behavior\"}}"`.
## Summary
- make `load_global_instructions` read through an `ExecutorFileSystem`
- call global AGENTS reads with explicit `LOCAL_FS` so they stay tied to
local codex-home state
## Validation
- `bazel test --bes_backend= --bes_results_url=
--test_filter=instruction_sources_include_global_before_agents_md_docs
//codex-rs/core:core-unit-tests` on `dev`
## Why
`experimentalFeature/list` reports effective feature enablement, but
currently does not resolve it against a working directory where
project-local config.toml files can exist and toggle on/off features
when merged into the effective config after resolving the various config
layers. That means we effectively (and incorrectly) ignore features set
in project-local config.
To address that, this PR exposes an optional `thread_id` param which
allows us to load the thread's `cwd.
## Testing
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server experimental_feature_list`
## Why
The session picker already supports typed search, but it ignored
bracketed paste events entirely. On macOS terminals this makes pasted
text look like a no-op on the resume screen, which is especially
noticeable when a user wants to paste part of a thread name, branch, or
path into the search field.
## What Changed
- route `TuiEvent::Paste(String)` into the session picker instead of
dropping it
- normalize pasted search text into a single-line query by collapsing
whitespace
- ignore whitespace-only pastes
- reuse the existing `set_query(...)` path so pasted searches keep the
same filtering and pagination behavior as typed input
- add focused tests for append behavior, whitespace normalization,
whitespace-only paste, and the existing search-loading path
This PR is stacked on top of #23234 and contains only the net change
relative to `etraut/clarify-resume-hints`.
## How to Test
1. Start Codex in a terminal that emits bracketed paste, for example
iTerm2 on macOS.
2. Open the resume picker so the search UI is visible.
3. Copy a term that should match one of the visible sessions, then paste
it into the picker.
4. Confirm the query updates immediately and the list filters as if the
text had been typed.
5. Also verify that pasting text with newlines or tabs still produces a
usable single-line search query.
6. Also verify that normal typed search still works and that `Esc` still
clears the query / exits as before.
Targeted tests:
- `cargo test -p codex-tui`
---------
Co-authored-by: Eric Traut <etraut@openai.com>