## Why
Extensions can currently observe thread start, resume, and stop, but
they do not have a lifecycle point for the host to say that immediately
pending thread work has drained. That makes idle follow-up behavior
harder to express as extension-owned logic instead of host-specific
plumbing.
This adds an explicit idle lifecycle hook so an extension can react when
a thread becomes idle while the host keeps ownership of whether any
submitted follow-up input starts a turn, is queued, or is ignored.
## What changed
- Added `ThreadIdleInput` with access to the session-scoped and
thread-scoped extension stores.
- Added a default `on_thread_idle` method to
`ThreadLifecycleContributor`.
- Re-exported `ThreadIdleInput` from the extension API surface.
## Testing
Not run; this only extends the extension API trait surface with a
default hook and exported input type.
## Summary
- Add the missing additional_context field to the guardian review
Op::UserInput test initializer.
## Test plan
- just fmt
- just test -p codex-core guardian_review
- just test -p codex-core (compiles, then fails on local environment
issues: sandbox-exec Operation not permitted, missing test_stdio_server
helper binary, and unrelated timeouts)
## Why
The extracted goal runtime needs a host-callable path for turns that
stop because the workspace usage limit is reached. In that case, any
in-turn goal progress should be accounted before the goal becomes
terminal, and active goal accounting must be cleared so later
tool-finish or turn-stop handling does not keep charging usage to a
stopped goal.
## What changed
- Adds `GoalRuntimeHandle::usage_limit_active_goal_for_turn`, which
accounts current active-goal progress, marks the active or
budget-limited thread goal as `UsageLimited`, records terminal metrics
when the status changes, clears active goal accounting, and emits the
updated goal event.
- Covers both active and budget-limited goals in
`ext/goal/tests/goal_extension_backend.rs`, including the invariant that
later token/tool events do not add usage after the goal has been
usage-limited.
## Testing
- Added
`usage_limit_active_goal_accounts_progress_and_clears_accounting`.
- Added `usage_limit_budget_limited_goal_accounts_remaining_progress`.
This reverts commit 5381240f57. Gov cloud
should not be supported
# External (non-OpenAI) Pull Request Requirements
External code contributions are by invitation only. Please read the
dedicated "Contributing" markdown file for details:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
# External (non-OpenAI) Pull Request Requirements
External code contributions are by invitation only. Please read the
dedicated "Contributing" markdown file for details:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
## Summary
Clear inherited legacy `notify` from Guardian review session config,
since we should not be passing auto review threads into `notify`
targets. Keeps legacy notify payload and hook runtime behavior unchanged
for normal user turns.
## Testing
- [x] add a Guardian config regression and dedicated Guardian
integration test so review sessions cannot inherit parent notify hooks
# Summary
The Codex standalone installers can pause after installation to ask
about an older managed install or launching Codex. That makes unattended
bootstrap and update flows hard to complete reliably.
This PR adds noninteractive installer control on macOS/Linux and Windows
through `CODEX_NON_INTERACTIVE=1`. Noninteractive operation is
environment-only, which gives automated callers one stable way to
suppress prompts. When a noninteractive install leaves an older npm,
bun, or brew-managed Codex installed, the standalone bin is configured
ahead of that command on `PATH` so the newly installed Codex is the one
future launches select. It also supports `CODEX_RELEASE` for callers
that select a release through environment variables while retaining the
existing explicit release inputs. Release selection accepts `latest`,
stable `x.y.z` versions, and Codex prereleases written as
`rust-v0.134.0-alpha.3`, `v0.134.0-alpha.3`, or `0.134.0-alpha.3`; it
validates that shape before constructing release requests.
# Stack
1. [#21567](https://github.com/openai/codex/pull/21567) - Adds release
and noninteractive environment controls to the installers. (current)
2. [#24637](https://github.com/openai/codex/pull/24637) - Runs
standalone updater installs with `CODEX_NON_INTERACTIVE=1`.
3. [#24639](https://github.com/openai/codex/pull/24639) - Removes
explicit release argument inputs in favor of `CODEX_RELEASE`.
# Evidence
| Before | After |
| --- | --- |
| 
| 
|
Environment-controlled macOS install with an existing npm-managed Codex
on `PATH`:
https://github.com/user-attachments/assets/442e0b5b-4a32-4bf5-996b-68784777380d
# Design decisions
Windows installs using the older standalone bin layout still require an
interactive migration confirmation. Noninteractive mode does not
auto-migrate that existing directory because replacing it is a
destructive transition for an early, limited-use layout; unattended
installs on that layout fail with an instruction to rerun interactively.
# Testing
Tests: installer syntax validation, release-selector acceptance and
rejection coverage including PowerShell `Latest` compatibility, macOS
live-terminal installer smoke testing with environment-controlled stable
and prerelease installation and competing PATH precedence, shell
rejection of the omitted noninteractive flag, and Windows ARM64
PowerShell smoke testing with environment-only noninteractive behavior,
retained release input, and competing PATH precedence through Parallels.
## Summary
- Bump the workspace Rust toolchain from `1.93.0` to `1.95.0` across
Cargo, Bazel, CI, release workflows, devcontainers, and the Codex
environment config.
- Refresh `MODULE.bazel.lock` so the Bazel Rust toolchain artifacts
match the new version.
- Leave purpose-specific toolchains unchanged, including the
`argument-comment-lint` nightly and the upstream `rusty_v8` `1.91.0`
build pin.
- Includes fixes for new lints from `just fix` and a few codex-authored
fixes for lints without a suggestion.
## Why
When a turn needs a follow-up request after tool output is recorded,
Codex can still appear stuck in `Thinking` before the next `/responses`
request is opened. The existing local trace showed the last completed
response and the absence of a new backend request, but it did not show
whether the stall was in tool-router preparation or later request setup.
Issue: N/A (internal incident investigation)
## What Changed
Added trace spans around the pre-stream tool-router handoff in
`core/src/session/turn.rs`, including the `built_tools` phase and the
MCP manager read lock.
Added per-server MCP tool-listing spans and trace breadcrumbs in
`codex-mcp/src/connection_manager.rs` with startup snapshot /
startup-complete state so a pending MCP client is visible in feedback
logs instead of looking like a silent hang.
## Verification
- `just fmt`
- `just test -p codex-mcp`
- `just test -p codex-core` (prior full rerun fails in this workspace on
unrelated integration tests: code-mode output length expectations, one
shell timeout formatting assertion, and shell snapshot timeouts; latest
review-fix rerun compiled and passed 1160 tests before I stopped the
abnormally slow unrelated suite)
add new `parse_tool_input_schema_without_compaction` to bypass the
existing compaction/trimming of client-provided tool schemas that are
over 4k bytes.
we want this for standalone web search to keep field guidance/metadata
on certain fields; this keeps us closer to parity with existing hosted
tool schema (which didnt go through this 4k byte filter).
## Why
`continuation_turn_id` was introduced to distinguish synthetic goal
continuation turns for the no-tool continuation suppression heuristic.
#20523 removed that heuristic, but left the marker behind. It is still
written and cleared without affecting any runtime decision.
## What Changed
- Remove `GoalRuntimeState::continuation_turn_id`.
- Remove the marker setter/clearer and their now-no-op start, finish,
and abort call sites.
## Testing
- Not run yet (deferred at request).
## Why
- Runtime analytics events report `thread_id`, which identifies the
individual thread emitting an event
- They don't report `session_id`, which identifies the shared session
for a root thread and its subagent threads
- Emitting both identifiers allows analytics to group related activity
## What Changed
- Adds `session_id` to relevant analytics events (thread_initalized,
turn, turn_steer, compaction, guardian_review)
- Tracks each thread's session ID in the analytics reducer so subsequent
thread scoped events emit the same value
- Carries the shared session ID through subagent initialization
## Verification
- `just test -p codex-analytics` validates event payloads and subagent
session grouping.
- Focused `codex-app-server` tests validate session IDs for thread,
turn, and steer events.
- Focused `codex-core` tests validate root and subagent session ID
propagation.
## Why
Older persisted rollouts can contain `input_image.detail` values of
`auto` or `low` from before `ImageDetail` was narrowed to
`high`/`original`. Current deserialization rejects those values, which
can make resume skip later compacted checkpoints and reconstruct an
oversized raw suffix before the next compaction attempt.
Confirmed Sentry reports fixed by this compatibility path:
- [CODEX-1H3F](https://openai.sentry.io/issues/7500642496/)
- [CODEX-1H6N](https://openai.sentry.io/issues/7501025347/)
- [CODEX-1JDP](https://openai.sentry.io/issues/7504549065/)
- [CODEX-1HW6](https://openai.sentry.io/issues/7503407986/)
## Background
[openai/codex#20693](https://github.com/openai/codex/pull/20693) added
image-detail plumbing for app-server `UserInput` so input images could
explicitly request `detail: original`. The Slack discussion behind that
PR was about ScreenSpot / bridge evals where user input images were
resized, while tool output images already had MCP/code-mode ways to
request image detail.
In review, the intended new API surface was narrowed to `high` and
`original`: default to `high`, allow `original` when callers need
unchanged image handling, and avoid encouraging new `auto` or `low`
usage. That policy still makes sense for newly emitted values.
The missing compatibility piece is persisted history. Older rollouts can
already contain `auto` and `low`, and resume reconstructs typed history
by deserializing those rollout records. Rejecting old values at that
boundary causes valid compacted checkpoints to be skipped. This PR
restores `auto` and `low` as real variants so old records deserialize
and round-trip without being rewritten as `high`, while product paths
can continue to default to `high` and avoid emitting `auto` for new
behavior.
## What changed
- Restored `ImageDetail::Auto` and `ImageDetail::Low` as first-class
protocol values.
- Preserved `auto`/`low` through rollout deserialization, MCP image
metadata, code-mode image output, and schema/type generation.
- Kept local image byte handling conservative: only `original` switches
to original-resolution loading; `auto`/`low`/`high` continue through the
resize-to-fit path while retaining their detail value.
- Added regression coverage for enum round-tripping and code-mode `low`
detail handling.
## Testing
- `just write-app-server-schema`
- `just test -p codex-protocol`
- `just test -p codex-tools`
- `just test -p codex-code-mode`
- `just test -p codex-app-server-protocol`
- `just test -p codex-core
suite::rmcp_client::stdio_image_responses_preserve_original_detail_metadata`
- `just test -p codex-core
suite::code_mode::code_mode_can_use_mcp_image_result_with_image_helper`
- Loaded broken rollouts on local fixed builds, and started/completed
new turns.
I also attempted `just test -p codex-core`; the local broad run did not
finish green: 2559 tests run, 2467 passed, 55 flaky, 91 failed, 1 timed
out. The failures were broad timeout/deadline failures across unrelated
areas; targeted changed-path core tests above passed.
## Why
Windows sandbox diagnostics are currently hard to recover from
`/feedback` even though they are often the most useful artifact when
debugging sandbox behavior. Now that sandbox logging uses daily rolling
files, feedback can safely include the current day's sandbox log without
uploading the old ever-growing legacy `sandbox.log`.
## What changed
- Add a `codex-windows-sandbox` helper that resolves the current daily
sandbox log from `codex_home`.
- When feedback is submitted with logs enabled on Windows, app-server
attaches today's sandbox log if it exists.
- Upload the attachment under the stable filename `windows-sandbox.log`,
independent of the dated on-disk filename.
- Keep existing raw `extra_log_files` behavior unchanged for rollout and
desktop log attachments.
## Verification
- `cargo fmt -p codex-app-server -p codex-windows-sandbox`
- `cargo test -p codex-windows-sandbox
current_log_file_path_for_codex_home_uses_sandbox_dir`
- `cargo test -p codex-app-server
windows_sandbox_log_attachment_uses_current_log`
- Manual CLI/TUI `/feedback` test confirmed Sentry received
`windows-sandbox.log`.
## Why
Remote image submissions currently wrap native `input_image` spans in
literal `<image>` and `</image>` text spans. Those extra prompt tokens
add structure without providing label or routing information.
## What Changed
- Serialize `UserInput::Image` directly as an `input_image` content
span.
- Preserve named local-image framing and legacy wrapper parsing for
labeled attachments and existing histories.
- Update existing request-shape expectations for drag-and-drop images,
model switching, and compaction.
## Validation
- `just test -p codex-protocol`
- Focused `codex-core` run covering
`drag_drop_image_persists_rollout_request_shape`,
`model_change_from_image_to_text_strips_prior_image_content`, and
`snapshot_request_shape_pre_turn_compaction_including_incoming_user_message`
## Notes
- A broader `just test -p codex-core` run was attempted; the affected
tests passed, while the overall run failed in unrelated CLI, MCP, and
tooling tests plus a `thread_manager` timeout.
## Why
The Windows sandbox runner still carried the old `SandboxPolicy`
compatibility path even though core now computes `PermissionProfile`.
That meant Windows command-runner execution could only see the legacy
projection, so profile-only filesystem rules such as deny globs were not
part of the runner input.
## What Changed
- Removed the Windows-local `SandboxPolicy` parser/export and deleted
`windows-sandbox-rs/src/policy.rs`.
- Changed restricted-token capture/session setup, elevated setup,
world-writable audit, read-root grant, and command-runner session APIs
to accept `PermissionProfile` plus the profile cwd.
- Bumped the elevated command-runner IPC protocol to version 2 because
`SpawnRequest` now carries `permission_profile` /
`permission_profile_cwd` instead of the legacy `policy_json_or_preset` /
`sandbox_policy_cwd` fields.
- Updated core exec, unified exec, debug-sandbox, TUI setup/grant flows,
and app-server setup to pass the actual effective `PermissionProfile`.
- Left regression coverage asserting the old IPC policy fields are
absent and the runner serializes tagged `PermissionProfile` JSON.
## Verification
- `cargo test -p codex-windows-sandbox`
- `cargo test -p codex-core windows_sandbox`
- `cargo test -p codex-app-server
request_processors::windows_sandbox_processor`
- `just fix -p codex-windows-sandbox -p codex-core -p codex-app-server
-p codex-cli -p codex-tui`
- `just fix -p codex-cli -p codex-tui`
- `just fix -p codex-windows-sandbox -p codex-tui`
- `rg "\\bSandboxPolicy\\b" codex-rs/windows-sandbox-rs` returned no
matches.
Note: `cargo test -p codex-cli` was attempted but did not reach crate
tests because local disk filled while compiling dependencies (`No space
left on device`). The targeted clippy pass compiled the affected CLI/TUI
surfaces afterward.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/23813).
* #24108
* __->__ #23813
Fixes#24249.
## Why
Codex already supports discovering marketplaces under both
`.agents/plugins/marketplace.json` and
`.claude-plugin/marketplace.json`. The Git marketplace auto-upgrade
no-op check only looked for the `.agents` layout. That meant an
installed `.claude-plugin` marketplace with matching revision metadata
still looked absent, so plugin list/startup upgrade work could stage and
re-activate the same marketplace again.
That matches the failure shape in #24249: the report called out repeated
marketplace sync/cache refresh logs and a large recently-touched
`.tmp/marketplaces/.staging` directory. This change makes the
auto-upgrade path recognize the installed `.claude-plugin` marketplace
as already current, which should remove that staging/activation feedback
loop.
## What changed
`codex-rs/core-plugins/src/marketplace_upgrade.rs` now uses the existing
supported marketplace manifest discovery helper when deciding whether an
installed Git marketplace is already current. Existing local plugin
source validation is unchanged; `source: "./"` still remains invalid.
## Confidence
Confidence is high that this fixes the repeated marketplace upgrade
path: the old hardcoded layout check was definitely wrong for installed
`.claude-plugin` marketplaces, and the reported staging churn points
directly at that path.
Confidence is not 100% because we do not have a CPU profile or a fully
re-run reporter repro. A malformed marketplace entry can still be logged
as invalid if another caller repeatedly lists plugins; this PR fixes the
staging/upgrade feedback loop that likely made the failure pathological,
not every possible source of repeated marketplace resolution.
## Summary
TUI plugin mention refresh still joined app-server plugin inventory with
client-local plugin config, which can diverge once plugin state is owned
by the app server.
This changes the TUI to mirror the GUI client: `plugin/list` is the
autocomplete source, and mention candidates are plugin-level entries
filtered to installed, enabled, and not disabled by admin. The TUI no
longer reads local plugin config or calls `plugin/read` while refreshing
plugin mention candidates.
## API shape and limitations
The current app-server API does not expose effective per-session plugin
capability summaries for mention autocomplete. As in the GUI,
autocomplete now trusts `plugin/list` metadata rather than proving which
plugin capabilities are loaded in the active session.
That avoids stale client-local reads and the cwd/remote detail gaps in
`plugin/read`, but intentionally accepts the same list-level tradeoff as
the app: if `plugin/list` reports a remote plugin before its local
bundle is materialized, the plugin can still appear as a mention
candidate.
## Why
When Codex calls responsesapi, we currently send `session_id`,
`thread_id`, and `turn_id` among other things as
`client_metadata["x-codex-turn-metadata"]`. This PR adds
`forked_from_thread_id` which helps explain the "lineage" of a forked
thread.
## What's changed
- Track the immediate history source copied into a forked thread through
thread/session creation, including subagent and review turn metadata
paths.
- Include `forked_from_thread_id` in Codex turn metadata while
preventing turn-scoped Responses API client metadata from overwriting
Codex-owned lineage fields.
- Add coverage for fork lineage in turn metadata and the app-server
Responses API request path.
Fixes#24186.
## Why
When the TUI resumes a thread through the local app-server daemon with a
selected workspace, `thread/resume` can hit an already-loaded but idle
cached thread. That path previously rejoined the cached `CodexThread`,
so cwd/config overrides in `ThreadResumeParams` were ignored and the
resumed session kept using the old cwd.
## What changed
App-server now treats a loaded-but-idle thread with no subscribers as a
cache entry when resume overrides differ: it unloads that cached thread
and lets the normal resume path rebuild it with the requested
cwd/config. Threads that still have subscribers, or active runtime work,
continue to rejoin the existing loaded thread so in-flight state remains
observable.
The existing thread teardown helper was generalized from
archive-specific cleanup to shared unload cleanup for this path.
## Why
When the app-server remote-control websocket path stalls during
connection setup or teardown, the existing logs do not show where the
task stopped, and several awaits can keep the task from returning
promptly. That makes offline or stale-host incidents hard to distinguish
from expected shutdown or disable flow.
Issue: N/A (internal incident investigation)
## What Changed
Added structured lifecycle and status logging around remote-control
enable/disable requests, websocket task startup and exit, connection
cycles, enrollment context, and status/environment transitions.
Bound websocket connect, transport-event forwarding, and
connection-worker shutdown waits. On timeout, the code logs the stalled
operation and stops or aborts workers so the loop can reconnect or exit
instead of waiting indefinitely. Ping sends now also observe shutdown
cancellation.
## Summary
Adds experimental `additionalContext` support to `turn/start` and
`turn/steer` so clients can provide ephemeral external context, such as
browser or automation state, without turning that plumbing into a
visible user prompt or triggering user-prompt lifecycle behavior.
## API Shape
The parameter shape is:
```ts
additionalContext?: Record<string, {
value: string
kind: "untrusted" | "application"
}> | null
```
Example:
```json
{
"additionalContext": {
"browser_info": {
"value": "Active tab is CI failures.",
"kind": "untrusted"
},
"automation_info": {
"value": "CI rerun is in progress.",
"kind": "application"
}
}
}
```
The keys are opaque and caller-defined.
## Context Injection
When provided, accepted entries are inserted into model context as
hidden contextual message items, not as visible thread user-message
items.
`kind: "untrusted"` entries are inserted with role `user`:
```text
<external_${key}>${value}</external_${key}>
```
`kind: "application"` entries are inserted with role `developer`:
```text
<${key}>${value}</${key}>
```
Values are not escaped. Each value is truncated to 1k approximate tokens
before wrapping.
For `turn/start`, accepted additional context is inserted before normal
user input. For `turn/steer`, additional context is merged only when the
steer includes non-empty user input; context-only steers still reject as
empty input.
## Dedupe Strategy
`AdditionalContextStore` lives on session state and stores the latest
complete additional-context map.
Each `turn/start` or non-empty `turn/steer` treats its
`additionalContext` as the current complete set of values. Entries are
injected only when the key is new or the exact entry for that key
changed, including `value` or `kind`. After merging, the store is
replaced with the provided map, so omitted keys are removed from the
retained set and can be injected again later if reintroduced.
Omitting `additionalContext`, passing `null`, or passing an empty object
resets the store to empty and injects nothing.
## What Changed
- Threads experimental v2 `additionalContext` through app-server into
core turn start and steer handling.
- Adds separate contextual fragment types for untrusted user-role
context and application developer-role context.
- Uses pending response input items so additional context can be
combined with normal user input without treating it as prompt text.
- Adds integration coverage for start/steer flow, role routing,
dedupe/reset behavior, deletion/re-add behavior, hook-blocked input
behavior, empty context-only steer rejection, external-fragment marker
matching, and truncation.
## Summary
Fix the TUI `$` app mention paths so App Directory rows that are not
accessible are not treated as usable apps.
This includes the core preservation fix from #24104, but expands it to
the other app mention paths:
- preserve app-server `is_accessible` flags when partial
`app/list/updated` snapshots reach the TUI
- require apps to be both accessible and enabled when resolving exact
`$slug` mentions
- require restored/stale `app://...` bindings to point at accessible,
enabled apps before emitting structured app mentions
- remove the now-unused `codex-chatgpt` dependency from `codex-tui`,
which addresses the `cargo shear` failure seen on #24104
## Root Cause
The app server already sends merged app snapshots with accessibility
computed. The TUI handled app-server app list updates as partial app
loads and re-ran the old accessible-app merge path. That path treated
every notification row as accessible, so App Directory entries with
`isAccessible=false` could appear in `$` suggestions.
Regression source: #22914 routed app-list updates through the app server
while reusing the old TUI partial-load handling. Related precursor:
#14717 introduced the partial-load path, but #22914 made it user-visible
for app-server updates.
## Issues
Fixes#24145Fixes#24205Fixes#24319
## Validation
- `just fmt`
- `git diff --check`
- `just bazel-lock-update`
- `just bazel-lock-check`
- `just argument-comment-lint -p codex-tui`
- `just test -p codex-tui
chatwidget::tests::popups_and_settings::apps_notification_update_excludes_inaccessible_apps_from_mentions
chatwidget::tests::composer_submission::submit_user_message_ignores_inaccessible_app_mentions_from_bindings
chatwidget::skills::tests::find_app_mentions_requires_accessible_enabled_apps_for_bound_paths
chatwidget::skills::tests::find_app_mentions_requires_accessible_enabled_apps_for_slugs`
## Why
Raw output mode intentionally sends logical source lines to the terminal
without Codex-inserted wrapping so copied content retains its original
line structure. In Zellij, soft-wrapped continuation rows from those raw
lines are not confined by the inline history scroll region. When raw
mode replays a long transcript, continuation rows can occupy the
composer viewport and are overwritten on the following draw, leaving the
transcript visibly truncated underneath the composer.
This is specific to the combination of Zellij and raw terminal-wrapped
history. Rich output and non-Zellij terminals should continue using the
existing insertion behavior.
Related context: #20819 introduced raw output mode, and #22214 removed
the broad Zellij insertion workaround after the standard rich-output
path no longer required it.
| Before | After |
|---|---|
| <img width="1728" height="916" alt="image"
src="https://github.com/user-attachments/assets/f85398a5-e930-46d9-bcfd-106a24c41466"
/> | <img width="1723" height="912" alt="image"
src="https://github.com/user-attachments/assets/5c62e16a-a6e5-4842-bcb2-eab163cda04c"
/> |
## What Changed
- Cache Zellij detection in `Tui` and select a dedicated insertion mode
only for `HistoryLineWrapPolicy::Terminal` batches in Zellij.
- For that guarded path, clear the existing viewport, append raw source
lines through the terminal so its soft wrapping remains
selection-friendly, and reserve empty viewport rows before redrawing the
composer.
- Add snapshot regressions for both an incremental soft-wrapped raw
insert and an overflowing raw transcript replay that starts at the top
of the cleared terminal.
## How to Test
1. Start Codex inside Zellij with raw output enabled or toggle raw
output after a multiline response is in history.
2. Produce or replay output containing long logical lines, such as a
fenced shell command with several wrapped lines.
3. Confirm the wrapped history remains visible above the composer and
the composer no longer overwrites the end of the response.
4. Toggle back to rich output or run outside Zellij and confirm standard
history rendering still behaves normally.
Targeted tests run:
- `just test -p codex-tui vt100_zellij_raw -- --nocapture`
Additional validation notes:
- `just test -p codex-tui` was attempted; the two new Zellij raw
insertion tests passed, while two existing
`app::tests::update_feature_flags_disabling_guardian_*` tests failed
outside this history insertion path.
- `just argument-comment-lint` was attempted but local Bazel analysis
fails before reaching the changed source because the LLVM `compiler-rt`
package is missing `include/sanitizer/*.h`. Modified literal callsites
were inspected manually.
## Summary
Add the extension-backed standalone `web.run` tool so Codex can call the
standalone search endpoint through the `codex-api` search client and
return its encrypted output to Responses.
- gate the new tool behind `standalone_web_search`
- install the extension in the app-server thread registry and hide
hosted `web_search` when standalone search is enabled for OpenAI
providers so the two paths stay mutually exclusive
- build search context from persisted history using a small tail
heuristic: previous user message, assistant text between the last two
user turns capped at about 1k tokens, and current user message
## Test Plan
- `cargo test -p codex-web-search-extension`
- `cargo test -p codex-api`
- `cargo test -p codex-core
hosted_tools_follow_provider_auth_model_and_config_gates`
## Summary
Generated memory rows and their stage-one/stage-two job state currently
live in `state_5.sqlite` alongside thread metadata. That makes memory
cleanup and regeneration share the main state schema even though those
rows are memory-pipeline data and can be rebuilt independently from the
durable thread records.
This PR moves the memory-owned tables into a dedicated
`memories_1.sqlite` runtime database while keeping thread metadata in
`state_5.sqlite`.
## Changes
- Adds a separate memories DB runtime, migrator, path helpers, telemetry
kind, and Bazel compile data for `state/memory_migrations`.
- Introduces `MemoryStore` behind `StateRuntime::memories()` and moves
memory table/job operations onto that store.
- Drops the old memory tables from the state DB and recreates their
schema in `state/memory_migrations/0001_memories.sql`.
- Updates memory startup, citation usage tracking, rollout pollution
handling, `debug clear-memories`, and app-server `memory/reset` to
operate through the memories DB.
- Preserves cross-DB behavior by hydrating thread metadata from the
state DB when selecting visible memory outputs and checking stage-one
staleness.
## Verification
- Added/updated `codex-state` tests for deleted-thread memory visibility
and already-polluted phase-two enqueue behavior.
- Updated `debug clear-memories`, app-server `memory/reset`, and
memories startup tests to seed and assert memory rows through
`memories_1.sqlite`.
## Why
Goal idle accounting is supposed to survive a thread resume. Previously,
the resume hook restored the active goal state inline from the extension
lifecycle contributor, which left the runtime handle without a reusable
restoration path and made the behavior hard to cover directly. When a
thread with an active goal was resumed, goal accounting could lose track
of the active idle goal instead of continuing to accrue elapsed time.
## What changed
- Moved thread-resume restoration into
`GoalRuntimeHandle::restore_after_resume()` so the runtime owns
rehydrating active goal accounting from persisted thread goal state.
- Kept disabled goal runtimes as a no-op and preserved the existing
warning path when persisted goal state cannot be loaded.
- Added a backend regression test that seeds an active goal, resumes the
thread, waits briefly, and verifies elapsed idle time is reflected on
the next external goal mutation.
## Testing
- Not run locally; this metadata update only rewrote the PR title/body.
## Why
Codex 0.131 started enabling tmux `modifyOtherKeys` mode 2 when the
active tmux session reported `extended-keys-format csi-u`, and also when
that format could not be queried. The fallback was meant to help
compatible tmux panes enter extended-key mode, but it breaks iTerm2
control-mode sessions on older tmux.
Issue #23711 reproduces with:
```bash
ssh -t ubuntu@192.168.68.149 'tmux -CC new -A -s main'
```
On tmux 3.2a, `extended-keys-format` is not available. With mode 2
enabled, `Ctrl-C` is delivered as `^[[27;5;99~` instead of the normal
interrupt/control key path, so Codex does not handle it. Running with
`CODEX_TUI_DISABLE_KEYBOARD_ENHANCEMENT=1` restores `Ctrl-C`, which
points at keyboard mode setup rather than chat input routing.
## What Changed
- Only request `modifyOtherKeys` mode 2 when tmux explicitly reports
`extended-keys-format csi-u`.
- Treat an unknown or unavailable tmux extended-key format as
unsupported for this mode.
- Update the keyboard mode unit coverage so `None` no longer opts into
`modifyOtherKeys`.
This preserves the explicit modern tmux `csi-u` path from #21943 while
avoiding the unsafe fallback on older or unqueryable tmux setups.
## How to Test
Regression path from #23711:
1. Start iTerm2 tmux integration against an older tmux host:
```bash
ssh -t ubuntu@192.168.68.149 'tmux -CC new -A -s main'
```
2. Start patched Codex.
3. Run `/keymap debug`, press a regular key, then press `Ctrl-C`.
4. Confirm `Ctrl-C` closes the inspector and Codex remains responsive
without `CODEX_TUI_DISABLE_KEYBOARD_ENHANCEMENT=1`.
5. Confirm `Shift+Enter` still inserts a newline in the same session.
Modern tmux compatibility path:
1. Start an ordinary tmux 3.6a server with explicit `csi-u`:
```bash
tmux -L codex-csiu -f /dev/null new-session -d -s repro
tmux -L codex-csiu set-option -g extended-keys on
tmux -L codex-csiu set-option -g extended-keys-format csi-u
tmux -L codex-csiu attach -t repro
```
2. Start patched Codex.
3. From another terminal, confirm the Codex pane reports `mode=Ext 2`:
```bash
tmux -L codex-csiu list-panes -a -F '#{pane_id} mode=#{pane_key_mode}
cmd=#{pane_current_command}'
```
4. Type `one`, press `Shift+Enter`, type `two`, and confirm the composer
shows two lines without submitting.
5. Press `Ctrl-C` and confirm Codex handles it normally.
Targeted tests:
- `./tools/argument-comment-lint/run.py -p codex-tui -- --lib`
- `just test -p codex-tui` runs the new keyboard mode test successfully;
the full run currently reports two unrelated guardian feature-flag test
failures:
-
`app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
-
`app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
No documentation update is needed.
## Why
`core/src/goals.rs` already emits OTEL metrics for goal creation,
resume, terminal transitions, token counts, and duration. As `/goal`
moves into `ext/goal`, the extension needs to preserve that telemetry
contract instead of only emitting app-visible `ThreadGoalUpdated`
events.
This keeps the existing `codex.goal.*` metric surface intact while goal
lifecycle ownership shifts toward the extension.
## What changed
- Added an extension-local `GoalMetrics` helper that records the
existing `codex.goal.*` counters and histograms through `codex-otel`.
- Threaded an optional `MetricsClient` through `install_with_backend`,
`GoalExtension`, `GoalRuntimeHandle`, and `GoalToolExecutor`.
- Emitted created, resumed, and terminal goal metrics from the extension
paths that create goals, restore active goals on thread resume, account
budget limits, complete or block goals, and handle external goal
mutations.
- Updated existing goal extension test setup callsites to pass `None`
for metrics when instrumentation is not under test.
## Verification
Not run locally.
Recent composer cleanups split state ownership out of `ChatComposer`,
but slash-command handling still mixed parsing, popup coordination,
completion, submission validation, queue behavior, and argument element
rebasing into the main composer file. Pending changes to slash command
parsing and selection inspired this code move to prevent
`chat_composer.rs` bloat.
This is just a refactor, no functional or behavioral changes are
intended.
## What changed
- Move slash-command parsing and lookup helpers into
`bottom_pane/chat_composer/slash_input.rs`.
- Move slash popup key handling, command-name completion, and popup
construction into the slash input helper module.
- Centralize bare-command, inline-args, submission-validation, and
queued-input action selection behind slash-specific helpers.
- Move command argument text-element rebasing into the slash input
module so inline command submission keeps the same element behavior with
less composer-local logic.
## Verification
- `just fmt`
- `just test -p codex-tui`
- `cargo insta pending-snapshots -p codex-tui`
## Why
The
`approving_apply_patch_for_session_skips_future_prompts_for_same_file`
integration test writes `apply_patch_allow_session.txt` under the
process cwd while exercising outside-workspace patch approval behavior.
With `just test` now being the normal validation path, that file can be
left behind in the checkout when the test runs or fails, creating
confusing untracked state.
## What changed
- Registers the resolved `apply_patch_allow_session.txt` path with
`tempfile::TempPath` before the test removes and recreates it through
`apply_patch`.
- Preserves the existing outside-workspace path shape so the approval
behavior under test does not change.
- Lets `TempPath` remove the generated file when the test exits,
including panic paths.
## Verification
- `just test -p codex-core --test all
approving_apply_patch_for_session_skips_future_prompts_for_same_file`
## Why
`codex.task.compact` only distinguished `local` vs `remote`, which made
it hard to answer simple counter questions in Statsig. Manual `/compact`
and automatic compaction were collapsed together, and the legacy remote
path was also collapsed with `remote_compaction_v2`.
## What Changed
- route `codex.task.compact` through a shared helper in
`core/src/tasks/mod.rs`
- add a `manual=true|false` tag so manual and automatic compaction can
be counted separately
- split the remote tag into `remote` and `remote_v2`
- emit the metric from the inline auto-compaction path in
`core/src/session/turn.rs` as well as the manual `CompactTask` path in
`core/src/tasks/compact.rs`
- add focused unit coverage for the new tag shapes in
`core/src/tasks/mod_tests.rs`
## Verification
- added unit coverage in `core/src/tasks/mod_tests.rs` covering manual
`remote_v2` tags and automatic `local` tags
## Why
Users who opt into named permission profiles through
`default_permissions` or `[permissions.*]` should stay in named-profile
semantics when they open `/permissions`. The legacy picker rewrites
those users into anonymous preset state, which loses the active profile
identity and hides custom configured profiles.
## What changed
- Switch `/permissions` to a profile-aware picker when profile mode is
active.
- Show friendly built-in labels instead of raw `:` profile syntax.
- Include configured custom profiles and their descriptions in the
picker.
- Route selections through the split TUI profile-selection flow below
this PR.
- Add TUI snapshots and regression coverage for built-ins, custom
profiles, and conflicting legacy runtime overrides.
## Stack
1. [#22931](https://github.com/openai/codex/pull/22931):
runtime/session/network propagation for active permission profiles.
2. [#23708](https://github.com/openai/codex/pull/23708): TUI selection
plumbing and guardrail flow.
3. **This PR**: profile-aware `/permissions` menu and custom profile
display.
## UX impact
In profile mode, `/permissions` shows the same human-facing built-ins
users already know:
```text
Default
Auto-review
Full Access
Read Only
locked-down
web-enabled
```
Selecting `locked-down` keeps `active_permission_profile =
Some("locked-down")`; selecting a built-in keeps the friendly label
while switching to its named built-in profile.
## Screenshots
Live `$test-tui` smoke screenshots uploaded through GitHub attachments:
**Profile mode with built-ins and custom profiles**
<img width="832" alt="Profile mode permissions picker with custom
profiles"
src="https://github.com/user-attachments/assets/58b72431-418c-4839-9e39-575076db4c8f"
/>
**Legacy mode remains anonymous preset picker**
<img width="1232" alt="Legacy permissions picker"
src="https://github.com/user-attachments/assets/95f413ab-4cee-411c-9afb-92580a885c97"
/>
<img width="1296" height="906" alt="image"
src="https://github.com/user-attachments/assets/ea381a78-9904-4aa2-828f-b7f2e43f60f2"
/>
<img width="705" height="207" alt="Screenshot 2026-05-18 at 2 58 00 PM"
src="https://github.com/user-attachments/assets/2fa6dd71-0296-449e-a6de-a72d78a1cb70"
/>
## Validation
- `git diff --cached --check` before commit.
- Full test run skipped at the user request while pushing the split
stack.
## Why
The memories extension already has dedicated `list`, `read`, `search`,
and `add_ad_hoc_note` tools, but app-server registration was still
disabled. The memories app collaborator needs an explicit config switch
so those native extension tools can be exposed intentionally, without
making ordinary memory prompt usage automatically register the dedicated
tool surface.
## What changed
- Added `[memories].dedicated_tools`, defaulting to `false`, to
`MemoriesToml` / `MemoriesConfig`.
- Regenerated `core/config.schema.json` for the new setting.
- Registered the memories extension as a `ToolContributor`, while
keeping tool contribution gated on both memories being enabled and
`dedicated_tools = true`.
- Added tests for the disabled default, the enabled dedicated-tools
path, and installer registration.
## Verification
- `just test -p codex-config -p codex-memories-extension`
## Why
Fixes#24502.
`codex resume --include-non-interactive` should include sessions created
by `codex exec`, but the TUI was sending no `sourceKinds` filter to
`thread/list` for that mode. `thread/list` treats omitted or empty
`sourceKinds` as interactive-only (`cli`, `vscode`), so exec sessions
were still filtered out.
## What Changed
- Added a shared TUI `resume_source_kinds` helper so both resume lookup
paths always pass explicit `sourceKinds` to `thread/list`.
- Kept the default resume behavior scoped to `cli` and `vscode`.
- Made `--include-non-interactive` include `exec` and `appServer`
sessions, while continuing to exclude subagent and unknown sources.
## Verification
Added focused coverage for both affected TUI request builders:
- `latest_session_lookup_params_can_include_non_interactive_sources`
- `remote_thread_list_params_can_include_non_interactive_sources`
## Why
The `non_prefixed_mcp_tool_names` feature should be applied where MCP
tools become model-visible, not by remapping names later in core.
Keeping the decision in `McpConnectionManager` construction makes
`ToolInfo` the single shaped view that spec building, deferred tool
search, routing, and unavailable-tool placeholders can consume directly.
This also preserves the existing external behavior while the feature is
off, and keeps the feature-on behavior for code mode and hooks explicit
at the manager boundary.
## What Changed
- Add `McpToolNameMode` to `codex-mcp` and flow it through `McpConfig`
into `McpConnectionManager::new`.
- Normalize MCP `ToolInfo` names in the manager using either
legacy-prefixed namespaces or non-prefixed namespaces; the legacy path
adds `mcp__` without restoring the old trailing namespace suffix.
- Remove the core-side MCP name remapping path so specs, tool search,
session resolution, and unavailable-tool placeholder construction use
the manager-provided `ToolName` values directly.
- Keep code mode flattening on the `__` namespace separator.
- Preserve hook compatibility by giving non-prefixed MCP hook names
legacy `mcp__...` matcher aliases.
- Add/adjust integration and unit coverage for non-prefixed code-mode
behavior, hook matching with the feature on and off, and manager-level
legacy prefixing.
## Testing
- `cargo test -p codex-mcp --lib`
- `cargo test -p codex-core --lib tools::spec::tests -- --nocapture`
- `cargo test -p codex-core --lib mcp_tools -- --nocapture`
- `cargo test -p codex-core --lib mcp_tool_exposure -- --nocapture`
- `cargo test -p codex-core --test all mcp_tool -- --nocapture`
- `cargo test -p codex-core --test all search_tool -- --nocapture`
- `cargo test -p codex-core --test all hooks_mcp -- --nocapture`
- `cargo test -p codex-core --test all
code_mode_uses_non_prefixed_mcp_tool_names_when_feature_enabled --
--nocapture`
- `cargo test -p codex-tools`
- `cargo test -p codex-features`
## Why
`ActiveTurn` already runs at most one task: starting a task requires
that no task is present, and replacement aborts existing work first.
Representing that state as an `IndexMap` leaves a multi-task shape for a
single-task invariant and makes each lifecycle lookup operate like a
collection lookup.
The slot remains optional because goal continuation uses an empty active
turn as a reservation while deciding whether to start continuation work.
## What changed
- Replace `ActiveTurn.tasks` with `task: Option<RunningTask>`.
- Update task abort/completion, session lookup and steering, input-queue
matching, goal reservation, and network-approval lookup to operate on
the singular slot.
- Mutate the singular task slot directly instead of retaining
collection-era add/remove/take helpers.
- Record token usage on the completing active task span without a
regular-task-only opt-in flag.
## Validation
- `cargo test -p codex-core --lib session::tests::steer_input`
- `cargo test -p codex-core --lib
session::tests::abort_empty_active_turn_preserves_pending_input`
- `cargo test -p codex-core --lib
session::tests::queued_response_items_for_next_turn_move_into_next_active_turn`
- `cargo test -p codex-core --lib
session::tests::active_goal_continuation_runs_again_after_no_tool_turn`
- `cargo test -p codex-core --lib
session::tests::abort_regular_task_emits_turn_aborted_only`
- `cargo test -p codex-core --lib session::input_queue::tests`
## Summary
`/mcp` in the TUI should reflect the current loaded thread, including
project-local MCP servers from that thread config. Before this change,
`mcpServerStatus/list` only read the latest global MCP config, so the
active chat could miss project-local servers.
This adds optional `threadId` to `mcpServerStatus/list`. When present,
app-server resolves the loaded thread and lists MCP status from the
refreshed effective config for that thread; when omitted, existing
global config behavior stays unchanged.
The TUI now sends the active chat thread id for `/mcp` and `/mcp
verbose`, carries that origin through the async inventory result, and
ignores stale completions if the user has switched threads before the
fetch returns. The app-server schemas were regenerated.
## Follow-up
Once this app-server API change lands, the desktop app should make the
same `threadId` plumbing so its MCP inventory also uses the current
thread config.
Fixes#23874
## Why
The goal extension already emits `ThreadGoalUpdated` events, but
production app-server thread extensions were built with the default
no-op extension event sink. That meant extension-driven goal updates
could be produced without ever reaching app-server clients.
## What changed
- Build app-server thread extensions with a host-provided
`ExtensionEventSink`.
- Add an app-server sink that converts extension `ThreadGoalUpdated`
events into `ServerNotification::ThreadGoalUpdated` broadcasts.
- Use the existing bounded outgoing message channel via `try_send` so
event forwarding cannot create an unbounded queue.
- Pass `NoopExtensionEventSink` in app-server tests that construct a
`ThreadManager` without an app-server host.
- Refresh `Cargo.lock` for the existing `codex-memories-extension`
`codex-otel` dependency.
## Verification
- `just test -p codex-app-server
extensions::tests::app_server_event_sink_forwards_thread_goal_updates`
## Why
The memories extension now receives a metrics exporter, but the useful
extension-owned signal is the memory tool call itself: which operation
ran, which memory area it touched, whether the backend call succeeded,
and whether the result was truncated.
## What changed
- Added the `codex.memories.tool.call` counter in
`ext/memories/src/metrics.rs`.
- Emit that counter from `memories/add_ad_hoc_note`, `memories/list`,
`memories/read`, and `memories/search` after backend execution.
- Tag each call with `tool`, `operation`, `scope`, `status`, and
`truncated`.
- Pass the existing `MetricsClient` through the memories extension into
the tool executors; tests use `None`.
## Verification
- `just test -p codex-memories-extension`
## Summary
- let the memories extension capture the process-global OTEL metrics
client at install time
- keep app-server/TUI/exec extension construction APIs unchanged
- store the metrics client for future memory metrics without emitting
any metrics yet
## Test plan
- `just fmt`
- `just bazel-lock-update`
- `just bazel-lock-check`
- Not run: tests/clippy per request; CI will cover them
## Why
Codex memory updates currently rely on instructions that tell agents to
create ad-hoc note files directly in the memory workspace. The memories
extension already has a `MemoriesBackend` abstraction for local storage
and future non-filesystem backends, so the ad-hoc note writer should
live behind that same interface instead of baking local filesystem
assumptions into the tool shape.
## What
- Adds a `memories/add_ad_hoc_note` tool to the existing memories tool
bundle.
- Extends `MemoriesBackend` with `add_ad_hoc_note` plus request/response
types so remote memory stores can implement the same operation later.
- Implements the local backend by creating append-only notes under
`extensions/ad_hoc/notes`.
- Validates the tool-provided filename contract
(`YYYY-MM-DDTHH-MM-SS-<slug>.md`), rejects path-like filenames, rejects
empty notes, and uses create-new semantics so existing notes are never
overwritten.
- Keeps memories tool contribution behind the existing commented-out
registration path; this defines the tool surface without newly exposing
it through app-server.
## Test Plan
- `just test -p codex-memories-extension`