Commit Graph

583 Commits

Author SHA1 Message Date
Ahmed Ibrahim
9d579813bb 1- Add model service tiers metadata (#20969)
## Why

The model list needs to carry display-ready service tier metadata so
clients can render tier choices with stable IDs, names, and
descriptions. A raw speed-tier string list is not enough for richer UI
copy or future tier labels.

## What changed

- Added `ModelServiceTier` to shared model metadata with string `id`,
`name`, and `description` fields.
- Added `service_tiers` to `ModelInfo` and `ModelPreset`, preserving
empty defaults for older cached model payloads.
- Exposed `serviceTiers` on app-server v2 `Model` responses and threaded
it through TUI app-server model conversion.
- Marked legacy `additional_speed_tiers` / `additionalSpeedTiers`
metadata as deprecated in source and generated schema output.
- Regenerated app-server protocol JSON schema and TypeScript fixtures,
including `ModelServiceTier.ts`.

## Verification

- Ran `just write-app-server-schema`.
- Did not run local tests per repo instruction; relying on PR CI.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-05 09:51:18 +03:00
Tom
707e51bd8b codex: route metadata updates through ThreadStore (#20576)
- Route `thread/metadata/update` through
`ThreadStore::update_thread_metadata`.
- Add `LocalThreadStore` git metadata patch support for set, partial
update, and clear semantics.
- Add some unit tests for the new thread store code
- Remove a lot of dead code/tests!
2026-05-04 20:09:41 -07:00
Ruslan Nigmatullin
4950e7d8a6 [codex] Add unsandboxed process exec API (#19040)
## Why

App-server clients sometimes need argv-based local process execution
while sandbox policy is controlled outside Codex. Those environments can
reject sandbox-disabling paths before a command ever starts, even when
the caller intentionally wants unsandboxed execution.

This PR adds a distinct `process/*` API for that use case instead of
extending `command/exec` with another sandbox-disabling shape. Keeping
the new surface separate also makes the future removal of `command/exec`
simpler: clients that need explicit process lifecycle control can move
to the newer handle-based API without depending on `command/exec`
business logic.

## What changed

- Added v2 process lifecycle methods: `process/spawn`,
`process/writeStdin`, `process/resizePty`, and `process/kill`.
- Added process notifications: `process/outputDelta` for streamed
stdout/stderr chunks and `process/exited` for final exit status and
buffered output.
- Made `process/spawn` intentionally unsandboxed and omitted
sandbox-selection fields such as `sandboxPolicy` and
`permissionProfile`.
- Added client-supplied, connection-scoped `processHandle` values for
follow-up control requests and notification routing.
- Supported cwd, environment overrides, PTY mode and size, stdin
streaming, stdout/stderr streaming, per-stream output caps, and timeout
controls.
- Killed active process sessions when the originating app-server
connection closes.
- Wired the implementation through the modular `request_processors/`
app-server layout, with process-handle request serialization for
follow-up control calls.
- Updated generated JSON/TypeScript schema fixtures and documented the
new API in `codex-rs/app-server/README.md`.
- Added v2 app-server integration coverage in
`codex-rs/app-server/tests/suite/v2/process_exec.rs` for spawn
acknowledgement before exit, buffered output caps, and process
termination.

## Verification

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server`

---------

Co-authored-by: Owen Lin <owen@openai.com>
2026-05-04 16:43:58 -07:00
xli-oai
a8db4af5c3 Remove remote plugin uninstall prefix gate (#20722)
## Summary

Remove the hardcoded remote plugin ID prefix allow-list from app-server
uninstall routing. IDs that do not parse as local `plugin@marketplace`
IDs now flow through the remote uninstall path, where the existing
remote ID safety validation still rejects empty IDs, spaces, slashes,
and other unsafe characters before URL/cache use.

## Why

Plugin-service owns the backend remote plugin ID contract. Codex should
not require remote IDs to start with the local hardcoded prefixes
`plugins~`, `plugins_`, `app_`, `asdk_app_`, or `connector_`, because
newer backend ID families could otherwise be rejected before
plugin-service sees the request.

## Validation

- `just fmt`
- `cargo test -p codex-app-server plugin_uninstall`
- `just fix -p codex-app-server`
- `git diff --check`
2026-05-04 16:28:13 -07:00
kmeelu-oai
e7e6267ab3 Make realtime sideband startup async (#20715)
## Summary

Moves the WebRTC realtime sideband websocket join out of the voice start
critical path. Call creation still posts the SDP offer and session
config synchronously so the client gets the SDP answer, but the sideband
websocket now connects in the input task async and doesn't block
conversation state installation.

This lets the normal realtime input channels buffer text, handoff
output, and audio while the WebRTC sideband websocket is connecting. If
the sideband join fails while the conversation is still active, the task
sends a RealtimeEvent::Error through the existing events_tx / fanout
path.

To rephrase this:
* No longer blocked on sideband: the client can receive the SDP answer
earlier, set up the WebRTC peer connection, and let the media leg
progress while the sideband websocket joins.
* Still blocked on sideband: queued text, handoff output, and sideband
server events cannot flow until connect_webrtc_sideband(...).await
finishes and then run_realtime_input_task(...) starts

## Validation

- `env CODEX_SKIP_VENDORED_BWRAP=1 cargo test --manifest-path
codex-rs/Cargo.toml -p codex-core --test all
conversation_webrtc_start_posts_generated_session`

`CODEX_SKIP_VENDORED_BWRAP=1` is needed in this local environment
because `libcap.pc` is not installed for the vendored bubblewrap build.

## Testing
I tested this locally by running `cargo run -p codex-cli --bin codex --
--enable realtime_conversation` and invoking `/realtime`. Then, we get
logs emitted in `~/.codex/log/codex-tui.log`.

### Before the Change
Logging commit
(c0299e6edf)
```
2026-05-04T16:06:09.251956Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: starting realtime conversation
2026-05-04T16:06:09.251980Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: creating realtime call transport="webrtc"
2026-05-04T16:06:10.365722Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: realtime call created; sdp answer ready transport="webrtc" call_id=rtc_u0_Dbq65nhak5eLjQZ73yhAy elapsed_ms=1113 total_elapsed_ms=1113
2026-05-04T16:06:10.365843Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: connecting realtime sideband websocket call_id=rtc_u0_Dbq65nhak5eLjQZ73yhAy
2026-05-04T16:06:10.784528Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: connected realtime sideband websocket call_id=rtc_u0_Dbq65nhak5eLjQZ73yhAy elapsed_ms=418 total_elapsed_ms=1532
2026-05-04T16:06:10.784665Z  INFO session_loop{thread_id=019df3b9-e3d8-7271-b13a-b880119aa4c2}:submission_dispatch{otel.name="op.dispatch.realtime_conversation_start" submission.id="019df3bd-65df-7ee2-8125-1d6701fe39d2" codex.op="realtime_conversation_start"}: codex_core::realtime_conversation: realtime conversation started
```

### After the Change
Logging commit
(c8b00ac21a)
```
2026-05-04T15:41:24.080363Z  INFO ... codex_core::realtime_conversation: starting realtime conversation
2026-05-04T15:41:24.080434Z  INFO ... codex_core::realtime_conversation: creating realtime call transport="webrtc"
2026-05-04T15:41:25.106906Z  INFO ... codex_core::realtime_conversation: realtime call created; sdp answer ready transport="webrtc" call_id=rtc_u0_Dbpi8nhak5eLjQZ73yhAy elapsed_ms=1026 total_elapsed_ms=1026
2026-05-04T15:41:25.107067Z  INFO ... codex_core::realtime_conversation: spawned realtime sideband connection task transport="webrtc" total_elapsed_ms=1026
2026-05-04T15:41:25.107160Z  INFO ... codex_core::realtime_conversation: realtime conversation started
2026-05-04T15:41:25.107185Z  INFO codex_core::realtime_conversation: connecting realtime sideband websocket call_id=rtc_u0_Dbpi8nhak5eLjQZ73yhAy
2026-05-04T15:41:25.107352Z  INFO ... codex_core::realtime_conversation: sent realtime sdp answer to client
2026-05-04T15:41:26.076685Z  INFO codex_core::realtime_conversation: connected realtime sideband websocket call_id=rtc_u0_Dbpi8nhak5eLjQZ73yhAy elapsed_ms=969 total_elapsed_ms=1996
2026-05-04T15:41:26.573893Z  INFO codex_core::realtime_conversation: realtime session updated realtime_session_id=sess_u0_Dbpi8nhak5eLjQZ73yhAy
2026-05-04T15:41:26.573970Z  INFO codex_core::realtime_conversation: received realtime conversation event event=SessionUpdated { ... }
```

### Conclusion
Here we see that we saved about a half a second in conversation startup
(1532ms -> 969ms). This also checks out with my sanity tests; I was
seeing at most a second of saving.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-05-04 22:28:14 +00:00
Ruslan Nigmatullin
4d201e340e state: pass state db handles through consumers (#20561)
## Why

SQLite state was still being opened from consumer paths, including lazy
`OnceCell`-backed thread-store call sites. That let one process
construct multiple state DB connections for the same Codex home, which
makes SQLite lock contention and `database is locked` failures much
easier to hit.

State DB lifetime should be chosen by main-like entrypoints and tests,
then passed through explicitly. Consumers should use the supplied
`Option<StateDbHandle>` or `StateDbHandle` and keep their existing
filesystem fallback or error behavior when no handle is available.

The startup path also needs to keep the rollout crate in charge of
SQLite state initialization. Opening `codex_state::StateRuntime`
directly bypasses rollout metadata backfill, so entrypoints should
initialize through `codex_rollout::state_db` and receive a handle only
after required rollout backfills have completed.

## What Changed

- Initialize the state DB in main-like entrypoints for CLI, TUI,
app-server, exec, MCP server, and the thread-manager sample.
- Pass `Option<StateDbHandle>` through `ThreadManager`,
`LocalThreadStore`, app-server processors, TUI app wiring, rollout
listing/recording, personality migration, shell snapshot cleanup,
session-name lookup, and memory/device-key consumers.
- Remove the lazy local state DB wrapper from the thread store so
non-test consumers use only the supplied handle or their existing
fallback path.
- Make `codex_rollout::state_db::init` the local state startup path: it
opens/migrates SQLite, runs rollout metadata backfill when needed, waits
for concurrent backfill workers up to a bounded timeout, verifies
completion, and then returns the initialized handle.
- Keep optional/non-owning SQLite helpers, such as remote TUI local
reads, as open-only paths that do not run startup backfill.
- Switch app-server startup from direct
`codex_state::StateRuntime::init` to the rollout state initializer so
app-server cannot skip rollout backfill.
- Collapse split rollout lookup/list APIs so callers use the normal
methods with an optional state handle instead of `_with_state_db`
variants.
- Restore `getConversationSummary(ThreadId)` to delegate through
`ThreadStore::read_thread` instead of a LocalThreadStore-specific
rollout path special case.
- Keep DB-backed rollout path lookup keyed on the DB row and file
existence, without imposing the filesystem filename convention on
existing DB rows.
- Verify readable DB-backed rollout paths against `session_meta.id`
before returning them, so a stale SQLite row that points at another
thread's JSONL falls back to filesystem search and read-repairs the DB
row.
- Keep `debug prompt-input` filesystem-only so a one-off debug command
does not initialize or backfill SQLite state just to print prompt input.
- Keep goal-session test Codex homes alive only in the goal-specific
helper, rather than leaking tempdirs from the shared session test
helper.
- Update tests and call sites to pass explicit state handles where DB
behavior is expected and explicit `None` where filesystem-only behavior
is intended.

## Validation

- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p
codex-rollout -p codex-thread-store -p codex-app-server -p codex-core -p
codex-tui -p codex-exec -p codex-cli --tests`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout state_db_`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout find_thread_path`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout find_thread_path -- --nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout try_init_ -- --nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-rollout`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo clippy -p
codex-rollout --lib -- -D warnings`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-thread-store
read_thread_falls_back_when_sqlite_path_points_to_another_thread --
--nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-thread-store`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
shell_snapshot`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all personality_migration`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all rollout_list_find`
- `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all rollout_list_find::find_prefers_sqlite_path_by_id --
--nocapture`
- `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
--test all rollout_list_find -- --nocapture`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core
interrupt_accounts_active_goal_before_pausing`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-app-server get_auth_status -- --test-threads=1`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p
codex-app-server --lib`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout
-p codex-app-server --tests`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout
-p codex-thread-store -p codex-core -p codex-app-server -p codex-tui -p
codex-exec -p codex-cli`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p
codex-app-server`
- `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p
codex-rollout`
- `CODEX_SKIP_VENDORED_BWRAP=1
CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-core`
- `just argument-comment-lint -p codex-core`
- `just argument-comment-lint -p codex-rollout`

Focused coverage added in `codex-rollout`:

- `recorder::tests::state_db_init_backfills_before_returning` verifies
the rollout metadata row exists before startup init returns.
- `state_db::tests::try_init_waits_for_concurrent_startup_backfill`
verifies startup waits for another worker to finish backfill instead of
disabling the handle for the process.
-
`state_db::tests::try_init_times_out_waiting_for_stuck_startup_backfill`
verifies startup does not hang indefinitely on a stuck backfill lease.
-
`tests::find_thread_path_accepts_existing_state_db_path_without_canonical_filename`
verifies DB-backed lookup accepts valid existing rollout paths even when
the filename does not include the thread UUID.
-
`tests::find_thread_path_falls_back_when_db_path_points_to_another_thread`
verifies DB-backed lookup ignores a stale row whose existing path
belongs to another thread and read-repairs the row after filesystem
fallback.

Focused coverage updated in `codex-core`:

- `rollout_list_find::find_prefers_sqlite_path_by_id` now uses a
DB-preferred rollout file with matching `session_meta.id`, so it still
verifies that valid SQLite paths win without depending on stale/empty
rollout contents.

`cargo test -p codex-app-server thread_list_respects_search_term_filter
-- --test-threads=1 --nocapture` was attempted locally but timed out
waiting for the app-server test harness `initialize` response before
reaching the changed thread-list code path.

`bazel test //codex-rs/thread-store:thread-store-unit-tests
--test_output=errors` was attempted locally after the thread-store fix,
but this container failed before target analysis while fetching `v8+`
through BuildBuddy/direct GitHub. The equivalent local crate coverage,
including `cargo test -p codex-thread-store`, passes.

A plain local `cargo check -p codex-rollout -p codex-app-server --tests`
also requires system `libcap.pc` for `codex-linux-sandbox`; the
follow-up app-server check above used `CODEX_SKIP_VENDORED_BWRAP=1` in
this container.
2026-05-04 11:46:03 -07:00
Owen Lin
541e99cf09 feat(app-server): always return limited thread history (#20682)
## Why

Whenever we return a thread's history (turns and items) over app-server,
always return the limited form as specified by the rollout policy
`EventPersistenceMode::Limited`, even if the thread was previously
started with `EventPersistenceMode::Extended`.

We're finding it is quite unscalable to be returning the extended
history, so let's apply the same filtering logic of the rollout policy
when we load and return the thread's history.

## What Changed

- Reuse the rollout persistence policy when reconstructing app-server
`ThreadItem` history so only `EventPersistenceMode::Limited` rollout
items are replayed into API turns.
- Route `thread/read`, `thread/resume`, `thread/fork`,
`thread/turns/list`, and rollback responses through the same filtered
app-server history projection.
- Keep live active turns intact when composing a response for a
currently running thread.
- Update command execution coverage so persisted extended command events
are excluded from returned history for `thread/read`, `thread/fork`, and
`thread/turns/list`.

## Test Plan

- `cargo test -p codex-app-server limited`
- `cargo test -p codex-app-server thread_shell_command`
- `cargo test -p codex-app-server thread_read`
- `cargo test -p codex-app-server thread_rollback`
- `cargo test -p codex-app-server thread_fork`
- `cargo test -p codex-app-server-protocol`
2026-05-04 10:37:35 -07:00
Eric Traut
3c2dcbef85 Keep paused goals paused on thread resume (#20790)
## Summary

Early adopters of the `/goal` feature have provided feedback that they
expect a goal they explicitly paused to remain paused when they resume a
thread. Previously, resuming a thread would reactivate a paused goal.

This PR keeps persisted goal status unchanged during thread resume. This
honors the user feedback while also simplifying the core goal logic.

Rather than have the core logic automatically resume a paused goal, that
responsibility is transferred to the client. The TUI now detects a
resumed thread with a paused goal and asks the user whether to `Resume
goal` or `Leave paused`. The prompt appears only for quiet resume flows,
so users who resume with an immediate prompt are not interrupted.

<img width="544" height="111" alt="image"
src="https://github.com/user-attachments/assets/0ac9de1c-6ee6-47ba-b223-c03c8eb4c192"
/>
2026-05-04 08:58:07 -07:00
Abhinav
443f6b831e Use the 2025-06-18 elicitation capability shape (#20562)
# Why

Codex currently negotiates MCP `2025-06-18`, where the client
elicitation capability is represented as an empty object. We were still
serializing `capabilities.elicitation.form`, which belongs to the later
capability shape and can cause strict `2025-06-18` servers to reject
`initialize` with an unrecognized-field error.

This keeps the handshake aligned with the protocol version Codex
actually negotiates and fixes the compatibility regression tracked in
#17492.

# What

- Serialize the client elicitation capability as `elicitation: {}` for
`2025-06-18`.
- Keep elicitation advertised for both Codex Apps and custom MCP
servers.
- Tighten regression coverage so the unit test asserts both the Rust
value and the serialized wire shape.
- Add an app-server integration test that round-trips a form elicitation
from a custom MCP server; the existing connector round-trip continues to
cover the connector path.

# Verification

- `cargo test -p codex-mcp`
- `cargo test -p codex-app-server mcp_server_elicitation_round_trip`
- `cargo test -p codex-app-server
mcp_server_tool_call_round_trips_elicitation`

# Next steps

- Decide whether `tool_call_mcp_elicitation=false` should also suppress
capability advertisement during `initialize`.
- Revisit `form` / `url` capability advertisement when Codex is ready to
negotiate MCP `2025-11-25`, which defines that newer shape.
2026-05-01 14:16:22 -07:00
Tom
e4d6675632 [codex] Migrate loaded thread/read history to ThreadStore (#20486)
## Summary

- Route loaded `thread/read` + `includeTurns` through
`CodexThread::load_history` / ThreadStore history instead of direct
rollout JSONL reads.
- Add an in-memory ThreadStore regression test covering loaded
`thread/read includeTurns` without a local rollout path.
2026-05-01 10:55:04 -07:00
xl-openai
48791920a8 feat: Track local paths for shared plugins (#20560)
When a local plugin is shared, Codex now records the local plugin path
by remote plugin id under CODEX_HOME/.tmp.

plugin/share/list includes the remote share URL and the matching local
plugin path when available, and plugin/share/delete
clears the local mapping after deleting the remote share.

Also add sharedURL to plugin/share/list.
2026-05-01 00:50:12 -07:00
xli-oai
96d2ea9058 Add remote plugin skill read API (#20150)
## Summary

Adds an app-server `plugin/skill/read` method for remote plugin skill
markdown. The new method calls the plugin-service skill detail endpoint
and returns `skill_md_contents`, so clients can preview skills for
remote plugins before the bundle is installed locally.

## Why

Uninstalled remote plugin skills do not have local `SKILL.md` files.
Without an on-demand remote read, the desktop plugin details UI cannot
render the skill details modal for those skills.

## Validation

- `just write-app-server-schema`
- `just fmt`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server --test all --
suite::v2::plugin_read::plugin_skill_read_reads_remote_skill_contents_when_remote_plugin_enabled
--exact`
- `just fix -p codex-app-server-protocol -p codex-core-plugins -p
codex-app-server`
2026-05-01 00:16:25 -07:00
Tom
fe05acad23 Make thread store process-scoped (#19474)
- Build one app-server process ThreadStore from startup config and share
it with ThreadManager and CodexMessageProcessor.
- Remove per-thread/fork store reconstruction so effective thread config
cannot switch the persistence backend.
- Add params to ThreadStore create/resume for specifying thread
metadata, since otherwise the metadata from store creation would be used
(incorrectly).
2026-04-30 21:24:59 -07:00
xli-oai
bb60b78c46 Surface admin-disabled remote plugin status (#20298)
## Summary

Remote plugin-service returns plugin availability separately from a
user's installed/enabled state. This adds `PluginAvailabilityStatus` to
the app-server protocol, propagates remote catalog `status` into
`PluginSummary`, and rejects install attempts for remote plugins marked
`DISABLED_BY_ADMIN` before downloading or caching the bundle.

This is the `openai/codex` half of the change. The companion
`openai/openai` webview PR is
https://github.com/openai/openai/pull/873269.

## Validation

- `cargo run -p codex-app-server-protocol --bin write_schema_fixtures`
- `cargo test -p codex-app-server --test all
plugin_list_marks_remote_plugin_disabled_by_admin`
- `cargo test -p codex-app-server --test all
plugin_list_includes_remote_marketplaces_when_remote_plugin_enabled`
- `cargo test -p codex-app-server --test all
plugin_install_rejects_remote_plugin_disabled_by_admin_before_download`
- `cargo test -p codex-app-server-protocol schema_fixtures`
2026-04-30 20:00:07 -07:00
Abhinav
0d9a5d20ec Alias codex_hooks feature as hooks (#20522)
# Why

The hooks feature flag should use the concise canonical name `hooks`,
while existing configs that still use `codex_hooks` continue to work
during the rename.

# What

- change the canonical `Feature::CodexHooks` key from `codex_hooks` to
`hooks`
- register `codex_hooks` through the existing legacy-alias path
- update the config schema and canonical config fixtures to prefer
`hooks`
- add regression coverage that both `hooks` and `codex_hooks` resolve to
`Feature::CodexHooks`

# Verification

- `cargo test -p codex-features`
- `cargo test -p codex-core config::schema_tests`
- `cargo test -p codex-core
pre_tool_use_blocks_shell_when_defined_in_config_toml`
- `cargo test -p codex-app-server
hooks_list_uses_each_cwds_effective_feature_enablement`
2026-05-01 00:46:33 +00:00
xli-oai
acdf908268 Emit analytics for remote plugin installs (#20267)
## Summary

- emit `codex_plugin_installed` after a remote plugin install succeeds
- keep local installs unchanged, but let remote installs override the
analytics `plugin_id` with the backend remote plugin id
(`plugins~Plugin_...`)
- preserve the local/display identity in `plugin_name` and
`marketplace_name`, plus capability metadata from the installed bundle
- add regression coverage for local install analytics, remote install
analytics, and analytics id override serialization

## Testing

- `just fmt`
- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server`
2026-04-30 17:27:16 -07:00
xli-oai
2686873e77 Sync remote installed plugin bundles (#20268)
## Summary
- Download missing remote installed plugin bundles during app-server
startup and plugin/list refresh.
- Upgrade cached remote installed bundles when the backend installed
version changes.
- Remove stale remote installed bundle caches without writing remote
plugin state into config.toml.

## Review note
This is a clean PR branch cut from the current diff on top of latest
`origin/main`. The diff intentionally has no `codex-rs/core/**` files,
so CODEOWNERS should not request the core-directory owner review from
stale PR history.

## Validation
Already run on the source branch before creating this clean PR:
- `just fmt`
- `cargo test -p codex-core-plugins`
- `cargo test -p codex-app-server --test all
app_server_startup_sync_downloads_remote_installed_plugin_bundles --
--nocapture`
- `cargo test -p codex-app-server --test all
plugin_list_sync_upgrades_and_removes_remote_installed_plugin_bundles --
--nocapture`
- `cargo test -p codex-app-server --test all
app_server_startup_remote_plugin_sync_runs_once -- --nocapture`
- `just fix -p codex-core-plugins`
- `just fix -p codex-app-server`
- `git diff --check`
2026-04-30 16:05:14 -07:00
Tom
127be0612c [codex] Migrate thread turns list to thread store (#19280)
- migrate `thread/turns/list` to ThreadStore. Uses ThreadStore for most
data now but merges in the in-memory state from thread manager
- keep v2 `thread/list` pathless-store friendly by converting
`StoredThread` directly to API `Thread`
- add regression coverage for pathless store history/listing
2026-04-30 14:16:42 -07:00
alexsong-oai
9121132c8f Send external import completion for sync imports (#20379) 2026-04-30 13:03:21 -07:00
pakrym-oai
b52083146c Stop emitting item/fileChange/outputDelta output delta notifications (#20471)
## Why

`item/fileChange/outputDelta` text output was only the tool's summary or
error text and not used by client surfaces.

We keep `item/fileChange/outputDelta` in the app-server protocol as a
deprecated compatibility entry, but the server no longer emits it.

## What changed

- stop the `apply_patch` runtime from emitting `ExecCommandOutputDelta`
events
- simplify `item_event_to_server_notification` so command output deltas
always map to `item/commandExecution/outputDelta`
- remove the app-server bookkeeping that tried to detect whether an
output delta belonged to a file change
- mark `item/fileChange/outputDelta` as a deprecated legacy protocol
entry in the v2 types, schema, and README
- simplify the file-change approval tests so they only wait for
completion instead of expecting output-delta notifications

## Testing

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-thread-manager-sample`
- `cargo test -p codex-app-server-protocol
protocol::event_mapping::tests::exec_command_output_delta_maps_to_command_execution_output_delta
-- --exact`
- `cargo test -p codex-app-server
turn_start_file_change_approval_accept_for_session_persists_v2 --
--exact` *(failed before the test assertions because the wiremock
`/responses` mock received 0 requests in setup)*
2026-04-30 11:42:07 -07:00
Owen Lin
3516cb9751 fix(core): truncate large mcp tool outputs in rollouts (#20260)
## Why
Large MCP tool call outputs can make rollout JSONL files enormous. In
the session that motivated this change, the biggest JSONL records were:
- `event_msg/mcp_tool_call_end`
- `response_item/function_call_output`

both containing the same unbounded MCP payloads - just 3 MCP tool calls
that each were multi-hundred MBs 😱

This PR truncates both of those JSONL records.

## How

#### For `response_item/function_call_output`
Unified exec already bounds tool output before it is injected into
model-facing history, which also keeps the corresponding rollout
`response_item/function_call_output` records small.

MCP should follow the same pattern: truncate the model-facing tool
output at the tool-output boundary, while leaving code-mode/raw hook
consumers alone.

#### For `event_msg/mcp_tool_call_end`
`McpToolCallEnd` also needs its own bounded event copy because it is the
app-server/replay/UI event shape that backs `ThreadItem::McpToolCall`.
Unfortunately this is _not_ downstream of the `ToolOutput` trait.

## Model behavior 
Model behavior is actually unchanged as a result of this PR. 

Before this PR, MCP output was:
1. Converted to `FunctionCallOutput`.
2. Recorded into in-memory history.
3. Truncated by `ContextManager::record_items()` before later model
turns saw it.

After this branch, MCP output is truncated earlier, in
`McpToolOutput::response_payload()`, using the same helper. Then
`ContextManager::record_items()` sees an already-truncated output and
effectively has little/no additional work to do.

So the model should still see the same kind of truncated function-call
output. The practical difference is where truncation happens: earlier,
before rollout persistence/app-server emission can see the giant
payload.

## Verification

- `cargo test -p codex-core mcp_tool_output`
- `cargo test -p codex-core
mcp_tool_call::tests::truncate_mcp_tool_result_for_event`
- `cargo test -p codex-core
mcp_post_tool_use_payload_uses_model_tool_name_args_and_result`
- `just fmt`
- `just fix -p codex-core`
- `git diff --check`
2026-04-30 16:30:43 +00:00
Ahmed Ibrahim
8a97f3cf03 realtime: rename provider session ids (#20361)
## Summary

Codex is repurposing `session` to mean a thread group, so the realtime
provider session id should no longer use `session_id` / `sessionId` in
Codex-facing protocol payloads. This PR renames that provider-specific
field to `realtime_session_id` / `realtimeSessionId` and intentionally
breaks clients that still send the old field names.

## What Changed

- Renamed realtime provider session fields in `ConversationStartParams`,
`RealtimeConversationStartedEvent`, and `RealtimeEvent::SessionUpdated`.
- Renamed app-server v2 realtime request and notification fields to
`realtimeSessionId`.
- Removed legacy serde aliases for `session_id` / `sessionId`; clients
must send the new names.
- Propagated the rename through core realtime startup, app-server
adapters, codex-api websocket handling, and TUI realtime state.
- Regenerated app-server protocol schema/TypeScript outputs and updated
app-server README examples.
- Kept upstream Realtime API concepts unchanged: provider `session.id`
parsing and `x-session-id` headers still use the upstream wire names.

## Testing

- CI is running on the latest pushed commit.
- Earlier local verification on this PR:
  - `cargo test -p codex-protocol`
- `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-core
realtime_conversation`
  - `cargo test -p codex-app-server-protocol`
- `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-app-server
realtime_conversation`
- attempted `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-tui` (local
linker bus error while linking the test binary)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-30 13:39:48 +03:00
xl-openai
87d0cf1a62 feat: Add workspace plugin sharing APIs (#20278)
1. Adds v2 plugin/share/save, plugin/share/list, and plugin/share/delete
RPCs.
2. Implements save by archiving a local plugin root, enforcing a size
limit, uploading through the workspace upload flow, and supporting
updates via remotePluginId.
3. Lists created workspace plugins
4. Deletes a previously uploaded/shared plugin.
2026-04-29 23:49:20 -07:00
Abhinav
8f3c06cc97 Add persisted hook enablement state (#19840)
## Why

After `hooks/list` exposes the hook inventory, clients need a way to
persist user hook preferences, make those changes effective in
already-open sessions, and distinguish user-controllable hooks from
managed requirements without adding another bespoke app-server write
API.

## What

- Extends `hooks/list` entries with effective `enabled` state.
- Persists user-level hook state under `hooks.state.<hook-id>` so the
model can grow beyond a single boolean over time.
- Uses the existing `config/batchWrite` path for hook state updates
instead of introducing a dedicated hook write RPC.
- Refreshes live session hook engines after config writes so
already-open threads observe updated enablement without a restart.

## Stack

1. openai/codex#19705
2. openai/codex#19778
3. This PR - openai/codex#19840
4. openai/codex#19882

## Reviewer Notes

The generated schema files account for much of the raw diff. The core
behavior is in:

- `hooks/src/config_rules.rs`, which resolves per-hook user state from
the config layer stack.
- `hooks/src/engine/discovery.rs`, which projects effective enablement
into `hooks/list` from source-derived managedness.
- `config/src/hook_config.rs`, which defines the new `hooks.state`
representation.
- `core/src/session/mod.rs`, which rebuilds live hook state after user
config reloads.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-30 04:46:32 +00:00
Michael Bolin
ac4332c05b permissions: expose active profile metadata (#20095) 2026-04-29 20:54:59 -07:00
Alex Zamoshchin
8b07132e09 update codex_plugins_beta_setting (from workspace settings) (#20250)
update the name after rename internally

see https://github.com/openai/openai/pull/871006
2026-04-30 00:40:25 +00:00
pakrym-oai
fedcefe9da Reduce the surface of collaboration modes (#20149)
Collaboration modes were slightly invasive both into ThreadManager
construction and ModelProvider
2026-04-29 17:22:41 -07:00
stefanstokic-oai
c8abcbf925 Import external agent sessions in background (#20284)
Summary:
- Return from external agent import before session history import
finishes
- Run session import work in the background and emit the existing
completion notification when it is done
- Serialize session imports so duplicate requests do not create
duplicate imported threads

Verification:
- cargo test -p codex-app-server external_agent_config_
- cargo test -p codex-external-agent-sessions
- just fix -p codex-app-server
- just fix -p codex-external-agent-sessions
- git diff --check
2026-04-30 00:00:41 +00:00
alexsong-oai
7bcd4626c4 Consume ai-title from external sessions and add end marker (#20261)
## Summary
- Support Claude Code `ai-title` / `aiTitle` records when detecting and
importing external agent sessions.
- Preserve existing `custom-title` / `customTitle` precedence; only fall
back to `aiTitle` when no custom title is present.
- Add coverage for both detection and import title selection, including
the custom-title-over-ai-title case.

## Testing
- `cargo test -p codex-external-agent-sessions`
- `just fix -p codex-external-agent-sessions`
2026-04-30 00:00:13 +00:00
Abhinav
8774229a89 Add hooks/list app-server RPC (#19778)
## Why

We need a way to list the available hooks to expose via the TUI and App
so users can view and manage their hooks

## What

- Adds `hooks/list` for one or more `cwd` values that returns discovered
hook metadata

## Stack

1. openai/codex#19705
2. This PR - openai/codex#19778
3. openai/codex#19840
4. openai/codex#19882

## Review Notes

The generated schema files account for most of the raw diff, these files
have the core change:

- `hooks/src/engine/discovery.rs` builds the inventory entries during
hook discovery while leaving runtime handlers focused on execution.
- `app-server/src/codex_message_processor.rs` wires `hooks/list` into
the app-server flow for each requested `cwd`.
- `app-server-protocol/src/protocol/v2.rs` defines the new v2
request/response payloads exposed on the wire.

### Core Changes

`core/src/plugins/manager.rs` adds `plugins_for_layer_stack(...)` so
`skills/list` and `hooks/list`can resolve plugin state for each
requested `cwd`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-29 23:39:57 +00:00
xli-oai
afbddabc8b Require remote plugin detail before uninstall (#19966)
## Summary
- Fetch remote plugin detail before sending the uninstall request.
- Use the detail response to derive the marketplace namespace and plugin
name for cache cleanup.
- Stop the uninstall before the backend POST if detail lookup fails, so
backend state and local cache state do not diverge.

## Testing
- `just fmt`
- `cargo test -p codex-app-server plugin_uninstall`
- `cargo test -p codex-core-plugins`
- `git diff --check`
2026-04-29 14:01:11 -07:00
xl-openai
73cd831952 feat: Use remote installed plugin cache for skills and MCP (#20096)
- Fetches and caches remote /installed plugin state
- Lets skills/list load skills from remote-installed cached plugins
without requiring a local marketplace entry
- Routes plugin list/startup/install/uninstall changes through async
plugin cache invalidation and MCP refresh
2026-04-29 12:09:49 -07:00
xl-openai
6f328d5e02 Soften skill description budget warnings (#20112)
Updates skill description budget messaging to be less alarming
2026-04-28 19:56:25 -07:00
Celia Chen
8c47e36504 feat: expose provider capability bounds to app server clients (#20049)
follow up of #19442. The app server now exposes provider-derived bounds
through a new v2 `modelProvider/read` method. The response reports the
configured provider map key as `modelProvider` and returns the effective
capability booleans so clients can align their UI with the same
provider-owned limits used by core.
2026-04-29 01:36:19 +00:00
canvrno-oai
4c39ad33cb Fix plugin list workspace settings test isolation (#20086)
Fixes test that often fails locally when running `cargo test`
- Add an app-server test helper that combines managed-config isolation
with custom env overrides.
- Isolate `HOME` / `USERPROFILE` in plugin-list workspace settings tests
so host home marketplaces do not affect results.
2026-04-28 18:34:38 -07:00
Gabriel Peal
5e6cbbadf7 Return None when auth refresh fails (#20092)
Right now, if Codex winds up in a state with auth but it can't refresh
the token, the user is left with an unhelpful message that says to log
out and log back in again.

Ultimately, we should prevent that from happening but if it does,
returning None will allow the caller to redirect the user back to the
login page
2026-04-28 16:15:47 -07:00
Ruslan Nigmatullin
1de7a9bf69 app-server: allow remote_control runtime feature override (#20047) 2026-04-28 13:36:12 -07:00
stefanstokic-oai
4c68bd728f External agent session support (#19895)
## Summary

This extends external agent detection/import beyond config artifacts so
Codex can detect recent sessions files from the external agent home and
import them into Codex rollout history.

## What changed

- Added a focused `external_agent_sessions` module for:
  - session discovery
  - source-record parsing
  - rollout construction
  - import ledger tracking
- Wired session detection/import into the app-server external agent
config API.
- Added compaction handling so large imported sessions can be resumed
safely before the first follow-up turn.

## Testing

Added coverage for:
- recent-session detection
- custom-title handling
- recency filtering
- dedupe and re-detect-after-source-change behavior
- visible imported turn construction
- backward-compatible import payload deserialization
- end-to-end RPC import flow
- rejection of undetected session paths
- repeat-import behavior
- large-session compaction before first follow-up

Ran:
- `cargo test -p codex-app-server external_agent_config_import_ --test
all`
2026-04-28 17:42:36 +00:00
xli-oai
803705f795 Add remote plugin uninstall API (#19456)
## Summary
- Adds the remote `plugin/uninstall` request form using required
`pluginId` plus optional `remoteMarketplaceName`, while preserving local
`pluginId` uninstall.
- Adds `codex_core_plugins::remote::uninstall_remote_plugin` for the
deployed ChatGPT plugin backend uninstall path and validates the backend
returns the same id with `enabled: false`.
- Routes app-server remote uninstall through feature checks, remote
plugin id validation, backend mutation, local downloaded cache deletion,
cache clearing, docs, and regenerated protocol schemas.

## Tests
- `just write-app-server-schema`
- `just fmt`
- `cargo test -p codex-app-server-protocol
plugin_uninstall_params_serialization_omits_force_remote_sync`
- `cargo test -p codex-app-server plugin_uninstall --test all`
- `cargo test -p codex-app-server plugin_uninstall`
- `cargo build -p codex-cli`
- `CODEX_BIN=/Users/xli/code/codex/codex-rs/target/debug/codex python3
/Users/xli/.codex/skills/xli-test-marketplace-api/scripts/run_marketplace_api_matrix.py`
(44 pass / 0 fail)
- `just fix -p codex-app-server-protocol -p codex-app-server -p
codex-tui`
- `just fix -p codex-app-server`
2026-04-28 03:27:53 -07:00
xl-openai
7d72fc8f53 feat: Cache remote plugin bundles on install (#19914)
Remote installs now fetch, validate, download, and cache the plugin
bundle locally
2026-04-28 00:53:27 -07:00
Michael Bolin
4b55979755 permissions: remove cwd special path (#19841)
## Why

The experimental `PermissionProfile` API had both `:cwd` and
`:project_roots` special filesystem paths, which made the permission
root ambiguous. This PR removes the unstable `current_working_directory`
special path before the permissions API is stabilized, so callers use
`:project_roots` for symbolic project-root access.

## What changed

- Removes `FileSystemSpecialPath::CurrentWorkingDirectory` from protocol
and app-server protocol models, plus regenerated app-server
JSON/TypeScript schemas.
- Replaces internal `:cwd` permission entries with `:project_roots`
entries.
- Keeps the existing cwd-update behavior for legacy-shaped
workspace-write profiles, while removing the deleted
`CurrentWorkingDirectory` case from that compatibility path.
- Keeps `PermissionProfile::workspace_write()` as the reusable symbolic
workspace-write helper, with docs noting that `:project_roots` entries
resolve at enforcement time.
- Updates app-server docs/examples and approval UI labeling to stop
advertising `:cwd` as a permission token.

## Compatibility

Persisted rollout items may contain the old
`{"kind":"current_working_directory"}` tag from earlier experimental
`permissionProfile` snapshots. This PR keeps that tag as a
deserialize-only alias for `ProjectRoots { subpath: None }`, while
continuing to serialize only the new `project_roots` tag.

## Follow-up

This PR intentionally does not introduce an explicit project-root set on
`SessionConfiguration` or runtime sandbox resolution. Today, the
resolver still uses the active cwd as the single implicit project root.
A follow-up should model project roots separately from tool cwd so
`:project_roots` entries can resolve against the configured project
roots, and resolve to no entries when there are no project roots.

## Verification

- `cargo test -p codex-protocol permissions:: --lib`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-sandboxing -p codex-exec-server --lib`
- `cargo test -p codex-core session_configuration_apply_ --lib`
- `cargo test -p codex-app-server
command_exec_permission_profile_project_roots_use_command_cwd --test
all`
- `cargo test -p codex-tui
thread_read_session_state_does_not_reuse_primary_permission_profile
--lib`
- `cargo test -p codex-tui
preset_matching_accepts_workspace_write_with_extra_roots --lib`
- `cargo test -p codex-config --lib`
2026-04-27 13:41:27 -07:00
rhan-oai
215d5a8f7c [codex-analytics] remove ga flag (#19863) 2026-04-27 19:29:19 +00:00
Andrey Mishchenko
35bc6e3d01 Delete unused ResponseItem::Message.end_turn (#19605)
This field is unused. Delete it.
2026-04-26 17:18:09 -07:00
pakrym-oai
9c3abcd46c [codex] Move config loading into codex-config (#19487)
## Why

Config loading had become split across crates: `codex-config` owned the
config types and merge logic, while `codex-core` still owned the loader
that assembled the layer stack. This change consolidates that
responsibility in `codex-config`, so the crate that defines config
behavior also owns how configs are discovered and loaded.

To make that move possible without reintroducing the old dependency
cycle, the shell-environment policy types and helpers that
`codex-exec-server` needs now live in `codex-protocol` instead of
flowing through `codex-config`.

This also makes the migrated loader tests more deterministic on machines
that already have managed or system Codex config installed by letting
tests override the system config and requirements paths instead of
reading the host's `/etc/codex`.

## What Changed

- moved the config loader implementation from `codex-core` into
`codex-config::loader` and deleted the old `core::config_loader` module
instead of leaving a compatibility shim
- moved shell-environment policy types and helpers into
`codex-protocol`, then updated `codex-exec-server` and other downstream
crates to import them from their new home
- updated downstream callers to use loader/config APIs from
`codex-config`
- added test-only loader overrides for system config and requirements
paths so loader-focused tests do not depend on host-managed config state
- cleaned up now-unused dependency entries and platform-specific cfgs
that were surfaced by post-push CI

## Testing

- `cargo test -p codex-config`
- `cargo test -p codex-core config_loader_tests::`
- `cargo test -p codex-protocol -p codex-exec-server -p
codex-cloud-requirements -p codex-rmcp-client --lib`
- `cargo test --lib -p codex-app-server-client -p codex-exec`
- `cargo test --no-run --lib -p codex-app-server`
- `cargo test -p codex-linux-sandbox --lib`
- `cargo shear`
- `just bazel-lock-check`

## Notes

- I did not chase unrelated full-suite failures outside the migrated
loader surface.
- `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive
failures on this machine, and Windows CI still shows unrelated
long-running/timeouting test noise outside the loader migration itself.
2026-04-26 15:10:53 -07:00
Michael Bolin
4d7ce3447d permissions: make runtime config profile-backed (#19606)
## Why

This supersedes #19391. During stack repair, GitHub marked #19391 as
merged into a temporary stack branch rather than into `main`, so the
runtime-config change needed a fresh PR.

`PermissionProfile` is now the canonical permissions shape after #19231
because it can distinguish `Managed`, `Disabled`, and `External`
enforcement while also carrying filesystem rules that legacy
`SandboxPolicy` cannot represent cleanly. Core config and session state
still needed to accept profile-backed permissions without forcing every
profile through the strict legacy bridge, which rejected valid runtime
profiles such as direct write roots.

The unrelated CI/test hardening that previously rode along with this PR
has been split into #19683 so this PR stays focused on the permissions
model migration.

## What Changed

- Adds `Permissions.permission_profile` and
`SessionConfiguration.permission_profile` as constrained runtime state,
while keeping `sandbox_policy` as a legacy compatibility projection.
- Introduces profile setters that keep `PermissionProfile`, split
filesystem/network policies, and legacy `SandboxPolicy` projections
synchronized.
- Uses a compatibility projection for requirement checks and legacy
consumers instead of rejecting profiles that cannot round-trip through
`SandboxPolicy` exactly.
- Updates config loading, config overrides, session updates, turn
context plumbing, prompt permission text, sandbox tags, and exec request
construction to carry profile-backed runtime permissions.
- Preserves configured deny-read entries and `glob_scan_max_depth` when
command/session profiles are narrowed.
- Adds `PermissionProfile::read_only()` and
`PermissionProfile::workspace_write()` presets that match legacy
defaults.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606).
* #19395
* #19394
* #19393
* #19392
* __->__ #19606
2026-04-26 13:29:54 -07:00
Michael Bolin
ac2bffa443 test: harden app-server integration tests (#19683)
## Why

Windows Bazel runs in the permissions stack exposed that app-server
integration tests were launching normal plugin startup warmups in every
subprocess. Those warmups can call
`https://chatgpt.com/backend-api/plugins/featured` when a test is not
specifically exercising plugin startup, which adds slow background work,
noisy stderr, and dependence on external network state. The relevant
startup/featured-plugin behavior was introduced across #15042 and
#15264.

A few app-server tests also had long optional waits or unbounded cleanup
paths, making failures expensive to diagnose and contributing to slow
Windows shards. One external-agent config test from #18246 used a
GitHub-style marketplace source, which was enough to exercise the
pending remote-import path but also meant the background completion task
could attempt a real clone.

## What Changed

- Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks`
plumbing and a hidden debug-only
`--disable-plugin-startup-tasks-for-tests` app-server flag, so
integration tests can suppress startup plugin warmups without adding a
production env-var gate.
- Has the app-server test harness pass that hidden flag by default,
while opting plugin-startup coverage back in for tests that
intentionally exercise startup sync and featured-plugin warmup behavior.
- Lowers normal app-server subprocess logging from `info`/`debug` to
`warn` to avoid multi-megabyte stderr output in Bazel logs.
- Prevents the external-agent config test from attempting a real
marketplace clone by using an invalid non-local source while still
exercising the pending-import completion path.
- Bounds optional filesystem/realtime waits and fake WebSocket
test-server shutdown so failures produce targeted timeouts instead of
hanging a shard.
- Fixes the Unix script-resolution test in `rmcp-client` to exercise
PATH resolution directly and include the actual spawn error in failures.

## Verification

- `cargo check -p codex-app-server`
- `cargo clippy -p codex-app-server --tests -- -D warnings`
- `cargo test -p codex-rmcp-client
program_resolver::tests::test_unix_executes_script_without_extension`
- `cargo test -p codex-app-server --test all
external_agent_config_import_sends_completion_notification_after_pending_plugins_finish
-- --nocapture`
- `cargo test -p codex-app-server --test all
plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request --
--nocapture`
- Windows Local Bazel passed with this test-hardening bundle before it
was extracted from #19606.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19683
2026-04-26 12:43:16 -07:00
Michael Bolin
d54493ba1c test: stabilize app-server path assertions on Windows (#19604)
## Why

Windows can represent the same canonical local path with either a normal
drive path or a verbatim device path prefix. The failure pattern that
motivated this PR was an assertion diff like `C:\...` versus
`\\?\C:\...`: different spellings, same file.

That became visible while validating the permissions stack above this
PR. The stack increasingly routes paths through `AbsolutePathBuf`, which
normalizes supported Windows device prefixes, while several existing
tests still built expected values directly with
`std::fs::canonicalize()` or compared `AbsolutePathBuf::as_path()` to a
raw `PathBuf`. On Windows, that can make tests fail because the two
sides choose different textual forms for an otherwise equivalent
canonical path.

This PR is intentionally split out as the bottom PR below #19606. The
runtime permissions migration should not carry unrelated Windows test
stabilization, and reviewers should be able to verify this as a
test-only change before looking at the larger permissions changes.

## Failure Modes Covered

- `conversation_summary` expected rollout paths were built from raw
canonicalized `PathBuf`s, while app-server responses could carry
`AbsolutePathBuf`-normalized paths.
- `thread_resume` compared returned thread paths directly to previously
stored or fixture paths, so a verbatim-prefix spelling could fail an
otherwise correct resume.
- `marketplace_add` compared plugin install roots through `as_path()`
against raw canonicalized paths, reproducing the same `C:\...` versus
`\\?\C:\...` mismatch in both app-server and core-plugin coverage.

## What Changed

- In `app-server/tests/suite/conversation_summary.rs`, normalize both
expected rollout paths and received `ConversationSummary.path` values
through `AbsolutePathBuf` before comparing the full summary object.
- In `app-server/tests/suite/v2/thread_resume.rs`, normalize both sides
of thread path comparisons before asserting equality. This keeps the
tests focused on whether resume returned the same existing path, not
whether Windows used the same string spelling.
- In `app-server/tests/suite/v2/marketplace_add.rs` and
`core-plugins/src/marketplace_add.rs`, compare install roots as
`AbsolutePathBuf` values instead of comparing an absolute-path wrapper
to a raw canonicalized `PathBuf`.

## Behavior

This PR does not change production app-server or marketplace behavior.
It only changes tests to assert semantic path identity across Windows
path spelling variants. It also leaves API response values untouched;
the normalization happens inside assertions only.

## Verification

Targeted local checks run while extracting this fix:

- `cargo test -p codex-app-server
get_conversation_summary_by_thread_id_reads_rollout`
- `cargo test -p codex-app-server
get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home`
- `cargo test -p codex-app-server
thread_resume_prefers_path_over_thread_id`

Windows-specific confidence comes from the Bazel Windows CI job for this
PR, since the failure is platform-specific.

## Docs

No docs update is needed because this is test-only infrastructure
stabilization.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19604).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19604
2026-04-25 16:25:28 -07:00
Michael Bolin
f41306b4f3 test: isolate remote thread store regression from plugin warmups (#19593)
Follow-up to #19266.

## Why


`thread_start_with_non_local_thread_store_does_not_create_local_persistence`
is meant to catch accidental local thread persistence when a non-local
thread store is configured. The Windows flake reported in [this
BuildBuddy
invocation](https://app.buildbuddy.io/invocation/0b75dde4-6828-4e7b-a35b-e45b73fb005d)
showed that the assertion was tripping on an unexpected top-level `.tmp`
entry:

```diff
 {
+    ".tmp",
     "config.toml",
     "installation_id",
     "memories",
     "skills",
 }
```

That `.tmp` does not appear to come from `tempfile::TempDir`; it comes
from unrelated plugin startup work that can legitimately materialize
`codex_home/.tmp`, including the startup remote plugin sync marker in
[`core/src/plugins/startup_sync.rs`](bce74c70ce/codex-rs/core/src/plugins/startup_sync.rs (L13-L15))
and the curated plugin snapshot under
[`.tmp/plugins`](bce74c70ce/codex-rs/core-plugins/src/startup_sync.rs (L25-L26)).

That makes the regression race unrelated background startup tasks
instead of validating the thread-store invariant it was added to cover.
Rather than weakening the assertion to allow arbitrary `.tmp` entries,
this change isolates the test from plugin warmups so it can stay strict
about unexpected local thread persistence artifacts.

## What changed

- disable plugins in the generated config used by
`app-server/tests/suite/v2/remote_thread_store.rs`
- keep the existing `codex_home` assertions unchanged so the test still
fails if local session or sqlite persistence is introduced

## Verification

- `cargo test -p codex-app-server
suite::v2::remote_thread_store::thread_start_with_non_local_thread_store_does_not_create_local_persistence
-- --exact`
2026-04-25 20:45:31 +00:00
Eric Traut
4167628622 Add goal core runtime (4 / 5) (#18076)
Adds the core runtime behavior for active goals on top of the model
tools from PR 3.

## Why

A long-running goal should be a core runtime concern, not something
every client has to implement. Core owns the turn lifecycle, tool
completion boundaries, interruptions, resume behavior, and token usage,
so it is the right place to account progress, enforce budgets, and
decide when to continue work.

## What changed

- Centralized goal lifecycle side effects behind
`Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
- Starts goal continuation turns only when the session is idle; pending
user input and mailbox work take priority.
- Accounts token and wall-clock usage at turn, tool, mutation,
interrupt, and resume boundaries; `get_thread_goal` remains read-only.
- Preserves sub-second wall-clock remainder across accounting boundaries
so long-running goals do not drift downward over time.
- Treats token budget exhaustion as a soft stop by marking the goal
`budget_limited` and injecting wrap-up steering instead of aborting the
active turn.
- Suppresses budget steering when `update_goal` marks a goal complete.
- Pauses active goals on interrupt and auto-reactivates paused goals
when a thread resumes outside plan mode.
- Suppresses repeated automatic continuation when a continuation turn
makes no tool calls.
- Added continuation and budget-limit prompt templates.

## Verification

- Added focused core coverage for continuation scheduling, accounting
boundaries, budget-limit steering, completion accounting, interrupt
pause behavior, resume auto-activation, and wall-clock remainder
accounting.
2026-04-24 21:16:00 -07:00
Eric Traut
6c874f9b34 Add goal app-server API (2 / 5) (#18074)
Adds the app-server v2 goal API on top of the persisted goal state from
PR 1.

## Why

Clients need a stable app-server surface for reading and controlling
materialized thread goals before the model tools and TUI can use them.
Goal changes also need to be observable by app-server clients, including
clients that resume an existing thread.

## What changed

- Added v2 `thread/goal/get`, `thread/goal/set`, and `thread/goal/clear`
RPCs for materialized threads.
- Added `thread/goal/updated` and `thread/goal/cleared` notifications so
clients can keep local goal state in sync.
- Added resume/snapshot wiring so reconnecting clients see the current
goal state for a thread.
- Added app-server handlers that reconcile persisted rollout state
before direct goal mutations.
- Updated the app-server README plus generated JSON and TypeScript
schema fixtures for the new API surface.

## Verification

- Added app-server v2 coverage for goal get/set/clear behavior,
notification emission, resume snapshots, and non-local thread-store
interactions.
2026-04-24 20:53:41 -07:00