Reapply "Add app-server transport layer with websocket support" with
additional fixes from https://github.com/openai/codex/pull/11313/changes
to avoid deadlocking.
This reverts commit 47356ff83c.
## Summary
To avoid deadlocking when queues are full, we maintain separate tokio
tasks dedicated to incoming vs outgoing event handling
- split the app-server main loop into two tasks in
`run_main_with_transport`
- inbound handling (`transport_event_rx`)
- outbound handling (`outgoing_rx` + `thread_created_rx`)
- separate incoming and outgoing websocket tasks
## Validation
Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
requests
<img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
/>
---------
Co-authored-by: jif-oai <jif@openai.com>
This PR adds a dedicated `turn/steer` API for appending user input to an
in-flight turn.
## Motivation
Currently, steering in the app is implemented by just calling
`turn/start` while a turn is running. This has some really weird quirks:
- Client gets back a new `turn.id`, even though streamed
events/approvals remained tied to the original active turn ID.
- All the various turn-level override params on `turn/start` do not
apply to the "steer", and would only apply to the next real turn.
- There can also be a race condition where the client thinks the turn is
active but the server has already completed it, so there might be bugs
if the client has baked in some client-specific behavior thinking it's a
steer when in fact the server kicked off a new turn. This is
particularly possible when running a client against a remote app-server.
Having a dedicated `turn/steer` API eliminates all those quirks.
`turn/steer` behavior:
- Requires an active turn on threadId. Returns a JSON-RPC error if there
is no active turn.
- If expectedTurnId is provided, it must match the active turn (more
useful when connecting to a remote app-server).
- Does not emit `turn/started`.
- Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
`outputSchema` to accurately reflect that these are not applied when
steering.
- Adds --listen <URL> to codex app-server with two listen modes:
- stdio:// (default, existing behavior)
- ws://IP:PORT (new websocket transport)
- Refactors message routing to be connection-aware:
- Tracks per-connection session state (initialize/experimental
capability)
- Routes responses/errors to the originating connection
- Broadcasts server notifications/requests to initialized connections
- Updates initialization semantics to be per connection (not
process-global), and updates app-server docs accordingly.
- Adds websocket accept/read/write handling (JSON-RPC per text frame,
ping/pong handling, connection lifecycle events).
Testing
- Unit tests for transport URL parsing and targeted response/error
routing.
- New websocket integration test validating:
- per-connection initialization requirements
- no cross-connection response leakage
- same request IDs on different connections route independently.
## Problem being solved
- We need a single, reliable way to mark app-server API surface as
experimental so that:
1. the runtime can reject experimental usage unless the client opts in
2. generated TS/JSON schemas can exclude experimental methods/fields for
stable clients.
Right now that’s easy to drift or miss when done ad-hoc.
## How to declare experimental methods and fields
- **Experimental method**: add `#[experimental("method/name")]` to the
`ClientRequest` variant in `client_request_definitions!`.
- **Experimental field**: on the params struct, derive `ExperimentalApi`
and annotate the field with `#[experimental("method/name.field")]` + set
`inspect_params: true` for the method variant so
`ClientRequest::experimental_reason()` inspects params for experimental
fields.
## How the macro solves it
- The new derive macro lives in
`codex-rs/codex-experimental-api-macros/src/lib.rs` and is used via
`#[derive(ExperimentalApi)]` plus `#[experimental("reason")]`
attributes.
- **Structs**:
- Generates `ExperimentalApi::experimental_reason(&self)` that checks
only annotated fields.
- The “presence” check is type-aware:
- `Option<T>`: `is_some_and(...)` recursively checks inner.
- `Vec`/`HashMap`/`BTreeMap`: must be non-empty.
- `bool`: must be `true`.
- Other types: considered present (returns `true`).
- Registers each experimental field in an `inventory` with `(type_name,
serialized field name, reason)` and exposes `EXPERIMENTAL_FIELDS` for
that type. Field names are converted from `snake_case` to `camelCase`
for schema/TS filtering.
- **Enums**:
- Generates an exhaustive `match` returning `Some(reason)` for annotated
variants and `None` otherwise (no wildcard arm).
- **Wiring**:
- Runtime gating uses `ExperimentalApi::experimental_reason()` in
`codex-rs/app-server/src/message_processor.rs` to reject requests unless
`InitializeParams.capabilities.experimental_api == true`.
- Schema/TS export filters use the inventory list and
`EXPERIMENTAL_CLIENT_METHODS` from `client_request_definitions!` to
strip experimental methods/fields when `experimental_api` is false.
## Summary
- Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed
in core, emitting plan deltas plus a plan `ThreadItem`, while stripping
tags from normal assistant output.
- Persist plan items and rebuild them on resume so proposed plans show
in thread history.
- Wire plan items/deltas through app-server protocol v2 and render a
dedicated proposed-plan view in the TUI, including the “Implement this
plan?” prompt only when a plan item is present.
## Changes
### Core (`codex-rs/core`)
- Added a generic, line-based tag parser that buffers each line until it
can disprove a tag prefix; implements auto-close on `finish()` for
unterminated tags. `codex-rs/core/src/tagged_block_parser.rs`
- Refactored proposed plan parsing to wrap the generic parser.
`codex-rs/core/src/proposed_plan_parser.rs`
- In plan mode, stream assistant deltas as:
- **Normal text** → `AgentMessageContentDelta`
- **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion
(`codex-rs/core/src/codex.rs`)
- Final plan item content is derived from the completed assistant
message (authoritative), not necessarily the concatenated deltas.
- Strips `<proposed_plan>` blocks from assistant text in plan mode so
tags don’t appear in normal messages.
(`codex-rs/core/src/stream_events_utils.rs`)
- Persist `ItemCompleted` events only for plan items for rollout replay.
(`codex-rs/core/src/rollout/policy.rs`)
- Guard `update_plan` tool in Plan Mode with a clear error message.
(`codex-rs/core/src/tools/handlers/plan.rs`)
- Updated Plan Mode prompt to:
- keep `<proposed_plan>` out of non-final reasoning/preambles
- require exact tag formatting
- allow only one `<proposed_plan>` block per turn
(`codex-rs/core/templates/collaboration_mode/plan.md`)
### Protocol / App-server protocol
- Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items.
(`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`)
- Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with
EXPERIMENTAL markers and note that deltas may not match the final plan
item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`)
- Added plan delta route in app-server protocol common mapping.
(`codex-rs/app-server-protocol/src/protocol/common.rs`)
- Rebuild plan items from persisted `ItemCompleted` events on resume.
(`codex-rs/app-server-protocol/src/protocol/thread_history.rs`)
### App-server
- Forward plan deltas to v2 clients and map core plan items to v2 plan
items. (`codex-rs/app-server/src/bespoke_event_handling.rs`,
`codex-rs/app-server/src/codex_message_processor.rs`)
- Added v2 plan item tests.
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`)
### TUI
- Added a dedicated proposed plan history cell with special background
and padding, and moved “• Proposed Plan” outside the highlighted block.
(`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`)
- Only show “Implement this plan?” when a plan item exists.
(`codex-rs/tui/src/chatwidget.rs`,
`codex-rs/tui/src/chatwidget/tests.rs`)
<img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM"
src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286"
/>
### Docs / Misc
- Updated protocol docs to mention plan deltas.
(`codex-rs/docs/protocol_v1.md`)
- Minor plumbing updates in exec/debug clients to tolerate plan deltas.
(`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`)
## Tests
- Added core integration tests:
- Plan mode strips plan from agent messages.
- Missing `</proposed_plan>` closes at end-of-message.
(`codex-rs/core/tests/suite/items.rs`)
- Added unit tests for generic tag parser (prefix buffering, non-tag
lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`)
- Existing app-server plan item tests in v2.
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`)
## Notes / Behavior
- Plan output no longer appears in standard assistant text in Plan Mode;
it streams via `PlanDelta` and completes as a `TurnItem::Plan`.
- The final plan item content is authoritative and may diverge from
streamed deltas (documented as experimental).
- Reasoning summaries are not filtered; prompt instructs the model not
to include `<proposed_plan>` outside the final plan message.
## Codex Author
`codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
## Summary
- Adds a new `thread/unarchive` RPC to move archived thread rollouts
back into the active `sessions/` tree.
## What changed
- **Protocol**
- Adds `thread/unarchive` request/response types and wiring.
- **Server**
- Implements `thread_unarchive` in the app server.
- Validates the archived rollout path and thread ID.
- Restores the rollout to `sessions/YYYY/MM/DD/...` based on the rollout
filename timestamp.
- **Core**
- Adds `find_archived_thread_path_by_id_str` helper for archived
rollouts.
- **Docs**
- Documents the new RPC and usage example.
- **Tests**
- Adds an end-to-end server test that:
1) starts a thread,
2) archives it,
3) unarchives it,
4) asserts the file is restored to `sessions/`.
## How to use
```json
{ "method": "thread/unarchive", "id": 24, "params": { "threadId": "<thread-id>" } }
```
## Author Codex Session
`codex resume 019bf158-54b6-7960-a696-9d85df7e1bc1` (soon I'll make this
kind of session UUID forkable by anyone with the right
`session_object_storage_url` line in their config, but for now just
pasting it here for my reference)
## Summary
Add dynamic tool injection to thread startup in API v2, wire dynamic
tool calls through the app server to clients, and plumb responses back
into the model tool pipeline.
### Flow (high level)
- Thread start injects `dynamic_tools` into the model tool list for that
thread (validation is done here).
- When the model emits a tool call for one of those names, core raises a
`DynamicToolCallRequest` event.
- The app server forwards it to the client as `item/tool/call`, waits
for the client’s response, then submits a `DynamicToolResponse` back to
core.
- Core turns that into a `function_call_output` in the next model
request so the model can continue.
### What changed
- Added dynamic tool specs to v2 thread start params and protocol types;
introduced `item/tool/call` (request/response) for dynamic tool
execution.
- Core now registers dynamic tool specs at request time and routes those
calls via a new dynamic tool handler.
- App server validates tool names/schemas, forwards dynamic tool call
requests to clients, and publishes tool outputs back into the session.
- Integration tests
In order to make Codex work with connectors, we add a built-in gateway
MCP that acts as a transparent proxy between the client and the
connectors. The gateway MCP collects actions that are accessible to the
user and sends them down to the user, when a connector action is chosen
to be called, the client invokes the action through the gateway MCP as
well.
- [x] Add the system built-in gateway MCP to list and run connectors.
- [x] Add the app server methods and protocol
Add a new `codex app-server --analytics-default-enabled` CLI flag that
controls whether analytics are enabled by default.
Analytics are disabled by default for app-server. Users have to
explicitly opt in
via the `analytics` section in the config.toml file.
However, for first-party use cases like the VSCode IDE extension, we
default analytics
to be enabled by default by setting this flag. Users can still opt out
by setting this
in their config.toml:
```toml
[analytics]
enabled = false
```
See https://developers.openai.com/codex/config-advanced/#metrics for
more details.
**Motivation**
The `originator` header is important for codex-backend’s Responses API
proxy because it identifies the real end client (codex cli, codex vscode
extension, codex exec, future IDEs) and is used to categorize requests
by client for our enterprise compliance API.
Today the `originator` header is set by either:
- the `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var (our VSCode extension
does this)
- calling `set_default_originator()` which sets a global immutable
singleton (`codex exec` does this)
For `codex app-server`, we want the `initialize` JSON-RPC request to set
that header because it is a natural place to do so. Example:
```json
{
"method": "initialize",
"id": 0,
"params": {
"clientInfo": {
"name": "codex_vscode",
"title": "Codex VS Code Extension",
"version": "0.1.0"
}
}
}
```
and when app-server receives that request, it can call
`set_default_originator()`. This is a much more natural interface than
asking third party developers to set an env var.
One hiccup is that `originator()` reads the global singleton and locks
in the value, preventing a later `set_default_originator()` call from
setting it. This would be fine but is brittle, since any codepath that
calls `originator()` before app-server can process an `initialize`
JSON-RPC call would prevent app-server from setting it. This was
actually the case with OTEL initialization which runs on boot, but I
also saw this behavior in certain tests.
Instead, what we now do is:
- [unchanged] If `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var is set,
`originator()` would return that value and `set_default_originator()`
with some other value does NOT override it.
- [new] If no env var is set, `originator()` would return the default
value which is `codex_cli_rs` UNTIL `set_default_originator()` is called
once, in which case it is set to the new value and becomes immutable.
Later calls to `set_default_originator()` returns
`SetOriginatorError::AlreadyInitialized`.
**Other notes**
- I updated `codex_core::otel_init::build_provider` to accepts a service
name override, and app-server sends a hardcoded `codex_app_server`
service name to distinguish it from `codex_cli_rs` used by default (e.g.
TUI).
**Next steps**
- Update VSCE to set the proper value for `clientInfo.name` on
`initialize` and drop the `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var.
- Delete support for `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` in codex-rs.
Add `thread/rollback` to app-server to support IDEs undo-ing the last N
turns of a thread.
For context, an IDE partner will be supporting an "undo" capability
where the IDE (the app-server client) will be responsible for reverting
the local changes made during the last turn. To support this well, we
also need a way to drop the last turn (or more generally, the last N
turns) from the agent's context. This is what `thread/rollback` does.
**Core idea**: A Thread rollback is represented as a persisted event
message (EventMsg::ThreadRollback) in the rollout JSONL file, not by
rewriting history. On resume, both the model's context (core replay) and
the UI turn list (app-server v2's thread history builder) apply these
markers so the pruned history is consistent across live conversations
and `thread/resume`.
Implementation notes:
- Rollback only affects agent context and appends to the rollout file;
clients are responsible for reverting files on disk.
- If a thread rollback is currently in progress, subsequent
`thread/rollback` calls are rejected.
- Because we use `CodexConversation::submit` and codex core tracks
active turns, returning an error on concurrent rollbacks is communicated
via an `EventMsg::Error` with a new variant
`CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and
sends the BAD_REQUEST RPC response.
Tests cover thread rollbacks in both core and app-server, including when
`num_turns` > existing turns (which clears all turns).
**Note**: this explicitly does **not** behave like `/undo` which we just
removed from the CLI, which does the opposite of what `thread/rollback`
does. `/undo` reverts local changes via ghost commits/snapshots and does
not modify the agent's context / conversation history.
What changed
- Added `outputSchema` support to the app-server APIs, mirroring `codex
exec --output-schema` behavior.
- V1 `sendUserTurn` now accepts `outputSchema` and constrains the final
assistant message for that turn.
- V2 `turn/start` now accepts `outputSchema` and constrains the final
assistant message for that turn (explicitly per-turn only).
Core behavior
- `Op::UserTurn` already supported `final_output_json_schema`; now V1
`sendUserTurn` forwards `outputSchema` into that field.
- `Op::UserInput` now carries `final_output_json_schema` for per-turn
settings updates; core maps it into
`SessionSettingsUpdate.final_output_json_schema` so it applies to the
created turn context.
- V2 `turn/start` does NOT persist the schema via `OverrideTurnContext`
(it’s applied only for the current turn). Other overrides
(cwd/model/etc) keep their existing persistent behavior.
API / docs
- `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema:
Option<serde_json::Value>` to `SendUserTurnParams` (serialized as
`outputSchema`).
- `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema:
Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`).
- `codex-rs/app-server/README.md`: document `outputSchema` for
`turn/start` and clarify it applies only to the current turn.
- `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1
`sendUserTurn` and v2 `turn/start`.
Tests added/updated
- New app-server integration tests asserting `outputSchema` is forwarded
into outbound `/responses` requests as `text.format`:
- `codex-rs/app-server/tests/suite/output_schema.rs`
- `codex-rs/app-server/tests/suite/v2/output_schema.rs`
- Added per-turn semantics tests (schema does not leak to the next
turn):
- `send_user_turn_output_schema_is_per_turn_v1`
- `turn_start_output_schema_is_per_turn_v2`
- Added protocol wire-compat tests for the merged op:
- serialize omits `final_output_json_schema` when `None`
- deserialize works when field is missing
- serialize includes `final_output_json_schema` when `Some(schema)`
Call site updates (high level)
- Updated all `Op::UserInput { .. }` constructions to include
`final_output_json_schema`:
- `codex-rs/app-server/src/codex_message_processor.rs`
- `codex-rs/core/src/codex_delegate.rs`
- `codex-rs/mcp-server/src/codex_tool_runner.rs`
- `codex-rs/tui/src/chatwidget.rs`
- `codex-rs/tui2/src/chatwidget.rs`
- plus impacted core tests.
Validation
- `just fmt`
- `cargo test -p codex-core`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`
- `cargo test -p codex-tui`
- `cargo test -p codex-tui2`
- `cargo test -p codex-protocol`
- `cargo clippy --all-features --tests --profile dev --fix -- -D
warnings`
Implements:
```
turn/start
turn/interrupt
```
along with their integration tests. These are relatively light wrappers
around the existing core logic, and changes to core logic are minimal.
However, an improvement made for developer ergonomics:
- `turn/start` replaces both `SendUserMessage` (no turn overrides) and
`SendUserTurn` (can override model, approval policy, etc.)
Implements:
```
thread/list
thread/start
thread/resume
thread/archive
```
along with their integration tests. These are relatively light wrappers
around the existing core logic, and changes to core logic are minimal.
However, an improvement made for developer ergonomics:
- `thread/start` and `thread/resume` automatically attaches a
conversation listener internally, so clients don't have to make a
separate `AddConversationListener` call like they do today.
For consistency, also updated `model/list` and `feedback/upload` (naming
conventions, list API params).
V2 for `account/updated` and `account/logout` for app server. correspond
to old `authStatusChange` and `LogoutChatGpt` respectively. Followup PRs
will make other v2 endpoints call `account/updated` instead of
`authStatusChange` too.