- Let typed user messages submit while realtime is active and mirror
accepted text into the realtime text stream.
- Add integration coverage and snapshot for outbound realtime text.
Select Current Thread startup context by budget from newest turns, cap
each rendered turn at 300 approximate tokens, and add formatter plus
integration snapshot coverage.
## Summary
App-server v2 already receives turn-scoped `clientMetadata`, but the
Rust app-server was dropping it before the outbound Responses request.
This change keeps the fix lightweight by threading that metadata through
the existing turn-metadata path rather than inventing a new transport.
## What we're trying to do and why
We want turn-scoped metadata from the app-server protocol layer,
especially fields like Hermes/GAAS run IDs, to survive all the way to
the actual Responses API request so it is visible in downstream
websocket request logging and analytics.
The specific bug was:
- app-server protocol uses camelCase `clientMetadata`
- Responses transport already has an existing turn metadata carrier:
`x-codex-turn-metadata`
- websocket transport already rewrites that header into
`request.request_body.client_metadata["x-codex-turn-metadata"]`
- but the Rust app-server never parsed or stored `clientMetadata`, so
nothing from the app-server request was making it into that existing
path
This PR fixes that without adding a new header or a second metadata
channel.
## How we did it
### Protocol surface
- Add optional `clientMetadata` to v2 `TurnStartParams` and
`TurnSteerParams`
- Regenerate the JSON schema / TypeScript fixtures
- Update app-server docs to describe the field and its behavior
### Runtime plumbing
- Add a dedicated core op for app-server user input carrying turn-scoped
metadata: `Op::UserInputWithClientMetadata`
- Wire `turn/start` and `turn/steer` through that op / signature path
instead of dropping the metadata at the message-processor boundary
- Store the metadata in `TurnMetadataState`
### Transport behavior
- Reuse the existing serialized `x-codex-turn-metadata` payload
- Merge the new app-server `clientMetadata` into that JSON additively
- Do **not** replace built-in reserved fields already present in the
turn metadata payload
- Keep websocket behavior unchanged at the outer shape level: it still
sends only `client_metadata["x-codex-turn-metadata"]`, but that JSON
string now contains the merged fields
- Keep HTTP fallback behavior unchanged except that the existing
`x-codex-turn-metadata` header now includes the merged fields too
### Request shape before / after
Before, a websocket `response.create` looked like:
```json
{
"type": "response.create",
"client_metadata": {
"x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\"}"
}
}
```
Even if the app-server caller supplied `clientMetadata`, it was not
represented there.
After, the same request shape is preserved, but the serialized payload
now includes the new turn-scoped fields:
```json
{
"type": "response.create",
"client_metadata": {
"x-codex-turn-metadata": "{\"session_id\":\"...\",\"turn_id\":\"...\",\"fiber_run_id\":\"fiber-start-123\",\"origin\":\"gaas\"}"
}
}
```
## Validation
### Targeted tests added / updated
- protocol round-trip coverage for `clientMetadata` on `turn/start` and
`turn/steer`
- protocol round-trip coverage for `Op::UserInputWithClientMetadata`
- `TurnMetadataState` merge test proving client metadata is added
without overwriting reserved built-in fields
- websocket request-shape test proving outbound `response.create`
contains merged metadata inside
`client_metadata["x-codex-turn-metadata"]`
- app-server integration tests proving:
- `turn/start` forwards `clientMetadata` into the outbound Responses
request path
- websocket warmup + real turn request both behave correctly
- `turn/steer` updates the follow-up request metadata
### Commands run
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `cargo test -p codex-core
turn_metadata_state_merges_client_metadata_without_replacing_reserved_fields
--lib`
- `cargo test -p codex-core --test all
responses_websocket_preserves_custom_turn_metadata_fields`
- `cargo test -p codex-app-server --test all client_metadata`
- `cargo test -p codex-app-server --test all
turn_start_forwards_client_metadata_to_responses_websocket_request_body_v2
-- --nocapture`
- `just fmt`
- `just fix -p codex-core -p codex-protocol -p codex-app-server-protocol
-p codex-app-server`
- `just fix -p codex-exec -p codex-tui-app-server`
- `just argument-comment-lint`
### Full suite note
`cargo test` in `codex-rs` still fails in:
-
`suite::v2::turn_interrupt::turn_interrupt_resolves_pending_command_approval_request`
I verified that same failure on a clean detached `HEAD` worktree with an
isolated `CARGO_TARGET_DIR`, so it is not caused by this patch.
- Default realtime sessions to v2 and gpt-realtime-1.5 when no override
is configured.
- Add Op::RealtimeConversationStart integration coverage and keep
v1-specific tests explicit.
---------
Co-authored-by: Codex <noreply@openai.com>
- Adds a core-owned realtime backend prompt template and preparation
path.
- Makes omitted realtime start prompts use the core default, while null
or empty prompts intentionally send empty instructions.
- Covers the core realtime path and app-server v2 path with integration
coverage.
---------
Co-authored-by: Codex <noreply@openai.com>
Summary:
- parse the realtime call Location header and join that call over the
direct realtime WebSocket
- keep WebRTC starts alive on the existing realtime conversation path
Validation:
- just fmt
- git diff --check
- cargo check -p codex-api
- cargo check -p codex-core --tests
- local cargo tests not run; relying on PR CI
Adds WebRTC startup to the experimental app-server
`thread/realtime/start` method with an optional transport enum. The
websocket path remains the default; WebRTC offers create the realtime
session through the shared start flow and emit the answer SDP via
`thread/realtime/sdp`.
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`codex-core` was re-exporting APIs owned by sibling `codex-*` crates,
which made downstream crates depend on `codex-core` as a proxy module
instead of the actual owner crate.
Removing those forwards makes crate boundaries explicit and lets leaf
crates drop unnecessary `codex-core` dependencies. In this PR, this
reduces the dependency on `codex-core` to `codex-login` in the following
files:
```
codex-rs/backend-client/Cargo.toml
codex-rs/mcp-server/tests/common/Cargo.toml
```
## What
- Remove `codex-rs/core/src/lib.rs` re-exports for symbols owned by
`codex-login`, `codex-mcp`, `codex-rollout`, `codex-analytics`,
`codex-protocol`, `codex-shell-command`, `codex-sandboxing`,
`codex-tools`, and `codex-utils-path`.
- Delete the `default_client` forwarding shim in `codex-rs/core`.
- Update in-crate and downstream callsites to import directly from the
owning `codex-*` crate.
- Add direct Cargo dependencies where callsites now target the owner
crate, and remove `codex-core` from `codex-rs/backend-client`.
## Why
`argument-comment-lint` was green in CI even though the repo still had
many uncommented literal arguments. The main gap was target coverage:
the repo wrapper did not force Cargo to inspect test-only call sites, so
examples like the `latest_session_lookup_params(true, ...)` tests in
`codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.
This change cleans up the existing backlog, makes the default repo lint
path cover all Cargo targets, and starts rolling that stricter CI
enforcement out on the platform where it is currently validated.
## What changed
- mechanically fixed existing `argument-comment-lint` violations across
the `codex-rs` workspace, including tests, examples, and benches
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
`tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
`--all-targets` unless the caller explicitly narrows the target set
- fixed both wrappers so forwarded cargo arguments after `--` are
preserved with a single separator
- documented the new default behavior in
`tools/argument-comment-lint/README.md`
- updated `rust-ci` so the macOS lint lane keeps the plain wrapper
invocation and therefore enforces `--all-targets`, while Linux and
Windows temporarily pass `-- --lib --bins`
That temporary CI split keeps the stricter all-targets check where it is
already cleaned up, while leaving room to finish the remaining Linux-
and Windows-specific target-gated cleanup before enabling
`--all-targets` on those runners. The Linux and Windows failures on the
intermediate revision were caused by the wrapper forwarding bug, not by
additional lint findings in those lanes.
## Validation
- `bash -n tools/argument-comment-lint/run.sh`
- `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
- shell-level wrapper forwarding check for `-- --lib --bins`
- shell-level wrapper forwarding check for `-- --tests`
- `just argument-comment-lint`
- `cargo test` in `tools/argument-comment-lint`
- `cargo test -p codex-terminal-detection`
## Follow-up
- Clean up remaining Linux-only target-gated callsites, then switch the
Linux lint lane back to the plain wrapper invocation.
- Clean up remaining Windows-only target-gated callsites, then switch
the Windows lint lane back to the plain wrapper invocation.
- prefix realtime handoff output with the agent final message label for
both realtime v1 and v2
- update realtime websocket and core expectations to match
- close live realtime sessions on errors, ctrl-c, and active meter
removal
- centralize TUI realtime cleanup and avoid duplicate follow-up close
info
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
- route realtime startup, input, and transport failures through a single
shutdown path
- emit one realtime error/closed lifecycle while clearing session state
once
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
- thread the realtime version into conversation start and app-server
notifications
- keep playback-aware mic gating and playback interruption behavior on
v2 only, leaving v1 on the legacy path
## Stack Position
2/4. Built on top of #14828.
## Base
- #14828
## Unblocks
- #14829
- #14827
## Scope
- Port the realtime v2 wire parsing, session, app-server, and
conversation runtime behavior onto the split websocket-method base.
- Branch runtime behavior directly on the current realtime session kind
instead of parser-derived flow flags.
- Keep regression coverage in the existing e2e suites.
---------
Co-authored-by: Codex <noreply@openai.com>
- collect input/output transcript deltas into active handoff transcript
state
- attach and clear that transcript on each handoff, and regenerate
schema/tests
## What changed
- The realtime startup-context tests no longer assume the interesting
websocket payload is always `connection 1 / request 0`.
- Instead, they now wait for the first outbound websocket request that
actually carries `session.instructions`, regardless of which websocket
connection won the accept-order race on the runner.
- The env-key fallback test stays serialized because it mutates process
environment.
## Why this fixes the flake
- The old test synchronized on the mirrored `session.updated` client
event and then inspected a fixed websocket slot.
- On CI, the response websocket and the realtime websocket can race each
other during startup. When the response websocket wins that race, the
fixed slot can contain `response.create` instead of the
startup-context-bearing `session.update` request the test actually cares
about.
- That made the test fail nondeterministically by inspecting the wrong
request, or by timing out waiting on a secondary event even though the
real outbound request path was correct.
- Waiting directly on the first request whose payload includes
`session.instructions` removes both ordering assumptions and makes the
assertion line up with the actual contract under test.
- Separately, serializing the environment-mutating fallback case
prevents unrelated tests from seeing partially updated auth state.
## Scope
- Test-only change.
- add experimental_realtime_ws_startup_context to override or disable
realtime websocket startup context
- preserve generated startup context when unset and cover the new
override paths in tests
## Summary
- group recent work by git repo when available, otherwise by directory
- render recent work as bounded user asks with per-thread cwd context
- exclude hidden files and directories from workspace trees
## Summary
- Route delegated realtime handoff turns from all handoff message texts,
preserving order
- Fallback to input_transcript only when no messages are present
- Add regression coverage for multi-message handoff requests
- migrate the realtime websocket transport to the new session and
handoff flow
- make the realtime model configurable in config.toml and use API-key
auth for the websocket
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- record a realtime close developer message when a new realtime session
replaces an active one
- assert the replacement marker through the mocked responses request
path
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charles Cunningham <ccunningham@openai.com>
## Summary\n- add a websocket test-server request waiter so tests can
synchronize on recorded client messages\n- use that waiter in the
realtime delegation test instead of a fixed audio timeout\n- add
temporary timing logs in the test and websocket mock to inspect where
the flake stalls
## Why
`codex-core::all` has a flaky test,
`suite::realtime_conversation::conversation_start_audio_text_close_round_trip`,
that assumes a fixed ordering between `conversation.item.create` and
`response.input_audio.delta` requests.
That ordering is not guaranteed: realtime text and audio input are
forwarded through separate queues and a background task, so either
request can be observed first while still being correct behavior.
## What Changed
- Updated the assertion in
`codex-rs/core/tests/suite/realtime_conversation.rs` to compare the two
observed request types order-independently.
- Kept the existing checks that `session.create` is sent first and that
exactly two follow-up requests are recorded.
## Verification
- Re-ran `cargo test -p codex-core --test all
conversation_start_audio_text_close_round_trip` 10 times locally.
## Why
`codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
from `codex-protocol` and `codex-shell-command`. That made it easy for
workspace crates to import those APIs through `codex-core`, which in
turn hides dependency edges and makes it harder to reduce compile-time
coupling over time.
This change removes those public re-exports so call sites must import
from the source crates directly. Even when a crate still depends on
`codex-core` today, this makes dependency boundaries explicit and
unblocks future work to drop `codex-core` dependencies where possible.
## What Changed
- Removed public re-exports from `codex-rs/core/src/lib.rs` for:
- `codex_protocol::protocol` and related protocol/model types (including
`InitialHistory`)
- `codex_protocol::config_types` (`protocol_config_types`)
- `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
parse_command, powershell}`
- Migrated workspace Rust call sites to import directly from:
- `codex_protocol::protocol`
- `codex_protocol::config_types`
- `codex_protocol::models`
- `codex_shell_command`
- Added explicit `Cargo.toml` dependencies (`codex-protocol` /
`codex-shell-command`) in crates that now import those crates directly.
- Kept `codex-core` internal modules compiling by using `pub(crate)`
aliases in `core/src/lib.rs` (internal-only, not part of the public
API).
- Updated the two utility crates that can already drop a `codex-core`
dependency edge entirely:
- `codex-utils-approval-presets`
- `codex-utils-cli`
## Verification
- `cargo test -p codex-utils-approval-presets`
- `cargo test -p codex-utils-cli`
- `cargo check --workspace --all-targets`
- `just clippy`
- add top-level `experimental_realtime_ws_backend_prompt` config key
(experimental / do not use) and include it in config schema
- apply the override only to `Op::RealtimeConversation` websocket
`backend_prompt`, with config + realtime tests
- add top-level `experimental_realtime_ws_base_url` config key
(experimental / do not use) and include it in config schema
- apply the override only to `Op::RealtimeConversation` websocket
transport, with config + realtime tests
- Introduce `RealtimeConversationManager` for realtime API management
- Add `op::conversation` to start conversation, insert audio, insert
text, and close conversation.
- emit conversation lifecycle and realtime events.
- Move shared realtime payload types into codex-protocol and add core
e2e websocket tests for start/replace/transport-close paths.
Things to consider:
- Should we use the same `op::` and `Events` channel to carry audio? I
think we should try this simple approach and later we can create
separate one if the channels got congested.
- Sending text updates to the client: we can start simple and later
restrict that.
- Provider auth isn't wired for now intentionally