chore(app-server): optimize thread_builder upsert_item

simplify
tracing design
2026-02-28 03:33:57 +00:00 · 2026-02-27 11:13:18 -08:00 · 2026-02-26 16:25:42 -08:00 · 2026-02-26 15:13:10 -08:00
2 changed files with 695 additions and 4 deletions
--- a/app_server_tracing_design.md
+++ b/app_server_tracing_design.md
@@ -0,0 +1,691 @@
+# App-server v2 tracing design
+
+This document proposes a simple, staged tracing design for
+`codex-rs/app-server` with these goals:
+
+- support distributed tracing that follows client-initiated work from the client
+  through app-server and into `codex-core`
+- keep tracing consistent across the app-server v2 surface area
+- minimize tracing boilerplate in request handlers
+- avoid introducing tracing-owned lifecycle state that duplicates existing
+  app-server runtime state
+
+This design explicitly avoids a `RequestKind` taxonomy and avoids
+app-server-owned long-lived lifecycle span registries.
+
+## Summary
+
+The design has four major pieces:
+
+1. A transport-level W3C trace carrier on inbound JSON-RPC request envelopes.
+2. A centralized app-server request tracing layer that wraps every inbound
+   request in the same request span.
+3. An internal trace-context handoff through `codex_protocol::Submission` so
+   work that continues in `codex-core` inherits the inbound app-server request
+   ancestry.
+4. A core-owned long-lived turn span for turn-producing operations such as
+   `turn/start` and `review/start`.
+
+Every inbound JSON-RPC request gets a standardized request span.
+
+When an app-server request submits work into core, the current span context is
+captured into `Submission.trace`. Core then creates a short-lived dispatch span
+parented from that carrier and, for turn-producing operations, creates a
+long-lived turn span beneath it before continuing into its existing task and
+model request tracing.
+
+Important:
+
+- request spans stay short-lived
+- long-lived turn spans are owned by core, not app-server
+- the design does not add app-server-owned long-lived thread or realtime spans
+
+## Design goals
+
+- **Distributed tracing first**
+  - Clients should be able to send trace context to app-server.
+  - App-server should preserve that trace ancestry across the async handoff into
+    core.
+  - Existing core model request tracing should continue to inherit from the
+    active core span once the handoff occurs.
+
+- **Consistent request instrumentation**
+  - Every inbound request should produce the same request span with the same
+    base attributes.
+  - Request tracing should be wired at the transport boundary, not repeated in
+    individual handlers.
+
+- **Minimal boilerplate**
+  - Request handlers should not manually parse carriers or build request spans.
+  - Existing calls to `thread.submit(...)` and similar APIs should pick up trace
+    propagation automatically.
+
+- **Minimal business logic pollution**
+  - W3C parsing, OTEL conversion, and span-parenting rules should live in
+    tracing-specific modules.
+  - App-server business logic should stay focused on request handling, not span
+    management.
+
+- **Incremental rollout**
+  - The first rollout should prove inbound request tracing and app-server ->
+    core propagation.
+  - Once propagation is in place, core should add a long-lived turn span so a
+    single span covers the actual duration of a turn.
+  - Thread and realtime lifecycle tracing should wait until there is a concrete
+    need.
+
+## Non-goals
+
+- This design does not attempt to make every loaded thread or realtime session
+  correspond to a long-lived tracing span.
+- This design does not add tracing-owned thread or realtime state stores in the
+  initial design.
+- This design does not require every app-server v2 `*Params` type to carry
+  trace metadata.
+- This design does not require outbound JSON-RPC trace propagation in the
+  initial rollout.
+
+## Why not `RequestKind`
+
+An earlier direction considered a central `RequestKind` taxonomy such as
+`Unary`, `TurnLifecycle`, or `RealtimeLifecycle`.
+
+That is workable, but it makes tracing depend on a classification that can
+drift from runtime behavior. The simpler design instead treats tracing as two
+generic mechanics:
+
+- every inbound request gets the same request span
+- any async work that crosses from app-server into core gets the current span
+  context attached to `Submission`
+
+This keeps the initial implementation small and avoids turning tracing into a
+taxonomy maintenance problem.
+
+## Terminology
+
+- **Request span**
+  - A short-lived span for one inbound JSON-RPC request to app-server.
+
+- **W3C trace context**
+  - A serializable representation of distributed trace context based on
+    `traceparent` and `tracestate`.
+
+- **Submission trace handoff**
+  - The optional serialized trace context attached to
+    `codex_protocol::Submission` so core can restore parentage after the
+    app-server request handler returns.
+
+- **Dispatch span**
+  - A short-lived core span created when the submission loop receives a
+    `Submission` with trace context.
+
+- **Turn span**
+  - A long-lived core-owned span representing the actual runtime of a turn from
+    turn start until completion, interruption, or failure.
+
+## High-level tracing model
+
+### 1. Inbound request
+
+For every inbound JSON-RPC request:
+
+1. parse an optional W3C trace carrier from the JSON-RPC envelope
+2. create a standardized request span
+3. parent that span from the incoming carrier when present
+4. process the request inside that span
+
+This is true for every request, regardless of whether the API is unary or
+starts work that continues later.
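Step 1 (parsing an optional W3C carrier) is where malformed input has to be tolerated. The following is a minimal sketch that validates a `traceparent` value against the W3C Trace Context layout (`version-traceid-parentid-flags`). `TraceParent` and `parse_traceparent` are hypothetical names. The sketch only accepts version `00`, which is stricter than the spec's forward-compatibility rules.

```rust
/// Parsed W3C `traceparent` (version `00` only in this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: String,  // 32 lowercase hex chars, not all zeros
    pub parent_id: String, // 16 lowercase hex chars, not all zeros
    pub sampled: bool,     // bit 0 of trace-flags
}

fn is_lower_hex(field: &str, len: usize) -> bool {
    field.len() == len && field.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

fn is_all_zeros(field: &str) -> bool {
    field.bytes().all(|b| b == b'0')
}

/// Returns `None` for any malformed carrier so the caller can log a warning
/// and fall back to an unparented request span instead of failing the request.
pub fn parse_traceparent(value: &str) -> Option<TraceParent> {
    let fields: Vec<&str> = value.split('-').collect();
    // Version 00 has exactly four dash-separated fields.
    let [version, trace_id, parent_id, flags] = fields.as_slice() else {
        return None;
    };
    if *version != "00"
        || !is_lower_hex(trace_id, 32)
        || !is_lower_hex(parent_id, 16)
        || !is_lower_hex(flags, 2)
        || is_all_zeros(trace_id)
        || is_all_zeros(parent_id)
    {
        return None;
    }
    let flag_bits = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: (flag_bits & 0x01) != 0,
    })
}
```

Returning `Option` rather than an error keeps the request path infallible. A present but invalid carrier simply produces a request span with no remote parent.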
+
+### 2. Async handoff into core
+
+Some app-server requests submit work that continues in core after the original
+request returns. The critical example is `turn/start`, but the mechanism should
+be generic.
+
+To preserve trace ancestry:
+
+- add an optional `W3cTraceContext` to `codex_protocol::Submission`
+- have `CodexThread::submit()` capture the current span context into that field
+  automatically
+- have `codex-core` create a per-submission dispatch span parented from that
+  carrier
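The automatic capture in `CodexThread::submit()` can be illustrated without any tracing crates. This is a sketch only. A thread-local `ActiveSpan` stands in for the current tracing span's OpenTelemetry context, and `in_span`, `current_trace_context`, and the free function `submit` are hypothetical. The real snapshot would presumably come from the active `tracing` span through the OpenTelemetry integration and a W3C `TraceContextPropagator`.

```rust
use std::cell::Cell;

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct W3cTraceContext {
    pub traceparent: Option<String>,
    pub tracestate: Option<String>,
}

#[derive(Debug, Clone)]
pub struct Submission {
    pub id: String,
    pub op: String, // simplified stand-in for the real `Op` enum
    pub trace: Option<W3cTraceContext>,
}

/// Hypothetical stand-in for the active span's OpenTelemetry context.
#[derive(Debug, Clone, Copy)]
pub struct ActiveSpan {
    pub trace_id: u128,
    pub span_id: u64,
    pub sampled: bool,
}

thread_local! {
    static ACTIVE_SPAN: Cell<Option<ActiveSpan>> = Cell::new(None);
}

/// Runs `f` with `span` marked active and restores the previous value when `f` returns.
pub fn in_span<R>(span: ActiveSpan, f: impl FnOnce() -> R) -> R {
    let previous = ACTIVE_SPAN.with(|active| active.replace(Some(span)));
    let result = f();
    ACTIVE_SPAN.with(|active| active.set(previous));
    result
}

/// Snapshot of the active span as a W3C carrier, or `None` outside any span.
pub fn current_trace_context() -> Option<W3cTraceContext> {
    ACTIVE_SPAN.with(|active| active.get()).map(|span| W3cTraceContext {
        traceparent: Some(format!(
            "00-{:032x}-{:016x}-{:02x}",
            span.trace_id, span.span_id, span.sampled as u8
        )),
        tracestate: None,
    })
}

/// Hypothetical `submit`: callers supply only the operation; the trace
/// handoff is filled in automatically from whatever span is active.
pub fn submit(id: &str, op: &str) -> Submission {
    Submission {
        id: id.to_string(),
        op: op.to_string(),
        trace: current_trace_context(),
    }
}
```

Handlers never see the `trace` field. Submitting inside a request span yields a populated carrier, and submitting outside any span yields `None`.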
+
+This gives a clean causal chain:
+
+- client span
+- app-server request span
+- core dispatch span
+- core turn span for turn-producing operations
+- existing core spans such as `run_turn`, sampling, and model request spans
+
+### 3. Core-owned turn spans
+
+For turn-producing operations such as `turn/start` and `review/start`:
+
+- app-server creates the inbound request span
+- app-server propagates that request context through `Submission.trace`
+- core creates a dispatch span when it receives the submission
+- core then creates a long-lived turn span beneath that dispatch span
+- existing core work such as `run_turn` and model request tracing runs beneath
+  the turn span
+
+This keeps long-lived span ownership with the layer that actually owns turn
+execution and completion.
+
+### 4. Defer thread and realtime lifecycle-heavy tracing
+
+The design should not add:
+
+- app-server-owned thread residency stores
+- app-server-owned realtime session stores
+
+App-server already maintains thread subscription and runtime state in existing
+structures. If later tracing work needs thread loaded-duration or realtime
+duration metrics, that data should extend those existing structures rather than
+introducing a parallel tracing-only state machine.
+
+## Span model by API shape
+
+The initial implementation keeps the app-server side uniform.
+
+### Unary request/response APIs
+
+Examples:
+
+- `thread/list`
+- `thread/read`
+- `model/list`
+- `config/read`
+- `skills/list`
+- `app/list`
+
+Behavior:
+
+- create request span
+- return response
+- no additional app-server span state
+
+### Turn-producing APIs
+
+Examples:
+
+- `turn/start`
+- `review/start`
+- `thread/compact/start` when it executes as a normal turn lifecycle
+
+Behavior:
+
+- create request span
+- submit work under that request span
+- capture the current span context into `Submission.trace`
+- let core create a dispatch span and then a long-lived turn span
+- let the turn span remain open until the real core turn lifecycle ends
+
+Important: request spans should not stay open until eventual streamed
+completion. The request span ends quickly; the core-owned turn span carries the
+long-running work.
+
+### Other APIs that submit work into core
+
+Examples:
+
+- `thread/realtime/start`
+- `thread/realtime/appendAudio`
+- `thread/realtime/appendText`
+- `thread/realtime/stop`
+
+Behavior:
+
+- create request span
+- submit work under that request span
+- capture the current span context into `Submission.trace`
+- let core continue tracing from there
+
+These APIs do not automatically imply a long-lived app-server or core lifecycle
+span in the initial design.
+
+### Thread lifecycle APIs
+
+Examples:
+
+- `thread/start`
+- `thread/resume`
+- `thread/fork`
+- `thread/unsubscribe`
+
+Behavior in the initial design:
+
+- create request span
+- annotate with `thread.id` when known
+- do not introduce separate app-server lifecycle spans or tracing-only state
+
+If later work needs thread loaded/unloaded metrics, it should reuse the existing
+thread runtime state already maintained by app-server.
+
+## Where the code should live
+
+### `codex-rs/protocol`
+
+Add a small shared `W3cTraceContext` type to
+[`codex-rs/protocol/src/protocol.rs`](codex-rs/protocol/src/protocol.rs).
+
+Responsibilities:
+
+- define a serializable W3C trace context type
+- avoid direct dependence on OTEL runtime types
+- be usable from both protocol crates and runtime crates
+
+Suggested contents:
+
+- `W3cTraceContext`
+  - `traceparent: Option<String>`
+  - `tracestate: Option<String>`
+
+Suggested `Submission` change:
+
+- `Submission { id, op, trace: Option<W3cTraceContext> }`
+
+This is the only new internal async handoff needed for the initial rollout.
+
+### `codex-rs/otel`
+
+Add a small helper module or extend existing tracing helpers so OTEL-specific
+logic stays centralized.
+
+Responsibilities:
+
+- convert `W3cTraceContext` -> OTEL `Context`
+- convert the current tracing span context -> `W3cTraceContext`
+- parent a tracing span from an explicit carrier when present
+- apply precedence rules:
+  - explicit carrier from app-server transport or `Submission.trace`
+  - fallback to env `TRACEPARENT` / `TRACESTATE`
+  - otherwise root span
+
+Important:
+
+- keep this focused on carrier parsing and span parenting
+- do not move app-server runtime state into `codex-otel`
+- do not overload `OtelManager` with app-server lifecycle ownership in the
+  initial design
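The precedence rules listed above reduce to a small pure function. In this minimal sketch, `ParentSource` and `resolve_parent` are hypothetical names. It also assumes that a carrier without a `traceparent` counts as absent. The environment lookup is injected so production code can pass `|key| std::env::var(key).ok()` while tests pass a stub.

```rust
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct W3cTraceContext {
    pub traceparent: Option<String>,
    pub tracestate: Option<String>,
}

/// Where the next span's parent came from.
#[derive(Debug, PartialEq, Eq)]
pub enum ParentSource {
    Explicit(W3cTraceContext),
    Env(W3cTraceContext),
    Root,
}

/// Hypothetical helper applying the precedence rules.
pub fn resolve_parent(
    explicit: Option<&W3cTraceContext>,
    env: impl Fn(&str) -> Option<String>,
) -> ParentSource {
    // 1. An explicit carrier (transport `trace` or `Submission.trace`) wins,
    //    as long as it actually carries a `traceparent`.
    if let Some(ctx) = explicit.filter(|ctx| ctx.traceparent.is_some()) {
        return ParentSource::Explicit(ctx.clone());
    }
    // 2. Otherwise fall back to TRACEPARENT / TRACESTATE from the environment.
    if let Some(traceparent) = env("TRACEPARENT") {
        return ParentSource::Env(W3cTraceContext {
            traceparent: Some(traceparent),
            tracestate: env("TRACESTATE"),
        });
    }
    // 3. With no carrier at all, the span becomes a new root.
    ParentSource::Root
}
```

Keeping this decision in `codex-otel` means app-server and core apply identical fallback behavior.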
+
+### `codex-rs/app-server-protocol`
+
+Extend inbound JSON-RPC request envelopes in
+[`codex-rs/app-server-protocol/src/jsonrpc_lite.rs`](codex-rs/app-server-protocol/src/jsonrpc_lite.rs)
+with a dedicated optional trace carrier field.
+
+Suggested shape:
+
+- `JSONRPCRequest { id, method, params, trace }`
+
+Where:
+
+- `trace: Option<W3cTraceContext>`
+
+Important:
+
+- use a dedicated tracing field, not a generic `meta` bag
+- keep tracing transport-level and method-agnostic
+- do not add trace fields to individual `*Params` business payloads
+
+### `codex-rs/core`
+
+Make small changes in the submission path in
+[`codex-rs/core/src/codex.rs`](codex-rs/core/src/codex.rs).
+
+Responsibilities:
+
+- read `Submission.trace`
+- create a per-submission dispatch span parented from that carrier
+- run existing submission handling under that span
+
+This is enough for existing core tracing to inherit the correct ancestry, and
+it is the right place to add the long-lived turn span required for turn
+lifecycles.
+
+For turn-producing operations, core should additionally:
+
+- create a long-lived turn span beneath the dispatch span when the operation
+  actually starts a turn
+- finish that turn span when the turn completes, is interrupted, or fails
+
+### `codex-rs/app-server`
+
+Add a small dedicated tracing module rather than spreading request tracing logic
+across handlers. A likely shape is:
+
+- `app_server_tracing/mod.rs`
+- `app_server_tracing/request_spans.rs`
+- `app_server_tracing/incoming.rs`
+
+Responsibilities:
+
+- extract incoming W3C trace carriers from JSON-RPC requests
+- build standardized request spans
+- provide a small API that wraps request handling in the correct span
+
+Non-responsibilities in the initial design:
+
+- no thread residency registry
+- no realtime session registry
+
+## Standardized request spans
+
+Every inbound request should use the same request-span builder.
+
+Suggested name:
+
+- `app_server.request`
+
+Suggested attributes:
+
+- `rpc.system = "jsonrpc"`
+- `rpc.service = "codex-app-server"`
+- `rpc.method`
+- `rpc.transport`
+  - `stdio`
+  - `websocket`
+- `rpc.request_id`
+- `app_server.connection_id`
+- `app_server.api_version = "v2"` when applicable
+- `app_server.client_name` when known from initialize
+- `app_server.client_version` when known
+
+Optional useful attributes:
+
+- `thread.id` when already known from params
+- `turn.id` when already known from params
+
+Important:
+
+- the span factory should be the only place that assembles these fields
+- handlers should not manually construct request-span attributes
+- for the `initialize` request itself, read `clientInfo.name` and
+  `clientInfo.version` directly from the request params when present
+- for later requests on the same connection, read client metadata from
+  per-connection session state populated during `initialize`
+
+## No app-server tracing registries
+
+The design should not introduce app-server-owned tracing registries for turns,
+threads, or realtime sessions.
+
+Why:
+
+- app-server already has thread subscription and runtime state
+- core already owns the real task and turn lifecycle
+- a second tracing-specific state machine adds more code and more ways for
+  lifecycle tracking to drift
+
+Future guidance:
+
+- if thread loaded/unloaded metrics become important, extend existing app-server
+  thread state
+- keep long-lived turn spans in core
+- if realtime lifecycle metrics become important, extend the existing realtime
+  runtime path rather than creating a parallel tracing store
+
+## No direct span construction in handlers
+
+Request handlers should not call `info_span!`, `trace_span!`, `set_parent`, or
+OTEL APIs directly for app-server request tracing.
+
+Instead:
+
+- `message_processor` should wrap inbound request handling through the
+  centralized request-span helper
+- `CodexThread::submit()` should capture the current span context into
+  `Submission.trace`
+
+That keeps request tracing transport-level and largely invisible to business
+handlers.
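The wrapper that `message_processor` calls can be sketched with a thread-local span stack and a drop guard. This is a minimal, std-only sketch in which `EnteredSpan`, `current_span`, and `handle_in_request_span` are hypothetical. With the `tracing` crate, the same role is usually played by `Span::in_scope`, or by `Instrument::instrument` for async handlers, because holding a `Span::enter` guard across an `.await` can attribute unrelated work to the span.

```rust
use std::cell::RefCell;

thread_local! {
    // Stand-in for the subscriber's stack of entered spans: (span name, rpc.method).
    static SPAN_STACK: RefCell<Vec<(&'static str, String)>> = RefCell::new(Vec::new());
}

/// Guard that keeps a span entered until it is dropped, including on early
/// return or panic unwinding.
struct EnteredSpan;

impl EnteredSpan {
    fn enter(name: &'static str, method: &str) -> Self {
        SPAN_STACK.with(|stack| stack.borrow_mut().push((name, method.to_string())));
        EnteredSpan
    }
}

impl Drop for EnteredSpan {
    fn drop(&mut self) {
        SPAN_STACK.with(|stack| {
            stack.borrow_mut().pop();
        });
    }
}

/// Innermost entered span, if any.
pub fn current_span() -> Option<(&'static str, String)> {
    SPAN_STACK.with(|stack| stack.borrow().last().cloned())
}

/// Hypothetical helper called by `message_processor` for every inbound request.
/// Handlers are plain closures and never touch span APIs themselves.
pub fn handle_in_request_span<R>(method: &str, handler: impl FnOnce() -> R) -> R {
    let _span = EnteredSpan::enter("app_server.request", method);
    handler()
}
```

Because the handler body runs inside the guard's lifetime, anything it calls (including `submit`) observes `app_server.request` as the current span, and the span is exited as soon as the handler returns.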
+
+## Layering
+
+The intended call graph is:
+
+- `message_processor` -> `app_server_tracing`
+  - create and enter the standardized inbound request span
+- `CodexThread::submit()` -> `codex-otel` trace-context helper
+  - snapshot the current span context into `Submission.trace`
+- `codex-core` submission loop -> `codex-otel` trace-context helper
+  - create a dispatch span parented from `Submission.trace`
+  - create a long-lived turn span for turn-producing operations
+
+Important:
+
+- app-server owns inbound request tracing
+- core owns execution after the async handoff
+- core owns long-lived turn spans
+- the design does not add app-server-owned long-lived thread or realtime spans
+
+## Inbound flow in app-server
+
+The inbound request path should work like this:
+
+1. Parse the JSON-RPC request envelope, including `trace`.
+2. Use the tracing module to create a request span.
+3. Process the request inside that span.
+4. If the request submits work into core, let `CodexThread::submit()` capture
+   the active span context into `Submission.trace`.
+
+Integration point:
+
+- [`codex-rs/app-server/src/message_processor.rs`](codex-rs/app-server/src/message_processor.rs)
+
+## Core handoff flow
+
+The `turn/start` and similar flows cross an async boundary:
+
+- app-server handler submits work
+- core submission loop receives `Submission`
+- actual work continues later on different tasks
+
+To preserve parentage:
+
+1. app-server request handling runs inside `app_server.request`
+2. `CodexThread::submit()` captures that active context into `Submission.trace`
+3. core submission loop creates a dispatch span parented from `Submission.trace`
+4. if the submission starts a turn, core creates a long-lived turn span beneath
+   that dispatch span
+5. existing core spans naturally nest under the turn span
+
+This lets:
+
+- submission handling
+- a single long-lived turn span for turn-producing APIs
+- `run_turn`
+- model client request tracing
+
+inherit the app-server request trace without broad tracing changes across core.
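The parentage that steps 1 through 5 produce can be checked with a toy span recorder. This is a sketch only. `Recorder`, `SpanId`, and `handle_submission` are hypothetical, and the span names `dispatch` and `turn` are placeholders, since this design does not name them. Only `app_server.request` and `run_turn` come from the text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpanId(u64);

#[derive(Debug)]
pub struct SpanRecord {
    pub id: SpanId,
    pub parent: Option<SpanId>,
    pub name: &'static str,
}

/// Hypothetical in-memory recorder standing in for an OTEL exporter.
#[derive(Debug, Default)]
pub struct Recorder {
    spans: Vec<SpanRecord>,
}

impl Recorder {
    pub fn start(&mut self, name: &'static str, parent: Option<SpanId>) -> SpanId {
        let id = SpanId(self.spans.len() as u64 + 1);
        self.spans.push(SpanRecord { id, parent, name });
        id
    }

    /// Span names from `id` up to the root, following parent links.
    pub fn ancestry(&self, id: SpanId) -> Vec<&'static str> {
        let mut names = Vec::new();
        let mut cursor = Some(id);
        while let Some(current) = cursor {
            let record = self
                .spans
                .iter()
                .find(|span| span.id == current)
                .expect("span ids come from this recorder");
            names.push(record.name);
            cursor = record.parent;
        }
        names
    }
}

/// Core side of the handoff. `parent` is whatever `Submission.trace` resolved to.
pub fn handle_submission(rec: &mut Recorder, parent: Option<SpanId>, starts_turn: bool) -> SpanId {
    let dispatch = rec.start("dispatch", parent);
    if !starts_turn {
        return dispatch;
    }
    let turn = rec.start("turn", Some(dispatch));
    // Existing core spans such as `run_turn` are created beneath the turn span.
    rec.start("run_turn", Some(turn))
}
```

A submission that starts a turn yields the chain `run_turn` → `turn` → `dispatch` → `app_server.request`. Other submissions stop at the dispatch span.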
+
+## Behavior for key v2 APIs
+
+### `thread/start`
+
+- create request span
+- annotate with `thread.id` once known
+- send response and `thread/started`
+- no separate thread lifecycle span in the initial design
+
+### `thread/resume`
+
+- create request span
+- annotate with `thread.id` when known
+- no separate lifecycle span
+
+### `thread/fork`
+
+- create request span
+- annotate with the new `thread.id`
+- no separate lifecycle span
+
+### `thread/unsubscribe`
+
+- create request span
+- no separate unload span
+- if later thread unload metrics are needed, reuse existing thread state rather
+  than adding a tracing-only registry
+
+### `turn/start`
+
+- create request span
+- submit work into core under that request span
+- propagate the active span context through `Submission.trace`
+- let core create a dispatch span and then a long-lived turn span
+- let that turn span cover the full duration until completion, interruption, or
+  failure
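One way to guarantee that the long-lived turn span always closes is to tie it to an owned value whose `Drop` records the end. In this minimal sketch, `TurnSpan`, `TurnOutcome`, and the `Exported` collector are hypothetical. It also assumes a span that is dropped without an explicit outcome (for example on an early return or task abort) should be reported as a failure rather than left open.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnOutcome {
    Completed,
    Interrupted,
    Failed,
}

/// Stand-in for an exporter: collects (turn id, outcome) for every closed span.
pub type Exported = Arc<Mutex<Vec<(String, TurnOutcome)>>>;

/// Hypothetical long-lived, core-owned turn span.
pub struct TurnSpan {
    turn_id: String,
    exported: Exported,
    outcome: Option<TurnOutcome>,
}

impl TurnSpan {
    pub fn open(turn_id: &str, exported: &Exported) -> Self {
        TurnSpan { turn_id: turn_id.to_string(), exported: Arc::clone(exported), outcome: None }
    }

    /// Records how the turn ended; the span closes when `self` is dropped here.
    pub fn finish(mut self, outcome: TurnOutcome) {
        self.outcome = Some(outcome);
    }
}

impl Drop for TurnSpan {
    fn drop(&mut self) {
        // A span dropped without an explicit outcome is reported as failed,
        // so it can never stay open silently.
        let outcome = self.outcome.unwrap_or(TurnOutcome::Failed);
        if let Ok(mut spans) = self.exported.lock() {
            spans.push((self.turn_id.clone(), outcome));
        }
    }
}
```

Because the span is owned by the core turn task rather than by the request handler, it outlives the `app_server.request` span without any app-server registry.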
+
+### `turn/steer`
+
+- create request span
+- if the request submits core work, propagate via `Submission.trace`
+- otherwise request span only
+
+### `turn/interrupt`
+
+- create request span
+- request span only unless core submission is involved
+
+### `review/start`
+
+- treat like `turn/start`
+- let core create the same kind of long-lived turn span
+
+### `thread/realtime/start`, `appendAudio`, `appendText`, `stop`
+
+- create request span
+- if the API submits work into core, propagate via `Submission.trace`
+- do not introduce separate realtime lifecycle spans in the initial design
+
+### Unary methods such as `thread/list`
+
+- create request span only
+
+## Runtime checks
+
+Keep runtime checks narrowly scoped in the initial rollout:
+
+- warn when an inbound trace carrier is present but invalid
+- test that `Submission.trace` is set when work is submitted from a traced
+  request
+
+Do not add lifecycle consistency checks for tracing registries that do not
+exist yet.
+
+## Tests
+
+Add tests for the initial mechanics:
+
+- inbound request tracing accepts a valid W3C carrier
+- invalid carriers are ignored cleanly
+- unary methods create request spans without needing any extra handler changes
+- `turn/start` propagates request ancestry through `Submission.trace` into core
+- `turn/start` creates a long-lived core-owned turn span
+- the turn span closes on completion, interruption, or failure
+- existing core spans inherit from the propagated parent
+
+The goal is to verify the centralized propagation behavior, not to exhaustively
+test OTEL internals.
+
+## Suggested PR sequence
+
+### PR 1: Foundation plus inbound request spans
+
+Scope:
+
+1. Introduce a shared `W3cTraceContext` type in `codex-protocol`.
+2. Add `trace` to inbound JSON-RPC request envelopes in app-server protocol.
+3. Add focused trace-context helpers in `codex-rs/otel`.
+4. Add the centralized app-server request tracing module.
+5. Wrap inbound request handling in `message_processor.rs`.
+
+Why this PR:
+
+- proves the transport and request-span shape with minimal scope
+- gives all inbound app-server APIs consistent request tracing immediately
+- avoids mixing lifecycle questions into the initial plumbing review
+
+### PR 2: Async handoff into core via `Submission`
+
+Scope:
+
+1. Add `trace` to `Submission`.
+2. Have `CodexThread::submit()` capture the current span context automatically.
+3. Have the core submission loop restore parentage with a dispatch span.
+4. Validate the flow with `turn/start`.
+
+Why this PR:
+
+- validates the critical async handoff from app-server into core
+- proves that existing core tracing can inherit the app-server request ancestry
+- keeps the behavior change focused on one boundary
+
+### PR 3: Core-owned long-lived turn spans
+
+Scope:
+
+1. Add a long-lived turn span in core for `turn/start`.
+2. Reuse the same turn-span pattern for `review/start`.
+3. Ensure the span closes on completion, interruption, or failure.
+
+Why this PR:
+
+- completes the minimum useful tracing story for turn lifecycles
+- keeps long-lived span ownership in the layer that actually owns the turn
+- still builds on the simpler propagation model from PR 2 instead of mixing
+  everything into one change
+
+### PR 4: Optional follow-ups
+
+Possible follow-ups:
+
+1. Reuse existing app-server thread state to add thread loaded/unloaded duration
+   metrics if needed.
+2. Reuse existing realtime runtime state to add realtime duration metrics if
+   needed.
+3. Add outbound JSON-RPC trace propagation only if there is a concrete
+   client-side tracing use case.
+
+## Rollout guidance
+
+Start with:
+
+- inbound request spans for all app-server requests
+- `turn/start` request -> core propagation
+- a core-owned long-lived turn span for `turn/start`
+
+Those pieces exercise the important mechanics:
+
+- inbound carrier extraction
+- request span creation
+- async handoff into core
+- inherited core tracing beneath the propagated parent
+- a single span covering the full duration of a turn
+
+After that, only add more lifecycle-specific tracing if a real debugging or
+observability gap remains.
+
+## Bottom line
+
+The recommended initial design is:
+
+- trace context on inbound JSON-RPC request envelopes
+- one standardized request span for every inbound request
+- automatic propagation through `Submission` into core
+- core-owned long-lived turn spans for turn-producing APIs
+- OTEL conversion and carrier logic centralized in `codex-otel`
+- no app-server-owned tracing registries for turns, threads, or realtime
+  sessions in the initial implementation
+
+This gives app-server distributed tracing that is:
+
+- consistent
+- low-boilerplate
+- modular
+- aligned with the existing ownership boundaries in app-server and core
--- a/codex-rs/app-server-protocol/src/protocol/thread_history.rs
+++ b/codex-rs/app-server-protocol/src/protocol/thread_history.rs
@@ -1010,11 +1010,11 @@ fn format_file_change_diff(change: &codex_protocol::protocol::FileChange) -> Str
 }

 fn upsert_turn_item(items: &mut Vec<ThreadItem>, item: ThreadItem) {
-    if let Some(existing_item) = items
-        .iter_mut()
-        .find(|existing_item| existing_item.id() == item.id())
+    if let Some(index) = items
+        .iter()
+        .rposition(|existing_item| existing_item.id() == item.id())
    {
-        *existing_item = item;
+        items[index] = item;
        return;
    }
    items.push(item);
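The change above replaces a forward `find` over `iter_mut()` with a backward `rposition` followed by an index assignment. A plausible rationale is that items being upserted are usually the most recently appended ones, so searching from the end finds them in fewer comparisons. The standalone sketch below uses a hypothetical `Item` type in place of `ThreadItem`. One behavioral nuance is worth noting: if two items ever share an id, the new code replaces the last match, whereas the old code replaced the first.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: String,
    pub text: String,
}

/// Replace the most recent item with the same id, or append a new one.
pub fn upsert_item(items: &mut Vec<Item>, item: Item) {
    // Scan from the back, where in-progress items are most likely to be.
    if let Some(index) = items.iter().rposition(|existing| existing.id == item.id) {
        items[index] = item;
        return;
    }
    items.push(item);
}
```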
Author	SHA1	Message	Date
Owen Lin	4a25699826	chore(app-server): optimize thread_builder upsert_item	2026-02-27 11:13:18 -08:00
Owen Lin	212e014f6b	simplify	2026-02-26 16:25:42 -08:00
Owen Lin	7405197511	tracing design	2026-02-26 15:13:10 -08:00