21 KiB
App-server v2 tracing design
This document proposes a simple, staged tracing design for
codex-rs/app-server with these goals:
- support distributed tracing from client-initiated app-server work into
app-server and
codex-core - keep tracing consistent across the app-server v2 surface area
- minimize tracing boilerplate in request handlers
- avoid introducing tracing-owned lifecycle state that duplicates existing app-server runtime state
This design explicitly avoids a RequestKind taxonomy and avoids
app-server-owned long-lived lifecycle span registries.
Summary
The design has four major pieces:
- A transport-level W3C trace carrier on inbound JSON-RPC request envelopes.
- A centralized app-server request tracing layer that wraps every inbound request in the same request span.
- An internal trace-context handoff through
codex_protocol::Submissionso work that continues incodex-coreinherits the inbound app-server request ancestry. - A core-owned long-lived turn span for turn-producing operations such as
turn/startandreview/start.
Every inbound JSON-RPC request gets a standardized request span.
When an app-server request submits work into core, the current span context is
captured into Submission.trace. Core then creates a short-lived dispatch span
parented from that carrier and, for turn-producing operations, creates a
long-lived turn span beneath it before continuing into its existing task and
model request tracing.
Important:
- request spans stay short-lived
- long-lived turn spans are owned by core, not app-server
- the design does not add app-server-owned long-lived thread or realtime spans
Design goals
-
Distributed tracing first
- Clients should be able to send trace context to app-server.
- App-server should preserve that trace ancestry across the async handoff into core.
- Existing core model request tracing should continue to inherit from the active core span once the handoff occurs.
-
Consistent request instrumentation
- Every inbound request should produce the same request span with the same base attributes.
- Request tracing should be wired at the transport boundary, not repeated in individual handlers.
-
Minimal boilerplate
- Request handlers should not manually parse carriers or build request spans.
- Existing calls to
thread.submit(...)and similar APIs should pick up trace propagation automatically.
-
Minimal business logic pollution
- W3C parsing, OTEL conversion, and span-parenting rules should live in tracing-specific modules.
- App-server business logic should stay focused on request handling, not span management.
-
Incremental rollout
- The first rollout should prove inbound request tracing and app-server -> core propagation.
- Once propagation is in place, core should add a long-lived turn span so a single span covers the actual duration of a turn.
- Thread and realtime lifecycle tracing should wait until there is a concrete need.
Non-goals
- This design does not attempt to make every loaded thread or realtime session correspond to a long-lived tracing span.
- This design does not add tracing-owned thread or realtime state stores in the initial design.
- This design does not require every app-server v2
*Paramstype to carry trace metadata. - This design does not require outbound JSON-RPC trace propagation in the initial rollout.
Why not RequestKind
An earlier direction considered a central RequestKind taxonomy such as
Unary, TurnLifecycle, or RealtimeLifecycle.
That is workable, but it makes tracing depend on a classification that can drift from runtime behavior. The simpler design instead treats tracing as two generic mechanics:
- every inbound request gets the same request span
- any async work that crosses from app-server into core gets the current span
context attached to
Submission
This keeps the initial implementation small and avoids turning tracing into a taxonomy maintenance problem.
Terminology
-
Request span
- A short-lived span for one inbound JSON-RPC request to app-server.
-
W3C trace context
- A serializable representation of distributed trace context based on
traceparentandtracestate.
- A serializable representation of distributed trace context based on
-
Submission trace handoff
- The optional serialized trace context attached to
codex_protocol::Submissionso core can restore parentage after the app-server request handler returns.
- The optional serialized trace context attached to
-
Dispatch span
- A short-lived core span created when the submission loop receives a
Submissionwith trace context.
- A short-lived core span created when the submission loop receives a
-
Turn span
- A long-lived core-owned span representing the actual runtime of a turn from turn start until completion, interruption, or failure.
High-level tracing model
1. Inbound request
For every inbound JSON-RPC request:
- parse an optional W3C trace carrier from the JSON-RPC envelope
- create a standardized request span
- parent that span from the incoming carrier when present
- process the request inside that span
This is true for every request, regardless of whether the API is unary or starts work that continues later.
2. Async handoff into core
Some app-server requests submit work that continues in core after the original
request returns. The critical example is turn/start, but the mechanism should
be generic.
To preserve trace ancestry:
- add an optional
W3cTraceContexttocodex_protocol::Submission - have
CodexThread::submit()capture the current span context into that field automatically - have
codex-corecreate a per-submission dispatch span parented from that carrier
This gives a clean causal chain:
- client span
- app-server request span
- core dispatch span
- core turn span for turn-producing operations
- existing core spans such as
run_turn, sampling, and model request spans
3. Core-owned turn spans
For turn-producing operations such as turn/start and review/start:
- app-server creates the inbound request span
- app-server propagates that request context through
Submission.trace - core creates a dispatch span when it receives the submission
- core then creates a long-lived turn span beneath that dispatch span
- existing core work such as
run_turnand model request tracing runs beneath the turn span
This keeps long-lived span ownership with the layer that actually owns turn execution and completion.
4. Defer thread and realtime lifecycle-heavy tracing
The design should not add:
- app-server-owned thread residency stores
- app-server-owned realtime session stores
App-server already maintains thread subscription and runtime state in existing structures. If later tracing work needs thread loaded-duration or realtime duration metrics, that data should extend those existing structures rather than introducing a parallel tracing-only state machine.
Span model by API shape
The initial implementation keeps the app-server side uniform.
Unary request/response APIs
Examples:
thread/listthread/readmodel/listconfig/readskills/listapp/list
Behavior:
- create request span
- return response
- no additional app-server span state
Turn-producing APIs
Examples:
turn/startreview/startthread/compact/startwhen it executes as a normal turn lifecycle
Behavior:
- create request span
- submit work under that request span
- capture the current span context into
Submission.trace - let core create a dispatch span and then a long-lived turn span
- let the turn span remain open until the real core turn lifecycle ends
Important: request spans should not stay open until eventual streamed completion. The request span ends quickly; the core-owned turn span carries the long-running work.
Other APIs that submit work into core
Examples:
thread/realtime/startthread/realtime/appendAudiothread/realtime/appendTextthread/realtime/stop
Behavior:
- create request span
- submit work under that request span
- capture the current span context into
Submission.trace - let core continue tracing from there
These APIs do not automatically imply a long-lived app-server or core lifecycle span in the initial design.
Thread lifecycle APIs
Examples:
thread/startthread/resumethread/forkthread/unsubscribe
Behavior in the initial design:
- create request span
- annotate with
thread.idwhen known - do not introduce separate app-server lifecycle spans or tracing-only state
If later work needs thread loaded/unloaded metrics, it should reuse the existing thread runtime state already maintained by app-server.
Where the code should live
codex-rs/protocol
Add a small shared W3cTraceContext type to
codex-rs/protocol/src/protocol.rs.
Responsibilities:
- define a serializable W3C trace context type
- avoid direct dependence on OTEL runtime types
- be usable from both protocol crates and runtime crates
Suggested contents:
W3cTraceContexttraceparent: Option<String>tracestate: Option<String>
Suggested Submission change:
Submission { id, op, trace: Option<W3cTraceContext> }
This is the only new internal async handoff needed for the initial rollout.
codex-rs/otel
Add a small helper module or extend existing tracing helpers so OTEL-specific logic stays centralized.
Responsibilities:
- convert
W3cTraceContext-> OTELContext - convert the current tracing span context ->
W3cTraceContext - parent a tracing span from an explicit carrier when present
- apply precedence rules:
- explicit carrier from app-server transport or
Submission.trace - fallback to env
TRACEPARENT/TRACESTATE - otherwise root span
- explicit carrier from app-server transport or
Important:
- keep this focused on carrier parsing and span parenting
- do not move app-server runtime state into
codex-otel - do not overload
OtelManagerwith app-server lifecycle ownership in the initial design
codex-rs/app-server-protocol
Extend inbound JSON-RPC request envelopes in
codex-rs/app-server-protocol/src/jsonrpc_lite.rs
with a dedicated optional trace carrier field.
Suggested shape:
JSONRPCRequest { id, method, params, trace }
Where:
trace: Option<W3cTraceContext>
Important:
- use a dedicated tracing field, not a generic
metabag - keep tracing transport-level and method-agnostic
- do not add trace fields to individual
*Paramsbusiness payloads
codex-rs/core
Make small changes in the submission path in
codex-rs/core/src/codex.rs.
Responsibilities:
- read
Submission.trace - create a per-submission dispatch span parented from that carrier
- run existing submission handling under that span
This is enough for existing core tracing to inherit the correct ancestry, and it is the right place to add the long-lived turn span required for turn lifecycles.
For turn-producing operations, core responsibilities should include:
- read
Submission.trace - create a per-submission dispatch span parented from that carrier
- create a long-lived turn span beneath the dispatch span when the operation actually starts a turn
- finish that turn span when the real core turn lifecycle completes, interrupts, or fails
codex-rs/app-server
Add a small dedicated tracing module rather than spreading request tracing logic across handlers. A likely shape is:
app_server_tracing/mod.rsapp_server_tracing/request_spans.rsapp_server_tracing/incoming.rs
Responsibilities:
- extract incoming W3C trace carriers from JSON-RPC requests
- build standardized request spans
- provide a small API that wraps request handling in the correct span
Non-responsibilities in the initial design:
- no thread residency registry
- no realtime session registry
Standardized request spans
Every inbound request should use the same request-span builder.
Suggested name:
app_server.request
Suggested attributes:
rpc.system = "jsonrpc"rpc.service = "codex-app-server"rpc.methodrpc.transportstdiowebsocket
rpc.request_idapp_server.connection_idapp_server.api_version = "v2"when applicableapp_server.client_namewhen known from initializeapp_server.client_versionwhen known
Optional useful attributes:
thread.idwhen already known from paramsturn.idwhen already known from params
Important:
- the span factory should be the only place that assembles these fields
- handlers should not manually construct request-span attributes
- for the
initializerequest itself, readclientInfo.nameandclientInfo.versiondirectly from the request params when present - for later requests on the same connection, read client metadata from
per-connection session state populated during
initialize
No app-server tracing registries
The design should not introduce app-server-owned tracing registries for turns, threads, or realtime sessions.
Why:
- app-server already has thread subscription and runtime state
- core already owns the real task and turn lifecycle
- a second tracing-specific state machine adds more code and more ways for lifecycle tracking to drift
Future guidance:
- if thread loaded/unloaded metrics become important, extend existing app-server thread state
- keep long-lived turn spans in core
- if realtime lifecycle metrics become important, extend the existing realtime runtime path rather than creating a parallel tracing store
No direct span construction in handlers
Request handlers should not call info_span!, trace_span!, set_parent, or
OTEL APIs directly for app-server request tracing.
Instead:
message_processorshould wrap inbound request handling through the centralized request-span helperCodexThread::submit()should capture the current span context intoSubmission.trace
That keeps request tracing transport-level and largely invisible to business handlers.
Layering
The intended call graph is:
message_processor->app_server_tracing- create and enter the standardized inbound request span
CodexThread::submit()->codex-oteltrace-context helper- snapshot the current span context into
Submission.trace
- snapshot the current span context into
codex-coresubmission loop ->codex-oteltrace-context helper- create a dispatch span parented from
Submission.trace - create a long-lived turn span for turn-producing operations
- create a dispatch span parented from
Important:
- app-server owns inbound request tracing
- core owns execution after the async handoff
- core owns long-lived turn spans
- the design does not add app-server-owned long-lived thread or realtime spans
Inbound flow in app-server
The inbound request path should work like this:
- Parse the JSON-RPC request envelope, including
trace. - Use the tracing module to create a request span.
- Process the request inside that span.
- If the request submits work into core, let
CodexThread::submit()capture the active span context intoSubmission.trace.
Integration point:
Core handoff flow
The turn/start and similar flows cross an async boundary:
- app-server handler submits work
- core submission loop receives
Submission - actual work continues later on different tasks
To preserve parentage:
- app-server request handling runs inside
app_server.request CodexThread::submit()captures that active context intoSubmission.trace- core submission loop creates a dispatch span parented from
Submission.trace - if the submission starts a turn, core creates a long-lived turn span beneath that dispatch span
- existing core spans naturally nest under the turn span
This lets:
- submission handling
- a single long-lived turn span for turn-producing APIs
run_turn- model client request tracing
inherit the app-server request trace without broad tracing changes across core.
Behavior for key v2 APIs
thread/start
- create request span
- annotate with
thread.idonce known - send response and
thread/started - no separate thread lifecycle span in the initial design
thread/resume
- create request span
- annotate with
thread.idwhen known - no separate lifecycle span
thread/fork
- create request span
- annotate with the new
thread.id - no separate lifecycle span
thread/unsubscribe
- create request span
- no separate unload span
- if later thread unload metrics are needed, reuse existing thread state rather than adding a tracing-only registry
turn/start
- create request span
- submit work into core under that request span
- propagate the active span context through
Submission.trace - let core create a dispatch span and then a long-lived turn span
- let that turn span cover the full duration until completion, interruption, or failure
turn/steer
- create request span
- if the request submits core work, propagate via
Submission.trace - otherwise request span only
turn/interrupt
- create request span
- request span only unless core submission is involved
review/start
- treat like
turn/start - let core create the same kind of long-lived turn span
thread/realtime/start, appendAudio, appendText, stop
- create request span
- if the API submits work into core, propagate via
Submission.trace - do not introduce separate realtime lifecycle spans in the initial design
Unary methods such as thread/list
- create request span only
Runtime checks
Keep runtime checks narrowly scoped in the initial rollout:
- warn when an inbound trace carrier is present but invalid
- test that
Submission.traceis set when work is submitted from a traced request
Do not add lifecycle consistency checks for tracing registries that do not exist yet.
Tests
Add tests for the initial mechanics:
- inbound request tracing accepts a valid W3C carrier
- invalid carriers are ignored cleanly
- unary methods create request spans without needing any extra handler changes
turn/startpropagates request ancestry throughSubmission.traceinto coreturn/startcreates a long-lived core-owned turn span- the turn span closes on completion, interruption, or failure
- existing core spans inherit from the propagated parent
The goal is to verify the centralized propagation behavior, not to exhaustively test OTEL internals.
Suggested PR sequence
PR 1: Foundation plus inbound request spans
Scope:
- Introduce a shared
W3cTraceContexttype incodex-protocol. - Add
traceto inbound JSON-RPC request envelopes in app-server protocol. - Add focused trace-context helpers in
codex-rs/otel. - Add the centralized app-server request tracing module.
- Wrap inbound request handling in
message_processor.rs.
Why this PR:
- proves the transport and request-span shape with minimal scope
- gives all inbound app-server APIs consistent request tracing immediately
- avoids mixing lifecycle questions into the initial plumbing review
PR 2: Async handoff into core via Submission
Scope:
- Add
tracetoSubmission. - Have
CodexThread::submit()capture the current span context automatically. - Have the core submission loop restore parentage with a dispatch span.
- Validate the flow with
turn/start.
Why this PR:
- validates the critical async handoff from app-server into core
- proves that existing core tracing can inherit the app-server request ancestry
- keeps the behavior change focused on one boundary
PR 3: Core-owned long-lived turn spans
Scope:
- Add a long-lived turn span in core for
turn/start. - Reuse the same turn-span pattern for
review/start. - Ensure the span closes on completion, interruption, or failure.
Why this PR:
- completes the minimum useful tracing story for turn lifecycles
- keeps long-lived span ownership in the layer that actually owns the turn
- still builds on the simpler propagation model from PR 2 instead of mixing everything into one change
PR 4: Optional follow-ups
Possible follow-ups:
- Reuse existing app-server thread state to add thread loaded/unloaded duration metrics if needed.
- Reuse existing realtime runtime state to add realtime duration metrics if needed.
- Add outbound JSON-RPC trace propagation only if there is a concrete client-side tracing use case.
Rollout guidance
Start with:
- inbound request spans for all app-server requests
turn/startrequest -> core propagation- a core-owned long-lived turn span for
turn/start
Those pieces exercise the important mechanics:
- inbound carrier extraction
- request span creation
- async handoff into core
- inherited core tracing beneath the propagated parent
- a single span covering the full duration of a turn
After that, only add more lifecycle-specific tracing if a real debugging or observability gap remains.
Bottom line
The recommended initial design is:
- trace context on inbound JSON-RPC request envelopes
- one standardized request span for every inbound request
- automatic propagation through
Submissioninto core - core-owned long-lived turn spans for turn-producing APIs
- OTEL conversion and carrier logic centralized in
codex-otel - no app-server-owned tracing registries for turns, threads, or realtime sessions in the initial implementation
This gives app-server distributed tracing that is:
- consistent
- low-boilerplate
- modular
- aligned with the existing ownership boundaries in app-server and core