mirror of https://github.com/openai/codex.git synced 2026-04-24 22:54:54 +00:00

Owen Lin 212e014f6b simplify

2026-02-26 16:25:42 -08:00

App-server v2 tracing design

This document proposes a simple, staged tracing design for codex-rs/app-server with these goals:

support distributed tracing from client-initiated app-server work into app-server and codex-core
keep tracing consistent across the app-server v2 surface area
minimize tracing boilerplate in request handlers
avoid introducing tracing-owned lifecycle state that duplicates existing app-server runtime state

This design explicitly avoids a RequestKind taxonomy and avoids app-server-owned long-lived lifecycle span registries.

Summary

The design has four major pieces:

A transport-level W3C trace carrier on inbound JSON-RPC request envelopes.
A centralized app-server request tracing layer that wraps every inbound request in the same request span.
An internal trace-context handoff through codex_protocol::Submission so work that continues in codex-core inherits the inbound app-server request ancestry.
A core-owned long-lived turn span for turn-producing operations such as turn/start and review/start.

Every inbound JSON-RPC request gets a standardized request span.

When an app-server request submits work into core, the current span context is captured into Submission.trace. Core then creates a short-lived dispatch span parented from that carrier and, for turn-producing operations, creates a long-lived turn span beneath it before continuing into its existing task and model request tracing.

Important:

request spans stay short-lived
long-lived turn spans are owned by core, not app-server
the design does not add app-server-owned long-lived thread or realtime spans

Design goals

Distributed tracing first
- Clients should be able to send trace context to app-server.
- App-server should preserve that trace ancestry across the async handoff into core.
- Existing core model request tracing should continue to inherit from the active core span once the handoff occurs.
Consistent request instrumentation
- Every inbound request should produce the same request span with the same base attributes.
- Request tracing should be wired at the transport boundary, not repeated in individual handlers.
Minimal boilerplate
- Request handlers should not manually parse carriers or build request spans.
- Existing calls to thread.submit(...) and similar APIs should pick up trace propagation automatically.
Minimal business logic pollution
- W3C parsing, OTEL conversion, and span-parenting rules should live in tracing-specific modules.
- App-server business logic should stay focused on request handling, not span management.
Incremental rollout
- The first rollout should prove inbound request tracing and app-server -> core propagation.
- Once propagation is in place, core should add a long-lived turn span so a single span covers the actual duration of a turn.
- Thread and realtime lifecycle tracing should wait until there is a concrete need.

Non-goals

This design does not attempt to make every loaded thread or realtime session correspond to a long-lived tracing span.
This design does not add tracing-owned thread or realtime state stores in the initial rollout.
This design does not require every app-server v2 *Params type to carry trace metadata.
This design does not require outbound JSON-RPC trace propagation in the initial rollout.

Why not `RequestKind`

An earlier direction considered a central RequestKind taxonomy such as Unary, TurnLifecycle, or RealtimeLifecycle.

That is workable, but it makes tracing depend on a classification that can drift from runtime behavior. The simpler design instead treats tracing as two generic mechanics:

every inbound request gets the same request span
any async work that crosses from app-server into core gets the current span context attached to Submission

This keeps the initial implementation small and avoids turning tracing into a taxonomy maintenance problem.

Terminology

Request span
- A short-lived span for one inbound JSON-RPC request to app-server.
W3C trace context
- A serializable representation of distributed trace context based on traceparent and tracestate.
Submission trace handoff
- The optional serialized trace context attached to codex_protocol::Submission so core can restore parentage after the app-server request handler returns.
Dispatch span
- A short-lived core span created when the submission loop receives a Submission with trace context.
Turn span
- A long-lived core-owned span representing the actual runtime of a turn from turn start until completion, interruption, or failure.

High-level tracing model

1. Inbound request

For every inbound JSON-RPC request:

parse an optional W3C trace carrier from the JSON-RPC envelope
create a standardized request span
parent that span from the incoming carrier when present
process the request inside that span

This is true for every request, regardless of whether the API is unary or starts work that continues later.

2. Async handoff into core

Some app-server requests submit work that continues in core after the original request returns. The critical example is turn/start, but the mechanism should be generic.

To preserve trace ancestry:

add an optional W3cTraceContext to codex_protocol::Submission
have CodexThread::submit() capture the current span context into that field automatically
have codex-core create a per-submission dispatch span parented from that carrier

This gives a clean causal chain:

client span
app-server request span
core dispatch span
core turn span for turn-producing operations
existing core spans such as run_turn, sampling, and model request spans
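
The causal chain above can be pictured as a simple parent-linked list of spans. The following is a minimal sketch, not the real OTEL data model: the `Span` struct and the `core.dispatch` and `core.turn` names are hypothetical, while `app_server.request` and `run_turn` come from this document.

```rust
/// Hypothetical, simplified span record: a name plus the index of its parent.
pub struct Span {
    pub name: &'static str,
    pub parent: Option<usize>,
}

/// Builds the chain expected for a traced turn/start request.
/// "core.dispatch" and "core.turn" are placeholder names for illustration.
pub fn turn_start_chain() -> Vec<Span> {
    vec![
        Span { name: "client", parent: None },
        Span { name: "app_server.request", parent: Some(0) },
        Span { name: "core.dispatch", parent: Some(1) },
        Span { name: "core.turn", parent: Some(2) },
        Span { name: "run_turn", parent: Some(3) },
    ]
}

/// Walks parent links from `leaf` up to the root and returns names root-first.
pub fn ancestry(spans: &[Span], leaf: usize) -> Vec<&'static str> {
    let mut chain = Vec::new();
    let mut cursor = Some(leaf);
    while let Some(index) = cursor {
        chain.push(spans[index].name);
        cursor = spans[index].parent;
    }
    chain.reverse();
    chain
}
```

Calling `ancestry(&turn_start_chain(), 4)` walks from `run_turn` back to the client span, which is the property the handoff through Submission.trace is meant to preserve.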

3. Core-owned turn spans

For turn-producing operations such as turn/start and review/start:

app-server creates the inbound request span
app-server propagates that request context through Submission.trace
core creates a dispatch span when it receives the submission
core then creates a long-lived turn span beneath that dispatch span
existing core work such as run_turn and model request tracing runs beneath the turn span

This keeps long-lived span ownership with the layer that actually owns turn execution and completion.

4. Defer thread and realtime lifecycle-heavy tracing

The design should not add:

app-server-owned thread residency stores
app-server-owned realtime session stores

App-server already maintains thread subscription and runtime state in existing structures. If later tracing work needs thread loaded-duration or realtime duration metrics, that data should extend those existing structures rather than introducing a parallel tracing-only state machine.

Span model by API shape

The initial implementation keeps the app-server side uniform.

Unary request/response APIs

Examples:

thread/list
thread/read
model/list
config/read
skills/list
app/list

Behavior:

create request span
return response
no additional app-server span state

Turn-producing APIs

Examples:

turn/start
review/start
thread/compact/start when it executes as a normal turn lifecycle

Behavior:

create request span
submit work under that request span
capture the current span context into Submission.trace
let core create a dispatch span and then a long-lived turn span
let the turn span remain open until the real core turn lifecycle ends

Important: request spans should not stay open until eventual streamed completion. The request span ends quickly; the core-owned turn span carries the long-running work.

Other APIs that submit work into core

Examples:

thread/realtime/start
thread/realtime/appendAudio
thread/realtime/appendText
thread/realtime/stop

Behavior:

create request span
submit work under that request span
capture the current span context into Submission.trace
let core continue tracing from there

These APIs do not automatically imply a long-lived app-server or core lifecycle span in the initial design.

Thread lifecycle APIs

Examples:

thread/start
thread/resume
thread/fork
thread/unsubscribe

Behavior in the initial design:

create request span
annotate with thread.id when known
do not introduce separate app-server lifecycle spans or tracing-only state

If later work needs thread loaded/unloaded metrics, it should reuse the existing thread runtime state already maintained by app-server.

Where the code should live

`codex-rs/protocol`

Add a small shared W3cTraceContext type to codex-rs/protocol/src/protocol.rs.

Responsibilities:

define a serializable W3C trace context type
avoid direct dependence on OTEL runtime types
be usable from both protocol crates and runtime crates
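
As a rough illustration of these responsibilities, the type could be as small as two optional strings. This is a sketch under assumptions: the field names mirror the W3C header names, the real type would presumably also derive serialization traits, and `has_valid_traceparent` is a hypothetical helper that only understands the version `00` layout.

```rust
/// Hypothetical shape for the shared carrier type. It holds plain strings so
/// it has no dependency on OTEL runtime types.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct W3cTraceContext {
    pub traceparent: Option<String>,
    pub tracestate: Option<String>,
}

/// True if `s` is exactly `len` lowercase hex characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// True if every character is `0`, which the W3C format forbids for ids.
fn is_all_zeros(s: &str) -> bool {
    s.bytes().all(|b| b == b'0')
}

impl W3cTraceContext {
    /// Checks the version-00 layout:
    /// `00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>`.
    pub fn has_valid_traceparent(&self) -> bool {
        let Some(tp) = self.traceparent.as_deref() else {
            return false;
        };
        let parts: Vec<&str> = tp.split('-').collect();
        match parts.as_slice() {
            [version, trace_id, parent_id, flags] => {
                *version == "00"
                    && is_lower_hex(trace_id, 32)
                    && !is_all_zeros(trace_id)
                    && is_lower_hex(parent_id, 16)
                    && !is_all_zeros(parent_id)
                    && is_lower_hex(flags, 2)
            }
            _ => false,
        }
    }
}
```

Keeping validation as a method on the plain type is only one option; the conversion into an OTEL Context would still live in codex-otel.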

`codex-rs/otel`

Add a small helper module or extend existing tracing helpers so OTEL-specific logic stays centralized.

Responsibilities:

convert W3cTraceContext -> OTEL Context
convert the current tracing span context -> W3cTraceContext
parent a tracing span from an explicit carrier when present
apply precedence rules:
- explicit carrier from app-server transport or Submission.trace
- fallback to env TRACEPARENT / TRACESTATE
- otherwise root span
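
The precedence rules can be expressed as one small pure function. The sketch below is illustrative only: `ParentSource` and `resolve_parent` are hypothetical names, the environment value is passed in by the caller rather than read inside the function, and validation of the carrier is left out.

```rust
/// Where a new span's parent came from, in precedence order.
#[derive(Debug, PartialEq, Eq)]
pub enum ParentSource {
    /// Carrier from the app-server transport or from Submission.trace.
    ExplicitCarrier(String),
    /// Value taken from the TRACEPARENT environment variable.
    Environment(String),
    /// No parent available: start a new root span.
    Root,
}

/// Hypothetical helper applying the precedence rules. Empty strings are
/// treated as absent; real code would also validate the carrier.
pub fn resolve_parent(explicit: Option<&str>, env_traceparent: Option<&str>) -> ParentSource {
    if let Some(tp) = explicit.filter(|tp| !tp.is_empty()) {
        return ParentSource::ExplicitCarrier(tp.to_string());
    }
    if let Some(tp) = env_traceparent.filter(|tp| !tp.is_empty()) {
        return ParentSource::Environment(tp.to_string());
    }
    ParentSource::Root
}
```

A caller could supply the second argument with `std::env::var("TRACEPARENT").ok().as_deref()`, which keeps the function itself free of global state and easy to test.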

Important:

keep this focused on carrier parsing and span parenting
do not move app-server runtime state into codex-otel
do not overload OtelManager with app-server lifecycle ownership in the initial design

`codex-rs/app-server-protocol`

Extend inbound JSON-RPC request envelopes in codex-rs/app-server-protocol/src/jsonrpc_lite.rs with a dedicated optional trace carrier field.

Suggested shape:

JSONRPCRequest { id, method, params, trace }

Where:

trace: Option<W3cTraceContext>

Important:

use a dedicated tracing field, not a generic meta bag
keep tracing transport-level and method-agnostic
do not add trace fields to individual *Params business payloads
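
One way to picture the envelope shape and the "transport-level" rule is a request type whose trace field is peeled off before any handler sees the request. This is a sketch with stand-in types: the `id` and `params` types, `HandlerInput`, and `into_parts` are all assumptions made for illustration, and serialization is omitted.

```rust
/// Stand-in for the shared carrier type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct W3cTraceContext {
    pub traceparent: Option<String>,
    pub tracestate: Option<String>,
}

/// Simplified envelope. The real `id` and `params` types will differ;
/// a u64 and a raw JSON string are used here only to keep the sketch small.
pub struct JSONRPCRequest {
    pub id: u64,
    pub method: String,
    pub params: Option<String>,
    pub trace: Option<W3cTraceContext>,
}

/// What a business handler receives: no tracing fields at all.
pub struct HandlerInput {
    pub id: u64,
    pub method: String,
    pub params: Option<String>,
}

impl JSONRPCRequest {
    /// Splits the envelope at the transport boundary so the carrier goes to
    /// the tracing layer and the rest goes to the method handler.
    pub fn into_parts(self) -> (Option<W3cTraceContext>, HandlerInput) {
        let input = HandlerInput {
            id: self.id,
            method: self.method,
            params: self.params,
        };
        (self.trace, input)
    }
}
```

Because `trace` is optional, a client that never sends a carrier produces `None` and the request is handled exactly as before.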

`codex-rs/core`

Make small changes in the submission path in codex-rs/core/src/codex.rs.

Responsibilities:

read Submission.trace
create a per-submission dispatch span parented from that carrier
run existing submission handling under that span

This is enough for existing core tracing to inherit the correct ancestry, and it is the right place to add the long-lived turn span required for turn lifecycles.

For turn-producing operations, core responsibilities should include:

read Submission.trace
create a per-submission dispatch span parented from that carrier
create a long-lived turn span beneath the dispatch span when the operation actually starts a turn
finish that turn span when the real core turn lifecycle completes, interrupts, or fails
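
The requirement that the turn span always closes, whatever the outcome, maps naturally onto a guard object. The sketch below is a stand-in rather than real tracing code: `TurnSpan`, `TurnOutcome`, and the shared log are hypothetical, and treating a span dropped without an explicit outcome as failed is an assumption of this example, not something the design specifies.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnOutcome {
    Completed,
    Interrupted,
    Failed,
}

/// Hypothetical stand-in for a long-lived turn span. Closing is recorded in
/// `closed` so the behavior can be observed without a tracing backend.
pub struct TurnSpan {
    outcome: Option<TurnOutcome>,
    closed: Arc<Mutex<Vec<TurnOutcome>>>,
}

impl TurnSpan {
    pub fn start(closed: Arc<Mutex<Vec<TurnOutcome>>>) -> Self {
        Self { outcome: None, closed }
    }

    /// Records the outcome; the span itself closes when `self` is dropped
    /// at the end of this call.
    pub fn finish(mut self, outcome: TurnOutcome) {
        self.outcome = Some(outcome);
    }
}

impl Drop for TurnSpan {
    fn drop(&mut self) {
        // Assumption of this sketch: no explicit outcome means the turn failed.
        let outcome = self.outcome.unwrap_or(TurnOutcome::Failed);
        if let Ok(mut log) = self.closed.lock() {
            log.push(outcome);
        }
    }
}
```

Tying the close to `Drop` means an early return or an error path in core cannot leave the turn span open indefinitely.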

`codex-rs/app-server`

Add a small dedicated tracing module rather than spreading request tracing logic across handlers. A likely shape is:

app_server_tracing/mod.rs
app_server_tracing/request_spans.rs
app_server_tracing/incoming.rs

Responsibilities:

extract incoming W3C trace carriers from JSON-RPC requests
build standardized request spans
provide a small API that wraps request handling in the correct span

Non-responsibilities in the initial design:

no thread residency registry
no realtime session registry

Standardized request spans

Every inbound request should use the same request-span builder.

Suggested name:

app_server.request

Suggested attributes:

rpc.system = "jsonrpc"
rpc.service = "codex-app-server"
rpc.method
rpc.transport
- stdio
- websocket
rpc.request_id
app_server.connection_id
app_server.api_version = "v2" when applicable
app_server.client_name when known from initialize
app_server.client_version when known

Optional useful attributes:

thread.id when already known from params
turn.id when already known from params

Important:

the span factory should be the only place that assembles these fields
handlers should not manually construct request-span attributes
for the initialize request itself, read clientInfo.name and clientInfo.version directly from the request params when present
for later requests on the same connection, read client metadata from per-connection session state populated during initialize
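
A single span factory could assemble the attributes listed above in one place. The following sketch uses a plain map instead of a real span, and `RequestInfo`, its fields, and `request_span_attributes` are hypothetical names; only the attribute keys and the fixed values come from this document.

```rust
use std::collections::BTreeMap;

/// Suggested span name from this document.
pub const REQUEST_SPAN_NAME: &str = "app_server.request";

/// Hypothetical input gathered at the transport boundary.
pub struct RequestInfo<'a> {
    pub method: &'a str,
    pub transport: &'a str, // "stdio" or "websocket"
    pub request_id: &'a str,
    pub connection_id: u64,
    pub is_v2: bool,
    pub client_name: Option<&'a str>,
    pub client_version: Option<&'a str>,
}

/// The only place that assembles request-span attributes.
pub fn request_span_attributes(info: &RequestInfo<'_>) -> BTreeMap<&'static str, String> {
    let mut attrs = BTreeMap::new();
    attrs.insert("rpc.system", "jsonrpc".to_string());
    attrs.insert("rpc.service", "codex-app-server".to_string());
    attrs.insert("rpc.method", info.method.to_string());
    attrs.insert("rpc.transport", info.transport.to_string());
    attrs.insert("rpc.request_id", info.request_id.to_string());
    attrs.insert("app_server.connection_id", info.connection_id.to_string());
    if info.is_v2 {
        attrs.insert("app_server.api_version", "v2".to_string());
    }
    // Client metadata is only attached once it is known from initialize.
    if let Some(name) = info.client_name {
        attrs.insert("app_server.client_name", name.to_string());
    }
    if let Some(version) = info.client_version {
        attrs.insert("app_server.client_version", version.to_string());
    }
    attrs
}
```

Optional attributes such as thread.id and turn.id could be added the same way once they are known from params.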

No app-server tracing registries

The design should not introduce app-server-owned tracing registries for turns, threads, or realtime sessions.

Why:

app-server already has thread subscription and runtime state
core already owns the real task and turn lifecycle
a second tracing-specific state machine adds more code and more ways for lifecycle tracking to drift

Future guidance:

if thread loaded/unloaded metrics become important, extend existing app-server thread state
keep long-lived turn spans in core
if realtime lifecycle metrics become important, extend the existing realtime runtime path rather than creating a parallel tracing store

No direct span construction in handlers

Request handlers should not call info_span!, trace_span!, set_parent, or OTEL APIs directly for app-server request tracing.

Instead:

message_processor should wrap inbound request handling through the centralized request-span helper
CodexThread::submit() should capture the current span context into Submission.trace

That keeps request tracing transport-level and largely invisible to business handlers.
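
To make the "invisible to business handlers" point concrete, here is a minimal sketch of a wrapper that owns all span bookkeeping while the handler is an ordinary closure. `with_request_span` and `RequestSpanRecord` are hypothetical, and a vector of records stands in for real span creation.

```rust
/// Hypothetical record of a request span, used in place of a real span.
#[derive(Debug, PartialEq, Eq)]
pub struct RequestSpanRecord {
    pub name: &'static str,
    pub method: String,
    pub parent: Option<String>,
}

/// Wraps any handler in the standardized request span. The handler closure
/// takes no tracing arguments and makes no tracing calls.
pub fn with_request_span<R>(
    method: &str,
    carrier: Option<&str>,
    spans: &mut Vec<RequestSpanRecord>,
    handler: impl FnOnce() -> R,
) -> R {
    spans.push(RequestSpanRecord {
        name: "app_server.request",
        method: method.to_string(),
        parent: carrier.map(str::to_string),
    });
    handler()
}
```

A unary method such as thread/list would be invoked as `with_request_span("thread/list", None, &mut spans, || list_threads())`, where `list_threads` is a placeholder for the existing handler.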

Layering

The intended call graph is:

message_processor -> app_server_tracing
- create and enter the standardized inbound request span
CodexThread::submit() -> codex-otel trace-context helper
- snapshot the current span context into Submission.trace
codex-core submission loop -> codex-otel trace-context helper
- create a dispatch span parented from Submission.trace
- create a long-lived turn span for turn-producing operations

Important:

app-server owns inbound request tracing
core owns execution after the async handoff
core owns long-lived turn spans
the design does not add app-server-owned long-lived thread or realtime spans

Inbound flow in app-server

The inbound request path should work like this:

Parse the JSON-RPC request envelope, including trace.
Use the tracing module to create a request span.
Process the request inside that span.
If the request submits work into core, let CodexThread::submit() capture the active span context into Submission.trace.

Integration point:

codex-rs/app-server/src/message_processor.rs

Core handoff flow

The turn/start and similar flows cross an async boundary:

app-server handler submits work
core submission loop receives Submission
actual work continues later on different tasks

To preserve parentage:

app-server request handling runs inside app_server.request
CodexThread::submit() captures that active context into Submission.trace
core submission loop creates a dispatch span parented from Submission.trace
if the submission starts a turn, core creates a long-lived turn span beneath that dispatch span
existing core spans naturally nest under the turn span

This lets the following inherit the app-server request trace without broad tracing changes across core:

submission handling
a single long-lived turn span for turn-producing APIs
run_turn
model client request tracing
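
The handoff can be sketched with a channel and a worker thread standing in for the core submission loop. Everything here is a simplified assumption: a thread-local string plays the role of the current span context, `submit` imitates what CodexThread::submit() would do, and the returned pairs represent the parent each dispatch span would receive.

```rust
use std::cell::RefCell;
use std::sync::mpsc;
use std::thread;

pub const DEMO_TRACEPARENT: &str = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";

thread_local! {
    /// Stand-in for "the current span context" of the running request.
    static CURRENT_TRACEPARENT: RefCell<Option<String>> = RefCell::new(None);
}

pub struct Submission {
    pub op: String,
    pub trace: Option<String>,
}

/// Runs a handler with the request context set, then clears it again,
/// mimicking a short-lived request span.
pub fn in_request_span<R>(traceparent: &str, handler: impl FnOnce() -> R) -> R {
    CURRENT_TRACEPARENT.with(|c| *c.borrow_mut() = Some(traceparent.to_string()));
    let out = handler();
    CURRENT_TRACEPARENT.with(|c| *c.borrow_mut() = None);
    out
}

/// Callers pass only the operation; the trace field is captured from the
/// ambient context, so handlers need no tracing code.
pub fn submit(tx: &mpsc::Sender<Submission>, op: &str) {
    let trace = CURRENT_TRACEPARENT.with(|c| c.borrow().clone());
    tx.send(Submission { op: op.to_string(), trace })
        .expect("submission loop is running");
}

/// Stand-in for the core loop: reports the parent each dispatch span would use.
pub fn run_submission_loop(rx: mpsc::Receiver<Submission>) -> Vec<(String, Option<String>)> {
    rx.into_iter().map(|s| (s.op, s.trace)).collect()
}

pub fn demo() -> Vec<(String, Option<String>)> {
    let (tx, rx) = mpsc::channel();
    let core = thread::spawn(move || run_submission_loop(rx));
    in_request_span(DEMO_TRACEPARENT, || submit(&tx, "turn/start"));
    submit(&tx, "untraced"); // submitted outside any request span
    drop(tx); // closing the channel lets the loop finish
    core.join().expect("core thread panicked")
}
```

The request closure returns before the worker necessarily sees the submission, yet the parent still arrives intact because it travels inside the `Submission` value rather than in thread-local state.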

Behavior for key v2 APIs

`thread/start`

create request span
annotate with thread.id once known
send response and thread/started
no separate thread lifecycle span in the initial design

`thread/resume`

create request span
annotate with thread.id when known
no separate lifecycle span

`thread/fork`

create request span
annotate with the new thread.id
no separate lifecycle span

`thread/unsubscribe`

create request span
no separate unload span
if later thread unload metrics are needed, reuse existing thread state rather than adding a tracing-only registry

`turn/start`

create request span
submit work into core under that request span
propagate the active span context through Submission.trace
let core create a dispatch span and then a long-lived turn span
let that turn span cover the full duration until completion, interruption, or failure

`turn/steer`

create request span
if the request submits core work, propagate via Submission.trace
otherwise request span only

`turn/interrupt`

create request span
request span only unless core submission is involved

`review/start`

treat like turn/start
let core create the same kind of long-lived turn span

`thread/realtime/start`, `appendAudio`, `appendText`, `stop`

create request span
if the API submits work into core, propagate via Submission.trace
do not introduce separate realtime lifecycle spans in the initial design

Unary methods such as `thread/list`

create request span only

Runtime checks

Keep runtime checks narrowly scoped in the initial rollout:

warn when an inbound trace carrier is present but invalid
test that Submission.trace is set when work is submitted from a traced request

Do not add lifecycle consistency checks for tracing registries that do not exist yet.
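
A narrowly scoped check for inbound carriers might look like the sketch below. `CarrierCheck` and `check_inbound_carrier` are hypothetical names, the validation is a shape check only, and the warning is returned as a string so the caller decides how to log it.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CarrierCheck<'a> {
    /// No carrier was sent; the request span becomes a root (or env-parented) span.
    Absent,
    /// Carrier looks well formed and can be used as the parent.
    Accepted(&'a str),
    /// Carrier was present but malformed; it is ignored and a warning is produced.
    Rejected { warning: String },
}

/// Shape check only: four dash-separated lowercase-hex fields of
/// lengths 2, 32, 16, and 2.
fn looks_like_traceparent(tp: &str) -> bool {
    let lens: Vec<usize> = tp.split('-').map(str::len).collect();
    lens == [2, 32, 16, 2]
        && tp
            .bytes()
            .all(|b| b == b'-' || matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

pub fn check_inbound_carrier(traceparent: Option<&str>) -> CarrierCheck<'_> {
    match traceparent {
        None => CarrierCheck::Absent,
        Some(tp) if looks_like_traceparent(tp) => CarrierCheck::Accepted(tp),
        Some(tp) => CarrierCheck::Rejected {
            warning: format!("ignoring invalid inbound traceparent: {tp:?}"),
        },
    }
}
```

A rejected carrier never fails the request; the request simply proceeds without the client-supplied parent.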

Tests

Add tests for the initial mechanics:

inbound request tracing accepts a valid W3C carrier
invalid carriers are ignored cleanly
unary methods create request spans without needing any extra handler changes
turn/start propagates request ancestry through Submission.trace into core
turn/start creates a long-lived core-owned turn span
the turn span closes on completion, interruption, or failure
existing core spans inherit from the propagated parent

The goal is to verify the centralized propagation behavior, not to exhaustively test OTEL internals.

Suggested PR sequence

PR 1: Foundation plus inbound request spans

Scope:

Introduce a shared W3cTraceContext type in codex-protocol.
Add trace to inbound JSON-RPC request envelopes in app-server protocol.
Add focused trace-context helpers in codex-rs/otel.
Add the centralized app-server request tracing module.
Wrap inbound request handling in message_processor.rs.

Why this PR:

proves the transport and request-span shape with minimal scope
gives all inbound app-server APIs consistent request tracing immediately
avoids mixing lifecycle questions into the initial plumbing review

PR 2: Async handoff into core via `Submission`

Scope:

Add trace to Submission.
Have CodexThread::submit() capture the current span context automatically.
Have the core submission loop restore parentage with a dispatch span.
Validate the flow with turn/start.

Why this PR:

validates the critical async handoff from app-server into core
proves that existing core tracing can inherit the app-server request ancestry
keeps the behavior change focused on one boundary

PR 3: Core-owned long-lived turn spans

Scope:

Add a long-lived turn span in core for turn/start.
Reuse the same turn-span pattern for review/start.
Ensure the span closes on completion, interruption, or failure.

Why this PR:

completes the minimum useful tracing story for turn lifecycles
keeps long-lived span ownership in the layer that actually owns the turn
still builds on the simpler propagation model from PR 2 instead of mixing everything into one change

PR 4: Optional follow-ups

Possible follow-ups:

Reuse existing app-server thread state to add thread loaded/unloaded duration metrics if needed.
Reuse existing realtime runtime state to add realtime duration metrics if needed.
Add outbound JSON-RPC trace propagation only if there is a concrete client-side tracing use case.

Rollout guidance

Start with:

inbound request spans for all app-server requests
turn/start request -> core propagation
a core-owned long-lived turn span for turn/start

Those pieces exercise the important mechanics:

inbound carrier extraction
request span creation
async handoff into core
inherited core tracing beneath the propagated parent
a single span covering the full duration of a turn

After that, only add more lifecycle-specific tracing if a real debugging or observability gap remains.

Bottom line

The recommended initial design is:

trace context on inbound JSON-RPC request envelopes
one standardized request span for every inbound request
automatic propagation through Submission into core
core-owned long-lived turn spans for turn-producing APIs
OTEL conversion and carrier logic centralized in codex-otel
no app-server-owned tracing registries for turns, threads, or realtime sessions in the initial implementation

This gives app-server distributed tracing that is:

consistent
low-boilerplate
modular
aligned with the existing ownership boundaries in app-server and core
