mirror of https://github.com/openai/codex.git synced 2026-05-23 12:34:25 +00:00

Files

Matthew Zeng e15ecc9c35 Add production startup and TTFT telemetry (#22198 )

## Why

While investigating `codex exec hi` startup latency, the useful
questions were not "is startup slow?" but "which durable bucket is slow
in production?"

The path we observed has a few distinct stages:

1. `thread/start` creates the session
2. startup prewarm builds the turn context, tools, and prompt
3. startup prewarm warms the websocket
4. the first real turn resolves the prewarm
5. the model produces the first token

Before this PR, production telemetry had some of the raw measurements
already:

- aggregate startup-prewarm duration / age-at-first-turn metrics
- TTFT as a metric
- websocket request telemetry

But there was no coherent production event stream for the startup
breakdown itself, and TTFT was metric-only. That made it hard to answer
the same latency questions from OpenTelemetry-backed logs without adding
one-off local instrumentation.

## What changed

Add durable production telemetry on the existing `SessionTelemetry`
path:

- new `codex.startup_phase` OTel log/trace events plus
`codex.startup.phase.duration_ms`
- new `codex.turn_ttft` OTel log/trace events while preserving the
existing TTFT metric

The startup phase event is emitted for the coarse buckets we actually
observed while running `exec hi`:

- `thread_start_create_thread`
- `startup_prewarm_total`
- `startup_prewarm_create_turn_context`
- `startup_prewarm_build_tools`
- `startup_prewarm_build_prompt`
- `startup_prewarm_websocket_warmup`
- `startup_prewarm_resolve`

These phases are intentionally low-cardinality so they remain safe as
production telemetry tags.

## Why this shape

This keeps the instrumentation on the same production path as the rest
of the session telemetry instead of adding a local debug-only trace
mode. It also avoids changing startup behavior:

- prewarm still runs
- no control flow changes
- no extra remote calls
- no user-visible behavior changes

One boundary is intentional: very early process bootstrap that happens
before a session exists is not included here, because this PR uses
session-scoped production telemetry. The expensive buckets we were
trying to understand after `thread/start` are now covered durably.

## Verification

- `cargo test -p codex-otel`
- `cargo test -p codex-core turn_timing`
- `cargo test -p codex-core
regular_turn_emits_turn_started_without_waiting_for_startup_prewarm`
- `cargo test -p codex-core
interrupting_regular_turn_waiting_on_startup_prewarm_emits_turn_aborted`
- `cargo test -p codex-app-server thread_start`
- `just fix -p codex-otel -p codex-core -p codex-app-server`

I also ran `cargo test -p codex-core`; it built successfully and then
hit an existing unrelated stack overflow in
`tools::handlers::multi_agents::tests::tool_handlers_cascade_close_and_resume_and_keep_explicitly_closed_subtrees_closed`.

2026-05-11 23:58:36 +00:00

src

Add production startup and TTFT telemetry (#22198 )

2026-05-11 23:58:36 +00:00

tests

codex-otel: add configurable trace metadata (#21556 )

2026-05-07 16:06:57 -07:00

BUILD.bazel

Codex/windows bazel rust test coverage no rs (#16528 )

2026-04-03 15:34:03 -07:00

Cargo.toml

otel: remove the last workspace crate feature (#16469 )

2026-04-01 13:45:23 -07:00

README.md

codex-otel: add configurable trace metadata (#21556 )

2026-05-07 16:06:57 -07:00

README.md

codex-otel

codex-otel is the OpenTelemetry integration crate for Codex. It provides:

Provider wiring for log/trace/metric exporters (codex_otel::OtelProvider and codex_otel::provider).
Session-scoped business event emission via codex_otel::SessionTelemetry.
Low-level metrics APIs via codex_otel::metrics.
Trace-context helpers via codex_otel::trace_context and crate-root re-exports.

Tracing and logs

Create an OTEL provider from OtelSettings. The provider also configures metrics (when enabled), then attach its layers to your tracing_subscriber registry:

use codex_otel::config::OtelExporter;
use codex_otel::config::OtelHttpProtocol;
use codex_otel::config::OtelSettings;
use codex_otel::OtelProvider;
use tracing_subscriber::prelude::*;

let settings = OtelSettings {
    environment: "dev".to_string(),
    service_name: "codex-cli".to_string(),
    service_version: env!("CARGO_PKG_VERSION").to_string(),
    codex_home: std::path::PathBuf::from("/tmp"),
    exporter: OtelExporter::OtlpHttp {
        endpoint: "https://otlp.example.com".to_string(),
        headers: std::collections::HashMap::new(),
        protocol: OtelHttpProtocol::Binary,
        tls: None,
    },
    trace_exporter: OtelExporter::OtlpHttp {
        endpoint: "https://otlp.example.com".to_string(),
        headers: std::collections::HashMap::new(),
        protocol: OtelHttpProtocol::Binary,
        tls: None,
    },
    metrics_exporter: OtelExporter::None,
    span_attributes: std::collections::BTreeMap::new(),
    tracestate: std::collections::BTreeMap::new(),
};

if let Some(provider) = OtelProvider::from(&settings)? {
    let registry = tracing_subscriber::registry()
        .with(provider.logger_layer())
        .with(provider.tracing_layer());
    registry.init();
}

Configured span attributes and W3C tracestate member fields are applied to exported trace spans and propagated trace context:

[otel.span_attributes]
"example.trace_attr" = "enabled"

[otel.tracestate.example]
alpha = "one"
beta = "two"

Configured tracestate members and encoded values must be valid W3C tracestate. Each nested table is encoded as semicolon-separated key:value fields inside that member. If propagated trace context already has the named member, Codex upserts configured fields and preserves other fields in that member. This config shape does not support setting opaque tracestate member values. Invalid trace metadata entries are ignored during config load and reported as startup warnings.

SessionTelemetry (events)

SessionTelemetry adds consistent metadata to tracing events and helps record Codex-specific session events. Rich session/business events should go through SessionTelemetry; subsystem-owned audit events can stay with the owning subsystem.

use codex_otel::SessionTelemetry;

let manager = SessionTelemetry::new(
    conversation_id,
    model,
    slug,
    account_id,
    account_email,
    auth_mode,
    originator,
    log_user_prompts,
    terminal_type,
    session_source,
);

manager.user_prompt(&prompt_items);

Metrics (OTLP or in-memory)

Modes:

OTLP: exports metrics via the OpenTelemetry OTLP exporter (HTTP or gRPC).
In-memory: records via opentelemetry_sdk::metrics::InMemoryMetricExporter for tests/assertions; call shutdown() to flush.

codex-otel also provides OtelExporter::Statsig, a shorthand for exporting OTLP/HTTP JSON metrics to Statsig using Codex-internal defaults.

Statsig ingestion (OTLP/HTTP JSON) example:

use codex_otel::config::{OtelExporter, OtelHttpProtocol};

let metrics = MetricsClient::new(MetricsConfig::otlp(
    "dev",
    "codex-cli",
    env!("CARGO_PKG_VERSION"),
    OtelExporter::OtlpHttp {
        endpoint: "https://api.statsig.com/otlp".to_string(),
        headers: std::collections::HashMap::from([(
            "statsig-api-key".to_string(),
            std::env::var("STATSIG_SERVER_SDK_SECRET")?,
        )]),
        protocol: OtelHttpProtocol::Json,
        tls: None,
    },
))?;

metrics.counter("codex.session_started", 1, &[("source", "tui")])?;
metrics.histogram("codex.request_latency", 83, &[("route", "chat")])?;

In-memory (tests):

let exporter = InMemoryMetricExporter::default();
let metrics = MetricsClient::new(MetricsConfig::in_memory(
    "test",
    "codex-cli",
    env!("CARGO_PKG_VERSION"),
    exporter.clone(),
))?;
metrics.counter("codex.turns", 1, &[("model", "gpt-5.1")])?;
metrics.shutdown()?; // flushes in-memory exporter

Trace context

Trace propagation helpers remain separate from the session event emitter:

use codex_otel::current_span_w3c_trace_context;
use codex_otel::set_parent_from_w3c_trace_context;

Shutdown

OtelProvider::shutdown() stops the OTEL exporter.
SessionTelemetry::shutdown_metrics() flushes and shuts down the metrics provider.

Both are optional because drop performs best-effort shutdown, but calling them explicitly gives deterministic flushing (or a shutdown error if flushing does not complete in time).