feat(otel, core): record turn TTFT and TTFM metrics in codex-core (#13630)

### Summary
This adds turn-level latency metrics for the first model output (TTFT)
and the first completed agent message (TTFM).
- `codex.turn.ttft.duration_ms` starts at turn start and records on the
first output signal we see from the model. That includes normal
assistant text, reasoning deltas, and non-text outputs like tool-call
items.
- `codex.turn.ttfm.duration_ms` also starts at turn start, but it
records when the first agent message finishes streaming rather than when
its first delta arrives.
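
A minimal sketch of the distinction between the two metrics. The `StreamEvent` enum and the helper names are hypothetical stand-ins; the real codex-core event types are not shown in this message.

```rust
/// Hypothetical stream events, used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StreamEvent {
    TextDelta,
    ReasoningDelta,
    ToolCallItem,
    AgentMessageDone,
    /// Anything that carries no model output.
    Other,
}

/// TTFT: any first output signal counts (text, reasoning, or a non-text item).
fn counts_for_ttft(event: StreamEvent) -> bool {
    matches!(
        event,
        StreamEvent::TextDelta | StreamEvent::ReasoningDelta | StreamEvent::ToolCallItem
    )
}

/// TTFM: only a fully streamed agent message counts, not its first delta.
fn counts_for_ttfm(event: StreamEvent) -> bool {
    matches!(event, StreamEvent::AgentMessageDone)
}

/// Returns the index of the first event that would trigger each metric.
fn first_hits(events: &[StreamEvent]) -> (Option<usize>, Option<usize>) {
    let ttft = events.iter().position(|e| counts_for_ttft(*e));
    let ttfm = events.iter().position(|e| counts_for_ttfm(*e));
    (ttft, ttfm)
}
```

Under these assumptions, a turn that only produces a tool call still yields a TTFT sample but no TTFM sample.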

### Implementation notes
The timing is tracked in codex-core, not app-server, so the definition
stays consistent across CLI, TUI, and app-server clients.

I reused the existing turn lifecycle boundary that already drives
`codex.turn.e2e_duration_ms`, stored the turn start timestamp in turn
state, and ensured each metric is recorded only once per turn.
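
Roughly, the record-once behavior can be sketched like this. `TurnTiming` and its methods are hypothetical names, not the actual turn state in codex-core; the sketch only assumes a stored start timestamp and one flag per metric.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-turn timing state.
#[derive(Debug)]
struct TurnTiming {
    started_at: Instant,
    ttft_recorded: bool,
    ttfm_recorded: bool,
}

impl TurnTiming {
    /// Called at the turn lifecycle boundary where the turn starts.
    fn start(now: Instant) -> Self {
        Self {
            started_at: now,
            ttft_recorded: false,
            ttfm_recorded: false,
        }
    }

    /// Returns the elapsed time on the first call, and `None` afterwards.
    fn mark_first_output(&mut self, now: Instant) -> Option<Duration> {
        if self.ttft_recorded {
            return None;
        }
        self.ttft_recorded = true;
        Some(now.saturating_duration_since(self.started_at))
    }

    /// Same contract, keyed on the first completed agent message.
    fn mark_first_message_done(&mut self, now: Instant) -> Option<Duration> {
        if self.ttfm_recorded {
            return None;
        }
        self.ttfm_recorded = true;
        Some(now.saturating_duration_since(self.started_at))
    }
}
```

A caller would forward each `Some(duration)` to the metrics recorder and ignore `None`, so later deltas or messages in the same turn do not produce extra samples.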

I also wired the new metric names into the OTEL runtime metrics summary
so they show up in the same in-memory/debug snapshot path as the
existing timing metrics.
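
As an illustration of that wiring, here is a simplified sketch that maps recorded durations onto summary fields. The `TurnLatencySummary` struct, the `summarize` function, the `u64` field type, and the zero default for missing metrics are all assumptions; only the metric names and the `turn_ttft_ms` / `turn_ttfm_ms` field names come from this change.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical slice of the runtime metrics summary.
#[derive(Debug, Default, PartialEq, Eq)]
struct TurnLatencySummary {
    turn_ttft_ms: u64,
    turn_ttfm_ms: u64,
}

/// Looks up each metric by name; a missing metric falls back to 0 (an assumption).
fn summarize(recorded: &HashMap<&str, Duration>) -> TurnLatencySummary {
    let ms = |name: &str| {
        recorded
            .get(name)
            .map(|d| d.as_millis() as u64)
            .unwrap_or(0)
    };
    TurnLatencySummary {
        turn_ttft_ms: ms("codex.turn.ttft.duration_ms"),
        turn_ttfm_ms: ms("codex.turn.ttfm.duration_ms"),
    }
}
```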
Owen Lin
2026-03-06 10:23:48 -08:00
committed by GitHub
parent 6c98a59dbd
commit 3449e00bc9
8 changed files with 348 additions and 9 deletions


@@ -74,6 +74,16 @@ fn runtime_metrics_summary_collects_tool_api_and_streaming_metrics() -> Result<(
.into(),
))));
manager.record_websocket_event(&ws_timing_response, Duration::from_millis(20));
manager.record_duration(
"codex.turn.ttft.duration_ms",
Duration::from_millis(95),
&[],
);
manager.record_duration(
"codex.turn.ttfm.duration_ms",
Duration::from_millis(180),
&[],
);
let summary = manager
.runtime_metrics_summary()
@@ -105,6 +115,8 @@ fn runtime_metrics_summary_collects_tool_api_and_streaming_metrics() -> Result<(
responses_api_engine_service_ttft_ms: 233,
responses_api_engine_iapi_tbt_ms: 377,
responses_api_engine_service_tbt_ms: 399,
turn_ttft_ms: 95,
turn_ttfm_ms: 180,
};
assert_eq!(summary, expected);