mirror of
https://github.com/openai/codex.git
synced 2026-04-30 17:36:40 +00:00
CXC-392

[With 401](https://openai.sentry.io/issues/7333870443/?project=4510195390611458&query=019ce8f8-560c-7f10-a00a-c59553740674&referrer=issue-stream)

<img width="1909" height="555" alt="401 auth tags in Sentry" src="https://github.com/user-attachments/assets/412ea950-61c4-4780-9697-15c270971ee3" />

- `auth_401_*`: preserved facts from the latest unauthorized response snapshot
- `auth_*`: latest auth-related facts from the latest request attempt
- `auth_recovery_*`: unauthorized recovery state and follow-up result

Without 401

<img width="1917" height="522" alt="happy-path auth tags in Sentry" src="https://github.com/user-attachments/assets/3381ed28-8022-43b0-b6c0-623a630e679f" />

###### Summary
- Add client-visible 401 diagnostics for auth attachment, upstream auth classification, and 401 request id / cf-ray correlation.
- Record unauthorized recovery mode, phase, outcome, and retry/follow-up status without changing auth behavior.
- Surface the highest-signal auth and recovery fields on uploaded client bug reports so they are usable in Sentry.
- Preserve original unauthorized evidence under `auth_401_*` while keeping follow-up result tags separate.

###### Rationale (from spec findings)
- The dominant bucket needed proof of whether the client attached auth before send or upstream still classified the request as missing auth.
- Client uploads needed to show whether unauthorized recovery ran and what the client tried next.
- Request id and cf-ray needed to be preserved on the unauthorized response so server-side correlation is immediate.
- The bug-report path needed the same auth evidence as the request telemetry path, otherwise the observability would not be operationally useful.

###### Scope
- Add auth 401 and unauthorized-recovery observability in `codex-rs/core`, `codex-rs/codex-api`, and `codex-rs/otel`, including feedback-tag surfacing.
- Keep auth semantics, refresh behavior, retry behavior, endpoint classification, and geo-denial follow-up work out of this PR.

###### Trade-offs
- This exports only safe auth evidence: header presence/name, upstream auth classification, request ids, and recovery state. It does not export token values or raw upstream bodies.
- This keeps websocket connection reuse as a transport clue because it can help distinguish stale reused sessions from fresh reconnects.
- Misroute/base-url classification and geo-denial are intentionally deferred to a separate follow-up PR so this review stays focused on the dominant auth 401 bucket.

###### Client follow-up
- PR 2 will add misroute/provider and geo-denial observability plus the matching feedback-tag surfacing.
- A separate host/app-server PR should log auth-decision inputs so pre-send host auth state can be correlated with client request evidence.
- `device_id` remains intentionally separate until there is a safe existing source on the feedback upload path.

###### Testing
- `cargo test -p codex-core refresh_available_models_sorts_by_priority`
- `cargo test -p codex-core emit_feedback_request_tags_`
- `cargo test -p codex-core emit_feedback_auth_recovery_tags_`
- `cargo test -p codex-core auth_request_telemetry_context_tracks_attached_auth_and_retry_phase`
- `cargo test -p codex-core extract_response_debug_context_decodes_identity_headers`
- `cargo test -p codex-core identity_auth_details`
- `cargo test -p codex-core telemetry_error_messages_preserve_non_http_details`
- `cargo test -p codex-core --all-features --no-run`
- `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability`
- `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability`
- `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability`
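The "safe auth evidence" trade-off above can be illustrated with a minimal sketch. Everything in it is hypothetical: the `AuthAttemptSnapshot` struct, the `safe_auth_tags` function, and the tag suffixes (`header_attached`, `header_name`, `request_id`, `cf_ray`) are invented for illustration and are not the types or tag names used in `codex-rs`. The sketch only shows the idea of exporting header presence and name plus correlation ids under a prefix such as `auth_` or `auth_401_`, while never copying the credential value.

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot of what the client knew about one request attempt.
/// Field names are illustrative; they are not taken from `codex-rs`.
#[derive(Debug, Clone, Default)]
pub struct AuthAttemptSnapshot {
    /// Name of the header that carried credentials, if one was attached.
    pub auth_header_name: Option<String>,
    /// Raw credential value. Present only to show that it is never exported.
    pub auth_header_value: Option<String>,
    /// Request id reported by the upstream response, if any.
    pub request_id: Option<String>,
    /// `cf-ray` value reported by the upstream response, if any.
    pub cf_ray: Option<String>,
}

/// Builds tags under `prefix` (for example `auth_` or `auth_401_`).
/// Only presence, header name, and correlation ids are copied out.
pub fn safe_auth_tags(prefix: &str, snap: &AuthAttemptSnapshot) -> BTreeMap<String, String> {
    let mut tags = BTreeMap::new();
    tags.insert(
        format!("{prefix}header_attached"),
        snap.auth_header_name.is_some().to_string(),
    );
    if let Some(name) = &snap.auth_header_name {
        tags.insert(format!("{prefix}header_name"), name.clone());
    }
    if let Some(id) = &snap.request_id {
        tags.insert(format!("{prefix}request_id"), id.clone());
    }
    if let Some(ray) = &snap.cf_ray {
        tags.insert(format!("{prefix}cf_ray"), ray.clone());
    }
    // `auth_header_value` is deliberately never read here.
    tags
}
```

Calling `safe_auth_tags("auth_401_", &snapshot)` for the unauthorized response and `safe_auth_tags("auth_", &latest)` for the most recent attempt would keep the original 401 evidence separate from follow-up results, which is the separation the summary describes.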
99 lines
2.6 KiB
Rust
use crate::error::ApiError;
use codex_client::Request;
use codex_client::RequestTelemetry;
use codex_client::Response;
use codex_client::RetryPolicy;
use codex_client::StreamResponse;
use codex_client::TransportError;
use codex_client::run_with_retry;
use http::StatusCode;
use std::future::Future;
use std::sync::Arc;
use std::time::Duration;
use tokio::time::Instant;
use tokio_tungstenite::tungstenite::Error;
use tokio_tungstenite::tungstenite::Message;

/// Generic telemetry.
pub trait SseTelemetry: Send + Sync {
    fn on_sse_poll(
        &self,
        result: &Result<
            Option<
                Result<
                    eventsource_stream::Event,
                    eventsource_stream::EventStreamError<TransportError>,
                >,
            >,
            tokio::time::error::Elapsed,
        >,
        duration: Duration,
    );
}

/// Telemetry for Responses WebSocket transport.
pub trait WebsocketTelemetry: Send + Sync {
    fn on_ws_request(&self, duration: Duration, error: Option<&ApiError>, connection_reused: bool);

    fn on_ws_event(
        &self,
        result: &Result<Option<Result<Message, Error>>, ApiError>,
        duration: Duration,
    );
}

pub(crate) trait WithStatus {
    fn status(&self) -> StatusCode;
}

fn http_status(err: &TransportError) -> Option<StatusCode> {
    match err {
        TransportError::Http { status, .. } => Some(*status),
        _ => None,
    }
}

impl WithStatus for Response {
    fn status(&self) -> StatusCode {
        self.status
    }
}

impl WithStatus for StreamResponse {
    fn status(&self) -> StatusCode {
        self.status
    }
}

pub(crate) async fn run_with_request_telemetry<T, F, Fut>(
    policy: RetryPolicy,
    telemetry: Option<Arc<dyn RequestTelemetry>>,
    make_request: impl FnMut() -> Request,
    send: F,
) -> Result<T, TransportError>
where
    T: WithStatus,
    F: Clone + Fn(Request) -> Fut,
    Fut: Future<Output = Result<T, TransportError>>,
{
    // Wraps `run_with_retry` to attach per-attempt request telemetry for both
    // unary and streaming HTTP calls.
    run_with_retry(policy, make_request, move |req, attempt| {
        let telemetry = telemetry.clone();
        let send = send.clone();
        async move {
            let start = Instant::now();
            let result = send(req).await;
            if let Some(t) = telemetry.as_ref() {
                let (status, err) = match &result {
                    Ok(resp) => (Some(resp.status()), None),
                    Err(err) => (http_status(err), Some(err)),
                };
                t.on_request(attempt, status, err, start.elapsed());
            }
            result
        }
    })
    .await
}
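The per-attempt pattern in `run_with_request_telemetry` can be sketched in a simplified, synchronous, standard-library-only form. This is not the `codex_client` API: `SketchError`, `AttemptTelemetry`, `Recorder`, and `run_with_attempt_telemetry` are hypothetical stand-ins, statuses are plain `u16` values, and the retry loop (retry on every error, no backoff, attempts numbered from 0) is an assumption made for illustration rather than a description of `run_with_retry`. What the sketch does preserve is the shape of the wrapper: time each attempt, derive a status from either the response or an HTTP error, and report both to an optional telemetry sink before returning the result.

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a transport error; only `Http` carries a status.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchError {
    Http { status: u16 },
    Network,
}

/// Hypothetical stand-in for a per-attempt telemetry sink.
pub trait AttemptTelemetry {
    fn on_request(
        &self,
        attempt: u64,
        status: Option<u16>,
        error: Option<&SketchError>,
        duration: Duration,
    );
}

/// Records `(attempt, status, had_error)` for each reported attempt.
#[derive(Default)]
pub struct Recorder {
    pub seen: RefCell<Vec<(u64, Option<u16>, bool)>>,
}

impl AttemptTelemetry for Recorder {
    fn on_request(
        &self,
        attempt: u64,
        status: Option<u16>,
        error: Option<&SketchError>,
        _duration: Duration,
    ) {
        self.seen.borrow_mut().push((attempt, status, error.is_some()));
    }
}

/// Extracts a status from HTTP errors; other errors have none.
fn http_status(err: &SketchError) -> Option<u16> {
    match err {
        SketchError::Http { status } => Some(*status),
        _ => None,
    }
}

/// Runs `send` at least once and up to `max_attempts` times, reporting every
/// attempt to `telemetry`. Assumption: every error is retried immediately.
pub fn run_with_attempt_telemetry<F>(
    max_attempts: u64,
    telemetry: Option<&dyn AttemptTelemetry>,
    mut send: F,
) -> Result<u16, SketchError>
where
    F: FnMut(u64) -> Result<u16, SketchError>,
{
    let mut attempt = 0;
    loop {
        let start = Instant::now();
        let result = send(attempt);
        if let Some(t) = telemetry {
            let (status, err) = match &result {
                Ok(status) => (Some(*status), None),
                Err(err) => (http_status(err), Some(err)),
            };
            t.on_request(attempt, status, err, start.elapsed());
        }
        attempt += 1;
        if result.is_ok() || attempt >= max_attempts {
            return result;
        }
    }
}
```

Because the sink is called on every attempt rather than only on the final result, a 401 followed by a successful retry leaves two records, so the failed attempt stays visible to the sink.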