feat: add auth login diagnostics (#13797)

## Problem

Browser login failures historically leave support with an incomplete picture. HARs can show that the browser completed OAuth and reached the localhost callback, but they do not explain why the native client failed on the final `/oauth/token` exchange. Direct `codex login` also relied mostly on terminal stderr and the browser error page, so even when the login crate emitted better sign-in diagnostics through TUI or app-server flows, the one-shot CLI path still did not leave behind an easy artifact to collect.

## Mental model

This implementation treats the browser page, the returned `io::Error`, and the normal structured log as separate surfaces with different safety requirements. The browser page and returned error preserve the detail that operators need to diagnose failures. The structured log stays narrower: it records reviewed lifecycle events, parsed safe fields, and redacted transport errors without becoming a sink for secrets or arbitrary backend bodies.

Direct `codex login` now adds a fourth support surface: a small file-backed log at `codex-login.log` under the configured `log_dir`. That artifact carries the same login-target events as the other entrypoints without changing the existing stderr/browser UX.

## Non-goals

This does not add auth logging to normal runtime requests, and it does not try to infer precise transport root causes from brittle string matching. The scope remains the browser-login callback flow in the `login` crate plus a direct-CLI wrapper that persists those events to disk.

This also does not try to reuse the TUI logging stack wholesale. The TUI path initializes feedback, OpenTelemetry, and other session-oriented layers that are useful for an interactive app but unnecessary for a one-shot login command.

## Tradeoffs

The implementation favors fidelity for caller-visible errors and restraint for persistent logs. Parsed JSON token-endpoint errors are logged safely by field.
Non-JSON token-endpoint bodies remain available to the returned error so CLI and browser surfaces still show backend detail. Transport errors keep their real `reqwest` message, but attached URLs are surgically redacted. Custom issuer URLs are sanitized before logging.

On the CLI side, the code intentionally duplicates a narrow slice of the TUI file-logging setup instead of sharing the full initializer. That keeps `codex login` easy to reason about and avoids coupling it to interactive-session layers that the command does not need.

## Architecture

The core auth behavior lives in `codex-rs/login/src/server.rs`. The callback path now logs callback receipt, callback validation, token-exchange start, token-exchange success, token-endpoint non-2xx responses, and transport failures. App-server consumers still use this same login-server path via `run_login_server(...)`, so the same instrumentation benefits TUI, Electron, and VS Code extension flows.

The direct CLI path in `codex-rs/cli/src/login.rs` now installs a small file-backed tracing layer for login commands only. That writes `codex-login.log` under `log_dir` with login-specific targets such as `codex_cli::login` and `codex_login::server`.

## Observability

The main signals come from the `login` crate target and are intentionally scoped to sign-in. Structured logs include redacted issuer URLs, redacted transport errors, HTTP status, and parsed token-endpoint fields when available. The callback-layer log intentionally avoids `%err` on token-endpoint failures so arbitrary backend bodies do not get copied into the normal log file.

Direct `codex login` now leaves a durable artifact for both failure and success cases.
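
As a rough illustration of the URL redaction described above, the sketch below keeps only the scheme, host, port, and path of a URL before it is logged. The helper name `redact_url_for_log` and its string-based parsing are assumptions made for this example. They are not taken from `codex-rs/login/src/server.rs`, which may redact differently.

```rust
/// Hypothetical helper for illustration only; not the function used by the
/// `login` crate. Keeps scheme, host, port, and path, and drops userinfo,
/// query string, and fragment, where credentials and codes usually appear.
fn redact_url_for_log(raw: &str) -> String {
    // Refuse to echo anything that is not URL-shaped.
    let (scheme, rest) = match raw.split_once("://") {
        Some(parts) => parts,
        None => return "<redacted>".to_string(),
    };

    // Cut at the first `?` or `#` so query parameters and fragments are dropped.
    let end = rest
        .find(|c: char| c == '?' || c == '#')
        .unwrap_or(rest.len());
    let rest = &rest[..end];

    // Split the authority (userinfo@host:port) from the path.
    let (authority, path) = match rest.find('/') {
        Some(idx) => rest.split_at(idx),
        None => (rest, ""),
    };

    // Drop `user:password@` if present.
    let host = authority
        .rsplit_once('@')
        .map_or(authority, |(_, host)| host);

    format!("{scheme}://{host}{path}")
}
```

Under these assumptions, a callback-style URL carrying `?code=...` would be logged with only its origin and path, and input that does not look like a URL would be replaced wholesale instead of being copied into the log.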

Example output from the new file-backed CLI path:

Failing callback:

```text
2026-03-06T22:08:54.143612Z INFO codex_cli::login: starting browser login flow
2026-03-06T22:09:03.431699Z INFO codex_login::server: received login callback path=/auth/callback has_code=false has_state=true has_error=true state_valid=true
2026-03-06T22:09:03.431745Z WARN codex_login::server: oauth callback returned error error_code="access_denied" has_error_description=true
```

Succeeded callback and token exchange:

```text
2026-03-06T22:09:14.065559Z INFO codex_cli::login: starting browser login flow
2026-03-06T22:09:36.431678Z INFO codex_login::server: received login callback path=/auth/callback has_code=true has_state=true has_error=false state_valid=true
2026-03-06T22:09:36.436977Z INFO codex_login::server: starting oauth token exchange issuer=https://auth.openai.com/ redirect_uri=http://localhost:1455/auth/callback
2026-03-06T22:09:36.685438Z INFO codex_login::server: oauth token exchange succeeded status=200 OK
```

## Tests

- `cargo test -p codex-login`
- `cargo clippy -p codex-login --tests -- -D warnings`
- `cargo test -p codex-cli`
- `just bazel-lock-update`
- `just bazel-lock-check`
- manual direct `codex login` smoke tests for both a failing callback and a successful browser login

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-28 00:25:56 +00:00 · 2026-03-06 15:00:37 -08:00
parent dd4a5216c9
commit 4e68fb96e2
6 changed files with 616 additions and 12 deletions
--- a/codex-rs/cli/src/login.rs
+++ b/codex-rs/cli/src/login.rs
@@ -1,3 +1,12 @@
+//! CLI login commands and their direct-user observability surfaces.
+//!
+//! The TUI path already installs a broader tracing stack with feedback, OpenTelemetry, and other
+//! interactive-session layers. Direct `codex login` intentionally does less: it preserves the
+//! existing stderr/browser UX and adds only a small file-backed tracing layer for login-specific
+//! targets. Keeping that setup local avoids pulling the TUI's session-oriented logging machinery
+//! into a one-shot CLI command while still producing a durable `codex-login.log` artifact that
+//! support can request from users.
+
 use codex_core::CodexAuth;
 use codex_core::auth::AuthCredentialsStoreMode;
 use codex_core::auth::AuthMode;
@@ -10,9 +19,16 @@ use codex_login::run_device_code_login;
 use codex_login::run_login_server;
 use codex_protocol::config_types::ForcedLoginMethod;
 use codex_utils_cli::CliConfigOverrides;
+use std::fs::OpenOptions;
 use std::io::IsTerminal;
 use std::io::Read;
 use std::path::PathBuf;
+use tracing_appender::non_blocking;
+use tracing_appender::non_blocking::WorkerGuard;
+use tracing_subscriber::EnvFilter;
+use tracing_subscriber::Layer;
+use tracing_subscriber::layer::SubscriberExt;
+use tracing_subscriber::util::SubscriberInitExt;

 const CHATGPT_LOGIN_DISABLED_MESSAGE: &str =
    "ChatGPT login is disabled. Use API key login instead.";
@@ -20,6 +36,74 @@ const API_KEY_LOGIN_DISABLED_MESSAGE: &str =
    "API key login is disabled. Use ChatGPT login instead.";
 const LOGIN_SUCCESS_MESSAGE: &str = "Successfully logged in";

+/// Installs a small file-backed tracing layer for direct `codex login` flows.
+///
+/// This deliberately duplicates a narrow slice of the TUI logging setup instead of reusing it
+/// wholesale. The TUI stack includes session-oriented layers that are valuable for interactive
+/// runs but unnecessary for a one-shot login command. Keeping the direct CLI path local lets this
+/// command produce a durable `codex-login.log` artifact without coupling it to the TUI's broader
+/// telemetry and feedback initialization.
+fn init_login_file_logging(config: &Config) -> Option<WorkerGuard> {
+    let log_dir = match codex_core::config::log_dir(config) {
+        Ok(log_dir) => log_dir,
+        Err(err) => {
+            eprintln!("Warning: failed to resolve login log directory: {err}");
+            return None;
+        }
+    };
+
+    if let Err(err) = std::fs::create_dir_all(&log_dir) {
+        eprintln!(
+            "Warning: failed to create login log directory {}: {err}",
+            log_dir.display()
+        );
+        return None;
+    }
+
+    let mut log_file_opts = OpenOptions::new();
+    log_file_opts.create(true).append(true);
+
+    #[cfg(unix)]
+    {
+        use std::os::unix::fs::OpenOptionsExt;
+        log_file_opts.mode(0o600);
+    }
+
+    let log_path = log_dir.join("codex-login.log");
+    let log_file = match log_file_opts.open(&log_path) {
+        Ok(log_file) => log_file,
+        Err(err) => {
+            eprintln!(
+                "Warning: failed to open login log file {}: {err}",
+                log_path.display()
+            );
+            return None;
+        }
+    };
+
+    let (non_blocking, guard) = non_blocking(log_file);
+    let env_filter = EnvFilter::try_from_default_env()
+        .unwrap_or_else(|_| EnvFilter::new("codex_cli=info,codex_core=info,codex_login=info"));
+    let file_layer = tracing_subscriber::fmt::layer()
+        .with_writer(non_blocking)
+        .with_target(true)
+        .with_ansi(false)
+        .with_filter(env_filter);
+
+    // Direct `codex login` otherwise relies on ephemeral stderr and browser output.
+    // Persist the same login targets to a file so support can inspect auth failures
+    // without reproducing them through TUI or app-server.
+    if let Err(err) = tracing_subscriber::registry().with(file_layer).try_init() {
+        eprintln!(
+            "Warning: failed to initialize login log file {}: {err}",
+            log_path.display()
+        );
+        return None;
+    }
+
+    Some(guard)
+}
+
 fn print_login_server_start(actual_port: u16, auth_url: &str) {
    eprintln!(
        "Starting local login server on http://localhost:{actual_port}.\nIf your browser did not open, navigate to this URL to authenticate:\n\n{auth_url}\n\nOn a remote or headless machine? Use `codex login --device-auth` instead."
@@ -46,6 +130,8 @@ pub async fn login_with_chatgpt(

 pub async fn run_login_with_chatgpt(cli_config_overrides: CliConfigOverrides) -> ! {
    let config = load_config_or_exit(cli_config_overrides).await;
+    let _login_log_guard = init_login_file_logging(&config);
+    tracing::info!("starting browser login flow");

    if matches!(config.forced_login_method, Some(ForcedLoginMethod::Api)) {
        eprintln!("{CHATGPT_LOGIN_DISABLED_MESSAGE}");
@@ -77,6 +163,8 @@ pub async fn run_login_with_api_key(
    api_key: String,
 ) -> ! {
    let config = load_config_or_exit(cli_config_overrides).await;
+    let _login_log_guard = init_login_file_logging(&config);
+    tracing::info!("starting api key login flow");

    if matches!(config.forced_login_method, Some(ForcedLoginMethod::Chatgpt)) {
        eprintln!("{API_KEY_LOGIN_DISABLED_MESSAGE}");
@@ -133,6 +221,8 @@ pub async fn run_login_with_device_code(
    client_id: Option<String>,
 ) -> ! {
    let config = load_config_or_exit(cli_config_overrides).await;
+    let _login_log_guard = init_login_file_logging(&config);
+    tracing::info!("starting device code login flow");
    if matches!(config.forced_login_method, Some(ForcedLoginMethod::Api)) {
        eprintln!("{CHATGPT_LOGIN_DISABLED_MESSAGE}");
        std::process::exit(1);
@@ -169,6 +259,8 @@ pub async fn run_login_with_device_code_fallback_to_browser(
    client_id: Option<String>,
 ) -> ! {
    let config = load_config_or_exit(cli_config_overrides).await;
+    let _login_log_guard = init_login_file_logging(&config);
+    tracing::info!("starting login flow with device code fallback");
    if matches!(config.forced_login_method, Some(ForcedLoginMethod::Api)) {
        eprintln!("{CHATGPT_LOGIN_DISABLED_MESSAGE}");
        std::process::exit(1);