mirror of
https://github.com/openai/codex.git
synced 2026-04-29 00:55:38 +00:00
## Summary Fix a managed ChatGPT auth bug where a stale Codex process could proactively refresh using an old in-memory refresh token even after another process had already rotated auth on disk. This changes the proactive `AuthManager::auth()` path to reuse the existing guarded `refresh_token()` flow instead of calling the refresh endpoint directly from cached auth state. ## Original Issue Users reported repeated `codexd` log lines like: ```text ERROR codex_core::auth: Failed to refresh token: error sending request for url (https://auth.openai.com/oauth/token) ``` In practice this showed up most often when multiple `codexd` processes were left running. Killing the extra processes stopped the noise, which suggested the issue was caused by stale auth state across processes rather than invalid user credentials. ## Diagnosis The bug was in the proactive refresh path used by `AuthManager::auth()`: - Process A could refresh successfully, rotate refresh token `R0` to `R1`, and persist the updated auth state plus `last_refresh` to disk. - Process B could keep an older auth snapshot cached in memory, still holding `R0` and the old `last_refresh`. - Later, when Process B called `auth()`, it checked staleness from its cached in-memory auth instead of first reloading from disk. - Because that cached `last_refresh` was stale, Process B would proactively call `/oauth/token` with stale refresh token `R0`. - On failure, `auth()` logged the refresh error but kept returning the same stale cached auth, so repeated `auth()` calls could keep retrying with dead state. This differed from the existing unauthorized-recovery flow, which already did the safer thing: guarded reload from disk first, then refresh only if the on-disk auth was unchanged. ## What Changed - Switched proactive refresh in `AuthManager::auth()` to: - do a pure staleness check on cached auth - call `refresh_token()` when stale - return the original cached auth on genuine refresh failure, preserving existing outward behavior - Removed the direct proactive refresh-from-cached-state path - Added regression tests covering: - stale cached auth with newer same-account auth already on disk - the same scenario even when the refresh endpoint would fail if called ## Why This Fix `refresh_token()` already contains the right cross-process safety behavior: - guarded reload from disk - same-account verification - skip-refresh when another process already changed auth Reusing that path makes proactive refresh consistent with unauthorized recovery and prevents stale processes from trying to refresh already-rotated tokens. ## Testing Test shape: - create a fresh temp `CODEX_HOME` from `~/.codex/auth.json` - force `last_refresh` to an old timestamp so proactive refresh is required - start two long-lived helper processes against the same auth file - start `B` first so it caches stale auth and sleeps - start `A` second so it refreshes first - point both at a local mock `/oauth/token` server - inspect whether `B` makes a second refresh request with the stale in-memory token, or reloads the rotated token from disk ### Before the fix The repro showed the bug clearly: the mock server saw two refreshes with the same stale token, `A` rotated to a new token, and `B` still returned the stale token instead of reloading from disk. ```text POST /oauth/token refresh_token=rt_j6s0... POST /oauth/token refresh_token=rt_j6s0... B:cached_before=rt_j6s0... B:cached_after=rt_j6s0... B:returned=rt_j6s0... A:cached_before=rt_j6s0... A:cached_after=rotated-refresh-token-logged-run-v2 A:returned=rotated-refresh-token-logged-run-v2 ``` ### After the fix After the fix, the mock server saw only one refresh request. `A` refreshed once, and `B` started with the stale token but reloaded and returned the rotated token. ```text POST /oauth/token refresh_token=rt_j6s0... B:cached_before=rt_j6s0... B:cached_after=rotated-refresh-token-fix-branch B:returned=rotated-refresh-token-fix-branch A:cached_before=rt_j6s0... A:cached_after=rotated-refresh-token-fix-branch A:returned=rotated-refresh-token-fix-branch ``` This shows the new behavior: `A` refreshes once, then `B` reuses the updated auth from disk instead of making a second refresh request with the stale token.