Files
codex/codex-rs/cloud-requirements
Celia Chen f55f5c258f Fix: proactive auth refresh to reload guarded disk state first (#15357)
## Summary

Fix a managed ChatGPT auth bug where a stale Codex process could
proactively refresh using an old in-memory refresh token even after
another process had already rotated auth on disk.

This changes the proactive `AuthManager::auth()` path to reuse the
existing guarded `refresh_token()` flow instead of calling the refresh
endpoint directly from cached auth state.

## Original Issue

Users reported repeated `codexd` log lines like:

```text
ERROR codex_core::auth: Failed to refresh token: error sending request for url (https://auth.openai.com/oauth/token)
```

In practice this showed up most often when multiple `codexd` processes
were left running. Killing the extra processes stopped the noise, which
suggested the issue was caused by stale auth state across processes
rather than invalid user credentials.

## Diagnosis

The bug was in the proactive refresh path used by `AuthManager::auth()`:

- Process A could refresh successfully, rotate refresh token `R0` to
`R1`, and persist the updated auth state plus `last_refresh` to disk.
- Process B could keep an older auth snapshot cached in memory, still
holding `R0` and the old `last_refresh`.
- Later, when Process B called `auth()`, it checked staleness from its
cached in-memory auth instead of first reloading from disk.
- Because that cached `last_refresh` was stale, Process B would
proactively call `/oauth/token` with stale refresh token `R0`.
- On failure, `auth()` logged the refresh error but kept returning the
same stale cached auth, so repeated `auth()` calls could keep retrying
with dead state.

This differed from the existing unauthorized-recovery flow, which
already did the safer thing: guarded reload from disk first, then
refresh only if the on-disk auth was unchanged.

## What Changed

- Switched proactive refresh in `AuthManager::auth()` to:
  - do a pure staleness check on cached auth
  - call `refresh_token()` when stale
- return the original cached auth on genuine refresh failure, preserving
existing outward behavior
- Removed the direct proactive refresh-from-cached-state path
- Added regression tests covering:
  - stale cached auth with newer same-account auth already on disk
- the same scenario even when the refresh endpoint would fail if called

## Why This Fix

`refresh_token()` already contains the right cross-process safety
behavior:

- guarded reload from disk
- same-account verification
- skip-refresh when another process already changed auth

Reusing that path makes proactive refresh consistent with unauthorized
recovery and prevents stale processes from trying to refresh
already-rotated tokens.

## Testing

Test shape:

- create a fresh temp `CODEX_HOME` from `~/.codex/auth.json`
- force `last_refresh` to an old timestamp so proactive refresh is
required
- start two long-lived helper processes against the same auth file
- start `B` first so it caches stale auth and sleeps
- start `A` second so it refreshes first
- point both at a local mock `/oauth/token` server
- inspect whether `B` makes a second refresh request with the stale
in-memory token, or reloads the rotated token from disk

### Before the fix

The repro showed the bug clearly: the mock server saw two refreshes with
the same stale token, `A` rotated to a new token, and `B` still returned
the stale token instead of reloading from disk.

```text
POST /oauth/token refresh_token=rt_j6s0...
POST /oauth/token refresh_token=rt_j6s0...

B:cached_before=rt_j6s0...
B:cached_after=rt_j6s0...
B:returned=rt_j6s0...

A:cached_before=rt_j6s0...
A:cached_after=rotated-refresh-token-logged-run-v2
A:returned=rotated-refresh-token-logged-run-v2
```

### After the fix

After the fix, the mock server saw only one refresh request. `A`
refreshed once, and `B` started with the stale token but reloaded and
returned the rotated token.

```text
POST /oauth/token refresh_token=rt_j6s0...

B:cached_before=rt_j6s0...
B:cached_after=rotated-refresh-token-fix-branch
B:returned=rotated-refresh-token-fix-branch

A:cached_before=rt_j6s0...
A:cached_after=rotated-refresh-token-fix-branch
A:returned=rotated-refresh-token-fix-branch
```

This shows the new behavior: `A` refreshes once, then `B` reuses the
updated auth from disk instead of making a second refresh request with
the stale token.
2026-03-23 12:07:59 -07:00
..
2026-02-11 14:06:41 +00:00