mirror of
https://github.com/openai/codex.git
synced 2026-04-29 08:56:38 +00:00
Fix: proactive auth refresh to reload guarded disk state first (#15357)
## Summary Fix a managed ChatGPT auth bug where a stale Codex process could proactively refresh using an old in-memory refresh token even after another process had already rotated auth on disk. This changes the proactive `AuthManager::auth()` path to reuse the existing guarded `refresh_token()` flow instead of calling the refresh endpoint directly from cached auth state. ## Original Issue Users reported repeated `codexd` log lines like: ```text ERROR codex_core::auth: Failed to refresh token: error sending request for url (https://auth.openai.com/oauth/token) ``` In practice this showed up most often when multiple `codexd` processes were left running. Killing the extra processes stopped the noise, which suggested the issue was caused by stale auth state across processes rather than invalid user credentials. ## Diagnosis The bug was in the proactive refresh path used by `AuthManager::auth()`: - Process A could refresh successfully, rotate refresh token `R0` to `R1`, and persist the updated auth state plus `last_refresh` to disk. - Process B could keep an older auth snapshot cached in memory, still holding `R0` and the old `last_refresh`. - Later, when Process B called `auth()`, it checked staleness from its cached in-memory auth instead of first reloading from disk. - Because that cached `last_refresh` was stale, Process B would proactively call `/oauth/token` with stale refresh token `R0`. - On failure, `auth()` logged the refresh error but kept returning the same stale cached auth, so repeated `auth()` calls could keep retrying with dead state. This differed from the existing unauthorized-recovery flow, which already did the safer thing: guarded reload from disk first, then refresh only if the on-disk auth was unchanged. ## What Changed - Switched proactive refresh in `AuthManager::auth()` to: - do a pure staleness check on cached auth - call `refresh_token()` when stale - return the original cached auth on genuine refresh failure, preserving existing outward behavior - Removed the direct proactive refresh-from-cached-state path - Added regression tests covering: - stale cached auth with newer same-account auth already on disk - the same scenario even when the refresh endpoint would fail if called ## Why This Fix `refresh_token()` already contains the right cross-process safety behavior: - guarded reload from disk - same-account verification - skip-refresh when another process already changed auth Reusing that path makes proactive refresh consistent with unauthorized recovery and prevents stale processes from trying to refresh already-rotated tokens. ## Testing Test shape: - create a fresh temp `CODEX_HOME` from `~/.codex/auth.json` - force `last_refresh` to an old timestamp so proactive refresh is required - start two long-lived helper processes against the same auth file - start `B` first so it caches stale auth and sleeps - start `A` second so it refreshes first - point both at a local mock `/oauth/token` server - inspect whether `B` makes a second refresh request with the stale in-memory token, or reloads the rotated token from disk ### Before the fix The repro showed the bug clearly: the mock server saw two refreshes with the same stale token, `A` rotated to a new token, and `B` still returned the stale token instead of reloading from disk. ```text POST /oauth/token refresh_token=rt_j6s0... POST /oauth/token refresh_token=rt_j6s0... B:cached_before=rt_j6s0... B:cached_after=rt_j6s0... B:returned=rt_j6s0... A:cached_before=rt_j6s0... A:cached_after=rotated-refresh-token-logged-run-v2 A:returned=rotated-refresh-token-logged-run-v2 ``` ### After the fix After the fix, the mock server saw only one refresh request. `A` refreshed once, and `B` started with the stale token but reloaded and returned the rotated token. ```text POST /oauth/token refresh_token=rt_j6s0... B:cached_before=rt_j6s0... B:cached_after=rotated-refresh-token-fix-branch B:returned=rotated-refresh-token-fix-branch A:cached_before=rt_j6s0... A:cached_after=rotated-refresh-token-fix-branch A:returned=rotated-refresh-token-fix-branch ``` This shows the new behavior: `A` refreshes once, then `B` reuses the updated auth from disk instead of making a second refresh request with the stale token.
This commit is contained in:
@@ -381,6 +381,116 @@ async fn refreshes_token_when_last_refresh_is_stale() -> Result<()> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[serial_test::serial(auth_refresh)]
|
||||
#[tokio::test]
|
||||
async fn auth_reloads_disk_auth_when_cached_auth_is_stale() -> Result<()> {
|
||||
skip_if_no_network!(Ok(()));
|
||||
|
||||
let server = MockServer::start().await;
|
||||
|
||||
let ctx = RefreshTokenTestContext::new(&server)?;
|
||||
let stale_refresh = Utc::now() - Duration::days(9);
|
||||
let initial_tokens = build_tokens(INITIAL_ACCESS_TOKEN, INITIAL_REFRESH_TOKEN);
|
||||
let initial_auth = AuthDotJson {
|
||||
auth_mode: Some(AuthMode::Chatgpt),
|
||||
openai_api_key: None,
|
||||
tokens: Some(initial_tokens),
|
||||
last_refresh: Some(stale_refresh),
|
||||
};
|
||||
ctx.write_auth(&initial_auth)?;
|
||||
|
||||
let fresh_refresh = Utc::now() - Duration::days(1);
|
||||
let disk_tokens = build_tokens("disk-access-token", "disk-refresh-token");
|
||||
let disk_auth = AuthDotJson {
|
||||
auth_mode: Some(AuthMode::Chatgpt),
|
||||
openai_api_key: None,
|
||||
tokens: Some(disk_tokens.clone()),
|
||||
last_refresh: Some(fresh_refresh),
|
||||
};
|
||||
save_auth(
|
||||
ctx.codex_home.path(),
|
||||
&disk_auth,
|
||||
AuthCredentialsStoreMode::File,
|
||||
)?;
|
||||
|
||||
let cached_auth = ctx
|
||||
.auth_manager
|
||||
.auth()
|
||||
.await
|
||||
.context("auth should reload from disk")?;
|
||||
let cached = cached_auth
|
||||
.get_token_data()
|
||||
.context("token data should reload from disk")?;
|
||||
assert_eq!(cached, disk_tokens);
|
||||
|
||||
let stored = ctx.load_auth()?;
|
||||
assert_eq!(stored, disk_auth);
|
||||
|
||||
let requests = server.received_requests().await.unwrap_or_default();
|
||||
assert!(requests.is_empty(), "expected no refresh token requests");
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[serial_test::serial(auth_refresh)]
|
||||
#[tokio::test]
|
||||
async fn auth_reloads_disk_auth_without_calling_expired_refresh_token() -> Result<()> {
|
||||
skip_if_no_network!(Ok(()));
|
||||
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/oauth/token"))
|
||||
.respond_with(ResponseTemplate::new(401).set_body_json(json!({
|
||||
"error": {
|
||||
"code": "refresh_token_expired"
|
||||
}
|
||||
})))
|
||||
.expect(0)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let ctx = RefreshTokenTestContext::new(&server)?;
|
||||
let stale_refresh = Utc::now() - Duration::days(9);
|
||||
let initial_tokens = build_tokens(INITIAL_ACCESS_TOKEN, INITIAL_REFRESH_TOKEN);
|
||||
let initial_auth = AuthDotJson {
|
||||
auth_mode: Some(AuthMode::Chatgpt),
|
||||
openai_api_key: None,
|
||||
tokens: Some(initial_tokens),
|
||||
last_refresh: Some(stale_refresh),
|
||||
};
|
||||
ctx.write_auth(&initial_auth)?;
|
||||
|
||||
let fresh_refresh = Utc::now() - Duration::days(1);
|
||||
let disk_tokens = build_tokens("disk-access-token", "disk-refresh-token");
|
||||
let disk_auth = AuthDotJson {
|
||||
auth_mode: Some(AuthMode::Chatgpt),
|
||||
openai_api_key: None,
|
||||
tokens: Some(disk_tokens.clone()),
|
||||
last_refresh: Some(fresh_refresh),
|
||||
};
|
||||
save_auth(
|
||||
ctx.codex_home.path(),
|
||||
&disk_auth,
|
||||
AuthCredentialsStoreMode::File,
|
||||
)?;
|
||||
|
||||
let cached_auth = ctx
|
||||
.auth_manager
|
||||
.auth()
|
||||
.await
|
||||
.context("auth should reload from disk")?;
|
||||
let cached = cached_auth
|
||||
.get_token_data()
|
||||
.context("token data should reload from disk")?;
|
||||
assert_eq!(cached, disk_tokens);
|
||||
|
||||
let stored = ctx.load_auth()?;
|
||||
assert_eq!(stored, disk_auth);
|
||||
|
||||
server.verify().await;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[serial_test::serial(auth_refresh)]
|
||||
#[tokio::test]
|
||||
async fn refresh_token_returns_permanent_error_for_expired_refresh_token() -> Result<()> {
|
||||
|
||||
Reference in New Issue
Block a user