[codex-backend] Make thread metadata updates tolerate pending backfill (#16877)

### Summary
Fix `thread/metadata/update` so it can still patch stored thread
metadata when the list/backfill-gated `get_state_db(...)` path is
unavailable.

What was happening:
- The app logs showed `thread/metadata/update` failing with `sqlite
state db unavailable for thread ...`.
- This was not isolated to one bad thread. Once the failure started for
a user, branch metadata updates failed 100% of the time for that user.
- Reports were staggered across users, which points to the local app-server
/ local SQLite state rather than one global server-side failure.
- Turns could still start immediately after the metadata update failed,
which suggests the thread itself was valid and the failure was in the
metadata endpoint DB-handle path.

The fix:
- Keep using the loaded thread state DB and the normal
`get_state_db(...)` fallback first.
- If that still returns `None`, call `StateRuntime::init(...)` directly
to open the state DB for this targeted metadata update path.
- Log the direct state runtime init error if that final fallback also
fails, so future reports have the real DB-open cause instead of only the
generic unavailable error.
- Add a regression test where the DB exists but backfill is not
complete, and verify `thread/metadata/update` can still repair the
stored rollout thread and patch `gitInfo`.
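
The fallback order described in the list above can be sketched in isolation. This is a minimal synchronous model, not the actual app-server code: `StateDb`, `Source`, and `resolve_state_db` are hypothetical names, the closures stand in for `get_state_db(...)` and `StateRuntime::init(...)`, and the sketch returns the underlying error to the caller, whereas the real change logs it and still sends the generic internal error.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real state DB handle.
#[derive(Debug, Clone, PartialEq)]
pub struct StateDb {
    pub home: PathBuf,
}

/// Records which step of the fallback chain produced the handle.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Source {
    LoadedThread,
    GatedLookup,
    DirectInit,
}

/// Resolve a state DB handle using the three-step order from the summary:
/// 1. the handle already attached to the loaded thread,
/// 2. the normal backfill-gated lookup (stand-in for `get_state_db`),
/// 3. a direct open (stand-in for `StateRuntime::init`).
pub fn resolve_state_db(
    loaded: Option<StateDb>,
    gated_lookup: impl FnOnce() -> Option<StateDb>,
    direct_init: impl FnOnce(&Path) -> Result<StateDb, String>,
    sqlite_home: &Path,
) -> Result<(StateDb, Source), String> {
    if let Some(db) = loaded {
        return Ok((db, Source::LoadedThread));
    }
    if let Some(db) = gated_lookup() {
        return Ok((db, Source::GatedLookup));
    }
    match direct_init(sqlite_home) {
        Ok(db) => Ok((db, Source::DirectInit)),
        Err(err) => {
            // Surface the real DB-open cause instead of only a generic message.
            eprintln!(
                "failed to initialize state db for thread metadata update at {}: {err}",
                sqlite_home.display()
            );
            Err(err)
        }
    }
}
```

Each later step runs only when the earlier one yields nothing, so the direct open is never attempted while the existing paths still work.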

Relevant context / suspect PRs:
- #16434 changed state DB startup to run auto-vacuum / incremental
vacuum. This is the most suspicious timing match for per-user, staggered
local SQLite availability failures.
- #16433 dropped the old log table from the state DB and also landed
within the same timing window.
- #13280 introduced this endpoint and made it rely on SQLite for git
metadata without resuming the thread.
- #14859 and #14888 added/consumed persisted model + reasoning effort
metadata. I checked these because of the new thread metadata fields, but
this failure happens before the endpoint reaches thread-row update/load
logic, so they seem less likely as the direct cause.

### Testing
- `cargo fmt -- --config imports_granularity=Item` completed; local
stable rustfmt emitted warnings that `imports_granularity` is unstable
- `cargo test -p codex-app-server thread_metadata_update`
- `git diff --check`
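
As a rough illustration of the scenario the regression test covers (DB present, backfill incomplete), here is a toy in-memory model. It is only a sketch under stated assumptions: `FakeStateDb`, `update_branch`, and the `allow_direct_open` flag are hypothetical and do not mirror the real test harness or the `thread/metadata/update` request shape.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the SQLite state DB.
#[derive(Debug, Default)]
pub struct FakeStateDb {
    /// False while backfill is still pending.
    pub backfill_complete: bool,
    /// Thread id -> stored git branch.
    pub branches: HashMap<String, String>,
}

/// Patch the stored branch for a thread.
/// `allow_direct_open` models the new final fallback: when false, an
/// incomplete backfill makes the DB look unavailable (the old behavior).
pub fn update_branch(
    db: &mut FakeStateDb,
    thread_id: &str,
    branch: &str,
    allow_direct_open: bool,
) -> Result<(), String> {
    if !db.backfill_complete && !allow_direct_open {
        return Err(format!("sqlite state db unavailable for thread {thread_id}"));
    }
    db.branches.insert(thread_id.to_string(), branch.to_string());
    Ok(())
}
```

With backfill pending, the call fails when the direct open is disallowed and succeeds once it is allowed, which is the behavior change the regression test is meant to pin down.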

Author: joeytrasatti-openai
Date: 2026-04-06 13:07:19 -04:00 (committed by GitHub)
Parent: 54dbbb839e
Commit: 4ce97cef02
2 changed files with 71 additions and 0 deletions


@@ -2754,6 +2754,24 @@ impl CodexMessageProcessor {
        if state_db_ctx.is_none() {
            state_db_ctx = get_state_db(&self.config).await;
        }
        if state_db_ctx.is_none() {
            match StateRuntime::init(
                self.config.sqlite_home.clone(),
                self.config.model_provider_id.clone(),
            )
            .await
            {
                Ok(ctx) => {
                    state_db_ctx = Some(ctx);
                }
                Err(err) => {
                    warn!(
                        "failed to initialize state db for thread metadata update at {}: {err}",
                        self.config.sqlite_home.display()
                    );
                }
            }
        }
        let Some(state_db_ctx) = state_db_ctx else {
            self.send_internal_error(
                request_id,