Files
codex/codex-rs/exec-server/src
starr-openai de80fa6e31 Reconnect disconnected exec-server websocket clients with fresh sessions (#23867)
## Summary
- replace the one-shot lazy remote exec-server cache with a
lock-protected current client
- when the cached websocket client is already disconnected, create one
fresh websocket client/session on the next `get()`
- keep existing disconnect failure behavior for old process sessions and
HTTP body streams; do not add session resume or request retry

## Why
The prior PR direction was trying to grow into session restore: resume
the old `session_id`, preserve existing process handles, and add
reconnect retry policy. That is more machinery than we want for this
slice.

For now, the useful minimum is simpler: later fresh remote operations
should not be stuck behind a dead cached websocket client, but anything
already attached to the dead connection should fail loudly through the
existing disconnect path. The server already has detached-session
cleanup via its existing TTL, so this PR does not need to add
client-side session preservation.

## What Changed
- `LazyRemoteExecServerClient::get()` now keeps the current concrete
client in a small mutex-protected cache plus one async connect lock.
- If that cached client is still connected, `get()` returns it.
- If that cached websocket client has observed the transport close,
`get()` creates a brand-new websocket client with a brand-new
exec-server session and replaces the cache.
- If that cached client is stdio-backed, behavior stays one-shot: the
dead client is returned and later work surfaces the existing disconnect
error.
- No `resume_session_id`, backoff, request replay, or existing
`RemoteExecProcess` rebinding is added here.
- Added focused websocket coverage that proves two concurrent `get()`
calls after disconnect share one fresh replacement client/session.
2026-05-21 18:43:45 +02:00
..