# codex-api-client: Proposed Design and Refactor Plan
This document proposes a clearer, smaller, and more testable structure for `codex-api-client`, targeting the current pain points:

- `chat.rs` and `responses.rs` are large (600–1100 LOC) and mix multiple concerns.
- SSE parsing, HTTP/retry logic, payload building, and domain event mapping are tangled together.
- Azure/ChatGPT quirks live alongside core logic.

The goals are separation of concerns, shared streaming and retry logic, and focused files that are easy to read and test.
## Overview
- Keep the public API surface compatible: the `ApiClient` trait, `ResponsesApiClient`, `ChatCompletionsApiClient`, `ResponseStream`, and `ResponseEvent` all remain.
- Internally, split responsibilities into small modules that both clients reuse.
- Centralize SSE framing and retry/backoff, so the `chat` and `responses` clients focus only on:
  - payload construction (Prompt → wire payload)
  - mapping wire SSE events → `ResponseEvent`
## Target Module Layout
```text
api-client/src/
  api.rs             # ApiClient trait (unchanged)
  error.rs           # Error/Result (unchanged interface)
  stream.rs          # ResponseEvent/ResponseStream (unchanged interface)
  aggregate.rs       # Aggregation mode (unchanged interface)
  model_provider.rs  # Provider config + headers (unchanged interface)
  routed_client.rs   # Facade routing to Chat/Responses (unchanged interface)
  client/
    mod.rs           # Re-exports + shared types
    config.rs        # Common config structs/builders
    http.rs          # Request building, retries, backoff; returns ByteStream
    rate_limits.rs   # Header parsing → RateLimitSnapshot
    sse.rs           # Generic SSE line framing + idle-timeout handling
    fixtures.rs      # stream_from_fixture (moved from responses.rs)
  payload/
    chat.rs          # Prompt → Chat Completions JSON
    responses.rs     # Prompt → Responses JSON (+ Azure quirks)
    tools.rs         # Tool schema conversions and helpers
  decode/
    chat.rs          # Chat SSE JSON → ResponseEvent (+ function-call state)
    responses.rs     # Responses SSE JSON → ResponseEvent
  clients/
    chat.rs          # ChatCompletionsApiClient (thin; delegates to payload/http/decode)
    responses.rs     # ResponsesApiClient (thin; delegates to payload/http/decode)
```
### Notes

- Modules are organized by responsibility; the `clients/` layer becomes very small.
- `client/http.rs` owns retries/backoff, request building, and headers, and returns a `Stream<Item = Result<Bytes>>`.
- `client/sse.rs` owns SSE framing and idle-timeout handling. It surfaces framed JSON strings to decoders.
- `decode/*` mappers transform framed JSON into `ResponseEvent` using only parsing and local state.
- `payload/*` modules generate request JSON. Azure and tool-shape specifics live here.
- `client/rate_limits.rs` parses headers and emits a `ResponseEvent::RateLimits` once, near stream start.
- `client/fixtures.rs` provides the file-backed stream used by tests and local dev.
## Trait-Based Core
Introduce small traits for payload construction and decoding to maximize reuse and make the concrete Chat/Responses clients thin bindings.
- `PayloadBuilder`
  - `fn build(&self, prompt: &Prompt) -> Result<serde_json::Value>`
  - Implementations: `payload::chat::Builder`, `payload::responses::Builder`.
- `ResponseDecoder`
  - Consumes framed SSE JSON and emits `ResponseEvent`s.
  - Suggested interface: `fn on_frame(&mut self, json: &str, tx: &mpsc::Sender<Result<ResponseEvent>>, otel: &OtelEventManager) -> Result<()>`
  - Implementations: `decode::chat::Decoder`, `decode::responses::Decoder`.
- Optional adapters
  - `RateLimitProvider`: `fn parse(&self, headers: &HeaderMap) -> Option<RateLimitSnapshot>`
  - `RequestCustomizer`: per-API header tweaks (e.g., Conversations/Session headers for Responses).
With these traits, a generic client wrapper can stitch components together:
```rust
struct GenericClient<PB, DEC> {
    http: RequestExecutor,
    payload: PB,
    decoder: DEC,
    idle: Duration,
    otel: OtelEventManager,
}

impl<PB: PayloadBuilder, DEC: ResponseDecoder> GenericClient<PB, DEC> {
    async fn stream(&self, prompt: &Prompt) -> Result<ResponseStream> {
        let payload = self.payload.build(prompt)?;
        let (headers, bytes) = self.http.execute_stream(payload, prompt).await?;
        if let Some(snapshot) = rate_limits::parse(&headers) { /* emit event */ }
        let sse_stream = sse::frame(bytes, self.idle, self.otel.clone());
        // spawn: for each framed JSON chunk → self.decoder.on_frame(...)
        /* return ResponseStream */
    }
}
```
Chat/Responses become type aliases or thin wrappers around `GenericClient` with the appropriate `PayloadBuilder` and `ResponseDecoder`.
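To make the trait shapes concrete, here is a minimal, dependency-free sketch of `PayloadBuilder` with a toy Chat implementation. The `Prompt` struct and the use of `String` in place of `serde_json::Value` are simplifications for illustration only; the real crate's types are richer.

```rust
// Stand-in prompt type; the crate's real `Prompt` carries tools, images, reasoning, etc.
struct Prompt {
    model: String,
    user_message: String,
}

// Proposed trait shape: pure Prompt -> wire payload, no I/O.
// (Sketched with String instead of serde_json::Value to stay dependency-free.)
trait PayloadBuilder {
    fn build(&self, prompt: &Prompt) -> Result<String, String>;
}

// Hypothetical Chat Completions builder.
struct ChatBuilder;

impl PayloadBuilder for ChatBuilder {
    fn build(&self, prompt: &Prompt) -> Result<String, String> {
        // A real builder would serialize the full message list, tools, and options.
        Ok(format!(
            r#"{{"model":"{}","messages":[{{"role":"user","content":"{}"}}],"stream":true}}"#,
            prompt.model, prompt.user_message
        ))
    }
}

fn chat_payload(model: &str, msg: &str) -> String {
    ChatBuilder
        .build(&Prompt { model: model.into(), user_message: msg.into() })
        .unwrap()
}
```

Because `build` is pure, payload builders can be unit-tested without any network or channel machinery.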
## Responsibility Boundaries
- Clients (`clients/chat.rs`, `clients/responses.rs`)
  - Validate prompt constraints (e.g., Chat lacks `output_schema`).
  - Build payload via `payload::*`.
  - Build and send the request via `client/http.rs`.
  - Create an SSE pipeline: `http::stream(...) → sse::frame(...) → decode::<api>::map(...)`.
  - Forward `ResponseEvent`s to the `mpsc` channel.
- HTTP (`client/http.rs`)
  - `RequestExecutor::execute_stream(req: RequestSpec) -> Result<(Headers, ByteStream)>`.
  - Injects auth/session headers and provider headers via `ModelProviderInfo`.
  - Centralized retry policy for non-2xx, 429, 401, 5xx, and transport errors.
  - Handles `Retry-After` and exponential backoff (`backoff()`).
  - Returns the first successful response's headers and stream; does not parse SSE.
- SSE (`client/sse.rs`)
  - Takes a `Stream<Item = Result<Bytes>>` and produces framed JSON strings by handling `data:` lines and chunk boundaries.
  - Enforces the idle timeout and signals early stream termination errors.
  - Does no schema parsing; just a robust line/framing codec.
- Decoders (`decode/chat.rs`, `decode/responses.rs`)
  - Take framed JSON strings and emit `ResponseEvent`s.
  - Own API-specific state machines: e.g., Chat function-call accumulation; Responses "event-shaped" and "field-shaped" variants.
  - No networking, no backoff, no channels.
- Payload builders (`payload/chat.rs`, `payload/responses.rs`, `payload/tools.rs`)
  - Convert `Prompt` to provider-specific JSON (Chat/Responses). Keep these pure and deterministic.
  - Azure-specific adjustments (e.g., attaching item IDs) live here.
- Rate limits (`client/rate_limits.rs`)
  - Parse headers to `RateLimitSnapshot`.
  - Emit a single `ResponseEvent::RateLimits` at stream start when present.
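The framing codec described above is the most delicate piece, because network chunks can split a `data:` line at any byte. A minimal sketch of the core buffering logic (idle-timeout and `async` plumbing omitted, names hypothetical):

```rust
// Minimal sketch of the proposed `sse::frame` codec: it buffers raw chunks
// (which may split lines arbitrarily) and yields the payload of each
// complete `data:` line. Idle-timeout handling is omitted here.
struct SseFramer {
    buf: String,
}

impl SseFramer {
    fn new() -> Self {
        SseFramer { buf: String::new() }
    }

    // Feed one network chunk; return any complete `data:` payloads.
    fn push(&mut self, chunk: &str) -> Vec<String> {
        self.buf.push_str(chunk);
        let mut frames = Vec::new();
        // Only consume up to the last newline; the tail may be a partial line.
        while let Some(pos) = self.buf.find('\n') {
            let line: String = self.buf.drain(..=pos).collect();
            let line = line.trim_end();
            if let Some(data) = line.strip_prefix("data:") {
                let data = data.trim_start();
                // Skip the stream terminator and blank keep-alive frames.
                if data != "[DONE]" && !data.is_empty() {
                    frames.push(data.to_string());
                }
            }
        }
        frames
    }
}
```

This is exactly the surface the "split and concatenated `data:` lines" unit tests in the plan below would exercise.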
## Stream Pipeline
```text
ByteStream (reqwest) → sse::frame (idle timeout, data: framing) → decode::<api> → ResponseEvent
```
Pseudocode for both clients:
```rust
let (headers, byte_stream) = http.execute_stream(request_spec).await?;
if let Some(snapshot) = rate_limits::parse(&headers) {
    tx.send(Ok(ResponseEvent::RateLimits(snapshot))).await.ok();
}
let sse_stream = sse::frame(byte_stream, idle_timeout, otel.clone());
tokio::spawn(decode::<Api>::run(sse_stream, tx.clone(), otel.clone()));
Ok(ResponseStream { rx_event })
```
Here `decode::<Api>::run` is the API-specific mapping of framed JSON into `ResponseEvent`s.
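As an example of the API-specific state a decoder owns, Chat streams tool calls as name/argument-fragment deltas that must be accumulated before one complete call can be emitted. A simplified sketch (field names and shapes are stand-ins for the real wire format):

```rust
// Sketch of the function-call accumulation state a Chat decoder would keep.
// Real deltas arrive as parsed JSON; here we take the relevant fields directly.
#[derive(Default)]
struct FunctionCallState {
    name: Option<String>,
    arguments: String,
}

impl FunctionCallState {
    // Apply one streamed delta: the name usually arrives once, the
    // arguments arrive as JSON fragments that concatenate into one document.
    fn apply_delta(&mut self, name: Option<&str>, args_fragment: Option<&str>) {
        if let Some(n) = name {
            self.name = Some(n.to_string());
        }
        if let Some(a) = args_fragment {
            self.arguments.push_str(a);
        }
    }

    // Called on the terminal chunk; returns the assembled call, if any.
    fn finish(self) -> Option<(String, String)> {
        self.name.map(|n| (n, self.arguments))
    }
}
```

Keeping this state machine free of networking and channels is what makes `decode/chat.rs` independently unit-testable.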
## Incremental Refactor Plan
Do this in small, safe steps. Public API stays stable at each step.
1. Introduce traits
   - Add the `PayloadBuilder` and `ResponseDecoder` traits.
   - Provide initial implementations backed by existing code paths to minimize churn.
2. Extract shared helpers
   - Move rate-limit parsing from `responses.rs` to `client/rate_limits.rs`.
   - Move `stream_from_fixture` to `client/fixtures.rs`.
   - Keep old re-exports from `lib.rs` to avoid churn.
3. Isolate SSE framing
   - Extract line framing + idle-timeout from `responses.rs::process_sse` into `client/sse.rs`.
   - Have `responses.rs` use `sse::frame` and keep its own JSON mapping for now.
4. Centralize HTTP execution
   - Create `client/http.rs` with a `RequestExecutor` handling retries/backoff and returning `(headers, stream)`.
   - Switch `responses.rs` to use it.
   - Align the Chat client to use `RequestExecutor` as well.
5. Split JSON mapping into decoders
   - Move JSON → `ResponseEvent` mapping from `responses.rs` to `decode/responses.rs`.
   - Do the same for Chat (`chat.rs` → `decode/chat.rs`).
6. Extract payload builders
   - Move payload JSON construction into `payload/chat.rs` and `payload/responses.rs`.
   - Move tool helpers into `payload/tools.rs`.
7. Thin the clients
   - Create `clients/chat.rs` and `clients/responses.rs` that glue together payload → http → sse → decode.
   - Keep existing type names and `impl ApiClient` blocks; only relocate the logic behind them.
8. Clean up and tighten local boundaries
   - Remove now-unused code paths from the original large files.
   - Ensure `mod` declarations reflect the new module structure.
9. Tests and validation
   - Unit-test `sse::frame` against split and concatenated `data:` lines.
   - Unit-test both decoders with small fixtures for typical and edge cases.
   - Unit-test payload builders on prompts containing messages, images, tools, and reasoning.
   - Keep existing integration tests using `stream_from_fixture`.
## File Size Targets (post-refactor)

- `clients/chat.rs`: ~100–150 LOC
- `clients/responses.rs`: ~150–200 LOC
- `decode/chat.rs`: ~200–250 LOC (function-call state lives here)
- `decode/responses.rs`: ~250–300 LOC (event/field-shaped mapping)
- `client/http.rs`: ~150–200 LOC (shared retries)
- `client/sse.rs`: ~120–160 LOC (framing + timeout)
- `payload/chat.rs`: ~120–180 LOC
- `payload/responses.rs`: ~120–160 LOC
## Error Handling and Retries

- A single retry policy lives in `client/http.rs`:
  - Retry 429/401/5xx using `Retry-After` when present, otherwise exponential backoff.
  - Transport errors (DNS/reset/timeouts) are retryable up to provider-configured attempts.
  - Non-retryable statuses return `UnexpectedStatus` with the body for diagnosis.
- `decode/*` surface protocol-specific "quota/context window exceeded" errors as stable messages already recognized by callers.
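The delay calculation that policy implies can be sketched as follows. The base and cap values here are illustrative defaults, not the crate's actual tuning, and a production policy would add jitter:

```rust
use std::time::Duration;

// Sketch of the retry-delay policy: honor `Retry-After` when the server
// sends it, otherwise use capped exponential backoff.
fn retry_delay(attempt: u32, retry_after: Option<Duration>) -> Duration {
    if let Some(d) = retry_after {
        // Server-provided hint always wins.
        return d;
    }
    let base_ms: u64 = 200;
    let cap_ms: u64 = 10_000;
    // Double per attempt (200ms, 400ms, 800ms, ...), capped to avoid overflow
    // and unbounded waits.
    let backoff = base_ms.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(backoff.min(cap_ms))
}
```

Keeping this in one place in `client/http.rs` means both clients retry identically and the policy can be tuned (or made provider-driven) without touching decoders or payload builders.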
## Instrumentation

- `sse::frame` triggers idle-timeout failures and marks event kinds only when actual JSON events appear; decoders record specific kinds (e.g., `response.completed`).
- `http::execute_stream` wraps the request with `otel_event_manager.log_request(...)` and populates `request_id` when applicable.
## Azure and ChatGPT Specifics

- Keep all Azure ID-attachment logic in `payload/responses.rs`.
- Keep ChatGPT auth header handling in `http.rs` via `AuthProvider` (unchanged trait), based on `RequestSpec`'s context.
## Configuration

Optionally introduce typed builders for client configs in `client/config.rs` to reduce parameter plumbing and make defaults explicit:

```rust
ResponsesConfig::builder()
    .provider(provider)
    .model(model)
    .conversation_id(conv_id)
    .otel(otel)
    .auth_provider(auth)
    .build();
```

The builder is additive; existing constructors remain.
## Backpressure and Channels

- Keep the channel capacity at 1600 (as today), but make it a constant inside `clients/*` so it can be tuned independently per client.
- Decoders emit `OutputItemAdded` before subsequent deltas for the same item when required by downstream consumers.
## Migration Notes

- Public re-exports in `lib.rs` remain stable.
- Module moves are internal; no external callers need to change imports.
- When moving functions, preserve names and signatures where feasible to minimize diff churn.
## Acceptance Criteria
- Both Chat and Responses clients reduce to thin orchestration files.
- SSE framing, retries, and rate-limit parsing exist exactly once and are used by both clients.
- All behavior remains functionally equivalent (or better tested) after refactor.
- New unit tests cover framing, decoders, and payload builders.
## Open Questions

- Should `aggregate.rs` own more of the delta → aggregated assembly, now that both decoders emit the same `ResponseEvent` kinds? For this iteration, keep it as-is.
- Should we expose a single unified `Client` that auto-selects Chat/Responses by provider? We already have `routed_client`; keep it stable and thin it later using the new internals.
- Do we want to expose backoff policy knobs at runtime? For now, keep it provider-driven.
This plan preserves the external API while making internals smaller, reusable, and easier to test. It can be applied incrementally with meaningful checkpoints and test coverage increases at each step.