Files
codex/codex-rs/api-client/client.md
jif-oai 7a5786f49f V9
2025-11-10 17:52:04 +00:00

12 KiB
Raw Blame History

codex-api-client: Proposed Design and Refactor Plan

This document proposes a clearer, smaller, and testable structure for codex-api-client, targeting the current pain points:

  • chat.rs and responses.rs are large (6001100 LOC) and mix multiple concerns.
  • SSE parsing, HTTP/retry logic, payload building, and domain event mapping are tangled.
  • Azure/ChatGPT quirks live alongside core logic.

The goals here are separation of concerns, shared streaming and retry logic, and focused files that are easy to read and test.

Overview

  • Keep the public API surface compatible: ApiClient trait, ResponsesApiClient, ChatCompletionsApiClient, ResponseStream, and ResponseEvent remain.
  • Internally, split responsibilities into small modules that both clients reuse.
  • Centralize SSE framing and retry/backoff, so chat and responses clients focus only on:
    • payload construction (Prompt → wire payload)
    • mapping wire SSE events → ResponseEvent

Target Module Layout

api-client/src/
  api.rs                       # ApiClient trait (unchanged)
  error.rs                     # Error/Result (unchanged interface)
  stream.rs                    # ResponseEvent/ResponseStream (unchanged interface)
  aggregate.rs                 # Aggregation mode (unchanged interface)
  model_provider.rs            # Provider config + headers (unchanged interface)
  routed_client.rs             # Facade routing to Chat/Responses (unchanged interface)

  client/
    mod.rs                     # Re-exports + shared types
    config.rs                  # Common config structs/builders
    http.rs                    # Request building, retries, backoff; returns ByteStream
    rate_limits.rs             # Header parsing → RateLimitSnapshot
    sse.rs                     # Generic SSE line framing + idle-timeout handling
    fixtures.rs                # stream_from_fixture (move from responses.rs)

  payload/
    chat.rs                    # Prompt → Chat Completions JSON
    responses.rs               # Prompt → Responses JSON (+ Azure quirks)
    tools.rs                   # Tool schema conversions and helpers

  decode/
    chat.rs                    # Chat SSE JSON → ResponseEvent (+ function-call state)
    responses.rs               # Responses SSE JSON → ResponseEvent

  clients/
    chat.rs                    # ChatCompletionsApiClient (thin; delegates to payload/http/decode)
    responses.rs               # ResponsesApiClient (thin; delegates to payload/http/decode)

Notes

  • Modules are organized by responsibility. The clients/ layer becomes very small.
  • client/http.rs owns retries/backoff, request building, headers, and returns a Stream<Item = Result<Bytes>>.
  • client/sse.rs owns SSE framing and idle-timeout. It surfaces framed JSON strings to decoders.
  • decode/* mappers transform framed JSON into ResponseEvent using only parsing/state.
  • payload/* generate request JSON. Azure and tool-shape specifics live here.
  • client/rate_limits.rs parses headers and emits a ResponseEvent::RateLimits once, near stream start.
  • client/fixtures.rs provides the file-backed stream used by tests and local dev.

Trait-Based Core

Introduce small traits for payload construction and decoding to maximize reuse and make the concrete Chat/Responses clients thin bindings.

  • PayloadBuilder

    • fn build(&self, prompt: &Prompt) -> Result<serde_json::Value>
    • Implementations: payload::chat::Builder, payload::responses::Builder.
  • ResponseDecoder

    • Consumes framed SSE JSON and emits ResponseEvents.
    • Suggested interface:
      • fn on_frame(&mut self, json: &str, tx: &mpsc::Sender<Result<ResponseEvent>>, otel: &OtelEventManager) -> Result<()>
      • Implementations: decode::chat::Decoder, decode::responses::Decoder.
  • Optional adapters

    • RateLimitProvider: fn parse(&self, headers: &HeaderMap) -> Option<RateLimitSnapshot>
    • RequestCustomizer: per-API header tweaks (e.g., Conversations/Session headers for Responses).

With these traits, a generic client wrapper can stitch components together:

struct GenericClient<PB, DEC> {
  http: RequestExecutor,
  payload: PB,
  decoder: DEC,
  idle: Duration,
  otel: OtelEventManager,
}

impl<PB: PayloadBuilder, DEC: ResponseDecoder> GenericClient<PB, DEC> {
  async fn stream(&self, prompt: &Prompt) -> Result<ResponseStream> {
    let payload = self.payload.build(prompt)?;
    let (headers, bytes) = self.http.execute_stream(payload, prompt).await?;
    if let Some(snapshot) = rate_limits::parse(&headers) { /* emit event */ }
    let sse_stream = sse::frame(bytes, self.idle, self.otel.clone());
    // spawn: for each framed JSON chunk → self.decoder.on_frame(...)
    /* return ResponseStream */
  }
}

Chat/Responses become type aliases or thin wrappers around GenericClient with the appropriate PayloadBuilder and ResponseDecoder.

Responsibility Boundaries

  • Clients (clients/chat.rs, clients/responses.rs)

    • Validate prompt constraints (e.g., Chat lacks output_schema).
    • Build payload via payload::*.
    • Build and send request via client/http.rs.
    • Create an SSE pipeline: http::stream(...) → sse::frame(...) → decode::<api>::map(...).
    • Forward ResponseEvents to the mpsc channel.
  • HTTP (client/http.rs)

    • RequestExecutor::execute_stream(req: RequestSpec) -> Result<(Headers, ByteStream)>.
    • Injects auth/session headers and provider headers via ModelProviderInfo.
    • Centralized retry policy for non-2xx, 429, 401, 5xx, and transport errors.
    • Handles Retry-After and exponential backoff (backoff()).
    • Returns first successful responses headers and stream; does not parse SSE.
  • SSE (client/sse.rs)

    • Takes a Stream<Item = Result<Bytes>> and produces framed JSON strings by handling data: lines and chunk boundaries.
    • Enforces idle timeout and signals early stream termination errors.
    • Does no schema parsing; just a robust line/framing codec.
  • Decoders (decode/chat.rs, decode/responses.rs)

    • Take framed JSON string(s) and emit ResponseEvents.
    • Own API-specific state machines: e.g., Chat function-call accumulation; Responses “event-shaped” and “field-shaped” variants.
    • No networking, no backoff, no channels.
  • Payload builders (payload/chat.rs, payload/responses.rs, payload/tools.rs)

    • Convert Prompt to provider-specific JSON (Chat/Responses). Keep pure and deterministic.
    • Azure-specific adjustments (e.g., attach item IDs) live here.
  • Rate limits (client/rate_limits.rs)

    • Parse headers to RateLimitSnapshot.
    • Emit a single ResponseEvent::RateLimits at stream start when present.

Stream Pipeline

ByteStream (reqwest) → sse::frame (idle timeout, data: framing) → decode::<api> → ResponseEvent

Pseudocode for both clients:

let (headers, byte_stream) = http.execute_stream(request_spec).await?;
if let Some(snapshot) = rate_limits::parse(&headers) {
    tx.send(Ok(ResponseEvent::RateLimits(snapshot))).await.ok();
}
let sse_stream = sse::frame(byte_stream, idle_timeout, otel.clone());
tokio::spawn(decode::<Api>::run(sse_stream, tx.clone(), otel.clone()));
Ok(ResponseStream { rx_event })

Where decode::<Api>::run is API-specific mapping of framed JSON into ResponseEvents.

Incremental Refactor Plan

Do this in small, safe steps. Public API stays stable at each step.

  1. Introduce traits
  • Add PayloadBuilder and ResponseDecoder traits.
  • Provide initial implementations backed by existing code paths to minimize churn.
  1. Extract shared helpers
  • Move rate-limit parsing from responses.rs to client/rate_limits.rs.
  • Move stream_from_fixture to client/fixtures.rs.
  • Keep old re-exports from lib.rs to avoid churn.
  1. Isolate SSE framing
  • Extract line framing + idle-timeout from responses.rs::process_sse into client/sse.rs.
  • Have responses.rs use sse::frame and keep its own JSON mapping for now.
  1. Centralize HTTP execution
  • Create client/http.rs with RequestExecutor handling retries/backoff and returning (headers, stream).
  • Switch responses.rs to use it.
  • Align Chat client to use RequestExecutor as well.
  1. Split JSON mapping into decoders
  • Move JSON → ResponseEvent mapping from responses.rs to decode/responses.rs.
  • Do the same for Chat (chat.rsdecode/chat.rs).
  1. Extract payload builders
  • Move payload JSON construction into payload/chat.rs and payload/responses.rs.
  • Move tool helpers into payload/tools.rs.
  1. Thin the clients
  • Create clients/chat.rs and clients/responses.rs that glue together payload → http → sse → decode.
  • Keep existing type names and impl ApiClient blocks; only relocate logic behind them.
  1. Clean-up and local boundaries
  • Remove now-unused code paths from the original large files.
  • Ensure mod declarations reflect the new module structure.
  1. Tests and validation
  • Unit-test sse::frame against split and concatenated data: lines.
  • Unit-test both decoders with small fixtures for typical and edge cases.
  • Unit-test payload builders on prompts containing messages, images, tools, and reasoning.
  • Keep existing integration tests using stream_from_fixture.

File Size Targets (post-refactor)

  • clients/chat.rs: ~100150 LOC
  • clients/responses.rs: ~150200 LOC
  • decode/chat.rs: ~200250 LOC (function-call state lives here)
  • decode/responses.rs: ~250300 LOC (event/field-shaped mapping)
  • client/http.rs: ~150200 LOC (shared retries)
  • client/sse.rs: ~120160 LOC (framing + timeout)
  • payload/chat.rs: ~120180 LOC
  • payload/responses.rs: ~120160 LOC

Error Handling and Retries

  • Single retry policy in client/http.rs:
    • Retry 429/401/5xx with Retry-After when present or with exponential backoff.
    • Transport errors (DNS/reset/timeouts) are retryable up to provider-configured attempts.
    • Non-retryable statuses return UnexpectedStatus with body for diagnosis.
  • decode/* surface protocol-specific “quota/context window exceeded” errors as stable messages already recognized by callers.

Instrumentation

  • sse::frame triggers idle-timeout failures and marks event kinds only when actual JSON events appear; decoders record specific kinds (e.g., response.completed).
  • http::execute_stream wraps the request with otel_event_manager.log_request(...) and populates request_id when applicable.

Azure and ChatGPT Specifics

  • Keep all Azure id attachment logic in payload/responses.rs.
  • Keep ChatGPT auth header handling in http.rs via AuthProvider (unchanged trait), based on RequestSpecs context.

Configuration

Optionally introduce typed builders for client configs in client/config.rs to reduce parameter plumbing and make defaults explicit:

ResponsesConfig::builder()
  .provider(provider)
  .model(model)
  .conversation_id(conv_id)
  .otel(otel)
  .auth_provider(auth)
  .build();

Builder is additive; existing constructors remain.

Backpressure and Channels

  • Keep channel capacity at 1600 (as today) but make it a constant inside clients/* so we can tune independently per client.
  • Decoders emit OutputItemAdded before subsequent deltas for the same item when required by downstream consumers.

Migration Notes

  • Public re-exports in lib.rs remain stable.
  • Module moves are internal; no external callers need to change imports.
  • When moving functions, preserve names and signatures where feasible to minimize diff churn.

Acceptance Criteria

  • Both Chat and Responses clients reduce to thin orchestration files.
  • SSE framing, retries, and rate-limit parsing exist exactly once and are used by both clients.
  • All behavior remains functionally equivalent (or better tested) after refactor.
  • New unit tests cover framing, decoders, and payload builders.

Open Questions

  • Should aggregate.rs own more of the delta → aggregated assembly, now that both decoders emit the same ResponseEvent kinds? For this iteration, keep as-is.
  • Should we expose a single unified Client that auto-selects Chat/Responses by provider? We already have routed_client; keep it stable and thin it later using the new internals.
  • Do we want to expose backoff policy knobs at runtime? For now, keep provider-driven.

This plan preserves the external API while making internals smaller, reusable, and easier to test. It can be applied incrementally with meaningful checkpoints and test coverage increases at each step.