mirror of https://github.com/openai/codex.git synced 2026-05-02 10:26:45 +00:00

jif-oai 7a5786f49f V9

2025-11-10 17:52:04 +00:00


codex-api-client: Proposed Design and Refactor Plan

This document proposes a clearer, smaller, and more testable structure for codex-api-client, targeting the current pain points:

chat.rs and responses.rs are large (600–1100 LOC) and mix multiple concerns.
SSE parsing, HTTP/retry logic, payload building, and domain event mapping are tangled.
Azure/ChatGPT quirks live alongside core logic.

The goals here are separation of concerns, shared streaming and retry logic, and focused files that are easy to read and test.

Overview

Keep the public API surface compatible: ApiClient trait, ResponsesApiClient, ChatCompletionsApiClient, ResponseStream, and ResponseEvent remain.
Internally, split responsibilities into small modules that both clients reuse.
Centralize SSE framing and retry/backoff, so chat and responses clients focus only on:
- payload construction (Prompt → wire payload)
- mapping wire SSE events → ResponseEvent

Target Module Layout

api-client/src/
  api.rs                       # ApiClient trait (unchanged)
  error.rs                     # Error/Result (unchanged interface)
  stream.rs                    # ResponseEvent/ResponseStream (unchanged interface)
  aggregate.rs                 # Aggregation mode (unchanged interface)
  model_provider.rs            # Provider config + headers (unchanged interface)
  routed_client.rs             # Facade routing to Chat/Responses (unchanged interface)

  client/
    mod.rs                     # Re-exports + shared types
    config.rs                  # Common config structs/builders
    http.rs                    # Request building, retries, backoff; returns ByteStream
    rate_limits.rs             # Header parsing → RateLimitSnapshot
    sse.rs                     # Generic SSE line framing + idle-timeout handling
    fixtures.rs                # stream_from_fixture (move from responses.rs)

  payload/
    chat.rs                    # Prompt → Chat Completions JSON
    responses.rs               # Prompt → Responses JSON (+ Azure quirks)
    tools.rs                   # Tool schema conversions and helpers

  decode/
    chat.rs                    # Chat SSE JSON → ResponseEvent (+ function-call state)
    responses.rs               # Responses SSE JSON → ResponseEvent

  clients/
    chat.rs                    # ChatCompletionsApiClient (thin; delegates to payload/http/decode)
    responses.rs               # ResponsesApiClient (thin; delegates to payload/http/decode)

Notes

Modules are organized by responsibility. The clients/ layer becomes very small.
client/http.rs owns retries/backoff, request building, headers, and returns a Stream<Item = Result<Bytes>>.
client/sse.rs owns SSE framing and idle-timeout. It surfaces framed JSON strings to decoders.
decode/* mappers transform framed JSON into ResponseEvent using only parsing/state.
payload/* generate request JSON. Azure and tool-shape specifics live here.
client/rate_limits.rs parses headers and emits a ResponseEvent::RateLimits once, near stream start.
client/fixtures.rs provides the file-backed stream used by tests and local dev.
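
To make the client/rate_limits.rs note concrete, here is a minimal std-only sketch of header parsing. The header names (x-ratelimit-used-percent, x-ratelimit-window-minutes) and the RateLimitSnapshot fields are hypothetical placeholders, and headers are modeled as plain string pairs rather than a real HeaderMap.

```rust
/// Hypothetical snapshot shape; the real RateLimitSnapshot may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct RateLimitSnapshot {
    pub used_percent: f64,
    pub window_minutes: Option<u64>,
}

/// Case-insensitive lookup over (name, value) pairs, trimming the value.
fn header<'a>(headers: &'a [(&'a str, &'a str)], name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|(_, v)| v.trim())
}

/// Returns a snapshot only when the required header is present and numeric.
/// Header names below are assumptions made for this sketch.
pub fn parse_rate_limits(headers: &[(&str, &str)]) -> Option<RateLimitSnapshot> {
    let used_percent = header(headers, "x-ratelimit-used-percent")?
        .parse::<f64>()
        .ok()?;
    // The window header is optional; a malformed value is treated as absent.
    let window_minutes = header(headers, "x-ratelimit-window-minutes")
        .and_then(|v| v.parse::<u64>().ok());
    Some(RateLimitSnapshot {
        used_percent,
        window_minutes,
    })
}
```

Returning Option keeps the "emit once, only when present" rule at the call site: the client sends a single RateLimits event if and only if parsing yields Some.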

Trait-Based Core

Introduce small traits for payload construction and decoding to maximize reuse and make the concrete Chat/Responses clients thin bindings.

PayloadBuilder
- fn build(&self, prompt: &Prompt) -> Result<serde_json::Value>
- Implementations: payload::chat::Builder, payload::responses::Builder.
ResponseDecoder
- Consumes framed SSE JSON and emits ResponseEvents.
- Suggested interface:
  - fn on_frame(&mut self, json: &str, tx: &mpsc::Sender<Result<ResponseEvent>>, otel: &OtelEventManager) -> Result<()>
  - Implementations: decode::chat::Decoder, decode::responses::Decoder.
Optional adapters
- RateLimitProvider: fn parse(&self, headers: &HeaderMap) -> Option<RateLimitSnapshot>
- RequestCustomizer: per-API header tweaks (e.g., Conversations/Session headers for Responses).

With these traits, a generic client wrapper can stitch components together:

struct GenericClient<PB, DEC> {
  http: RequestExecutor,
  payload: PB,
  decoder: DEC,
  idle: Duration,
  otel: OtelEventManager,
}

impl<PB: PayloadBuilder, DEC: ResponseDecoder> GenericClient<PB, DEC> {
  async fn stream(&self, prompt: &Prompt) -> Result<ResponseStream> {
    let payload = self.payload.build(prompt)?;
    let (headers, bytes) = self.http.execute_stream(payload, prompt).await?;
    if let Some(snapshot) = rate_limits::parse(&headers) { /* emit event */ }
    let sse_stream = sse::frame(bytes, self.idle, self.otel.clone());
    // spawn: for each framed JSON chunk → self.decoder.on_frame(...)
    /* return ResponseStream */
  }
}

Chat/Responses become type aliases or thin wrappers around GenericClient with the appropriate PayloadBuilder and ResponseDecoder.

Responsibility Boundaries

Clients (clients/chat.rs, clients/responses.rs)
- Validate prompt constraints (e.g., Chat lacks output_schema).
- Build payload via payload::*.
- Build and send request via client/http.rs.
- Create an SSE pipeline: RequestExecutor::execute_stream(...) → sse::frame(...) → decode::<api>::run(...).
- Forward ResponseEvents to the mpsc channel.
HTTP (client/http.rs)
- RequestExecutor::execute_stream(req: RequestSpec) -> Result<(Headers, ByteStream)>.
- Injects auth/session headers and provider headers via ModelProviderInfo.
- Centralized retry policy for 429, 401, 5xx, and transport errors; other non-2xx statuses are returned without retrying.
- Handles Retry-After and exponential backoff (backoff()).
- Returns first successful response’s headers and stream; does not parse SSE.
SSE (client/sse.rs)
- Takes a Stream<Item = Result<Bytes>> and produces framed JSON strings by handling data: lines and chunk boundaries.
- Enforces idle timeout and signals early stream termination errors.
- Does no schema parsing; just a robust line/framing codec.
Decoders (decode/chat.rs, decode/responses.rs)
- Take framed JSON string(s) and emit ResponseEvents.
- Own API-specific state machines: e.g., Chat function-call accumulation; Responses “event-shaped” and “field-shaped” variants.
- No networking and no backoff; the only channel a decoder touches is the sender handed to it for emitting events.
Payload builders (payload/chat.rs, payload/responses.rs, payload/tools.rs)
- Convert Prompt to provider-specific JSON (Chat/Responses). Keep pure and deterministic.
- Azure-specific adjustments (e.g., attach item IDs) live here.
Rate limits (client/rate_limits.rs)
- Parse headers to RateLimitSnapshot.
- Emit a single ResponseEvent::RateLimits at stream start when present.
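
The SSE responsibility described above (handling data: lines and chunk boundaries with no schema parsing) can be illustrated with a small incremental framer. This is a synchronous, std-only sketch: the SseFramer name and push method are hypothetical, the idle timeout is left out, and only \n and \r\n line endings are handled.

```rust
/// Incremental SSE framer: feed raw byte chunks, get complete data payloads.
#[derive(Default)]
pub struct SseFramer {
    buf: Vec<u8>,      // bytes of the current, not yet terminated line
    data: Vec<String>, // data: lines of the event being assembled
}

impl SseFramer {
    /// Consumes one chunk and returns every event payload completed by it.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        let mut out = Vec::new();
        for &b in chunk {
            if b != b'\n' {
                self.buf.push(b);
                continue;
            }
            // A full line is available; strip an optional trailing '\r'.
            if self.buf.last() == Some(&b'\r') {
                self.buf.pop();
            }
            // Decoding per complete line keeps multi-byte characters that
            // straddle a chunk boundary intact.
            let line = String::from_utf8_lossy(&self.buf).into_owned();
            self.buf.clear();
            if line.is_empty() {
                // A blank line terminates the event; emit the joined data lines.
                if !self.data.is_empty() {
                    out.push(self.data.join("\n"));
                    self.data.clear();
                }
            } else if let Some(rest) = line.strip_prefix("data:") {
                // One optional space after the colon is not part of the payload.
                let payload = rest.strip_prefix(' ').unwrap_or(rest);
                self.data.push(payload.to_string());
            }
            // Other fields (event:, id:, retry:) and ':' comments are ignored here.
        }
        out
    }
}
```

Buffering raw bytes until a newline arrives is what makes the framer indifferent to where the transport happens to split chunks, which is the property the unit tests for sse::frame are meant to pin down.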

Stream Pipeline

ByteStream (reqwest) → sse::frame (idle timeout, data: framing) → decode::<api> → ResponseEvent

Pseudocode for both clients:

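// CHANNEL_CAPACITY is a per-client constant (see Backpressure and Channels).
let (tx, rx_event) = mpsc::channel(CHANNEL_CAPACITY);
let (headers, byte_stream) = http.execute_stream(request_spec).await?;
if let Some(snapshot) = rate_limits::parse(&headers) {
    // Emitted at most once, before any decoded event.
    tx.send(Ok(ResponseEvent::RateLimits(snapshot))).await.ok();
}
let sse_stream = sse::frame(byte_stream, idle_timeout, otel.clone());
// The decoder task owns the sender, so the stream closes when decoding ends.
tokio::spawn(decode::<Api>::run(sse_stream, tx, otel.clone()));
Ok(ResponseStream { rx_event })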

Where decode::<Api>::run is API-specific mapping of framed JSON into ResponseEvents.
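
As an illustration of what an API-specific run loop might look like, the sketch below accumulates Chat function-call fragments and flushes them on completion. It is a simplified assumption, not the crate's code: JSON parsing is elided (frames arrive as a hypothetical, already-parsed ChatChunk enum), the ResponseEvent variants are reduced stand-ins, and std::sync::mpsc replaces the async channel.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical, already-parsed view of one Chat SSE frame.
pub enum ChatChunk {
    Text(String),
    ToolCallName(String),
    ToolCallArgs(String),
    Finish,
}

/// Reduced stand-in for the crate's ResponseEvent.
#[derive(Debug, PartialEq)]
pub enum ResponseEvent {
    OutputTextDelta(String),
    FunctionCall { name: String, arguments: String },
    Completed,
}

/// Drains frames, accumulating tool-call fragments until the stream finishes.
pub fn run<I>(frames: I, tx: Sender<Result<ResponseEvent, String>>)
where
    I: IntoIterator<Item = ChatChunk>,
{
    let mut name: Option<String> = None;
    let mut args = String::new();
    for chunk in frames {
        match chunk {
            ChatChunk::Text(t) => {
                if tx.send(Ok(ResponseEvent::OutputTextDelta(t))).is_err() {
                    return; // receiver dropped
                }
            }
            ChatChunk::ToolCallName(n) => name = Some(n),
            ChatChunk::ToolCallArgs(a) => args.push_str(&a),
            ChatChunk::Finish => {
                // Flush a pending call before signalling completion.
                if let Some(n) = name.take() {
                    let call = ResponseEvent::FunctionCall {
                        name: n,
                        arguments: std::mem::take(&mut args),
                    };
                    if tx.send(Ok(call)).is_err() {
                        return;
                    }
                }
                let _ = tx.send(Ok(ResponseEvent::Completed));
                return;
            }
        }
    }
    // The frames ran out without a Finish chunk: report early termination.
    let _ = tx.send(Err("stream closed before completion".to_string()));
}
```

All state lives in local variables of run, so the loop can be unit-tested by feeding it a Vec of chunks and collecting the receiver.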

Incremental Refactor Plan

Do this in small, safe steps. Public API stays stable at each step.

1. Introduce traits
- Add PayloadBuilder and ResponseDecoder traits.
- Provide initial implementations backed by existing code paths to minimize churn.

2. Extract shared helpers
- Move rate-limit parsing from responses.rs to client/rate_limits.rs.
- Move stream_from_fixture to client/fixtures.rs.
- Keep old re-exports from lib.rs to avoid churn.

3. Isolate SSE framing
- Extract line framing + idle-timeout from responses.rs::process_sse into client/sse.rs.
- Have responses.rs use sse::frame and keep its own JSON mapping for now.

4. Centralize HTTP execution
- Create client/http.rs with RequestExecutor handling retries/backoff and returning (headers, stream).
- Switch responses.rs to use it.
- Align Chat client to use RequestExecutor as well.

5. Split JSON mapping into decoders
- Move JSON → ResponseEvent mapping from responses.rs to decode/responses.rs.
- Do the same for Chat (chat.rs → decode/chat.rs).

6. Extract payload builders
- Move payload JSON construction into payload/chat.rs and payload/responses.rs.
- Move tool helpers into payload/tools.rs.

7. Thin the clients
- Create clients/chat.rs and clients/responses.rs that glue together payload → http → sse → decode.
- Keep existing type names and impl ApiClient blocks; only relocate logic behind them.

8. Clean-up and local boundaries
- Remove now-unused code paths from the original large files.
- Ensure mod declarations reflect the new module structure.

9. Tests and validation
- Unit-test sse::frame against split and concatenated data: lines.
- Unit-test both decoders with small fixtures for typical and edge cases.
- Unit-test payload builders on prompts containing messages, images, tools, and reasoning.
- Keep existing integration tests using stream_from_fixture.

File Size Targets (post-refactor)

clients/chat.rs: ~100–150 LOC
clients/responses.rs: ~150–200 LOC
decode/chat.rs: ~200–250 LOC (function-call state lives here)
decode/responses.rs: ~250–300 LOC (event/field-shaped mapping)
client/http.rs: ~150–200 LOC (shared retries)
client/sse.rs: ~120–160 LOC (framing + timeout)
payload/chat.rs: ~120–180 LOC
payload/responses.rs: ~120–160 LOC

Error Handling and Retries

Single retry policy in client/http.rs:
- Retry 429/401/5xx, honoring Retry-After when present and falling back to exponential backoff otherwise.
- Transport errors (DNS/reset/timeouts) are retryable up to provider-configured attempts.
- Non-retryable statuses return UnexpectedStatus with body for diagnosis.
decode/* surface protocol-specific “quota/context window exceeded” errors as stable messages already recognized by callers.
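
The retry rules above can be expressed as a small pure function, which keeps the policy testable without any networking. This is a sketch under stated assumptions: RetryDecision and decide are hypothetical names, the base delay and cap are invented values, and this backoff omits jitter, so it may differ from the crate's existing backoff().

```rust
use std::time::Duration;

/// Outcome of the policy for one failed attempt.
#[derive(Debug, PartialEq)]
pub enum RetryDecision {
    RetryAfter(Duration),
    GiveUp,
}

/// Exponential backoff without jitter: base * 2^(attempt - 1), capped.
pub fn backoff(attempt: u32) -> Duration {
    const BASE_MS: u64 = 200; // assumed base delay
    const CAP_MS: u64 = 10_000; // assumed ceiling
    // Clamp the exponent so the shift cannot overflow.
    let exp = attempt.saturating_sub(1).min(16);
    Duration::from_millis((BASE_MS << exp).min(CAP_MS))
}

/// `status` is None for transport errors (DNS, reset, timeout).
/// `attempt` counts from 1; `max_attempts` comes from provider config.
pub fn decide(
    status: Option<u16>,
    retry_after_secs: Option<u64>,
    attempt: u32,
    max_attempts: u32,
) -> RetryDecision {
    let retryable = match status {
        None => true,
        Some(401) | Some(429) => true,
        Some(s) => (500..=599).contains(&s),
    };
    if !retryable || attempt >= max_attempts {
        return RetryDecision::GiveUp;
    }
    match retry_after_secs {
        // A server-provided Retry-After wins over the computed delay.
        Some(secs) => RetryDecision::RetryAfter(Duration::from_secs(secs)),
        None => RetryDecision::RetryAfter(backoff(attempt)),
    }
}
```

With a function of this shape, the executor loop reduces to: send, call decide on failure, sleep for the returned duration or return UnexpectedStatus on GiveUp.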

Instrumentation

sse::frame triggers idle-timeout failures and marks event kinds only when actual JSON events appear; decoders record specific kinds (e.g., response.completed).
http::execute_stream wraps the request with otel_event_manager.log_request(...) and populates request_id when applicable.

Azure and ChatGPT Specifics

Keep all Azure id attachment logic in payload/responses.rs.
Keep ChatGPT auth header handling in http.rs via AuthProvider (unchanged trait), based on RequestSpec’s context.

Configuration

Optionally introduce typed builders for client configs in client/config.rs to reduce parameter plumbing and make defaults explicit:

ResponsesConfig::builder()
  .provider(provider)
  .model(model)
  .conversation_id(conv_id)
  .otel(otel)
  .auth_provider(auth)
  .build();

Builder is additive; existing constructors remain.
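
A typed builder along these lines might look like the sketch below. It is illustrative only: field types are simplified to strings, the otel and auth_provider setters are omitted, and build returns a Result so that a missing required field is reported rather than defaulted, which is a choice of this sketch and not something the plan prescribes.

```rust
/// Stand-in for the real config; field types are simplified to strings.
#[derive(Debug, Clone, PartialEq)]
pub struct ResponsesConfig {
    pub provider: String,
    pub model: String,
    pub conversation_id: Option<String>,
}

/// Collects optional inputs; required ones are checked in build().
#[derive(Default)]
pub struct ResponsesConfigBuilder {
    provider: Option<String>,
    model: Option<String>,
    conversation_id: Option<String>,
}

impl ResponsesConfig {
    pub fn builder() -> ResponsesConfigBuilder {
        ResponsesConfigBuilder::default()
    }
}

impl ResponsesConfigBuilder {
    pub fn provider(mut self, provider: impl Into<String>) -> Self {
        self.provider = Some(provider.into());
        self
    }

    pub fn model(mut self, model: impl Into<String>) -> Self {
        self.model = Some(model.into());
        self
    }

    pub fn conversation_id(mut self, id: impl Into<String>) -> Self {
        self.conversation_id = Some(id.into());
        self
    }

    /// Fails when a required field is missing instead of guessing a default.
    pub fn build(self) -> Result<ResponsesConfig, String> {
        Ok(ResponsesConfig {
            provider: self.provider.ok_or("provider is required")?,
            model: self.model.ok_or("model is required")?,
            conversation_id: self.conversation_id,
        })
    }
}
```

Taking self by value in each setter gives the chained call style shown above while keeping the builder free of interior mutability.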

Backpressure and Channels

Keep channel capacity at 1600 (as today) but make it a constant inside clients/* so we can tune independently per client.
Decoders emit OutputItemAdded before subsequent deltas for the same item when required by downstream consumers.
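
Both points can be sketched together: a bounded channel sized by a named constant, and a small helper that guarantees an item is announced before its first delta. The names CHANNEL_CAPACITY, event_channel, and DeltaEmitter are hypothetical, the ResponseEvent variants are reduced stand-ins, and std::sync::mpsc::sync_channel substitutes for the async bounded channel.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Per-client capacity constant; 1600 mirrors the value quoted above.
pub const CHANNEL_CAPACITY: usize = 1600;

/// Reduced stand-in for the crate's ResponseEvent.
#[derive(Debug, PartialEq)]
pub enum ResponseEvent {
    OutputItemAdded(String),
    OutputTextDelta { item_id: String, delta: String },
}

/// Bounded channel: senders block once CHANNEL_CAPACITY events are queued.
pub fn event_channel() -> (SyncSender<ResponseEvent>, Receiver<ResponseEvent>) {
    sync_channel(CHANNEL_CAPACITY)
}

/// Tracks which items were announced so a delta never precedes its item.
#[derive(Default)]
pub struct DeltaEmitter {
    announced: HashSet<String>,
}

impl DeltaEmitter {
    pub fn emit_delta(
        &mut self,
        tx: &SyncSender<ResponseEvent>,
        item_id: &str,
        delta: &str,
    ) -> Result<(), String> {
        // insert() returns true only the first time an id is seen.
        if self.announced.insert(item_id.to_string()) {
            tx.send(ResponseEvent::OutputItemAdded(item_id.to_string()))
                .map_err(|_| "receiver dropped".to_string())?;
        }
        tx.send(ResponseEvent::OutputTextDelta {
            item_id: item_id.to_string(),
            delta: delta.to_string(),
        })
        .map_err(|_| "receiver dropped".to_string())
    }
}
```

The bounded channel is what provides backpressure: a slow consumer eventually stalls the decoder instead of letting events pile up without limit.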

Migration Notes

Public re-exports in lib.rs remain stable.
Module moves are internal; no external callers need to change imports.
When moving functions, preserve names and signatures where feasible to minimize diff churn.

Acceptance Criteria

Both Chat and Responses clients reduce to thin orchestration files.
SSE framing, retries, and rate-limit parsing exist exactly once and are used by both clients.
All behavior remains functionally equivalent (or better tested) after refactor.
New unit tests cover framing, decoders, and payload builders.

Open Questions

Should aggregate.rs own more of the delta → aggregated assembly, now that both decoders emit the same ResponseEvent kinds? For this iteration, keep as-is.
Should we expose a single unified Client that auto-selects Chat/Responses by provider? We already have routed_client; keep it stable and thin it later using the new internals.
Do we want to expose backoff policy knobs at runtime? For now, keep provider-driven.

This plan preserves the external API while making internals smaller, reusable, and easier to test. It can be applied incrementally with meaningful checkpoints and test coverage increases at each step.
