# exec-server design notes
This document sketches a likely direction for integrating codex-exec-server
with unified exec without baking the full tool-call policy stack into the
server.
The goals are:
- keep exec-server generic and reusable
- keep approval, sandbox, and retry policy in `core`
- preserve the unified-exec event flow the model already depends on
- support retained output caps so polling and snapshot-style APIs do not grow memory without bound
## Unified exec today
Today the flow for LLM-visible interactive execution is:
1. The model sees the `exec_command` and `write_stdin` tools.
2. `UnifiedExecHandler` parses the tool arguments and allocates a process id.
3. `UnifiedExecProcessManager::exec_command(...)` calls `open_session_with_sandbox(...)`.
4. `ToolOrchestrator` drives approval, sandbox selection, managed network approval, and sandbox-denial retry behavior.
5. `UnifiedExecRuntime` builds a `CommandSpec`, asks the current `SandboxAttempt` to transform it into an `ExecRequest`, and passes that resolved request back to the process manager.
6. `open_session_with_exec_env(...)` spawns the process from that resolved `ExecRequest`.
7. Unified exec emits an `ExecCommandBegin` event.
8. Unified exec starts a background output watcher that emits `ExecCommandOutputDelta` events.
9. The initial tool call collects output until the requested yield deadline and returns an `ExecCommandToolOutput` snapshot to the model.
10. If the process is still running, unified exec stores it and later emits `ExecCommandEnd` when the exit watcher fires.
11. A later `write_stdin` tool call writes to the stored process, emits a `TerminalInteraction` event, collects another bounded snapshot, and returns that tool response to the model.
Important observation: the 250ms / 10s yield-window behavior is not really a process-server concern. It is a client-side convenience layer for the LLM tool API. The server should focus on raw process lifecycle and streaming events.
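To make that concrete, the yield-window behavior can live entirely client-side as a small helper that drains a stream of output chunks until a deadline. This is an illustrative sketch, not an existing API: `collect_until_deadline` and the channel-based shape are assumptions.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

/// Collect output chunks until the yield deadline passes or the stream
/// ends. The 250ms / 10s yield policy lives here, in the client, so the
/// server only has to stream raw output.
fn collect_until_deadline(rx: &Receiver<Vec<u8>>, yield_window: Duration) -> Vec<u8> {
    let deadline = Instant::now() + yield_window;
    let mut out = Vec::new();
    loop {
        let now = Instant::now();
        if now >= deadline {
            break;
        }
        match rx.recv_timeout(deadline - now) {
            Ok(chunk) => out.extend_from_slice(&chunk),
            // Timeout or disconnect: yield whatever we have so far.
            Err(_) => break,
        }
    }
    out
}
```

Everything the server needs to support this is already covered by streaming output notifications; the deadline logic never crosses the wire.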
## Proposed boundary
The clean split is:
- exec-server server: process lifecycle, output streaming, retained output caps
- exec-server client: `wait`, `communicate`, yield-window helpers, session bookkeeping
- unified exec in `core`: tool parsing, event emission, approvals, sandboxing, managed networking, retry semantics
If exec-server is used by unified exec later, the boundary should sit between step 5 and step 6 above: after policy has produced a resolved spawn request, but before the actual PTY or pipe spawn.
## Suggested process API
Start simple and explicit:
- `process/start`
- `process/write`
- `process/closeStdin`
- `process/resize`
- `process/terminate`
- `process/wait`
- `process/snapshot`
Server notifications:
- `process/output`
- `process/exited`
- optionally `process/started`
- optionally `process/failed`
Suggested request shapes:
```rust
enum ProcessStartRequest {
    // Raw command; orchestrator-side or already-trusted use.
    Direct(DirectExecSpec),
    // Already resolved by core's approval/sandbox flow; tool-call use.
    Prepared(PreparedExecSpec),
}

struct DirectExecSpec {
    process_id: String, // protocol handle, not an OS pid
    argv: Vec<String>,
    cwd: PathBuf,
    env: HashMap<String, String>,
    arg0: Option<String>,
    io: ProcessIo,
}

struct PreparedExecSpec {
    process_id: String,
    request: PreparedExecRequest,
    io: ProcessIo,
}

enum ProcessIo {
    Pty { rows: u16, cols: u16 },
    Pipe { stdin: StdinMode },
}

enum StdinMode {
    Open,
    Closed,
}

enum TerminateMode {
    Graceful { timeout_ms: u64 },
    Force,
}
```
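One plausible reading of `Graceful`, sketched as a pure escalation plan. The `terminate_plan` helper and the POSIX signal names are illustrative assumptions, not a committed cross-platform design (Windows would need different primitives):

```rust
// Mirrors the TerminateMode enum above, redefined here so the sketch
// is self-contained.
enum TerminateMode {
    Graceful { timeout_ms: u64 },
    Force,
}

/// Returns (signal, delay_ms) steps: Graceful sends SIGTERM now and
/// SIGKILL after the grace period; Force skips straight to SIGKILL.
fn terminate_plan(mode: &TerminateMode) -> Vec<(&'static str, u64)> {
    match mode {
        TerminateMode::Graceful { timeout_ms } => {
            vec![("SIGTERM", 0), ("SIGKILL", *timeout_ms)]
        }
        TerminateMode::Force => vec![("SIGKILL", 0)],
    }
}
```

Keeping the plan pure makes the escalation policy easy to test without spawning real processes.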
Notes:
- `processId` remains a protocol handle, not an OS pid.
- `wait` is a good generic API because many callers want process completion without manually wiring notifications.
- `communicate` is also a reasonable API, but it should probably start as a client helper built on top of `write + closeStdin + wait + snapshot`.
- If an RPC form of `communicate` is added later, it should be a convenience wrapper rather than the primitive execution model.
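A sketch of `communicate` as a client helper over those primitives. The `ExecClient` trait and its signatures are hypothetical stand-ins for whatever the real client exposes, used only to show that no new server RPC is needed:

```rust
/// Hypothetical client-side primitives mirroring the process API above;
/// names and signatures are assumptions, not a final API.
trait ExecClient {
    fn write(&mut self, process_id: &str, data: &[u8]) -> Result<(), String>;
    fn close_stdin(&mut self, process_id: &str) -> Result<(), String>;
    fn wait(&mut self, process_id: &str) -> Result<i32, String>;
    fn snapshot(&mut self, process_id: &str) -> Result<Vec<u8>, String>;
}

/// `communicate` stays a convenience wrapper over the primitives:
/// write the input, close stdin, wait for exit, then read the transcript.
fn communicate(
    client: &mut dyn ExecClient,
    process_id: &str,
    input: &[u8],
) -> Result<(i32, Vec<u8>), String> {
    client.write(process_id, input)?;
    client.close_stdin(process_id)?;
    let exit_code = client.wait(process_id)?;
    let output = client.snapshot(process_id)?;
    Ok((exit_code, output))
}
```

If this composition ever proves too chatty, promoting it to a server RPC remains a pure optimization with the same semantics.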
## Output capping
Even with event streaming, the server should retain a bounded amount of output per process so callers can poll, wait, or reconnect without unbounded memory growth.
Suggested behavior:
- stream every output chunk live via `process/output`
- retain capped output per process in memory
- keep stdout and stderr separately for pipe-backed processes
- for PTY-backed processes, treat retained output as a single terminal stream
- expose truncation metadata on snapshots
Suggested snapshot response:
```rust
struct ProcessSnapshot {
    stdout: Vec<u8>,   // pipe-backed processes
    stderr: Vec<u8>,   // pipe-backed processes
    terminal: Vec<u8>, // PTY-backed processes: one combined stream
    truncated: bool,   // true if the retention cap dropped data
    exit_code: Option<i32>,
    running: bool,
}
```
Implementation-wise, the current `HeadTailBuffer` pattern used by unified exec is a good fit. The cap should be server config, not request config, so memory use stays predictable.
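A minimal sketch of the head/tail retention pattern, assuming the simplest possible shape: keep the first `head_cap` bytes and the last `tail_cap` bytes, drop the middle, and remember that truncation happened. This is not the actual `HeadTailBuffer` implementation; caps and types here are illustrative.

```rust
use std::collections::VecDeque;

/// Bounded retention: first `head_cap` bytes plus last `tail_cap` bytes.
struct HeadTailBuffer {
    head_cap: usize,
    tail_cap: usize,
    head: Vec<u8>,
    tail: VecDeque<u8>,
    truncated: bool,
}

impl HeadTailBuffer {
    fn new(head_cap: usize, tail_cap: usize) -> Self {
        HeadTailBuffer {
            head_cap,
            tail_cap,
            head: Vec::new(),
            tail: VecDeque::new(),
            truncated: false,
        }
    }

    fn push(&mut self, chunk: &[u8]) {
        for &b in chunk {
            if self.head.len() < self.head_cap {
                self.head.push(b);
            } else {
                self.tail.push_back(b);
                // Evict the oldest middle bytes once over the tail cap.
                while self.tail.len() > self.tail_cap {
                    self.tail.pop_front();
                    self.truncated = true;
                }
            }
        }
    }

    /// Retained bytes, plus whether anything was dropped in between.
    fn contents(&self) -> (Vec<u8>, bool) {
        let mut out = self.head.clone();
        out.extend(self.tail.iter().copied());
        (out, self.truncated)
    }
}
```

Because memory is bounded by `head_cap + tail_cap` per process, the server-side cap can stay a fixed config value regardless of how chatty a process is.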
## Sandboxing and networking

### How unified exec does it today

Unified exec does not hand raw command args directly to the PTY layer for tool calls. Instead, it:
- computes approval requirements
- chooses a sandbox attempt
- applies managed-network policy if needed
- transforms `CommandSpec` into `ExecRequest`
- spawns from that resolved `ExecRequest`
That split is already valuable and should be preserved.
### Recommended exec-server design

Do not put approval policy into exec-server.
Instead, support two execution modes:
- Direct: raw command, intended for orchestrator-side or already-trusted use
- Prepared: already-resolved spawn request, intended for tool-call execution
For tool calls from the LLM side:
- `core` runs the existing approval + sandbox + managed-network flow
- `core` produces a resolved `ExecRequest`
- the exec-server client sends `PreparedExecSpec`
- exec-server spawns exactly that request and streams process events

For orchestrator-side execution:
- caller sends `DirectExecSpec`
- exec-server spawns directly without running approval or sandbox policy
This gives one generic process API while keeping the policy-sensitive logic in the place that already owns it.
### Why not make exec-server own sandbox selection?
That would force exec-server to understand:
- approval policy
- exec policy / prefix rules
- managed-network approval flow
- sandbox retry semantics
- guardian routing
- feature-flag-driven sandbox selection
- platform-specific sandbox helper configuration
That is too opinionated for a reusable process service.
## Optional future server config
If exec-server grows beyond the current prototype, a config object like this would be enough:
```rust
struct ExecServerConfig {
    shutdown_grace_period_ms: u64,
    max_processes_per_connection: usize,
    retained_output_bytes_per_process: usize,
    allow_direct_exec: bool,
    allow_prepared_exec: bool,
}
```
That keeps policy surface small:
- lifecycle limits live in the server
- trust and sandbox policy stay with the caller
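The only trust decision the server makes under this config is whether a start mode is enabled at all; everything else already happened caller-side. A sketch of that gate, where `gate_start` and `StartKind` are hypothetical names for illustration:

```rust
// Only the fields needed for gating, simplified from the config above.
struct ExecServerConfig {
    allow_direct_exec: bool,
    allow_prepared_exec: bool,
}

// Which ProcessStartRequest variant the caller sent.
enum StartKind {
    Direct,
    Prepared,
}

/// Reject a start request whose mode is disabled by server config.
fn gate_start(cfg: &ExecServerConfig, kind: &StartKind) -> Result<(), &'static str> {
    match kind {
        StartKind::Direct if cfg.allow_direct_exec => Ok(()),
        StartKind::Prepared if cfg.allow_prepared_exec => Ok(()),
        StartKind::Direct => Err("direct exec disabled by server config"),
        StartKind::Prepared => Err("prepared exec disabled by server config"),
    }
}
```

A deployment that only serves LLM tool calls could, for example, set `allow_direct_exec: false` and leave the rest of the policy stack untouched in `core`.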
## Mapping back to LLM-visible events
If unified exec is later backed by exec-server, the core client wrapper should
keep owning the translation into the existing event model:
- `process/start` success -> `ExecCommandBegin`
- `process/output` -> `ExecCommandOutputDelta`
- local `process/write` call -> `TerminalInteraction`
- `process/exited` plus retained transcript -> `ExecCommandEnd`
That preserves the current LLM-facing contract while making the process backend swappable.
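That translation can be a plain one-to-one mapping in the client wrapper. The `ServerEvent` and `LlmEvent` shapes below are simplified assumptions for sketching; `TerminalInteraction` does not appear because the client emits it locally around its own `process/write` call rather than in response to a server notification.

```rust
/// Simplified server-notification shapes; names are assumptions.
enum ServerEvent {
    Started { process_id: String },
    Output { process_id: String, chunk: Vec<u8> },
    Exited { process_id: String, exit_code: i32 },
}

/// The existing LLM-facing events that core keeps emitting.
enum LlmEvent {
    ExecCommandBegin { process_id: String },
    ExecCommandOutputDelta { process_id: String, chunk: Vec<u8> },
    ExecCommandEnd { process_id: String, exit_code: i32 },
}

/// One-to-one translation from server notifications to the events the
/// model already depends on.
fn translate(ev: ServerEvent) -> LlmEvent {
    match ev {
        ServerEvent::Started { process_id } => LlmEvent::ExecCommandBegin { process_id },
        ServerEvent::Output { process_id, chunk } => {
            LlmEvent::ExecCommandOutputDelta { process_id, chunk }
        }
        ServerEvent::Exited { process_id, exit_code } => {
            LlmEvent::ExecCommandEnd { process_id, exit_code }
        }
    }
}
```

Because the mapping is total and stateless, swapping the process backend cannot change what the model observes.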