mirror of https://github.com/openai/codex.git synced 2026-04-24 14:45:27 +00:00

sdcoffey dd9cc542ed plan

2026-03-06 16:54:21 -08:00

RFC: Host-Delegated Codex App-Server for Universal Computer

Summary

Universal Computer already has the right high-level instinct: the host SDK should own orchestration, credentials, approvals, persistence, and backend selection, while the remote runtime should own local execution against the target filesystem and sandbox.

Today, those responsibilities are split awkwardly. Universal Computer's Python SDK builds the Responses request, defines tools in Python, and interprets raw model output into actionable tool calls. Codex app-server, by contrast, already has a richer Rust-native execution engine, tool surface, approval model, and event model, but it assumes Codex itself is the party speaking to the Responses API and, in normal operation, the party managing rollout persistence.

The proposal is to add a new full delegation mode to codex app-server:

Codex still runs inside the destination container or locally.
Codex still owns prompt assembly, tool registration, tool execution, approvals, and turn semantics.
The host SDK becomes the sole party that talks to the Responses API.
The host SDK also becomes the source of truth for rollout persistence.
The app-server protocol grows a small set of new server-initiated requests and client responses so Codex can ask the host to create, stream, cancel, and finalize upstream model requests.

This gives Universal Computer what it wants: reuse of Codex's Rust-native tool/runtime behavior without giving up host-side orchestration, multi-container routing, approval policy, or host-managed conversation state.

Context

What Universal Computer does well today

From the Universal Computer side, the architecture is already clean:

Agent owns declarative configuration:
- base_instructions
- developer_instructions
- user_instructions
- plugins
- tools
- sampling_params
TaskContext owns startup, manifest injection, snapshotting, and session binding.
Task is the durable rollout object:
- it stores context
- it stores resumable session state
- it streams raw Responses events
- it pauses when tool calls are pending
plugins are more than tool bundles:
- they can contribute instructions
- mutate context
- mutate sampling params
- mutate manifest/session setup

That is an important constraint: Universal Computer is not merely a remote shell. It is a host orchestration framework.

What Codex app-server already provides

Codex app-server is already surprisingly close to what we need:

thread and turn lifecycle APIs
streaming turn/item notifications
server-initiated approval requests
dynamic tools
apps/plugins/skills integration
configurable developer instructions and other session settings
client-managed notification transport

But the current model assumes:

Codex itself makes the Responses API request
Codex owns the upstream stream lifecycle
Codex is the natural home for thread persistence

That assumption is the seam that needs to change.

Design principle

The right boundary is:

Host SDK owns external orchestration
Remote Codex owns local execution semantics

More concretely:

Host-owned

Responses API transport and credentials
rollout persistence
backend selection
multi-container routing
approval UX and policy
high-level session bootstrap

Codex-owned

instruction compilation
model request planning
tool schema materialization
tool execution against the live workspace/container
item/turn state machine
normalization of model events into Codex semantics

This is the key pushback: the host should not have to reconstruct Codex's prompts, tool schemas, or internal turn loop. If we force the SDK to do that, we reintroduce the exact duplication you want to eliminate.

Goals

Support running Codex app-server in a target container or locally.
Allow the host SDK to be the only component that talks to the Responses API.
Preserve Codex as the implementation of the default tool surface.
Preserve host-side approvals for all tools.
Preserve host-side rollout persistence as the source of truth.
Allow full client-provided configuration:
- base instructions
- developer instructions
- user instructions
- tool/plugin/app config
Allow the host to override or replace the default tool set.
Keep the protocol high-level and transport-agnostic enough for non-Docker backends.

Non-goals

Re-implement Codex tool behavior in the Python SDK.
Make the host responsible for prompt assembly.
Force app-server to lose its current direct-to-Responses mode.
Solve every multi-agent routing problem in the first iteration.

Proposed model: Full Delegation Mode

Add a new app-server execution mode, conceptually:

direct mode: current behavior
fullDelegation mode: new behavior

In fullDelegation mode:

The host starts app-server inside the target environment.
The host provides all desired configuration at thread/session startup.
Codex prepares the next upstream Responses request, but does not send it.
Codex emits a server-initiated request to the host containing the prepared upstream request envelope.
The host executes that request against the Responses API.
The host streams upstream events back into app-server.
Codex consumes those events, updates turn state, emits its normal item/turn notifications, and requests approvals or user input as needed.
The host persists the resulting rollout externally.

This is not "remote shell plus JSON." It is better understood as remote Codex with externalized model transport.

Why this fits Universal Computer

Today Universal Computer's Task.run() does three important jobs:

build the request
stream events
pause for tool calls

Under this RFC:

job 1 moves from Python to Codex
job 2 remains host-owned at the transport layer
job 3 becomes cleaner, because Codex itself now owns tool interpretation and execution

That is a net simplification.

Protocol additions

The existing app-server pattern to imitate is the approval flow: Codex can already issue server-initiated JSON-RPC requests to the client and resume when the client responds.

Full delegation should reuse that same pattern.

New concepts

1. Delegated model request

Codex needs a way to say:

"Here is the exact upstream request I want to make. Please make it for me, and stream the result back."

Proposed request:

model/request

Purpose:

server-initiated request from Codex to host
carries a canonicalized Responses request envelope

This envelope should include, at minimum:

model
instructions or compiled system input
input items/messages
tool definitions
reasoning config
tool-choice config
request-level overrides derived from SDK/user configuration, such as reasoning effort, summary mode, verbosity, and other per-turn sampling controls
metadata needed for correlation
optional previous-response linkage if Codex wants it
stream expectation
opaque session/turn correlation ids

The important point is that this is Codex-authored. The host forwards it; it does not reinterpret it.
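To make the envelope concrete, here is a minimal sketch of how the host might model it. Only the method name model/request and the general field list come from this RFC; the class name, every field name, and the envelope_version field are hypothetical placeholders rather than a defined app-server schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DelegatedModelRequest:
    """Hypothetical envelope for one Codex-authored upstream request."""

    # Correlation-only fields: used by the host, never sent upstream.
    request_id: str
    thread_id: str
    turn_id: str
    # Fields Codex wants forwarded to the Responses API as-is.
    model: str
    instructions: str
    input: List[Dict[str, Any]]
    tools: List[Dict[str, Any]] = field(default_factory=list)
    reasoning: Optional[Dict[str, Any]] = None
    tool_choice: str = "auto"
    previous_response_id: Optional[str] = None
    stream: bool = True
    envelope_version: int = 1  # assumed: explicit versioning of the envelope

    def to_upstream_payload(self) -> Dict[str, Any]:
        """Drop correlation fields and unset options; change nothing else."""
        correlation_only = {"request_id", "thread_id", "turn_id", "envelope_version"}
        return {
            key: value
            for key, value in asdict(self).items()
            if key not in correlation_only and value is not None
        }
```

The only transformation the host performs is removing fields that exist for correlation. Anything that changes prompt or tool content would violate the "forward, do not reinterpret" rule.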

2. Delegated model stream injection

The host needs a way to stream upstream events back into Codex.

Proposed client method:

model/streamEvent

Purpose:

client-to-server notification or request delivering one upstream Responses stream event at a time

The server should accept:

raw upstream event payload
correlation id tying the event to the outstanding model/request

This lets Codex continue using its native event handling logic.

3. Terminal stream semantics

For normal operation, Codex should infer terminal model state from the raw upstream Responses events themselves, especially response.completed and response.failed. In other words, the canonical end-of-turn signal should come from the same event stream Codex is already consuming.

A separate client method is only needed for cases where the host cannot provide a terminal Responses event, for example:

the host canceled the upstream request before a terminal event was emitted
the network stream disconnected mid-flight
the host rejected the delegated request before sending it upstream

In that narrower case, a small escape hatch such as model/streamAborted is useful. It should carry:

delegated request id
abort reason such as canceled, disconnected, or requestRejected
normalized error info if relevant

This keeps the happy path simple while still giving Codex a way to distinguish "the model finished" from "the host-side transport broke."
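A minimal host-side sketch of these semantics, assuming upstream events arrive as dictionaries with a "type" field. The message shapes (requestId, event, reason, error) are hypothetical; only the method names and the abort reasons come from this RFC.

```python
from typing import Any, Dict, Iterable, Iterator

# Terminal event types named in this RFC; extend if others are in use.
TERMINAL_EVENT_TYPES = {"response.completed", "response.failed"}


def _aborted(request_id: str, reason: str, error: str = "") -> Dict[str, Any]:
    """Build the escape-hatch message (hypothetical parameter names)."""
    return {
        "method": "model/streamAborted",
        "params": {"requestId": request_id, "reason": reason, "error": error},
    }


def relay_stream(
    request_id: str, events: Iterable[Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield the app-server messages for one delegated upstream stream."""
    try:
        for event in events:
            # Happy path: forward every raw event untouched.
            yield {
                "method": "model/streamEvent",
                "params": {"requestId": request_id, "event": event},
            }
            if event.get("type") in TERMINAL_EVENT_TYPES:
                return  # Codex infers the end of the stream from the event itself.
    except ConnectionError as exc:
        yield _aborted(request_id, "disconnected", str(exc))
        return
    # The iterator ended without ever producing a terminal event.
    yield _aborted(request_id, "disconnected", "stream ended without a terminal event")
```

Note that model/streamAborted is never sent after a terminal event, so Codex sees exactly one end-of-stream signal per delegated request.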

4. Delegated model cancellation

Codex may need to ask the host to cancel an in-flight upstream request.

Proposed server request:

model/cancel

This is important for:

turn interruption
approval denial during streaming
client disconnect handling
compaction or reroute logic

5. External rollout mode

Codex needs to know it is not the durable source of truth.

Proposed thread/session config:

rolloutOwnership: "server" | "client"

In the new mode, use "client".

Behaviorally, this means:

server may still keep ephemeral in-memory turn state
server should not assume persisted thread state is canonical
resume/fork semantics should allow the client to provide prior rollout context explicitly

6. External history hydrate

If the host owns persistence, Codex needs a way to rehydrate a thread from client-supplied history.

Proposed startup field or dedicated method:

thread/start or thread/resume with initialItems / turnHistory

This should be the normalized Codex-facing history representation, not raw Responses-only items.

That keeps Codex's turn engine informed without forcing SQLite/file rollout ownership back into the container.

There is already an implicit translation boundary here today: Codex does not operate on raw SSE events as its durable thread model. It turns upstream Responses output into a richer internal history made of turns and items. Client-owned rollout mode would make that boundary explicit. The host would persist the Codex-facing item history it receives over app-server notifications, then feed that normalized history back on resume, rather than trying to reconstruct a thread from raw Responses API events alone.
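As a sketch of that boundary from the host side, the store below captures normalized items from app-server notifications and replays them on resume. It assumes a notification named item/completed carrying threadId and item, and it uses the initialItems and rolloutOwnership names proposed above; the exact shapes are assumptions, and the in-memory dictionary stands in for real durable storage.

```python
from collections import defaultdict
from typing import Any, Dict, List


class HostRolloutStore:
    """In-memory stand-in for the host's durable rollout storage."""

    def __init__(self) -> None:
        self._items: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def on_notification(self, message: Dict[str, Any]) -> None:
        """Persist completed, normalized items; ignore everything else."""
        # Assumed notification name and shape; streaming deltas are not durable state.
        if message.get("method") == "item/completed":
            params = message["params"]
            self._items[params["threadId"]].append(params["item"])

    def resume_params(self, thread_id: str) -> Dict[str, Any]:
        """Build hypothetical thread/resume params from persisted history."""
        return {
            "threadId": thread_id,
            "rolloutOwnership": "client",
            "initialItems": list(self._items[thread_id]),
        }
```

The host never reconstructs items from raw Responses events here. It only stores what Codex already normalized and hands the same representation back.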

Proposed event surface

A clean high-level set could be:

Server -> client

model/request
model/cancel
delegation/request for subagents or cross-container execution
existing approval requests remain unchanged
existing item/tool/requestUserInput remains unchanged

Client -> server

model/streamEvent
model/streamAborted (abort reasons include canceled, disconnected, and requestRejected)
delegation/result
existing approval decisions remain unchanged
existing tool/user-input responses remain unchanged

Mermaid: end-to-end flow

sequenceDiagram
    participant Host as "Universal Computer SDK (host)"
    participant Codex as "Codex app-server (container/local)"
    participant API as "Responses API"

    Host->>Codex: thread/start + full config + fullDelegation
    Host->>Codex: turn/start(user input)

    Codex-->>Host: model/request(request envelope)
    Host->>API: POST /v1/responses (stream=true)

    loop Streaming
        API-->>Host: response event
        Host->>Codex: model/streamEvent(event)
        Codex-->>Host: item/turn notifications
    end

    Codex-->>Host: item/commandExecution/requestApproval
    Host->>Host: programmatic approval policy
    Host-->>Codex: approval decision

    Codex->>Codex: execute tool in container

    Codex-->>Host: model/request(next request after tool output)
    Host->>API: next Responses call
    API-->>Host: terminal event
    Host->>Codex: model/streamEvent(response.completed)

    Codex-->>Host: turn/completed
    Host->>Host: persist rollout as source of truth

Mermaid: state ownership

In plain English: the host SDK remains the control plane, and Codex inside the container remains the execution plane. The host is responsible for the things that need global visibility or trust: talking to the Responses API, persisting rollout state, deciding approval policy, and deciding where delegated work should run. Codex is responsible for the things that need local workspace context: assembling the actual model request, running the turn state machine, choosing and executing tools, and applying side effects inside the container. The diagram below shows that split of responsibilities, not a strict request-by-request sequence.

flowchart LR
    subgraph Host["Host SDK"]
        H1["Responses auth + transport"]
        H2["Rollout persistence"]
        H3["Approval policy"]
        H4["Backend routing / multi-container"]
    end

    subgraph Remote["Remote Codex app-server"]
        C1["Prompt + request synthesis"]
        C2["Turn state machine"]
        C3["Tool registry + execution"]
        C4["Workspace-local side effects"]
    end

    H1 --> C2
    H2 --> C2
    H3 --> C3
    H4 --> C2
    C1 --> H1
    C2 --> H2
    C3 --> H3

Behavioral changes required inside Codex

1. Separate "prepare request" from "send request"

Today those are effectively fused. Full delegation requires Codex to:

build the canonical upstream request
stop before transport
wait for externally streamed events

That is the fundamental internal refactor.

2. Accept externally sourced Responses events as first-class input

Codex must be able to ingest a Responses event stream that it did not open itself.

This means:

correlation of event stream to active turn/request
same parsing, validation, and item synthesis path as direct mode
same terminal handling and retry semantics where applicable

3. Make thread persistence optional, not authoritative

In client-owned rollout mode, Codex should treat persistence as an operational cache, not the source of truth.

A good discipline is:

in-memory state for active turn execution
explicit rehydrate from client history on resume
no hidden reliance on local rollout files for correctness

4. Make tool registry fully session-configurable

This is already partly present through dynamic tools, plugins, and apps, but the new mode should make it explicit that the tool surface may be:

default Codex tools
default Codex tools plus client additions
a full client override
a minimal safe subset

The important policy question is precedence. My recommendation:

default
default + additive overrides
replace entirely

as three explicit modes, not implicit merging.
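A small sketch of what "explicit modes" could mean in practice. The enum values and the function name are hypothetical; the point is only that each mode has one unambiguous outcome and that a conflicting combination fails loudly.

```python
from enum import Enum
from typing import Any, Dict

ToolSpec = Dict[str, Any]


class ToolSetMode(Enum):
    DEFAULT = "default"    # Codex defaults only
    ADDITIVE = "additive"  # defaults plus client additions or overrides
    REPLACE = "replace"    # client supplies the entire tool surface


def resolve_tools(
    mode: ToolSetMode,
    defaults: Dict[str, ToolSpec],
    client_tools: Dict[str, ToolSpec],
) -> Dict[str, ToolSpec]:
    """Return the effective tool surface keyed by tool name."""
    if mode is ToolSetMode.DEFAULT:
        if client_tools:
            # Refuse to guess whether the caller wanted a merge or a replace.
            raise ValueError("client tools supplied but mode is 'default'")
        return dict(defaults)
    if mode is ToolSetMode.ADDITIVE:
        # Client entries win on name collisions, by explicit request.
        return {**defaults, **client_tools}
    if mode is ToolSetMode.REPLACE:
        return dict(client_tools)
    raise ValueError(f"unknown tool set mode: {mode!r}")
```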

5. Preserve current approval semantics across all tools

Approvals must remain server-initiated from Codex to host, because that is the clean point where the host can inject policy without reimplementing runtime behavior.

Operationally, this likely means Codex should be started with an approval configuration that never blocks on an in-container human prompt and instead always routes approval decisions through app-server requests to the host. The host SDK then becomes the policy engine and UI surface for tool approvals, while Codex remains the party that formulates the execution request and enforces the answer.

The host should not be approving raw Responses tool call output. It should be approving Codex's normalized execution intent: "run this command," "apply this patch," "grant this network access," and so on.

6. Support host-intercepted delegation as a future sibling of model delegation

If you want multi-container delegation, do not hide subagent creation entirely inside the container runtime. Give it a parallel host-visible control point.

Concretely, app-server should emit a delegation/request event whenever Codex wants to spawn a subagent. That event should include:

the parent thread/turn context
the requested subagent instructions and input items
the requested tool/profile configuration
execution hints such as preferred cwd, sandbox, or model
enough metadata for the host to correlate the child back to the parent

The host SDK can then choose one of two paths:

run the subagent in the same container and return a delegation/result
materialize it as a separate top-level agent on another backend or container and still return a delegation/result

In both cases, Codex should treat the result as a structured child outcome rather than assuming where or how the subagent ran. That gives the SDK user real control over topology without making Codex blind to delegated work.

Configuration model

Full delegation mode must support all the configuration Universal Computer already treats as first-class:

base instructions
developer instructions
user instructions
model choice
sampling params
plugin declarations
app/plugin auth context
tool override policy
approval policy
cwd / manifest / workspace metadata

The cleanest way to do this is:

client sends declarative config to app-server
Codex composes the actual upstream request
host transmits that request unchanged

This preserves client control without creating dual prompt builders.

In practice, this means there are two useful layers of configuration:

session-level defaults supplied when establishing the thread or runtime
request-level overrides supplied per turn, such as reasoning effort, summaries, verbosity, or other sampling controls

Codex should own how those layers merge into the final upstream request, but the SDK user should still be able to express both layers declaratively.
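For illustration only, the two layers could be expressed as plain dictionaries on the SDK side. The function below shows one plausible precedence rule (per-turn overrides win, unset values fall through to session defaults); it is an assumption for discussion, and the actual merge would live inside Codex. The key names are hypothetical.

```python
from typing import Any, Dict


def preview_request_controls(
    session_defaults: Dict[str, Any],
    turn_overrides: Dict[str, Any],
) -> Dict[str, Any]:
    """Preview the effective controls under an assumed 'turn wins' rule."""
    effective = dict(session_defaults)
    for key, value in turn_overrides.items():
        # None means "not specified for this turn", so the default survives.
        if value is not None:
            effective[key] = value
    return effective
```

A helper like this is useful for SDK-side logging or validation, but the host should still send both layers to app-server separately rather than the merged result.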

Tooling model

Universal Computer's current plugins can affect:

tools
instructions
sampling params
context
manifest

Codex app-server should not try to mimic Python plugin objects. Instead, the protocol should expose the resulting configuration effects in transportable form.

Three buckets matter:

1. Native Codex tools

Examples:

shell
apply patch
filesystem-like behavior
skills/apps tooling

These should stay implemented in Rust.

2. Declarative client-added tools

These are already conceptually close to dynamic tools.

3. Host policy wrappers

The host may still want to:

require approval
deny certain tools
redirect certain actions
attach metadata

This should be policy/config, not alternative execution logic.

Rollout ownership

This deserves explicit treatment.

If the host is the source of truth, then app-server should not quietly persist a more authoritative local reality than the host sees.

Recommended behavior in client-owned rollout mode:

active turn state exists in memory inside Codex
the host receives all canonical turn/item notifications
the host persists them
resume requires the host to resupply prior normalized history
local persistence, if any, is cache-only and discardable

That keeps recovery honest.

Failure semantics

Full delegation mode needs explicit failure boundaries.

Host-side failures

Examples:

Responses auth failure
network failure
stream disconnect
host policy rejection

These should arrive back in Codex as delegated request failures and surface through normal turn failure notifications.

Codex-side failures

Examples:

malformed upstream event
incompatible tool result
internal turn-state fault

These should surface as Codex errors to the host.

Split-brain prevention

At most one outstanding delegated model request should be active per active turn segment unless Codex explicitly supports multiplexing. Start single-flight.

That constraint is worth being conservative about.

It is different from "only one tool runs at a time" or "only one turn exists at a time." The point is narrower: for a given live turn segment, there should be one authoritative upstream model stream that Codex is currently interpreting. If the host allowed two overlapping delegated Responses streams to feed the same turn state, Codex would need a much more complicated merge model for deltas, tool calls, and terminal events. Starting with single-flight keeps turn state deterministic.
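The single-flight rule is simple to enforce on either side of the protocol. The guard below is a sketch with hypothetical names: it tracks at most one delegated request per turn and rejects stream events whose correlation id does not match the active request.

```python
from typing import Dict


class SingleFlightGuard:
    """Allow at most one outstanding delegated model request per turn."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}  # turn id -> active request id

    def begin(self, turn_id: str, request_id: str) -> None:
        if turn_id in self._active:
            raise RuntimeError(
                f"turn {turn_id} already has request {self._active[turn_id]} in flight"
            )
        self._active[turn_id] = request_id

    def accepts(self, turn_id: str, request_id: str) -> bool:
        """True only for events belonging to the current request of this turn."""
        return self._active.get(turn_id) == request_id

    def finish(self, turn_id: str, request_id: str) -> None:
        """Call on a terminal event or on model/streamAborted."""
        if not self.accepts(turn_id, request_id):
            raise RuntimeError(f"request {request_id} is not active for turn {turn_id}")
        del self._active[turn_id]
```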

Security and trust boundaries

This design is stronger than today's Python-tool model in one important way: the canonical executor of shell and file actions moves into the same Rust runtime that already knows Codex's approval and event semantics.

That said, the host now becomes highly privileged because it owns:

auth
transcript persistence
upstream transport
approval decisions

That is acceptable, because Universal Computer already lives at that privilege level.

Migration path

Phase 1

add fullDelegation execution mode
add model/request
add model/streamEvent
add model/streamAborted
add model/cancel
add client-owned rollout mode with startup rehydrate

This is enough for a single-container Universal Computer integration.

Phase 2

add explicit tool-set override modes
harden resume/fork semantics for externally persisted history
support more complete correlation and retry rules

Phase 3

add host-visible delegation/subagent interception
route subagents to alternate containers/backends

Open questions

What is the canonical wire format for delegated model requests? My recommendation: a Codex-defined envelope that is close to Responses payloads, but explicitly versioned and correlation-safe.
Should the host stream raw Responses events or normalized Codex events back? Raw Responses events. Normalization should remain inside Codex.
Should local persistence be disabled entirely in client-owned rollout mode? Prefer "non-authoritative cache" over "disabled," but correctness must not depend on it.
Should tool overrides be merged or replaced? There is a real difference:
- merged means "start with Codex defaults, then add or selectively override entries"
- replaced means "the client supplies the entire tool surface and Codex defaults are not implicitly present"
Support both, explicitly. Implicit merge will become a policy trap.
How much of plugin behavior should be representable over protocol? Only the effects, not the Python object model.

Changes required in Universal Computer

The Codex-side protocol changes are only half of the story. To make this architecture real, Universal Computer also needs to grow a host-side integration layer that treats Codex app-server as a remote execution runtime rather than treating the Responses API as the only runtime boundary.

At a high level, Universal Computer should stop being responsible for implementing the default Codex tool surface in Python and instead become responsible for:

provisioning a compatible Codex binary
starting and supervising app-server
relaying delegated model traffic to the selected provider
persisting rollout state as the canonical host-side record
exposing SDK ergonomics for tool configuration, approvals, and delegation routing

1. Pin and provision a Codex version

Universal Computer will need an explicit notion of the Codex runtime version it expects to launch.

That likely means:

adding a pinned Codex version field to the agent or runtime configuration
defining how that resolves to a concrete binary artifact for the current host platform
making the app-server protocol version part of compatibility checks

This should be treated as a first-class runtime dependency, not an incidental local executable lookup. If the host and container disagree about protocol shape, delegation mode will fail in confusing ways, so version pinning should be deliberate.

Recommended direction:

Universal Computer pins a Codex release or build identifier explicitly
the host resolves and caches that artifact
the runtime startup path verifies the binary version before starting app-server
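A sketch of the startup check, assuming the host can learn the runtime's version string and protocol version before the first turn. How those values are obtained is left open here, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodexPin:
    """What the host expects to launch (hypothetical shape)."""

    codex_version: str
    protocol_version: int


class IncompatibleRuntimeError(RuntimeError):
    """Raised before any turn starts, so mismatches fail early and clearly."""


def verify_runtime(pin: CodexPin, reported_version: str, reported_protocol: int) -> None:
    if reported_version != pin.codex_version:
        raise IncompatibleRuntimeError(
            f"expected Codex {pin.codex_version}, found {reported_version}"
        )
    if reported_protocol != pin.protocol_version:
        raise IncompatibleRuntimeError(
            f"expected app-server protocol {pin.protocol_version}, found {reported_protocol}"
        )
```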

2. Reuse existing backends to place the Codex binary in the destination environment

Universal Computer already knows how to create and resume execution environments. It should reuse that backend abstraction for Codex provisioning rather than inventing a separate deployment system.

Concretely, the current backend model is already a good fit for binary staging:

BaseSandboxClient creates and resumes sessions
BaseSandboxSession exposes write, read, exec, and workspace materialization
manifest entries such as LocalFile already support copying a host file into the workspace and applying permissions via chmod

So the binary-placement story does not need a brand-new distribution mechanism. Universal Computer can either:

stage the pinned Codex binary as a manifest artifact with executable permissions, or
push it into the workspace during session startup with session.write(...) followed by chmod

The first option is especially attractive because it fits the existing manifest/snapshot model and keeps provisioning declarative.

At a high level, each backend would need to support:

ensuring the Codex binary is present in the target environment
placing any required companion assets if Codex needs them
starting codex app-server with the right arguments
returning a live transport handle back to the host SDK

For local execution, this step can degenerate into "use a local binary and skip copy." For remote or containerized execution, this becomes an explicit staging step.

The important design point is that backend-specific logic stays confined to:

binary placement
process startup
transport attachment
snapshot and manifest lifecycle

and not tool execution.

One nuance from the codebase: backend reuse is straightforward for file placement, but not yet for long-lived supervised process attachment. Universal Computer's shared session API supports one-shot exec everywhere, while PTY-style attached process interaction exists only on some backends. If Codex app-server is going to be launched as a long-running child process, Universal Computer will likely need one additional backend-neutral capability for "start a process and keep a live byte stream attached," rather than trying to shoehorn everything through one-shot exec.

3. Replace Python implementations of the default tool surface with symbolic tool references

Universal Computer can likely delete or de-emphasize the Python implementations of the default filesystem and shell tool behavior once Codex is the executor.

The code today makes this fairly concrete: the built-in tool surface is assembled from Python plugins like Filesystem, Shell, ApplyPatch, and Compaction. The first three are thin wrappers that bind to a SandboxSession, expose tool schemas, and add instruction fragments; they are not deep subsystems in their own right.

But the SDK still needs a way to express tool policy and shape the tool surface. So instead of Python tool implementations being the source of truth, they should become declarative references, for example:

enable Codex shell
disable Codex apply-patch
use the default Codex tool set
replace the default tool set with a minimal subset

In other words, the Python layer should continue to speak in terms of tool identities and policy, but not carry the execution logic for the built-in tools.

This is important for UX. SDK users still want to write things like:

"enable shell but not apply patch"
"disable filesystem writes"
"use only custom tools"

Those should remain easy, but they should compile down to app-server configuration rather than selecting Python classes that implement the behavior directly.

The one built-in plugin that does not fit the "just replace it with a Codex tool" bucket is compaction. In Universal Computer today, compaction is expressed as sampling-parameter and context-processing behavior rather than as a shell/filesystem tool. So the migration should separate:

built-in execution tools that move to Codex
host-side request shaping policies, like compaction thresholds, that may still belong in the SDK and need to be forwarded into delegated model requests

4. Add a dedicated app-server package or module

Universal Computer should grow a dedicated host-side app-server integration package rather than smearing the logic across the existing agent runtime.

Conceptually, that package would own:

app-server process lifecycle
connection management
protocol type definitions
delegated model request handling
approval request handling
delegated subagent handling
rollout event capture and persistence hooks

A clean package boundary here matters because this integration is not just "another tool." It is a new runtime substrate.

A useful mental split would be:

core Universal Computer agent model
backend/session abstractions
provider adapters
app-server bridge

That keeps the Codex-specific transport logic from leaking into unrelated parts of the SDK.

5. Support the new delegated app-server events

Universal Computer will need host-side handlers for the new protocol surface proposed above.

At minimum, that means understanding and responding to:

model/request
model/streamEvent
model/streamAborted
model/cancel
delegation/request
delegation/result
existing approval requests

In practice, the host runtime loop changes from:

call responses.create(...)
stream raw events
inspect pending tool calls

to:

wait for model/request from Codex
execute that request against the selected provider
feed raw upstream events back with model/streamEvent
honor model/cancel and approval flows
optionally route delegation/request to a different container or backend

That is a meaningful runtime refactor, but it is conceptually clean: Universal Computer becomes an orchestrator around Codex rather than a reimplementation of Codex behavior.
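The new loop can be sketched as a small dispatcher. This is a simplified, synchronous illustration: the parameter names (requestId, request, event, reason) and the HostBridge class are hypothetical, the provider is any callable that yields raw upstream events, and a real implementation would be asynchronous and would also handle approvals and delegation/request.

```python
from typing import Any, Callable, Dict, Iterable, Set

Message = Dict[str, Any]


class HostBridge:
    """Route server-initiated app-server requests on the host side."""

    def __init__(
        self,
        send: Callable[[Message], None],
        call_provider: Callable[[Dict[str, Any]], Iterable[Dict[str, Any]]],
    ) -> None:
        self._send = send                    # delivers a message to app-server
        self._call_provider = call_provider  # opens the upstream stream
        self._cancelled: Set[str] = set()

    def handle(self, message: Message) -> None:
        method = message.get("method")
        params = message.get("params", {})
        if method == "model/request":
            self._run_model_request(params)
        elif method == "model/cancel":
            self._cancelled.add(params["requestId"])
        # Approval requests and delegation/request would be routed here too.

    def _run_model_request(self, params: Dict[str, Any]) -> None:
        request_id = params["requestId"]
        # The Codex-authored request is passed to the provider unchanged.
        for event in self._call_provider(params["request"]):
            if request_id in self._cancelled:
                self._send({
                    "method": "model/streamAborted",
                    "params": {"requestId": request_id, "reason": "canceled"},
                })
                return
            self._send({
                "method": "model/streamEvent",
                "params": {"requestId": request_id, "event": event},
            })
```

Nothing in this loop inspects tool calls or builds prompts, which is the practical meaning of the refactor: the host relays and decides policy, and Codex interprets.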

6. Add a host-side multi-provider abstraction

Today Universal Computer is structurally very OpenAI-shaped because the runtime path is built around the Responses API client. In delegated mode, that logic becomes even more central, so it should be abstracted intentionally.

The current code is explicit about this: Task stores an openai.AsyncClient and its default producer literally calls client.responses.create(...). So multi-provider support is not a small configuration tweak; it is a real runtime abstraction change.

The host needs a provider abstraction capable of:

taking a Codex-authored delegated model request
translating it to the selected upstream provider call shape
streaming provider events back into the common app-server event format
surfacing provider-specific failures in a normalized way

For OpenAI-backed flows, that can stay close to raw Responses semantics.

For Anthropic or other providers, the host may need an adapter layer that maps:

request fields
tool-calling events
reasoning/summary controls where supported
terminal and error events

back into the event shape Codex expects.

This is precisely why the translation boundary should live on the host, not in the container. Provider choice is a host concern.

Recommended direction:

define a ModelProvider or similarly named host-side interface
keep OpenAI as the reference implementation
add provider capability metadata so unsupported delegated-request features can fail clearly rather than degrade silently
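One way the interface and capability check could look. ModelProvider is the name suggested above; the capability strings, the feature-detection rules, and the remaining names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Iterator, Protocol, Set


@dataclass(frozen=True)
class ProviderCapabilities:
    features: FrozenSet[str]


class UnsupportedFeatureError(Exception):
    """The delegated request needs something this provider cannot do."""


class ModelProvider(Protocol):
    capabilities: ProviderCapabilities

    def stream(self, request: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
        """Execute the request and yield events in the shape Codex expects."""
        ...


def required_features(request: Dict[str, Any]) -> Set[str]:
    """Derive needed capabilities from the Codex-authored request."""
    needed: Set[str] = set()
    if request.get("tools"):
        needed.add("tool_calls")
    if request.get("reasoning"):
        needed.add("reasoning")
    return needed


def dispatch(provider: ModelProvider, request: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    missing = required_features(request) - provider.capabilities.features
    if missing:
        # Fail clearly instead of silently dropping part of the request.
        raise UnsupportedFeatureError(f"provider lacks: {sorted(missing)}")
    return provider.stream(request)
```

Checking capabilities before opening the stream means an unsupported request surfaces as a host-side rejection rather than as a confusing mid-turn failure.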

There is already a hint of the right design elsewhere in Universal Computer: the memory subsystem defines normalized result schemas specifically so the rest of the system does not need to understand provider-specific formats. The delegated app-server bridge should follow the same principle for streamed model events.

7. Add host-side rollout persistence built around Codex item history

If the host is now the source of truth, Universal Computer should persist the Codex-facing event history it receives from app-server, not just the raw upstream Responses interaction.

That likely means persisting:

thread identity
turns
normalized items
approval decisions
delegation edges between parent and child agents
provider and runtime metadata

This persistence layer should support:

resume into the same container
resume into a fresh container with rehydrated history
cross-backend continuation when the SDK chooses to re-home the work

8. Transport recommendation: prefer stdio, with a reliable byte-stream bridge as the fallback

For the host-to-container app-server transport, the safest recommendation is:

first choice: stdio over an attached process handle
second choice: a reliable byte-stream tunnel such as SSH or a backend-managed TCP stream

Why:

app-server traffic is ordered, stateful, and request-response oriented
JSON-RPC + streaming notifications want reliable delivery and backpressure

stdio is still the right target transport because Codex app-server already supports it as the primary mode. But after a deeper look at Universal Computer, there is an important implementation detail: the current shared session abstraction does not yet provide a backend-neutral "launch a long-lived child process and keep stdin/stdout attached" API. It provides:

one-shot exec everywhere
optional PTY process support on some backends such as local Unix and Modal
no equivalent attached-process primitive on Docker today

So the recommendation should be more precise:

standardize on app-server stdio as the protocol transport
add a new backend-neutral attached-process capability to Universal Computer for long-lived bridge processes
make that capability part of the expected contract for all supported backends, instead of treating it as an optimization for only a few environments
implement that capability per backend, instead of introducing a separate network protocol just to compensate for the missing primitive

If Universal Computer can directly attach to the launched process, stdio is ideal because:

it matches app-server's primary supported transport
it avoids inventing network semantics
it inherits process lifecycle naturally
it is easy to secure because nothing is exposed on a network port

For Docker specifically, that likely means adding a backend implementation that can launch Codex as an attached process rather than relying only on detached one-shot execs. For example, the backend could use an attached docker exec session or make Codex the supervised long-lived process inside the container and bridge its stdio back to the host.
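The attached-process capability discussed above can be sketched for the local case with Python's standard `subprocess` module. This is a minimal sketch under two assumptions: that messages are framed as newline-delimited JSON, and that a small echo script can stand in for the real app-server binary. `AttachedProcess` and `ECHO_CHILD` are hypothetical names. A Docker or Modal backend would need its own implementation behind the same interface.

```python
import json
import subprocess
import sys


class AttachedProcess:
    """Local stand-in for a backend-neutral attached-process handle."""

    def __init__(self, argv):
        # Keep stdin/stdout attached for the lifetime of the child process.
        self._proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered in text mode
        )

    def send(self, message: dict) -> None:
        # Assumed framing: one JSON object per line.
        self._proc.stdin.write(json.dumps(message) + "\n")
        self._proc.stdin.flush()

    def recv(self) -> dict:
        line = self._proc.stdout.readline()
        if not line:
            raise EOFError("attached process closed its stdout")
        return json.loads(line)

    def close(self) -> int:
        # Closing stdin signals EOF; the child's exit code is returned.
        self._proc.stdin.close()
        code = self._proc.wait(timeout=10)
        self._proc.stdout.close()
        return code


# Stand-in child: answers each request line with a matching response line.
ECHO_CHILD = (
    "import sys, json\n"
    "for line in sys.stdin:\n"
    "    msg = json.loads(line)\n"
    "    print(json.dumps({'id': msg['id'], 'result': msg['method']}), flush=True)\n"
)
```

A usage example would be `AttachedProcess([sys.executable, "-c", ECHO_CHILD])`, followed by alternating `send` and `recv` calls. The point of the design is that process lifetime and stream lifetime are the same thing: when the child exits, `recv` raises `EOFError`, and no separate connection state needs tracking.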

If the backend makes direct process attachment impossible, the next best choice is a reliable stream transport tunneled over something the backend already trusts:

SSH port forwarding or command execution with pipes
a backend-provided TCP tunnel

I would not recommend treating app-server websocket as the default fallback here, because Codex app-server currently describes websocket transport as experimental and unsupported. If a backend absolutely forces a bridged network transport, prefer a reliable stream that still carries stdio-like semantics over inventing a new public network surface.

Recommendation:

standardize on stdio as the canonical transport
add a UC session-level attached-process abstraction to make stdio practical across backends
require all supported backends to implement an attached-process bridge capable of launching and supervising app-server with a live byte stream
use SSH or another reliable stream tunnel only when direct attachment is impossible
treat websocket support as an implementation detail of last resort, not the preferred contract

This keeps the transport boring, which is exactly what you want for the control plane of a remote agent runtime.
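The preference order in the recommendation can be written down as a small selection function. The sketch below is illustrative only. The `Transport` enum, the capability strings, and `choose_transport` are hypothetical names, and a real backend would advertise its capabilities through whatever mechanism the session abstraction ends up exposing.

```python
from enum import Enum


class Transport(Enum):
    ATTACHED_STDIO = "attached-stdio"
    STREAM_TUNNEL = "stream-tunnel"
    WEBSOCKET = "websocket"


# Preference order from the recommendation; capability names are hypothetical.
_PREFERENCE = [
    ("attached_process", Transport.ATTACHED_STDIO),
    ("stream_tunnel", Transport.STREAM_TUNNEL),
    ("websocket", Transport.WEBSOCKET),  # last resort only
]


def choose_transport(backend_capabilities):
    """Return the most preferred transport the backend can provide."""
    for capability, transport in _PREFERENCE:
        if capability in backend_capabilities:
            return transport
    raise ValueError("backend offers no usable app-server transport")
```

Keeping the order in one table makes the fallback explicit: a backend that later gains an attached-process primitive is upgraded to stdio without any caller changes.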

9. Suggested Universal Computer rollout plan

A pragmatic order of operations would be:

add a Codex runtime abstraction with version pinning and binary provisioning
add an app-server bridge package with stdio-based transport
implement OpenAI delegated model handling end to end
persist Codex-facing history host-side and support resume
replace Python built-in tool execution with declarative tool enablement
add subagent interception and routing
add additional provider adapters such as Anthropic

That sequence gets a single-container OpenAI-backed flow working early while leaving room for multi-provider and multi-container sophistication later.

Recommendation

Build full delegation mode as an app-server-level capability, not as a Universal Computer-specific shim.

The winning shape is:

remote Codex prepares
host transmits
remote Codex interprets
host persists

That preserves the best properties of both systems:

Universal Computer keeps its orchestration superpower
Codex becomes the reusable execution engine and tool runtime you actually want to standardize on
