Add WebRTC transport to realtime start (#16960)

Adds WebRTC startup to the experimental app-server
`thread/realtime/start` method with an optional transport enum. The
websocket path remains the default; WebRTC offers create the realtime
session through the shared start flow and emit the answer SDP via
`thread/realtime/sdp`.

---------

Co-authored-by: Codex <noreply@openai.com>
This commit is contained in:
Ahmed Ibrahim
2026-04-07 15:43:38 -07:00
committed by GitHub
parent 6c36e7d688
commit fb3dcfde1d
42 changed files with 1574 additions and 85 deletions

View File

@@ -152,7 +152,7 @@ Example with notification opt-out:
- `turn/start` — add user input to a thread and begin Codex generation; responds with the initial `turn` object and streams `turn/started`, `item/*`, and `turn/completed` notifications. For `collaborationMode`, `settings.developer_instructions: null` means "use built-in instructions for the selected mode".
- `turn/steer` — add user input to an already in-flight regular turn without starting a new turn; returns the active `turnId` that accepted the input. Review and manual compaction turns reject `turn/steer`.
- `turn/interrupt` — request cancellation of an in-flight turn by `(thread_id, turn_id)`; success is an empty `{}` response and the turn finishes with `status: "interrupted"`.
- `thread/realtime/start` — start a thread-scoped realtime session (experimental); returns `{}` and streams `thread/realtime/*` notifications.
- `thread/realtime/start` — start a thread-scoped realtime session (experimental); returns `{}` and streams `thread/realtime/*` notifications. Omit `transport` for the websocket transport, or pass `{ "type": "webrtc", "sdp": "..." }` to create a WebRTC session from a browser-generated SDP offer; the remote answer SDP is emitted as `thread/realtime/sdp`.
- `thread/realtime/appendAudio` — append an input audio chunk to the active realtime session (experimental); returns `{}`.
- `thread/realtime/appendText` — append text input to the active realtime session (experimental); returns `{}`.
- `thread/realtime/stop` — stop the active realtime session for the thread (experimental); returns `{}`.
@@ -563,6 +563,51 @@ Invoke a plugin by including a UI mention token such as `@sample` in the text in
} } }
```
### Example: Start realtime with WebRTC
Use `thread/realtime/start` with `transport.type: "webrtc"` when a browser or webview owns the `RTCPeerConnection` and app-server should create the server-side realtime session. The transport `sdp` must be the offer SDP produced by `RTCPeerConnection.createOffer()`, not a hand-written or minimal SDP string.
The offer should include the media sections the client wants to negotiate. For the standard realtime UI flow, create the audio track/transceiver and the `oai-events` data channel before calling `createOffer()`:
```javascript
const pc = new RTCPeerConnection();
audioElement.autoplay = true;
pc.ontrack = (event) => {
audioElement.srcObject = event.streams[0];
};
const mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });
pc.addTrack(mediaStream.getAudioTracks()[0], mediaStream);
pc.createDataChannel("oai-events");
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
```
Then send `offer.sdp` to app-server. Core uses `experimental_realtime_ws_backend_prompt` for the backend instructions and the thread conversation id for the realtime session id. The start response is `{}`; the remote answer SDP arrives later as `thread/realtime/sdp` and should be passed to `setRemoteDescription()`:
```json
{ "method": "thread/realtime/start", "id": 40, "params": {
"threadId": "thr_123",
"prompt": "You are on a call.",
"sessionId": null,
"transport": { "type": "webrtc", "sdp": "v=0\r\no=..." }
} }
{ "id": 40, "result": {} }
{ "method": "thread/realtime/sdp", "params": {
"threadId": "thr_123",
"sdp": "v=0\r\no=..."
} }
```
```javascript
await pc.setRemoteDescription({
type: "answer",
sdp: notification.params.sdp,
});
```
### Example: Interrupt an active turn
You can cancel a running Turn with `turn/interrupt`.