Support multimodal custom tool outputs (#12948)

## Summary

This changes `custom_tool_call_output` to use the same output payload
shape as `function_call_output`, so freeform tools can return either
plain text or structured content items.

The main goal is to let `js_repl` return image content from nested
`view_image` calls in its own `custom_tool_call_output`, instead of
relying on a separate injected message.

## What changed

- Changed `custom_tool_call_output.output` from `string` to
`FunctionCallOutputPayload`
- Updated freeform tool plumbing to preserve structured output bodies
- Updated `js_repl` to aggregate nested tool content items and attach
them to the outer `js_repl` result
- Removed the old `js_repl` special case that injected `view_image`
results as a separate pending user image message
- Updated normalization/history/truncation paths to handle multimodal
`custom_tool_call_output`
- Regenerated app-server protocol schema artifacts

## Behavior

Direct `view_image` calls still return a `function_call_output` with
image content.

When `view_image` is called inside `js_repl`, the outer `js_repl`
`custom_tool_call_output` now carries:
- an `input_text` item if the JS produced text output
- one or more `input_image` items from nested tool results

So the nested image result now stays inside the `js_repl` tool output
instead of being injected as a separate message.

## Compatibility

This is intended to be backward-compatible for resumed conversations.

Older histories that stored `custom_tool_call_output.output` as a plain
string still deserialize correctly, and older histories that used the
previous injected-image-message flow also continue to resume.

Added regression coverage for resuming a pre-change rollout containing:
- string-valued `custom_tool_call_output`
- legacy injected image message history


#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12948
This commit is contained in:
Curtis 'Fjord' Hawthorne
2026-02-26 18:17:46 -08:00
committed by GitHub
parent f90e97e414
commit 7e980d7db6
20 changed files with 688 additions and 177 deletions

View File

@@ -264,7 +264,7 @@ impl ResponsesRequest {
.cloned()
.unwrap_or(Value::Null);
match output {
Value::String(text) => Some((Some(text), None)),
Value::String(_) | Value::Array(_) => Some((output_value_to_text(&output), None)),
Value::Object(obj) => Some((
obj.get("content")
.and_then(Value::as_str)
@@ -296,6 +296,87 @@ impl ResponsesRequest {
}
}
pub(crate) fn output_value_to_text(value: &Value) -> Option<String> {
match value {
Value::String(text) => Some(text.clone()),
Value::Array(items) => match items.as_slice() {
[item] if item.get("type").and_then(Value::as_str) == Some("input_text") => {
item.get("text").and_then(Value::as_str).map(str::to_string)
}
[_] | [] | [_, _, ..] => None,
},
Value::Object(_) | Value::Number(_) | Value::Bool(_) | Value::Null => None,
}
}
#[cfg(test)]
mod tests {
use super::*;
use pretty_assertions::assert_eq;
use wiremock::http::HeaderMap;
use wiremock::http::Method;
fn request_with_input(input: Value) -> ResponsesRequest {
ResponsesRequest(wiremock::Request {
url: "http://localhost/v1/responses"
.parse()
.expect("valid request url"),
method: Method::POST,
headers: HeaderMap::new(),
body: serde_json::to_vec(&serde_json::json!({ "input": input }))
.expect("serialize request body"),
})
}
#[test]
fn call_output_content_and_success_returns_only_single_text_content_item() {
let single_text = request_with_input(serde_json::json!([
{
"type": "function_call_output",
"call_id": "call-1",
"output": [{ "type": "input_text", "text": "hello" }]
},
{
"type": "custom_tool_call_output",
"call_id": "call-2",
"output": [{ "type": "input_text", "text": "world" }]
}
]));
assert_eq!(
single_text.function_call_output_content_and_success("call-1"),
Some((Some("hello".to_string()), None))
);
assert_eq!(
single_text.custom_tool_call_output_content_and_success("call-2"),
Some((Some("world".to_string()), None))
);
let mixed_content = request_with_input(serde_json::json!([
{
"type": "function_call_output",
"call_id": "call-3",
"output": [
{ "type": "input_text", "text": "hello" },
{ "type": "input_image", "image_url": "data:image/png;base64,abc" }
]
},
{
"type": "custom_tool_call_output",
"call_id": "call-4",
"output": [{ "type": "input_image", "image_url": "data:image/png;base64,abc" }]
}
]));
assert_eq!(
mixed_content.function_call_output_content_and_success("call-3"),
Some((None, None))
);
assert_eq!(
mixed_content.custom_tool_call_output_content_and_success("call-4"),
Some((None, None))
);
}
}
#[derive(Debug, Clone)]
pub struct WebSocketRequest {
body: Value,