mirror of
https://github.com/openai/codex.git
synced 2026-04-28 00:25:56 +00:00
Support multimodal custom tool outputs (#12948)
## Summary This changes `custom_tool_call_output` to use the same output payload shape as `function_call_output`, so freeform tools can return either plain text or structured content items. The main goal is to let `js_repl` return image content from nested `view_image` calls in its own `custom_tool_call_output`, instead of relying on a separate injected message. ## What changed - Changed `custom_tool_call_output.output` from `string` to `FunctionCallOutputPayload` - Updated freeform tool plumbing to preserve structured output bodies - Updated `js_repl` to aggregate nested tool content items and attach them to the outer `js_repl` result - Removed the old `js_repl` special case that injected `view_image` results as a separate pending user image message - Updated normalization/history/truncation paths to handle multimodal `custom_tool_call_output` - Regenerated app-server protocol schema artifacts ## Behavior Direct `view_image` calls still return a `function_call_output` with image content. When `view_image` is called inside `js_repl`, the outer `js_repl` `custom_tool_call_output` now carries: - an `input_text` item if the JS produced text output - one or more `input_image` items from nested tool results So the nested image result now stays inside the `js_repl` tool output instead of being injected as a separate message. ## Compatibility This is intended to be backward-compatible for resumed conversations. Older histories that stored `custom_tool_call_output.output` as a plain string still deserialize correctly, and older histories that used the previous injected-image-message flow also continue to resume. Added regression coverage for resuming a pre-change rollout containing: - string-valued `custom_tool_call_output` - legacy injected image message history #### [git stack](https://github.com/magus/git-stack-cli) - 👉 `1` https://github.com/openai/codex/pull/12948
This commit is contained in:
committed by
GitHub
parent
f90e97e414
commit
7e980d7db6
@@ -161,7 +161,7 @@ pub enum ResponseInputItem {
|
||||
},
|
||||
CustomToolCallOutput {
|
||||
call_id: String,
|
||||
output: String,
|
||||
output: FunctionCallOutputPayload,
|
||||
},
|
||||
}
|
||||
|
||||
@@ -261,9 +261,12 @@ pub enum ResponseItem {
|
||||
name: String,
|
||||
input: String,
|
||||
},
|
||||
// `custom_tool_call_output.output` uses the same wire encoding as
|
||||
// `function_call_output.output` so freeform tools can return either plain
|
||||
// text or structured content items.
|
||||
CustomToolCallOutput {
|
||||
call_id: String,
|
||||
output: String,
|
||||
output: FunctionCallOutputPayload,
|
||||
},
|
||||
// Emitted by the Responses API when the agent triggers a web search.
|
||||
// Example payload (from SSE `response.output_item.done`):
|
||||
@@ -1538,6 +1541,26 @@ mod tests {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn serializes_custom_tool_image_outputs_as_array() -> Result<()> {
|
||||
let item = ResponseInputItem::CustomToolCallOutput {
|
||||
call_id: "call1".into(),
|
||||
output: FunctionCallOutputPayload::from_content_items(vec![
|
||||
FunctionCallOutputContentItem::InputImage {
|
||||
image_url: "data:image/png;base64,BASE64".into(),
|
||||
},
|
||||
]),
|
||||
};
|
||||
|
||||
let json = serde_json::to_string(&item)?;
|
||||
let v: serde_json::Value = serde_json::from_str(&json)?;
|
||||
|
||||
let output = v.get("output").expect("output field");
|
||||
assert!(output.is_array(), "expected array output");
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn preserves_existing_image_data_urls() -> Result<()> {
|
||||
let call_tool_result = CallToolResult {
|
||||
|
||||
Reference in New Issue
Block a user