Commit Graph

34 Commits

Author SHA1 Message Date
pakrym-oai
61142b6169 Remove ToolName display helper (#21465)
## Why

`ToolName::display()` made it too easy to flatten tool identity and
accidentally compare rendered strings. Tool identity should stay
structural until a legacy string boundary actually requires the
flattened spelling.

## What

- Removes `ToolName::display()` and relies on the existing `Display`
impl for messages and errors.
- Adds structural ordering for `ToolName` and uses it for
sorting/deduping deferred tools.
- Carries `ToolName` through tool/sandbox plumbing, flattening only at
legacy boundaries such as hook payloads, telemetry tags, and Responses
tool names.
- Updates MCP normalization tests to assert `ToolName` structure instead
of rendered strings.

## Testing

- `cargo test -p codex-mcp test_normalize_tools`
- `cargo test -p codex-core unavailable_tool`
- `just fix -p codex-protocol`
- `just fix -p codex-mcp`
- `just fix -p codex-core`
2026-05-08 12:17:48 -07:00
Andrei Eternal
eed0e07825 hooks: emit Bash PostToolUse when exec_command completes via write_stdin (#18888)
Fixes #16246.

## Why

`exec_command` already emits `PreToolUse`, but long-running unified exec
commands that finish on a later `write_stdin` poll could miss the
matching `PostToolUse`. That left the Bash hook lifecycle inconsistent,
broke expectations around `tool_use_id` and `tool_input.command`, and
meant `PostToolUse` block/replacement feedback could fail to replace the
final session output before it reached model context.

This keeps the fix scoped to the `exec_command` / `write_stdin`
lifecycle. Broader non-Bash hook expansion is still out of scope here
and remains tracked separately in #16732.

## What changed

- Compute and store `PostToolUsePayload` while handlers still have
access to their concrete output type, and carry `tool_use_id` through
that payload.
- Preserve the original hook-facing `exec_command` string through
unified exec state (`ExecCommandRequest`, `ProcessEntry`,
`PreparedProcessHandles`, and `ExecCommandToolOutput`) via
`hook_command`, and remove the now-unused `session_command` output
metadata.
- Emit exactly one Bash `PostToolUse` for long-running `exec_command`
sessions when a later `write_stdin` poll observes final completion,
using the original `exec_command` call id and hook-facing command.
- Keep one-shot `exec_command` behavior aligned with the same payload
construction, including interactive completions that return a final
result directly.
- Apply `PostToolUse` block/replacement feedback before the final
`write_stdin` completion output is sent back to the model.
- Keep `write_stdin` itself out of `PreToolUse` matching so it continues
to act as transport/polling for the original Bash tool call.
- Restore plain matcher behavior for tool-name matchers such as `Bash`
and `Edit|Write`, while still treating patterns with regex characters
(for example `mcp__.*`) as regexes.
- Add unit coverage for unified exec payload construction and parallel
session separation, plus a core integration regression that verifies a
blocked `PostToolUse` replaces the final `write_stdin` output in model
context.

## Testing

- `cargo test -p codex-hooks`
- `cargo test -p codex-core post_tool_use_payload`
- `cargo test -p codex-core
post_tool_use_blocks_when_exec_session_completes_via_write_stdin`
2026-04-22 17:14:22 -07:00
Dylan Hurd
86535c9901 feat(auto-review) Handle request_permissions calls (#18393)
## Summary
When auto-review is enabled, it should handle request_permissions tool.
We'll need to clean up the UX but I'm planning to do that in a separate
pass

## Testing
- [x] Ran locally
<img width="893" height="396" alt="Screenshot 2026-04-17 at 1 16 13 PM"
src="https://github.com/user-attachments/assets/4c045c5f-1138-4c6c-ac6e-2cb6be4514d8"
/>

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-20 21:48:57 -07:00
pakrym-oai
71e4c6fa17 Move codex module under session (#18249)
## Summary
- rename the core codex module root to session/mod.rs without using
#[path]
- move the codex module directory and tests under core/src/session
- remove session/mod.rs reexports so call sites use explicit child
module paths

## Testing
- cargo test -p codex-core --lib
- cargo check -p codex-core --tests
- just fmt
- just fix -p codex-core
- git diff --check
2026-04-17 16:18:53 +00:00
Akshay Nathan
7995c66032 Stream apply_patch changes (#17862)
Adds new events for streaming apply_patch changes from responses api.
This is to enable clients to show progress during file writes.

Caveat: This does not work with apply_patch in function call mode, since
that required adding streaming json parsing.
2026-04-16 18:12:19 -07:00
sayan-oai
0df7e9a820 register all mcp tools with namespace (#17404)
stacked on #17402.

MCP tools returned by `tool_search` (deferred tools) get registered in
our `ToolRegistry` with a different format than directly available
tools. this leads to two different ways of accessing MCP tools from our
tool catalog, only one of which works for each. fix this by registering
all MCP tools with the namespace format, since this info is already
available.

also, direct MCP tools are registered to responsesapi without a
namespace, while deferred MCP tools have a namespace. this means we can
receive MCP `FunctionCall`s in both formats from namespaces. fix this by
always registering MCP tools with namespace, regardless of deferral
status.

make code mode track `ToolName` provenance of tools so it can map the
literal JS function name string to the correct `ToolName` for
invocation, rather than supporting both in core.

this lets us unify to a single canonical `ToolName` representation for
each MCP tool and force everywhere to use that one, without supporting
fallbacks.
2026-04-15 21:02:59 +08:00
josiah-openai
937dd3812d Add supports_parallel_tool_calls flag to included mcps (#17667)
## Why

For more advanced MCP usage, we want the model to be able to emit
parallel MCP tool calls and have Codex execute eligible ones
concurrently, instead of forcing all MCP calls through the serial block.

The main design choice was where to thread the config. I made this
server-level because parallel safety depends on the MCP server
implementation. Codex reads the flag from `mcp_servers`, threads the
opted-in server names into `ToolRouter`, and checks the parsed
`ToolPayload::Mcp { server, .. }` at execution time. That avoids relying
on model-visible tool names, which can be incomplete in
deferred/search-tool paths or ambiguous for similarly named
servers/tools.

## What was added

Added `supports_parallel_tool_calls` for MCP servers.

Before:

```toml
[mcp_servers.docs]
command = "docs-server"
```

After:

```toml
[mcp_servers.docs]
command = "docs-server"
supports_parallel_tool_calls = true
```

MCP calls remain serial by default. Only tools from opted-in servers are
eligible to run in parallel. Docs also now warn to enable this only when
the server’s tools are safe to run concurrently, especially around
shared state or read/write races.

## Testing

Tested with a local stdio MCP server exposing real delay tools. The
model/Responses side was mocked only to deterministically emit two MCP
calls in the same turn.

Each test called `query_with_delay` and `query_with_delay_2` with `{
"seconds": 25 }`.

| Build/config | Observed | Wall time |
| --- | --- | --- |
| main with flag enabled | serial | `58.79s` |
| PR with flag enabled | parallel | `31.73s` |
| PR without flag | serial | `56.70s` |

PR with flag enabled showed both tools start before either completed;
main and PR-without-flag completed the first delay before starting the
second.

Also added an integration test.

Additional checks:

- `cargo test -p codex-tools` passed
- `cargo test -p codex-core
mcp_parallel_support_uses_exact_payload_server` passed
- `git diff --check` passed
2026-04-13 15:16:34 -07:00
sayan-oai
1325bcd3f6 chore: refactor name and namespace to single type (#17402)
avoid passing them both around, unify on a type. this now also keys
`ToolRegistry`.

tests pass
2026-04-11 23:06:22 +00:00
Ahmed Ibrahim
af8a9d2d2b remove temporary ownership re-exports (#16626)
Stacked on #16508.

This removes the temporary `codex-core` / `codex-login` re-export shims
from the ownership split and rewrites callsites to import directly from
`codex-model-provider-info`, `codex-models-manager`, `codex-api`,
`codex-protocol`, `codex-feedback`, and `codex-response-debug-context`.

No behavior change intended; this is the mechanical import cleanup layer
split out from the ownership move.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-03 00:33:34 -07:00
Michael Bolin
d4464125c5 Remove client_common tool re-exports (#16482)
## Why

`codex-rs/core/src/client_common.rs` still had a `tools` re-export
module that forwarded `codex_tools` types back into `codex-core`. After
the earlier extraction work in #16379, #16471, #16477, and #16481, that
extra layer no longer adds value.

Removing it keeps dependencies explicit: the `codex-core` modules that
actually use `ToolSpec` and related types now depend on `codex_tools`
directly instead of reaching through `client_common`.

## What Changed

- removed the `client_common::tools` re-export module from
`core/src/client_common.rs`
- updated the remaining `codex-core` consumers to import `codex_tools`
directly
- adjusted the affected test code to reference
`codex_tools::ResponsesApiTool` directly as well

This is a mechanical cleanup only. It does not change tool behavior or
runtime logic.

## Testing

- `cargo test -p codex-core client_common::tests`
- `cargo test -p codex-core tools::router::tests`
- `cargo test -p codex-core tools::context::tests`
- `cargo test -p codex-core tools::spec::tests`
2026-04-01 19:15:15 -07:00
pakrym-oai
88e5382fc4 Propagate tool errors to code mode (#15075)
Clean up error flow to push the FunctionCallError all the way up to
dispatcher and allow code mode to surface as exception.
2026-03-18 13:57:55 -07:00
pakrym-oai
a2546d5dff Expose code-mode tools through globals (#14517)
Summary
- make all code-mode tools accessible as globals so callers only need
`tools.<name>`
- rename text/image helpers and key globals (store, load, ALL_TOOLS,
etc.) to reflect the new shared namespace
- update the JS bridge, runners, descriptions, router, and tests to
follow the new API

Testing
- Not run (not requested)
2026-03-12 15:43:59 -07:00
pakrym-oai
09ba6b47ae Reuse tool runtime for code mode worker (#14496)
## Summary
- create the turn-scoped `ToolCallRuntime` before starting the code mode
worker so the worker reuses the same runtime and router
- thread the shared runtime through the code mode service/worker path
and use it for nested tool calls
- model aborted tool calls as a concrete `ToolOutput` so aborted
responses still produce valid tool output shapes

## Testing
- `just fmt`
- `cargo test -p codex-core` (still running locally)
2026-03-12 12:48:32 -07:00
Anton Panasenko
77b0c75267 feat: search_tool migrate to bring you own tool of Responses API (#14274)
## Why

to support a new bring your own search tool in Responses
API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search)
we migrating our bm25 search tool to use official way to execute search
on client and communicate additional tools to the model.

## What
- replace the legacy `search_tool_bm25` flow with client-executed
`tool_search`
- add protocol, SSE, history, and normalization support for
`tool_search_call` and `tool_search_output`
- return namespaced Codex Apps search results and wire namespaced
follow-up tool calls back into MCP dispatch
2026-03-11 17:51:51 -07:00
pakrym-oai
ee8f84153e Add output schema to MCP tools and expose MCP tool results in code mode (#14236)
Summary
- drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
to keep MCP tooling focused on the final result shape
- wire the new schema definitions through code mode, context, handlers,
and spec modules so MCP tools serialize the exact output shape expected
by the model
- extend code mode tests to cover multiple MCP call scenarios and ensure
the serialized data matches the new schema
- refresh JS runner helpers and protocol models alongside the schema
changes

Testing
- Not run (not requested)
2026-03-11 12:33:08 -07:00
pakrym-oai
c4d35084f5 Reuse McpToolOutput in McpHandler (#14229)
We already have a type to represent the MCP tool output, reuse it
instead of the custom McpHandlerOutput
2026-03-11 12:33:07 -07:00
Owen Lin
c3736cff0a feat(otel): safe tracing (#13626)
### Motivation
Today config.toml has three different OTEL knobs under `[otel]`:
- `exporter` controls where OTEL logs go
- `trace_exporter` controls where OTEL traces go
- `metrics_exporter` controls where metrics go

Those often (pretty much always?) serve different purposes.

For example, for OpenAI internal usage, the **log exporter** is already
being used for IT/security telemetry, and that use case is intentionally
content-rich: tool calls, arguments, outputs, MCP payloads, and in some
cases user content are all useful there. `log_user_prompt` is a good
example of that distinction. When it’s enabled, we include raw prompt
text in OTEL logs, which is acceptable for the security use case.

The **trace exporter** is a different story. The goal there is to give
OpenAI engineers visibility into latency and request behavior when they
run Codex locally, without sending sensitive prompt or tool data as
trace event data. In other words, traces should help answer “what was
slow?” or “where did time go?”, not “what did the user say?” or “what
did the tool return?”

The complication is that Rust’s `tracing` crate does not make a hard
distinction between “logs” and “trace events.” It gives us one
instrumentation API for logs and trace events (via `tracing::event!`),
and subscribers decide what gets treated as logs, trace events, or both.

Before this change, our OTEL trace layer was effectively attached to the
general tracing stream, which meant turning on `trace_exporter` could
pick up content-rich events that were originally written with logging
(and the `log_exporter`) in mind. That made it too easy for sensitive
data to end up in exported traces by accident.

### Concrete example
In `otel_manager.rs`, this `tracing::event!` call would be exported in
both logs AND traces (as a trace event).
```
    pub fn user_prompt(&self, items: &[UserInput]) {
        let prompt = items
            .iter()
            .flat_map(|item| match item {
                UserInput::Text { text, .. } => Some(text.as_str()),
                _ => None,
            })
            .collect::<String>();

        let prompt_to_log = if self.metadata.log_user_prompts {
            prompt.as_str()
        } else {
            "[REDACTED]"
        };

        tracing::event!(
            tracing::Level::INFO,
            event.name = "codex.user_prompt",
            event.timestamp = %timestamp(),
            // ...
            prompt = %prompt_to_log,
        );
    }
```

Instead of `tracing::event!`, we should now be using `log_event!` and
`trace_event!` instead to more clearly indicate which sink (logs vs.
traces) that event should be exported to.

### What changed
This PR makes the log and trace export distinct instead of treating them
as two sinks for the same data.

On the provider side, OTEL logs and traces now have separate
routing/filtering policy. The log exporter keeps receiving the existing
`codex_otel` events, while trace export is limited to spans and trace
events.

On the event side, `OtelManager` now emits two flavors of telemetry
where needed:
- a log-only event with the current rich payloads
- a tracing-safe event with summaries only

It also has a convenience `log_and_trace_event!` macro for emitting to
both logs and traces when it's safe to do so, as well as log- and
trace-specific fields.

That means prompts, tool args, tool output, account email, MCP metadata,
and similar content stay in the log lane, while traces get the pieces
that are actually useful for performance work: durations, counts, sizes,
status, token counts, tool origin, and normalized error classes.

This preserves current IT/security logging behavior while making it safe
to turn on trace export for employees.

### Full list of things removed from trace export
- raw user prompt text from `codex.user_prompt`
- raw tool arguments and output from `codex.tool_result`
- MCP server metadata from `codex.tool_result` (mcp_server,
mcp_server_origin)
- account identity fields like `user.email` and `user.account_id` from
trace-safe OTEL events
- `host.name` from trace resources
- generic `codex.tool_decision` events from traces
- generic `codex.sse_event` events from traces
- the full ToolCall debug payload from the `handle_tool_call` span

What traces now keep instead is mostly:
- spans
- trace-safe OTEL events
- counts, lengths, durations, status, token counts, and tool origin
summaries
2026-03-05 16:30:53 -08:00
Curtis 'Fjord' Hawthorne
7e980d7db6 Support multimodal custom tool outputs (#12948)
## Summary

This changes `custom_tool_call_output` to use the same output payload
shape as `function_call_output`, so freeform tools can return either
plain text or structured content items.

The main goal is to let `js_repl` return image content from nested
`view_image` calls in its own `custom_tool_call_output`, instead of
relying on a separate injected message.

## What changed

- Changed `custom_tool_call_output.output` from `string` to
`FunctionCallOutputPayload`
- Updated freeform tool plumbing to preserve structured output bodies
- Updated `js_repl` to aggregate nested tool content items and attach
them to the outer `js_repl` result
- Removed the old `js_repl` special case that injected `view_image`
results as a separate pending user image message
- Updated normalization/history/truncation paths to handle multimodal
`custom_tool_call_output`
- Regenerated app-server protocol schema artifacts

## Behavior

Direct `view_image` calls still return a `function_call_output` with
image content.

When `view_image` is called inside `js_repl`, the outer `js_repl`
`custom_tool_call_output` now carries:
- an `input_text` item if the JS produced text output
- one or more `input_image` items from nested tool results

So the nested image result now stays inside the `js_repl` tool output
instead of being injected as a separate message.

## Compatibility

This is intended to be backward-compatible for resumed conversations.

Older histories that stored `custom_tool_call_output.output` as a plain
string still deserialize correctly, and older histories that used the
previous injected-image-message flow also continue to resume.

Added regression coverage for resuming a pre-change rollout containing:
- string-valued `custom_tool_call_output`
- legacy injected image message history


#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12948
2026-02-26 18:17:46 -08:00
Curtis 'Fjord' Hawthorne
0dcfc59171 Add js_repl_tools_only model and routing restrictions (#10671)
# External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.


#### [git stack](https://github.com/magus/git-stack-cli)
-  `1` https://github.com/openai/codex/pull/10674
-  `2` https://github.com/openai/codex/pull/10672
- 👉 `3` https://github.com/openai/codex/pull/10671
-  `4` https://github.com/openai/codex/pull/10673
-  `5` https://github.com/openai/codex/pull/10670
2026-02-12 15:41:05 -08:00
Owen Lin
5ea107a088 feat(app-server, core): allow text + image content items for dynamic tool outputs (#10567)
Took over the work that @aaronl-openai started here:
https://github.com/openai/codex/pull/10397

Now that app-server clients are able to set up custom tools (called
`dynamic_tools` in app-server), we should expose a way for clients to
pass in not just text, but also image outputs. This is something the
Responses API already supports for function call outputs, where you can
pass in either a string or an array of content outputs (text, image,
file):
https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image

So let's just plumb it through in Codex (with the caveat that we only
support text and image for now). This is implemented end-to-end across
app-server v2 protocol types and core tool handling.

## Breaking API change
NOTE: This introduces a breaking change with dynamic tools, but I think
it's ok since this concept was only recently introduced
(https://github.com/openai/codex/pull/9539) and it's better to get the
API contract correct. I don't think there are any real consumers of this
yet (not even the Codex App).

Old shape:
`{ "output": "dynamic-ok", "success": true }`

New shape:
```
{
    "contentItems": [
      { "type": "inputText", "text": "dynamic-ok" },
      { "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" }
    ]
  "success": true
}
```
2026-02-04 16:12:47 -08:00
jif-oai
d7482510b1 nit: trace span for regular task (#8053)
Logs are too spammy

---------

Co-authored-by: Anton Panasenko <apanasenko@openai.com>
2025-12-16 16:53:15 +00:00
Ahmed Ibrahim
d802b18716 fix parallel tool calls (#7956) 2025-12-16 01:28:27 +00:00
Anton Panasenko
ad7b9d63c3 [codex] add otel tracing (#7844) 2025-12-12 17:07:17 -08:00
Ahmed Ibrahim
6e6338aa87 Inline response recording and remove process_items indirection (#7310)
- Inline response recording during streaming: `run_turn` now records
items as they arrive instead of building a `ProcessedResponseItem` list
and post‑processing via `process_items`.
- Simplify turn handling: `handle_output_item_done` returns the
follow‑up signal + optional tool future; `needs_follow_up` is set only
there, and in‑flight tool futures are drained once at the end (errors
logged, no extra state writes).
- Flattened stream loop: removed `process_items` indirection and the
extra output queue
- - Tests: relaxed `tool_parallelism::tool_results_grouped` to allow any
completion order while still requiring matching call/output IDs.
2025-12-04 12:17:54 -08:00
Dylan Hurd
497fb4a19c fix(core) serialize shell_command (#6744)
## Summary
Ensures we're serializing calls to `shell_command`

## Testing
- [x] Added unit test
2025-11-16 23:16:51 -08:00
jif-oai
e00eb50db3 feat: only wait for mutating tools for ghost commit (#6534) 2025-11-12 18:16:32 +00:00
jif-oai
ad279eacdc nit: logs to trace (#6503) 2025-11-11 13:37:06 +00:00
Ahmed Ibrahim
6ee7fbcfff feat: add the time after aborting (#5996)
Tell the model how much time passed after the user aborted the call.
2025-11-03 11:44:06 -08:00
Gabriel Peal
b0bdc04c30 [MCP] Render MCP tool call result images to the model (#5600)
It's pretty amazing we have gotten here without the ability for the
model to see image content from MCP tool calls.

This PR builds off of 4391 and fixes #4819. I would like @KKcorps to get
adequete credit here but I also want to get this fix in ASAP so I gave
him a week to update it and haven't gotten a response so I'm going to
take it across the finish line.


This test highlights how absured the current situation is. I asked the
model to read this image using the Chrome MCP
<img width="2378" height="674" alt="image"
src="https://github.com/user-attachments/assets/9ef52608-72a2-4423-9f5e-7ae36b2b56e0"
/>

After this change, it correctly outputs:
> Captured the page: image dhows a dark terminal-style UI labeled
`OpenAI Codex (v0.0.0)` with prompt `model: gpt-5-codex medium` and
working directory `/codex/codex-rs`
(and more)  

Before this change, it said:
> Took the full-page screenshot you asked for. It shows a long,
horizontally repeating pattern of stylized people in orange, light-blue,
and mustard clothing, holding hands in alternating poses against a white
background. No text or other graphics-just rows of flat illustration
stretching off to the right.

Without this change, the Figma, Playwright, Chrome, and other visual MCP
servers are pretty much entirely useless.

I tested this change with the openai respones api as well as a third
party completions api
2025-10-27 17:55:57 -04:00
jif-oai
e92c4f6561 feat: async ghost commit (#5618) 2025-10-27 10:09:10 +00:00
Ahmed Ibrahim
f59978ed3d Handle cancelling/aborting while processing a turn (#5543)
Currently we collect all all turn items in a vector, then we add it to
the history on success. This result in losing those items on errors
including aborting `ctrl+c`.

This PR:
- Adds the ability for the tool call to handle cancellation
- bubble the turn items up to where we are recording this info

Admittedly, this logic is an ad-hoc logic that doesn't handle a lot of
error edge cases. The right thing to do is recording to the history on
the spot as `items`/`tool calls output` come. However, this isn't
possible because of having different `task_kind` that has different
`conversation_histories`. The `try_run_turn` has no idea what thread are
we using. We cannot also pass an `arc` to the `conversation_histories`
because it's a private element of `state`.

That's said, `abort` is the most common case and we should cover it
until we remove `task kind`
2025-10-23 08:47:10 -07:00
pakrym-oai
789e65b9d2 Pass TurnContext around instead of sub_id (#5421)
Today `sub_id` is an ID of a single incoming Codex Op submition. We then
associate all events triggered by this operation using the same
`sub_id`.

At the same time we are also creating a TurnContext per submission and
we'd like to start associating some events (item added/item completed)
with an entire turn instead of just the operation that started it.

Using turn context when sending events give us flexibility to change
notification scheme.
2025-10-21 08:04:16 -07:00
pakrym-oai
f2555422b9 Simplify parallel (#4829)
make tool processing return a future and then collect futures.
handle cleanup on Drop
2025-10-07 10:12:38 -07:00
jif-oai
dc3c6bf62a feat: parallel tool calls (#4663)
Add parallel tool calls. This is configurable at model level and tool
level
2025-10-05 16:10:49 +00:00