Commit Graph

2171 Commits

Author SHA1 Message Date
Ahmed Ibrahim
3663a470a1 spawn prompt (#14362)
2026-03-11 11:14:51 -07:00
pakrym-oai
7a6a729a97 Add ALL_TOOLS export to code mode (#14294)
So code mode can search for tools.
2026-03-11 10:59:54 -07:00
sayan-oai
907d755c94 chore: wire through plugin policies + category from marketplace.json (#14305)
wire plugin marketplace metadata through app-server endpoints:
- `plugin/list` has `installPolicy` and `authPolicy`
- `plugin/install` has plugin-level `authPolicy`

`plugin/install` also now enforces `NOT_AVAILABLE` `installPolicy` when
installing.


added tests.
2026-03-11 10:37:40 -07:00
Owen Lin
509f001b1f fix(otel): make HTTP trace export survive app-server runtimes (#14300)
## Summary

This PR fixes OTLP HTTP trace export in runtimes where the previous
exporter setup was unreliable, especially around app-server usage. It
also removes the old `codex_otel::otel_provider` compatibility shim and
switches remaining call sites over to the crate-root
`codex_otel::OtelProvider` export.

## What changed

- Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes.
- Add an async HTTP client path for trace export when we are already
inside a multi-thread Tokio runtime.
- Make provider shutdown flush traces before tearing down the tracer
provider.
- Add loopback coverage that verifies traces are actually sent to
`/v1/traces`:
  - outside Tokio
  - inside a multi-thread Tokio runtime
  - inside a current-thread Tokio runtime
- Remove the `codex_otel::otel_provider` shim and update remaining
imports.

## Why

I hit cases where spans were being created correctly but never made it
to the collector. The issue turned out to be in exporter/runtime
behavior rather than the span plumbing itself. This PR narrows that gap
and gives us regression coverage for the actual export path.
2026-03-11 09:59:49 -07:00
pakrym-oai
8ede17b3ce Allow bool web_search in ToolsToml (#14352)
Summary
- add a custom deserializer so `[tools].web_search` can be a bool
(treated as disabled) or a config object
- extend core and app-server tests to cover bool handling in TOML config

Testing
- Not run (not requested)
2026-03-11 16:24:10 +00:00
Rasmus Rygaard
8af97ce4b0 Revert "Pass more params to compaction" (#14298) 2026-03-11 08:44:55 -07:00
Channing Conger
2cfa106091 Responses: set x-client-request-id as conversation_id when talking to responses (#14312)
Right now we send the session_id header to responses, where it is
ignored/dropped. This sets the x-client-request-id header to the
conversation_id instead, which is useful downstream.
2026-03-10 23:46:05 -07:00
Fouad Matin
78280f872a fix(arc_monitor): api path (#14290)
This PR just fixes the API path for ARC monitor.
2026-03-11 02:50:38 +00:00
pakrym-oai
816e447ead Add snippets annotated with types to tools when code mode enabled (#14284)
Main purpose is for code mode to understand the return type.
2026-03-10 19:20:15 -07:00
Ahmed Ibrahim
cc417c39a0 Split spawn_csv from multi_agent (#14282)
- make `spawn_csv` a standalone feature for CSV agent jobs
- keep `spawn_csv -> multi_agent` one-way and preserve restricted
subagent disable paths
2026-03-11 01:42:50 +00:00
Ahmed Ibrahim
5b10b93ba2 Add realtime start instructions config override (#14270)
- add `realtime_start_instructions` config support
- thread it into realtime context updates, schema, docs, and tests
2026-03-10 18:42:05 -07:00
pakrym-oai
566897d427 Make unified exec session_id numeric (#14279)
It's a number on the write_stdin input, so make it a number on the
output and internally as well.
2026-03-10 18:38:39 -07:00
pakrym-oai
24b8d443b8 Prefix code mode output with success or failure message and include error stack (#14272) 2026-03-10 18:33:52 -07:00
Ahmed Ibrahim
3f7cb03043 Stabilize websocket response.failed error delivery (#14017)
## What changed
- Drop failed websocket connections immediately after a terminal stream
error instead of awaiting a graceful close handshake before forwarding
the error to the caller.
- Keep the success path and the closed-connection guard behavior
unchanged.

## Why this fixes the flake
- The failing integration test waits for the second websocket stream to
surface the model error before issuing a follow-up request.
- On slower runners, the old error path awaited
`ws_stream.close().await` before sending the error downstream. If that
close handshake stalled, the test kept waiting for an error that had
already happened server-side and nextest timed it out.
- Dropping the failed websocket immediately makes the terminal error
observable right away and marks the session closed so the next request
reconnects cleanly instead of depending on a best-effort close
handshake.

## Code or test?
- This is a production logic fix in `codex-api`. The existing websocket
integration test already exercises the regression path.
2026-03-10 17:59:41 -07:00
Ahmed Ibrahim
567ad7fafd Show spawned agent model and effort in TUI (#14273)
- include the requested sub-agent model and reasoning effort in the
spawn begin event
- render that metadata next to the spawned agent name and role in the
TUI transcript

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-11 00:46:25 +00:00
pakrym-oai
37f51382fd Rename code mode tool to exec (#14254)
Summary
- update the code-mode handler, runner, instructions, and error text to
refer to the `exec` tool name everywhere that used to say `code_mode`
- ensure generated documentation strings and tool specs describe `exec`
and rely on the shared `PUBLIC_TOOL_NAME`
- refresh the suite tests so they invoke `exec` instead of the old name

Testing
- Not run (not requested)
2026-03-11 00:30:16 +00:00
maja-openai
16daab66d9 prompt changes to guardian (#14263)
## Summary
- update the guardian prompting
- clarify the guardian rejection message so an action may still proceed
if the user explicitly approves it after being informed of the risk

## Testing
- cargo run on selected examples
2026-03-10 17:05:43 -07:00
Celia Chen
295b56bece chore: add a separate reject-policy flag for skill approvals (#14271)
## Summary
- add `skill_approval` to `RejectConfig` and the app-server v2
`AskForApproval::Reject` payload so skill-script prompts can be
configured independently from sandbox and rule-based prompts
- update Unix shell escalation to reject prompts based on the actual
decision source, keeping prefix rules tied to `rules`, unmatched command
fallbacks tied to `sandbox_approval`, and skill scripts tied to
`skill_approval`
- regenerate the affected protocol/config schemas and expand
unit/integration coverage for the new flag and skill approval behavior
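
The mapping from decision source to reject flag described above can be sketched as follows. This is a minimal illustration, not the actual Codex code: the field names `rules`, `sandbox_approval`, and `skill_approval` come from the summary, while the struct layout, the `DecisionSource` enum, and the `rejects` method are assumptions made for the example.

```rust
/// Hypothetical mirror of the reject flags named in the summary.
/// Field names follow the commit message; the layout is assumed.
#[derive(Debug, Clone, Copy, Default)]
pub struct RejectConfig {
    pub rules: bool,
    pub sandbox_approval: bool,
    pub skill_approval: bool,
}

/// Hypothetical: what produced the approval prompt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DecisionSource {
    PrefixRule,
    UnmatchedCommandFallback,
    SkillScript,
}

impl RejectConfig {
    /// Returns true when a prompt from `source` should be rejected
    /// automatically instead of being shown to the user.
    pub fn rejects(&self, source: DecisionSource) -> bool {
        match source {
            DecisionSource::PrefixRule => self.rules,
            DecisionSource::UnmatchedCommandFallback => self.sandbox_approval,
            DecisionSource::SkillScript => self.skill_approval,
        }
    }
}
```

Keying the decision on the source means that enabling `skill_approval` alone leaves rule-based and sandbox prompts untouched.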
2026-03-10 23:58:23 +00:00
pakrym-oai
18199d4e0e Add store/load support for code mode (#14259)
adds support for transferring state across code mode invocations.
2026-03-10 16:53:53 -07:00
Rasmus Rygaard
f8ef154a6b Pass more params to compaction (#14247)
Pass more params to /compact. This should give us parity with the
/responses endpoint to improve caching.

I'm torn about the MCP await. Blocking will give us parity but it seems
like we explicitly don't block on MCPs. Happy either way
2026-03-10 16:39:57 -07:00
Leo Shimonaka
de2a73cd91 feat: Add additional macOS Sandbox Permissions for Launch Services, Contacts, Reminders (#14155)
Add additional macOS Sandbox Permissions levers for the following:

- Launch Services
- Contacts
- Reminders
2026-03-10 23:34:47 +00:00
pakrym-oai
8b33485302 Add code_mode output helpers for text and images (#14244)
Summary
- document how code-mode can import `output_text`/`output_image` and
ensure `add_content` stays compatible
- add a synthetic `@openai/code_mode` module that appends content items
and validates inputs
- cover the new behavior with integration tests for structured text and
image outputs

Testing
- Not run (not requested)
2026-03-10 16:25:27 -07:00
Ahmed Ibrahim
bf936fa0c1 Clarify close_agent tool description (#14269)
- clarify the `close_agent` tool description so it nudges models to
close agents they no longer need
- keep the change scoped to the tool spec text only

Co-authored-by: Codex <noreply@openai.com>
2026-03-10 16:25:08 -07:00
gabec-openai
b73228722a Load agent metadata from role files (#14177) 2026-03-10 16:21:48 -07:00
pakrym-oai
e791559029 Add model-controlled truncation for code mode results (#14258)
Summary
- document that `@openai/code_mode` exposes
`set_max_output_tokens_per_exec_call` and that `code_mode` truncates the
final Rust-side output when the budget is exceeded
- enforce the configured budget in the Rust tool runner, reusing
truncation helpers so text-only outputs follow the unified-exec wrapper
and mixed outputs still fit within the limit
- ensure the new behavior is covered by a code-mode integration test and
string spec update

Testing
- Not run (not requested)
2026-03-10 15:57:14 -07:00
pakrym-oai
c7e28cffab Add output schema to MCP tools and expose MCP tool results in code mode (#14236)
Summary
- drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
to keep MCP tooling focused on the final result shape
- wire the new schema definitions through code mode, context, handlers,
and spec modules so MCP tools serialize the exact output shape expected
by the model
- extend code mode tests to cover multiple MCP call scenarios and ensure
the serialized data matches the new schema
- refresh JS runner helpers and protocol models alongside the schema
changes

Testing
- Not run (not requested)
2026-03-10 15:25:19 -07:00
Won Park
28934762d0 unifying all image saves to /tmp to bug-proof (#14149)
The image-gen feature now has the model save images to /tmp by default,
at all times.
2026-03-10 15:13:12 -07:00
Ahmed Ibrahim
2895d3571b Add spawn_agent model overrides (#14160)
- add `model` and `reasoning_effort` to the `spawn_agent` schema so the
values pass through
- validate requested models against `model.model` and only check that
the selected model supports the requested reasoning effort

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-10 14:04:04 -07:00
xl-openai
2544bd02a2 feat: Allow sync with remote plugin status. (#14176)
Add forceRemoteSync to plugin/list.
When it is set to True, we will sync the local plugin status with the
remote one (backend-api/plugins/list).
2026-03-10 13:32:59 -07:00
Matthew Zeng
bda9e55c7e add(core): arc_monitor (#13936)
## Summary
- add ARC monitor support for MCP tool calls by serializing MCP approval
requests into the ARC action shape and sending the relevant
conversation/policy context to the `/api/codex/safety/arc` endpoint
- route ARC outcomes back into MCP approval flow so `ask-user` falls
back to a user prompt and `steer-model` blocks the tool call, with
guardian/ARC tests covering the new request shape
- update the TUI approval copy from “Approve Once” to “Allow” / “Allow
for this session” and refresh the related
  snapshots

---------

Co-authored-by: Fouad Matin <fouad@openai.com>
Co-authored-by: Fouad Matin <169186268+fouad-openai@users.noreply.github.com>
2026-03-10 13:16:47 -07:00
pakrym-oai
46e6661d4e Reuse McpToolOutput in McpHandler (#14229)
We already have a type to represent the MCP tool output, reuse it
instead of the custom McpHandlerOutput
2026-03-10 10:41:41 -07:00
pakrym-oai
e52afd28b0 Expose strongly-typed result for exec_command (#14183)
Summary
- document output types for the various tool handlers and registry so
the API exposes richer descriptions
- update unified execution helpers and client tests to align with the
new output metadata
- clean up unused helpers across tool dispatch paths

Testing
- Not run (not requested)
2026-03-10 09:54:34 -07:00
Eric Traut
e4edafe1a8 Log ChatGPT user ID for feedback tags (#13901)
There are some bug investigations that currently require us to ask users
for their user ID even though they've already uploaded logs and session
details via `/feedback`. This frustrates users and increases the time
for diagnosis.

This PR includes the ChatGPT user ID in the metadata uploaded for
`/feedback` (both the TUI and app-server).
2026-03-10 09:57:41 -06:00
Eric Traut
9a501ddb08 Fix Linux tmux segfault in user shell lookup (#13900)
Replace the Unix shell lookup path in `codex-rs/core/src/shell.rs` to use
`libc::getpwuid_r()` instead of `libc::getpwuid()` when resolving the
current user's shell.

Why:
- `getpwuid()` can return pointers into libc-managed shared storage
- on the musl static Linux build, concurrent callers can race on that
storage
- this matches the crash pattern reported in tmux/Linux sessions with
parallel shell activity

Refs:
- Fixes #13842
2026-03-10 09:57:18 -06:00
Eric Traut
b90921eba8 Fix release-mode integration test compiler failure (#13603)
Addresses #13586

This doesn't affect our CI scripts. It was user-reported.

Summary
- add `wiremock::ResponseTemplate` and `body_string_contains` imports
behind `#[cfg(not(debug_assertions))]` in
`codex-rs/core/tests/suite/view_image.rs` so release builds only pull
the helpers they actually use
2026-03-10 08:30:56 -06:00
Ahmed Ibrahim
6b7253b123 Fix unified exec test output assertion (#14184)
## Summary
- update the unified exec test to use truncated_output() instead of the
removed output field
- fix the compile failure on latest main after ExecCommandToolOutput
changed shape
2026-03-09 23:12:36 -07:00
Ahmed Ibrahim
aa6a57dfa2 Stabilize incomplete SSE retry test (#13879)
## What changed
- The retry test now uses the same streaming SSE test server used by
production-style tests instead of a wiremock sequence.
- The fixture is resolved via `find_resource!`, and the test asserts
that exactly two outbound requests were sent.

## Why this fixes the flake
- The old wiremock sequence approximated early-close behavior, but it
did not reproduce the same streaming semantics the real client sees.
- That meant the retry path depended on mock implementation details
instead of on the actual transport behavior we care about.
- Switching to the streaming SSE helper makes the test exercise the real
early-close/retry contract, and counting requests directly verifies that
we retried exactly once rather than merely hoping the sequence aligned.

## Scope
- Test-only change.
2026-03-09 22:34:44 -07:00
Ahmed Ibrahim
2e24be2134 Use realtime transcript for handoff context (#14132)
- collect input/output transcript deltas into active handoff transcript
state
- attach and clear that transcript on each handoff, and regenerate
schema/tests
2026-03-09 22:30:03 -07:00
Channing Conger
c6343e0649 Implemented thread-level atomic elicitation counter for stopwatch pausing (#12296)
### Purpose
While trying to build out CLI-Tools for the agent to use under skills we
have found that those tools sometimes need to invoke a user elicitation.
These elicitations are handled out of band of the codex app-server but
need to indicate to the exec manager that the command running is not
going to progress on the usual timeout horizon.

### Example
Model calls universal exec:
`$ download-credit-card-history --start-date 2026-01-19 --end-date
2026-02-19 > credit_history.jsonl`

`download-credit-card-history` might hit a hosted/preauthenticated
service to fetch data. That service might decide that the request
requires end-user approval for access to the personal data. It should be
able to signal to the running thread that the command in question is
blocked on user elicitation. In that case we want the exec to continue
but the timeout on the tool call not to expire, essentially freezing
time until the user approves or rejects the command, at which point the
tool would signal the app-server to decrement the outstanding
elicitation count. Timeouts would then proceed as normal.

### What's Added

- New v2 RPC methods:
    - thread/increment_elicitation
    - thread/decrement_elicitation
- Protocol updates in:
    - codex-rs/app-server-protocol/src/protocol/common.rs
    - codex-rs/app-server-protocol/src/protocol/v2.rs
- App-server handlers wired in:
    - codex-rs/app-server/src/codex_message_processor.rs

### Behavior

- Counter starts at 0 per thread.
- increment atomically increases the counter.
- decrement atomically decreases the counter; decrement at 0 returns
invalid request.
- Transition rules:
    - 0 -> 1: broadcast pause state, pausing all active stopwatches
immediately.
    - >0 -> >0: remain paused.
    - 1 -> 0: broadcast unpause state, resuming stopwatches.
- Core thread/session logic:
    - codex-rs/core/src/codex_thread.rs
    - codex-rs/core/src/codex.rs
    - codex-rs/core/src/mcp_connection_manager.rs
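
The counter behavior listed above can be sketched with a single atomic. This is a minimal sketch using only the standard library, not the code from the PR: `ElicitationCounter`, `Transition`, and `InvalidRequest` are hypothetical names, and the real implementation broadcasts pause/unpause state rather than returning a value.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical result of a counter change, mirroring the transition rules.
#[derive(Debug, PartialEq, Eq)]
pub enum Transition {
    /// 0 -> 1: stopwatches should pause.
    Paused,
    /// >0 -> >0: nothing changes, stopwatches stay paused.
    StillPaused,
    /// 1 -> 0: stopwatches should resume.
    Resumed,
}

/// Hypothetical error for a decrement attempted at 0.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidRequest;

/// Per-thread counter of outstanding elicitations; starts at 0.
#[derive(Default)]
pub struct ElicitationCounter {
    outstanding: AtomicU64,
}

impl ElicitationCounter {
    /// Atomically increases the counter and reports the transition.
    pub fn increment(&self) -> Transition {
        let previous = self.outstanding.fetch_add(1, Ordering::SeqCst);
        if previous == 0 {
            Transition::Paused
        } else {
            Transition::StillPaused
        }
    }

    /// Atomically decreases the counter; fails without changing it at 0.
    pub fn decrement(&self) -> Result<Transition, InvalidRequest> {
        let previous = self
            .outstanding
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
            .map_err(|_| InvalidRequest)?;
        Ok(if previous == 1 {
            Transition::Resumed
        } else {
            Transition::StillPaused
        })
    }
}
```

Using `fetch_update` with `checked_sub` keeps the "decrement at 0" check and the decrement itself in one atomic step, so two concurrent decrements cannot both observe a count of 1.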

### Exec-server stopwatch integration

- Added centralized stopwatch tracking/controller:
    - codex-rs/exec-server/src/posix/stopwatch_controller.rs
- Hooked pause/unpause broadcast handling + stopwatch registration:
    - codex-rs/exec-server/src/posix/mcp.rs
    - codex-rs/exec-server/src/posix/stopwatch.rs
    - codex-rs/exec-server/src/posix.rs
2026-03-09 22:29:26 -07:00
Matthew Zeng
566e4cee4b [apps] Fix apps enablement condition. (#14011)
- [x] Fix apps enablement condition to check both the feature flag and
that the user is not an API key user.
2026-03-09 22:25:43 -07:00
pakrym-oai
a9ae43621b Move exec command truncation into ExecCommandToolOutput (#14169)
Summary
- relocate truncation logic for exec command output into the new
`ExecCommandToolOutput` response helper instead of centralized handler
code
- update all affected tools and unified exec handling to use the new
response item structure and eliminate `Function(FunctionToolOutput)`
responses
- adjust context, registry, and handler interfaces to align with the new
response semantics and error fields

Testing
- Not run (not requested)
2026-03-09 22:13:48 -07:00
xl-openai
0c33af7746 feat: support disabling bundled system skills (#13792)
Support disabling bundled system skills with a config:

[skills.bundled]
enabled = false
2026-03-09 22:02:53 -07:00
pakrym-oai
710682598d Export tools module into code mode runner (#14167)
**Summary**
- allow `code_mode` to pass enabled tools metadata to the runner and
expose them via `tools.js`
- import tools inside JavaScript rather than relying only on globals or
proxies for nested tool calls
- update specs, docs, and tests to exercise the new bridge and explain
the tooling changes

**Testing**
- Not run (not requested)
2026-03-09 21:59:09 -07:00
Dylan Hurd
772259b01f fix(core) default RejectConfig.request_permissions (#14165)
## Summary
Adds a default here so existing config deserializes

## Testing
- [x] Added a unit test
2026-03-10 04:56:23 +00:00
pakrym-oai
d71e042694 Enforce single tool output type in codex handlers (#14157)
We'll need to associate an output schema with each tool. Each tool can
only have one output type.
2026-03-09 21:49:44 -07:00
Andrei Eternal
244b2d53f4 start of hooks engine (#13276)
(Experimental)

This PR adds a first MVP for hooks, with SessionStart and Stop

The core design is:

- hooks live in a dedicated engine under codex-rs/hooks
- each hook type has its own event-specific file
- hook execution is synchronous and blocks normal turn progression while
running
- matching hooks run in parallel, then their results are aggregated into
a normalized HookRunSummary

On the AppServer side, hooks are exposed as operational metadata rather
than transcript-native items:

- new live notifications: hook/started, hook/completed
- persisted/replayed hook results live on Turn.hookRuns
- we intentionally did not add hook-specific ThreadItem variants

Hook messages are not persisted; they remain ephemeral. The context
changes they add are persisted (they get appended to the user's prompt).
2026-03-10 04:11:31 +00:00
pakrym-oai
da616136cc Add code_mode experimental feature (#13418)
A much narrower and more isolated (no node features) version of js_repl
2026-03-09 20:56:27 -07:00
pakrym-oai
aa04ea6bd7 Refactor tool output into trait implementations (#14152)
First step toward making tool outputs strongly typed (and `renderable`).
2026-03-09 19:38:32 -07:00
viyatb-oai
1165a16e6f fix: keep permissions profiles forward compatible (#14107)
## Summary
- preserve unknown `:special_path` tokens, including nested entries, so
older Codex builds warn and ignore instead of failing config load
- fail closed with a startup warning when a permissions profile has
missing or empty filesystem entries instead of aborting profile
compilation
- normalize Windows verbatim paths like `\\?\C:\...` before absolute-path
validation while keeping explicit errors for truly invalid paths
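
As a rough illustration of the verbatim-path normalization in the last bullet, the sketch below strips the `\\?\` prefix when a drive letter follows it and leaves every other input unchanged. `simplify_verbatim_path` is a hypothetical helper operating on strings; the real change may handle more cases (for example UNC forms) and works on path types rather than `&str`.

```rust
/// Hypothetical helper: turns `\\?\C:\dir` into `C:\dir`.
/// Inputs without a verbatim drive prefix are returned unchanged.
pub fn simplify_verbatim_path(path: &str) -> &str {
    const VERBATIM_PREFIX: &str = r"\\?\";
    match path.strip_prefix(VERBATIM_PREFIX) {
        Some(rest) if has_drive_prefix(rest) => rest,
        _ => path,
    }
}

/// True for strings that start with a drive letter, a colon, and a
/// backslash, such as `C:\`.
fn has_drive_prefix(path: &str) -> bool {
    let bytes = path.as_bytes();
    bytes.len() >= 3
        && bytes[0].is_ascii_alphabetic()
        && bytes[1] == b':'
        && bytes[2] == b'\\'
}
```

After this step an ordinary absolute-path check sees `C:\...`, while inputs that never had the prefix still reach validation as written.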

## Testing
- just fmt
- cargo test -p codex-core permissions_profiles_allow
- cargo test -p codex-core
normalize_absolute_path_for_platform_simplifies_windows_verbatim_paths
- cargo test -p codex-protocol
unknown_special_paths_are_ignored_by_legacy_bridge
- cargo clippy -p codex-core -p codex-protocol --all-targets -- -D
warnings
- cargo clean
2026-03-09 18:43:38 -07:00
viyatb-oai
b0cbc25a48 fix(protocol): preserve legacy workspace-write semantics (#13957)
## Summary
This is a fast follow to the initial `[permissions]` structure.

- keep the new split-policy carveout behavior for narrower non-write
entries under broader writable roots
- preserve legacy `WorkspaceWrite` semantics by using a cwd-aware bridge
that drops only redundant nested readable roots when projecting from
`SandboxPolicy`
- route the legacy macOS seatbelt adapter through that same legacy
bridge so redundant nested readable roots do not become read-only
carveouts on macOS
- derive the legacy bridge for `command_exec` using the sandbox root cwd
rather than the request cwd so policy derivation matches later sandbox
enforcement
- add regression coverage for the legacy macOS nested-readable-root case

## Examples
### Legacy `workspace-write` on macOS
A legacy `workspace-write` policy can redundantly list a nested readable
root under an already-writable workspace root.

For example, legacy config can effectively mean:
- workspace root (`.` / `cwd`) is writable
- `docs/` is also listed in `readable_roots`

The new shared split-policy helper intentionally treats a narrower
non-write entry under a broader writable root as a carveout for real
`[permissions]` configs. Without this fast follow, the unchanged macOS
seatbelt legacy adapter could project that legacy shape into a
`FileSystemSandboxPolicy` that treated `docs/` like a read-only carveout
under the writable workspace root. In practice, legacy callers on macOS
could unexpectedly lose write access inside `docs/`, even though that
path was writable before the `[permissions]` migration work.

This change fixes that by routing the legacy seatbelt path through the
cwd-aware legacy bridge, so:
- legacy `workspace-write` keeps `docs/` writable when `docs/` was only
a redundant readable root
- explicit `[permissions]` entries like `'.' = 'write'` and `'docs' =
'read'` still make `docs/` read-only, which is the new intended
split-policy behavior
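
The difference between the two cases above can be sketched with a "most specific root wins" lookup plus a legacy bridge that drops redundant readable roots. This is a simplified model, not the code in codex-core: `Access`, `Entry`, `effective_access`, and `drop_redundant_readable_roots` are hypothetical names, and the real `FileSystemSandboxPolicy` carries more state.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical access level for a filesystem root.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
}

/// Hypothetical policy entry: a root and the access granted under it.
#[derive(Debug, Clone)]
pub struct Entry {
    pub root: PathBuf,
    pub access: Access,
}

/// Split-policy lookup: the deepest root containing `path` decides,
/// so a narrower read entry carves out a broader writable root.
pub fn effective_access(entries: &[Entry], path: &Path) -> Option<Access> {
    entries
        .iter()
        .filter(|entry| path.starts_with(&entry.root))
        .max_by_key(|entry| entry.root.components().count())
        .map(|entry| entry.access)
}

/// Legacy bridge sketch: a readable root nested under a writable root is
/// redundant in the legacy model, so drop it before projecting.
pub fn drop_redundant_readable_roots(entries: Vec<Entry>) -> Vec<Entry> {
    let writable: Vec<PathBuf> = entries
        .iter()
        .filter(|entry| entry.access == Access::Write)
        .map(|entry| entry.root.clone())
        .collect();
    entries
        .into_iter()
        .filter(|entry| {
            entry.access == Access::Write
                || !writable.iter().any(|root| entry.root.starts_with(root))
        })
        .collect()
}
```

With an explicit `'.' = 'write'` and `'docs' = 'read'` pair, the lookup reports `docs/` as read-only; after the legacy bridge removes the nested readable root, the same lookup reports it as writable.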

### Legacy `command_exec` with a subdirectory cwd
`command_exec` can run a command from a request cwd that is narrower
than the sandbox root cwd.

For example:
- sandbox root cwd is `/repo`
- request cwd is `/repo/subdir`
- legacy policy is still `workspace-write` rooted at `/repo`

Before this fast follow, `command_exec` derived the legacy bridge using
the request cwd, but the sandbox was later built using the sandbox root
cwd. That mismatch could miss redundant legacy readable roots during
projection and accidentally reintroduce read-only carveouts for paths
that should still be writable under the legacy model.

This change fixes that by deriving the legacy bridge with the same
sandbox root cwd that sandbox enforcement later uses.

## Verification
- `just fmt`
- `cargo test -p codex-core
seatbelt_legacy_workspace_write_nested_readable_root_stays_writable`
- `cargo test -p codex-core test_sandbox_config_parsing`
- `cargo clippy -p codex-core -p codex-app-server --all-targets -- -D
warnings`
- `cargo clean`
2026-03-09 18:43:27 -07:00