Commit Graph

1616 Commits

Author SHA1 Message Date
jif-oai
a6e9469fa4 chore: unify memory job flow (#11334) 2026-02-10 20:26:39 +00:00
Ahmed Ibrahim
5e01450963 Strip unsupported images from prompt history to guard against model switch (#11349)
- Make `ContextManager::for_prompt` modality-aware and strip input_image
content when the active model is text-only.
- Added a test for multi-model -> text-only model switch
2026-02-10 11:58:00 -08:00
iceweasel-oai
82f93a13b2 include sandbox (seatbelt, elevated, etc.) as in turn metadata header (#10946)
This will help us understand retention/usage for folks who use the
Windows (or any other) sandboxes
2026-02-10 19:50:07 +00:00
viyatb-oai
62d0f302fd fix(core): canonicalize wrapper approvals and support heredoc prefix … (#10941)
## Summary
- Reduced repeated approvals for equivalent wrapper commands and fixed
execpolicy matching for heredoc-style shell invocations, with minimal
behavior change and fail-closed defaults.

## Fixes
1. Canonicalized approval matching for wrappers so equivalent commands
map to the same approval intent.
2. Added heredoc-aware prefix extraction for execpolicy so commands like
`python3 <<'PY' ... PY` match rules such as `prefix_rule(["python3"],
...)`.
3. Kept fallback behavior conservative: if parsing is ambiguous,
existing prompt behavior is preserved.

## Edge Cases Covered
- Wrapper path/name differences: `/bin/bash` vs `bash`, `/bin/zsh` vs
`zsh`.
- Shell modes: `-c` and `-lc`.
- Heredoc forms: quoted delimiter (`<<'PY'`) and unquoted delimiter (`<<
PY`).
- Multi-command heredoc scripts are rejected by the fallback
- Non-heredoc redirections (`>`, etc.) are not treated as heredoc prefix
matches.
- Complex scripts still fall back to prior behavior rather than
expanding permissions.

---------

Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
2026-02-10 11:46:40 -08:00
pakrym-oai
e4b5384539 Extract tool building (#11337)
Make it clear what input go into building tools and allow for easy reuse
for pre-warm request
2026-02-10 11:45:23 -08:00
Ahmed Ibrahim
9c4656000f Sanitize MCP image output for text-only models (#11346)
- Replace image blocks in MCP tool results with a text placeholder when
the active model does not accept image input.
- Add an e2e rmcp test to verify sanitized tool output is what gets sent
back to the model.
2026-02-10 11:25:32 -08:00
Ahmed Ibrahim
6e96e4837e Always expose view_image and return unsupported image-input error (#11336)
- Keep `view_image` in the advertised tool list for all models.
- Return a clear error when the current model does not support image
inputs, and cover it with a unit test.
2026-02-10 11:25:12 -08:00
jif-oai
847a6092e6 fix: reduce usage of open_if_present (#11344) 2026-02-10 19:25:07 +00:00
pakrym-oai
0639c33892 Compare full request for websockets incrementality (#11343)
Tools can dynamically change mid-turn now. We need to be more thorough
about reusing incremental connections.
2026-02-10 19:14:36 +00:00
Michael Bolin
548afa5749 core: remove stale apply_patch SandboxPolicy TODO in seatbelt (#11345)
The `TODO` in `core/src/seatbelt.rs` claimed that `apply_patch` still needed to honor `SandboxPolicy`. That was true when the comment was added, but it is no longer true.

Analysis:
- The TODO was introduced in #1762, when seatbelt code was split out of `exec.rs`.
- `apply_patch` sandboxing was later implemented in #1705.
- Today, `apply_patch` calls are routed through the tool orchestrator and delegated to `ApplyPatchRuntime`, which executes via `execute_env()` using the active sandbox attempt policy.
- On macOS, the sandbox transform path for that execution still builds seatbelt args with `create_seatbelt_command_args(command, policy, sandbox_policy_cwd)`, so the same `SandboxPolicy` gates `apply_patch` writes and network behavior.

Because this behavior is already enforced, the TODO is stale and removing it avoids implying missing sandbox coverage where none exists.

No functional behavior change; comment-only cleanup.
2026-02-10 19:10:02 +00:00
Dylan Hurd
f3bbcc987d test(core): stabilize ARM bazel remote-model and parallelism tests (#11330)
## Summary
- keep wiremock MockServer handles alive through async assertions in
remote model suite tests
- assert /models request count in remote_models_hide_picker_only_models
- use a slightly higher parallel timing threshold on aarch64 while
keeping existing x86 threshold

## Validation
- just fmt
- targeted tests:
- cargo test -p codex-core --test all
suite::remote_models::remote_models_merge_replaces_overlapping_model --
--exact
- cargo test -p codex-core --test all
suite::remote_models::remote_models_hide_picker_only_models -- --exact
- cargo test -p codex-core --test all
suite::tool_parallelism::shell_tools_run_in_parallel -- --exact
- soak loop: 40 iterations of all three targeted tests

## Notes
- cargo test -p codex-core has one unrelated local-env failure in
shell_snapshot::tests::try_new_creates_and_deletes_snapshot_file from
exported certificate env content in this workspace.
- local bazel test //codex-rs/core:core-all-test failed to build due
missing rust-objcopy in this host toolchain.
2026-02-10 10:57:50 -08:00
guinness-oai
099ed802b2 Treat first rollout session_meta as canonical thread identity (#11241)
During thread/fork, the new rollout includes the fork’s own session_meta
plus copied history that can contain older session_meta entries from the
source thread. thread/list was overwriting metadata on later
session_meta lines, so a fork could be reported with the source thread’s
thread_id. This fix only uses the first session_meta, so the fork keeps
its own ID.
2026-02-10 10:32:11 -08:00
Matthew Zeng
48e415bdef [apps] Improve app installation flow. (#11249)
- [x] Add buttons to start the installation flow and verify installation
completes.
- [x] Hard refresh apps list when the /apps view opens.
2026-02-10 17:59:43 +00:00
Shijie Rao
c4b771a16f Fix: update parallel tool call exec approval to approve on request id (#11162)
### Summary

In parallel tool call, exec command approvals were not approved at
request level but at a turn level. i.e. when a single request is
approved, the system currently treats all requests in turn as approved.

### Before

https://github.com/user-attachments/assets/d50ed129-b3d2-4b2f-97fa-8601eb11f6a8

### After

https://github.com/user-attachments/assets/36528a43-a4aa-4775-9e12-f13287ef19fc
2026-02-10 09:38:00 -08:00
Max Johnson
47356ff83c Revert "Add app-server transport layer with websocket support (#10693)" (#11323)
Suspected cause of deadlocking bug
2026-02-10 17:37:49 +00:00
Fouad Matin
693bac1851 fix(protocol): approval policy never prompt (#11288)
This removes overly directed language about how the model should behave
when it's in `approval_policy=never` mode.

---------

Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
2026-02-10 09:27:46 -08:00
pakrym-oai
3322b99900 Remove ApiPrompt (#11265)
Keep things simple and build a full Responses API request request right
in the model client
2026-02-10 16:12:31 +00:00
jif-oai
59c625458b Fix pending input test waiting logic (#11322)
## Summary
- remove redundant user message wait that could time out and cause
flakiness
- rely on the existing turn-complete wait to ensure the follow-up
request is observed

## Testing
- Not run (not requested)
2026-02-10 15:40:53 +00:00
jif-oai
e57892b211 feat: phase 2 consolidation (#11306)
Consolidation phase of memories

Cleaning and better handling of concurrency
2026-02-10 14:31:16 +00:00
jif-oai
d735df1f50 Extract hooks into dedicated crate (#11311)
Summary
- move `core/src/hooks` implementation into a new `codex-hooks` crate
with its own manifest
- update `codex-rs` workspace and `codex-core` crate to depend on the
extracted `hooks` crate and wire up the shared APIs
- ensure references, modules, and lockfile reflect the new crate layout

Testing
- Not run (not requested)
2026-02-10 13:42:17 +00:00
jif-oai
1d5eba0090 feat: align memory phase 1 and make it stronger (#11300)
## Align with the new phase-1 design

Basically we know run phase 1 in parallel by considering:
* Max 64 rollouts
* Max 1 month old
* Consider the most recent first

This PR also adds stronger parallelization capabilities by detecting
stale jobs, retry policies, ownership of computation to prevent double
computations etc etc
2026-02-10 13:42:09 +00:00
jif-oai
223fadc760 Fix spawn_agent input type (#11304) 2026-02-10 12:16:39 +00:00
jif-oai
87ccc5bbae feat: add connector capabilities to sub-agents (#11191) 2026-02-10 11:53:01 +00:00
jif-oai
6049ff02a0 memories: add extraction and prompt module foundation (#11200)
## Summary
- add the new `core/src/memories` module (phase-one parsing, rollout
filtering, storage, selection, prompts)
- add Askama-backed memory templates for stage-one input/system and
consolidation prompts
- add module tests for parsing, filtering, path bucketing, and summary
maintenance

## Testing
- just fmt
- cargo test -p codex-core --lib memories::
2026-02-10 10:10:24 +00:00
Michael Bolin
44ebf4588f feat: retain NetworkProxy, when appropriate (#11207)
As of this PR, `SessionServices` retains a
`Option<StartedNetworkProxy>`, if appropriate.

Now the `network` field on `Config` is `Option<NetworkProxySpec>`
instead of `Option<NetworkProxy>`.

Over in `Session::new()`, we invoke `NetworkProxySpec::start_proxy()` to
create the `StartedNetworkProxy`, which is a new struct that retains the
`NetworkProxy` as well as the `NetworkProxyHandle`. (Note that `Drop` is
implemented for `NetworkProxyHandle` to ensure the proxies are shutdown
when it is dropped.)

The `NetworkProxy` from the `StartedNetworkProxy` is threaded through to
the appropriate places.


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/11207).
* #11285
* __->__ #11207
2026-02-10 02:09:23 -08:00
alexsong-oai
9fded117ac feat: support configurable metric_exporter (#10940) 2026-02-10 08:14:28 +00:00
viyatb-oai
3391e5ea86 feat(sandbox): enforce proxy-aware network routing in sandbox (#11113)
## Summary
- expand proxy env injection to cover common tool env vars
(`HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY`/`NO_PROXY` families +
tool-specific variants)
- harden macOS Seatbelt network policy generation to route through
inferred loopback proxy endpoints and fail closed when proxy env is
malformed
- thread proxy-aware Linux sandbox flags and add minimal bwrap netns
isolation hook for restricted non-proxy runs
- add/refresh tests for proxy env wiring, Seatbelt policy generation,
and Linux sandbox argument wiring
2026-02-10 07:44:21 +00:00
alexsong-oai
91704c5672 feat: add SkillPolicy to skill metadata and support allow_implicit_invocation (#11244)
Tested by setting the policy in agents/openai.yaml to true, false, and
leaving it unset (default).
```
policy:
  allow_implicit_invocation: false
```
<img width="847" height="289" alt="Screenshot 2026-02-09 at 3 42 41 PM"
src="https://github.com/user-attachments/assets/d3476264-3355-47cf-894a-4ffba53e3481"
/>
2026-02-09 23:13:27 -08:00
Matthew Zeng
005e040f97 [apps] Add thread_id param to optionally load thread config for apps feature check. (#11279)
- [x] Add thread_id param to optionally load thread config for apps
feature check
2026-02-09 23:10:26 -08:00
Eric Traut
bb974c78de Disable dynamic model refresh for custom model providers (#11239)
The dynamic model refresh feature (`https://api.openai.com/v1/models`
endpoint) is currently gated on a runtime check for an auth method other
than API Key. It should be gated on a check specifically for ChatGPT
Auth because some custom model providers (e.g. for local models) use no
auth mechanism. A call to `self.auth_manager.auth_mode()` will return
`None` in this case.

Addresses #11213
2026-02-09 21:36:09 -08:00
Owen Lin
53741013ab fix(app-server): for external auth, replace id_token with chatgpt_acc… (#11240)
…ount_id and chatgpt_plan_type

### Summary
Following up on external auth mode which was introduced here:
https://github.com/openai/codex/pull/10012

Turns out some clients have a differently shaped ID token and don't have
a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
So, let's replace `id_token` param with `chatgpt_account_id` and
`chatgpt_plan_type` (optional) when initializing the external ChatGPT
auth mode (`account/login/start` with `chatgptAuthTokens`).

The client was able to test end-to-end with a Codex build from this
branch and verified it worked!
2026-02-09 20:48:58 -08:00
Dylan Hurd
168c359b71 Adjust shell command timeouts for Windows (#11247)
Summary
- add platform-aware defaults for shell command timeouts so Windows
tests get longer waits
- keep medium timeout longer on Windows to ensure flakiness is reduced

Testing
- Not run (not requested)
2026-02-09 20:03:32 -08:00
Michael Bolin
862ab63071 chore: change ConfigState so it no longer depends on a single config.toml file for reloading (#11262)
If anything, it should depend on `ConfigLayerStack`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/11262).
* #11207
* __->__ #11262
2026-02-09 19:26:39 -08:00
Ahmed Ibrahim
d1df3bd63b Revert "Revert "Update models.json"" (#11256)
Reverts openai/codex#11255
2026-02-09 19:22:41 -08:00
Ahmed Ibrahim
03adb5db3e Revert "Update models.json" (#11255)
Reverts openai/codex#9739
2026-02-09 17:44:11 -08:00
github-actions[bot]
c816c430a0 Update models.json (#9739)
Automated update of models.json.

---------

Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
2026-02-09 17:20:18 -08:00
Ahmed Ibrahim
a1abd53b6a Remove offline fallback for models (#11238)
# External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.
2026-02-09 16:58:54 -08:00
Dylan Hurd
d65f09b913 fix(feature) UnderDevelopment feature must be off (#11242)
## Summary
1. Bump RemoteModels to Stable
2. Assert that all UnderDevelopment features are off by default

## Testing
- [x] Added unit test
2026-02-09 15:14:15 -08:00
Ahmed Ibrahim
481145e959 Use longest remote model prefix matching (#11228)
Match model metadata by longest matching remote slug prefix before local
fallback.

- Update `get_model_info` to prefer the most specific remote slug prefix
for the requested model.
- Add an integration test to assert `gpt-5.3-codex-test` resolves to
`gpt-5.3-codex` over `gpt-5.3`.
2026-02-09 15:05:56 -08:00
Matthew Zeng
d90df4761b [apps] Add gated instructions for Apps. (#10924)
- [x] Add gated instructions for Apps.
2026-02-09 14:48:09 -08:00
jif-oai
ffd4bd345c feat: tie shell snapshot to cwd (#11231)
Fix for this: https://github.com/openai/codex/issues/11223

Basically we tie the shell snapshot to a `cwd` to handle `cwd`-based env
setups
2026-02-09 22:14:39 +00:00
jif-oai
c2ca51273f feat: use a notify instead of grace to close ue process (#11219) 2026-02-09 22:14:33 +00:00
xl-openai
cca13fb03a skill-creator: Remove invalid reference. (#10960)
Remove references to two files that do not exist.
2026-02-09 13:37:27 -08:00
xl-openai
a33ee46e3b feat: extend skills/list to support additional roots. (#10835)
Add an optional perCwdExtraUserRoots
2026-02-09 13:30:38 -08:00
jif-oai
74ecd6e3b2 state: add memory consolidation lock primitives (#11199)
## Summary
- add a migration for memory_consolidation_locks
- add acquire/release lock primitives to codex-state runtime
- add core/state_db wrappers and cwd normalization for memory queries
and lock keys

## Testing
- cargo test -p codex-state memory_consolidation_lock_
- cargo test -p codex-core --lib state_db::
2026-02-09 21:04:20 +00:00
Anton Panasenko
becc3a0424 feat: search_tool (#10657)
**Why We Did This**
- The goal is to reduce MCP tool context pollution by not exposing the
full MCP tool list up front
- It forces an explicit discovery step (`search_tool_bm25`) so the model
narrows tool scope before making MCP calls, which helps relevance and
lowers prompt/tool clutter.

**What It Changed**
- Added a new experimental feature flag `search_tool` in
`core/src/features.rs:90` and `core/src/features.rs:430`.
- Added config/schema support for that flag in
`core/config.schema.json:214` and `core/config.schema.json:1235`.
- Added BM25 dependency (`bm25`) in `Cargo.toml:129` and
`core/Cargo.toml:23`.
- Added new tool handler `search_tool_bm25` in
`core/src/tools/handlers/search_tool_bm25.rs:18`.
- Registered the handler and tool spec in
`core/src/tools/handlers/mod.rs:11` and `core/src/tools/spec.rs:780` and
`core/src/tools/spec.rs:1344`.
- Extended `ToolsConfig` to carry `search_tool` enablement in
`core/src/tools/spec.rs:32` and `core/src/tools/spec.rs:56`.
- Injected dedicated developer instructions for tool-discovery workflow
in `core/src/codex.rs:483` and `core/src/codex.rs:1976`, using
`core/templates/search_tool/developer_instructions.md:1`.
- Added session state to store one-shot selected MCP tools in
`core/src/state/session.rs:27` and `core/src/state/session.rs:131`.
- Added filtering so when feature is enabled, only selected MCP tools
are exposed on the next request (then consumed) in
`core/src/codex.rs:3800` and `core/src/codex.rs:3843`.
- Added E2E suite coverage for
enablement/instructions/hide-until-search/one-turn-selection in
`core/tests/suite/search_tool.rs:72`,
`core/tests/suite/search_tool.rs:109`,
`core/tests/suite/search_tool.rs:147`, and
`core/tests/suite/search_tool.rs:218`.
- Refactored test helper utilities to support config-driven tool
collection in `core/tests/suite/tools.rs:281`.

**Net Behavioral Effect**
- With `search_tool` **off**: existing MCP behavior (tools exposed
normally).
- With `search_tool` **on**: MCP tools start hidden, model must call
`search_tool_bm25`, and only returned `selected_tools` are available for
the next model call.
2026-02-09 12:53:50 -08:00
Charley Cunningham
9450cd9ce5 core: add focused diagnostics for remote compaction overflows (#11133)
## Summary
- add targeted remote-compaction failure diagnostics in compact_remote
logging
- log the specific values needed to explain overflow timing:
  - last_api_response_total_tokens
  - estimated_tokens_of_items_added_since_last_successful_api_response
  - estimated_bytes_of_items_added_since_last_successful_api_response
  - failing_compaction_request_body_bytes
- simplify breakdown naming and remove
last_api_response_total_bytes_estimate (it was an approximation and not
useful for debugging)

## Why
When compaction fails with context_length_exceeded, we need concrete,
low-ambiguity numbers that map directly to:
1) what the API most recently reported, and
2) what local history added since then.

This keeps the failure logs actionable without adding broad, noisy
metrics.

## Testing
- just fmt
- cargo test -p codex-core
2026-02-09 12:42:20 -08:00
jif-oai
c2bfd1e473 Revert "chore: enable sub agents" (#11230)
Reverts openai/codex#11173
2026-02-09 20:22:38 +00:00
pakrym-oai
ccd17374cb Move warmup to the task level (#11216)
Instead of storing a special connection on the client level make the
regular task responsible for establishing a normal client session and
open a connection on it.

Then when the turn is started we pass in a pre-established session.
2026-02-09 10:57:52 -08:00
Eric Traut
9346d321d2 Fixed bug in file watcher that results in spurious skills update events and large log files (#11217)
On some platforms, the "notify" file watcher library emits events for
file opens and reads, not just file modifications or deletes. The
previous implementation didn't take this into account.

Furthermore, the `tracing.info!` call that I previously added was
emitting a lot of logs. I had assumed incorrectly that `info` level
logging was disabled by default, but it's apparently enabled for this
crate. This is resulting in large logs (hundreds of MB) for some users.
2026-02-09 10:33:57 -08:00