Commit Graph

42 Commits

Author SHA1 Message Date
Max Johnson
7053aa5457 Reapply "Add app-server transport layer with websocket support" (#11370)
Reapply "Add app-server transport layer with websocket support" with
additional fixes from https://github.com/openai/codex/pull/11313/changes
to avoid deadlocking.

This reverts commit 47356ff83c.

## Summary

To avoid deadlocking when queues are full, we maintain separate tokio
tasks dedicated to incoming vs outgoing event handling
- split the app-server main loop into two tasks in
`run_main_with_transport`
   - inbound handling (`transport_event_rx`)
   - outbound handling (`outgoing_rx` + `thread_created_rx`)
- separate incoming and outgoing websocket tasks

## Validation

Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
requests

<img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
/>

---------

Co-authored-by: jif-oai <jif@openai.com>
2026-02-11 18:13:39 +00:00
Max Johnson
47356ff83c Revert "Add app-server transport layer with websocket support (#10693)" (#11323)
Suspected cause of deadlocking bug
2026-02-10 17:37:49 +00:00
sayan-oai
fc05374344 chore: add phase to message responseitem (#10455)
### What

add wiring for `phase` field on `ResponseItem::Message` to lay
groundwork for differentiating model preambles and final messages.
currently optional.

follows pattern in #9698.

updated schemas with `just write-app-server-schema` so we can see type
changes.

### Tests
Updated existing tests for SSE parsing and hydrating from history
2026-02-03 02:52:26 +00:00
iceweasel-oai
c40ad65bd8 remove sandbox globals. (#9797)
Threads sandbox updates through OverrideTurnContext for active turn
Passes computed sandbox type into safety/exec
2026-01-27 11:04:23 -08:00
pakrym-oai
998e88b12a Use test_codex more (#9961)
Reduces boilderplate.
2026-01-26 18:52:10 -08:00
jif-oai
83775f4df1 feat: ephemeral threads (#9765)
Add ephemeral threads capabilities. Only exposed through the
`app-server` v2

The idea is to disable the rollout recorder for those threads.
2026-01-24 14:57:40 +00:00
Dylan Hurd
8b3521ee77 feat(core) update Personality on turn (#9644)
## Summary
Support updating Personality mid-Thread via UserTurn/OverwriteTurn. This
is explicitly unused by the clients so far, to simplify PRs - app-server
and tui implementations will be follow-ups.

## Testing
- [x] added integration tests
2026-01-22 12:04:23 -08:00
pakrym-oai
b511c38ddb Support end_turn flag (#9698)
Experimental flag that signals the end of the turn.
2026-01-22 17:27:48 +00:00
Dylan Hurd
80d7a5d7fe chore(instructions) Remove unread SessionMeta.instructions field (#9423)
### Description
- Remove the now-unused `instructions` field from the session metadata
to simplify SessionMeta and stop propagating transient instruction text
through the rollout recorder API. This was only saving
user_instructions, and was never being read.
- Stop passing user instructions into the rollout writer at session
creation so the rollout header only contains canonical session metadata.

### Testing

- Ran `just fmt` which completed successfully.
- Ran `just fix -p codex-protocol`, `just fix -p codex-core`, `just fix
-p codex-app-server`, `just fix -p codex-tui`, and `just fix -p
codex-tui2` which completed (Clippy fixes applied) as part of
verification.
- Ran `cargo test -p codex-protocol` which passed (28 tests).
- Ran `cargo test -p codex-core` which showed failures in a small set of
tests (not caused by the protocol type change directly):
`default_client::tests::test_create_client_sets_default_headers`,
several `models_manager::manager::tests::refresh_available_models_*`,
and `shell_snapshot::tests::linux_sh_snapshot_includes_sections` (these
tests failed in this CI run).
- Ran `cargo test -p codex-app-server` which reported several failing
integration tests (including
`suite::codex_message_processor_flow::test_codex_jsonrpc_conversation_flow`,
`suite::output_schema::send_user_turn_*`, and
`suite::user_agent::get_user_agent_returns_current_codex_user_agent`).
- `cargo test -p codex-tui` and `cargo test -p codex-tui2` were
attempted but aborted due to disk space exhaustion (`No space left on
device`).

------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_696bd8ce632483228d298cf07c7eb41c)
2026-01-17 16:02:28 -08:00
Ahmed Ibrahim
146d54cede Add collaboration_mode override to turns (#9408) 2026-01-16 21:51:25 -08:00
charley-oai
4a9c2bcc5a Add text element metadata to types (#9235)
Initial type tweaking PR to make the diff of
https://github.com/openai/codex/pull/9116 smaller

This should not change any behavior, just adds some fields to types
2026-01-14 16:41:50 -08:00
pakrym-oai
92472e7baa Use current model for review (#9179)
Instead of having a hard-coded default review model, use the current
model for running `/review` unless one is specified in the config.

Also inherit current reasoning effort
2026-01-14 08:59:41 -08:00
jif-oai
1aed01e99f renaming: task to turn (#8963) 2026-01-09 17:31:17 +00:00
Ahmed Ibrahim
0d3e673019 remove get_responses_requests and get_responses_request_bodies to use in-place matcher (#8858) 2026-01-08 13:57:48 -08:00
jif-oai
116059c3a0 chore: unify conversation with thread name (#8830)
Done and verified by Codex + refactor feature of RustRover
2026-01-07 17:04:53 +00:00
Anton Panasenko
807f8a43c2 feat: expose outputSchema to user_turn/turn_start app_server API (#8377)
What changed
- Added `outputSchema` support to the app-server APIs, mirroring `codex
exec --output-schema` behavior.
- V1 `sendUserTurn` now accepts `outputSchema` and constrains the final
assistant message for that turn.
- V2 `turn/start` now accepts `outputSchema` and constrains the final
assistant message for that turn (explicitly per-turn only).

Core behavior
- `Op::UserTurn` already supported `final_output_json_schema`; now V1
`sendUserTurn` forwards `outputSchema` into that field.
- `Op::UserInput` now carries `final_output_json_schema` for per-turn
settings updates; core maps it into
`SessionSettingsUpdate.final_output_json_schema` so it applies to the
created turn context.
- V2 `turn/start` does NOT persist the schema via `OverrideTurnContext`
(it’s applied only for the current turn). Other overrides
(cwd/model/etc) keep their existing persistent behavior.

API / docs
- `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema:
Option<serde_json::Value>` to `SendUserTurnParams` (serialized as
`outputSchema`).
- `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema:
Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`).
- `codex-rs/app-server/README.md`: document `outputSchema` for
`turn/start` and clarify it applies only to the current turn.
- `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1
`sendUserTurn` and v2 `turn/start`.

Tests added/updated
- New app-server integration tests asserting `outputSchema` is forwarded
into outbound `/responses` requests as `text.format`:
  - `codex-rs/app-server/tests/suite/output_schema.rs`
  - `codex-rs/app-server/tests/suite/v2/output_schema.rs`
- Added per-turn semantics tests (schema does not leak to the next
turn):
  - `send_user_turn_output_schema_is_per_turn_v1`
  - `turn_start_output_schema_is_per_turn_v2`
- Added protocol wire-compat tests for the merged op:
  - serialize omits `final_output_json_schema` when `None`
  - deserialize works when field is missing
  - serialize includes `final_output_json_schema` when `Some(schema)`

Call site updates (high level)
- Updated all `Op::UserInput { .. }` constructions to include
`final_output_json_schema`:
  - `codex-rs/app-server/src/codex_message_processor.rs`
  - `codex-rs/core/src/codex_delegate.rs`
  - `codex-rs/mcp-server/src/codex_tool_runner.rs`
  - `codex-rs/tui/src/chatwidget.rs`
  - `codex-rs/tui2/src/chatwidget.rs`
  - plus impacted core tests.

Validation
- `just fmt`
- `cargo test -p codex-core`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`
- `cargo test -p codex-tui`
- `cargo test -p codex-tui2`
- `cargo test -p codex-protocol`
- `cargo clippy --all-features --tests --profile dev --fix -- -D
warnings`
2026-01-05 10:27:00 -08:00
Thibault Sottiaux
0b53aed2d0 fix: /review to respect session cwd (#8738)
Fixes /review base-branch prompt resolution to use the session/turn cwd
(respecting runtime cwd overrides) so merge-base/diff guidance is
computed from the intended repo; adds a regression test for cwd
overrides; tested with cargo test -p codex-core --test all
review_uses_overridden_cwd_for_base_branch_merge_base.
2026-01-05 12:11:20 +00:00
Michael Bolin
3d4ced3ff5 chore: migrate from Config::load_from_base_config_with_overrides to ConfigBuilder (#8276)
https://github.com/openai/codex/pull/8235 introduced `ConfigBuilder` and
this PR updates all call non-test call sites to use it instead of
`Config::load_from_base_config_with_overrides()`.

This is important because `load_from_base_config_with_overrides()` uses
an empty `ConfigRequirements`, which is a reasonable default for testing
so the tests are not influenced by the settings on the host. This method
is now guarded by `#[cfg(test)]` so it cannot be used by business logic.

Because `ConfigBuilder::build()` is `async`, many of the test methods
had to be migrated to be `async`, as well. On the bright side, this made
it possible to eliminate a bunch of `block_on_future()` stuff.
2025-12-18 16:12:52 -08:00
xl-openai
da3869eeb6 Support SYSTEM skills. (#8220)
1. Remove PUBLIC skills and introduce SYSTEM skills embedded in the
binary and installed into $CODEX_HOME/skills/.system at startup.
2. Skills are now always enabled (feature flag removed).
3. Update skills/list to accept forceReload and plumb it through (not
used by clients yet).
2025-12-17 18:48:28 -08:00
Eric Traut
bbc5675974 Revert "chore: review in read-only (#7593)" (#8127)
This reverts commit 291b54a762.

This commit was intended to prevent the model from making code changes
during `/review`, which is sometimes does. Unfortunately, it has other
unintended side effects that cause `/review` to fail in a variety of
ways. See #8115 and #7815. We've therefore decided to revert this
change.
2025-12-16 12:01:54 -08:00
Ahmed Ibrahim
cb9a189857 make model optional in config (#7769)
- Make Config.model optional and centralize default-selection logic in
ModelsManager, including a default_model helper (with
codex-auto-balanced when available) so sessions now carry an explicit
chosen model separate from the base config.
- Resolve `model` once in `core` and `tui` from config. Then store the
state of it on other structs.
- Move refreshing models to be before resolving the default model
2025-12-10 11:19:00 -08:00
jif-oai
291b54a762 chore: review in read-only (#7593) 2025-12-04 10:01:12 -08:00
jif-oai
4b78e2ab09 chore: review everywhere (#7444) 2025-12-02 11:26:27 +00:00
jif-oai
a421eba31f fix: disable review rollout filtering (#7371) 2025-12-01 09:04:13 +00:00
jif-oai
aaec8abf58 feat: detached review (#7292) 2025-11-28 11:34:57 +00:00
jif-oai
8ddae8cde3 feat: review in app server (#6613) 2025-11-18 21:58:54 +00:00
Ahmed Ibrahim
ddcc60a085 Update defaults to gpt-5.1 (#6652)
## Summary
- update documentation, example configs, and automation defaults to
reference gpt-5.1 / gpt-5.1-codex
- bump the CLI and core configuration defaults, model presets, and error
messaging to the new models while keeping the model-family/tool coverage
for legacy slugs
- refresh tests, fixtures, and TUI snapshots so they expect the upgraded
defaults

## Testing
- `cargo test -p codex-core
config::tests::test_precedence_fixture_with_gpt5_profile`


------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_6916c5b3c2b08321ace04ee38604fc6b)
2025-11-17 17:40:11 -08:00
pakrym-oai
e8905f6d20 Prefer wait_for_event over wait_for_event_with_timeout (#6349) 2025-11-06 18:11:11 -08:00
Ahmed Ibrahim
7aa46ab5fc ignore agent message deltas for the review mode (#5937)
The deltas produce the whole json output. ignore them.
2025-10-30 00:47:55 +00:00
Michael Bolin
5907422d65 feat: annotate conversations with model_provider for filtering (#5658)
Because conversations that use the Responses API can have encrypted
reasoning messages, trying to resume a conversation with a different
provider could lead to confusing "failed to decrypt" errors. (This is
reproducible by starting a conversation using ChatGPT login and resuming
it as a conversation that uses OpenAI models via Azure.)

This changes `ListConversationsParams` to take a `model_providers:
Option<Vec<String>>` and adds `model_provider` on each
`ConversationSummary` it returns so these cases can be disambiguated.

Note this ended up making changes to
`codex-rs/core/src/rollout/tests.rs` because it had a number of cases
where it expected `Some` for the value of `next_cursor`, but the list of
rollouts was complete, so according to this docstring:


bcd64c7e72/codex-rs/app-server-protocol/src/protocol.rs (L334-L337)

If there are no more items to return, then `next_cursor` should be
`None`. This PR updates that logic.






---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/5658).
* #5803
* #5793
* __->__ #5658
2025-10-27 02:03:30 -07:00
Ahmed Ibrahim
f178805252 Add feedback upload request handling (#5682) 2025-10-27 05:53:39 +00:00
pakrym-oai
9c903c4716 Add ItemStarted/ItemCompleted events for UserInputItem (#5306)
Adds a new ItemStarted event and delivers UserMessage as the first item
type (more to come).


Renames `InputItem` to `UserInput` considering we're using the `Item`
suffix for actual items.
2025-10-20 13:34:44 -07:00
dedrisian-oai
b016a3e7d8 Remove instruction hack for /review (#4896)
We use to put the review prompt in the first user message as well to
bypass statsig overrides, but now that's been resolved and instructions
are being respected, so we're duplicating the review instructions.
2025-10-07 12:47:00 -07:00
pakrym-oai
06853d94f0 Use wait_for_event helpers in tests (#4753)
## Summary
- replace manual event polling loops in several core test suites with
the shared wait_for_event helpers
- keep prior assertions intact by using closure captures for stateful
expectations, including plan updates, patch lifecycles, and review flow
checks
- rely on wait_for_event_with_timeout where longer waits are required,
simplifying timeout handling

## Testing
- just fmt


------
https://chatgpt.com/codex/tasks/task_i_68e1d58582d483208febadc5f90dd95e
2025-10-04 22:04:05 -07:00
Jeremy Rose
4a5f05c136 make tests pass cleanly in sandbox (#4067)
This changes the reqwest client used in tests to be sandbox-friendly,
and skips a bunch of other tests that don't work inside the
sandbox/without network.
2025-09-25 13:11:14 -07:00
pakrym-oai
14a115d488 Add non_sandbox_test helper (#3880)
Makes tests shorter
2025-09-22 14:50:41 +00:00
dedrisian-oai
72733e34c4 Add dev message upon review out (#3758)
Proposal: We want to record a dev message like so:

```
{
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "<user_action>
  <context>User initiated a review task. Here's the full review output from reviewer model. User may select one or more comments to resolve.</context>
  <action>review</action>
  <results>
  {findings_str}
  </results>
</user_action>"
        }
      ]
    },
```

Without showing in the chat transcript.

Rough idea, but it fixes issue where the user finishes a review thread,
and asks the parent "fix the rest of the review issues" thinking that
the parent knows about it.

### Question: Why not a tool call?

Because the agent didn't make the call, it was a human. + we haven't
implemented sub-agents yet, and we'll need to think about the way we
represent these human-led tool calls for the agent.
2025-09-16 18:43:32 -07:00
dedrisian-oai
7fe4021f95 Review mode core updates (#3701)
1. Adds the environment prompt (including cwd) to review thread
2. Prepends the review prompt as a user message (temporary fix so the
instructions are not replaced on backend)
3. Sets reasoning to low
4. Sets default review model to `gpt-5-codex`
2025-09-16 13:36:51 -07:00
dedrisian-oai
2aa84b8891 Fix EventMsg Optional (#3604) 2025-09-15 00:34:33 +00:00
Ahmed Ibrahim
a30e5e40ee enable-resume (#3537)
Adding the ability to resume conversations.
we have one verb `resume`. 

Behavior:

`tui`:
`codex resume`: opens session picker
`codex resume --last`: continue last message
`codex resume <session id>`: continue conversation with `session id`

`exec`:
`codex resume --last`: continue last conversation
`codex resume <session id>`: continue conversation with `session id`

Implementation:
- I added a function to find the path in `~/.codex/sessions/` with a
`UUID`. This is helpful in resuming with session id.
- Added the above mentioned flags
- Added lots of testing
2025-09-14 19:33:19 -04:00
dedrisian-oai
b2f6fc3b9a Fix flaky windows test (#3564)
There are exactly 4 types of flaky tests in Windows x86 right now:

1. `review_input_isolated_from_parent_history` => Times out waiting for
closing events
2. `review_does_not_emit_agent_message_on_structured_output` => Times
out waiting for closing events
3. `auto_compact_runs_after_token_limit_hit` => Times out waiting for
closing events
4. `auto_compact_runs_after_token_limit_hit` => Also has a problem where
auto compact should add a third request, but receives 4 requests.

1, 2, and 3 seem to be solved with increasing threads on windows runner
from 2 -> 4.

Don't know yet why # 4 is happening, but probably also because of
WireMock issues on windows causing races.
2025-09-14 23:20:25 +00:00
dedrisian-oai
90a0fd342f Review Mode (Core) (#3401)
## 📝 Review Mode -- Core

This PR introduces the Core implementation for Review mode:

- New op `Op::Review { prompt: String }:` spawns a child review task
with isolated context, a review‑specific system prompt, and a
`Config.review_model`.
- `EnteredReviewMode`: emitted when the child review session starts.
Every event from this point onwards reflects the review session.
- `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review
finishes or is interrupted, with optional structured findings:

```json
{
  "findings": [
    {
      "title": "<≤ 80 chars, imperative>",
      "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>",
      "confidence_score": <float 0.0-1.0>,
      "priority": <int 0-3>,
      "code_location": {
        "absolute_file_path": "<file path>",
        "line_range": {"start": <int>, "end": <int>}
      }
    }
  ],
  "overall_correctness": "patch is correct" | "patch is incorrect",
  "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>",
  "overall_confidence_score": <float 0.0-1.0>
}
```

## Questions

### Why separate out its own message history?

We want the review thread to match the training of our review models as
much as possible -- that means using a custom prompt, removing user
instructions, and starting a clean chat history.

We also want to make sure the review thread doesn't leak into the parent
thread.

### Why do this as a mode, vs. sub-agents?

1. We want review to be a synchronous task, so it's fine for now to do a
bespoke implementation.
2. We're still unclear about the final structure for sub-agents. We'd
prefer to land this quickly and then refactor into sub-agents without
rushing that implementation.
2025-09-12 23:25:10 +00:00