Commit Graph

53 Commits

Author SHA1 Message Date
Charley Cunningham
85034b189e core: snapshot tests for compaction requests, post-compaction layout, some additional compaction tests (#11487)
This PR keeps compaction context-layout test coverage separate from
runtime compaction behavior changes, so runtime logic review can stay
focused.

## Included
- Adds reusable context snapshot helpers in
`core/tests/common/context_snapshot.rs` for rendering model-visible
request/history shapes.
- Standardizes helper naming for readability:
  - `format_request_input_snapshot`
  - `format_response_items_snapshot`
  - `format_labeled_requests_snapshot`
  - `format_labeled_items_snapshot`
- Expands snapshot coverage for both local and remote compaction flows:
  - pre-turn auto-compaction
  - pre-turn failure/context-window-exceeded paths
  - mid-turn continuation compaction
  - manual `/compact` with and without prior user turns
- Captures both sides where relevant:
  - compaction request shape
  - post-compaction history layout shape
- Adds/uses shared request-inspection helpers so assertions target
structured request content instead of ad-hoc JSON string parsing.
- Aligns snapshots/assertions to current behavior and leaves explicit
`TODO(ccunningham)` notes where behavior is known and intentionally
deferred.

## Not Included
- No runtime compaction logic changes.
- No model-visible context/state behavior changes.
2026-02-14 19:57:10 -08:00
Charley Cunningham
67e577da53 Handle model-switch base instructions after compaction (#11659)
Strip trailing <model_switch> during model-switch compaction request,
and append <model_switch> after model switch compaction
2026-02-13 19:02:53 -08:00
Ahmed Ibrahim
40de788c4d Clamp auto-compact limit to context window (#11516)
- Clamp auto-compaction to the minimum of configured limit and 90% of
context window
- Add an e2e compact test for clamped behavior
- Update remote compact tests to account for earlier auto-compaction in
setup turns
2026-02-11 17:41:08 -08:00
Ahmed Ibrahim
6938150c5e Pre-sampling compact with previous model context (#11504)
- Run pre-sampling compact through a single helper that builds
previous-model turn context and compacts before the follow-up request
when switching to a smaller context window.
- Keep compaction events on the parent turn id and add compact suite
coverage for switch-in-session and resume+switch flows.
2026-02-11 17:24:06 -08:00
Eric Traut
dd80e332c4 Removed the "remote_compaction" feature flag (#10840)
This feature is always on now
2026-02-05 23:54:57 -08:00
sayan-oai
fc05374344 chore: add phase to message responseitem (#10455)
### What

add wiring for `phase` field on `ResponseItem::Message` to lay
groundwork for differentiating model preambles and final messages.
currently optional.

follows pattern in #9698.

updated schemas with `just write-app-server-schema` so we can see type
changes.

### Tests
Updated existing tests for SSE parsing and hydrating from history
2026-02-03 02:52:26 +00:00
Ahmed Ibrahim
26590d7927 Ensure auto-compaction starts after turn started (#10129)
Start auto-compaction only after TurnStarted is emitted.\nAdd an
integration test for deterministic ordering.
2026-01-28 16:51:20 -08:00
Ahmed Ibrahim
b7edeee8ca compaction (#10034)
# External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.
2026-01-28 11:36:11 -08:00
pakrym-oai
998e88b12a Use test_codex more (#9961)
Reduces boilderplate.
2026-01-26 18:52:10 -08:00
jif-oai
83775f4df1 feat: ephemeral threads (#9765)
Add ephemeral threads capabilities. Only exposed through the
`app-server` v2

The idea is to disable the rollout recorder for those threads.
2026-01-24 14:57:40 +00:00
Dylan Hurd
8b3521ee77 feat(core) update Personality on turn (#9644)
## Summary
Support updating Personality mid-Thread via UserTurn/OverwriteTurn. This
is explicitly unused by the clients so far, to simplify PRs - app-server
and tui implementations will be follow-ups.

## Testing
- [x] added integration tests
2026-01-22 12:04:23 -08:00
pakrym-oai
b511c38ddb Support end_turn flag (#9698)
Experimental flag that signals the end of the turn.
2026-01-22 17:27:48 +00:00
Ahmed Ibrahim
b11e96fb04 Act on reasoning-included per turn (#9402)
- Reset reasoning-included flag each turn and update compaction test
2026-01-19 11:23:25 -08:00
Ahmed Ibrahim
146d54cede Add collaboration_mode override to turns (#9408) 2026-01-16 21:51:25 -08:00
charley-oai
4a9c2bcc5a Add text element metadata to types (#9235)
Initial type tweaking PR to make the diff of
https://github.com/openai/codex/pull/9116 smaller

This should not change any behavior, just adds some fields to types
2026-01-14 16:41:50 -08:00
Ahmed Ibrahim
87f7226cca Assemble sandbox/approval/network prompts dynamically (#8961)
- Add a single builder for developer permissions messaging that accepts
SandboxPolicy and approval policy. This builder now drives the developer
“permissions” message that’s injected at session start and any time
sandbox/approval settings change.
- Trim EnvironmentContext to only include cwd, writable roots, and
shell; removed sandbox/approval/network duplication and adjusted XML
serialization and tests accordingly.

Follow-up: adding a config value to replace the developer permissions
message for custom sandboxes.
2026-01-12 23:12:59 +00:00
jif-oai
1aed01e99f renaming: task to turn (#8963) 2026-01-09 17:31:17 +00:00
Ahmed Ibrahim
0d3e673019 remove get_responses_requests and get_responses_request_bodies to use in-place matcher (#8858) 2026-01-08 13:57:48 -08:00
jif-oai
116059c3a0 chore: unify conversation with thread name (#8830)
Done and verified by Codex + refactor feature of RustRover
2026-01-07 17:04:53 +00:00
Anton Panasenko
807f8a43c2 feat: expose outputSchema to user_turn/turn_start app_server API (#8377)
What changed
- Added `outputSchema` support to the app-server APIs, mirroring `codex
exec --output-schema` behavior.
- V1 `sendUserTurn` now accepts `outputSchema` and constrains the final
assistant message for that turn.
- V2 `turn/start` now accepts `outputSchema` and constrains the final
assistant message for that turn (explicitly per-turn only).

Core behavior
- `Op::UserTurn` already supported `final_output_json_schema`; now V1
`sendUserTurn` forwards `outputSchema` into that field.
- `Op::UserInput` now carries `final_output_json_schema` for per-turn
settings updates; core maps it into
`SessionSettingsUpdate.final_output_json_schema` so it applies to the
created turn context.
- V2 `turn/start` does NOT persist the schema via `OverrideTurnContext`
(it’s applied only for the current turn). Other overrides
(cwd/model/etc) keep their existing persistent behavior.

API / docs
- `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema:
Option<serde_json::Value>` to `SendUserTurnParams` (serialized as
`outputSchema`).
- `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema:
Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`).
- `codex-rs/app-server/README.md`: document `outputSchema` for
`turn/start` and clarify it applies only to the current turn.
- `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1
`sendUserTurn` and v2 `turn/start`.

Tests added/updated
- New app-server integration tests asserting `outputSchema` is forwarded
into outbound `/responses` requests as `text.format`:
  - `codex-rs/app-server/tests/suite/output_schema.rs`
  - `codex-rs/app-server/tests/suite/v2/output_schema.rs`
- Added per-turn semantics tests (schema does not leak to the next
turn):
  - `send_user_turn_output_schema_is_per_turn_v1`
  - `turn_start_output_schema_is_per_turn_v2`
- Added protocol wire-compat tests for the merged op:
  - serialize omits `final_output_json_schema` when `None`
  - deserialize works when field is missing
  - serialize includes `final_output_json_schema` when `Some(schema)`

Call site updates (high level)
- Updated all `Op::UserInput { .. }` constructions to include
`final_output_json_schema`:
  - `codex-rs/app-server/src/codex_message_processor.rs`
  - `codex-rs/core/src/codex_delegate.rs`
  - `codex-rs/mcp-server/src/codex_tool_runner.rs`
  - `codex-rs/tui/src/chatwidget.rs`
  - `codex-rs/tui2/src/chatwidget.rs`
  - plus impacted core tests.

Validation
- `just fmt`
- `cargo test -p codex-core`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`
- `cargo test -p codex-tui`
- `cargo test -p codex-tui2`
- `cargo test -p codex-protocol`
- `cargo clippy --all-features --tests --profile dev --fix -- -D
warnings`
2026-01-05 10:27:00 -08:00
Ahmed Ibrahim
efd2d76484 Account for last token count on resume (#8677)
last token count in context manager is initialized to 0. Gets populated
only on events from server.

This PR populates it on resume so we can decide if we need to compact or
not.
2026-01-02 23:20:20 +00:00
Michael Bolin
3d4ced3ff5 chore: migrate from Config::load_from_base_config_with_overrides to ConfigBuilder (#8276)
https://github.com/openai/codex/pull/8235 introduced `ConfigBuilder` and
this PR updates all call non-test call sites to use it instead of
`Config::load_from_base_config_with_overrides()`.

This is important because `load_from_base_config_with_overrides()` uses
an empty `ConfigRequirements`, which is a reasonable default for testing
so the tests are not influenced by the settings on the host. This method
is now guarded by `#[cfg(test)]` so it cannot be used by business logic.

Because `ConfigBuilder::build()` is `async`, many of the test methods
had to be migrated to be `async`, as well. On the bright side, this made
it possible to eliminate a bunch of `block_on_future()` stuff.
2025-12-18 16:12:52 -08:00
xl-openai
da3869eeb6 Support SYSTEM skills. (#8220)
1. Remove PUBLIC skills and introduce SYSTEM skills embedded in the
binary and installed into $CODEX_HOME/skills/.system at startup.
2. Skills are now always enabled (feature flag removed).
3. Update skills/list to accept forceReload and plumb it through (not
used by clients yet).
2025-12-17 18:48:28 -08:00
jif-oai
c9f5b9a6df feat: do not compact on last user turn (#8060) 2025-12-16 15:36:33 +00:00
pakrym-oai
b3ddd50eee Remote compact for API-key users (#7835) 2025-12-12 10:05:02 -08:00
Ahmed Ibrahim
cb9a189857 make model optional in config (#7769)
- Make Config.model optional and centralize default-selection logic in
ModelsManager, including a default_model helper (with
codex-auto-balanced when available) so sessions now carry an explicit
chosen model separate from the base config.
- Resolve `model` once in `core` and `tui` from config. Then store the
state of it on other structs.
- Move refreshing models to be before resolving the default model
2025-12-10 11:19:00 -08:00
Ahmed Ibrahim
b519267d05 Account for encrypted reasoning for auto compaction (#7113)
- The total token used returned from the api doesn't account for the
reasoning items before the assistant message
- Account for those for auto compaction
- Add the encrypted reasoning effort in the common tests utils
- Add a test to make sure it works as expected
2025-11-22 03:06:45 +00:00
pakrym-oai
52d0ec4cd8 Delete tiktoken-rs (#7018) 2025-11-20 11:15:04 -08:00
jif-oai
c56d0c159b fix: local compaction (#6844) 2025-11-18 22:18:10 +00:00
jif-oai
838531d3e4 feat: remote compaction (#6795)
Co-authored-by: pakrym-oai <pakrym@openai.com>
2025-11-18 16:51:16 +00:00
Ahmed Ibrahim
0b28e72b66 Improve compact (#6692)
This PR does the following:
- Add compact prefix to the summary
- Change the compaction prompt
- Allow multiple compaction for long running tasks
- Filter out summary messages on the following compaction

Considerations:
- Filtering out the summary message isn't the most clean
- Theoretically, we can end up in infinite compaction loop if the user
messages > compaction limit . However, that's not possible in today's
code because we have hard cap on user messages.
- We need to address having multiple user messages because it confuses
the model.

Testing:
- Making sure that after compact we always end up with one user message
(task) and one summary, even on multiple compaction.
2025-11-15 07:17:51 +00:00
jif-oai
2a417c47ac feat: proxy context left after compaction (#6597) 2025-11-13 16:54:03 +01:00
Andrew Nikolin
131c384361 Fix warning message phrasing (#6446)
Small fix for sentence phrasing in the warning message

Co-authored-by: AndrewNikolin <877163+AndrewNikolin@users.noreply.github.com>
2025-11-09 22:12:28 -08:00
Eric Traut
d7953aed74 Fixes intermittent test failures in CI (#6282)
I'm seeing two tests fail intermittently in CI. This PR attempts to
address (or at least mitigate) the flakiness.

* summarize_context_three_requests_and_instructions - The test snapshots
server.received_requests() immediately after observing TaskComplete.
Because the OpenAI /v1/responses call is streamed, the HTTP request can
still be draining when that event fires, so wiremock occasionally
reports only two captured requests. Fix is to wait for async activity to
complete.
* archive_conversation_moves_rollout_into_archived_directory - times out
on a slow CI run. Mitigation is to increase timeout value from 10s to
20s.
2025-11-05 13:12:25 -08:00
jif-oai
611e00c862 feat: compactor 2 (#6027)
Co-authored-by: pakrym-oai <pakrym@openai.com>
2025-10-31 14:27:08 -07:00
Ahmed Ibrahim
c8ebb2a0dc Add warning on compact (#6052)
This PR introduces the ability for `core` to send `warnings` as it can
send `errors. It also sends a warning on compaction.

<img width="811" height="187" alt="image"
src="https://github.com/user-attachments/assets/0947a42d-b720-420d-b7fd-115f8a65a46a"
/>
2025-10-31 13:27:33 -07:00
jif-oai
f4f9695978 feat: compaction prompt configurable (#5959)
```
 codex -c compact_prompt="Summarize in bullet points"
 ```
2025-10-30 14:24:24 +00:00
pakrym-oai
9c903c4716 Add ItemStarted/ItemCompleted events for UserInputItem (#5306)
Adds a new ItemStarted event and delivers UserMessage as the first item
type (more to come).


Renames `InputItem` to `UserInput` considering we're using the `Item`
suffix for actual items.
2025-10-20 13:34:44 -07:00
Ahmed Ibrahim
049a61bcfc Auto compact at ~90% (#5292)
Users now hit a window exceeded limit and they usually don't know what
to do. This starts auto compact at ~90% of the window.
2025-10-20 11:29:49 -07:00
jif-oai
687a13bbe5 feat: truncate on compact (#4942)
Truncate the message during compaction if it is just too large
Do it iteratively as tokenization is basically free on server-side
2025-10-08 18:11:08 +01:00
pakrym-oai
191d620707 Use response helpers when mounting SSE test responses (#4783)
## Summary
- replace manual wiremock SSE mounts in the compact suite with the
shared response helpers
- simplify the exec auth_env integration test by using the
mount_sse_once_match helper
- rely on mount_sse_sequence plus server request collection to replace
the bespoke SeqResponder utility in tests

## Testing
- just fmt

------
https://chatgpt.com/codex/tasks/task_i_68e2e238f2a88320a337f0b9e4098093
2025-10-05 21:58:16 +00:00
vishnu-oai
04c1782e52 OpenTelemetry events (#2103)
### Title

## otel

Codex can emit [OpenTelemetry](https://opentelemetry.io/) **log events**
that
describe each run: outbound API requests, streamed responses, user
input,
tool-approval decisions, and the result of every tool invocation. Export
is
**disabled by default** so local runs remain self-contained. Opt in by
adding an
`[otel]` table and choosing an exporter.

```toml
[otel]
environment = "staging"   # defaults to "dev"
exporter = "none"          # defaults to "none"; set to otlp-http or otlp-grpc to send events
log_user_prompt = false    # defaults to false; redact prompt text unless explicitly enabled
```

Codex tags every exported event with `service.name = "codex-cli"`, the
CLI
version, and an `env` attribute so downstream collectors can distinguish
dev/staging/prod traffic. Only telemetry produced inside the
`codex_otel`
crate—the events listed below—is forwarded to the exporter.

### Event catalog

Every event shares a common set of metadata fields: `event.timestamp`,
`conversation.id`, `app.version`, `auth_mode` (when available),
`user.account_id` (when available), `terminal.type`, `model`, and
`slug`.

With OTEL enabled Codex emits the following event types (in addition to
the
metadata above):

- `codex.api_request`
  - `cf_ray` (optional)
  - `attempt`
  - `duration_ms`
  - `http.response.status_code` (optional)
  - `error.message` (failures)
- `codex.sse_event`
  - `event.kind`
  - `duration_ms`
  - `error.message` (failures)
  - `input_token_count` (completion only)
  - `output_token_count` (completion only)
  - `cached_token_count` (completion only, optional)
  - `reasoning_token_count` (completion only, optional)
  - `tool_token_count` (completion only)
- `codex.user_prompt`
  - `prompt_length`
  - `prompt` (redacted unless `log_user_prompt = true`)
- `codex.tool_decision`
  - `tool_name`
  - `call_id`
- `decision` (`approved`, `approved_for_session`, `denied`, or `abort`)
  - `source` (`config` or `user`)
- `codex.tool_result`
  - `tool_name`
  - `call_id`
  - `arguments`
  - `duration_ms` (execution time for the tool)
  - `success` (`"true"` or `"false"`)
  - `output`

### Choosing an exporter

Set `otel.exporter` to control where events go:

- `none` – leaves instrumentation active but skips exporting. This is
the
  default.
- `otlp-http` – posts OTLP log records to an OTLP/HTTP collector.
Specify the
  endpoint, protocol, and headers your collector expects:

  ```toml
  [otel]
  exporter = { otlp-http = {
    endpoint = "https://otel.example.com/v1/logs",
    protocol = "binary",
    headers = { "x-otlp-api-key" = "${OTLP_TOKEN}" }
  }}
  ```

- `otlp-grpc` – streams OTLP log records over gRPC. Provide the endpoint
and any
  metadata headers:

  ```toml
  [otel]
  exporter = { otlp-grpc = {
    endpoint = "https://otel.example.com:4317",
    headers = { "x-otlp-meta" = "abc123" }
  }}
  ```

If the exporter is `none` nothing is written anywhere; otherwise you
must run or point to your
own collector. All exporters run on a background batch worker that is
flushed on
shutdown.

If you build Codex from source the OTEL crate is still behind an `otel`
feature
flag; the official prebuilt binaries ship with the feature enabled. When
the
feature is disabled the telemetry hooks become no-ops so the CLI
continues to
function without the extra dependencies.

---------

Co-authored-by: Anton Panasenko <apanasenko@openai.com>
2025-09-29 11:30:55 -07:00
Jeremy Rose
4a5f05c136 make tests pass cleanly in sandbox (#4067)
This changes the reqwest client used in tests to be sandbox-friendly,
and skips a bunch of other tests that don't work inside the
sandbox/without network.
2025-09-25 13:11:14 -07:00
jif-oai
b84a920067 chore: compact do not modify instructions (#4088)
Keep the developer instruction and insert the summarisation message as a
user message instead
2025-09-23 17:59:17 +01:00
pakrym-oai
14a115d488 Add non_sandbox_test helper (#3880)
Makes tests shorter
2025-09-22 14:50:41 +00:00
pakrym-oai
881c7978f1 Move responses mocking helpers to a shared lib (#3878)
These are generally useful
2025-09-18 17:53:14 -07:00
jif-oai
4a5d6f7c71 Make ESC button work when auto-compaction (#3857)
Only emit a task finished when the compaction comes from a `/compact`
2025-09-18 15:34:16 +00:00
dedrisian-oai
b2f6fc3b9a Fix flaky windows test (#3564)
There are exactly 4 types of flaky tests in Windows x86 right now:

1. `review_input_isolated_from_parent_history` => Times out waiting for
closing events
2. `review_does_not_emit_agent_message_on_structured_output` => Times
out waiting for closing events
3. `auto_compact_runs_after_token_limit_hit` => Times out waiting for
closing events
4. `auto_compact_runs_after_token_limit_hit` => Also has a problem where
auto compact should add a third request, but receives 4 requests.

1, 2, and 3 seem to be solved with increasing threads on windows runner
from 2 -> 4.

Don't know yet why # 4 is happening, but probably also because of
WireMock issues on windows causing races.
2025-09-14 23:20:25 +00:00
Ahmed Ibrahim
bbea6bbf7e Handle resuming/forking after compact (#3533)
We need to construct the history different when compact happens. For
this, we need to just consider the history after compact and convert
compact to a response item.

This needs to change and use `build_compact_history` when this #3446 is
merged.
2025-09-14 13:23:31 +00:00
jif-oai
ea225df22e feat: context compaction (#3446)
## Compact feature:
1. Stops the model when the context window become too large
2. Add a user turn, asking for the model to summarize
3. Build a bridge that contains all the previous user message + the
summary. Rendered from a template
4. Start sampling again from a clean conversation with only that bridge
2025-09-12 13:07:10 -07:00