Commit Graph

18 Commits

Author SHA1 Message Date
sayan-oai
060a320e7d fix: show user warning when using default fallback metadata (#11690)
### What
It's currently unclear when the harness falls back to the default,
generic `ModelInfo`. This happens when the `remote_models` feature is
disabled or the model is truly unknown, and can lead to bad performance
and issues in the harness.

Add a user-facing warning when this happens so they are aware when their
setup is broken.

### Tests
Added tests, tested locally.
2026-02-15 18:46:05 -08:00
Charley Cunningham
f24669d444 Persist complete TurnContextItem state via canonical conversion (#11656)
## Summary

This PR delivers the first small, shippable step toward model-visible
state diffing by making
`TurnContextItem` more complete and standardizing how it is built.

Specifically, it:
- Adds persisted network context to `TurnContextItem`.
- Introduces a single canonical `TurnContext -> TurnContextItem`
conversion path.
- Routes existing rollout write sites through that canonical conversion
helper.

No context injection/diff behavior changes are included in this PR.

## Why this change

The design goal is to make `TurnContextItem` the canonical source of
truth for context-diff
decisions.
Before this PR:
- `TurnContextItem` did not include all TurnContext-derived environment
inputs needed for v1
completeness.
- Construction was duplicated at multiple write sites.

This PR addresses both with a minimal, reviewable change.

## Changes

### 1) Extend `TurnContextItem` with network state
- Added `TurnContextNetworkItem { allowed_domains, denied_domains }`.
- Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`.
- Kept backward compatibility by making the new field optional and
skipped when absent.

Files:
- `codex-rs/protocol/src/protocol.rs`

### 2) Canonical conversion helper
- Added `TurnContext::to_turn_context_item(collaboration_mode)` in core.
- Added internal helper to derive network fields from
`config_layer_stack.requirements().network`.

Files:
- `codex-rs/core/src/codex.rs`

### 3) Use canonical conversion at rollout write sites
- Replaced ad hoc `TurnContextItem { ... }` construction with
`to_turn_context_item(...)` in:
  - sampling request path
  - compaction path

Files:
- `codex-rs/core/src/codex.rs`
- `codex-rs/core/src/compact.rs`

### 4) Update fixtures/tests for new optional field
- Updated existing `TurnContextItem` literals in tests to include
`network: None`.
- Added protocol tests for:
  - deserializing old payloads with no `network`
  - serializing when `network` is present

Files:
- `codex-rs/core/tests/suite/resume_warning.rs`
- No replay/diff logic changes.
- Persisted rollout `TurnContextItem` now carries additional network
context when available.
- Older rollout lines without `network` remain readable.
2026-02-12 17:22:44 -08:00
Michael Bolin
a4cc1a4a85 feat: introduce Permissions (#11633)
## Why
We currently carry multiple permission-related concepts directly on
`Config` for shell/unified-exec behavior (`approval_policy`,
`sandbox_policy`, `network`, `shell_environment_policy`,
`windows_sandbox_mode`).

Consolidating these into one in-memory struct makes permission handling
easier to reason about and sets up the next step: supporting named
permission profiles (`[permissions.PROFILE_NAME]`) without changing
behavior now.

This change is mostly mechanical: it updates existing callsites to go
through `config.permissions`, but it does not yet refactor those
callsites to take a single `Permissions` value in places where multiple
permission fields are still threaded separately.

This PR intentionally **does not** change the on-disk `config.toml`
format yet and keeps compatibility with legacy config keys.

## What Changed
- Introduced `Permissions` in `core/src/config/mod.rs`.
- Added `Config::permissions` and moved effective runtime permission
fields under it:
  - `approval_policy`
  - `sandbox_policy`
  - `network`
  - `shell_environment_policy`
  - `windows_sandbox_mode`
- Updated config loading/building so these effective values are still
derived from the same existing config inputs and constraints.
- Updated Windows sandbox helpers/resolution to read/write via
`permissions`.
- Threaded the new field through all permission consumers across core
runtime, app-server, CLI/exec, TUI, and sandbox summary code.
- Updated affected tests to reference `config.permissions.*`.
- Renamed the struct/field from
`EffectivePermissions`/`effective_permissions` to
`Permissions`/`permissions` and aligned variable naming accordingly.

## Verification
- `just fix -p codex-core -p codex-tui -p codex-cli -p codex-app-server
-p codex-exec -p codex-utils-sandbox-summary`
- `cargo build -p codex-core -p codex-tui -p codex-cli -p
codex-app-server -p codex-exec -p codex-utils-sandbox-summary`
2026-02-12 14:42:54 -08:00
Owen Lin
efc8d45750 feat(app-server): experimental flag to persist extended history (#11227)
This PR adds an experimental `persist_extended_history` bool flag to
app-server thread APIs so rollout logs can retain a richer set of
EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
on `thread/resume`).

### Motivation
Today, our rollout recorder only persists a small subset (e.g. user
message, reasoning, assistant message) of `EventMsg` types, dropping a
good number (like command exec, file change, etc.) that are important
for reconstructing full item history for `thread/resume`, `thread/read`,
and `thread/fork`.

Some clients want to be able to resume a thread without lossiness. This
lossiness is primarily a UI thing, since what the model sees are
`ResponseItem` and not `EventMsg`.

### Approach
This change introduces an opt-in `persist_full_history` flag to preserve
those events when you start/resume/fork a thread (defaults to `false`).

This is done by adding an `EventPersistenceMode` to the rollout
recorder:
- `Limited` (existing behavior, default)
- `Extended` (new opt-in behavior)

In `Extended` mode, persist additional `EventMsg` variants needed for
non-lossy app-server `ThreadItem` reconstruction. We now store the
following ThreadItems that we didn't before:
- web search
- command execution
- patch/file changes
- MCP tool calls
- image view calls
- collab tool outcomes
- context compaction
- review mode enter/exit

For **command executions** in particular, we truncate the output using
the existing `truncate_text` from core to store an upper bound of 10,000
bytes, which is also the default value for truncating tool outputs shown
to the model. This keeps the size of the rollout file and command
execution items returned over the wire reasonable.

And we also persist `EventMsg::Error` which we can now map back to the
Turn's status and populates the Turn's error metadata.

#### Updates to EventMsgs
To truly make `thread/resume` non-lossy, we also needed to persist the
`status` on `EventMsg::CommandExecutionEndEvent` and
`EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
command failed or was declined (similar for apply_patch). These
EventMsgs were never persisted before so I made it a required field.
2026-02-12 19:34:22 +00:00
Michael Bolin
476c1a7160 Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
## Why

`codex-core` was being built in multiple feature-resolved permutations
because test-only behavior was modeled as crate features. For a large
crate, those permutations increase compile cost and reduce cache reuse.

## Net Change

- Removed the `test-support` crate feature and related feature wiring so
`codex-core` no longer needs separate feature shapes for test consumers.
- Standardized cross-crate test-only access behind
`codex_core::test_support`.
- External test code now imports helpers from
`codex_core::test_support`.
- Underlying implementation hooks are kept internal (`pub(crate)`)
instead of broadly public.

## Outcome

- Fewer `codex-core` build permutations.
- Better incremental cache reuse across test targets.
- No intended production behavior change.
2026-02-10 22:44:02 -08:00
Celia Chen
641d5268fa chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
Problem:
1. turn id is constructed in-memory;
2. on resuming threads, turn_id might not be unique;
3. client cannot no the boundary of a turn from rollout files easily.

This PR does three things:
1. persist `task_started` and `task_complete` events;
1. persist `turn_id` in rollout turn events;
5. generate turn_id as unique uuids instead of incrementing it in
memory.

This helps us resolve the issue of clients wanting to have unique turn
ids for resuming a thread, and knowing the boundry of each turn in
rollout files.

example debug logs
```
2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
```

backward compatibility:
if you try to resume an old session without task_started and
task_complete event populated, the following happens:
- If you resume and do nothing: those reconstructed historical IDs can
differ next time you resume.
- If you resume and send a new turn: the new turn gets a fresh UUID from
live submission flow and is persisted, so that new turn’s ID is stable
on later resumes.
I think this behavior is fine, because we only care about deterministic
turn id once a turn is triggered.
2026-02-11 03:56:01 +00:00
Dylan Hurd
8b3521ee77 feat(core) update Personality on turn (#9644)
## Summary
Support updating Personality mid-Thread via UserTurn/OverwriteTurn. This
is explicitly unused by the clients so far, to simplify PRs - app-server
and tui implementations will be follow-ups.

## Testing
- [x] added integration tests
2026-01-22 12:04:23 -08:00
charley-oai
fe641f759f Add collaboration_mode to TurnContextItem (#9583)
## Summary
- add optional `collaboration_mode` to `TurnContextItem` in rollouts
- persist the current collaboration mode when recording turn context
(sampling + compaction)

## Rationale
We already persist turn context data for resume logic. Capturing
collaboration mode in the rollout gives us the mode context for each
turn, enabling follow‑up work to diff mode instructions correctly on
resume.

## Changes
- protocol: add optional `collaboration_mode` field to `TurnContextItem`
- core: persist collaboration mode alongside other turn context settings
in rollouts
2026-01-21 14:14:21 -08:00
Dylan Hurd
675f165c56 fix(core) Preserve base_instructions in SessionMeta (#9427)
## Summary
This PR consolidates base_instructions onto SessionMeta /
SessionConfiguration, so we ensure `base_instructions` is set once per
session and should be (mostly) immutable, unless:
- overridden by config on resume / fork
- sub-agent tasks, like review or collab


In a future PR, we should convert all references to `base_instructions`
to consistently used the typed struct, so it's less likely that we put
other strings there. See #9423. However, this PR is already quite
complex, so I'm deferring that to a follow-up.

## Testing
- [x] Added a resume test to assert that instructions are preserved. In
particular, `resume_switches_models_preserves_base_instructions` fails
against main.

Existing test coverage thats assert base instructions are preserved
across multiple requests in a session:
- Manual compact keeps baseline instructions:
core/tests/suite/compact.rs:199
- Auto-compact keeps baseline instructions:
core/tests/suite/compact.rs:1142
- Prompt caching reuses the same instructions across two requests:
core/tests/suite/prompt_caching.rs:150 and
core/tests/suite/prompt_caching.rs:157
- Prompt caching with explicit expected string across two requests:
core/tests/suite/prompt_caching.rs:213 and
core/tests/suite/prompt_caching.rs:222
- Resume with model switch keeps original instructions:
core/tests/suite/resume.rs:136
- Compact/resume/fork uses request 0 instructions for later expected
payloads: core/tests/suite/compact_resume_fork.rs:215
2026-01-19 21:59:36 -08:00
jif-oai
1aed01e99f renaming: task to turn (#8963) 2026-01-09 17:31:17 +00:00
jif-oai
116059c3a0 chore: unify conversation with thread name (#8830)
Done and verified by Codex + refactor feature of RustRover
2026-01-07 17:04:53 +00:00
Anton Panasenko
cbc5fb9acf chore: save more about turn context in rollout log file (#8458)
### Motivation
- Persist richer per-turn configuration in rollouts so resumed/forked
sessions and tooling can reason about the exact instruction inputs and
output constraints used for a turn.

### Description
- Extend `TurnContextItem` to include optional `base_instructions`,
`user_instructions`, and `developer_instructions`.
- Record the optional `final_output_json_schema` associated with a turn.
- Add an optional `truncation_policy` to `TurnContextItem` and populate
it when writing turn-context rollout items.
- Introduce a protocol-level `TruncationPolicy` representation and
convert from core truncation policy when recording.

### Testing
- `cargo test -p codex-protocol` (pass)
2025-12-22 19:51:07 -08:00
Michael Bolin
dc61fc5f50 feat: support allowed_sandbox_modes in requirements.toml (#8298)
This adds support for `allowed_sandbox_modes` in `requirements.toml` and
provides legacy support for constraining sandbox modes in
`managed_config.toml`. This is converted to `Constrained<SandboxPolicy>`
in `ConfigRequirements` and applied to `Config` such that constraints
are enforced throughout the harness.

Note that, because `managed_config.toml` is deprecated, we do not add
support for the new `external-sandbox` variant recently introduced in
https://github.com/openai/codex/pull/8290. As noted, that variant is not
supported in `config.toml` today, but can be configured programmatically
via app server.
2025-12-19 21:09:20 +00:00
Michael Bolin
0a7021de72 fix: enable resume_warning that was missing from mod.rs (#8333)
This test was introduced in https://github.com/openai/codex/pull/6507,
but was not included in `mod.rs`. It does not appear that it was getting
compiled?
2025-12-19 19:21:47 +00:00
Michael Bolin
3d4ced3ff5 chore: migrate from Config::load_from_base_config_with_overrides to ConfigBuilder (#8276)
https://github.com/openai/codex/pull/8235 introduced `ConfigBuilder` and
this PR updates all call non-test call sites to use it instead of
`Config::load_from_base_config_with_overrides()`.

This is important because `load_from_base_config_with_overrides()` uses
an empty `ConfigRequirements`, which is a reasonable default for testing
so the tests are not influenced by the settings on the host. This method
is now guarded by `#[cfg(test)]` so it cannot be used by business logic.

Because `ConfigBuilder::build()` is `async`, many of the test methods
had to be migrated to be `async`, as well. On the bright side, this made
it possible to eliminate a bunch of `block_on_future()` stuff.
2025-12-18 16:12:52 -08:00
gt-oai
9352c6b235 feat: Constrain values for approval_policy (#7778)
Constrain `approval_policy` through new `admin_policy` config.

This PR will:
1. Add a `admin_policy` section to config, with a single field (for now)
`allowed_approval_policies`. This list constrains the set of
user-settable `approval_policy`s.
2. Introduce a new `Constrained<T>` type, which combines a current value
and a validator function. The validator function ensures disallowed
values are not set.
3. Change the type of `approval_policy` on `Config` and
`SessionConfiguration` from `AskForApproval` to
`Constrained<AskForApproval>`. The validator function is set by the
values passed into `allowed_approval_policies`.
4. `GenericDisplayRow`: add a `disabled_reason: Option<String>`. When
set, it disables selection of the value and indicates as such in the
menu. This also makes it unselectable with arrow keys or numbers. This
is used in the `/approvals` menu.

Follow ups are:
1. Do the same thing to `sandbox_policy`.
2. Propagate the allowed set of values through app-server for the
extension (though already this should prevent app-server from setting
this values, it's just that we want to disable UI elements that are
unsettable).

Happy to split this PR up if you prefer, into the logical numbered areas
above. Especially if there are parts we want to gavel on separately
(e.g. admin_policy).

Disabled full access:
<img width="1680" height="380" alt="image"
src="https://github.com/user-attachments/assets/1fb61c8c-1fcb-4dc4-8355-2293edb52ba0"
/>

Disabled `--yolo` on startup:
<img width="749" height="76" alt="image"
src="https://github.com/user-attachments/assets/0a1211a0-6eb1-40d6-a1d7-439c41e94ddb"
/>

CODEX-4087
2025-12-17 16:19:27 +00:00
Ahmed Ibrahim
cb9a189857 make model optional in config (#7769)
- Make Config.model optional and centralize default-selection logic in
ModelsManager, including a default_model helper (with
codex-auto-balanced when available) so sessions now carry an explicit
chosen model separate from the base config.
- Resolve `model` once in `core` and `tui` from config. Then store the
state of it on other structs.
- Move refreshing models to be before resolving the default model
2025-12-10 11:19:00 -08:00
jif-oai
530db0ad73 feat: warning switch model on resume (#6507)
<img width="1259" height="40" alt="Screenshot 2025-11-11 at 14 01 41"
src="https://github.com/user-attachments/assets/48ead3d2-d89c-4d8a-a578-82d9663dbd88"
/>
2025-11-12 11:13:37 +00:00