We used to override truncation policy by comparing model info vs config
value in context manager. A better way to do it is to construct model
info using the config value
**Summary**
This PR makes “ApprovalDecision::AcceptForSession / don’t ask again this
session” actually work for `apply_patch` approvals by caching approvals
based on absolute file paths in codex-core, properly wiring it through
app-server v2, and exposing the choice in both TUI and TUI2.
- This brings `apply_patch` calls to be at feature-parity with general
shell commands, which also have a "Yes, and don't ask again" option.
- This also fixes VSCE's "Allow this session" button to actually work.
While we're at it, also split the app-server v2 protocol's
`ApprovalDecision` enum so execpolicy amendments are only available for
command execution approvals.
**Key changes**
- Core: per-session patch approval allowlist keyed by absolute file
paths
- Handles multi-file patches and renames/moves by recording both source
and destination paths for `Update { move_path: Some(...) }`.
- Extend the `Approvable` trait and `ApplyPatchRuntime` to work with
multiple keys, because an `apply_patch` tool call can modify multiple
files. For a request to be auto-approved, we will need to check that all
file paths have been approved previously.
- App-server v2: honor AcceptForSession for file changes
- File-change approval responses now map AcceptForSession to
ReviewDecision::ApprovedForSession (no longer downgraded to plain
Approved).
- Replace `ApprovalDecision` with two enums:
`CommandExecutionApprovalDecision` and `FileChangeApprovalDecision`
- TUI / TUI2: expose “don’t ask again for these files this session”
- Patch approval overlays now include a third option (“Yes, and don’t
ask again for these files this session (s)”).
- Snapshot updates for the approval modal.
**Tests added/updated**
- Core:
- Integration test that proves ApprovedForSession on a patch skips the
next patch prompt for the same file
- App-server:
- v2 integration test verifying
FileChangeApprovalDecision::AcceptForSession works properly
**User-visible behavior**
- When the user approves a patch “for session”, future patches touching
only those previously approved file(s) will no longer prompt gain during
that session (both via app-server v2 and TUI/TUI2).
**Manual testing**
Tested both TUI and TUI2 - see screenshots below.
TUI:
<img width="1082" height="355" alt="image"
src="https://github.com/user-attachments/assets/adcf45ad-d428-498d-92fc-1a0a420878d9"
/>
TUI2:
<img width="1089" height="438" alt="image"
src="https://github.com/user-attachments/assets/dd768b1a-2f5f-4bd6-98fd-e52c1d3abd9e"
/>
> // todo(aibrahim): why are we passing model here while it can change?
we update it on each turn with `.with_model`
> //TODO(aibrahim): run CI in release mode.
although it's good to have, release builds take double the time tests
take.
> // todo(aibrahim): make this async function
we figured out another way of doing this sync
- Merge ModelFamily into ModelInfo
- Remove logic for adding instructions to apply patch
- Add compaction limit and visible context window to `ModelInfo`
See https://rustsec.org/advisories/RUSTSEC-2026-0002.
Though our `ratatui` fork has a transitive dep on an older version of
the `lru` crate, so to get CI green ASAP, this PR also adds an exception
to `deny.toml` for `RUSTSEC-2026-0002`, but hopefully this will be
short-lived.
Fixes CodexExec to avoid missing early process exits by registering the
exit handler up front and deferring the error until after stdout is
drained, and adds a regression test that simulates a fast-exit child
while still producing output so hangs are caught.
Handle /review <instructions> in the TUI and TUI2 by routing it as a
custom review command instead of plain text, wiring command dispatch and
adding composer coverage so typing /review text starts a review directly
rather than posting a message. User impact: /review with arguments now
kicks off the review flow, previously it would just forward as a plain
command and not actually start a review.
Fixes apply.rs path parsing so
- quoted diff headers are tokenized and extracted correctly,
- /dev/null headers are ignored before prefix stripping to avoid bogus
dev/null paths, and
- git apply output paths are unescaped from C-style quoting.
**Why**
This prevents potentially missed staging and misclassified paths when
applying or reverting patches, which could lead to incorrect behavior
for repos with spaces or escaped characters in filenames.
**Impact**
I checked and this is only used in the cloud tasks support and `codex
apply <task_id>` flow.
Done to avoid spammy warnings to end up in the model context without
having to switch to nightly
```
Warning: can't set `imports_granularity = Item`, unstable features are only available in nightly channel.
```
Set login=false for the shell tool in the timing-based parallelism test
so it does not depend on slow user login shells, making the test
deterministic without user-facing changes. This prevents occasional
flakes when running locally.
With `config.toml`:
```
model = "gpt-5.1-codex"
```
(where `gpt-5.1-codex` has `show_in_picker: false` in
[`model_presets.rs`](https://github.com/openai/codex/blob/main/codex-rs/core/src/models_manager/model_presets.rs);
this happens if the user hasn't used codex in a while so they didn't see
the popup before their model was changed to `show_in_picker: false`)
The upgrade picker used to not show (because `gpt-5.1-codex` was
filtered out of the model list in code). Now, the filtering is done
downstream in tui and app-server, so the model upgrade popup shows:
<img width="1503" height="227" alt="Screenshot 2026-01-06 at 5 04 37 PM"
src="https://github.com/user-attachments/assets/26144cc2-0b3f-4674-ac17-e476781ec548"
/>
Use the contents of the commit message from the commit associated with
the tag (that contains the version bump) as the release notes by writing
them to a file and then specifying the file as the `body_path` of
`softprops/action-gh-release@v2`.
Add `web_search_cached` feature to config. Enables `web_search` tool
with access only to cached/indexed results (see
[docs](https://platform.openai.com/docs/guides/tools-web-search#live-internet-access)).
This takes precedence over the existing `web_search_request`, which
continues to enable `web_search` over live results as it did before.
`web_search_cached` is disabled for review mode, as `web_search_request`
is.
Add `thread/rollback` to app-server to support IDEs undo-ing the last N
turns of a thread.
For context, an IDE partner will be supporting an "undo" capability
where the IDE (the app-server client) will be responsible for reverting
the local changes made during the last turn. To support this well, we
also need a way to drop the last turn (or more generally, the last N
turns) from the agent's context. This is what `thread/rollback` does.
**Core idea**: A Thread rollback is represented as a persisted event
message (EventMsg::ThreadRollback) in the rollout JSONL file, not by
rewriting history. On resume, both the model's context (core replay) and
the UI turn list (app-server v2's thread history builder) apply these
markers so the pruned history is consistent across live conversations
and `thread/resume`.
Implementation notes:
- Rollback only affects agent context and appends to the rollout file;
clients are responsible for reverting files on disk.
- If a thread rollback is currently in progress, subsequent
`thread/rollback` calls are rejected.
- Because we use `CodexConversation::submit` and codex core tracks
active turns, returning an error on concurrent rollbacks is communicated
via an `EventMsg::Error` with a new variant
`CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and
sends the BAD_REQUEST RPC response.
Tests cover thread rollbacks in both core and app-server, including when
`num_turns` > existing turns (which clears all turns).
**Note**: this explicitly does **not** behave like `/undo` which we just
removed from the CLI, which does the opposite of what `thread/rollback`
does. `/undo` reverts local changes via ghost commits/snapshots and does
not modify the agent's context / conversation history.
### Motivation
- Fix a visual bug where transcript text could bleed through the
on-screen copy "pill" overlay.
- Ensure the copy affordance fully covers the underlying buffer so the
pill background is solid and consistent with styling.
- Document the approach in-code to make the background-clearing
rationale explicit.
### Description
- Clear the pill area before drawing by iterating `Rect::positions()`
and calling `cell.set_symbol(" ")` and `cell.set_style(base_style)` in
`render_copy_pill` in `transcript_copy_ui.rs`.
- Added an explanatory comment for why the pill background is explicitly
cleared.
- Added a unit test `copy_pill_clears_background` and committed the
corresponding snapshot file to validate the rendering behavior.
### Testing
- Ran `just fmt` (formatting completed; non-blocking environment warning
may appear).
- Ran `just fix -p codex-tui2` to apply lints/fixes (completed).
- Ran `cargo test -p codex-tui2` and all tests passed (snapshot updated
and tests succeeded).
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_695c9b23e9b8832997d5a457c4d83410)
Added an agent control plane that lets sessions spawn or message other
conversations via `AgentControl`.
`AgentBus` (core/src/agent/bus.rs) keeps track of the last known status
of a conversation.
ConversationManager now holds shared state behind an Arc so AgentControl
keeps only a weak back-reference, the goal is just to avoid explicit
cycle reference.
Follow-ups:
* Build a small tool in the TUI to be able to see every agent and send
manual message to each of them
* Handle approval requests in this TUI
* Add tools to spawn/communicate between agents (see related design)
* Define agent types
Force an announcement tooltip in the CLI. This query the gh repo on this
[file](https://raw.githubusercontent.com/openai/codex/main/announcement_tip.toml)
which contains announcements in TOML looking like this:
```
# Example announcement tips for Codex TUI.
# Each [[announcements]] entry is evaluated in order; the last matching one is shown.
# Dates are UTC, formatted as YYYY-MM-DD. The from_date is inclusive and the to_date is exclusive.
# version_regex matches against the CLI version (env!("CARGO_PKG_VERSION")); omit to apply to all versions.
# target_app specify which app should display the announcement (cli, vsce, ...).
[[announcements]]
content = "Welcome to Codex! Check out the new onboarding flow."
from_date = "2024-10-01"
to_date = "2024-10-15"
version_regex = "^0\\.0\\.0$"
target_app = "cli"
```
To make this efficient, the announcement is queried on a best effort
basis at the launch of the CLI (no refresh made after this).
This is done in an async way and we display the announcement (with 100%
probability) iff the announcement is available, the cache is correctly
warmed and there is a matching announcement (matching is recomputed for
each new session).
Fixes ReadinessFlag::subscribe to avoid handing out token 0 or duplicate
tokens on i32 wrap-around, adds regression tests, and prevents readiness
gates from getting stuck waiting on an unmarkable or mis-authorized
token.
Background
Streaming assistant prose in tui2 was being rendered with viewport-width
wrapping during streaming, then stored in history cells as already split
`Line`s. Those width-derived breaks became indistinguishable from hard
newlines, so the transcript could not "un-split" on resize. This also
degraded copy/paste, since soft wraps looked like hard breaks.
What changed
- Introduce width-agnostic `MarkdownLogicalLine` output in
`tui2/src/markdown_render.rs`, preserving markdown wrap semantics:
initial/subsequent indents, per-line style, and a preformatted flag.
- Update the streaming collector (`tui2/src/markdown_stream.rs`) to emit
logical lines (newline-gated) and remove any captured viewport width.
- Update streaming orchestration (`tui2/src/streaming/*`) to queue and
emit logical lines, producing `AgentMessageCell::new_logical(...)`.
- Make `AgentMessageCell` store logical lines and wrap at render time in
`HistoryCell::transcript_lines_with_joiners(width)`, emitting joiners so
copy/paste can join soft-wrap continuations correctly.
Overlay deferral
When an overlay is active, defer *cells* (not rendered `Vec<Line>`) and
render them at overlay close time. This avoids baking width-derived
wraps based on a stale width.
Tests + docs
- Add resize/reflow regression tests + snapshots for streamed agent
output.
- Expand module/API docs for the new logical-line streaming pipeline and
clarify joiner semantics.
- Align scrollback-related docs/comments with current tui2 behavior
(main draw loop does not flush queued "history lines" to the terminal).
More details
See `codex-rs/tui2/docs/streaming_wrapping_design.md` for the full
problem statement and solution approach, and
`codex-rs/tui2/docs/tui_viewport_and_history.md` for viewport vs printed
output behavior.
Trim whitespace when validating '*** Begin Patch'/'*** End Patch'
markers in codex-apply-patch so padded marker lines parse as intended,
and add regression coverage (unit + fixture scenario); this avoids
apply_patch failures when models include extra spacing. Tested with
cargo test -p codex-apply-patch.
**Motivation**
- Bring `codex exec resume` to parity with top‑level flags so global
options (git check bypass, json, model, sandbox toggles) work after the
subcommand, including when outside a git repo.
**Description**
- Exec CLI: mark `--skip-git-repo-check`, `--json`, `--model`,
`--full-auto`, and `--dangerously-bypass-approvals-and-sandbox` as
global so they’re accepted after `resume`.
- Tests: add `exec_resume_accepts_global_flags_after_subcommand` to
verify those flags work when passed after `resume`.
**Testing**
- `just fmt`
- `cargo test -p codex-exec` (pass; ran with elevated perms to allow
network/port binds)
- Manual: exercised `codex exec resume` with global flags after the
subcommand to confirm behavior.
We've seen reports that people who try to login on a remote/headless
machine will open the login link on their own machine and got errors.
Update the instructions to ask those users to use `codex login
--device-auth` instead.
<img width="1434" height="938" alt="CleanShot 2026-01-05 at 11 35 02@2x"
src="https://github.com/user-attachments/assets/2b209953-6a42-4eb0-8b55-bb0733f2e373"
/>
Adds an optional `justification` parameter to the `prefix_rule()`
execpolicy DSL so policy authors can attach human-readable rationale to
a rule. That justification is propagated through parsing/matching and
can be surfaced to the model (or approval UI) when a command is blocked
or requires approval.
When a command is rejected (or gated behind approval) due to policy, a
generic message makes it hard for the model/user to understand what went
wrong and what to do instead. Allowing policy authors to supply a short
justification improves debuggability and helps guide the model toward
compliant alternatives.
Example:
```python
prefix_rule(
pattern = ["git", "push"],
decision = "forbidden",
justification = "pushing is blocked in this repo",
)
```
If Codex tried to run `git push origin main`, now the failure would
include:
```
`git push origin main` rejected: pushing is blocked in this repo
```
whereas previously, all it was told was:
```
execpolicy forbids this command
```
The elevated sandbox creates two new Windows users - CodexSandboxOffline
and CodexSandboxOnline. This is necessary, so this PR does all that it
can to "hide" those users. It uses the registry plus directory flags (on
their home directories) to get them to show up as little as possible.
## Summary
When using device-code login with a custom issuer
(`--experimental_issuer`), Codex correctly uses that issuer for the auth
flow — but the **terminal prompt still told users to open the default
OpenAI device URL** (`https://auth.openai.com/codex/device`). That’s
confusing and can send users to the **wrong domain** (especially for
enterprise/staging issuers). This PR updates the prompt (and related
URLs) to consistently use the configured issuer. 🎯
---
## 🔧 What changed
* 🔗 **Device auth prompt link** now uses the configured issuer (instead
of a hard-coded OpenAI URL)
* 🧭 **Redirect callback URL** is derived from the same issuer for
consistency
* 🧼 Minor cleanup: normalize the issuer base URL once and reuse it
(avoids formatting quirks like trailing `/`)
---
## 🧪 Repro + Before/After
### ▶️ Command
```bash
codex login --device-auth --experimental_issuer https://auth.example.com
```
### ❌ Before (wrong link shown)
```text
1. Open this link in your browser and sign in to your account
https://auth.openai.com/codex/device
```
### ✅ After (correct link shown)
```text
1. Open this link in your browser and sign in to your account
https://auth.example.com/codex/device
```
Full example output (same as before, but with the correct URL):
```text
Welcome to Codex [v0.72.0]
OpenAI's command-line coding agent
Follow these steps to sign in with ChatGPT using device code authorization:
1. Open this link in your browser and sign in to your account
https://auth.example.com/codex/device
2. Enter this one-time code (expires in 15 minutes)
BUT6-0M8K4
Device codes are a common phishing target. Never share this code.
```
---
## ✅ Test plan
* 🟦 `codex login --device-auth` (default issuer): output remains
unchanged
* 🟩 `codex login --device-auth --experimental_issuer
https://auth.example.com`:
* prompt link points to the issuer ✅
* callback URL is derived from the same issuer ✅
* no double slashes / mismatched domains ✅
Co-authored-by: Eric Traut <etraut@openai.com>
This change improves the skills render section
- Separate the skills list from usage rules with clear subheadings
- Define skill more clearly upfront
- Remove confusing trigger/discovery wording and make reference-following guidance more actionable
Never treat .codex or .codex/.sandbox as a workspace root.
Handle write permissions to .codex/.sandbox in a single method so that
the sandbox setup/runner can write logs and other setup files to that
directory.
What changed
- Added `outputSchema` support to the app-server APIs, mirroring `codex
exec --output-schema` behavior.
- V1 `sendUserTurn` now accepts `outputSchema` and constrains the final
assistant message for that turn.
- V2 `turn/start` now accepts `outputSchema` and constrains the final
assistant message for that turn (explicitly per-turn only).
Core behavior
- `Op::UserTurn` already supported `final_output_json_schema`; now V1
`sendUserTurn` forwards `outputSchema` into that field.
- `Op::UserInput` now carries `final_output_json_schema` for per-turn
settings updates; core maps it into
`SessionSettingsUpdate.final_output_json_schema` so it applies to the
created turn context.
- V2 `turn/start` does NOT persist the schema via `OverrideTurnContext`
(it’s applied only for the current turn). Other overrides
(cwd/model/etc) keep their existing persistent behavior.
API / docs
- `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema:
Option<serde_json::Value>` to `SendUserTurnParams` (serialized as
`outputSchema`).
- `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema:
Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`).
- `codex-rs/app-server/README.md`: document `outputSchema` for
`turn/start` and clarify it applies only to the current turn.
- `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1
`sendUserTurn` and v2 `turn/start`.
Tests added/updated
- New app-server integration tests asserting `outputSchema` is forwarded
into outbound `/responses` requests as `text.format`:
- `codex-rs/app-server/tests/suite/output_schema.rs`
- `codex-rs/app-server/tests/suite/v2/output_schema.rs`
- Added per-turn semantics tests (schema does not leak to the next
turn):
- `send_user_turn_output_schema_is_per_turn_v1`
- `turn_start_output_schema_is_per_turn_v2`
- Added protocol wire-compat tests for the merged op:
- serialize omits `final_output_json_schema` when `None`
- deserialize works when field is missing
- serialize includes `final_output_json_schema` when `Some(schema)`
Call site updates (high level)
- Updated all `Op::UserInput { .. }` constructions to include
`final_output_json_schema`:
- `codex-rs/app-server/src/codex_message_processor.rs`
- `codex-rs/core/src/codex_delegate.rs`
- `codex-rs/mcp-server/src/codex_tool_runner.rs`
- `codex-rs/tui/src/chatwidget.rs`
- `codex-rs/tui2/src/chatwidget.rs`
- plus impacted core tests.
Validation
- `just fmt`
- `cargo test -p codex-core`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`
- `cargo test -p codex-tui`
- `cargo test -p codex-tui2`
- `cargo test -p codex-protocol`
- `cargo clippy --all-features --tests --profile dev --fix -- -D
warnings`
Load managed requirements from MDM key `requirements_toml_base64`.
Tested on my Mac (using `defaults` to set the preference, though this
would be set by MDM in production):
```
➜ codex git:(gt/mdm-requirements) defaults read com.openai.codex requirements_toml_base64 | base64 -d
allowed_approval_policies = ["on-request"]
➜ codex git:(gt/mdm-requirements) just c --yolo
cargo run --bin codex -- "$@"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.26s
Running `target/debug/codex --yolo`
Error loading configuration: value `Never` is not in the allowed set [OnRequest]
error: Recipe `codex` failed on line 11 with exit code 1
➜ codex git:(gt/mdm-requirements) defaults delete com.openai.codex requirements_toml_base64
➜ codex git:(gt/mdm-requirements) just c --yolo
cargo run --bin codex -- "$@"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.24s
Running `target/debug/codex --yolo`
╭──────────────────────────────────────────────────────────╮
│ >_ OpenAI Codex (v0.0.0) │
│ │
│ model: codex-auto-balanced medium /model to change │
│ directory: ~/code/codex/codex-rs │
╰──────────────────────────────────────────────────────────╯
Tip: Start a fresh idea with /new; the previous session stays in history.
```
Context
- This code parses Server-Sent Events (SSE) from the legacy Chat
Completions streaming API (wire_api = "chat").
- The upstream protocol terminates a stream with a final sentinel event:
data: [DONE].
- Some of our test stubs/helpers historically end the stream with data:
DONE (no brackets).
How this was found
- GitHub Actions on Windows failed in codex-app-server integration tests
with wiremock verification errors (expected multiple POSTs, got 1).
Diagnosis
- The job logs included: codex_api::sse::chat: Failed to parse
ChatCompletions SSE event ... data: DONE.
- eventsource_stream surfaces the sentinel as a normal SSE event; it
does not automatically close the stream.
- The parser previously attempted to JSON-decode every data: payload.
The sentinel is not JSON, so we logged and skipped it, then continued
polling.
- On servers that keep the HTTP connection open after emitting the
sentinel (notably wiremock on Windows), skipping the sentinel meant we
never emitted ResponseEvent::Completed.
- Higher layers wait for completion before progressing (emitting
approval requests and issuing follow-up model calls), so the test never
reached the subsequent requests and wiremock panicked when its
expected-call count was not met.
Fix
- Treat both data: [DONE] and data: DONE as explicit end-of-stream
sentinels.
- When a sentinel is seen, flush any pending assistant/reasoning items
and emit ResponseEvent::Completed once.
Tests
- Add a regression unit test asserting we complete on the sentinel even
if the underlying connection is not closed.
## Summary
- Add a transcript scrollbar in `tui2` using `tui-scrollbar`.
- Reserve 2 columns on the right (1 empty gap + 1 scrollbar track) and
plumb the reduced width through wrapping/selection/copy so rendering and
interactions match.
- Auto-hide the scrollbar when the transcript is pinned to the bottom
(columns remain reserved).
- Add mouse click/drag support for the scrollbar, with pointer-capture
so drags don’t fall through into transcript selection.
- Skip scrollbar hit-testing when auto-hidden to avoid an invisible
interactive region.
## Notes
- Styling is theme-aware: in light themes the thumb is darker than the
track; in dark themes it reads as an “indented” element without going
full-white.
- Pre-Ratatui 0.30 (ratatui-core split) requires a small scratch-buffer
bridge; this should simplify once we move to Ratatui 0.30.
## Testing
- `just fmt`
- `just fix -p codex-tui2 --allow-no-vcs`
- `cargo test -p codex-tui2`
The Responses API requires that all tool names conform to
'^[a-zA-Z0-9_-]+$'. This PR replaces all non-conforming characters with
`_` to ensure that they can be used.
Fixes#8174
Fixes /review base-branch prompt resolution to use the session/turn cwd
(respecting runtime cwd overrides) so merge-base/diff guidance is
computed from the intended repo; adds a regression test for cwd
overrides; tested with cargo test -p codex-core --test all
review_uses_overridden_cwd_for_base_branch_merge_base.
Fix this: https://github.com/openai/codex/issues/8479
The issue is that chat completion API expect all the tool calls in a
single assistant message and then all the tool call output in a single
response message
Bumps [tokio-stream](https://github.com/tokio-rs/tokio) from 0.1.17 to
0.1.18.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="60b083b630"><code>60b083b</code></a>
chore: prepare tokio-stream 0.1.18 (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7830">#7830</a>)</li>
<li><a
href="9cc02cc88d"><code>9cc02cc</code></a>
chore: prepare tokio-util 0.7.18 (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7829">#7829</a>)</li>
<li><a
href="d2799d791b"><code>d2799d7</code></a>
task: improve the docs of <code>Builder::spawn_local</code> (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7828">#7828</a>)</li>
<li><a
href="4d4870f291"><code>4d4870f</code></a>
task: doc that task drops before JoinHandle completion (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7825">#7825</a>)</li>
<li><a
href="fdb150901a"><code>fdb1509</code></a>
fs: check for io-uring opcode support (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7815">#7815</a>)</li>
<li><a
href="426a562780"><code>426a562</code></a>
rt: remove <code>allow(dead_code)</code> after <code>JoinSet</code>
stabilization (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7826">#7826</a>)</li>
<li><a
href="e3b89bbefa"><code>e3b89bb</code></a>
chore: prepare Tokio v1.49.0 (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7824">#7824</a>)</li>
<li><a
href="4f577b84e9"><code>4f577b8</code></a>
Merge 'tokio-1.47.3' into 'master'</li>
<li><a
href="f320197693"><code>f320197</code></a>
chore: prepare Tokio v1.47.3 (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7823">#7823</a>)</li>
<li><a
href="ea6b144cd1"><code>ea6b144</code></a>
ci: freeze rustc on nightly-2025-01-25 in <code>netlify.toml</code> (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7652">#7652</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/tokio-rs/tokio/compare/tokio-stream-0.1.17...tokio-stream-0.1.18">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [clap_complete](https://github.com/clap-rs/clap) from 4.5.57 to
4.5.64.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e115243369"><code>e115243</code></a>
chore: Release</li>
<li><a
href="d4c34fa2b8"><code>d4c34fa</code></a>
docs: Update changelog</li>
<li><a
href="ab4f438860"><code>ab4f438</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/6203">#6203</a>
from jpgrayson/fix/zsh-space-after-dir-completions</li>
<li><a
href="5571b83c8a"><code>5571b83</code></a>
fix(complete): Trailing space after zsh directory completions</li>
<li><a
href="06a2311586"><code>06a2311</code></a>
chore: Release</li>
<li><a
href="bed131f7ae"><code>bed131f</code></a>
docs: Update changelog</li>
<li><a
href="a61c53e6dd"><code>a61c53e</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/6202">#6202</a>
from iepathos/6201-symlink-path-completions</li>
<li><a
href="c3b440570e"><code>c3b4405</code></a>
fix(complete): Follow symlinks in path completion</li>
<li><a
href="a794395340"><code>a794395</code></a>
test(complete): Add symlink path completion tests</li>
<li><a
href="ca0aeba31f"><code>ca0aeba</code></a>
chore: Release</li>
<li>Additional commits viewable in <a
href="https://github.com/clap-rs/clap/compare/clap_complete-v4.5.57...clap_complete-v4.5.64">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Clicking the transcript copy pill or pressing the copy shortcut now
copies the selected transcript text and clears the highlight.
Show transient footer feedback ("Copied"/"Copy failed") after a copy
attempt, with logic in transcript_copy_action to keep app.rs smaller and
closer to tui for long-term diffs.
Update footer snapshots and add tiny unit tests for feedback expiry.
https://github.com/user-attachments/assets/c36c8163-11c5-476b-b388-e6fbe0ff6034
When the selection ends on the last visible row, the copy affordance had
no space below and never rendered. Fall back to placing it above (or on
the same row for 1-row viewports) and add a regression test.
Mouse/trackpad scrolling in tui2 applies deltas in visual lines, but the
transcript scroll state was anchored only to CellLine entries.
When a 1-line scroll landed on the synthetic inter-cell Spacer row
(inserted between non-continuation cells),
`TranscriptScroll::anchor_for` would skip that row and snap back to the
adjacent cell line. That makes the resolved top offset unchanged for
small/coalesced scroll deltas, so scrolling appears to get stuck right
before certain cells (commonly user prompts and command output cells).
Fix this by making spacer rows a first-class scroll anchor:
- Add `TranscriptScroll::ScrolledSpacerBeforeCell` and resolve it back
to the spacer row index when present.
- Update `anchor_for`/`scrolled_by` to preserve spacers instead of
skipping them.
- Treat the new variant as "already anchored" in
`lock_transcript_scroll_to_current_view`.
Tests:
- cargo test -p codex-tui2
## Summary
Forked repositories inherit GitHub Actions workflows including scheduled
ones. This causes:
1. **Wasted Actions minutes** - Scheduled workflows run on forks even
though they will fail
2. **Failed runs** - Workflows requiring `CODEX_OPENAI_API_KEY` fail
immediately on forks
3. **Noise** - Fork owners see failed workflow runs they didn't trigger
This PR adds `if: github.repository == 'openai/codex'` guards to
workflows that should only run on the upstream repository.
### Affected workflows
| Workflow | Trigger | Issue |
|----------|---------|-------|
| `rust-release-prepare` | `schedule: */4 hours` | Runs 6x/day on every
fork |
| `close-stale-contributor-prs` | `schedule: daily` | Runs daily on
every fork |
| `issue-deduplicator` | `issues: opened` | Requires
`CODEX_OPENAI_API_KEY` |
| `issue-labeler` | `issues: opened` | Requires `CODEX_OPENAI_API_KEY` |
### Note
`cla.yml` already has this guard (`github.repository_owner ==
'openai'`), so it was not modified.
## Test plan
- [ ] Verify workflows still run correctly on `openai/codex`
- [ ] Verify workflows are skipped on forks (can check via Actions tab
on any fork)
The transcript viewport draws every frame. Ratatui's Line::render_ref
does grapheme segmentation and span layout, so repeated redraws can burn
CPU during streaming even when the visible transcript hasn't changed.
Introduce TranscriptViewCache to reduce per-frame work:
- WrappedTranscriptCache memoizes flattened+wrapped transcript lines per
width, appends incrementally as new cells arrive, and rebuilds on width
change, truncation (backtrack), or transcript replacement.
- TranscriptRasterCache caches rasterized rows (Vec<Cell>) per line
index and user-row styling; redraws copy cells instead of rerendering
spans.
The caches are width-scoped and store base transcript content only;
selection highlighting and copy affordances are applied after drawing.
User rows include the row-wide base style in the cached raster.
Refactor transcript_render to expose append_wrapped_transcript_cell for
incremental building and add a test that incremental append matches the
full build.
Add docs/tui2/performance-testing.md as a playbook for macOS sample
profiles and hotspot greps.
Expand transcript_view_cache tests to cover rebuild conditions, raster
equivalence vs direct rendering, user-row caching, and eviction.
Test: cargo test -p codex-tui2
last token count in context manager is initialized to 0. Gets populated
only on events from server.
This PR populates it on resume so we can decide if we need to compact or
not.
This reduces unnecessary frame scheduling in codex-tui2.
Changes:
- Gate redraw scheduling for streaming deltas when nothing visible
changes.
- Avoid a redraw feedback loop from footer transcript UI state updates.
Why:
- Streaming deltas can arrive at very high frequency; redrawing on every
delta can drive a near-constant render loop.
- BottomPane was requesting another frame after every Draw even when the
derived transcript UI state was unchanged.
Testing:
- cargo test -p codex-tui2
Manual sampling:
- sample "$(pgrep -n codex-tui2)" 3 -file
/tmp/tui2.idle.after.sample.txt
- sample "$(pgrep -n codex-tui2)" 3 -file
/tmp/tui2.streaming.after.sample.txt
This eliminates redundant user documentation and allows us to focus our
documentation investments.
I left tombstone files for most of the existing ".md" docs files to
avoid broken links. These now contain brief links to the developers docs
site.
This is more future-proof if we ever decide to add additional Sandbox
Users for new functionality
This also moves some more user-related code into a new file for code
cleanliness
Bumps [regex-lite](https://github.com/rust-lang/regex) from 0.1.7 to
0.1.8.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/rust-lang/regex/blob/master/CHANGELOG.md">regex-lite's
changelog</a>.</em></p>
<blockquote>
<h1>0.1.80</h1>
<ul>
<li>[PR <a
href="https://redirect.github.com/rust-lang/regex/issues/292">#292</a>](<a
href="https://redirect.github.com/rust-lang/regex/pull/292">rust-lang/regex#292</a>):
Fixes bug <a
href="https://redirect.github.com/rust-lang/regex/issues/291">#291</a>,
which was introduced by PR <a
href="https://redirect.github.com/rust-lang/regex/issues/290">#290</a>.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="140f8949da"><code>140f894</code></a>
regex-lite-0.1.8</li>
<li><a
href="27d6d65263"><code>27d6d65</code></a>
1.12.1</li>
<li><a
href="85398ad500"><code>85398ad</code></a>
changelog: 1.12.1</li>
<li><a
href="764efbd305"><code>764efbd</code></a>
api: tweak the lifetime of <code>Captures::get_match</code></li>
<li><a
href="ee6aa55e01"><code>ee6aa55</code></a>
rure-0.2.4</li>
<li><a
href="42076c6bca"><code>42076c6</code></a>
1.12.0</li>
<li><a
href="aef2153e31"><code>aef2153</code></a>
deps: bump to regex-automata 0.4.12</li>
<li><a
href="459dbbeaa9"><code>459dbbe</code></a>
regex-automata-0.4.12</li>
<li><a
href="610bf2d76e"><code>610bf2d</code></a>
regex-syntax-0.8.7</li>
<li><a
href="7dbb384dd0"><code>7dbb384</code></a>
changelog: 1.12.0</li>
<li>Additional commits viewable in <a
href="https://github.com/rust-lang/regex/compare/regex-lite-0.1.7...regex-lite-0.1.8">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [toml_edit](https://github.com/toml-rs/toml) from 0.23.7 to
0.24.0+spec-1.1.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2e09401567"><code>2e09401</code></a>
chore: Release</li>
<li><a
href="e32c7a2f9b"><code>e32c7a2</code></a>
chore: Release</li>
<li><a
href="df1c3286de"><code>df1c328</code></a>
docs: Update changelog</li>
<li><a
href="b826cf4914"><code>b826cf4</code></a>
feat(edit)!: Allow <code>set_position(None)</code> (<a
href="https://redirect.github.com/toml-rs/toml/issues/1080">#1080</a>)</li>
<li><a
href="8043f20af7"><code>8043f20</code></a>
feat(edit)!: Allow <code>set_position(None)</code></li>
<li><a
href="a02c0db59f"><code>a02c0db</code></a>
feat: Support TOML 1.1 (<a
href="https://redirect.github.com/toml-rs/toml/issues/1079">#1079</a>)</li>
<li><a
href="5cfb838b15"><code>5cfb838</code></a>
feat(edit): Support TOML 1.1</li>
<li><a
href="1eb4d606d3"><code>1eb4d60</code></a>
feat(toml): Support TOML 1.1</li>
<li><a
href="695d7883d8"><code>695d788</code></a>
feat(edit)!: Multi-line inline tables with trailing commas</li>
<li><a
href="cc4f7acd94"><code>cc4f7ac</code></a>
feat(toml): Multi-line inline tables with trailing commas</li>
<li>Additional commits viewable in <a
href="https://github.com/toml-rs/toml/compare/v0.23.7...v0.24.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
I attempted to build codex on LoongArch Linux and encountered
compilation errors.
After investigation, the errors were traced to certain `windows-sys`
features
which rely on platform-specific cfgs that only support x86 and aarch64.
With this change applied, the project now builds and runs successfully
on my
platform:
- OS: AOSC OS (loongarch64)
- Kernel: Linux 6.17
- CPU: Loongson-3A6000
Please let me know if this approach is reasonable, or if there is a
better way
to support additional platforms.
### What
Builds on #8293.
Add `additional_details`, which contains the upstream error message, to
relevant structures used to pass along retryable `StreamError`s.
Uses the new TUI status indicator's `details` field (shows under the
status header) to display the `additional_details` error to the user on
retryable `Reconnecting...` errors. This adds clarity for users for
retryable errors.
Will make corresponding change to VSCode extension to show
`additional_details` as expandable from the `Reconnecting...` cell.
Examples:
<img width="1012" height="326" alt="image"
src="https://github.com/user-attachments/assets/f35e7e6a-8f5e-4a2f-a764-358101776996"
/>
<img width="1526" height="358" alt="image"
src="https://github.com/user-attachments/assets/0029cbc0-f062-4233-8650-cc216c7808f0"
/>
This PR introduces a `codex-utils-cargo-bin` utility crate that
wraps/replaces our use of `assert_cmd::Command` and
`escargot::CargoBuild`.
As you can infer from the introduction of `buck_project_root()` in this
PR, I am attempting to make it possible to build Codex under
[Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
achieve faster incremental local builds (largely due to Buck2's
[dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
build strategy, as well as benefits from its local build daemon) as well
as faster CI builds if we invest in remote execution and caching.
See
https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
for more details about the performance advantages of Buck2.
Buck2 enforces stronger requirements in terms of build and test
isolation. It discourages assumptions about absolute paths (which is key
to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
variables that Cargo provides are absolute paths (which
`assert_cmd::Command` reads), this is a problem for Buck2, which is why
we need this `codex-utils-cargo-bin` utility.
My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
passed to a `rust_test()` build rule as relative paths.
`codex-utils-cargo-bin` will resolve these values to absolute paths,
when necessary.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
* #8498
* __->__ #8496
Clamp frame draw notifications in the `FrameRequester` scheduler so we
don't redraw more frequently than a user can perceive.
This applies to both `codex-tui` and `codex-tui2`, and keeps the
draw/dispatch loops simple by centralizing the rate limiting in a small
helper module.
- Add `FrameRateLimiter` (pure, unit-tested) to clamp draw deadlines
- Apply the limiter in the scheduler before emitting `TuiEvent::Draw`
- Use immediate redraw requests for scroll paths (scheduler now
coalesces + clamps)
- Add scheduler tests covering immediate/delayed interactions
This isn't very useful parameter.
logic:
```
if model puts `**` in their reasoning, trim it and visualize the header.
if couldn't trim: don't render
if model doesn't support: don't render
```
We can simplify to:
```
if could trim, visualize header.
if not, don't render
```
I am trying to support building with [Buck2](https://buck2.build), which
reports which files have changed between invocations of `buck2 test` and
`tmp_delete_example.txt` came up. This turned out to be the reason.
When rg download fails during npm package staging, log the
target/platform/url and preserve the original exception as the cause.
Emit GitHub Actions log groups and error annotations so the failure is
easier to spot.
Document why a urlopen timeout is set (the default can hang
indefinitely).
This is to make failures in the specific build step easier to understand
/ work out what's failing rather than having a big wall of text (or at
least having an obvious part of it that helps narrow that wall)
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Support multi-click transcript selection using transcript/viewport
coordinates
(wrapped visual line index + content column), not terminal buffer
positions.
Gestures:
- double click: select word-ish token under cursor
- triple click: select entire wrapped line
- quad click: select paragraph (contiguous non-empty wrapped lines)
- quint+ click: select the entire history cell (all wrapped lines
belonging to a
single `HistoryCell`, including blank lines inside the cell)
Selection expansion rebuilds the wrapped transcript view from
`HistoryCell::display_lines(width)` so boundaries match on-screen
wrapping during
scroll/resize/streaming reflow. Click grouping is resilient to minor
drag jitter
(some terminals emit tiny Drag events during clicks) and becomes more
tolerant as
the sequence progresses so quad/quint clicks are practical.
Tests cover expansion (word/line/paragraph/cell), sequence resets
(timing, motion,
line changes, real drags), drag jitter, and behavior on spacer lines
between
history cells (paragraph/cell selection prefers the cell above).
Avoid distracting 1-cell highlights on simple click by tracking an
anchor on mouse down and only creating a visible selection once the
mouse is dragged (selection head set).
When dragging while following the bottom during streaming, request a
scroll lock so the viewport stops moving under the active selection.
Move selection state transitions into transcript_selection helpers
(returning change/lock outcomes for the caller) and add unit tests for
the state machine.
### Motivation
- Persist richer per-turn configuration in rollouts so resumed/forked
sessions and tooling can reason about the exact instruction inputs and
output constraints used for a turn.
### Description
- Extend `TurnContextItem` to include optional `base_instructions`,
`user_instructions`, and `developer_instructions`.
- Record the optional `final_output_json_schema` associated with a turn.
- Add an optional `truncation_policy` to `TurnContextItem` and populate
it when writing turn-context rollout items.
- Introduce a protocol-level `TruncationPolicy` representation and
convert from core truncation policy when recording.
### Testing
- `cargo test -p codex-protocol` (pass)
Summary
Fixes intermittent screen corruption in tui2 (random stale characters)
by
addressing two terminal state desyncs: nested alt-screen transitions and
the
first-draw viewport clear.
- Make alt-screen enter/leave re-entrant via a small nesting guard so
closing
- Ensure the first viewport draw clears after the viewport is sized,
preventing
old terminal contents from leaking through when diff-based rendering
skips
space cells.
- Add docs + a small unit test for the alt-screen nesting behavior.
Testing
- cargo test -p codex-tui2
- cargo clippy -p codex-tui2 --all-features --tests
- Manual:
- Opened the transcript overlay and dismissed it repeatedly; verified
the
normal view redraws cleanly with no leftover characters.
- Ran tui2 in a new folder with no trust settings (and also cleared the
trust setting from config to re-trigger the prompt); verified the
initial
trust/onboarding screen renders without artifacts.
This adds logic to load `/etc/codex/config.toml` and associate it with
`ConfigLayerSource::System` on UNIX. I refactored the code so it shares
logic with the creation of the `ConfigLayerSource::User` layer.
https://github.com/openai/codex/pull/8354 added support for in-repo
`.config/` files, so this PR updates the logic for loading `*.rules`
files to load `*.rules` files from all relevant layers. The main change
to the business logic is `load_exec_policy()` in
`codex-rs/core/src/exec_policy.rs`.
Note this adds a `config_folder()` method to `ConfigLayerSource` that
returns `Option<AbsolutePathBuf>` so that it is straightforward to
iterate over the sources and get the associated config folder, if any.
This is necessary so that `$CODEX_HOME/skills` and `$CODEX_HOME/rules`
still get loaded even if `$CODEX_HOME/config.toml` does not exist. See
#8453.
For now, it is possible to omit this layer when creating a dummy
`ConfigLayerStack` in a test. We can revisit that later, if it turns out
to be the right thing to do.
with_target(true) is the default for tracing-subscriber, but we
previously disabled it for file output.
Keep it enabled so we can selectively enable specific targets/events at
runtime via RUST_LOG=..., and then grep by target/module in the log file
during troubleshooting.
before and after:
<img width="629" height="194" alt="image"
src="https://github.com/user-attachments/assets/33f7df3f-0c5d-4d3f-b7b7-80b03d4acd21"
/>
Copy now operates on the full logical selection range (anchor..head),
not just the visible viewport, so selections that include offscreen
lines copy the expected text.
Selection extraction is factored into `transcript_selection` to make the
logic easier to test and reason about. It reconstructs the wrapped
visual transcript, renders each wrapped line into a 1-row offscreen
Buffer, and reads the selected cells. This keeps clipboard text aligned
with what is rendered (gutter, indentation, wrapping).
Additional behavior:
- Skip continuation cells for wide glyphs (e.g. CJK) so copied text does
not include spurious spaces like "コ X".
- Avoid copying right-margin padding spaces.
Manual tested performed:
- "tell me a story" a few times
- scroll up, select text, scroll down, copy text
- confirm copied text is what you expect
Add `ctrl+g` shortcut to enable opening current prompt in configured
editor (`$VISUAL` or `$EDITOR`).
- Prompt is updated with editor's content upon editor close.
- Paste placeholders are automatically expanded when opening the
external editor, and are not "recompressed" on close
- They could be preserved in the editor, but it would be hard to prevent
the user from modifying the placeholder text directly, which would drop
the mapping to the `pending_paste` value
- Image placeholders stay as-is
- `ctrl+g` explanation added to shortcuts menu, snapshot tests updated
https://github.com/user-attachments/assets/4ee05c81-fa49-4e99-8b07-fc9eef0bbfce
The elevated setup synchronously applies read/write ACLs to any
workspace roots.
However, until we apply *read* permission to the full path, powershell
cannot use some roots as a cwd as it needs access to all parts of the
path in order to apply it as the working directory for a command.
The solution is, while the async read-ACL part of setup is running, use
a "junction" that lives in C:\Users\CodexSandbox{Offline|Online} that
points to the cwd.
Once the read ACLs are applied, we stop using the junction.
-----
this PR also removes some dead code and overly-verbose logging, and has
some light refactoring to the ACL-related functions
The bash command parser in exec_policy was failing to parse commands
with concatenated flag-value patterns like `-g"*.py"` (no space between
flag and quoted value). This caused policy rules like
`prefix_rule(pattern=["rg"])` to not match commands such as `rg -n "foo"
-g"*.py"`.
When tree-sitter-bash parses `-g"*.py"`, it creates a "concatenation"
node containing a word (`-g`) and a string (`"*.py"`). The parser
previously rejected any node type not in the ALLOWED_KINDS list, causing
the entire command parsing to fail and fall back to matching against the
wrapped `bash -lc` command instead of the inner command.
This change:
- Adds "concatenation" to ALLOWED_KINDS in
try_parse_word_only_commands_sequence
- Adds handling for concatenation nodes in parse_plain_command_from_node
that recursively extracts and joins word/string/raw_string children
- Adds test cases for concatenated flag patterns with double and single
quotes
Fixes#8394
- allow configuring `project_root_markers` in `config.toml`
(user/system/MDM) to control project discovery beyond `.git`
- honor the markers after merging pre-project layers; default to
`[".git"]` when unset and skip ancestor walk when set to an empty array
- document the option and add coverage for alternate markers in config
loader tests
- We now support `.codex/config.toml` in repo (from `cwd` up to the
first `.git` found, if any) as layers in `ConfigLayerStack`. A new
`ConfigLayerSource::Project` variant was added to support this.
- In doing this work, I realized that we were resolving relative paths
in `config.toml` after merging everything into one `toml::Value`, which
is wrong: paths should be relativized with respect to the folder
containing the `config.toml` that was deserialized. This PR introduces a
deserialize/re-serialize strategy to account for this in
`resolve_config_paths()`. (This is why `Serialize` is added to so many
types as part of this PR.)
- Added tests to verify this new behavior.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8354).
* #8359
* __->__ #8354
## Summary
Adds a FeatureFlag to enforce UTF8 encoding in powershell, particularly
Windows Powershell v5. This should help address issues like #7290.
Notably, this PR does not include the ability to parse `apply_patch`
invocations within UTF8 shell commands (calls to the freeform tool
should not be impacted). I am leaving this out of scope for now. We
should address before this feature becomes Stable, but those cases are
not the default behavior at this time so we're okay for experimentation
phase. We should continue cleaning up the `apply_patch::invocation`
logic and then can handle it more cleanly.
## Testing
- [x] Adds additional testing
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.47 to 4.5.53.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/releases">clap's
releases</a>.</em></p>
<blockquote>
<h2>v4.5.53</h2>
<h2>[4.5.53] - 2025-11-19</h2>
<h3>Features</h3>
<ul>
<li>Add <code>default_values_if</code>,
<code>default_values_ifs</code></li>
</ul>
<h2>v4.5.52</h2>
<h2>[4.5.52] - 2025-11-17</h2>
<h3>Fixes</h3>
<ul>
<li>Don't panic when <code>args_conflicts_with_subcommands</code>
conflicts with an <code>ArgGroup</code></li>
</ul>
<h2>v4.5.51</h2>
<h2>[4.5.51] - 2025-10-29</h2>
<h3>Fixes</h3>
<ul>
<li><em>(help)</em> Correctly calculate padding for short flags that
take a value</li>
<li><em>(help)</em> Don't panic on short flags using
<code>ArgAction::Count</code></li>
</ul>
<h2>v4.5.50</h2>
<h2>[4.5.50] - 2025-10-20</h2>
<h3>Features</h3>
<ul>
<li>Accept <code>Cow</code> where <code>String</code> and
<code>&str</code> are accepted</li>
</ul>
<h2>v4.5.48</h2>
<h2>[4.5.48] - 2025-09-19</h2>
<h3>Documentation</h3>
<ul>
<li>Add a new CLI Concepts document as another way of framing clap</li>
<li>Expand the <code>typed_derive</code> cookbook entry</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/blob/master/CHANGELOG.md">clap's
changelog</a>.</em></p>
<blockquote>
<h2>[4.5.53] - 2025-11-19</h2>
<h3>Features</h3>
<ul>
<li>Add <code>default_values_if</code>,
<code>default_values_ifs</code></li>
</ul>
<h2>[4.5.52] - 2025-11-17</h2>
<h3>Fixes</h3>
<ul>
<li>Don't panic when <code>args_conflicts_with_subcommands</code>
conflicts with an <code>ArgGroup</code></li>
</ul>
<h2>[4.5.51] - 2025-10-29</h2>
<h3>Fixes</h3>
<ul>
<li><em>(help)</em> Correctly calculate padding for short flags that
take a value</li>
<li><em>(help)</em> Don't panic on short flags using
<code>ArgAction::Count</code></li>
</ul>
<h2>[4.5.50] - 2025-10-20</h2>
<h3>Features</h3>
<ul>
<li>Accept <code>Cow</code> where <code>String</code> and
<code>&str</code> are accepted</li>
</ul>
<h2>[4.5.49] - 2025-10-13</h2>
<h3>Fixes</h3>
<ul>
<li><em>(help)</em> Correctly wrap when ANSI escape codes are
present</li>
</ul>
<h2>[4.5.48] - 2025-09-19</h2>
<h3>Documentation</h3>
<ul>
<li>Add a new CLI Concepts document as another way of framing clap</li>
<li>Expand the <code>typed_derive</code> cookbook entry</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3716f9f428"><code>3716f9f</code></a>
chore: Release</li>
<li><a
href="613b69a6b7"><code>613b69a</code></a>
docs: Update changelog</li>
<li><a
href="d117f7acde"><code>d117f7a</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/6028">#6028</a>
from epage/arg</li>
<li><a
href="cb8255d2f3"><code>cb8255d</code></a>
feat(builder): Allow quoted id's for arg macro</li>
<li><a
href="1036060f13"><code>1036060</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/6025">#6025</a>
from AldaronLau/typos-in-faq</li>
<li><a
href="2fcafc0aee"><code>2fcafc0</code></a>
docs: Fix minor grammar issues in FAQ</li>
<li><a
href="a380b65fe9"><code>a380b65</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/6023">#6023</a>
from epage/template</li>
<li><a
href="4d7ab1483c"><code>4d7ab14</code></a>
chore: Update from _rust/main template</li>
<li><a
href="b8a7ea49d9"><code>b8a7ea4</code></a>
chore(deps): Update Rust Stable to v1.87 (<a
href="https://redirect.github.com/clap-rs/clap/issues/18">#18</a>)</li>
<li><a
href="f9842b3b3f"><code>f9842b3</code></a>
chore: Avoid MSRV problems out of the box</li>
<li>Additional commits viewable in <a
href="https://github.com/clap-rs/clap/compare/clap_complete-v4.5.47...clap_complete-v4.5.53">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [test-log](https://github.com/d-e-s-o/test-log) from 0.2.18 to
0.2.19.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/d-e-s-o/test-log/releases">test-log's
releases</a>.</em></p>
<blockquote>
<h2>v0.2.19</h2>
<h2>What's Changed</h2>
<ul>
<li>Adjusted <code>tracing</code> output to log to
<code>stderr</code></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/dbdr"><code>@dbdr</code></a> made their
first contribution in <a
href="https://redirect.github.com/d-e-s-o/test-log/pull/64">d-e-s-o/test-log#64</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/d-e-s-o/test-log/compare/v0.2.18...v0.2.19">https://github.com/d-e-s-o/test-log/compare/v0.2.18...v0.2.19</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/d-e-s-o/test-log/blob/main/CHANGELOG.md">test-log's
changelog</a>.</em></p>
<blockquote>
<h2>0.2.19</h2>
<ul>
<li>Adjusted <code>tracing</code> output to log to
<code>stderr</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b4cd4a3ab6"><code>b4cd4a3</code></a>
Bump version to 0.2.19</li>
<li><a
href="bafe834fe7"><code>bafe834</code></a>
Emit tracing output to stderr</li>
<li><a
href="9e7aafbdcc"><code>9e7aafb</code></a>
Bump actions/checkout from 5 to 6</li>
<li><a
href="7fc59759d8"><code>7fc5975</code></a>
Suggest using [dev-dependencies] instead of [dependencies] in
README.md</li>
<li><a
href="25e7c367e6"><code>25e7c36</code></a>
Bump actions/checkout from 4 to 5</li>
<li><a
href="b633926871"><code>b633926</code></a>
Address clippy reported issue</li>
<li><a
href="628feeea5a"><code>628feee</code></a>
Don't specify patch level for dev-dependencies</li>
<li><a
href="d1a217e2e4"><code>d1a217e</code></a>
Update rstest requirement from 0.25.0 to 0.26.1</li>
<li><a
href="a2c6ba206e"><code>a2c6ba2</code></a>
Document private items in documentation CI job</li>
<li>See full diff in <a
href="https://github.com/d-e-s-o/test-log/compare/v0.2.18...v0.2.19">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates the configuration documentation to clarify and improve the
description of the `developer_instructions` and `instructions` fields.
Documentation updates:
* Added a description for the `developer_instructions` field in
`docs/config.md`, clarifying that it provides additional developer
instructions.
* Updated the comments in `docs/example-config.md` to specify that
`developer_instructions` is injected before `AGENTS.md`, and clarified
that the `instructions` field is ignored and that `AGENTS.md` is
preferred.
___
ref #7973
Thanks to @miraclebakelaser for the message. I have double-confirmed
that developer instructions are always injected before user
instructions. According to the source code
[codex_core::codex::Session::build_initial_context](https://github.com/openai/codex/blob/rust-v0.77.0-alpha.2/codex-rs/core/src/codex.rs#L1279),
we can see the specific order of these instructions.
Ignore mouse events outside the transcript region so composer/footer
interactions do not start or mutate transcript selection state.
A left-click outside the transcript also cancels any active selection.
Selection changes schedule a redraw because mouse events don't
inherently trigger a frame.
Codex Unified Exec injects NO_COLOR=1 (and TERM=dumb) into shell tool
commands to keep output stable. Crossterm respects NO_COLOR and
suppresses ANSI escapes, which breaks our VT100-backed tests that assert
on parsed ANSI color output (they see vt100::Color::Default everywhere).
Force ANSI color output back on in the VT100 test backend by overriding
crossterm's memoized NO_COLOR setting in VT100Backend::new. This keeps
Unified Exec behavior unchanged while making the VT100 tests meaningful
and deterministic under Codex.
> [!WARNING]
> it's possible that this might be a race condition problem for this and
need to be solved a different way. Feel free to revert if it causes the
opposite problem for other tests that assume NOCOLOR is set. If it does
then we need to probably add some extra AGENTS.md lines for how to run
tests when using unified exec.
(this same change was made in tui, so it's probably safe).
### Summary
With codesigning on Mac, Windows and Linux, we should be able to safely
remove `features.rmcp_client` and `use_experimental_use_rmcp_client`
check from the codebase now.
## TUI2: Normalize Mouse Scroll Input Across Terminals (Wheel +
Trackpad)
This changes TUI2 scrolling to a stream-based model that normalizes
terminal scroll event density into consistent wheel behavior (default:
~3 transcript lines per physical wheel notch) while keeping trackpad
input higher fidelity via fractional accumulation.
Primary code: `codex-rs/tui2/src/tui/scrolling/mouse.rs`
Doc of record (model + probe-derived data):
`codex-rs/tui2/docs/scroll_input_model.md`
### Why
Terminals encode both mouse wheels and trackpads as discrete scroll
up/down events with direction but no magnitude, and they vary widely in
how many raw events they emit per physical wheel notch (commonly 1, 3,
or 9+). Timing alone doesn’t reliably distinguish wheel vs trackpad, so
cadence-based heuristics are unstable across terminals/hardware.
This PR treats scroll input as short *streams* separated by silence or
direction flips, normalizes raw event density into tick-equivalents,
coalesces redraws for dense streams, and exposes explicit config
overrides.
### What Changed
#### Scroll Model (TUI2)
- Stream detection
- Start a stream on the first scroll event.
- End a stream on an idle gap (`STREAM_GAP_MS`) or a direction flip.
- Normalization
- Convert raw events into tick-equivalents using per-terminal
`tui.scroll_events_per_tick`.
- Wheel-like vs trackpad-like behavior
- Wheel-like: fixed “classic” lines per wheel notch; flush immediately
for responsiveness.
- Trackpad-like: fractional accumulation + carry across stream
boundaries; coalesce flushes to ~60Hz to avoid floods and reduce “stop
lag / overshoot”.
- Trackpad divisor is intentionally capped: `min(scroll_events_per_tick,
3)` so terminals with dense wheel ticks (e.g. 9 events per notch) don’t
make trackpads feel artificially slow.
- Auto mode (default)
- Start conservatively as trackpad-like (avoid overshoot).
- Promote to wheel-like if the first tick-worth of events arrives
quickly.
- Fallback for 1-event-per-tick terminals (no tick-completion timing
signal).
#### Trackpad Acceleration
Some terminals produce relatively low vertical event density for
trackpad gestures, which makes large/faster swipes feel sluggish even
when small motions feel correct. To address that, trackpad-like streams
apply a bounded multiplier based on event count:
- `multiplier = clamp(1 + abs(events) / scroll_trackpad_accel_events,
1..scroll_trackpad_accel_max)`
The multiplier is applied to the trackpad stream’s computed line delta
(including carried fractional remainder). Defaults are conservative and
bounded.
#### Config Knobs (TUI2)
All keys live under `[tui]`:
- `scroll_wheel_lines`: lines per physical wheel notch (default: 3).
- `scroll_events_per_tick`: raw vertical scroll events per physical
wheel notch (terminal-specific default; fallback: 3).
- Wheel-like per-event contribution: `scroll_wheel_lines /
scroll_events_per_tick`.
- `scroll_trackpad_lines`: baseline trackpad sensitivity (default: 1).
- Trackpad-like per-event contribution: `scroll_trackpad_lines /
min(scroll_events_per_tick, 3)`.
- `scroll_trackpad_accel_events` / `scroll_trackpad_accel_max`: bounded
trackpad acceleration (defaults: 30 / 3).
- `scroll_mode = auto|wheel|trackpad`: force behavior or use the
heuristic (default: `auto`).
- `scroll_wheel_tick_detect_max_ms`: auto-mode promotion threshold (ms).
- `scroll_wheel_like_max_duration_ms`: auto-mode fallback for
1-event-per-tick terminals (ms).
- `scroll_invert`: invert scroll direction (applies to wheel +
trackpad).
Config docs: `docs/config.md` and field docs in
`codex-rs/core/src/config/types.rs`.
#### App Integration
- The app schedules follow-up ticks to close idle streams (via
`ScrollUpdate::next_tick_in` and `schedule_frame_in`) and finalizes
streams on draw ticks.
- `codex-rs/tui2/src/app.rs`
#### Docs
- Single doc of record describing the model + preserved probe
findings/spec:
- `codex-rs/tui2/docs/scroll_input_model.md`
#### Other (jj-only friendliness)
- `codex-rs/tui2/src/diff_render.rs`: prefer stable cwd-relative paths
when the file is under the cwd even if there’s no `.git`.
### Terminal Defaults
Per-terminal defaults are derived from scroll-probe logs (see doc).
Notable:
- Ghostty currently defaults to `scroll_events_per_tick = 3` even though
logs measured ~9 in one setup. This is a deliberate stopgap; if your
Ghostty build emits ~9 events per wheel notch, set:
```toml
[tui]
scroll_events_per_tick = 9
```
### Testing
- `just fmt`
- `just fix -p codex-core --allow-no-vcs`
- `cargo test -p codex-core --lib` (pass)
- `cargo test -p codex-tui2` (scroll tests pass; remaining failures are
known flaky VT100 color tests in `insert_history`)
### Review Focus
- Stream finalization + frame scheduling in `codex-rs/tui2/src/app.rs`.
- Auto-mode promotion thresholds and the 1-event-per-tick fallback
behavior.
- Trackpad divisor cap (`min(events_per_tick, 3)`) and acceleration
defaults.
- Ghostty default tradeoff (3 vs ~9) and whether we should change it.
`load_config_layers_state()` should load config from a
`.codex/config.toml` in any folder between the `cwd` for a thread and
the project root. Though in order to do that,
`load_config_layers_state()` needs to know what the `cwd` is, so this PR
does the work to thread the `cwd` through for existing callsites.
A notable exception is the `/config` endpoint in app server for which a
`cwd` is not guaranteed to be associated with the query, so the `cwd`
param is `Option<AbsolutePathBuf>` to account for this case.
The logic to make use of the `cwd` will be done in a follow-up PR.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
This adds support for `allowed_sandbox_modes` in `requirements.toml` and
provides legacy support for constraining sandbox modes in
`managed_config.toml`. This is converted to `Constrained<SandboxPolicy>`
in `ConfigRequirements` and applied to `Config` such that constraints
are enforced throughout the harness.
Note that, because `managed_config.toml` is deprecated, we do not add
support for the new `external-sandbox` variant recently introduced in
https://github.com/openai/codex/pull/8290. As noted, that variant is not
supported in `config.toml` today, but can be configured programmatically
via app server.
## Summary
- centralize file name derivation in codex-file-search
- reuse the helper in app-server fuzzy search to avoid duplicate logic
- add unit tests for file_name_from_path
## Testing
- cargo test -p codex-file-search
- cargo test -p codex-app-server
Problem
- Mouse wheel events were scheduling a redraw on every event, which
could backlog and create lag during fast scrolling.
Solution
- Schedule transcript scroll redraws with a short delay (16ms) so the
frame requester coalesces bursts into fewer draws.
Why
- Smooths rapid wheel scrolling while keeping the UI responsive.
Testing
- Manual: Scrolled in iTerm and Ghostty; no lag observed.
- `cargo clippy --fix --all-features --tests --allow-dirty
--allow-no-vcs -p codex-tui2`
This will make it easier to test for expected errors in unit tests since
we can compare based on the field values rather than the message (which
might change over time). See https://github.com/openai/codex/pull/8298
for an example.
It also ensures more consistency in the way a `ConstraintError` is
constructed.
Fixes#8214 by removing the '--staged' flag from the undo git restore
command. This ensures that while the working tree is reverted to the
snapshot state, the user's staged changes (index) are preserved,
preventing data loss. Also adds a regression test.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
We were assembling the skill roots in two different places, and the
admin root was missing in one of them. This change centralizes root
selection into a helper so both paths stay in sync.
## Description
Introduced `ExternalSandbox` policy to cover use case when sandbox
defined by outside environment, effectively it translates to
`SandboxMode#DangerFullAccess` for file system (since sandbox configured
on container level) and configurable `network_access` (either Restricted
or Enabled by outside environment).
as example you can configure `ExternalSandbox` policy as part of
`sendUserTurn` v1 app_server API:
```
{
"conversationId": <id>,
"cwd": <cwd>,
"approvalPolicy": "never",
"sandboxPolicy": {
"type": ""external-sandbox",
"network_access": "enabled"/"restricted"
},
"model": <model>,
"effort": <effort>,
....
}
```
https://github.com/openai/codex/pull/8235 introduced `ConfigBuilder` and
this PR updates all call non-test call sites to use it instead of
`Config::load_from_base_config_with_overrides()`.
This is important because `load_from_base_config_with_overrides()` uses
an empty `ConfigRequirements`, which is a reasonable default for testing
so the tests are not influenced by the settings on the host. This method
is now guarded by `#[cfg(test)]` so it cannot be used by business logic.
Because `ConfigBuilder::build()` is `async`, many of the test methods
had to be migrated to be `async`, as well. On the bright side, this made
it possible to eliminate a bunch of `block_on_future()` stuff.
Historically, `accept_elicitation_for_prompt_rule()` was flaky because
we were using a notification to update the sandbox followed by a `shell`
tool request that we expected to be subject to the new sandbox config,
but because [rmcp](https://crates.io/crates/rmcp) MCP servers delegate
each incoming message to a new Tokio task, messages are not guaranteed
to be processed in order, so sometimes the `shell` tool call would run
before the notification was processed.
Prior to this PR, we relied on a generous `sleep()` between the
notification and the request to reduce the change of the test flaking
out.
This PR implements a proper fix, which is to use a _request_ instead of
a notification for the sandbox update so that we can wait for the
response to the sandbox request before sending the request to the
`shell` tool call. Previously, `rmcp` did not support custom requests,
but I fixed that in
https://github.com/modelcontextprotocol/rust-sdk/pull/590, which made it
into the `0.12.0` release (see #8288).
This PR updates `shell-tool-mcp` to expect
`"codex/sandbox-state/update"` as a _request_ instead of a notification
and sends the appropriate ack. Note this behavior is tied to our custom
`codex/sandbox-state` capability, which Codex honors as an MCP client,
which is why `core/src/mcp_connection_manager.rs` had to be updated as
part of this PR, as well.
This PR also updates the docs at `shell-tool-mcp/README.md`.
This implements the new config design where config _requirements_ are
loaded separately (and with a special schema) as compared to config
_settings_. In particular, on UNIX, with this PR, you could define
`/etc/codex/requirements.toml` with:
```toml
allowed_approval_policies = ["never", "on-request"]
```
to enforce that `Config.approval_policy` must be one of those two values
when Codex runs.
We plan to expand the set of things that can be restricted by
`/etc/codex/requirements.toml` in short order.
Note that requirements can come from several sources:
- new MDM key on macOS (not implemented yet)
- `/etc/codex/requirements.toml`
- re-interpretation of legacy MDM key on macOS
(`com.openai.codex/config_toml_base64`)
- re-interpretation of legacy `/etc/codex/managed_config.toml`
So our resolution strategy is to load TOML data from those sources, in
order. Later TOMLs are "merged" into previous TOMLs, but any field that
is already set cannot be overwritten. See
`ConfigRequirementsToml::merge_unset_fields()`.
See snapshots for view of edge cases
This is still named `UnifiedExecSessions` for consistency across the
code but should be renamed to `BackgroundTerminals` in a follow-up
Example:
<img width="945" height="687" alt="Screenshot 2025-12-18 at 20 12 53"
src="https://github.com/user-attachments/assets/92f39ff2-243c-4006-b402-e3fa9e93c952"
/>
# Terminal Detection Metadata for Per-Terminal Scroll Scaling
## Summary
Expand terminal detection into structured metadata (`TerminalInfo`) with
multiplexer awareness, plus a testable environment shim and
characterization tests.
## Context / Motivation
- TUI2 owns its viewport and scrolling model (see
`codex-rs/tui2/docs/tui_viewport_and_history.md`), so scroll behavior
must be consistent across terminals and independent of terminal
scrollback quirks.
- Prior investigations show mouse wheel scroll deltas vary noticeably by
terminal. To tune scroll scaling (line increments per wheel tick) we
need reliable terminal identification, including when running inside
tmux/zellij.
- tmux is especially tricky because it can mask the underlying terminal;
we now consult `tmux display-message` client termtype/name to attribute
sessions to the actual terminal rather than tmux itself.
- This remains backwards compatible with the existing OpenTelemetry
user-agent token because `user_agent()` is still derived from the same
environment signals (now via `TerminalInfo`).
## Changes
- Introduce `TerminalInfo`, `TerminalName`, and `Multiplexer` with
`TERM_PROGRAM`/`TERM`/multiplexer detection and user-agent formatting in
`codex-rs/core/src/terminal.rs`.
- Add an injectable `Environment` trait + `FakeEnvironment` for testing,
and comprehensive characterization tests covering known terminals, tmux
client termtype/name, and zellij.
- Document module usage and detection order; update `terminal_info()` to
be the primary interface for callers.
## Testing
- `cargo test -p codex-core terminal::tests`
- manually checked ghostty, iTerm2, Terminal.app, vscode, tmux, zellij,
Warp, alacritty, kitty.
```
2025-12-18T07:07:49.191421Z INFO Detected terminal info terminal=TerminalInfo { name: Iterm2, term_program: Some("iTerm.app"), version: Some("3.6.6"), term: None, multiplexer: None }
2025-12-18T07:07:57.991776Z INFO Detected terminal info terminal=TerminalInfo { name: AppleTerminal, term_program: Some("Apple_Terminal"), version: Some("455.1"), term: None, multiplexer: None }
2025-12-18T07:08:07.732095Z INFO Detected terminal info terminal=TerminalInfo { name: WarpTerminal, term_program: Some("WarpTerminal"), version: Some("v0.2025.12.10.08.12.stable_03"), term: None, multiplexer: None }
2025-12-18T07:08:24.860316Z INFO Detected terminal info terminal=TerminalInfo { name: Kitty, term_program: None, version: None, term: None, multiplexer: None }
2025-12-18T07:08:38.302761Z INFO Detected terminal info terminal=TerminalInfo { name: Alacritty, term_program: None, version: None, term: None, multiplexer: None }
2025-12-18T07:08:50.887748Z INFO Detected terminal info terminal=TerminalInfo { name: VsCode, term_program: Some("vscode"), version: Some("1.107.1"), term: None, multiplexer: None }
2025-12-18T07:10:01.309802Z INFO Detected terminal info terminal=TerminalInfo { name: WezTerm, term_program: Some("WezTerm"), version: Some("20240203-110809-5046fc22"), term: None, multiplexer: None }
2025-12-18T08:05:17.009271Z INFO Detected terminal info terminal=TerminalInfo { name: Ghostty, term_program: Some("ghostty"), version: Some("1.2.3"), term: None, multiplexer: None }
2025-12-18T08:05:23.819973Z INFO Detected terminal info terminal=TerminalInfo { name: Ghostty, term_program: Some("ghostty"), version: Some("1.2.3"), term: Some("xterm-ghostty"), multiplexer: Some(Tmux { version: Some("3.6a") }) }
2025-12-18T08:05:35.572853Z INFO Detected terminal info terminal=TerminalInfo { name: Ghostty, term_program: Some("ghostty"), version: Some("1.2.3"), term: None, multiplexer: Some(Zellij) }
```
## Notes / Follow-ups
- Next step is to wire `TerminalInfo` into TUI2’s scroll scaling
configuration and add a per-terminal tuning table.
- The log output in TUI2 helps validate real-world detection before
applying behavior changes.
when granting read access to the sandbox user, grant the
codex/command-runner exe directory first so commands can run before the
entire read ACL process is finished.
Add a dmg target that bundles the codex and codex responses api proxy
binaries for MacOS. this target is signed and notarized.
Verified by triggering a build here:
https://github.com/openai/codex/actions/runs/20318136302/job/58367155205.
Downloaded the artifact and verified that the dmg is signed and
notarized, and the codex binary contained works as expected.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
This is a significant change to how layers of configuration are applied.
In particular, the `ConfigLayerStack` now has two important fields:
- `layers: Vec<ConfigLayerEntry>`
- `requirements: ConfigRequirements`
We merge `TomlValue`s across the layers, but they are subject to
`ConfigRequirements` before creating a `Config`.
How I would review this PR:
- start with `codex-rs/app-server-protocol/src/protocol/v2.rs` and note
the new variants added to the `ConfigLayerSource` enum:
`LegacyManagedConfigTomlFromFile` and `LegacyManagedConfigTomlFromMdm`
- note that `ConfigLayerSource` now has a `precedence()` method and
implements `PartialOrd`
- `codex-rs/core/src/config_loader/layer_io.rs` is responsible for
loading "admin" preferences from `/etc/codex/managed_config.toml` and
MDM. Because `/etc/codex/managed_config.toml` is now deprecated in favor
of `/etc/codex/requirements.toml` and `/etc/codex/config.toml`, we now
include some extra information on the `LoadedConfigLayers` returned in
`layer_io.rs`.
- `codex-rs/core/src/config_loader/mod.rs` has major changes to
`load_config_layers_state()`, which is what produces `ConfigLayerStack`.
The docstring has the new specification and describes the various layers
that will be loaded and the precedence order.
- It uses the information from `LoaderOverrides` "twice," both in the
spirit of legacy support:
- We use one instances to derive an instance of `ConfigRequirements`.
Currently, the only field in `managed_config.toml` that contributes to
`ConfigRequirements` is `approval_policy`. This PR introduces
`Constrained::allow_only()` to support this.
- We use a clone of `LoaderOverrides` to derive
`ConfigLayerSource::LegacyManagedConfigTomlFromFile` and
`ConfigLayerSource::LegacyManagedConfigTomlFromMdm` layers, as
appropriate. As before, this ends up being a "best effort" at enterprise
controls, but is enforcement is not guaranteed like it is for
`ConfigRequirements`.
- Now we only create a "user" layer if `$CODEX_HOME/config.toml` exists.
(Previously, a user layer was always created for `ConfigLayerStack`.)
- Similarly, we only add a "session flags" layer if there are CLI
overrides.
- `config_loader/state.rs` contains the updated implementation for
`ConfigLayerStack`. Note the public API is largely the same as before,
but the implementation is quite different. We leverage the fact that
`ConfigLayerSource` is now `PartialOrd` to ensure layers are in the
correct order.
- A `Config` constructed via `ConfigBuilder.build()` will use
`load_config_layers_state()` to create the `ConfigLayerStack` and use
the associated `ConfigRequirements` when constructing the `Config`
object.
- That said, a `Config` constructed via
`Config::load_from_base_config_with_overrides()` does _not_ yet use
`ConfigBuilder`, so it creates a `ConfigRequirements::default()` instead
of loading a proper `ConfigRequirements`. I will fix this in a
subsequent PR.
Then the following files are mostly test changes:
```
codex-rs/app-server/tests/suite/v2/config_rpc.rs
codex-rs/core/src/config/service.rs
codex-rs/core/src/config_loader/tests.rs
```
Again, because we do not always include "user" and "session flags"
layers when the contents are empty, `ConfigLayerStack` sometimes has
fewer layers than before (and the precedence order changed slightly),
which is the main reason integration tests changed.
This pull request makes a small update to the session picker
documentation for `codex resume`. The main change clarifies how to view
the original working directory (CWD) for sessions and when the Git
branch is shown.
- The session picker now displays the recorded Git branch when
available, and instructions are added for showing the original working
directory by using the `--all` flag, which also disables CWD filtering
and adds a `CWD` column.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
This pull request updates the ChatGPT login description in the
onboarding authentication widgets to clarify which plans include usage.
The description now lists "Business" rather than "Team" and adds
"Education" plans in addition to the previously mentioned plans.
I have read the CLA Document and I hereby sign the CLAs.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Introduce `ConfigBuilder` as an alternative to our existing `Config`
constructors.
I noticed that the existing constructors,
`Config::load_with_cli_overrides()` and
`Config::load_with_cli_overrides_and_harness_overrides()`, did not take
`codex_home` as a parameter, which can be a problem.
Historically, when Codex was purely a CLI, we wanted to be extra sure
that the creation of `codex_home` was always done via
`find_codex_home()`, so we did not expose `codex_home` as a parameter
when creating `Config` in business logic. But in integration tests,
`codex_home` nearly always needs to be configured (as a temp directory),
which is why callers would have to go through
`Config::load_from_base_config_with_overrides()` instead.
Now that the Codex harness also functions as an app server, which could
conceivably load multiple threads where `codex_home` is parameterized
differently in each one, I think it makes sense to make this
configurable. Going to a builder pattern makes it more flexible to
ensure an arbitrary permutation of options can be set when constructing
a `Config` while using the appropriate defaults for the options that
aren't set explicitly.
Ultimately, I think this should make it possible for us to make
`Config::load_from_base_config_with_overrides()` private because all
integration tests should be able to leverage `ConfigBuilder` instead.
Though there could be edge cases, so I'll pursue that migration after we
get through the current config overhaul.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8235).
* #8237
* __->__ #8235
1. Remove PUBLIC skills and introduce SYSTEM skills embedded in the
binary and installed into $CODEX_HOME/skills/.system at startup.
2. Skills are now always enabled (feature flag removed).
3. Update skills/list to accept forceReload and plumb it through (not
used by clients yet).
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
2025-12-18 02:03:40 +00:00
496 changed files with 34817 additions and 12350 deletions
- Prefer deep equals comparisons whenever possible. Perform `assert_eq!()` on entire objects, rather than individual fields.
- Avoid mutating process environment in tests; prefer passing environment-derived flags or dependencies from above.
### Spawning workspace binaries in tests (Cargo vs Buck2)
- Prefer `codex_utils_cargo_bin::cargo_bin("...")` over `assert_cmd::Command::cargo_bin(...)` or `escargot` when tests need to spawn first-party binaries.
- Under Buck2, `CARGO_BIN_EXE_*` may be project-relative (e.g. `buck-out/...`), which breaks if a test changes its working directory. `codex_utils_cargo_bin::cargo_bin` resolves to an absolute path first.
- When locating fixture files under Buck2, avoid `env!("CARGO_MANIFEST_DIR")` (Buck codegen sets it to `"."`). Prefer deriving paths from `codex_utils_cargo_bin::buck_project_root()` when needed.
### Integration tests (core)
- Prefer the utilities in `core_test_support::responses` when writing end-to-end Codex tests.
<p align="center"><code>npm i -g @openai/codex</code><br />or <code>brew install --cask codex</code></p>
<p align="center"><strong>Codex CLI</strong> is a coding agent from OpenAI that runs locally on your computer.
</br>
</br>If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href="https://developers.openai.com/codex/ide">install in your IDE</a>
</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href="https://chatgpt.com/codex">chatgpt.com/codex</a></p>
If you want Codex in your code editor (VS Code, Cursor, Windsurf), <a href="https://developers.openai.com/codex/ide">install in your IDE.</a>
</br>If you are looking for the <em>cloud-based agent</em> from OpenAI, <strong>Codex Web</strong>, go to <a href="https://chatgpt.com/codex">chatgpt.com/codex</a>.</p>
---
@@ -15,25 +13,19 @@
### Installing and running Codex CLI
Install globally with your preferred package manager. If you use npm:
Install globally with your preferred package manager:
```shell
# Install using npm
npm install -g @openai/codex
```
Alternatively, if you use Homebrew:
```shell
# Install using Homebrew
brew install --cask codex
```
Then simply run `codex` to get started:
```shell
codex
```
If you're running into upgrade issues with Homebrew, see the [FAQ entry on brew upgrade codex](./docs/faq.md#brew-upgrade-codex-isnt-upgrading-me).
Then simply run `codex` to get started.
<details>
<summary>You can also go to the <a href="https://github.com/openai/codex/releases/latest">latest GitHub Release</a> and download the appropriate binary for your platform.</summary>
@@ -53,60 +45,15 @@ Each archive contains a single entry with the platform baked into the name (e.g.
Run `codex` and select **Sign in with ChatGPT**. We recommend signing into your ChatGPT account to use Codex as part of your Plus, Pro, Team, Edu, or Enterprise plan. [Learn more about what's included in your ChatGPT plan](https://help.openai.com/en/articles/11369540-codex-in-chatgpt).
You can also use Codex with an API key, but this requires [additional setup](./docs/authentication.md#usage-based-billing-alternative-use-an-openai-api-key). If you previously used an API key for usage-based billing, see the [migration steps](./docs/authentication.md#migrating-from-usage-based-billing-api-key). If you're having trouble with login, please comment on [this issue](https://github.com/openai/codex/issues/1243).
You can also use Codex with an API key, but this requires [additional setup](https://developers.openai.com/codex/auth#sign-in-with-an-api-key).
### Model Context Protocol (MCP)
## Docs
Codex can access MCP servers. To configure them, refer to the [config docs](./docs/config.md#mcp_servers).
### Configuration
Codex CLI supports a rich set of configuration options, with preferences stored in `~/.codex/config.toml`. For full configuration options, see [Configuration](./docs/config.md).
### Execpolicy
See the [Execpolicy quickstart](./docs/execpolicy.md) to set up rules that govern what commands Codex can execute.
@@ -15,8 +15,8 @@ You can also install via Homebrew (`brew install --cask codex`) or download a pl
## Documentation quickstart
- First run with Codex? Follow the walkthrough in [`docs/getting-started.md`](../docs/getting-started.md) for prompts, keyboard shortcuts, and session management.
-Already shipping with Codex and want deeper control? Jump to [`docs/advanced.md`](../docs/advanced.md) and the configuration reference at [`docs/config.md`](../docs/config.md).
- First run with Codex? Start with [`docs/getting-started.md`](../docs/getting-started.md) (links to the walkthrough for prompts, keyboard shortcuts, and session management).
-Want deeper control? See [`docs/config.md`](../docs/config.md) and [`docs/install.md`](../docs/install.md).
## What's new in the Rust CLI
@@ -30,7 +30,7 @@ Codex supports a rich set of configuration options. Note that the Rust CLI uses
#### MCP client
Codex CLI functions as an MCP client that allows the Codex CLI and IDE extension to connect to MCP servers on startup. See the [`configuration documentation`](../docs/config.md#mcp_servers) for details.
Codex CLI functions as an MCP client that allows the Codex CLI and IDE extension to connect to MCP servers on startup. See the [`configuration documentation`](../docs/config.md#connecting-to-mcp-servers) for details.
@@ -72,12 +74,13 @@ Example (from OpenAI's official VSCode extension):
-`thread/resume` — reopen an existing thread by id so subsequent `turn/start` calls append to it.
-`thread/list` — page through stored rollouts; supports cursor-based pagination and optional `modelProviders` filtering.
-`thread/archive` — move a thread’s rollout file into the archived directory; returns `{}` on success.
-`thread/rollback` — drop the last N turns from the agent’s in-memory context and persist a rollback marker in the rollout so future resumes see the pruned history; returns the updated `thread` (with `turns` populated) on success.
-`turn/start` — add user input to a thread and begin Codex generation; responds with the initial `turn` object and streams `turn/started`, `item/*`, and `turn/completed` notifications.
-`turn/interrupt` — request cancellation of an in-flight turn by `(thread_id, turn_id)`; success is an empty `{}` response and the turn finishes with `status: "interrupted"`.
-`review/start` — kick off Codex’s automated reviewer for a thread; responds like `turn/start` and emits `item/started`/`item/completed` notifications with `enteredReviewMode` and `exitedReviewMode` items, plus a final assistant `agentMessage` containing the review.
-`command/exec` — run a single command under the server sandbox without starting a thread/turn (handy for utilities and validation).
-`model/list` — list available models (with reasoning effort options).
-`skills/list` — list skills for one or more `cwd` values.
-`skills/list` — list skills for one or more `cwd` values (optional `forceReload`).
-`mcpServer/oauth/login` — start an OAuth login for a configured MCP server; returns an `authorization_url` and later emits `mcpServer/oauthLogin/completed` once the browser flow finishes.
-`mcpServerStatus/list` — enumerate configured MCP servers with their tools, resources, resource templates, and auth status; supports cursor+limit pagination.
-`feedback/upload` — submit a feedback report (classification + optional reason/logs and conversation_id); returns the tracking thread id.
@@ -162,7 +165,7 @@ Turns attach user input (text or images) to a thread and trigger Codex generatio
You can optionally specify config overrides on the new turn. If specified, these settings become the default for subsequent turns on the same thread.
You can optionally specify config overrides on the new turn. If specified, these settings become the default for subsequent turns on the same thread.`outputSchema` applies only to the current turn.
```json
{"method":"turn/start","id":30,"params":{
@@ -172,13 +175,20 @@ You can optionally specify config overrides on the new turn. If specified, these
"cwd":"/Users/me/project",
"approvalPolicy":"unlessTrusted",
"sandboxPolicy":{
"mode":"workspaceWrite",
"type":"workspaceWrite",
"writableRoots":["/Users/me/project"],
"networkAccess":true
},
"model":"gpt-5.1-codex",
"effort":"medium",
"summary":"concise"
"summary":"concise",
// Optional JSON Schema to constrain the final assistant message for this turn.
"outputSchema":{
"type":"object",
"properties":{"answer":{"type":"string"}},
"required":["answer"],
"additionalProperties":false
}
}}
{"id":30,"result":{"turn":{
"id":"turn_456",
@@ -188,6 +198,25 @@ You can optionally specify config overrides on the new turn. If specified, these
}}}
```
### Example: Start a turn (invoke a skill)
Invoke a skill by sending a text input that begins with `$<skill-name>`.
```json
{"method":"turn/start","id":33,"params":{
"threadId":"thr_123",
"input":[
{"type":"text","text":"$skill-creator Add a new skill for triaging flaky CI and include step-by-step usage."}
]
}}
{"id":33,"result":{"turn":{
"id":"turn_457",
"status":"inProgress",
"items":[],
"error":null
}}}
```
### Example: Interrupt an active turn
You can cancel a running Turn with `turn/interrupt`.
@@ -285,10 +314,12 @@ Run a standalone command (argv vector) in the server’s sandbox without creatin
- For clients that are already sandboxed externally, set `sandboxPolicy` to `{"type":"externalSandbox","networkAccess":"enabled"}` (or omit `networkAccess` to keep it restricted). Codex will not enforce its own sandbox in this mode; it tells the model it has full file-system access and passes the `networkAccess` state through `environment_context`.
Notes:
- Empty `command` arrays are rejected.
-`sandboxPolicy` accepts the same shape used by `turn/start` (e.g., `dangerFullAccess`, `readOnly`, `workspaceWrite` with flags).
-`sandboxPolicy` accepts the same shape used by `turn/start` (e.g., `dangerFullAccess`, `readOnly`, `workspaceWrite` with flags, `externalSandbox` with `networkAccess``restricted|enabled`).
- When omitted, `timeoutMs` falls back to the server default.
## Events
@@ -300,7 +331,7 @@ Event notifications are the server-initiated event stream for thread lifecycles,
The app-server streams JSON-RPC notifications while a turn is running. Each turn starts with `turn/started` (initial `turn`) and ends with `turn/completed` (final `turn` status). Token usage events stream separately via `thread/tokenUsage/updated`. Clients subscribe to the events they care about, rendering each item incrementally as updates arrive. The per-item lifecycle is always: `item/started` → zero or more item-specific deltas → `item/completed`.
-`turn/started` — `{ turn }` with the turn id, empty `items`, and `status: "inProgress"`.
-`turn/completed` — `{ turn }` where `turn.status` is `completed`, `interrupted`, or `failed`; failures carry `{ error: { message, codexErrorInfo? } }`.
-`turn/completed` — `{ turn }` where `turn.status` is `completed`, `interrupted`, or `failed`; failures carry `{ error: { message, codexErrorInfo?, additionalDetails? } }`.
-`turn/diff/updated` — `{ threadId, turnId, diff }` represents the up-to-date snapshot of the turn-level unified diff, emitted after every FileChange item. `diff` is the latest aggregated unified diff across every file change in the turn. UIs can render this to show the full "what changed" view without stitching individual `fileChange` items.
-`turn/plan/updated` — `{ turnId, explanation?, plan }` whenever the agent shares or changes its plan; each `plan` entry is `{ step, status }` with `status` in `pending`, `inProgress`, or `completed`.
@@ -350,7 +381,7 @@ There are additional item-specific events:
### Errors
`error` event is emitted whenever the server hits an error mid-turn (for example, upstream model errors or quota limits). Carries the same `{ error: { message, codexErrorInfo? } }` payload as `turn.status: "failed"` and may precede that terminal notification.
`error` event is emitted whenever the server hits an error mid-turn (for example, upstream model errors or quota limits). Carries the same `{ error: { message, codexErrorInfo?, additionalDetails? } }` payload as `turn.status: "failed"` and may precede that terminal notification.
`codexErrorInfo` maps to the `CodexErrorInfo` enum. Common values:
@@ -395,6 +426,30 @@ Order of messages:
UI guidance for IDEs: surface an approval dialog as soon as the request arrives. The turn will proceed after the server receives a response to the approval request. The terminal `item/completed` notification will be sent with the appropriate status.
## Skills
Skills are invoked by sending a text input that starts with `$<skill-name>`. The rest of the text is passed to the skill as its input.
Example:
```
$skill-creator Add a new skill for triaging flaky CI and include step-by-step usage.
```
Use `skills/list` to fetch the available skills (optionally scoped by `cwd` and/or with `forceReload`).
```json
{"method":"skills/list","id":25,"params":{
"cwd":"/Users/me/project",
"forceReload":false
}}
{"id":25,"result":{
"skills":[
{"name":"skill-creator","description":"Create or update a Codex skill"}
]
}}
```
## Auth endpoints
The JSON-RPC auth/account surface exposes request/response methods plus server-initiated notifications (no `id`). Use these to determine auth state, start or cancel logins, logout, and inspect ChatGPT rate limits.
"OAuth login is only supported when [features].rmcp_client is true in config.toml. See https://github.com/openai/codex/blob/main/docs/config.md#feature-flags for details."
You are Codex, based on GPT-5. You are running as a coding agent in the Codex CLI on a user's computer.
## General
- When searching for text or files, prefer using `rg` or `rg --files` respectively because `rg` is much faster than alternatives like `grep`. (If the `rg` command is not found, then use alternatives.)
## Editing constraints
- Default to ASCII when editing or creating files. Only introduce non-ASCII or other Unicode characters when there is a clear justification and the file already uses them.
- Add succinct code comments that explain what is going on if code is not self-explanatory. You should not add comments like "Assigns the value to the variable", but a brief comment might be useful ahead of a complex code block that the user would otherwise have to spend time parsing out. Usage of these comments should be rare.
- Try to use apply_patch for single file edits, but it is fine to explore other options to make the edit if it does not work well. Do not use apply_patch for changes that are auto-generated (i.e. generating package.json or running a lint or format command like gofmt) or when scripting is more efficient (such as search and replacing a string across a codebase).
- You may be in a dirty git worktree.
* NEVER revert existing changes you did not make unless explicitly requested, since these changes were made by the user.
* If asked to make a commit or code edits and there are unrelated changes to your work or changes that you didn't make in those files, don't revert those changes.
* If the changes are in files you've touched recently, you should read carefully and understand how you can work with the changes rather than reverting them.
* If the changes are in unrelated files, just ignore them and don't revert them.
- Do not amend a commit unless explicitly requested to do so.
- While you are working, you might notice unexpected changes that you didn't make. If this happens, STOP IMMEDIATELY and ask the user how they would like to proceed.
- **NEVER** use destructive commands like `git reset --hard` or `git checkout --` unless specifically requested or approved by the user.
## Plan tool
When using the planning tool:
- Skip using the planning tool for straightforward tasks (roughly the easiest 25%).
- Do not make single-step plans.
- When you made a plan, update it after having performed one of the sub-tasks that you shared on the plan.
## Codex CLI harness, sandboxing, and approvals
The Codex CLI harness supports several different configurations for sandboxing and escalation approvals that the user can choose from.
Filesystem sandboxing defines which files can be read or written. The options for `sandbox_mode` are:
- **read-only**: The sandbox only permits reading files.
- **workspace-write**: The sandbox permits reading files, and editing files in `cwd` and `writable_roots`. Editing files in other directories requires approval.
- **danger-full-access**: No filesystem sandboxing - all commands are permitted.
Network sandboxing defines whether network can be accessed without approval. Options for `network_access` are:
- **restricted**: Requires approval
- **enabled**: No approval needed
Approvals are your mechanism to get user consent to run shell commands without the sandbox. Possible configuration options for `approval_policy` are
- **untrusted**: The harness will escalate most commands for user approval, apart from a limited allowlist of safe "read" commands.
- **on-failure**: The harness will allow all commands to run in the sandbox (if enabled), and failures will be escalated to the user for approval to run again without the sandbox.
- **on-request**: Commands will be run in the sandbox by default, and you can specify in your tool call if you want to escalate a command to run without sandboxing. (Note that this mode is not always available. If it is, you'll see parameters for it in the `shell` command description.)
- **never**: This is a non-interactive mode where you may NEVER ask the user for approval to run commands. Instead, you must always persist and work around constraints to solve the task for the user. You MUST do your utmost best to finish the task and validate your work before yielding. If this mode is paired with `danger-full-access`, take advantage of it to deliver the best outcome for the user. Further, in this mode, your default testing philosophy is overridden: Even if you don't see local patterns for testing, you may add tests and scripts to validate your work. Just remove them before yielding.
When you are running with `approval_policy == on-request`, and sandboxing enabled, here are scenarios where you'll need to request approval:
- You need to run a command that writes to a directory that requires it (e.g. running tests that write to /var)
- You need to run a GUI app (e.g., open/xdg-open/osascript) to open browsers or files.
- You are running sandboxed and need to run a command that requires network access (e.g. installing packages)
- If you run a command that is important to solving the user's query, but it fails because of sandboxing, rerun the command with approval. ALWAYS proceed to use the `sandbox_permissions` and `justification` parameters - do not message the user before requesting approval for the command.
- You are about to take a potentially destructive action such as an `rm` or `git reset` that the user did not explicitly ask for
- (for all of these, you should weigh alternative paths that do not require approval)
When `sandbox_mode` is set to read-only, you'll need to request approval for any command that isn't a read.
You will be told what filesystem sandboxing, network sandboxing, and approval mode are active in a developer or user message. If you are not told about this, assume that you are running with workspace-write, network sandboxing enabled, and approval on-failure.
Although they introduce friction to the user because your work is paused until the user responds, you should leverage them when necessary to accomplish important work. If the completing the task requires escalated permissions, Do not let these settings or the sandbox deter you from attempting to accomplish the user's task unless it is set to "never", in which case never ask for approvals.
When requesting approval to execute a command that will require escalated privileges:
- Provide the `sandbox_permissions` parameter with the value `"require_escalated"`
- Include a short, 1 sentence explanation for why you need escalated permissions in the justification parameter
## Special user requests
- If the user makes a simple request (such as asking for the time) which you can fulfill by running a terminal command (such as `date`), you should do so.
- If the user asks for a "review", default to a code review mindset: prioritise identifying bugs, risks, behavioural regressions, and missing tests. Findings must be the primary focus of the response - keep summaries or overviews brief and only after enumerating the issues. Present findings first (ordered by severity with file/line references), follow with open questions or assumptions, and offer a change-summary only as a secondary detail. If no findings are discovered, state that explicitly and mention any residual risks or testing gaps.
## Frontend tasks
When doing frontend design tasks, avoid collapsing into "AI slop" or safe, average-looking layouts.
Aim for interfaces that feel intentional, bold, and a bit surprising.
- Typography: Use expressive, purposeful fonts and avoid default stacks (Inter, Roboto, Arial, system).
- Color & Look: Choose a clear visual direction; define CSS variables; avoid purple-on-white defaults. No purple bias or dark mode bias.
- Motion: Use a few meaningful animations (page-load, staggered reveals) instead of generic micro-motions.
- Background: Don't rely on flat, single-color backgrounds; use gradients, shapes, or subtle patterns to build atmosphere.
- Overall: Avoid boilerplate layouts and interchangeable UI patterns. Vary themes, type families, and visual languages across outputs.
- Ensure the page loads properly on both desktop and mobile
Exception: If working within an existing website or design system, preserve the established patterns, structure, and visual language.
## Presenting your work and final message
You are producing plain text that will later be styled by the CLI. Follow these rules exactly. Formatting should make results easy to scan, but not feel mechanical. Use judgment to decide how much structure adds value.
- Default: be very concise; friendly coding teammate tone.
- Ask only when needed; suggest ideas; mirror the user's style.
- For substantial work, summarize clearly; follow final‑answer formatting.
- Skip heavy formatting for simple confirmations.
- Don't dump large files you've written; reference paths only.
- No "save/copy this file" - User is on the same machine.
- Offer logical next steps (tests, commits, build) briefly; add verify steps if you couldn't do something.
- For code changes:
* Lead with a quick explanation of the change, and then give more details on the context covering where and why a change was made. Do not start this explanation with "summary", just jump right in.
* If there are natural next steps the user may want to take, suggest them at the end of your response. Do not make suggestions if there are no natural next steps.
* When suggesting multiple options, use numeric lists for the suggestions so the user can quickly respond with a single number.
- The user does not command execution outputs. When asked to show the output of a command (e.g. `git show`), relay the important details in your answer or summarize the key lines so the user understands the result.
### Final answer structure and style guidelines
- Plain text; CLI handles styling. Use structure only when it helps scanability.
- Headers: optional; short Title Case (1-3 words) wrapped in **…**; no blank line before the first bullet; add only if they truly help.
- Bullets: use - ; merge related points; keep to one line when possible; 4–6 per list ordered by importance; keep phrasing consistent.
- Monospace: backticks for commands/paths/env vars/code ids and inline examples; use for literal keyword bullets; never combine with **.
- Code samples or multi-line snippets should be wrapped in fenced code blocks; include an info string as often as possible.
- Structure: group related bullets; order sections general → specific → supporting; for subsections, start with a bolded keyword bullet, then items; match complexity to the task.
- Tone: collaborative, concise, factual; present tense, active voice; self‑contained; no "above/below"; parallel wording.
- Don'ts: no nested bullets/hierarchies; no ANSI codes; don't cram unrelated keywords; keep keyword lists short—wrap/reformat if long; avoid naming formatting styles in answers.
- Adaptation: code explanations → precise, structured with code refs; simple tasks → lead with outcome; big changes → logical walkthrough + rationale + next actions; casual one-offs → plain sentences, no headers/bullets.
- File References: When referencing files in your response follow the below rules:
* Use inline code to make file paths clickable.
* Each reference should have a stand alone path. Even if it's the same file.
* Accepted: absolute, workspace‑relative, a/ or b/ diff prefixes, or bare filename/suffix.
* Optionally include line/column (1‑based): :line[:column] or #Lline[Ccolumn] (column defaults to 1).
* Do not use URIs like file://, vscode://, or https://.
You are a coding agent running in the Codex CLI, a terminal-based coding assistant. Codex CLI is an open source project led by OpenAI. You are expected to be precise, safe, and helpful.
Your capabilities:
- Receive user prompts and other context provided by the harness, such as files in the workspace.
- Communicate with the user by streaming thinking & responses, and by making & updating plans.
- Emit function calls to run terminal commands and apply patches. Depending on how this specific run is configured, you can request that these function calls be escalated to the user for approval before running. More on this in the "Sandbox and approvals" section.
Within this context, Codex refers to the open-source agentic coding interface (not the old Codex language model built by OpenAI).
# How you work
## Personality
Your default personality and tone is concise, direct, and friendly. You communicate efficiently, always keeping the user clearly informed about ongoing actions without unnecessary detail. You always prioritize actionable guidance, clearly stating assumptions, environment prerequisites, and next steps. Unless explicitly asked, you avoid excessively verbose explanations about your work.
# AGENTS.md spec
- Repos often contain AGENTS.md files. These files can appear anywhere within the repository.
- These files are a way for humans to give you (the agent) instructions or tips for working within the container.
- Some examples might be: coding conventions, info about how code is organized, or instructions for how to run or test code.
- Instructions in AGENTS.md files:
- The scope of an AGENTS.md file is the entire directory tree rooted at the folder that contains it.
- For every file you touch in the final patch, you must obey instructions in any AGENTS.md file whose scope includes that file.
- Instructions about code style, structure, naming, etc. apply only to code within the AGENTS.md file's scope, unless the file states otherwise.
- More-deeply-nested AGENTS.md files take precedence in the case of conflicting instructions.
- Direct system/developer/user instructions (as part of a prompt) take precedence over AGENTS.md instructions.
- The contents of the AGENTS.md file at the root of the repo and any directories from the CWD up to the root are included with the developer message and don't need to be re-read. When working in a subdirectory of CWD, or a directory outside the CWD, check for any AGENTS.md files that may be applicable.
## Responsiveness
### Preamble messages
Before making tool calls, send a brief preamble to the user explaining what you’re about to do. When sending preamble messages, follow these principles and examples:
- **Logically group related actions**: if you’re about to run several related commands, describe them together in one preamble rather than sending a separate note for each.
- **Keep it concise**: be no more than 1-2 sentences, focused on immediate, tangible next steps. (8–12 words for quick updates).
- **Build on prior context**: if this is not your first tool call, use the preamble message to connect the dots with what’s been done so far and create a sense of momentum and clarity for the user to understand your next actions.
- **Keep your tone light, friendly and curious**: add small touches of personality in preambles feel collaborative and engaging.
- **Exception**: Avoid adding a preamble for every trivial read (e.g., `cat` a single file) unless it’s part of a larger grouped action.
**Examples:**
- “I’ve explored the repo; now checking the API route definitions.”
- “Next, I’ll patch the config and update the related tests.”
- “I’m about to scaffold the CLI commands and helper functions.”
- “Ok cool, so I’ve wrapped my head around the repo. Now digging into the API routes.”
- “Config’s looking tidy. Next up is patching helpers to keep things in sync.”
- “Finished poking at the DB gateway. I will now chase down error handling.”
- “Alright, build pipeline order is interesting. Checking how it reports failures.”
- “Spotted a clever caching util; now hunting where it gets used.”
## Planning
You have access to an `update_plan` tool which tracks steps and progress and renders them to the user. Using the tool helps demonstrate that you've understood the task and convey how you're approaching it. Plans can help to make complex, ambiguous, or multi-phase work clearer and more collaborative for the user. A good plan should break the task into meaningful, logically ordered steps that are easy to verify as you go.
Note that plans are not for padding out simple work with filler steps or stating the obvious. The content of your plan should not involve doing anything that you aren't capable of doing (i.e. don't try to test things that you can't test). Do not use plans for simple or single-step queries that you can just do or answer immediately.
Do not repeat the full contents of the plan after an `update_plan` call — the harness already displays it. Instead, summarize the change made and highlight any important context or next step.
Before running a command, consider whether or not you have completed the previous step, and make sure to mark it as completed before moving on to the next step. It may be the case that you complete all steps in your plan after a single pass of implementation. If this is the case, you can simply mark all the planned steps as completed. Sometimes, you may need to change plans in the middle of a task: call `update_plan` with the updated plan and make sure to provide an `explanation` of the rationale when doing so.
Use a plan when:
- The task is non-trivial and will require multiple actions over a long time horizon.
- There are logical phases or dependencies where sequencing matters.
- The work has ambiguity that benefits from outlining high-level goals.
- You want intermediate checkpoints for feedback and validation.
- When the user asked you to do more than one thing in a single prompt
- The user has asked you to use the plan tool (aka "TODOs")
- You generate additional steps while working, and plan to do them before yielding to the user
### Examples
**High-quality plans**
Example 1:
1. Add CLI entry with file args
2. Parse Markdown via CommonMark library
3. Apply semantic HTML template
4. Handle code blocks, images, links
5. Add error handling for invalid files
Example 2:
1. Define CSS variables for colors
2. Add toggle with localStorage state
3. Refactor components to use variables
4. Verify all views for readability
5. Add smooth theme-change transition
Example 3:
1. Set up Node.js + WebSocket server
2. Add join/leave broadcast events
3. Implement messaging with timestamps
4. Add usernames + mention highlighting
5. Persist messages in lightweight DB
6. Add typing indicators + unread count
**Low-quality plans**
Example 1:
1. Create CLI tool
2. Add Markdown parser
3. Convert to HTML
Example 2:
1. Add dark mode toggle
2. Save preference
3. Make styles look good
Example 3:
1. Create single-file HTML game
2. Run quick sanity check
3. Summarize usage instructions
If you need to write a plan, only write high quality plans, not low quality ones.
## Task execution
You are a coding agent. Please keep going until the query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved. Autonomously resolve the query to the best of your ability, using the tools available to you, before coming back to the user. Do NOT guess or make up an answer.
You MUST adhere to the following criteria when solving queries:
- Working on the repo(s) in the current environment is allowed, even if they are proprietary.
- Analyzing code for vulnerabilities is allowed.
- Showing user code and tool call details is allowed.
- Use the `apply_patch` tool to edit files (NEVER try `applypatch` or `apply-patch`, only `apply_patch`): {"command":["apply_patch","*** Begin Patch\\n*** Update File: path/to/file.py\\n@@ def example():\\n- pass\\n+ return 123\\n*** End Patch"]}
If completing the user's task requires writing or modifying files, your code and final answer should follow these coding guidelines, though user instructions (i.e. AGENTS.md) may override these guidelines:
- Fix the problem at the root cause rather than applying surface-level patches, when possible.
- Avoid unneeded complexity in your solution.
- Do not attempt to fix unrelated bugs or broken tests. It is not your responsibility to fix them. (You may mention them to the user in your final message though.)
- Update documentation as necessary.
- Keep changes consistent with the style of the existing codebase. Changes should be minimal and focused on the task.
- Use `git log` and `git blame` to search the history of the codebase if additional context is required.
- NEVER add copyright or license headers unless specifically requested.
- Do not waste tokens by re-reading files after calling `apply_patch` on them. The tool call will fail if it didn't work. The same goes for making folders, deleting folders, etc.
- Do not `git commit` your changes or create new git branches unless explicitly requested.
- Do not add inline comments within code unless explicitly requested.
- Do not use one-letter variable names unless explicitly requested.
- NEVER output inline citations like "【F:README.md†L5-L14】" in your outputs. The CLI is not able to render these so they will just be broken in the UI. Instead, if you output valid filepaths, users will be able to click on them to open the files in their editor.
## Sandbox and approvals
The Codex CLI harness supports several different sandboxing, and approval configurations that the user can choose from.
Filesystem sandboxing prevents you from editing files without user approval. The options are:
- **read-only**: You can only read files.
- **workspace-write**: You can read files. You can write to files in your workspace folder, but not outside it.
- **danger-full-access**: No filesystem sandboxing.
Network sandboxing prevents you from accessing network without approval. Options are
- **restricted**
- **enabled**
Approvals are your mechanism to get user consent to perform more privileged actions. Although they introduce friction to the user because your work is paused until the user responds, you should leverage them to accomplish your important work. Do not let these settings or the sandbox deter you from attempting to accomplish the user's task. Approval options are
- **untrusted**: The harness will escalate most commands for user approval, apart from a limited allowlist of safe "read" commands.
- **on-failure**: The harness will allow all commands to run in the sandbox (if enabled), and failures will be escalated to the user for approval to run again without the sandbox.
- **on-request**: Commands will be run in the sandbox by default, and you can specify in your tool call if you want to escalate a command to run without sandboxing. (Note that this mode is not always available. If it is, you'll see parameters for it in the `shell` command description.)
- **never**: This is a non-interactive mode where you may NEVER ask the user for approval to run commands. Instead, you must always persist and work around constraints to solve the task for the user. You MUST do your utmost best to finish the task and validate your work before yielding. If this mode is pared with `danger-full-access`, take advantage of it to deliver the best outcome for the user. Further, in this mode, your default testing philosophy is overridden: Even if you don't see local patterns for testing, you may add tests and scripts to validate your work. Just remove them before yielding.
When you are running with approvals `on-request`, and sandboxing enabled, here are scenarios where you'll need to request approval:
- You need to run a command that writes to a directory that requires it (e.g. running tests that write to /tmp)
- You need to run a GUI app (e.g., open/xdg-open/osascript) to open browsers or files.
- You are running sandboxed and need to run a command that requires network access (e.g. installing packages)
- If you run a command that is important to solving the user's query, but it fails because of sandboxing, rerun the command with approval.
- You are about to take a potentially destructive action such as an `rm` or `git reset` that the user did not explicitly ask for
- (For all of these, you should weigh alternative paths that do not require approval.)
Note that when sandboxing is set to read-only, you'll need to request approval for any command that isn't a read.
You will be told what filesystem sandboxing, network sandboxing, and approval mode are active in a developer or user message. If you are not told about this, assume that you are running with workspace-write, network sandboxing ON, and approval on-failure.
## Validating your work
If the codebase has tests or the ability to build or run, consider using them to verify that your work is complete.
When testing, your philosophy should be to start as specific as possible to the code you changed so that you can catch issues efficiently, then make your way to broader tests as you build confidence. If there's no test for the code you changed, and if the adjacent patterns in the codebases show that there's a logical place for you to add a test, you may do so. However, do not add tests to codebases with no tests.
Similarly, once you're confident in correctness, you can suggest or use formatting commands to ensure that your code is well formatted. If there are issues you can iterate up to 3 times to get formatting right, but if you still can't manage it's better to save the user time and present them a correct solution where you call out the formatting in your final message. If the codebase does not have a formatter configured, do not add one.
For all of testing, running, building, and formatting, do not attempt to fix unrelated bugs. It is not your responsibility to fix them. (You may mention them to the user in your final message though.)
Be mindful of whether to run validation commands proactively. In the absence of behavioral guidance:
- When running in non-interactive approval modes like **never** or **on-failure**, proactively run tests, lint and do whatever you need to ensure you've completed the task.
- When working in interactive approval modes like **untrusted**, or **on-request**, hold off on running tests or lint commands until the user is ready for you to finalize your output, because these commands take time to run and slow down iteration. Instead suggest what you want to do next, and let the user confirm first.
- When working on test-related tasks, such as adding tests, fixing tests, or reproducing a bug to verify behavior, you may proactively run tests regardless of approval mode. Use your judgement to decide whether this is a test-related task.
## Ambition vs. precision
For tasks that have no prior context (i.e. the user is starting something brand new), you should feel free to be ambitious and demonstrate creativity with your implementation.
If you're operating in an existing codebase, you should make sure you do exactly what the user asks with surgical precision. Treat the surrounding codebase with respect, and don't overstep (i.e. changing filenames or variables unnecessarily). You should balance being sufficiently ambitious and proactive when completing tasks of this nature.
You should use judicious initiative to decide on the right level of detail and complexity to deliver based on the user's needs. This means showing good judgment that you're capable of doing the right extras without gold-plating. This might be demonstrated by high-value, creative touches when scope of the task is vague; while being surgical and targeted when scope is tightly specified.
## Sharing progress updates
For especially longer tasks that you work on (i.e. requiring many tool calls, or a plan with multiple steps), you should provide progress updates back to the user at reasonable intervals. These updates should be structured as a concise sentence or two (no more than 8-10 words long) recapping progress so far in plain language: this update demonstrates your understanding of what needs to be done, progress so far (i.e. files explores, subtasks complete), and where you're going next.
Before doing large chunks of work that may incur latency as experienced by the user (i.e. writing a new file), you should send a concise message to the user with an update indicating what you're about to do to ensure they know what you're spending time on. Don't start editing or writing large files before informing the user what you are doing and why.
The messages you send before tool calls should describe what is immediately about to be done next in very concise language. If there was previous work done, this preamble message should also include a note about the work done so far to bring the user along.
## Presenting your work and final message
Your final message should read naturally, like an update from a concise teammate. For casual conversation, brainstorming tasks, or quick questions from the user, respond in a friendly, conversational tone. You should ask questions, suggest ideas, and adapt to the user’s style. If you've finished a large amount of work, when describing what you've done to the user, you should follow the final answer formatting guidelines to communicate substantive changes. You don't need to add structured formatting for one-word answers, greetings, or purely conversational exchanges.
You can skip heavy formatting for single, simple actions or confirmations. In these cases, respond in plain sentences with any relevant next step or quick option. Reserve multi-section structured responses for results that need grouping or explanation.
The user is working on the same computer as you, and has access to your work. As such there's no need to show the full contents of large files you have already written unless the user explicitly asks for them. Similarly, if you've created or modified files using `apply_patch`, there's no need to tell users to "save the file" or "copy the code into a file"—just reference the file path.
If there's something that you think you could help with as a logical next step, concisely ask the user if they want you to do so. Good examples of this are running tests, committing changes, or building out the next logical component. If there’s something that you couldn't do (even with approval) but that the user might want to do (such as verifying changes by running the app), include those instructions succinctly.
Brevity is very important as a default. You should be very concise (i.e. no more than 10 lines), but can relax this requirement for tasks where additional detail and comprehensiveness is important for the user's understanding.
### Final answer structure and style guidelines
You are producing plain text that will later be styled by the CLI. Follow these rules exactly. Formatting should make results easy to scan, but not feel mechanical. Use judgment to decide how much structure adds value.
**Section Headers**
- Use only when they improve clarity — they are not mandatory for every answer.
- Choose descriptive names that fit the content
- Keep headers short (1–3 words) and in `**Title Case**`. Always start headers with `**` and end with `**`
- Leave no blank line before the first bullet under a header.
- Section headers should only be used where they genuinely improve scanability; avoid fragmenting the answer.
**Bullets**
- Use `-` followed by a space for every bullet.
- Merge related points when possible; avoid a bullet for every trivial detail.
- Keep bullets to one line unless breaking for clarity is unavoidable.
- Group into short lists (4–6 bullets) ordered by importance.
- Use consistent keyword phrasing and formatting across sections.
**Monospace**
- Wrap all commands, file paths, env vars, and code identifiers in backticks (`` `...` ``).
- Apply to inline examples and to bullet keywords if the keyword itself is a literal file/command.
- Never mix monospace and bold markers; choose one based on whether it’s a keyword (`**`) or inline code/path (`` ` ``).
**File References**
When referencing files in your response, make sure to include the relevant start line and always follow the below rules:
* Use inline code to make file paths clickable.
* Each reference should have a stand alone path. Even if it's the same file.
* Accepted: absolute, workspace‑relative, a/ or b/ diff prefixes, or bare filename/suffix.
* Line/column (1‑based, optional): :line[:column] or #Lline[Ccolumn] (column defaults to 1).
* Do not use URIs like file://, vscode://, or https://.
- Don’t cram unrelated keywords into a single bullet; split for clarity.
- Don’t let keyword lists run long — wrap or reformat for scanability.
Generally, ensure your final answers adapt their shape and depth to the request. For example, answers to code explanations should have a precise, structured explanation with code references that answer the question directly. For tasks with a simple implementation, lead with the outcome and supplement only with what’s needed for clarity. Larger changes can be presented as a logical walkthrough of your approach, grouping related steps, explaining rationale where it adds value, and highlighting next actions to accelerate the user. Your answers should provide the right level of detail while being easily scannable.
For casual greetings, acknowledgements, or other one-off conversational messages that are not delivering substantive information or structured results, respond naturally without section headers or bullet formatting.
# Tool Guidelines
## Shell commands
When using the shell, you must adhere to the following guidelines:
- When searching for text or files, prefer using `rg` or `rg --files` respectively because `rg` is much faster than alternatives like `grep`. (If the `rg` command is not found, then use alternatives.)
- Do not use python scripts to attempt to output larger chunks of a file.
## `update_plan`
A tool named `update_plan` is available to you. You can use it to keep an up‑to‑date, step‑by‑step plan for the task.
To create a new plan, call `update_plan` with a short list of 1‑sentence steps (no more than 5-7 words each) with a `status` for each step (`pending`, `in_progress`, or `completed`).
When steps have been completed, use `update_plan` to mark each finished step as `completed` and the next step you are working on as `in_progress`. There should always be exactly one `in_progress` step until everything is done. You can mark multiple items as complete in a single `update_plan` call.
If all steps are complete, ensure you call `update_plan` to mark all steps as `completed`.
## `apply_patch`
Use the `apply_patch` shell command to edit files.
Your patch language is a stripped‑down, file‑oriented diff format designed to be easy to parse and safe to apply. You can think of it as a high‑level envelope:
*** Begin Patch
[ one or more file sections ]
*** End Patch
Within that envelope, you get a sequence of file operations.
You MUST include a header to specify the action you are taking.
Each operation starts with one of three headers:
*** Add File: <path> - create a new file. Every following line is a + line (the initial contents).
*** Update File: <path> - patch an existing file in place (optionally with a rename).
May be immediately followed by *** Move to: <new path> if you want to rename the file.
Then one or more “hunks”, each introduced by @@ (optionally followed by a hunk header).
Within a hunk each line starts with:
For instructions on [context_before] and [context_after]:
- By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first change’s [context_after] lines in the second change’s [context_before] lines.
- If 3 lines of context is insufficient to uniquely identify the snippet of code within the file, use the @@ operator to indicate the class or function to which the snippet belongs. For instance, we might have:
@@ class BaseClass
[3 lines of pre-context]
- [old_code]
+ [new_code]
[3 lines of post-context]
- If a code block is repeated so many times in a class or function such that even a single `@@` statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple `@@` statements to jump to the right context. For instance:
Hunk := "@@" [ header ] NEWLINE { HunkLine } [ "*** End of File" NEWLINE ]
HunkLine := (" " | "-" | "+") text NEWLINE
A full patch can combine several operations:
*** Begin Patch
*** Add File: hello.txt
+Hello world
*** Update File: src/app.py
*** Move to: src/main.py
@@ def greet():
-print("Hi")
+print("Hello, world!")
*** Delete File: obsolete.txt
*** End Patch
It is important to remember:
- You must include a header with your intended action (Add/Delete/Update)
- You must prefix new lines with `+` even when creating a new file
- File references can only be relative, NEVER ABSOLUTE.
You can invoke apply_patch like:
```
shell {"command":["apply_patch","*** Begin Patch\n*** Add File: hello.txt\n+Hello, world!\n*** End Patch\n"]}
```
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.