Commit Graph

18 Commits

Author SHA1 Message Date
Michael Bolin
61dfe0b86c chore: clean up argument-comment lint and roll out all-target CI on macOS (#16054)
## Why

`argument-comment-lint` was green in CI even though the repo still had
many uncommented literal arguments. The main gap was target coverage:
the repo wrapper did not force Cargo to inspect test-only call sites, so
examples like the `latest_session_lookup_params(true, ...)` tests in
`codex-rs/tui_app_server/src/lib.rs` never entered the blocking CI path.

This change cleans up the existing backlog, makes the default repo lint
path cover all Cargo targets, and starts rolling that stricter CI
enforcement out on the platform where it is currently validated.

## What changed

- mechanically fixed existing `argument-comment-lint` violations across
the `codex-rs` workspace, including tests, examples, and benches
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` and
`tools/argument-comment-lint/run.sh` so non-`--fix` runs default to
`--all-targets` unless the caller explicitly narrows the target set
- fixed both wrappers so forwarded cargo arguments after `--` are
preserved with a single separator
- documented the new default behavior in
`tools/argument-comment-lint/README.md`
- updated `rust-ci` so the macOS lint lane keeps the plain wrapper
invocation and therefore enforces `--all-targets`, while Linux and
Windows temporarily pass `-- --lib --bins`

That temporary CI split keeps the stricter all-targets check where it is
already cleaned up, while leaving room to finish the remaining Linux-
and Windows-specific target-gated cleanup before enabling
`--all-targets` on those runners. The Linux and Windows failures on the
intermediate revision were caused by the wrapper forwarding bug, not by
additional lint findings in those lanes.

## Validation

- `bash -n tools/argument-comment-lint/run.sh`
- `bash -n tools/argument-comment-lint/run-prebuilt-linter.sh`
- shell-level wrapper forwarding check for `-- --lib --bins`
- shell-level wrapper forwarding check for `-- --tests`
- `just argument-comment-lint`
- `cargo test` in `tools/argument-comment-lint`
- `cargo test -p codex-terminal-detection`

## Follow-up

- Clean up remaining Linux-only target-gated callsites, then switch the
Linux lint lane back to the plain wrapper invocation.
- Clean up remaining Windows-only target-gated callsites, then switch
the Windows lint lane back to the plain wrapper invocation.
2026-03-27 19:00:44 -07:00
Ruslan Nigmatullin
9022cdc563 app-server: Silence thread status changes caused by thread being created (#13079)
Currently we emit `thread/status/changed` with `Idle` status right
before sending `thread/started` event (which also has `Idle` status in
it).
It feels that there is no point in that as client has no way to know
prior state of the thread as it didn't exist yet, so silence these kinds
of notifications.
2026-03-03 00:52:28 +00:00
Michael Bolin
1af2a37ada chore: remove codex-core public protocol/shell re-exports (#12432)
## Why

`codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
from `codex-protocol` and `codex-shell-command`. That made it easy for
workspace crates to import those APIs through `codex-core`, which in
turn hides dependency edges and makes it harder to reduce compile-time
coupling over time.

This change removes those public re-exports so call sites must import
from the source crates directly. Even when a crate still depends on
`codex-core` today, this makes dependency boundaries explicit and
unblocks future work to drop `codex-core` dependencies where possible.

## What Changed

- Removed public re-exports from `codex-rs/core/src/lib.rs` for:
- `codex_protocol::protocol` and related protocol/model types (including
`InitialHistory`)
  - `codex_protocol::config_types` (`protocol_config_types`)
- `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
parse_command, powershell}`
- Migrated workspace Rust call sites to import directly from:
  - `codex_protocol::protocol`
  - `codex_protocol::config_types`
  - `codex_protocol::models`
  - `codex_shell_command`
- Added explicit `Cargo.toml` dependencies (`codex-protocol` /
`codex-shell-command`) in crates that now import those crates directly.
- Kept `codex-core` internal modules compiling by using `pub(crate)`
aliases in `core/src/lib.rs` (internal-only, not part of the public
API).
- Updated the two utility crates that can already drop a `codex-core`
dependency edge entirely:
  - `codex-utils-approval-presets`
  - `codex-utils-cli`

## Verification

- `cargo test -p codex-utils-approval-presets`
- `cargo test -p codex-utils-cli`
- `cargo check --workspace --all-targets`
- `just clippy`
2026-02-20 23:45:35 -08:00
sayan-oai
41800fc876 chore: rm remote models fflag (#11699)
rm `remote_models` feature flag.

We see issues like #11527 when a user has `remote_models` disabled, as
we always use the default fallback `ModelInfo`. This causes issues with
model performance.

Builds on #11690, which helps by warning the user when they are using
the default fallback. This PR will make that happen much less frequently
as an accidental consequence of disabling `remote_models`.
2026-02-17 11:43:16 -08:00
Michael Bolin
aef4af1079 app-server tests: disable shell_snapshot for review suite (#11657)
## Why


`suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`
was failing on Windows CI due to an unrelated process crash during shell
snapshot initialization (`tokio-runtime-worker` stack overflow).

This review test suite validates review API behavior and should not
depend on shell snapshot behavior. Keeping shell snapshot enabled in
this fixture made the test flaky for reasons outside the scenario under
test.

## What Changed

- Updated the review suite test config in
`codex-rs/app-server/tests/suite/v2/review.rs` to set:
  - `shell_snapshot = false`

This keeps the review tests focused on review behavior by disabling
shell snapshot initialization in this fixture.

## Verification

- `cargo test -p codex-app-server`
- Confirmed the previously failing Windows CI job for this test now
passes on this PR.
2026-02-12 23:56:43 +00:00
Owen Lin
76256a8cec fix: skip review_start_with_detached_delivery_returns_new_thread_id o… (#11645)
…n windows
2026-02-12 15:12:57 -08:00
Max Johnson
7053aa5457 Reapply "Add app-server transport layer with websocket support" (#11370)
Reapply "Add app-server transport layer with websocket support" with
additional fixes from https://github.com/openai/codex/pull/11313/changes
to avoid deadlocking.

This reverts commit 47356ff83c.

## Summary

To avoid deadlocking when queues are full, we maintain separate tokio
tasks dedicated to incoming vs outgoing event handling
- split the app-server main loop into two tasks in
`run_main_with_transport`
   - inbound handling (`transport_event_rx`)
   - outbound handling (`outgoing_rx` + `thread_created_rx`)
- separate incoming and outgoing websocket tasks

## Validation

Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
requests

<img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
/>

---------

Co-authored-by: jif-oai <jif@openai.com>
2026-02-11 18:13:39 +00:00
pakrym-oai
bfd4e2112c Disable very flaky tests (#11394)
Collected from last 20 builds of main in
https://github.com/openai/codex/commits/main/.
2026-02-10 18:50:11 -08:00
Max Johnson
47356ff83c Revert "Add app-server transport layer with websocket support (#10693)" (#11323)
Suspected cause of deadlocking bug
2026-02-10 17:37:49 +00:00
Eric Traut
b3de6c7f2b Defer persistence of rollout file (#11028)
- Defer rollout persistence for fresh threads (`InitialHistory::New`):
keep rollout events in memory and only materialize rollout file + state
DB row on first `EventMsg::UserMessage`.
- Keep precomputed rollout path available before materialization.
- Change `thread/start` to build thread response from live config
snapshot and optional precomputed path.
- Improve pre-materialization behavior in app-server/TUI: clearer
invalid-request errors for file-backed ops and a friendlier `/fork` “not
ready yet” UX.
- Update tests to match deferred semantics across
start/read/archive/unarchive/fork/resume/review flows.
- Improved resilience of user_shell test, which should be unrelated to
this change but must be affected by timing changes

For Reviewers:
* The primary change is in recorder.rs
* Most of the other changes were to fix up broken assumptions in
existing tests

Testing:
* Manually tested CLI
* Exercised app server paths by manually running IDE Extension with
rebuilt CLI binary
* Only user-visible change is that `/fork` in TUI generates visible
error if used prior to first turn
2026-02-07 23:05:03 -08:00
Eric Traut
4d52428fa2 Fixed a flaky test (#10970)
## Summary

Stabilize v2 review integration tests by making them hermetic with
respect to model discovery.

`app-server` review tests were intermittently timing out in CI
(especially on Windows runners) because their test config allowed remote
model refresh. During `thread/start`, the test process could issue live
`/v1/models` requests, introducing external network latency and
nondeterministic timing before review flow assertions.

This change disables remote model fetching in the review test config
helper used by these tests.
2026-02-06 21:26:26 -08:00
Owen Lin
d9ad5c3c49 fix(app-server): fix approval events in review mode (#10416)
One of our partners flagged that they were seeing the wrong order of
events when running `review/start` with command exec approvals:
```
{"method":"item/commandExecution/requestApproval","id":0,"params":{"threadId":"019c0b6b-6a42-7c02-99c4-98c80e88ac27","turnId":"0","itemId":"0","reason":"`/bin/zsh -lc 'git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat'` requires approval: Xcode-required approval: Require explicit user confirmation for all commands.","proposedExecpolicyAmendment":null}}

{"method":"item/started","params":{"item":{"type":"commandExecution","id":"call_AEjlbHqLYNM7kbU3N6uw1CNi","command":"/bin/zsh -lc 'git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat'","cwd":"/Users/devingreen/Desktop/SampleProject","processId":null,"status":"inProgress","commandActions":[{"type":"unknown","command":"git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat"}],"aggregatedOutput":null,"exitCode":null,"durationMs":null},"threadId":"019c0b6b-6a42-7c02-99c4-98c80e88ac27","turnId":"0"}}
```

**Key fix**: In the review sub‑agent delegate we were forwarding exec
(and patch) approvals using the parent turn id (`parent_ctx.sub_id`) as
the approval call_id. That made
`item/commandExecution/requestApproval.itemId` differ from the actual
`item/started` id. We now forward the sub‑agent’s `call_id` from the
approval event instead, so the approval item id matches the
commandExecution item id in review flows.

Here’s the expected event order for an inline `review/start` that
triggers an exec approval after this fix:
1. Response to review/start (JSON‑RPC response)
- Includes `turn` (status inProgress) and `review_thread_id` (same as
parent thread for inline).
2. `turn/started` notification
  - turnId is the review turn id (e.g., "0").
3. `item/started` → EnteredReviewMode
  - item.id == turnId, marks entry into review mode.
4. `item/started` → commandExecution
  - item.id == <call_id> (e.g., "review-call-1"), status: inProgress.
5. `item/commandExecution/requestApproval` request
  - JSON‑RPC request (not a notification).
  - params.itemId == <call_id> and params.turnId == turnId.
6. Client replies to approval request (Approved / Declined / etc).
7. If approved:
  - Optional `item/commandExecution/outputDelta` notifications.
  - `item/completed` → commandExecution with status and exitCode.
8. Review finishes:
  - `item/started` → ExitedReviewMode
  - `item/completed` → ExitedReviewMode
  - (Agent message items may also appear, depending on review output.)
9. `turn/completed` notification

The key being #4 and #5 are now in the proper order with the correct
item id.
2026-02-03 12:08:17 -08:00
Celia Chen
be4364bb80 [chore] move app server tests from chat completion to responses (#8939)
We are deprecating chat completions. Move all app server tests from chat
completion to responses.
2026-01-08 22:27:55 +00:00
jif-oai
4b78e2ab09 chore: review everywhere (#7444) 2025-12-02 11:26:27 +00:00
jif-oai
6eeaf46ac1 fix: other flaky tests (#7372) 2025-11-28 15:29:44 +00:00
jif-oai
aaec8abf58 feat: detached review (#7292) 2025-11-28 11:34:57 +00:00
Michael Bolin
a75321a64c fix: add more fields to ThreadStartResponse and ThreadResumeResponse (#6847)
This adds the following fields to `ThreadStartResponse` and
`ThreadResumeResponse`:

```rust
    pub model: String,
    pub model_provider: String,
    pub cwd: PathBuf,
    pub approval_policy: AskForApproval,
    pub sandbox: SandboxPolicy,
    pub reasoning_effort: Option<ReasoningEffort>,
```

This is important because these fields are optional in
`ThreadStartParams` and `ThreadResumeParams`, so the caller needs to be
able to determine what values were ultimately used to start/resume the
conversation. (Though note that any of these could be changed later
between turns in the conversation.)

Though to get this information reliably, it must be read from the
internal `SessionConfiguredEvent` that is created in response to the
start of a conversation. Because `SessionConfiguredEvent` (as defined in
`codex-rs/protocol/src/protocol.rs`) did not have all of these fields, a
number of them had to be added as part of this PR.

Because `SessionConfiguredEvent` is referenced in many tests, test
instances of `SessionConfiguredEvent` had to be updated, as well, which
is why this PR touches so many files.
2025-11-18 21:18:43 -08:00
jif-oai
8ddae8cde3 feat: review in app server (#6613) 2025-11-18 21:58:54 +00:00