Commit Graph

17 Commits

Author SHA1 Message Date
Eric Traut
10336068db Fix flaky windows CI test (#10993)
Hardens PTY Python REPL test and make MCP test startup deterministic

**Summary**
- `utils/pty/src/tests.rs`
- Added a REPL readiness handshake (`wait_for_python_repl_ready`) that
repeatedly sends a marker and waits for it in PTY output before sending
test commands.
  - Updated `pty_python_repl_emits_output_and_exits` to:
    - wait for readiness first,
    - preserve startup output,
    - append output collected through process exit.
- Reduces Windows/ConPTY flakiness from early stdin writes racing REPL
startup.

- `mcp-server/tests/suite/codex_tool.rs`
- Avoid remote model refresh during MCP test startup, reducing
timeout-prone nondeterminism.
2026-02-07 08:55:42 -08:00
jif-oai
d2394a2494 chore: nuke chat/completions API (#10157) 2026-02-03 11:31:57 +00:00
Michael Bolin
66447d5d2c feat: replace custom mcp-types crate with equivalents from rmcp (#10349)
We started working with MCP in Codex before
https://crates.io/crates/rmcp was mature, so we had our own crate for
MCP types that was generated from the MCP schema:


8b95d3e082/codex-rs/mcp-types/README.md

Now that `rmcp` is more mature, it makes more sense to use their MCP
types in Rust, as they handle details (like the `_meta` field) that our
custom version ignored. Though one advantage that our custom types had
is that our generated types implemented `JsonSchema` and `ts_rs::TS`,
whereas the types in `rmcp` do not. As such, part of the work of this PR
is leveraging the adapters between `rmcp` types and the serializable
types that are API for us (app server and MCP) introduced in #10356.

Note this PR results in a number of changes to
`codex-rs/app-server-protocol/schema`, which merit special attention
during review. We must ensure that these changes are still
backwards-compatible, which is possible because we have:

```diff
- export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
+ export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
```

so `ContentBlock` has been replaced with the more general `JsonValue`.
Note that `ContentBlock` was defined as:

```typescript
export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
```

so the deletion of those individual variants should not be a cause of
great concern.

Similarly, we have the following change in
`codex-rs/app-server-protocol/schema/typescript/Tool.ts`:

```
- export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
+ export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
```

so:

- `annotations?: ToolAnnotations` ➡️ `JsonValue`
- `inputSchema: ToolInputSchema` ➡️ `JsonValue`
- `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`

and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
* #10357
* __->__ #10349
* #10356
2026-02-02 17:41:55 -08:00
Michael Bolin
99f47d6e9a fix(mcp): include threadId in both content and structuredContent in CallToolResult (#9338) 2026-01-15 18:33:11 -08:00
Michael Bolin
0c09dc3c03 feat: add threadId to MCP server messages (#9192)
This favors `threadId` instead of `conversationId` so we use the same
terms as https://developers.openai.com/codex/sdk/.

To test the local build:

```
cd codex-rs
cargo build --bin codex
npx -y @modelcontextprotocol/inspector ./target/debug/codex mcp-server
```

I sent:

```json
{
  "method": "tools/call",
  "params": {
    "name": "codex",
    "arguments": {
      "prompt": "favorite ls option?"
    },
    "_meta": {
      "progressToken": 0
    }
  }
}
```

and got:

```json
{
  "content": [
    {
      "type": "text",
      "text": "`ls -lah` (or `ls -alh`) — long listing, includes dotfiles, human-readable sizes."
    }
  ],
  "structuredContent": {
    "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e"
  }
}
```

and successfully used the `threadId` in the follow-up with the
`codex-reply` tool call:

```json
{
  "method": "tools/call",
  "params": {
    "name": "codex-reply",
    "arguments": {
      "prompt": "what is the long versoin",
      "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e"
    },
    "_meta": {
      "progressToken": 1
    }
  }
}
```

whose response also has the `threadId`:

```json
{
  "content": [
    {
      "type": "text",
      "text": "Long listing is `ls -l` (adds permissions, owner/group, size, timestamp)."
    }
  ],
  "structuredContent": {
    "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e"
  }
}
```

Fixes https://github.com/openai/codex/issues/3712.
2026-01-13 22:14:41 -08:00
Ahmed Ibrahim
87f7226cca Assemble sandbox/approval/network prompts dynamically (#8961)
- Add a single builder for developer permissions messaging that accepts
SandboxPolicy and approval policy. This builder now drives the developer
“permissions” message that’s injected at session start and any time
sandbox/approval settings change.
- Trim EnvironmentContext to only include cwd, writable roots, and
shell; removed sandbox/approval/network duplication and adjusted XML
serialization and tests accordingly.

Follow-up: adding a config value to replace the developer permissions
message for custom sandboxes.
2026-01-12 23:12:59 +00:00
Eric Traut
c4af707e09 Removed experimental "command risk assessment" feature (#7799)
This experimental feature received lukewarm reception during internal
testing. Removing from the code base.
2025-12-10 09:48:11 -08:00
pakrym-oai
767b66f407 Migrate coverage to shell_command (#7042) 2025-11-21 03:44:00 +00:00
Anton Panasenko
9572cfc782 [codex] add developer instructions (#5897)
we are using developer instructions for code reviews, we need to pass
them in cli as well.
2025-10-30 11:18:31 -07:00
Eric Traut
f8af4f5c8d Added model summary and risk assessment for commands that violate sandbox policy (#5536)
This PR adds support for a model-based summary and risk assessment for
commands that violate the sandbox policy and require user approval. This
aids the user in evaluating whether the command should be approved.

The feature works by taking a failed command and passing it back to the
model and asking it to summarize the command, give it a risk level (low,
medium, high) and a risk category (e.g. "data deletion" or "data
exfiltration"). It uses a new conversation thread so the context in the
existing thread doesn't influence the answer. If the call to the model
fails or takes longer than 5 seconds, it falls back to the current
behavior.

For now, this is an experimental feature and is gated by a config key
`experimental_sandbox_command_assessment`.

Here is a screen shot of the approval prompt showing the risk assessment
and summary.

<img width="723" height="282" alt="image"
src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b"
/>
2025-10-24 15:23:44 -07:00
jif-oai
5e4f3bbb0b chore: rework tools execution workflow (#5278)
Re-work the tool execution flow. Read `orchestrator.rs` to understand
the structure
2025-10-20 20:57:37 +01:00
Michael Bolin
995f5c3614 feat: add Vec<ParsedCommand> to ExecApprovalRequestEvent (#5222)
This adds `parsed_cmd: Vec<ParsedCommand>` to `ExecApprovalRequestEvent`
in the core protocol (`protocol/src/protocol.rs`), which is also what
this field is named on `ExecCommandBeginEvent`. Honestly, I don't love
the name (it sounds like a single command, but it is actually a list of
them), but I don't want to get distracted by a naming discussion right
now.

This also adds `parsed_cmd` to `ExecCommandApprovalParams` in
`codex-rs/app-server-protocol/src/protocol.rs`, so it will be available
via `codex app-server`, as well.

For consistency, I also updated `ExecApprovalElicitRequestParams` in
`codex-rs/mcp-server/src/exec_approval.rs` to include this field under
the name `codex_parsed_cmd`, as that struct already has a number of
special `codex_*` fields. Note this is the code for when Codex is used
as an MCP _server_ and therefore has to conform to the official spec for
an MCP elicitation type.
2025-10-15 13:58:40 -07:00
Jeremy Rose
4a5f05c136 make tests pass cleanly in sandbox (#4067)
This changes the reqwest client used in tests to be sandbox-friendly,
and skips a bunch of other tests that don't work inside the
sandbox/without network.
2025-09-25 13:11:14 -07:00
jif-oai
be366a31ab chore: clippy on redundant closure (#4058)
Add redundant closure clippy rules and let Codex fix it by minimising
FQP
2025-09-22 19:30:16 +00:00
pakrym-oai
14a115d488 Add non_sandbox_test helper (#3880)
Makes tests shorter
2025-09-22 14:50:41 +00:00
Michael Bolin
1e9e703b96 chore: try to make it easier to debug the flakiness of test_shell_command_approval_triggers_elicitation (#2848)
`test_shell_command_approval_triggers_elicitation()` is one of a number
of integration tests that we have observed to be flaky on GitHub CI, so
this PR tries to reduce the flakiness _and_ to provide us with more
information when it flakes. Specifically:

- Changed the command that we use to trigger the elicitation from `git
init` to `python3 -c 'import pathlib; pathlib.Path(r"{}").touch()'`
because running `git` seems more likely to invite variance.
- Increased the timeout to wait for the task response from 10s to 20s.
- Added more logging.
2025-08-28 12:33:33 -07:00
Jeremy Rose
32bbbbad61 test: faster test execution in codex-core (#2633)
this dramatically improves time to run `cargo test -p codex-core` (~25x
speedup).

before:
```
cargo test -p codex-core  35.96s user 68.63s system 19% cpu 8:49.80 total
```

after:
```
cargo test -p codex-core  5.51s user 8.16s system 63% cpu 21.407 total
```

both tests measured "hot", i.e. on a 2nd run with no filesystem changes,
to exclude compile times.

approach inspired by [Delete Cargo Integration
Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
we move all test cases in tests/ into a single suite in order to have a
single binary, as there is significant overhead for each test binary
executed, and because test execution is only parallelized with a single
binary.
2025-08-24 11:10:53 -07:00