codex

mirror of https://github.com/openai/codex.git synced 2026-04-26 07:35:29 +00:00

Author	SHA1	Message	Date
Eric Traut	10336068db	Fix flaky windows CI test (#10993 ) Hardens PTY Python REPL test and make MCP test startup deterministic Summary - `utils/pty/src/tests.rs` - Added a REPL readiness handshake (`wait_for_python_repl_ready`) that repeatedly sends a marker and waits for it in PTY output before sending test commands. - Updated `pty_python_repl_emits_output_and_exits` to: - wait for readiness first, - preserve startup output, - append output collected through process exit. - Reduces Windows/ConPTY flakiness from early stdin writes racing REPL startup. - `mcp-server/tests/suite/codex_tool.rs` - Avoid remote model refresh during MCP test startup, reducing timeout-prone nondeterminism.	2026-02-07 08:55:42 -08:00
jif-oai	d2394a2494	chore: nuke chat/completions API (#10157 )	2026-02-03 11:31:57 +00:00
Michael Bolin	66447d5d2c	feat: replace custom mcp-types crate with equivalents from rmcp (#10349 ) We started working with MCP in Codex before https://crates.io/crates/rmcp was mature, so we had our own crate for MCP types that was generated from the MCP schema: `8b95d3e082/codex-rs/mcp-types/README.md` Now that `rmcp` is more mature, it makes more sense to use their MCP types in Rust, as they handle details (like the `_meta` field) that our custom version ignored. Though one advantage that our custom types had is that our generated types implemented `JsonSchema` and `ts_rs::TS`, whereas the types in `rmcp` do not. As such, part of the work of this PR is leveraging the adapters between `rmcp` types and the serializable types that are API for us (app server and MCP) introduced in #10356. Note this PR results in a number of changes to `codex-rs/app-server-protocol/schema`, which merit special attention during review. We must ensure that these changes are still backwards-compatible, which is possible because we have: ```diff - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, }; + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, }; ``` so `ContentBlock` has been replaced with the more general `JsonValue`. Note that `ContentBlock` was defined as: ```typescript export type ContentBlock = TextContent \| ImageContent \| AudioContent \| ResourceLink \| EmbeddedResource; ``` so the deletion of those individual variants should not be a cause of great concern. Similarly, we have the following change in `codex-rs/app-server-protocol/schema/typescript/Tool.ts`: ``` - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, }; + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, }; ``` so: - `annotations?: ToolAnnotations` ➡️ `JsonValue` - `inputSchema: ToolInputSchema` ➡️ `JsonValue` - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue` and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349). * #10357 * __->__ #10349 * #10356	2026-02-02 17:41:55 -08:00
Michael Bolin	99f47d6e9a	fix(mcp): include threadId in both content and structuredContent in CallToolResult (#9338 )	2026-01-15 18:33:11 -08:00
Michael Bolin	0c09dc3c03	feat: add threadId to MCP server messages (#9192 ) This favors `threadId` instead of `conversationId` so we use the same terms as https://developers.openai.com/codex/sdk/. To test the local build: ``` cd codex-rs cargo build --bin codex npx -y @modelcontextprotocol/inspector ./target/debug/codex mcp-server ``` I sent: ```json { "method": "tools/call", "params": { "name": "codex", "arguments": { "prompt": "favorite ls option?" }, "_meta": { "progressToken": 0 } } } ``` and got: ```json { "content": [ { "type": "text", "text": "`ls -lah` (or `ls -alh`) — long listing, includes dotfiles, human-readable sizes." } ], "structuredContent": { "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e" } } ``` and successfully used the `threadId` in the follow-up with the `codex-reply` tool call: ```json { "method": "tools/call", "params": { "name": "codex-reply", "arguments": { "prompt": "what is the long versoin", "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e" }, "_meta": { "progressToken": 1 } } } ``` whose response also has the `threadId`: ```json { "content": [ { "type": "text", "text": "Long listing is `ls -l` (adds permissions, owner/group, size, timestamp)." } ], "structuredContent": { "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e" } } ``` Fixes https://github.com/openai/codex/issues/3712.	2026-01-13 22:14:41 -08:00
Ahmed Ibrahim	87f7226cca	Assemble sandbox/approval/network prompts dynamically (#8961 ) - Add a single builder for developer permissions messaging that accepts SandboxPolicy and approval policy. This builder now drives the developer “permissions” message that’s injected at session start and any time sandbox/approval settings change. - Trim EnvironmentContext to only include cwd, writable roots, and shell; removed sandbox/approval/network duplication and adjusted XML serialization and tests accordingly. Follow-up: adding a config value to replace the developer permissions message for custom sandboxes.	2026-01-12 23:12:59 +00:00
Eric Traut	c4af707e09	Removed experimental "command risk assessment" feature (#7799 ) This experimental feature received lukewarm reception during internal testing. Removing from the code base.	2025-12-10 09:48:11 -08:00
pakrym-oai	767b66f407	Migrate coverage to shell_command (#7042 )	2025-11-21 03:44:00 +00:00
Anton Panasenko	9572cfc782	[codex] add developer instructions (#5897 ) we are using developer instructions for code reviews, we need to pass them in cli as well.	2025-10-30 11:18:31 -07:00
Eric Traut	f8af4f5c8d	Added model summary and risk assessment for commands that violate sandbox policy (#5536 ) This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved. The feature works by taking a failed command and passing it back to the model and asking it to summarize the command, give it a risk level (low, medium, high) and a risk category (e.g. "data deletion" or "data exfiltration"). It uses a new conversation thread so the context in the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, it falls back to the current behavior. For now, this is an experimental feature and is gated by a config key `experimental_sandbox_command_assessment`. Here is a screen shot of the approval prompt showing the risk assessment and summary. <img width="723" height="282" alt="image" src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b" />	2025-10-24 15:23:44 -07:00
jif-oai	5e4f3bbb0b	chore: rework tools execution workflow (#5278 ) Re-work the tool execution flow. Read `orchestrator.rs` to understand the structure	2025-10-20 20:57:37 +01:00
Michael Bolin	995f5c3614	feat: add Vec<ParsedCommand> to ExecApprovalRequestEvent (#5222 ) This adds `parsed_cmd: Vec<ParsedCommand>` to `ExecApprovalRequestEvent` in the core protocol (`protocol/src/protocol.rs`), which is also what this field is named on `ExecCommandBeginEvent`. Honestly, I don't love the name (it sounds like a single command, but it is actually a list of them), but I don't want to get distracted by a naming discussion right now. This also adds `parsed_cmd` to `ExecCommandApprovalParams` in `codex-rs/app-server-protocol/src/protocol.rs`, so it will be available via `codex app-server`, as well. For consistency, I also updated `ExecApprovalElicitRequestParams` in `codex-rs/mcp-server/src/exec_approval.rs` to include this field under the name `codex_parsed_cmd`, as that struct already has a number of special `codex_*` fields. Note this is the code for when Codex is used as an MCP _server_ and therefore has to conform to the official spec for an MCP elicitation type.	2025-10-15 13:58:40 -07:00
Jeremy Rose	4a5f05c136	make tests pass cleanly in sandbox (#4067 ) This changes the reqwest client used in tests to be sandbox-friendly, and skips a bunch of other tests that don't work inside the sandbox/without network.	2025-09-25 13:11:14 -07:00
jif-oai	be366a31ab	chore: clippy on redundant closure (#4058 ) Add redundant closure clippy rules and let Codex fix it by minimising FQP	2025-09-22 19:30:16 +00:00
pakrym-oai	14a115d488	Add non_sandbox_test helper (#3880 ) Makes tests shorter	2025-09-22 14:50:41 +00:00
Michael Bolin	1e9e703b96	chore: try to make it easier to debug the flakiness of test_shell_command_approval_triggers_elicitation (#2848 ) `test_shell_command_approval_triggers_elicitation()` is one of a number of integration tests that we have observed to be flaky on GitHub CI, so this PR tries to reduce the flakiness _and_ to provide us with more information when it flakes. Specifically: - Changed the command that we use to trigger the elicitation from `git init` to `python3 -c 'import pathlib; pathlib.Path(r"{}").touch()'` because running `git` seems more likely to invite variance. - Increased the timeout to wait for the task response from 10s to 20s. - Added more logging.	2025-08-28 12:33:33 -07:00
Jeremy Rose	32bbbbad61	test: faster test execution in codex-core (#2633 ) this dramatically improves time to run `cargo test -p codex-core` (~25x speedup). before: ``` cargo test -p codex-core 35.96s user 68.63s system 19% cpu 8:49.80 total ``` after: ``` cargo test -p codex-core 5.51s user 8.16s system 63% cpu 21.407 total ``` both tests measured "hot", i.e. on a 2nd run with no filesystem changes, to exclude compile times. approach inspired by [Delete Cargo Integration Tests](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html), we move all test cases in tests/ into a single suite in order to have a single binary, as there is significant overhead for each test binary executed, and because test execution is only parallelized with a single binary.	2025-08-24 11:10:53 -07:00

17 Commits