Compare commits

..

43 Commits

Author SHA1 Message Date
kevin zhao
876ade0445 serializing noMatch as an object 2025-11-19 21:34:39 +00:00
kevin zhao
2b8cdc7be3 renaming things 2025-11-19 21:19:01 +00:00
kevin zhao
20a4f95136 delete unecessary test 2025-11-19 20:41:28 +00:00
kevin zhao
b6cd0a5f02 calling it run() 2025-11-19 20:37:44 +00:00
kevin zhao
9ec2873084 commonizing cli logic 2025-11-19 20:30:20 +00:00
zhao-oai
745e2a6790 Add execpolicycheck subcommand 2025-11-19 12:05:40 -08:00
kevin zhao
119b1855f3 another note on readme 2025-11-19 10:40:56 -08:00
kevin zhao
4f2ee5f94c improving readme 2025-11-19 10:39:53 -08:00
kevin zhao
3cb8a0068d update readmes pt.2 2025-11-19 10:29:27 -08:00
kevin zhao
de9b3fd75d update READMEs 2025-11-19 10:29:13 -08:00
kevin zhao
408cc0b0f2 Merge branch 'dev/zhao/execpolicy2-core-integration' of github.com:openai/codex into dev/zhao/execpolicy2-core-integration 2025-11-19 09:47:06 -08:00
kevin zhao
5a1e6defd9 do not ensure dir and default.codexpolicy 2025-11-19 09:46:12 -08:00
zhao-oai
db36ccbe35 /// for docstring
Co-authored-by: Michael Bolin <mbolin@openai.com>
2025-11-19 09:04:37 -08:00
kevin zhao
9bb7589a36 with_escalated_permissions -> enum 2025-11-18 15:09:36 -08:00
kevin zhao
25bf30661b adding docstrings for ApprovalRequirement 2025-11-18 12:38:32 -08:00
kevin zhao
89271eccc5 using async 2025-11-18 12:31:05 -08:00
kevin zhao
a34b9fc259 load execpolicy from codex_home/policy and default to empty policy 2025-11-18 12:31:03 -08:00
zhao-oai
345050e1be running test single threaded
Co-authored-by: Michael Bolin <mbolin@openai.com>
2025-11-18 12:30:45 -08:00
kevin zhao
b5241d7f38 enabling execpolicy by default 2025-11-18 12:30:44 -08:00
kevin zhao
48a2db1a5a update debug message 2025-11-18 12:30:44 -08:00
kevin zhao
9cbe84748e addressing comments 2025-11-18 12:30:44 -08:00
kevin zhao
cda6857fff remove unused import 2025-11-18 12:30:44 -08:00
kevin zhao
7ac303b051 fix rebase error 2025-11-18 12:30:44 -08:00
kevin zhao
c9a34cd493 undo diff 2025-11-18 12:30:44 -08:00
kevin zhao
f69b225f44 exec_policy 2025-11-18 12:30:44 -08:00
kevin zhao
f8dc20279b rename test file 2025-11-18 12:30:44 -08:00
kevin zhao
a13b81adea update comment 2025-11-18 12:30:44 -08:00
kevin zhao
512a6c3386 update tracing:: message 2025-11-18 12:30:43 -08:00
kevin zhao
3cd5c23910 execpolicy2 -> execpolicy 2025-11-18 12:30:41 -08:00
kevin zhao
c6c03aed22 execpolicy2 core integration 2025-11-18 12:30:19 -08:00
kevin zhao
3990d90e10 precompute approval_requirement 2025-11-18 12:30:19 -08:00
kevin zhao
f18fdc97b3 execpolicy2 core integration
fix PR

undo keyring store
2025-11-18 12:30:17 -08:00
iceweasel-oai
4bada5a84d Prompt to turn on windows sandbox when auto mode selected. (#6618)
- stop prompting users to install WSL 
- prompt users to turn on Windows sandbox when auto mode requested.

<img width="1660" height="195" alt="Screenshot 2025-11-17 110612"
src="https://github.com/user-attachments/assets/c67fc239-a227-417e-94bb-599a8ed8f11e"
/>
<img width="1684" height="168" alt="Screenshot 2025-11-17 110637"
src="https://github.com/user-attachments/assets/d18c3370-830d-4971-8746-04757ae2f709"
/>
<img width="1655" height="293" alt="Screenshot 2025-11-17 110719"
src="https://github.com/user-attachments/assets/d21f6ce9-c23e-4842-baf6-8938b77c16db"
/>
2025-11-18 11:38:18 -08:00
Ahmed Ibrahim
3de8790714 Add the utility to truncate by tokens (#6746)
- This PR is to make it on path for truncating by tokens. This path will
be initially used by unified exec and context manager (responsible for
MCP calls mainly).
- We are exposing new config `calls_output_max_tokens`
- Use `tokens` as the main budget unit but truncate based on the model
family by Introducing `TruncationPolicy`.
- Introduce `truncate_text` as a router for truncation based on the
mode.

In next PRs:
- remove truncate_with_line_bytes_budget
- Add the ability to the model to override the token budget.
2025-11-18 11:36:23 -08:00
Alejandro Peña
b035c604b0 Update faq.md section on supported models (#6832)
Update faq.md to recommend usage of GPT-5.1 Codex, the latest Codex
model from OpenAI.
2025-11-18 09:38:45 -08:00
zhao-oai
e9e644a119 fixing localshell tool calls (#6823)
- Local-shell tool responses were always tagged as
`ExecCommandSource::UserShell` because handler would call
`run_exec_like` with `is_user_shell_cmd` set to true.
- Treat `ToolPayload::LocalShell` the same as other model generated
shell tool calls by deleting `is_user_shell_cmd` from `run_exec_like`
(since actual user shell commands follow a separate code path)
2025-11-18 17:28:26 +00:00
jif-oai
f5d9939cda feat: enable parallel tool calls (#6796) 2025-11-18 17:10:14 +00:00
jif-oai
838531d3e4 feat: remote compaction (#6795)
Co-authored-by: pakrym-oai <pakrym@openai.com>
2025-11-18 16:51:16 +00:00
jif-oai
0eb2e6f9ee nit: app server (#6830) 2025-11-18 16:34:13 +00:00
jif-oai
c20df79a38 nit: mark ghost commit as stable (#6833) 2025-11-18 16:05:49 +00:00
jif-oai
fc55fd7a81 feat: git branch tooling (#6831) 2025-11-18 15:26:09 +00:00
Lael
f3d4e210d8 🐛 fix(rmcp-client): refresh OAuth tokens using expires_at (#6574)
## Summary
- persist OAuth credential expiry timestamps and rehydrate `expires_in`
- proactively refresh rmcp OAuth tokens when `expires_at` is near, then
persist

## Testing
- just fmt
- just fix -p codex-rmcp-client
- cargo test -p codex-rmcp-client

Fixes #6572
2025-11-18 02:16:58 -05:00
Dylan Hurd
28ebe1c97a fix(windows) shell_command on windows, minor parsing (#6811)
## Summary
Enables shell_command for windows users, and starts adding some basic
command parsing here, to at least remove powershell prefixes. We'll
follow this up with command parsing but I wanted to land this change
separately with some basic UX.

**NOTE**: This implementation parses bash and powershell on both
platforms. In theory this is possible, since you can use git bash on
windows or powershell on linux. In practice, this may not be worth the
complexity of supporting, so I don't feel strongly about the current
approach vs. platform-specific branching.

## Testing
- [x] Added a bunch of tests 
- [x] Ran on both windows and os x
2025-11-17 22:23:53 -08:00
85 changed files with 2685 additions and 1360 deletions

View File

@@ -69,6 +69,39 @@ Codex can access MCP servers. To configure them, refer to the [config docs](./do
Codex CLI supports a rich set of configuration options, with preferences stored in `~/.codex/config.toml`. For full configuration options, see [Configuration](./docs/config.md).
### Execpolicy quickstart
Codex can enforce your own rules-based execution policy before it runs shell commands.
1. Create a policy directory: `mkdir -p ~/.codex/policy`.
2. Create one or more `.codexpolicy` files into that folder. Codex automatically loads every `.codexpolicy` file in there on startup.
3. Write `prefix_rule` entries to describe the commands you want to allow, prompt, or block:
```starlark
prefix_rule(
pattern = ["git", ["push", "fetch"]],
decision = "prompt", # allow | prompt | forbidden
match = [["git", "push", "origin", "main"]], # examples that must match
not_match = [["git", "status"]], # examples that must not match
)
```
- `pattern` is a list of shell tokens, evaluated from left to right; wrap tokens in a nested list to express alternatives (e.g., match both `push` and `fetch`).
- `decision` sets the severity; Codex picks the strictest decision when multiple rules match.
- `match` and `not_match` act as (optional) unit tests. Codex validates them when it loads your policy, so you get feedback if an example has unexpected behavior.
In this example rule, if Codex wants to run commands with the prefix `git push` or `git fetch`, it will first ask for user approval.
Note: If Codex wants to run a command that matches with multiple rules, it will use the strictest decision among the matched rules (forbidden > prompt > allow).
Use the `codex execpolicycheck` subcommand to preview decisions before you save a rule (see the [`execpolicy2` README](./codex-rs/execpolicy2/README.md) for syntax details):
```shell
codex execpolicycheck --policy ~/.codex/policy/default.codexpolicy git push origin main
```
Pass multiple `--policy` flags to test how several files combine, and use `--pretty` for formatted JSON output. See the [`codex-rs/execpolicy2` README](./codex-rs/execpolicy2/README.md) for a more detailed walkthrough of the available syntax.
---
### Docs & FAQ

1
codex-rs/Cargo.lock generated
View File

@@ -989,6 +989,7 @@ dependencies = [
"codex-common",
"codex-core",
"codex-exec",
"codex-execpolicy2",
"codex-login",
"codex-mcp-server",
"codex-process-hardening",

View File

@@ -708,6 +708,7 @@ mod tests {
use uuid::Uuid;
#[test]
#[ignore = "timing out"]
fn generated_ts_has_no_optional_nullable_fields() -> Result<()> {
// Assert that there are no types of the form "?: T | null" in the generated TS files.
let output_dir = std::env::temp_dir().join(format!("codex_ts_types_{}", Uuid::now_v7()));

View File

@@ -158,8 +158,8 @@ struct ActiveLogin {
login_id: Uuid,
}
impl ActiveLogin {
fn drop(&self) {
impl Drop for ActiveLogin {
fn drop(&mut self) {
self.shutdown_handle.shutdown();
}
}
@@ -417,7 +417,7 @@ impl CodexMessageProcessor {
{
let mut guard = self.active_login.lock().await;
if let Some(active) = guard.take() {
active.drop();
drop(active);
}
}
@@ -525,7 +525,7 @@ impl CodexMessageProcessor {
{
let mut guard = self.active_login.lock().await;
if let Some(existing) = guard.take() {
existing.drop();
drop(existing);
}
*guard = Some(ActiveLogin {
shutdown_handle: shutdown_handle.clone(),
@@ -615,7 +615,7 @@ impl CodexMessageProcessor {
{
let mut guard = self.active_login.lock().await;
if let Some(existing) = guard.take() {
existing.drop();
drop(existing);
}
*guard = Some(ActiveLogin {
shutdown_handle: shutdown_handle.clone(),
@@ -704,7 +704,7 @@ impl CodexMessageProcessor {
let mut guard = self.active_login.lock().await;
if guard.as_ref().map(|l| l.login_id) == Some(login_id) {
if let Some(active) = guard.take() {
active.drop();
drop(active);
}
Ok(())
} else {
@@ -758,7 +758,7 @@ impl CodexMessageProcessor {
{
let mut guard = self.active_login.lock().await;
if let Some(active) = guard.take() {
active.drop();
drop(active);
}
}

View File

@@ -19,6 +19,10 @@ pub(crate) async fn run_fuzzy_file_search(
roots: Vec<String>,
cancellation_flag: Arc<AtomicBool>,
) -> Vec<FuzzyFileSearchResult> {
if roots.is_empty() {
return Vec::new();
}
#[expect(clippy::expect_used)]
let limit_per_root =
NonZero::new(LIMIT_PER_ROOT).expect("LIMIT_PER_ROOT should be a valid non-zero usize");

View File

@@ -47,7 +47,7 @@ pub async fn run_main(
) -> IoResult<()> {
// Set up channels.
let (incoming_tx, mut incoming_rx) = mpsc::channel::<JSONRPCMessage>(CHANNEL_CAPACITY);
let (outgoing_tx, mut outgoing_rx) = mpsc::unbounded_channel::<OutgoingMessage>();
let (outgoing_tx, mut outgoing_rx) = mpsc::channel::<OutgoingMessage>(CHANNEL_CAPACITY);
// Task: read from stdin, push to `incoming_tx`.
let stdin_reader_handle = tokio::spawn({

View File

@@ -19,12 +19,12 @@ use crate::error_code::INTERNAL_ERROR_CODE;
/// Sends messages to the client and manages request callbacks.
pub(crate) struct OutgoingMessageSender {
next_request_id: AtomicI64,
sender: mpsc::UnboundedSender<OutgoingMessage>,
sender: mpsc::Sender<OutgoingMessage>,
request_id_to_callback: Mutex<HashMap<RequestId, oneshot::Sender<Result>>>,
}
impl OutgoingMessageSender {
pub(crate) fn new(sender: mpsc::UnboundedSender<OutgoingMessage>) -> Self {
pub(crate) fn new(sender: mpsc::Sender<OutgoingMessage>) -> Self {
Self {
next_request_id: AtomicI64::new(0),
sender,
@@ -45,8 +45,12 @@ impl OutgoingMessageSender {
}
let outgoing_message =
OutgoingMessage::Request(request.request_with_id(outgoing_message_id));
let _ = self.sender.send(outgoing_message);
OutgoingMessage::Request(request.request_with_id(outgoing_message_id.clone()));
if let Err(err) = self.sender.send(outgoing_message).await {
warn!("failed to send request {outgoing_message_id:?} to client: {err:?}");
let mut request_id_to_callback = self.request_id_to_callback.lock().await;
request_id_to_callback.remove(&outgoing_message_id);
}
rx_approve
}
@@ -72,7 +76,9 @@ impl OutgoingMessageSender {
match serde_json::to_value(response) {
Ok(result) => {
let outgoing_message = OutgoingMessage::Response(OutgoingResponse { id, result });
let _ = self.sender.send(outgoing_message);
if let Err(err) = self.sender.send(outgoing_message).await {
warn!("failed to send response to client: {err:?}");
}
}
Err(err) => {
self.send_error(
@@ -89,21 +95,29 @@ impl OutgoingMessageSender {
}
pub(crate) async fn send_server_notification(&self, notification: ServerNotification) {
let _ = self
if let Err(err) = self
.sender
.send(OutgoingMessage::AppServerNotification(notification));
.send(OutgoingMessage::AppServerNotification(notification))
.await
{
warn!("failed to send server notification to client: {err:?}");
}
}
/// All notifications should be migrated to [`ServerNotification`] and
/// [`OutgoingMessage::Notification`] should be removed.
pub(crate) async fn send_notification(&self, notification: OutgoingNotification) {
let outgoing_message = OutgoingMessage::Notification(notification);
let _ = self.sender.send(outgoing_message);
if let Err(err) = self.sender.send(outgoing_message).await {
warn!("failed to send notification to client: {err:?}");
}
}
pub(crate) async fn send_error(&self, id: RequestId, error: JSONRPCErrorError) {
let outgoing_message = OutgoingMessage::Error(OutgoingError { id, error });
let _ = self.sender.send(outgoing_message);
if let Err(err) = self.sender.send(outgoing_message).await {
warn!("failed to send error to client: {err:?}");
}
}
}

View File

@@ -26,6 +26,7 @@ codex-cloud-tasks = { path = "../cloud-tasks" }
codex-common = { workspace = true, features = ["cli"] }
codex-core = { workspace = true }
codex-exec = { workspace = true }
codex-execpolicy2 = { workspace = true }
codex-login = { workspace = true }
codex-mcp-server = { workspace = true }
codex-process-hardening = { workspace = true }

View File

@@ -18,6 +18,7 @@ use codex_cli::login::run_logout;
use codex_cloud_tasks::Cli as CloudTasksCli;
use codex_common::CliConfigOverrides;
use codex_exec::Cli as ExecCli;
use codex_execpolicy2::ExecPolicyCheckCommand;
use codex_responses_api_proxy::Args as ResponsesApiProxyArgs;
use codex_tui::AppExitInfo;
use codex_tui::Cli as TuiCli;
@@ -112,6 +113,10 @@ enum Subcommand {
#[clap(hide = true, name = "stdio-to-uds")]
StdioToUds(StdioToUdsCommand),
/// Check execpolicy files against a command.
#[clap(name = "execpolicycheck")]
ExecPolicyCheck(ExecPolicyCheckCommand),
/// Inspect feature flags.
Features(FeaturesCli),
}
@@ -323,6 +328,12 @@ fn run_update_action(action: UpdateAction) -> anyhow::Result<()> {
Ok(())
}
fn run_execpolicycheck(cmd: ExecPolicyCheckCommand) -> anyhow::Result<()> {
let json = cmd.run()?;
println!("{json}");
Ok(())
}
#[derive(Debug, Default, Parser, Clone)]
struct FeatureToggles {
/// Enable a feature (repeatable). Equivalent to `-c features.<name>=true`.
@@ -559,6 +570,7 @@ async fn cli_main(codex_linux_sandbox_exe: Option<PathBuf>) -> anyhow::Result<()
tokio::task::spawn_blocking(move || codex_stdio_to_uds::run(socket_path.as_path()))
.await??;
}
Some(Subcommand::ExecPolicyCheck(cmd)) => run_execpolicycheck(cmd)?,
Some(Subcommand::Features(FeaturesCli { sub })) => match sub {
FeaturesSubcommand::List => {
// Respect root-level `-c` overrides plus top-level flags like `--profile`.

View File

@@ -24,21 +24,21 @@ pub fn builtin_approval_presets() -> Vec<ApprovalPreset> {
ApprovalPreset {
id: "read-only",
label: "Read Only",
description: "Codex can read files and answer questions. Codex requires approval to make edits, run commands, or access network.",
description: "Requires approval to edit files and run commands.",
approval: AskForApproval::OnRequest,
sandbox: SandboxPolicy::ReadOnly,
},
ApprovalPreset {
id: "auto",
label: "Auto",
description: "Codex can read files, make edits, and run commands in the workspace. Codex requires approval to work outside the workspace or access network.",
label: "Agent",
description: "Read and edit files, and run commands.",
approval: AskForApproval::OnRequest,
sandbox: SandboxPolicy::new_workspace_write_policy(),
},
ApprovalPreset {
id: "full-access",
label: "Full Access",
description: "Codex can read files, make edits, and run commands with network access, without approval. Exercise caution.",
label: "Agent (full access)",
description: "Codex can edit files outside this workspace and run commands with network access. Exercise caution when using.",
approval: AskForApproval::Never,
sandbox: SandboxPolicy::DangerFullAccess,
},

View File

@@ -81,6 +81,7 @@ pub(crate) async fn stream_chat_completions(
ResponseItem::CustomToolCallOutput { .. } => {}
ResponseItem::WebSearchCall { .. } => {}
ResponseItem::GhostSnapshot { .. } => {}
ResponseItem::CompactionSummary { .. } => {}
}
}
@@ -320,7 +321,8 @@ pub(crate) async fn stream_chat_completions(
}
ResponseItem::Reasoning { .. }
| ResponseItem::WebSearchCall { .. }
| ResponseItem::Other => {
| ResponseItem::Other
| ResponseItem::CompactionSummary { .. } => {
// Omit these items from the conversation history.
continue;
}

View File

@@ -26,6 +26,7 @@ use tokio::sync::mpsc;
use tokio::time::timeout;
use tokio_util::io::ReaderStream;
use tracing::debug;
use tracing::enabled;
use tracing::trace;
use tracing::warn;
@@ -78,6 +79,18 @@ struct Error {
resets_at: Option<i64>,
}
#[derive(Debug, Serialize)]
struct CompactHistoryRequest<'a> {
model: &'a str,
input: &'a [ResponseItem],
instructions: &'a str,
}
#[derive(Debug, Deserialize)]
struct CompactHistoryResponse {
output: Vec<ResponseItem>,
}
#[derive(Debug, Clone)]
pub struct ModelClient {
config: Arc<Config>,
@@ -507,6 +520,70 @@ impl ModelClient {
pub fn get_auth_manager(&self) -> Option<Arc<AuthManager>> {
self.auth_manager.clone()
}
pub async fn compact_conversation_history(&self, prompt: &Prompt) -> Result<Vec<ResponseItem>> {
if prompt.input.is_empty() {
return Ok(Vec::new());
}
let auth_manager = self.auth_manager.clone();
let auth = auth_manager.as_ref().and_then(|m| m.auth());
let mut req_builder = self
.provider
.create_compact_request_builder(&self.client, &auth)
.await?;
if let SessionSource::SubAgent(sub) = &self.session_source {
let subagent = if let crate::protocol::SubAgentSource::Other(label) = sub {
label.clone()
} else {
serde_json::to_value(sub)
.ok()
.and_then(|v| v.as_str().map(std::string::ToString::to_string))
.unwrap_or_else(|| "other".to_string())
};
req_builder = req_builder.header("x-openai-subagent", subagent);
}
if let Some(auth) = auth.as_ref()
&& auth.mode == AuthMode::ChatGPT
&& let Some(account_id) = auth.get_account_id()
{
req_builder = req_builder.header("chatgpt-account-id", account_id);
}
let payload = CompactHistoryRequest {
model: &self.config.model,
input: &prompt.input,
instructions: &prompt.get_full_instructions(&self.config.model_family),
};
if enabled!(tracing::Level::TRACE) {
trace!(
"POST to {}: {}",
self.provider
.get_compact_url(&auth)
.unwrap_or("<none>".to_string()),
serde_json::to_value(&payload).unwrap_or_default()
);
}
let response = req_builder
.json(&payload)
.send()
.await
.map_err(|source| CodexErr::ConnectionFailed(ConnectionFailedError { source }))?;
let status = response.status();
let body = response
.text()
.await
.map_err(|source| CodexErr::ConnectionFailed(ConnectionFailedError { source }))?;
if !status.is_success() {
return Err(CodexErr::UnexpectedStatus(UnexpectedResponseError {
status,
body,
request_id: None,
}));
}
let CompactHistoryResponse { output } = serde_json::from_str(&body)?;
Ok(output)
}
}
enum StreamAttemptError {

View File

@@ -13,6 +13,7 @@ use crate::parse_command::parse_command;
use crate::parse_turn_item;
use crate::response_processing::process_items;
use crate::terminal;
use crate::truncate::TruncationPolicy;
use crate::user_notification::UserNotifier;
use crate::util::error_or_panic;
use async_channel::Receiver;
@@ -55,6 +56,7 @@ use crate::ModelProviderInfo;
use crate::client::ModelClient;
use crate::client_common::Prompt;
use crate::client_common::ResponseEvent;
use crate::compact::collect_user_messages;
use crate::config::Config;
use crate::config::types::ShellEnvironmentPolicy;
use crate::context_manager::ContextManager;
@@ -63,10 +65,6 @@ use crate::error::CodexErr;
use crate::error::Result as CodexResult;
#[cfg(test)]
use crate::exec::StreamOutput;
// Removed: legacy executor wiring replaced by ToolOrchestrator flows.
// legacy normalize_exec_result no longer used after orchestrator migration
use crate::compact::build_compacted_history;
use crate::compact::collect_user_messages;
use crate::mcp::auth::compute_auth_statuses;
use crate::mcp_connection_manager::McpConnectionManager;
use crate::model_family::find_family_for_model;
@@ -120,7 +118,7 @@ use crate::user_instructions::UserInstructions;
use crate::user_notification::UserNotification;
use crate::util::backoff;
use codex_async_utils::OrCancelExt;
use codex_execpolicy2::Policy as ExecPolicyV2;
use codex_execpolicy2::Policy as ExecPolicy;
use codex_otel::otel_event_manager::OtelEventManager;
use codex_protocol::config_types::ReasoningEffort as ReasoningEffortConfig;
use codex_protocol::config_types::ReasoningSummary as ReasoningSummaryConfig;
@@ -166,9 +164,9 @@ impl Codex {
let user_instructions = get_user_instructions(&config).await;
let exec_policy_v2 =
crate::exec_policy::exec_policy_for(&config.features, &config.codex_home)
.map_err(|err| CodexErr::Fatal(format!("failed to load execpolicy2: {err}")))?;
let exec_policy = crate::exec_policy::exec_policy_for(&config.features, &config.codex_home)
.await
.map_err(|err| CodexErr::Fatal(format!("failed to load execpolicy: {err}")))?;
let config = Arc::new(config);
@@ -186,7 +184,7 @@ impl Codex {
cwd: config.cwd.clone(),
original_config_do_not_use: Arc::clone(&config),
features: config.features.clone(),
exec_policy_v2,
exec_policy,
session_source,
};
@@ -284,7 +282,8 @@ pub(crate) struct TurnContext {
pub(crate) final_output_json_schema: Option<Value>,
pub(crate) codex_linux_sandbox_exe: Option<PathBuf>,
pub(crate) tool_call_gate: Arc<ReadinessFlag>,
pub(crate) exec_policy_v2: Option<Arc<ExecPolicyV2>>,
pub(crate) exec_policy: Arc<ExecPolicy>,
pub(crate) truncation_policy: TruncationPolicy,
}
impl TurnContext {
@@ -341,8 +340,8 @@ pub(crate) struct SessionConfiguration {
/// Set of feature flags for this session
features: Features,
/// Optional execpolicy2 policy, applied only when enabled by feature flag.
exec_policy_v2: Option<Arc<ExecPolicyV2>>,
/// Execpolicy policy, applied only when enabled by feature flag.
exec_policy: Arc<ExecPolicy>,
// TODO(pakrym): Remove config from here
original_config_do_not_use: Arc<Config>,
@@ -413,7 +412,7 @@ impl Session {
);
let client = ModelClient::new(
Arc::new(per_turn_config),
Arc::new(per_turn_config.clone()),
auth_manager,
otel_event_manager,
provider,
@@ -443,7 +442,8 @@ impl Session {
final_output_json_schema: None,
codex_linux_sandbox_exe: config.codex_linux_sandbox_exe.clone(),
tool_call_gate: Arc::new(ReadinessFlag::new()),
exec_policy_v2: session_configuration.exec_policy_v2.clone(),
exec_policy: session_configuration.exec_policy.clone(),
truncation_policy: TruncationPolicy::new(&per_turn_config),
}
}
@@ -691,7 +691,8 @@ impl Session {
let reconstructed_history =
self.reconstruct_history_from_rollout(&turn_context, &rollout_items);
if !reconstructed_history.is_empty() {
self.record_into_history(&reconstructed_history).await;
self.record_into_history(&reconstructed_history, &turn_context)
.await;
}
// If persisting, persist all rollout items as-is (recorder filters)
@@ -948,7 +949,7 @@ impl Session {
turn_context: &TurnContext,
items: &[ResponseItem],
) {
self.record_into_history(items).await;
self.record_into_history(items, turn_context).await;
self.persist_rollout_response_items(items).await;
self.send_raw_response_items(turn_context, items).await;
}
@@ -962,17 +963,25 @@ impl Session {
for item in rollout_items {
match item {
RolloutItem::ResponseItem(response_item) => {
history.record_items(std::iter::once(response_item));
history.record_items(
std::iter::once(response_item),
turn_context.truncation_policy,
);
}
RolloutItem::Compacted(compacted) => {
let snapshot = history.get_history();
let user_messages = collect_user_messages(&snapshot);
let rebuilt = build_compacted_history(
self.build_initial_context(turn_context),
&user_messages,
&compacted.message,
);
history.replace(rebuilt);
// TODO(jif) clean
if let Some(replacement) = &compacted.replacement_history {
history.replace(replacement.clone());
} else {
let user_messages = collect_user_messages(&snapshot);
let rebuilt = compact::build_compacted_history(
self.build_initial_context(turn_context),
&user_messages,
&compacted.message,
);
history.replace(rebuilt);
}
}
_ => {}
}
@@ -981,9 +990,13 @@ impl Session {
}
/// Append ResponseItems to the in-memory conversation history only.
pub(crate) async fn record_into_history(&self, items: &[ResponseItem]) {
pub(crate) async fn record_into_history(
&self,
items: &[ResponseItem],
turn_context: &TurnContext,
) {
let mut state = self.state.lock().await;
state.record_items(items.iter());
state.record_items(items.iter(), turn_context.truncation_policy);
}
pub(crate) async fn replace_history(&self, items: Vec<ResponseItem>) {
@@ -1000,6 +1013,15 @@ impl Session {
self.persist_rollout_items(&rollout_items).await;
}
pub async fn enabled(&self, feature: Feature) -> bool {
self.state
.lock()
.await
.session_configuration
.features
.enabled(feature)
}
async fn send_raw_response_items(&self, turn_context: &TurnContext, items: &[ResponseItem]) {
for item in items {
self.send_event(
@@ -1177,14 +1199,7 @@ impl Session {
turn_context: Arc<TurnContext>,
cancellation_token: CancellationToken,
) {
if !self
.state
.lock()
.await
.session_configuration
.features
.enabled(Feature::GhostCommit)
{
if !self.enabled(Feature::GhostCommit).await {
return;
}
let token = match turn_context.tool_call_gate.subscribe().await {
@@ -1425,7 +1440,7 @@ mod handlers {
use codex_protocol::protocol::ReviewDecision;
use codex_protocol::protocol::ReviewRequest;
use codex_protocol::protocol::TurnAbortReason;
use codex_protocol::user_input::UserInput;
use std::sync::Arc;
use tracing::info;
use tracing::warn;
@@ -1633,16 +1648,9 @@ mod handlers {
let turn_context = sess
.new_turn_with_sub_id(sub_id, SessionSettingsUpdate::default())
.await;
// Attempt to inject input into current task
if let Err(items) = sess
.inject_input(vec![UserInput::Text {
text: turn_context.compact_prompt().to_string(),
}])
.await
{
sess.spawn_task(Arc::clone(&turn_context), items, CompactTask)
.await;
}
sess.spawn_task(Arc::clone(&turn_context), vec![], CompactTask)
.await;
}
pub async fn shutdown(sess: &Arc<Session>, sub_id: String) -> bool {
@@ -1768,7 +1776,8 @@ async fn spawn_review_thread(
final_output_json_schema: None,
codex_linux_sandbox_exe: parent_turn_context.codex_linux_sandbox_exe.clone(),
tool_call_gate: Arc::new(ReadinessFlag::new()),
exec_policy_v2: parent_turn_context.exec_policy_v2.clone(),
exec_policy: parent_turn_context.exec_policy.clone(),
truncation_policy: TruncationPolicy::new(&per_turn_config),
};
// Seed the child task with the review prompt as the initial user message.
@@ -1948,12 +1957,30 @@ async fn run_turn(
.client
.get_model_family()
.supports_parallel_tool_calls;
let parallel_tool_calls = model_supports_parallel;
// TODO(jif) revert once testing phase is done.
let parallel_tool_calls = model_supports_parallel
&& sess
.state
.lock()
.await
.session_configuration
.features
.enabled(Feature::ParallelToolCalls);
let mut base_instructions = turn_context.base_instructions.clone();
if parallel_tool_calls {
static INSTRUCTIONS: &str = include_str!("../templates/parallel/instructions.md");
static INSERTION_SPOT: &str = "## Editing constraints";
base_instructions
.as_mut()
.map(|base| base.replace(INSERTION_SPOT, INSTRUCTIONS));
}
let prompt = Prompt {
input,
tools: router.specs(),
parallel_tool_calls,
base_instructions_override: turn_context.base_instructions.clone(),
base_instructions_override: base_instructions,
output_schema: turn_context.final_output_json_schema.clone(),
};
@@ -2557,7 +2584,7 @@ mod tests {
cwd: config.cwd.clone(),
original_config_do_not_use: Arc::clone(&config),
features: Features::default(),
exec_policy_v2: None,
exec_policy: Arc::new(codex_execpolicy2::PolicyParser::new().build()),
session_source: SessionSource::Exec,
};
@@ -2635,7 +2662,7 @@ mod tests {
cwd: config.cwd.clone(),
original_config_do_not_use: Arc::clone(&config),
features: Features::default(),
exec_policy_v2: None,
exec_policy: Arc::new(codex_execpolicy2::PolicyParser::new().build()),
session_source: SessionSource::Exec,
};
@@ -2884,7 +2911,7 @@ mod tests {
for item in &initial_context {
rollout_items.push(RolloutItem::ResponseItem(item.clone()));
}
live_history.record_items(initial_context.iter());
live_history.record_items(initial_context.iter(), turn_context.truncation_policy);
let user1 = ResponseItem::Message {
id: None,
@@ -2893,7 +2920,7 @@ mod tests {
text: "first user".to_string(),
}],
};
live_history.record_items(std::iter::once(&user1));
live_history.record_items(std::iter::once(&user1), turn_context.truncation_policy);
rollout_items.push(RolloutItem::ResponseItem(user1.clone()));
let assistant1 = ResponseItem::Message {
@@ -2903,13 +2930,13 @@ mod tests {
text: "assistant reply one".to_string(),
}],
};
live_history.record_items(std::iter::once(&assistant1));
live_history.record_items(std::iter::once(&assistant1), turn_context.truncation_policy);
rollout_items.push(RolloutItem::ResponseItem(assistant1.clone()));
let summary1 = "summary one";
let snapshot1 = live_history.get_history();
let user_messages1 = collect_user_messages(&snapshot1);
let rebuilt1 = build_compacted_history(
let rebuilt1 = compact::build_compacted_history(
session.build_initial_context(turn_context),
&user_messages1,
summary1,
@@ -2917,6 +2944,7 @@ mod tests {
live_history.replace(rebuilt1);
rollout_items.push(RolloutItem::Compacted(CompactedItem {
message: summary1.to_string(),
replacement_history: None,
}));
let user2 = ResponseItem::Message {
@@ -2926,7 +2954,7 @@ mod tests {
text: "second user".to_string(),
}],
};
live_history.record_items(std::iter::once(&user2));
live_history.record_items(std::iter::once(&user2), turn_context.truncation_policy);
rollout_items.push(RolloutItem::ResponseItem(user2.clone()));
let assistant2 = ResponseItem::Message {
@@ -2936,13 +2964,13 @@ mod tests {
text: "assistant reply two".to_string(),
}],
};
live_history.record_items(std::iter::once(&assistant2));
live_history.record_items(std::iter::once(&assistant2), turn_context.truncation_policy);
rollout_items.push(RolloutItem::ResponseItem(assistant2.clone()));
let summary2 = "summary two";
let snapshot2 = live_history.get_history();
let user_messages2 = collect_user_messages(&snapshot2);
let rebuilt2 = build_compacted_history(
let rebuilt2 = compact::build_compacted_history(
session.build_initial_context(turn_context),
&user_messages2,
summary2,
@@ -2950,6 +2978,7 @@ mod tests {
live_history.replace(rebuilt2);
rollout_items.push(RolloutItem::Compacted(CompactedItem {
message: summary2.to_string(),
replacement_history: None,
}));
let user3 = ResponseItem::Message {
@@ -2959,7 +2988,7 @@ mod tests {
text: "third user".to_string(),
}],
};
live_history.record_items(std::iter::once(&user3));
live_history.record_items(std::iter::once(&user3), turn_context.truncation_policy);
rollout_items.push(RolloutItem::ResponseItem(user3.clone()));
let assistant3 = ResponseItem::Message {
@@ -2969,7 +2998,7 @@ mod tests {
text: "assistant reply three".to_string(),
}],
};
live_history.record_items(std::iter::once(&assistant3));
live_history.record_items(std::iter::once(&assistant3), turn_context.truncation_policy);
rollout_items.push(RolloutItem::ResponseItem(assistant3.clone()));
(rollout_items, live_history.get_history())

View File

@@ -1,6 +1,8 @@
use codex_protocol::protocol::AskForApproval;
use codex_protocol::protocol::SandboxPolicy;
use crate::sandboxing::SandboxPermissions;
use crate::bash::parse_shell_lc_plain_commands;
use crate::is_safe_command::is_known_safe_command;
@@ -8,7 +10,7 @@ pub fn requires_initial_appoval(
policy: AskForApproval,
sandbox_policy: &SandboxPolicy,
command: &[String],
with_escalated_permissions: bool,
sandbox_permissions: SandboxPermissions,
) -> bool {
if is_known_safe_command(command) {
return false;
@@ -24,8 +26,7 @@ pub fn requires_initial_appoval(
// In restricted sandboxes (ReadOnly/WorkspaceWrite), do not prompt for
// nonescalated, nondangerous commands — let the sandbox enforce
// restrictions (e.g., block network/write) without a user prompt.
let wants_escalation: bool = with_escalated_permissions;
if wants_escalation {
if sandbox_permissions.requires_escalated_permissions() {
return true;
}
command_might_be_dangerous(command)

View File

@@ -14,7 +14,9 @@ use crate::protocol::EventMsg;
use crate::protocol::TaskStartedEvent;
use crate::protocol::TurnContextItem;
use crate::protocol::WarningEvent;
use crate::truncate::truncate_middle;
use crate::truncate::TruncationPolicy;
use crate::truncate::approx_token_count;
use crate::truncate::truncate_text;
use crate::util::backoff;
use codex_protocol::items::TurnItem;
use codex_protocol::models::ContentItem;
@@ -59,7 +61,10 @@ async fn run_compact_task_inner(
let initial_input_for_turn: ResponseInputItem = ResponseInputItem::from(input);
let mut history = sess.clone_history().await;
history.record_items(&[initial_input_for_turn.into()]);
history.record_items(
&[initial_input_for_turn.into()],
turn_context.truncation_policy,
);
let mut truncated_count = 0usize;
@@ -167,6 +172,7 @@ async fn run_compact_task_inner(
let rollout_item = RolloutItem::Compacted(CompactedItem {
message: summary_text.clone(),
replacement_history: None,
});
sess.persist_rollout_items(&[rollout_item]).await;
@@ -229,7 +235,7 @@ pub(crate) fn build_compacted_history(
initial_context,
user_messages,
summary_text,
COMPACT_USER_MESSAGE_MAX_TOKENS * 4,
COMPACT_USER_MESSAGE_MAX_TOKENS,
)
}
@@ -237,20 +243,21 @@ fn build_compacted_history_with_limit(
mut history: Vec<ResponseItem>,
user_messages: &[String],
summary_text: &str,
max_bytes: usize,
max_tokens: usize,
) -> Vec<ResponseItem> {
let mut selected_messages: Vec<String> = Vec::new();
if max_bytes > 0 {
let mut remaining = max_bytes;
if max_tokens > 0 {
let mut remaining = max_tokens;
for message in user_messages.iter().rev() {
if remaining == 0 {
break;
}
if message.len() <= remaining {
let tokens = approx_token_count(message);
if tokens <= remaining {
selected_messages.push(message.clone());
remaining = remaining.saturating_sub(message.len());
remaining = remaining.saturating_sub(tokens);
} else {
let (truncated, _) = truncate_middle(message, remaining);
let truncated = truncate_text(message, TruncationPolicy::Tokens(remaining));
selected_messages.push(truncated);
break;
}
@@ -299,7 +306,8 @@ async fn drain_to_completed(
};
match event {
Ok(ResponseEvent::OutputItemDone(item)) => {
sess.record_into_history(std::slice::from_ref(&item)).await;
sess.record_into_history(std::slice::from_ref(&item), turn_context)
.await;
}
Ok(ResponseEvent::RateLimits(snapshot)) => {
sess.update_rate_limits(turn_context, snapshot).await;
@@ -317,6 +325,7 @@ async fn drain_to_completed(
#[cfg(test)]
mod tests {
use super::*;
use pretty_assertions::assert_eq;
@@ -408,16 +417,16 @@ mod tests {
}
#[test]
fn build_compacted_history_truncates_overlong_user_messages() {
fn build_token_limited_compacted_history_truncates_overlong_user_messages() {
// Use a small truncation limit so the test remains fast while still validating
// that oversized user content is truncated.
let max_bytes = 128;
let big = "X".repeat(max_bytes + 50);
let max_tokens = 16;
let big = "word ".repeat(200);
let history = super::build_compacted_history_with_limit(
Vec::new(),
std::slice::from_ref(&big),
"SUMMARY",
max_bytes,
max_tokens,
);
assert_eq!(history.len(), 2);
@@ -450,7 +459,7 @@ mod tests {
}
#[test]
fn build_compacted_history_appends_summary_message() {
fn build_token_limited_compacted_history_appends_summary_message() {
let initial_context: Vec<ResponseItem> = Vec::new();
let user_messages = vec!["first user message".to_string()];
let summary_text = "summary text";

View File

@@ -0,0 +1,100 @@
use std::sync::Arc;
use crate::Prompt;
use crate::codex::Session;
use crate::codex::TurnContext;
use crate::error::Result as CodexResult;
use crate::protocol::AgentMessageEvent;
use crate::protocol::CompactedItem;
use crate::protocol::ErrorEvent;
use crate::protocol::EventMsg;
use crate::protocol::RolloutItem;
use crate::protocol::TaskStartedEvent;
use codex_protocol::models::ResponseInputItem;
use codex_protocol::models::ResponseItem;
use codex_protocol::user_input::UserInput;
pub(crate) async fn run_remote_compact_task(
sess: Arc<Session>,
turn_context: Arc<TurnContext>,
input: Vec<UserInput>,
) -> Option<String> {
let start_event = EventMsg::TaskStarted(TaskStartedEvent {
model_context_window: turn_context.client.get_model_context_window(),
});
sess.send_event(&turn_context, start_event).await;
match run_remote_compact_task_inner(&sess, &turn_context, input).await {
Ok(()) => {
let event = EventMsg::AgentMessage(AgentMessageEvent {
message: "Compact task completed".to_string(),
});
sess.send_event(&turn_context, event).await;
}
Err(err) => {
let event = EventMsg::Error(ErrorEvent {
message: err.to_string(),
});
sess.send_event(&turn_context, event).await;
}
}
None
}
async fn run_remote_compact_task_inner(
sess: &Arc<Session>,
turn_context: &Arc<TurnContext>,
input: Vec<UserInput>,
) -> CodexResult<()> {
let mut history = sess.clone_history().await;
if !input.is_empty() {
let initial_input_for_turn: ResponseInputItem = ResponseInputItem::from(input);
history.record_items(
&[initial_input_for_turn.into()],
turn_context.truncation_policy,
);
}
let prompt = Prompt {
input: history.get_history_for_prompt(),
tools: vec![],
parallel_tool_calls: false,
base_instructions_override: turn_context.base_instructions.clone(),
output_schema: None,
};
let mut new_history = turn_context
.client
.compact_conversation_history(&prompt)
.await?;
// Required to keep `/undo` available after compaction
let ghost_snapshots: Vec<ResponseItem> = history
.get_history()
.iter()
.filter(|item| matches!(item, ResponseItem::GhostSnapshot { .. }))
.cloned()
.collect();
if !ghost_snapshots.is_empty() {
new_history.extend(ghost_snapshots);
}
sess.replace_history(new_history.clone()).await;
if let Some(estimated_tokens) = sess
.clone_history()
.await
.estimate_token_count(turn_context.as_ref())
{
sess.override_last_token_usage_estimate(turn_context.as_ref(), estimated_tokens)
.await;
}
let compacted_item = CompactedItem {
message: String::new(),
replacement_history: Some(new_history),
};
sess.persist_rollout_items(&[RolloutItem::Compacted(compacted_item)])
.await;
Ok(())
}

View File

@@ -550,6 +550,15 @@ impl ConfigEditsBuilder {
self
}
/// Enable or disable a feature flag by key under the `[features]` table.
pub fn set_feature_enabled(mut self, key: &str, enabled: bool) -> Self {
self.edits.push(ConfigEdit::SetPath {
segments: vec!["features".to_string(), key.to_string()],
value: value(enabled),
});
self
}
/// Apply edits on a blocking thread.
pub fn apply_blocking(self) -> anyhow::Result<()> {
apply_blocking(&self.codex_home, self.profile.as_deref(), &self.edits)

View File

@@ -195,6 +195,9 @@ pub struct Config {
/// Additional filenames to try when looking for project-level docs.
pub project_doc_fallback_filenames: Vec<String>,
/// Token budget applied when storing tool/function outputs in the context manager.
pub tool_output_token_limit: Option<usize>,
/// Directory containing all Codex state (defaults to `~/.codex` but can be
/// overridden by the `CODEX_HOME` environment variable).
pub codex_home: PathBuf,
@@ -636,6 +639,9 @@ pub struct ConfigToml {
/// Ordered list of fallback filenames to look for when AGENTS.md is missing.
pub project_doc_fallback_filenames: Option<Vec<String>>,
/// Token budget applied when storing tool/function outputs in the context manager.
pub tool_output_token_limit: Option<usize>,
/// Profile to use from the `profiles` map.
pub profile: Option<String>,
@@ -1209,6 +1215,7 @@ impl Config {
}
})
.collect(),
tool_output_token_limit: cfg.tool_output_token_limit,
codex_home,
history,
file_opener: cfg.file_opener.unwrap_or(UriBasedFileOpener::VsCode),
@@ -1313,6 +1320,16 @@ impl Config {
Ok(Some(s))
}
}
pub fn set_windows_sandbox_globally(&mut self, value: bool) {
crate::safety::set_windows_sandbox_enabled(value);
if value {
self.features.enable(Feature::WindowsSandbox);
} else {
self.features.disable(Feature::WindowsSandbox);
}
self.forced_auto_mode_downgraded_on_windows = !value;
}
}
fn default_model() -> String {
@@ -2961,6 +2978,7 @@ model_verbosity = "high"
model_providers: fixture.model_provider_map.clone(),
project_doc_max_bytes: PROJECT_DOC_MAX_BYTES,
project_doc_fallback_filenames: Vec::new(),
tool_output_token_limit: None,
codex_home: fixture.codex_home(),
history: History::default(),
file_opener: UriBasedFileOpener::VsCode,
@@ -3032,6 +3050,7 @@ model_verbosity = "high"
model_providers: fixture.model_provider_map.clone(),
project_doc_max_bytes: PROJECT_DOC_MAX_BYTES,
project_doc_fallback_filenames: Vec::new(),
tool_output_token_limit: None,
codex_home: fixture.codex_home(),
history: History::default(),
file_opener: UriBasedFileOpener::VsCode,
@@ -3118,6 +3137,7 @@ model_verbosity = "high"
model_providers: fixture.model_provider_map.clone(),
project_doc_max_bytes: PROJECT_DOC_MAX_BYTES,
project_doc_fallback_filenames: Vec::new(),
tool_output_token_limit: None,
codex_home: fixture.codex_home(),
history: History::default(),
file_opener: UriBasedFileOpener::VsCode,
@@ -3190,6 +3210,7 @@ model_verbosity = "high"
model_providers: fixture.model_provider_map.clone(),
project_doc_max_bytes: PROJECT_DOC_MAX_BYTES,
project_doc_fallback_filenames: Vec::new(),
tool_output_token_limit: None,
codex_home: fixture.codex_home(),
history: History::default(),
file_opener: UriBasedFileOpener::VsCode,

View File

@@ -1,8 +1,8 @@
use crate::codex::TurnContext;
use crate::context_manager::normalize;
use crate::truncate;
use crate::truncate::format_output_for_model_body;
use crate::truncate::globally_truncate_function_output_items;
use crate::truncate::TruncationPolicy;
use crate::truncate::truncate_function_output_items_with_policy;
use crate::truncate::truncate_text;
use codex_protocol::models::FunctionCallOutputPayload;
use codex_protocol::models::ResponseItem;
use codex_protocol::protocol::TokenUsage;
@@ -10,12 +10,6 @@ use codex_protocol::protocol::TokenUsageInfo;
use codex_utils_tokenizer::Tokenizer;
use std::ops::Deref;
const CONTEXT_WINDOW_HARD_LIMIT_FACTOR: f64 = 1.1;
const CONTEXT_WINDOW_HARD_LIMIT_BYTES: usize =
(truncate::MODEL_FORMAT_MAX_BYTES as f64 * CONTEXT_WINDOW_HARD_LIMIT_FACTOR) as usize;
const CONTEXT_WINDOW_HARD_LIMIT_LINES: usize =
(truncate::MODEL_FORMAT_MAX_LINES as f64 * CONTEXT_WINDOW_HARD_LIMIT_FACTOR) as usize;
/// Transcript of conversation history
#[derive(Debug, Clone, Default)]
pub(crate) struct ContextManager {
@@ -50,7 +44,7 @@ impl ContextManager {
}
/// `items` is ordered from oldest to newest.
pub(crate) fn record_items<I>(&mut self, items: I)
pub(crate) fn record_items<I>(&mut self, items: I, policy: TruncationPolicy)
where
I: IntoIterator,
I::Item: std::ops::Deref<Target = ResponseItem>,
@@ -62,7 +56,7 @@ impl ContextManager {
continue;
}
let processed = Self::process_item(&item);
let processed = self.process_item(item_ref, policy);
self.items.push(processed);
}
}
@@ -150,18 +144,14 @@ impl ContextManager {
items.retain(|item| !matches!(item, ResponseItem::GhostSnapshot { .. }));
}
fn process_item(item: &ResponseItem) -> ResponseItem {
fn process_item(&self, item: &ResponseItem, policy: TruncationPolicy) -> ResponseItem {
match item {
ResponseItem::FunctionCallOutput { call_id, output } => {
let truncated = format_output_for_model_body(
output.content.as_str(),
CONTEXT_WINDOW_HARD_LIMIT_BYTES,
CONTEXT_WINDOW_HARD_LIMIT_LINES,
);
let truncated = truncate_text(output.content.as_str(), policy);
let truncated_items = output
.content_items
.as_ref()
.map(|items| globally_truncate_function_output_items(items));
.map(|items| truncate_function_output_items_with_policy(items, policy));
ResponseItem::FunctionCallOutput {
call_id: call_id.clone(),
output: FunctionCallOutputPayload {
@@ -172,11 +162,7 @@ impl ContextManager {
}
}
ResponseItem::CustomToolCallOutput { call_id, output } => {
let truncated = format_output_for_model_body(
output,
CONTEXT_WINDOW_HARD_LIMIT_BYTES,
CONTEXT_WINDOW_HARD_LIMIT_LINES,
);
let truncated = truncate_text(output, policy);
ResponseItem::CustomToolCallOutput {
call_id: call_id.clone(),
output: truncated,
@@ -188,6 +174,7 @@ impl ContextManager {
| ResponseItem::FunctionCall { .. }
| ResponseItem::WebSearchCall { .. }
| ResponseItem::CustomToolCall { .. }
| ResponseItem::CompactionSummary { .. }
| ResponseItem::GhostSnapshot { .. }
| ResponseItem::Other => item.clone(),
}
@@ -205,7 +192,8 @@ fn is_api_message(message: &ResponseItem) -> bool {
| ResponseItem::CustomToolCallOutput { .. }
| ResponseItem::LocalShellCall { .. }
| ResponseItem::Reasoning { .. }
| ResponseItem::WebSearchCall { .. } => true,
| ResponseItem::WebSearchCall { .. }
| ResponseItem::CompactionSummary { .. } => true,
ResponseItem::GhostSnapshot { .. } => false,
ResponseItem::Other => false,
}

View File

@@ -1,9 +1,8 @@
use super::*;
use crate::context_manager::MODEL_FORMAT_MAX_LINES;
use crate::truncate;
use crate::truncate::TruncationPolicy;
use codex_git::GhostCommit;
use codex_protocol::models::ContentItem;
use codex_protocol::models::FunctionCallOutputContentItem;
use codex_protocol::models::FunctionCallOutputPayload;
use codex_protocol::models::LocalShellAction;
use codex_protocol::models::LocalShellExecAction;
@@ -13,6 +12,9 @@ use codex_protocol::models::ReasoningItemReasoningSummary;
use pretty_assertions::assert_eq;
use regex_lite::Regex;
const EXEC_FORMAT_MAX_LINES: usize = 256;
const EXEC_FORMAT_MAX_BYTES: usize = 10_000;
fn assistant_msg(text: &str) -> ResponseItem {
ResponseItem::Message {
id: None,
@@ -25,7 +27,9 @@ fn assistant_msg(text: &str) -> ResponseItem {
fn create_history_with_items(items: Vec<ResponseItem>) -> ContextManager {
let mut h = ContextManager::new();
h.record_items(items.iter());
// Use a generous but fixed token budget; tests only rely on truncation
// behavior, not on a specific model's token limit.
h.record_items(items.iter(), TruncationPolicy::Tokens(10_000));
h
}
@@ -55,6 +59,7 @@ fn reasoning_msg(text: &str) -> ResponseItem {
#[test]
fn filters_non_api_messages() {
let mut h = ContextManager::default();
let policy = TruncationPolicy::Tokens(10_000);
// System message is not API messages; Other is ignored.
let system = ResponseItem::Message {
id: None,
@@ -64,12 +69,12 @@ fn filters_non_api_messages() {
}],
};
let reasoning = reasoning_msg("thinking...");
h.record_items([&system, &reasoning, &ResponseItem::Other]);
h.record_items([&system, &reasoning, &ResponseItem::Other], policy);
// User and assistant should be retained.
let u = user_msg("hi");
let a = assistant_msg("hello");
h.record_items([&u, &a]);
h.record_items([&u, &a], policy);
let items = h.contents();
assert_eq!(
@@ -237,6 +242,9 @@ fn normalization_retains_local_shell_outputs() {
#[test]
fn record_items_truncates_function_call_output_content() {
let mut history = ContextManager::new();
// Any reasonably small token budget works; the test only cares that
// truncation happens and the marker is present.
let policy = TruncationPolicy::Tokens(1_000);
let long_line = "a very long line to trigger truncation\n";
let long_output = long_line.repeat(2_500);
let item = ResponseItem::FunctionCallOutput {
@@ -248,15 +256,20 @@ fn record_items_truncates_function_call_output_content() {
},
};
history.record_items([&item]);
history.record_items([&item], policy);
assert_eq!(history.items.len(), 1);
match &history.items[0] {
ResponseItem::FunctionCallOutput { output, .. } => {
assert_ne!(output.content, long_output);
assert!(
output.content.starts_with("Total output lines:"),
"expected truncated summary, got {}",
output.content.contains("tokens truncated"),
"expected token-based truncation marker, got {}",
output.content
);
assert!(
output.content.contains("tokens truncated"),
"expected truncation marker, got {}",
output.content
);
}
@@ -267,6 +280,7 @@ fn record_items_truncates_function_call_output_content() {
#[test]
fn record_items_truncates_custom_tool_call_output_content() {
let mut history = ContextManager::new();
let policy = TruncationPolicy::Tokens(1_000);
let line = "custom output that is very long\n";
let long_output = line.repeat(2_500);
let item = ResponseItem::CustomToolCallOutput {
@@ -274,21 +288,48 @@ fn record_items_truncates_custom_tool_call_output_content() {
output: long_output.clone(),
};
history.record_items([&item]);
history.record_items([&item], policy);
assert_eq!(history.items.len(), 1);
match &history.items[0] {
ResponseItem::CustomToolCallOutput { output, .. } => {
assert_ne!(output, &long_output);
assert!(
output.starts_with("Total output lines:"),
"expected truncated summary, got {output}"
output.contains("tokens truncated"),
"expected token-based truncation marker, got {output}"
);
assert!(
output.contains("tokens truncated") || output.contains("bytes truncated"),
"expected truncation marker, got {output}"
);
}
other => panic!("unexpected history item: {other:?}"),
}
}
#[test]
fn record_items_respects_custom_token_limit() {
let mut history = ContextManager::new();
let policy = TruncationPolicy::Tokens(10);
let long_output = "tokenized content repeated many times ".repeat(200);
let item = ResponseItem::FunctionCallOutput {
call_id: "call-custom-limit".to_string(),
output: FunctionCallOutputPayload {
content: long_output,
success: Some(true),
..Default::default()
},
};
history.record_items([&item], policy);
let stored = match &history.items[0] {
ResponseItem::FunctionCallOutput { output, .. } => output,
other => panic!("unexpected history item: {other:?}"),
};
assert!(stored.content.contains("tokens truncated"));
}
fn assert_truncated_message_matches(message: &str, line: &str, total_lines: usize) {
let pattern = truncated_message_pattern(line, total_lines);
let regex = Regex::new(&pattern).unwrap_or_else(|err| {
@@ -302,23 +343,22 @@ fn assert_truncated_message_matches(message: &str, line: &str, total_lines: usiz
.expect("missing body capture")
.as_str();
assert!(
body.len() <= truncate::MODEL_FORMAT_MAX_BYTES,
body.len() <= EXEC_FORMAT_MAX_BYTES,
"body exceeds byte limit: {} bytes",
body.len()
);
}
fn truncated_message_pattern(line: &str, total_lines: usize) -> String {
let head_lines = MODEL_FORMAT_MAX_LINES / 2;
let tail_lines = MODEL_FORMAT_MAX_LINES - head_lines;
let head_lines = EXEC_FORMAT_MAX_LINES / 2;
let tail_lines = EXEC_FORMAT_MAX_LINES - head_lines;
let head_take = head_lines.min(total_lines);
let tail_take = tail_lines.min(total_lines.saturating_sub(head_take));
let omitted = total_lines.saturating_sub(head_take + tail_take);
let escaped_line = regex_lite::escape(line);
if omitted == 0 {
return format!(
r"(?s)^Total output lines: {total_lines}\n\n(?P<body>{escaped_line}.*\n\[\.{{3}} output truncated to fit {max_bytes} bytes \.{{3}}]\n\n.*)$",
max_bytes = truncate::MODEL_FORMAT_MAX_BYTES,
r"(?s)^Total output lines: {total_lines}\n\n(?P<body>{escaped_line}.*\n\[\.{{3}} removed \d+ bytes to fit {EXEC_FORMAT_MAX_BYTES} byte limit \.{{3}}]\n\n.*)$",
);
}
format!(
@@ -331,11 +371,7 @@ fn format_exec_output_truncates_large_error() {
let line = "very long execution error line that should trigger truncation\n";
let large_error = line.repeat(2_500); // way beyond both byte and line limits
let truncated = truncate::format_output_for_model_body(
&large_error,
truncate::MODEL_FORMAT_MAX_BYTES,
truncate::MODEL_FORMAT_MAX_LINES,
);
let truncated = truncate::truncate_with_line_bytes_budget(&large_error, EXEC_FORMAT_MAX_BYTES);
let total_lines = large_error.lines().count();
assert_truncated_message_matches(&truncated, line, total_lines);
@@ -344,17 +380,13 @@ fn format_exec_output_truncates_large_error() {
#[test]
fn format_exec_output_marks_byte_truncation_without_omitted_lines() {
let long_line = "a".repeat(truncate::MODEL_FORMAT_MAX_BYTES + 50);
let truncated = truncate::format_output_for_model_body(
&long_line,
truncate::MODEL_FORMAT_MAX_BYTES,
truncate::MODEL_FORMAT_MAX_LINES,
);
let long_line = "a".repeat(EXEC_FORMAT_MAX_BYTES + 50);
let truncated = truncate::truncate_with_line_bytes_budget(&long_line, EXEC_FORMAT_MAX_BYTES);
assert_ne!(truncated, long_line);
let removed_bytes = long_line.len().saturating_sub(EXEC_FORMAT_MAX_BYTES);
let marker_line = format!(
"[... output truncated to fit {} bytes ...]",
truncate::MODEL_FORMAT_MAX_BYTES
"[... removed {removed_bytes} bytes to fit {EXEC_FORMAT_MAX_BYTES} byte limit ...]"
);
assert!(
truncated.contains(&marker_line),
@@ -371,28 +403,20 @@ fn format_exec_output_returns_original_when_within_limits() {
let content = "example output\n".repeat(10);
assert_eq!(
truncate::format_output_for_model_body(
&content,
truncate::MODEL_FORMAT_MAX_BYTES,
truncate::MODEL_FORMAT_MAX_LINES
),
truncate::truncate_with_line_bytes_budget(&content, EXEC_FORMAT_MAX_BYTES),
content
);
}
#[test]
fn format_exec_output_reports_omitted_lines_and_keeps_head_and_tail() {
let total_lines = truncate::MODEL_FORMAT_MAX_LINES + 100;
let total_lines = EXEC_FORMAT_MAX_LINES + 100;
let content: String = (0..total_lines)
.map(|idx| format!("line-{idx}\n"))
.collect();
let truncated = truncate::format_output_for_model_body(
&content,
truncate::MODEL_FORMAT_MAX_BYTES,
truncate::MODEL_FORMAT_MAX_LINES,
);
let omitted = total_lines - truncate::MODEL_FORMAT_MAX_LINES;
let truncated = truncate::truncate_with_line_bytes_budget(&content, EXEC_FORMAT_MAX_BYTES);
let omitted = total_lines - EXEC_FORMAT_MAX_LINES;
let expected_marker = format!("[... omitted {omitted} of {total_lines} lines ...]");
assert!(
@@ -413,103 +437,24 @@ fn format_exec_output_reports_omitted_lines_and_keeps_head_and_tail() {
#[test]
fn format_exec_output_prefers_line_marker_when_both_limits_exceeded() {
let total_lines = truncate::MODEL_FORMAT_MAX_LINES + 42;
let total_lines = EXEC_FORMAT_MAX_LINES + 42;
let long_line = "x".repeat(256);
let content: String = (0..total_lines)
.map(|idx| format!("line-{idx}-{long_line}\n"))
.collect();
let truncated = truncate::format_output_for_model_body(
&content,
truncate::MODEL_FORMAT_MAX_BYTES,
truncate::MODEL_FORMAT_MAX_LINES,
);
let truncated = truncate::truncate_with_line_bytes_budget(&content, EXEC_FORMAT_MAX_BYTES);
assert!(
truncated.contains("[... omitted 42 of 298 lines ...]"),
"expected omitted marker when line count exceeds limit: {truncated}"
);
assert!(
!truncated.contains("output truncated to fit"),
!truncated.contains("byte limit"),
"line omission marker should take precedence over byte marker: {truncated}"
);
}
#[test]
fn truncates_across_multiple_under_limit_texts_and_reports_omitted() {
// Arrange: several text items, none exceeding per-item limit, but total exceeds budget.
let budget = truncate::MODEL_FORMAT_MAX_BYTES;
let t1_len = (budget / 2).saturating_sub(10);
let t2_len = (budget / 2).saturating_sub(10);
let remaining_after_t1_t2 = budget.saturating_sub(t1_len + t2_len);
let t3_len = 50; // gets truncated to remaining_after_t1_t2
let t4_len = 5; // omitted
let t5_len = 7; // omitted
let t1 = "a".repeat(t1_len);
let t2 = "b".repeat(t2_len);
let t3 = "c".repeat(t3_len);
let t4 = "d".repeat(t4_len);
let t5 = "e".repeat(t5_len);
let item = ResponseItem::FunctionCallOutput {
call_id: "call-omit".to_string(),
output: FunctionCallOutputPayload {
content: "irrelevant".to_string(),
content_items: Some(vec![
FunctionCallOutputContentItem::InputText { text: t1 },
FunctionCallOutputContentItem::InputText { text: t2 },
FunctionCallOutputContentItem::InputImage {
image_url: "img:mid".to_string(),
},
FunctionCallOutputContentItem::InputText { text: t3 },
FunctionCallOutputContentItem::InputText { text: t4 },
FunctionCallOutputContentItem::InputText { text: t5 },
]),
success: Some(true),
},
};
let mut history = ContextManager::new();
history.record_items([&item]);
assert_eq!(history.items.len(), 1);
let json = serde_json::to_value(&history.items[0]).expect("serialize to json");
let output = json
.get("output")
.expect("output field")
.as_array()
.expect("array output");
// Expect: t1 (full), t2 (full), image, t3 (truncated), summary mentioning 2 omitted.
assert_eq!(output.len(), 5);
let first = output[0].as_object().expect("first obj");
assert_eq!(first.get("type").unwrap(), "input_text");
let first_text = first.get("text").unwrap().as_str().unwrap();
assert_eq!(first_text.len(), t1_len);
let second = output[1].as_object().expect("second obj");
assert_eq!(second.get("type").unwrap(), "input_text");
let second_text = second.get("text").unwrap().as_str().unwrap();
assert_eq!(second_text.len(), t2_len);
assert_eq!(
output[2],
serde_json::json!({"type": "input_image", "image_url": "img:mid"})
);
let fourth = output[3].as_object().expect("fourth obj");
assert_eq!(fourth.get("type").unwrap(), "input_text");
let fourth_text = fourth.get("text").unwrap().as_str().unwrap();
assert_eq!(fourth_text.len(), remaining_after_t1_t2);
let summary = output[4].as_object().expect("summary obj");
assert_eq!(summary.get("type").unwrap(), "input_text");
let summary_text = summary.get("text").unwrap().as_str().unwrap();
assert!(summary_text.contains("omitted 2 text items"));
}
//TODO(aibrahim): run CI in release mode.
#[cfg(not(debug_assertions))]
#[test]

View File

@@ -1,7 +1,5 @@
mod history;
mod normalize;
pub(crate) use crate::truncate::MODEL_FORMAT_MAX_BYTES;
pub(crate) use crate::truncate::MODEL_FORMAT_MAX_LINES;
pub(crate) use crate::truncate::format_output_for_model_body;
pub(crate) use crate::truncate::truncate_with_line_bytes_budget;
pub(crate) use history::ContextManager;

View File

@@ -2,7 +2,8 @@ use crate::codex::ProcessedResponseItem;
use crate::exec::ExecToolCallOutput;
use crate::token_data::KnownPlan;
use crate::token_data::PlanType;
use crate::truncate::truncate_middle;
use crate::truncate::TruncationPolicy;
use crate::truncate::truncate_text;
use chrono::DateTime;
use chrono::Datelike;
use chrono::Local;
@@ -461,7 +462,10 @@ pub fn get_error_message_ui(e: &CodexErr) -> String {
_ => e.to_string(),
};
truncate_middle(&message, ERROR_MESSAGE_UI_MAX_BYTES).0
truncate_text(
&message,
TruncationPolicy::Bytes(ERROR_MESSAGE_UI_MAX_BYTES),
)
}
#[cfg(test)]

View File

@@ -1,4 +1,4 @@
use std::fs;
use std::io::ErrorKind;
use std::path::Path;
use std::path::PathBuf;
use std::sync::Arc;
@@ -11,14 +11,18 @@ use codex_execpolicy2::PolicyParser;
use codex_protocol::protocol::AskForApproval;
use codex_protocol::protocol::SandboxPolicy;
use thiserror::Error;
use tokio::fs;
use crate::bash::parse_shell_lc_plain_commands;
use crate::features::Feature;
use crate::features::Features;
use crate::sandboxing::SandboxPermissions;
use crate::tools::sandboxing::ApprovalRequirement;
const FORBIDDEN_REASON: &str = "execpolicy forbids this command";
const PROMPT_REASON: &str = "execpolicy requires approval for this command";
const POLICY_DIR_NAME: &str = "policy";
const POLICY_EXTENSION: &str = "codexpolicy";
#[derive(Debug, Error)]
pub enum ExecPolicyError {
@@ -41,52 +45,26 @@ pub enum ExecPolicyError {
},
}
pub(crate) fn exec_policy_for(
pub(crate) async fn exec_policy_for(
features: &Features,
codex_home: &Path,
) -> Result<Option<Arc<Policy>>, ExecPolicyError> {
if !features.enabled(Feature::ExecPolicyV2) {
return Ok(None);
) -> Result<Arc<Policy>, ExecPolicyError> {
if !features.enabled(Feature::ExecPolicy) {
return Ok(Arc::new(Policy::empty()));
}
let policy_dir = codex_home.to_path_buf();
let entries = match fs::read_dir(&policy_dir) {
Ok(entries) => entries,
Err(source) if source.kind() == std::io::ErrorKind::NotFound => return Ok(None),
Err(source) => {
return Err(ExecPolicyError::ReadDir {
dir: policy_dir,
source,
});
}
};
let mut policy_paths: Vec<PathBuf> = Vec::new();
for entry in entries {
let entry = entry.map_err(|source| ExecPolicyError::ReadDir {
dir: policy_dir.clone(),
source,
})?;
let path = entry.path();
if path
.extension()
.and_then(|ext| ext.to_str())
.is_some_and(|ext| ext == "codexpolicy")
&& path.is_file()
{
policy_paths.push(path);
}
}
policy_paths.sort();
let policy_dir = codex_home.join(POLICY_DIR_NAME);
let policy_paths = collect_policy_files(&policy_dir).await?;
let mut parser = PolicyParser::new();
for policy_path in &policy_paths {
let contents =
fs::read_to_string(policy_path).map_err(|source| ExecPolicyError::ReadFile {
path: policy_path.clone(),
source,
})?;
fs::read_to_string(policy_path)
.await
.map_err(|source| ExecPolicyError::ReadFile {
path: policy_path.clone(),
source,
})?;
let identifier = policy_path.to_string_lossy().to_string();
parser
.parse(&identifier, &contents)
@@ -98,12 +76,12 @@ pub(crate) fn exec_policy_for(
let policy = Arc::new(parser.build());
tracing::debug!(
file_count = policy_paths.len(),
"loaded execpolicy2 from {}",
"loaded execpolicy from {} files in {}",
policy_paths.len(),
policy_dir.display()
);
Ok(Some(policy))
Ok(policy)
}
fn evaluate_with_policy(
@@ -131,20 +109,18 @@ fn evaluate_with_policy(
}
Decision::Allow => Some(ApprovalRequirement::Skip),
},
Evaluation::NoMatch => None,
Evaluation::NoMatch { .. } => None,
}
}
pub(crate) fn approval_requirement_for_command(
policy: Option<&Policy>,
pub(crate) fn create_approval_requirement_for_command(
policy: &Policy,
command: &[String],
approval_policy: AskForApproval,
sandbox_policy: &SandboxPolicy,
with_escalated_permissions: bool,
sandbox_permissions: SandboxPermissions,
) -> ApprovalRequirement {
if let Some(policy) = policy
&& let Some(requirement) = evaluate_with_policy(policy, command, approval_policy)
{
if let Some(requirement) = evaluate_with_policy(policy, command, approval_policy) {
return requirement;
}
@@ -152,7 +128,7 @@ pub(crate) fn approval_requirement_for_command(
approval_policy,
sandbox_policy,
command,
with_escalated_permissions,
sandbox_permissions,
) {
ApprovalRequirement::NeedsApproval { reason: None }
} else {
@@ -160,6 +136,52 @@ pub(crate) fn approval_requirement_for_command(
}
}
async fn collect_policy_files(dir: &Path) -> Result<Vec<PathBuf>, ExecPolicyError> {
let mut read_dir = match fs::read_dir(dir).await {
Ok(read_dir) => read_dir,
Err(err) if err.kind() == ErrorKind::NotFound => return Ok(Vec::new()),
Err(source) => {
return Err(ExecPolicyError::ReadDir {
dir: dir.to_path_buf(),
source,
});
}
};
let mut policy_paths = Vec::new();
while let Some(entry) =
read_dir
.next_entry()
.await
.map_err(|source| ExecPolicyError::ReadDir {
dir: dir.to_path_buf(),
source,
})?
{
let path = entry.path();
let file_type = entry
.file_type()
.await
.map_err(|source| ExecPolicyError::ReadDir {
dir: dir.to_path_buf(),
source,
})?;
if path
.extension()
.and_then(|ext| ext.to_str())
.is_some_and(|ext| ext == POLICY_EXTENSION)
&& file_type.is_file()
{
policy_paths.push(path);
}
}
policy_paths.sort();
Ok(policy_paths)
}
#[cfg(test)]
mod tests {
use super::*;
@@ -168,28 +190,77 @@ mod tests {
use codex_protocol::protocol::AskForApproval;
use codex_protocol::protocol::SandboxPolicy;
use pretty_assertions::assert_eq;
use std::fs;
use tempfile::tempdir;
#[test]
fn returns_none_when_feature_disabled() {
let features = Features::with_defaults();
#[tokio::test]
async fn returns_empty_policy_when_feature_disabled() {
let mut features = Features::with_defaults();
features.disable(Feature::ExecPolicy);
let temp_dir = tempdir().expect("create temp dir");
let policy = exec_policy_for(&features, temp_dir.path()).expect("policy result");
let policy = exec_policy_for(&features, temp_dir.path())
.await
.expect("policy result");
assert!(policy.is_none());
let commands = [vec!["rm".to_string()]];
assert!(matches!(
policy.check_multiple(commands.iter()),
Evaluation::NoMatch { .. }
));
assert!(!temp_dir.path().join(POLICY_DIR_NAME).exists());
}
#[test]
fn returns_none_when_policy_dir_is_missing() {
let mut features = Features::with_defaults();
features.enable(Feature::ExecPolicyV2);
#[tokio::test]
async fn collect_policy_files_returns_empty_when_dir_missing() {
let temp_dir = tempdir().expect("create temp dir");
let missing_dir = temp_dir.path().join("missing");
let policy = exec_policy_for(&features, &missing_dir).expect("policy result");
let policy_dir = temp_dir.path().join(POLICY_DIR_NAME);
let files = collect_policy_files(&policy_dir)
.await
.expect("collect policy files");
assert!(policy.is_none());
assert!(files.is_empty());
}
#[tokio::test]
async fn loads_policies_from_policy_subdirectory() {
let temp_dir = tempdir().expect("create temp dir");
let policy_dir = temp_dir.path().join(POLICY_DIR_NAME);
fs::create_dir_all(&policy_dir).expect("create policy dir");
fs::write(
policy_dir.join("deny.codexpolicy"),
r#"prefix_rule(pattern=["rm"], decision="forbidden")"#,
)
.expect("write policy file");
let policy = exec_policy_for(&Features::with_defaults(), temp_dir.path())
.await
.expect("policy result");
let command = [vec!["rm".to_string()]];
assert!(matches!(
policy.check_multiple(command.iter()),
Evaluation::Match { .. }
));
}
#[tokio::test]
async fn ignores_policies_outside_policy_dir() {
let temp_dir = tempdir().expect("create temp dir");
fs::write(
temp_dir.path().join("root.codexpolicy"),
r#"prefix_rule(pattern=["ls"], decision="prompt")"#,
)
.expect("write policy file");
let policy = exec_policy_for(&Features::with_defaults(), temp_dir.path())
.await
.expect("policy result");
let command = [vec!["ls".to_string()]];
assert!(matches!(
policy.check_multiple(command.iter()),
Evaluation::NoMatch { .. }
));
}
#[test]
@@ -231,12 +302,12 @@ prefix_rule(pattern=["rm"], decision="forbidden")
let policy = parser.build();
let command = vec!["rm".to_string()];
let requirement = approval_requirement_for_command(
Some(&policy),
let requirement = create_approval_requirement_for_command(
&policy,
&command,
AskForApproval::OnRequest,
&SandboxPolicy::DangerFullAccess,
false,
SandboxPermissions::UseDefault,
);
assert_eq!(
@@ -257,12 +328,12 @@ prefix_rule(pattern=["rm"], decision="forbidden")
let policy = parser.build();
let command = vec!["rm".to_string()];
let requirement = approval_requirement_for_command(
Some(&policy),
let requirement = create_approval_requirement_for_command(
&policy,
&command,
AskForApproval::Never,
&SandboxPolicy::DangerFullAccess,
false,
SandboxPermissions::UseDefault,
);
assert_eq!(
@@ -277,12 +348,13 @@ prefix_rule(pattern=["rm"], decision="forbidden")
fn approval_requirement_falls_back_to_heuristics() {
let command = vec!["python".to_string()];
let requirement = approval_requirement_for_command(
None,
let empty_policy = Policy::empty();
let requirement = create_approval_requirement_for_command(
&empty_policy,
&command,
AskForApproval::UnlessTrusted,
&SandboxPolicy::ReadOnly,
false,
SandboxPermissions::UseDefault,
);
assert_eq!(

View File

@@ -27,6 +27,8 @@ pub enum Stage {
/// Unique features toggled via configuration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Feature {
/// Create a ghost commit at each turn.
GhostCommit,
/// Use the single unified PTY-backed exec tool.
UnifiedExec,
/// Use the shell command tool that takes `command` as a single string of
@@ -40,16 +42,18 @@ pub enum Feature {
ViewImageTool,
/// Allow the model to request web searches.
WebSearchRequest,
/// Gate the execpolicy2 enforcement for shell/unified exec.
ExecPolicyV2,
/// Gate the execpolicy enforcement for shell/unified exec.
ExecPolicy,
/// Enable the model-based risk assessments for sandboxed commands.
SandboxCommandAssessment,
/// Create a ghost commit at each turn.
GhostCommit,
/// Enable Windows sandbox (restricted token) on Windows.
WindowsSandbox,
/// Remote compaction enabled (only for ChatGPT auth)
RemoteCompaction,
/// Enable the default shell tool.
ShellTool,
/// Allow model to call multiple tools in parallel (only for models supporting it).
ParallelToolCalls,
}
impl Feature {
@@ -251,6 +255,14 @@ pub struct FeatureSpec {
}
pub const FEATURES: &[FeatureSpec] = &[
// Stable features.
FeatureSpec {
id: Feature::GhostCommit,
key: "undo",
stage: Stage::Stable,
default_enabled: true,
},
// Unstable features.
FeatureSpec {
id: Feature::UnifiedExec,
key: "unified_exec",
@@ -288,10 +300,10 @@ pub const FEATURES: &[FeatureSpec] = &[
default_enabled: false,
},
FeatureSpec {
id: Feature::ExecPolicyV2,
key: "exec_policy_v2",
id: Feature::ExecPolicy,
key: "exec_policy",
stage: Stage::Experimental,
default_enabled: false,
default_enabled: true,
},
FeatureSpec {
id: Feature::SandboxCommandAssessment,
@@ -299,18 +311,24 @@ pub const FEATURES: &[FeatureSpec] = &[
stage: Stage::Experimental,
default_enabled: false,
},
FeatureSpec {
id: Feature::GhostCommit,
key: "ghost_commit",
stage: Stage::Experimental,
default_enabled: true,
},
FeatureSpec {
id: Feature::WindowsSandbox,
key: "enable_experimental_windows_sandbox",
stage: Stage::Experimental,
default_enabled: false,
},
FeatureSpec {
id: Feature::RemoteCompaction,
key: "remote_compaction",
stage: Stage::Experimental,
default_enabled: false,
},
FeatureSpec {
id: Feature::ParallelToolCalls,
key: "parallel",
stage: Stage::Experimental,
default_enabled: false,
},
FeatureSpec {
id: Feature::ShellTool,
key: "shell_tool",

View File

@@ -13,6 +13,7 @@ mod client;
mod client_common;
pub mod codex;
mod codex_conversation;
mod compact_remote;
pub use codex_conversation::CodexConversation;
mod codex_delegate;
mod command_safety;
@@ -35,6 +36,7 @@ mod mcp_tool_call;
mod message_history;
mod model_provider_info;
pub mod parse_command;
pub mod powershell;
mod response_processing;
pub mod sandboxing;
pub mod token_data;

View File

@@ -4,6 +4,7 @@ use codex_protocol::config_types::Verbosity;
use crate::config::types::ReasoningSummaryFormat;
use crate::tools::handlers::apply_patch::ApplyPatchToolType;
use crate::tools::spec::ConfigShellToolType;
use crate::truncate::TruncationPolicy;
/// The `instructions` field in the payload sent to a model should always start
/// with this content.
@@ -66,6 +67,8 @@ pub struct ModelFamily {
/// Preferred shell tool type for this model family when features do not override it.
pub shell_type: ConfigShellToolType,
pub truncation_policy: TruncationPolicy,
}
macro_rules! model_family {
@@ -89,6 +92,7 @@ macro_rules! model_family {
shell_type: ConfigShellToolType::Default,
default_verbosity: None,
default_reasoning_effort: None,
truncation_policy: TruncationPolicy::Bytes(10_000),
};
// apply overrides
@@ -146,6 +150,7 @@ pub fn find_family_for_model(slug: &str) -> Option<ModelFamily> {
],
supports_parallel_tool_calls: true,
support_verbosity: true,
truncation_policy: TruncationPolicy::Tokens(10_000),
)
// Internal models.
@@ -161,8 +166,10 @@ pub fn find_family_for_model(slug: &str) -> Option<ModelFamily> {
"list_dir".to_string(),
"read_file".to_string(),
],
shell_type: if cfg!(windows) { ConfigShellToolType::ShellCommand } else { ConfigShellToolType::Default },
supports_parallel_tool_calls: true,
support_verbosity: true,
truncation_policy: TruncationPolicy::Tokens(10_000),
)
// Production models.
@@ -176,7 +183,10 @@ pub fn find_family_for_model(slug: &str) -> Option<ModelFamily> {
reasoning_summary_format: ReasoningSummaryFormat::Experimental,
base_instructions: GPT_5_CODEX_INSTRUCTIONS.to_string(),
apply_patch_tool_type: Some(ApplyPatchToolType::Freeform),
shell_type: if cfg!(windows) { ConfigShellToolType::ShellCommand } else { ConfigShellToolType::Default },
supports_parallel_tool_calls: true,
support_verbosity: false,
truncation_policy: TruncationPolicy::Tokens(10_000),
)
} else if slug.starts_with("gpt-5.1") {
model_family!(
@@ -187,6 +197,8 @@ pub fn find_family_for_model(slug: &str) -> Option<ModelFamily> {
default_verbosity: Some(Verbosity::Low),
base_instructions: GPT_5_1_INSTRUCTIONS.to_string(),
default_reasoning_effort: Some(ReasoningEffort::Medium),
truncation_policy: TruncationPolicy::Bytes(10_000),
supports_parallel_tool_calls: true,
)
} else if slug.starts_with("gpt-5") {
model_family!(
@@ -194,6 +206,7 @@ pub fn find_family_for_model(slug: &str) -> Option<ModelFamily> {
supports_reasoning_summaries: true,
needs_special_apply_patch_instructions: true,
support_verbosity: true,
truncation_policy: TruncationPolicy::Bytes(10_000),
)
} else {
None
@@ -216,5 +229,6 @@ pub fn derive_default_model_family(model: &str) -> ModelFamily {
shell_type: ConfigShellToolType::Default,
default_verbosity: None,
default_reasoning_effort: None,
truncation_policy: TruncationPolicy::Bytes(10_000),
}
}

View File

@@ -8,6 +8,7 @@
use crate::CodexAuth;
use crate::default_client::CodexHttpClient;
use crate::default_client::CodexRequestBuilder;
use crate::error::CodexErr;
use codex_app_server_protocol::AuthMode;
use serde::Deserialize;
use serde::Serialize;
@@ -109,21 +110,7 @@ impl ModelProviderInfo {
client: &'a CodexHttpClient,
auth: &Option<CodexAuth>,
) -> crate::error::Result<CodexRequestBuilder> {
let effective_auth = if let Some(secret_key) = &self.experimental_bearer_token {
Some(CodexAuth::from_api_key(secret_key))
} else {
match self.api_key() {
Ok(Some(key)) => Some(CodexAuth::from_api_key(&key)),
Ok(None) => auth.clone(),
Err(err) => {
if auth.is_some() {
auth.clone()
} else {
return Err(err);
}
}
}
};
let effective_auth = self.effective_auth(auth)?;
let url = self.get_full_url(&effective_auth);
@@ -136,6 +123,51 @@ impl ModelProviderInfo {
Ok(self.apply_http_headers(builder))
}
pub async fn create_compact_request_builder<'a>(
&'a self,
client: &'a CodexHttpClient,
auth: &Option<CodexAuth>,
) -> crate::error::Result<CodexRequestBuilder> {
if self.wire_api != WireApi::Responses {
return Err(CodexErr::UnsupportedOperation(
"Compaction endpoint requires Responses API providers".to_string(),
));
}
let effective_auth = self.effective_auth(auth)?;
let url = self.get_compact_url(&effective_auth).ok_or_else(|| {
CodexErr::UnsupportedOperation(
"Compaction endpoint requires Responses API providers".to_string(),
)
})?;
let mut builder = client.post(url);
if let Some(auth) = effective_auth.as_ref() {
builder = builder.bearer_auth(auth.get_token().await?);
}
Ok(self.apply_http_headers(builder))
}
fn effective_auth(&self, auth: &Option<CodexAuth>) -> crate::error::Result<Option<CodexAuth>> {
if let Some(secret_key) = &self.experimental_bearer_token {
return Ok(Some(CodexAuth::from_api_key(secret_key)));
}
match self.api_key() {
Ok(Some(key)) => Ok(Some(CodexAuth::from_api_key(&key))),
Ok(None) => Ok(auth.clone()),
Err(err) => {
if auth.is_some() {
Ok(auth.clone())
} else {
Err(err)
}
}
}
}
fn get_query_string(&self) -> String {
self.query_params
.as_ref()
@@ -173,6 +205,18 @@ impl ModelProviderInfo {
}
}
pub(crate) fn get_compact_url(&self, auth: &Option<CodexAuth>) -> Option<String> {
if self.wire_api != WireApi::Responses {
return None;
}
let full = self.get_full_url(auth);
if let Some((path, query)) = full.split_once('?') {
Some(format!("{path}/compact?{query}"))
} else {
Some(format!("{full}/compact"))
}
}
pub(crate) fn is_azure_responses_endpoint(&self) -> bool {
if self.wire_api != WireApi::Responses {
return false;

View File

@@ -1,6 +1,7 @@
use crate::bash::extract_bash_command;
use crate::bash::try_parse_shell;
use crate::bash::try_parse_word_only_commands_sequence;
use crate::powershell::extract_powershell_command;
use codex_protocol::parse_command::ParsedCommand;
use shlex::split as shlex_split;
use shlex::try_join as shlex_try_join;
@@ -11,6 +12,11 @@ pub fn shlex_join(tokens: &[String]) -> String {
.unwrap_or_else(|_| "<command included NUL byte>".to_string())
}
/// Extracts the shell and script from a command, regardless of platform
pub fn extract_shell_command(command: &[String]) -> Option<(&str, &str)> {
extract_bash_command(command).or_else(|| extract_powershell_command(command))
}
/// DO NOT REVIEW THIS CODE BY HAND
/// This parsing code is quite complex and not easy to hand-modify.
/// The easiest way to iterate is to add unit tests and have Codex fix the implementation.
@@ -877,6 +883,42 @@ mod tests {
}],
);
}
#[test]
fn powershell_command_is_stripped() {
assert_parsed(
&vec_str(&["powershell", "-Command", "Get-ChildItem"]),
vec![ParsedCommand::Unknown {
cmd: "Get-ChildItem".to_string(),
}],
);
}
#[test]
fn pwsh_with_noprofile_and_c_alias_is_stripped() {
assert_parsed(
&vec_str(&["pwsh", "-NoProfile", "-c", "Write-Host hi"]),
vec![ParsedCommand::Unknown {
cmd: "Write-Host hi".to_string(),
}],
);
}
#[test]
fn powershell_with_path_is_stripped() {
let command = if cfg!(windows) {
"C:\\windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe"
} else {
"/usr/local/bin/powershell.exe"
};
assert_parsed(
&vec_str(&[command, "-NoProfile", "-c", "Write-Host hi"]),
vec![ParsedCommand::Unknown {
cmd: "Write-Host hi".to_string(),
}],
);
}
}
pub fn parse_command_impl(command: &[String]) -> Vec<ParsedCommand> {
@@ -884,6 +926,12 @@ pub fn parse_command_impl(command: &[String]) -> Vec<ParsedCommand> {
return commands;
}
if let Some((_, script)) = extract_powershell_command(command) {
return vec![ParsedCommand::Unknown {
cmd: script.to_string(),
}];
}
let normalized = normalize_tokens(command);
let parts = if contains_connectors(&normalized) {
@@ -1190,6 +1238,7 @@ fn parse_find_query_and_path(tail: &[String]) -> (Option<String>, Option<String>
}
fn parse_shell_lc_commands(original: &[String]) -> Option<Vec<ParsedCommand>> {
// Only handle bash/zsh here; PowerShell is stripped separately without bash parsing.
let (_, script) = extract_bash_command(original)?;
if let Some(tree) = try_parse_shell(script)

View File

@@ -0,0 +1,93 @@
use std::path::PathBuf;
use crate::shell::ShellType;
use crate::shell::detect_shell_type;
const POWERSHELL_FLAGS: &[&str] = &["-nologo", "-noprofile", "-command", "-c"];
/// Extract the PowerShell script body from an invocation such as:
///
/// - ["pwsh", "-NoProfile", "-Command", "Get-ChildItem -Recurse | Select-String foo"]
/// - ["powershell.exe", "-Command", "Write-Host hi"]
/// - ["powershell", "-NoLogo", "-NoProfile", "-Command", "...script..."]
///
/// Returns (`shell`, `script`) when the first arg is a PowerShell executable and a
/// `-Command` (or `-c`) flag is present followed by a script string.
pub fn extract_powershell_command(command: &[String]) -> Option<(&str, &str)> {
if command.len() < 3 {
return None;
}
let shell = &command[0];
if detect_shell_type(&PathBuf::from(shell)) != Some(ShellType::PowerShell) {
return None;
}
// Find the first occurrence of -Command (accept common short alias -c as well)
let mut i = 1usize;
while i + 1 < command.len() {
let flag = &command[i];
// Reject unknown flags
if !POWERSHELL_FLAGS.contains(&flag.to_ascii_lowercase().as_str()) {
return None;
}
if flag.eq_ignore_ascii_case("-Command") || flag.eq_ignore_ascii_case("-c") {
let script = &command[i + 1];
return Some((shell, script.as_str()));
}
i += 1;
}
None
}
#[cfg(test)]
mod tests {
use super::extract_powershell_command;
#[test]
fn extracts_basic_powershell_command() {
let cmd = vec![
"powershell".to_string(),
"-Command".to_string(),
"Write-Host hi".to_string(),
];
let (_shell, script) = extract_powershell_command(&cmd).expect("extract");
assert_eq!(script, "Write-Host hi");
}
#[test]
fn extracts_lowercase_flags() {
let cmd = vec![
"powershell".to_string(),
"-nologo".to_string(),
"-command".to_string(),
"Write-Host hi".to_string(),
];
let (_shell, script) = extract_powershell_command(&cmd).expect("extract");
assert_eq!(script, "Write-Host hi");
}
#[test]
fn extracts_full_path_powershell_command() {
let command = if cfg!(windows) {
"C:\\windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe".to_string()
} else {
"/usr/local/bin/powershell.exe".to_string()
};
let cmd = vec![command, "-Command".to_string(), "Write-Host hi".to_string()];
let (_shell, script) = extract_powershell_command(&cmd).expect("extract");
assert_eq!(script, "Write-Host hi");
}
#[test]
fn extracts_with_noprofile_and_alias() {
let cmd = vec![
"pwsh".to_string(),
"-NoProfile".to_string(),
"-c".to_string(),
"Get-ChildItem | Select-String foo".to_string(),
];
let (_shell, script) = extract_powershell_command(&cmd).expect("extract");
assert_eq!(script, "Get-ChildItem | Select-String foo");
}
}

View File

@@ -27,7 +27,8 @@ pub(crate) fn should_persist_response_item(item: &ResponseItem) -> bool {
| ResponseItem::CustomToolCall { .. }
| ResponseItem::CustomToolCallOutput { .. }
| ResponseItem::WebSearchCall { .. }
| ResponseItem::GhostSnapshot { .. } => true,
| ResponseItem::GhostSnapshot { .. }
| ResponseItem::CompactionSummary { .. } => true,
ResponseItem::Other => false,
}
}

View File

@@ -814,6 +814,7 @@ async fn test_tail_skips_trailing_non_responses() -> Result<()> {
timestamp: format!("{ts}-compacted"),
item: RolloutItem::Compacted(CompactedItem {
message: "compacted".into(),
replacement_history: None,
}),
};
writeln!(file, "{}", serde_json::to_string(&compacted_line)?)?;

View File

@@ -26,6 +26,28 @@ use std::collections::HashMap;
use std::path::Path;
use std::path::PathBuf;
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum SandboxPermissions {
UseDefault,
RequireEscalated,
}
impl SandboxPermissions {
pub fn requires_escalated_permissions(self) -> bool {
matches!(self, SandboxPermissions::RequireEscalated)
}
}
impl From<bool> for SandboxPermissions {
fn from(with_escalated_permissions: bool) -> Self {
if with_escalated_permissions {
SandboxPermissions::RequireEscalated
} else {
SandboxPermissions::UseDefault
}
}
}
#[derive(Clone, Debug)]
pub struct CommandSpec {
pub program: String,

View File

@@ -61,10 +61,7 @@ impl Shell {
]
}
Shell::PowerShell(ps) => {
let mut args = vec![
ps.shell_path.to_string_lossy().to_string(),
"-NoLogo".to_string(),
];
let mut args = vec![ps.shell_path.to_string_lossy().to_string()];
if !use_login_shell {
args.push("-NoProfile".to_string());
}
@@ -192,7 +189,6 @@ pub fn detect_shell_type(shell_path: &PathBuf) -> Option<ShellType> {
Some("powershell") => Some(ShellType::PowerShell),
_ => {
let shell_name = shell_path.file_stem();
if let Some(shell_name) = shell_name
&& shell_name != shell_path
{
@@ -251,6 +247,14 @@ mod detect_shell_type_tests {
detect_shell_type(&PathBuf::from("powershell.exe")),
Some(ShellType::PowerShell)
);
assert_eq!(
detect_shell_type(&PathBuf::from(if cfg!(windows) {
"C:\\windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe"
} else {
"/usr/local/bin/pwsh"
})),
Some(ShellType::PowerShell)
);
assert_eq!(
detect_shell_type(&PathBuf::from("pwsh.exe")),
Some(ShellType::PowerShell)

View File

@@ -7,6 +7,7 @@ use crate::context_manager::ContextManager;
use crate::protocol::RateLimitSnapshot;
use crate::protocol::TokenUsage;
use crate::protocol::TokenUsageInfo;
use crate::truncate::TruncationPolicy;
/// Persistent, session-scoped state previously stored directly on `Session`.
pub(crate) struct SessionState {
@@ -18,20 +19,21 @@ pub(crate) struct SessionState {
impl SessionState {
/// Create a new session state mirroring previous `State::default()` semantics.
pub(crate) fn new(session_configuration: SessionConfiguration) -> Self {
let history = ContextManager::new();
Self {
session_configuration,
history: ContextManager::new(),
history,
latest_rate_limits: None,
}
}
// History helpers
pub(crate) fn record_items<I>(&mut self, items: I)
pub(crate) fn record_items<I>(&mut self, items: I, policy: TruncationPolicy)
where
I: IntoIterator,
I::Item: std::ops::Deref<Target = ResponseItem>,
{
self.history.record_items(items)
self.history.record_items(items, policy);
}
pub(crate) fn clone_history(&self) -> ContextManager {

View File

@@ -1,15 +1,14 @@
use std::sync::Arc;
use async_trait::async_trait;
use tokio_util::sync::CancellationToken;
use crate::codex::TurnContext;
use crate::compact;
use crate::state::TaskKind;
use codex_protocol::user_input::UserInput;
use super::SessionTask;
use super::SessionTaskContext;
use crate::codex::TurnContext;
use crate::features::Feature;
use crate::state::TaskKind;
use async_trait::async_trait;
use codex_app_server_protocol::AuthMode;
use codex_protocol::user_input::UserInput;
use tokio_util::sync::CancellationToken;
#[derive(Clone, Copy, Default)]
pub(crate) struct CompactTask;
@@ -27,6 +26,17 @@ impl SessionTask for CompactTask {
input: Vec<UserInput>,
_cancellation_token: CancellationToken,
) -> Option<String> {
compact::run_compact_task(session.clone_session(), ctx, input).await
let session = session.clone_session();
if session
.services
.auth_manager
.auth()
.is_some_and(|auth| auth.mode == AuthMode::ChatGPT)
&& session.enabled(Feature::RemoteCompaction).await
{
crate::compact_remote::run_remote_compact_task(session, ctx, input).await
} else {
crate::compact::run_compact_task(session, ctx, input).await
}
}
}

View File

@@ -9,10 +9,11 @@ use crate::apply_patch::convert_apply_patch_to_protocol;
use crate::codex::TurnContext;
use crate::exec::ExecParams;
use crate::exec_env::create_env;
use crate::exec_policy::approval_requirement_for_command;
use crate::exec_policy::create_approval_requirement_for_command;
use crate::function_tool::FunctionCallError;
use crate::is_safe_command::is_known_safe_command;
use crate::protocol::ExecCommandSource;
use crate::sandboxing::SandboxPermissions;
use crate::tools::context::ToolInvocation;
use crate::tools::context::ToolOutput;
use crate::tools::context::ToolPayload;
@@ -25,7 +26,6 @@ use crate::tools::runtimes::apply_patch::ApplyPatchRequest;
use crate::tools::runtimes::apply_patch::ApplyPatchRuntime;
use crate::tools::runtimes::shell::ShellRequest;
use crate::tools::runtimes::shell::ShellRuntime;
use crate::tools::sandboxing::ApprovalRequirement;
use crate::tools::sandboxing::ToolCtx;
pub struct ShellHandler;
@@ -119,7 +119,6 @@ impl ToolHandler for ShellHandler {
turn,
tracker,
call_id,
false,
)
.await
}
@@ -132,7 +131,6 @@ impl ToolHandler for ShellHandler {
turn,
tracker,
call_id,
true,
)
.await
}
@@ -180,7 +178,6 @@ impl ToolHandler for ShellCommandHandler {
turn,
tracker,
call_id,
false,
)
.await
}
@@ -194,7 +191,6 @@ impl ShellHandler {
turn: Arc<TurnContext>,
tracker: crate::tools::context::SharedTurnDiffTracker,
call_id: String,
is_user_shell_command: bool,
) -> Result<ToolOutput, FunctionCallError> {
// Approval policy guard for explicit escalation in non-OnRequest modes.
if exec_params.with_escalated_permissions.unwrap_or(false)
@@ -287,12 +283,7 @@ impl ShellHandler {
}
}
// Regular shell execution path.
let source = if is_user_shell_command {
ExecCommandSource::UserShell
} else {
ExecCommandSource::Agent
};
let source = ExecCommandSource::Agent;
let emitter =
ToolEmitter::shell(exec_params.command.clone(), exec_params.cwd.clone(), source);
let event_ctx = ToolEventCtx::new(session.as_ref(), turn.as_ref(), &call_id, None);
@@ -305,17 +296,13 @@ impl ShellHandler {
env: exec_params.env.clone(),
with_escalated_permissions: exec_params.with_escalated_permissions,
justification: exec_params.justification.clone(),
approval_requirement: if is_user_shell_command {
ApprovalRequirement::Skip
} else {
approval_requirement_for_command(
turn.exec_policy_v2.as_deref(),
&exec_params.command,
turn.approval_policy,
&turn.sandbox_policy,
exec_params.with_escalated_permissions.unwrap_or(false),
)
},
approval_requirement: create_approval_requirement_for_command(
&turn.exec_policy,
&exec_params.command,
turn.approval_policy,
&turn.sandbox_policy,
SandboxPermissions::from(exec_params.with_escalated_permissions.unwrap_or(false)),
),
};
let mut orchestrator = ToolOrchestrator::new();
let mut runtime = ShellRuntime::new();

View File

@@ -9,9 +9,7 @@ pub mod runtimes;
pub mod sandboxing;
pub mod spec;
use crate::context_manager::MODEL_FORMAT_MAX_BYTES;
use crate::context_manager::MODEL_FORMAT_MAX_LINES;
use crate::context_manager::format_output_for_model_body;
use crate::context_manager::truncate_with_line_bytes_budget;
use crate::exec::ExecToolCallOutput;
pub use router::ToolRouter;
use serde::Serialize;
@@ -22,6 +20,9 @@ pub(crate) const TELEMETRY_PREVIEW_MAX_LINES: usize = 64; // lines
pub(crate) const TELEMETRY_PREVIEW_TRUNCATION_NOTICE: &str =
"[... telemetry preview truncated ...]";
// TODO(aibrahim): migrate shell tool to use truncate text and respect config value
const SHELL_OUTPUT_MAX_BYTES: usize = 10_000;
/// Format the combined exec output for sending back to the model.
/// Includes exit code and duration metadata; truncates large bodies safely.
pub fn format_exec_output_for_model(exec_output: &ExecToolCallOutput) -> String {
@@ -77,5 +78,5 @@ pub fn format_exec_output_str(exec_output: &ExecToolCallOutput) -> String {
};
// Truncate for model consumption before serialization.
format_output_for_model_body(&body, MODEL_FORMAT_MAX_BYTES, MODEL_FORMAT_MAX_LINES)
truncate_with_line_bytes_budget(&body, SHELL_OUTPUT_MAX_BYTES)
}

View File

@@ -86,14 +86,17 @@ pub(crate) struct ApprovalCtx<'a> {
pub risk: Option<SandboxCommandAssessment>,
}
// Specifies what tool orchestrator should do with a given tool call.
#[derive(Clone, Debug, PartialEq, Eq)]
pub(crate) enum ApprovalRequirement {
/// No approval required for this tool call
Skip,
/// Approval required for this tool call
NeedsApproval { reason: Option<String> },
/// Execution forbidden for this tool call
Forbidden { reason: String },
}
/// Reflects the orchestrator's behavior (pre-refactor):
/// - Never, OnFailure: do not ask
/// - OnRequest: ask unless sandbox policy is DangerFullAccess
/// - UnlessTrusted: always ask

View File

@@ -1292,7 +1292,11 @@ mod tests {
"gpt-5-codex",
&Features::with_defaults(),
&[
"shell",
if cfg!(windows) {
"shell_command"
} else {
"shell"
},
"list_mcp_resources",
"list_mcp_resource_templates",
"read_mcp_resource",
@@ -1309,7 +1313,11 @@ mod tests {
"gpt-5.1-codex",
&Features::with_defaults(),
&[
"shell",
if cfg!(windows) {
"shell_command"
} else {
"shell"
},
"list_mcp_resources",
"list_mcp_resource_templates",
"read_mcp_resource",
@@ -1384,7 +1392,11 @@ mod tests {
"gpt-5.1-codex-mini",
&Features::with_defaults(),
&[
"shell",
if cfg!(windows) {
"shell_command"
} else {
"shell"
},
"list_mcp_resources",
"list_mcp_resource_templates",
"read_mcp_resource",

View File

@@ -5,45 +5,142 @@
use codex_protocol::models::FunctionCallOutputContentItem;
use codex_utils_string::take_bytes_at_char_boundary;
use codex_utils_string::take_last_bytes_at_char_boundary;
use codex_utils_tokenizer::Tokenizer;
/// Model-formatting limits: clients get full streams; only content sent to the model is truncated.
pub const MODEL_FORMAT_MAX_BYTES: usize = 10 * 1024; // 10 KiB
pub const MODEL_FORMAT_MAX_LINES: usize = 256; // lines
use crate::config::Config;
/// Globally truncate function output items to fit within `MODEL_FORMAT_MAX_BYTES`
/// by preserving as many text/image items as possible and appending a summary
/// for any omitted text items.
pub(crate) fn globally_truncate_function_output_items(
const APPROX_BYTES_PER_TOKEN: usize = 4;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum TruncationPolicy {
Bytes(usize),
Tokens(usize),
}
impl TruncationPolicy {
pub fn new(config: &Config) -> Self {
let config_token_limit = config.tool_output_token_limit;
match config.model_family.truncation_policy {
TruncationPolicy::Bytes(family_bytes) => {
if let Some(token_limit) = config_token_limit {
Self::Bytes(approx_bytes_for_tokens(token_limit))
} else {
Self::Bytes(family_bytes)
}
}
TruncationPolicy::Tokens(family_tokens) => {
if let Some(token_limit) = config_token_limit {
Self::Tokens(token_limit)
} else {
Self::Tokens(family_tokens)
}
}
}
}
/// Returns a token budget derived from this policy.
///
/// - For `Tokens`, this is the explicit token limit.
/// - For `Bytes`, this is an approximate token budget using the global
/// bytes-per-token heuristic.
pub fn token_budget(&self) -> usize {
match self {
TruncationPolicy::Bytes(bytes) => {
usize::try_from(approx_tokens_from_byte_count(*bytes)).unwrap_or(usize::MAX)
}
TruncationPolicy::Tokens(tokens) => *tokens,
}
}
/// Returns a byte budget derived from this policy.
///
/// - For `Bytes`, this is the explicit byte limit.
/// - For `Tokens`, this is an approximate byte budget using the global
/// bytes-per-token heuristic.
pub fn byte_budget(&self) -> usize {
match self {
TruncationPolicy::Bytes(bytes) => *bytes,
TruncationPolicy::Tokens(tokens) => approx_bytes_for_tokens(*tokens),
}
}
}
/// Format a block of exec/tool output for model consumption, truncating by
/// lines and bytes while preserving head and tail segments.
pub(crate) fn truncate_with_line_bytes_budget(content: &str, bytes_budget: usize) -> String {
// TODO(aibrahim): to be removed
let lines_budget = 256;
// Head+tail truncation for the model: show the beginning and end with an elision.
// Clients still receive full streams; only this formatted summary is capped.
let total_lines = content.lines().count();
if content.len() <= bytes_budget && total_lines <= lines_budget {
return content.to_string();
}
let output = truncate_formatted_exec_output(content, total_lines, bytes_budget, lines_budget);
format!("Total output lines: {total_lines}\n\n{output}")
}
pub(crate) fn truncate_text(content: &str, policy: TruncationPolicy) -> String {
match policy {
TruncationPolicy::Bytes(bytes) => truncate_with_byte_estimate(
content,
bytes,
TruncationSource::Policy(TruncationPolicy::Bytes(bytes)),
),
TruncationPolicy::Tokens(tokens) => {
let (truncated, _) = truncate_with_token_budget(
content,
tokens,
TruncationSource::Policy(TruncationPolicy::Tokens(tokens)),
);
truncated
}
}
}
/// Globally truncate function output items to fit within the given
/// truncation policy's budget, preserving as many text/image items as
/// possible and appending a summary for any omitted text items.
pub(crate) fn truncate_function_output_items_with_policy(
items: &[FunctionCallOutputContentItem],
policy: TruncationPolicy,
) -> Vec<FunctionCallOutputContentItem> {
let mut out: Vec<FunctionCallOutputContentItem> = Vec::with_capacity(items.len());
let mut remaining = MODEL_FORMAT_MAX_BYTES;
let mut remaining_budget = match policy {
TruncationPolicy::Bytes(_) => policy.byte_budget(),
TruncationPolicy::Tokens(_) => policy.token_budget(),
};
let mut omitted_text_items = 0usize;
for it in items {
match it {
FunctionCallOutputContentItem::InputText { text } => {
if remaining == 0 {
if remaining_budget == 0 {
omitted_text_items += 1;
continue;
}
let len = text.len();
if len <= remaining {
let cost = match policy {
TruncationPolicy::Bytes(_) => text.len(),
TruncationPolicy::Tokens(_) => approx_token_count(text),
};
if cost <= remaining_budget {
out.push(FunctionCallOutputContentItem::InputText { text: text.clone() });
remaining -= len;
remaining_budget = remaining_budget.saturating_sub(cost);
} else {
let slice = take_bytes_at_char_boundary(text, remaining);
if !slice.is_empty() {
out.push(FunctionCallOutputContentItem::InputText {
text: slice.to_string(),
});
let snippet_policy = match policy {
TruncationPolicy::Bytes(_) => TruncationPolicy::Bytes(remaining_budget),
TruncationPolicy::Tokens(_) => TruncationPolicy::Tokens(remaining_budget),
};
let snippet = truncate_text(text, snippet_policy);
if snippet.is_empty() {
omitted_text_items += 1;
} else {
out.push(FunctionCallOutputContentItem::InputText { text: snippet });
}
remaining = 0;
remaining_budget = 0;
}
}
// todo(aibrahim): handle input images; resize
FunctionCallOutputContentItem::InputImage { image_url } => {
out.push(FunctionCallOutputContentItem::InputImage {
image_url: image_url.clone(),
@@ -61,21 +158,81 @@ pub(crate) fn globally_truncate_function_output_items(
out
}
/// Format a block of exec/tool output for model consumption, truncating by
/// lines and bytes while preserving head and tail segments.
pub(crate) fn format_output_for_model_body(
content: &str,
limit_bytes: usize,
limit_lines: usize,
) -> String {
// Head+tail truncation for the model: show the beginning and end with an elision.
// Clients still receive full streams; only this formatted summary is capped.
let total_lines = content.lines().count();
if content.len() <= limit_bytes && total_lines <= limit_lines {
return content.to_string();
/// Truncate the middle of a UTF-8 string to at most `max_tokens` tokens,
/// preserving the beginning and the end. Returns the possibly truncated string
/// and `Some(original_token_count)` if truncation occurred; otherwise returns
/// the original string and `None`.
fn truncate_with_token_budget(
s: &str,
max_tokens: usize,
source: TruncationSource,
) -> (String, Option<u64>) {
if s.is_empty() {
return (String::new(), None);
}
let output = truncate_formatted_exec_output(content, total_lines, limit_bytes, limit_lines);
format!("Total output lines: {total_lines}\n\n{output}")
let byte_len = s.len();
if max_tokens > 0 {
let small_threshold = approx_bytes_for_tokens(max_tokens / 4);
if small_threshold > 0 && byte_len <= small_threshold {
return (s.to_string(), None);
}
}
let truncated = truncate_with_byte_estimate(s, approx_bytes_for_tokens(max_tokens), source);
let approx_total_usize = approx_token_count(s);
let approx_total = u64::try_from(approx_total_usize).unwrap_or(u64::MAX);
if truncated == s {
(truncated, None)
} else {
(truncated, Some(approx_total))
}
}
/// Truncate a string using a byte budget derived from the token budget, without
/// performing any real tokenization. This keeps the logic purely byte-based and
/// uses a bytes placeholder in the truncated output.
fn truncate_with_byte_estimate(s: &str, max_bytes: usize, source: TruncationSource) -> String {
if s.is_empty() {
return String::new();
}
if max_bytes == 0 {
// No budget to show content; just report that everything was truncated.
let marker = format_truncation_marker(source, removed_units_for_source(source, s.len()));
return marker;
}
if s.len() <= max_bytes {
return s.to_string();
}
let total_bytes = s.len();
let removed_bytes = total_bytes.saturating_sub(max_bytes);
let marker = format_truncation_marker(source, removed_units_for_source(source, removed_bytes));
let marker_len = marker.len();
if marker_len >= max_bytes {
let truncated_marker = truncate_on_boundary(&marker, max_bytes);
return truncated_marker.to_string();
}
let keep_budget = max_bytes - marker_len;
let (left_budget, right_budget) = split_budget(keep_budget);
let prefix_end = pick_prefix_end(s, left_budget);
let mut suffix_start = pick_suffix_start(s, right_budget);
if suffix_start < prefix_end {
suffix_start = prefix_end;
}
let mut out = assemble_truncated_output(&s[..prefix_end], &s[suffix_start..], &marker);
if out.len() > max_bytes {
let boundary = truncate_on_boundary(&out, max_bytes);
out.truncate(boundary.len());
}
out
}
fn truncate_formatted_exec_output(
@@ -114,13 +271,17 @@ fn truncate_formatted_exec_output(
let truncated_by_bytes = content.len() > limit_bytes;
// this is a bit wrong. We are counting metadata lines and not just shell output lines.
let marker = if omitted > 0 {
Some(format!(
"\n[... omitted {omitted} of {total_lines} lines ...]\n\n"
))
let marker_text = format_truncation_marker(
TruncationSource::LineOmission { total_lines },
u64::try_from(omitted).unwrap_or(u64::MAX),
);
Some(format!("\n{marker_text}\n\n"))
} else if truncated_by_bytes {
Some(format!(
"\n[... output truncated to fit {limit_bytes} bytes ...]\n\n"
))
let removed_bytes =
u64::try_from(content.len().saturating_sub(limit_bytes)).unwrap_or(u64::MAX);
let marker_text =
format_truncation_marker(TruncationSource::ByteLimit { limit_bytes }, removed_bytes);
Some(format!("\n{marker_text}\n\n"))
} else {
None
};
@@ -147,6 +308,103 @@ fn truncate_formatted_exec_output(
result
}
#[derive(Clone, Copy)]
pub enum TruncationSource {
Policy(TruncationPolicy),
LineOmission { total_lines: usize },
ByteLimit { limit_bytes: usize },
}
fn format_truncation_marker(source: TruncationSource, removed_count: u64) -> String {
match source {
TruncationSource::Policy(TruncationPolicy::Tokens(_)) => {
format!("[…{removed_count} tokens truncated…]")
}
TruncationSource::Policy(TruncationPolicy::Bytes(_)) => {
format!("[…{removed_count} bytes truncated…]")
}
TruncationSource::LineOmission { total_lines } => {
format!("[... omitted {removed_count} of {total_lines} lines ...]")
}
TruncationSource::ByteLimit { limit_bytes } => {
format!("[... removed {removed_count} bytes to fit {limit_bytes} byte limit ...]")
}
}
}
fn split_budget(budget: usize) -> (usize, usize) {
let left = budget / 2;
(left, budget - left)
}
fn removed_units_for_source(source: TruncationSource, removed_bytes: usize) -> u64 {
match source {
TruncationSource::Policy(TruncationPolicy::Tokens(_)) => {
approx_tokens_from_byte_count(removed_bytes)
}
_ => u64::try_from(removed_bytes).unwrap_or(u64::MAX),
}
}
fn assemble_truncated_output(prefix: &str, suffix: &str, marker: &str) -> String {
let mut out = String::with_capacity(prefix.len() + marker.len() + suffix.len() + 1);
out.push_str(prefix);
out.push_str(marker);
out.push('\n');
out.push_str(suffix);
out
}
pub(crate) fn approx_token_count(text: &str) -> usize {
let len = text.len();
len.saturating_add(APPROX_BYTES_PER_TOKEN.saturating_sub(1)) / APPROX_BYTES_PER_TOKEN
}
fn approx_bytes_for_tokens(tokens: usize) -> usize {
tokens.saturating_mul(APPROX_BYTES_PER_TOKEN)
}
fn approx_tokens_from_byte_count(bytes: usize) -> u64 {
let bytes_u64 = bytes as u64;
bytes_u64.saturating_add((APPROX_BYTES_PER_TOKEN as u64).saturating_sub(1))
/ (APPROX_BYTES_PER_TOKEN as u64)
}
fn truncate_on_boundary(input: &str, max_len: usize) -> &str {
if input.len() <= max_len {
return input;
}
let mut end = max_len;
while end > 0 && !input.is_char_boundary(end) {
end -= 1;
}
&input[..end]
}
fn pick_prefix_end(s: &str, left_budget: usize) -> usize {
if let Some(head) = s.get(..left_budget)
&& let Some(i) = head.rfind('\n')
{
return i + 1;
}
truncate_on_boundary(s, left_budget).len()
}
fn pick_suffix_start(s: &str, right_budget: usize) -> usize {
let start_tail = s.len().saturating_sub(right_budget);
if let Some(tail) = s.get(start_tail..)
&& let Some(i) = tail.find('\n')
{
return start_tail + i + 1;
}
let mut idx = start_tail.min(s.len());
while idx < s.len() && !s.is_char_boundary(idx) {
idx += 1;
}
idx
}
fn error_on_double_truncation(content: &str) {
if content.contains("Total output lines:") && content.contains("omitted") {
tracing::error!(
@@ -155,184 +413,31 @@ fn error_on_double_truncation(content: &str) {
}
}
/// Truncate an output string to a maximum number of “tokens”, where tokens are
/// approximated as individual `char`s. Preserves a prefix and suffix with an
/// elision marker describing how many tokens were omitted.
pub(crate) fn truncate_output_to_tokens(
output: &str,
max_tokens: usize,
) -> (String, Option<usize>) {
if max_tokens == 0 {
let total_tokens = output.chars().count();
let message = format!("{total_tokens} tokens truncated…");
return (message, Some(total_tokens));
}
let tokens: Vec<char> = output.chars().collect();
let total_tokens = tokens.len();
if total_tokens <= max_tokens {
return (output.to_string(), None);
}
let half = max_tokens / 2;
if half == 0 {
let truncated = total_tokens.saturating_sub(max_tokens);
let message = format!("{truncated} tokens truncated…");
return (message, Some(total_tokens));
}
let truncated = total_tokens.saturating_sub(half * 2);
let mut truncated_output = String::new();
truncated_output.extend(&tokens[..half]);
truncated_output.push_str(&format!("{truncated} tokens truncated…"));
truncated_output.extend(&tokens[total_tokens - half..]);
(truncated_output, Some(total_tokens))
}
/// Truncate the middle of a UTF-8 string to at most `max_bytes` bytes,
/// preserving the beginning and the end. Returns the possibly truncated
/// string and `Some(original_token_count)` (counted with the local tokenizer;
/// falls back to a 4-bytes-per-token estimate if the tokenizer cannot load)
/// if truncation occurred; otherwise returns the original string and `None`.
pub(crate) fn truncate_middle(s: &str, max_bytes: usize) -> (String, Option<u64>) {
if s.len() <= max_bytes {
return (s.to_string(), None);
}
// Build a tokenizer for counting (default to o200k_base; fall back to cl100k_base).
// If both fail, fall back to a 4-bytes-per-token estimate.
let tok = Tokenizer::try_default().ok();
let token_count = |text: &str| -> u64 {
if let Some(ref t) = tok {
t.count(text) as u64
} else {
(text.len() as u64).div_ceil(4)
}
};
let total_tokens = token_count(s);
if max_bytes == 0 {
return (
format!("{total_tokens} tokens truncated…"),
Some(total_tokens),
);
}
fn truncate_on_boundary(input: &str, max_len: usize) -> &str {
if input.len() <= max_len {
return input;
}
let mut end = max_len;
while end > 0 && !input.is_char_boundary(end) {
end -= 1;
}
&input[..end]
}
fn pick_prefix_end(s: &str, left_budget: usize) -> usize {
if let Some(head) = s.get(..left_budget)
&& let Some(i) = head.rfind('\n')
{
return i + 1;
}
truncate_on_boundary(s, left_budget).len()
}
fn pick_suffix_start(s: &str, right_budget: usize) -> usize {
let start_tail = s.len().saturating_sub(right_budget);
if let Some(tail) = s.get(start_tail..)
&& let Some(i) = tail.find('\n')
{
return start_tail + i + 1;
}
let mut idx = start_tail.min(s.len());
while idx < s.len() && !s.is_char_boundary(idx) {
idx += 1;
}
idx
}
// Iterate to stabilize marker length → keep budget → boundaries.
let mut guess_tokens: u64 = 1;
for _ in 0..4 {
let marker = format!("{guess_tokens} tokens truncated…");
let marker_len = marker.len();
let keep_budget = max_bytes.saturating_sub(marker_len);
if keep_budget == 0 {
return (
format!("{total_tokens} tokens truncated…"),
Some(total_tokens),
);
}
let left_budget = keep_budget / 2;
let right_budget = keep_budget - left_budget;
let prefix_end = pick_prefix_end(s, left_budget);
let mut suffix_start = pick_suffix_start(s, right_budget);
if suffix_start < prefix_end {
suffix_start = prefix_end;
}
// Tokens actually removed (middle slice) using the real tokenizer.
let removed_tokens = token_count(&s[prefix_end..suffix_start]);
// If the number of digits in the token count does not change the marker length,
// we can finalize output.
let final_marker = format!("{removed_tokens} tokens truncated…");
if final_marker.len() == marker_len {
let kept_content_bytes = prefix_end + (s.len() - suffix_start);
let mut out = String::with_capacity(final_marker.len() + kept_content_bytes + 1);
out.push_str(&s[..prefix_end]);
out.push_str(&final_marker);
out.push('\n');
out.push_str(&s[suffix_start..]);
return (out, Some(total_tokens));
}
guess_tokens = removed_tokens;
}
// Fallback build after iterations: compute with the last guess.
let marker = format!("{guess_tokens} tokens truncated…");
let marker_len = marker.len();
let keep_budget = max_bytes.saturating_sub(marker_len);
if keep_budget == 0 {
return (
format!("{total_tokens} tokens truncated…"),
Some(total_tokens),
);
}
let left_budget = keep_budget / 2;
let right_budget = keep_budget - left_budget;
let prefix_end = pick_prefix_end(s, left_budget);
let mut suffix_start = pick_suffix_start(s, right_budget);
if suffix_start < prefix_end {
suffix_start = prefix_end;
}
let mut out = String::with_capacity(marker_len + prefix_end + (s.len() - suffix_start) + 1);
out.push_str(&s[..prefix_end]);
out.push_str(&marker);
out.push('\n');
out.push_str(&s[suffix_start..]);
(out, Some(total_tokens))
}
#[cfg(test)]
mod tests {
use super::MODEL_FORMAT_MAX_BYTES;
use super::MODEL_FORMAT_MAX_LINES;
use super::format_output_for_model_body;
use super::globally_truncate_function_output_items;
use super::truncate_middle;
use super::truncate_output_to_tokens;
use crate::config::OPENAI_DEFAULT_MODEL;
use crate::model_family::derive_default_model_family;
use crate::model_family::find_family_for_model;
use super::TruncationPolicy;
use super::TruncationSource;
use super::approx_token_count;
use super::truncate_function_output_items_with_policy;
use super::truncate_with_line_bytes_budget;
use super::truncate_with_token_budget;
use codex_protocol::models::FunctionCallOutputContentItem;
use codex_utils_tokenizer::Tokenizer;
use pretty_assertions::assert_eq;
use regex_lite::Regex;
const MODEL_FORMAT_MAX_LINES: usize = 256;
fn model_format_max_bytes() -> usize {
find_family_for_model(OPENAI_DEFAULT_MODEL)
.unwrap_or_else(|| derive_default_model_family(OPENAI_DEFAULT_MODEL))
.truncation_policy
.byte_budget()
}
fn truncated_message_pattern(line: &str, total_lines: usize) -> String {
let head_lines = MODEL_FORMAT_MAX_LINES / 2;
let tail_lines = MODEL_FORMAT_MAX_LINES - head_lines;
@@ -342,7 +447,8 @@ mod tests {
let escaped_line = regex_lite::escape(line);
if omitted == 0 {
return format!(
r"(?s)^Total output lines: {total_lines}\n\n(?P<body>{escaped_line}.*\n\[\.{{3}} output truncated to fit {MODEL_FORMAT_MAX_BYTES} bytes \.{{3}}]\n\n.*)$",
r"(?s)^Total output lines: {total_lines}\n\n(?P<body>{escaped_line}.*\n\[\.{{3}} removed \d+ bytes to fit {max_bytes} byte limit \.{{3}}]\n\n.*)$",
max_bytes = model_format_max_bytes(),
);
}
format!(
@@ -351,88 +457,46 @@ mod tests {
}
#[test]
fn truncate_middle_no_newlines_fallback() {
let tok = Tokenizer::try_default().expect("load tokenizer");
let s = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ*";
let max_bytes = 32;
let (out, original) = truncate_middle(s, max_bytes);
assert!(out.starts_with("abc"));
assert!(out.contains("tokens truncated"));
assert!(out.ends_with("XYZ*"));
assert_eq!(original, Some(tok.count(s) as u64));
}
#[test]
fn truncate_middle_prefers_newline_boundaries() {
let tok = Tokenizer::try_default().expect("load tokenizer");
let mut s = String::new();
for i in 1..=20 {
s.push_str(&format!("{i:03}\n"));
}
assert_eq!(s.len(), 80);
let max_bytes = 64;
let (out, tokens) = truncate_middle(&s, max_bytes);
assert!(out.starts_with("001\n002\n003\n004\n"));
assert!(out.contains("tokens truncated"));
assert!(out.ends_with("017\n018\n019\n020\n"));
assert_eq!(tokens, Some(tok.count(&s) as u64));
}
#[test]
fn truncate_middle_handles_utf8_content() {
let tok = Tokenizer::try_default().expect("load tokenizer");
let s = "😀😀😀😀😀😀😀😀😀😀\nsecond line with ascii text\n";
let max_bytes = 32;
let (out, tokens) = truncate_middle(s, max_bytes);
assert!(out.contains("tokens truncated"));
assert!(!out.contains('\u{fffd}'));
assert_eq!(tokens, Some(tok.count(s) as u64));
}
#[test]
fn truncate_middle_prefers_newline_boundaries_2() {
let tok = Tokenizer::try_default().expect("load tokenizer");
// Build a multi-line string of 20 numbered lines (each "NNN\n").
let mut s = String::new();
for i in 1..=20 {
s.push_str(&format!("{i:03}\n"));
}
assert_eq!(s.len(), 80);
let max_bytes = 64;
let (out, total) = truncate_middle(&s, max_bytes);
assert!(out.starts_with("001\n002\n003\n004\n"));
assert!(out.contains("tokens truncated"));
assert!(out.ends_with("017\n018\n019\n020\n"));
assert_eq!(total, Some(tok.count(&s) as u64));
}
#[test]
fn truncate_output_to_tokens_returns_original_when_under_limit() {
fn truncate_middle_returns_original_when_under_limit() {
let s = "short output";
let (truncated, original) = truncate_output_to_tokens(s, 100);
assert_eq!(truncated, s);
let limit = 100;
let source = TruncationSource::Policy(TruncationPolicy::Tokens(limit));
let (out, original) = truncate_with_token_budget(s, limit, source);
assert_eq!(out, s);
assert_eq!(original, None);
}
#[test]
fn truncate_output_to_tokens_reports_truncation_at_zero_limit() {
fn truncate_middle_reports_truncation_at_zero_limit() {
let s = "abcdef";
let (truncated, original) = truncate_output_to_tokens(s, 0);
assert!(truncated.contains("tokens truncated"));
assert_eq!(original, Some(s.chars().count()));
let source = TruncationSource::Policy(TruncationPolicy::Tokens(0));
let (out, original) = truncate_with_token_budget(s, 0, source);
assert_eq!(out, "[…2 tokens truncated…]");
assert_eq!(original, Some(approx_token_count(s) as u64));
}
#[test]
fn truncate_output_to_tokens_preserves_prefix_and_suffix() {
let s = "abcdefghijklmnopqrstuvwxyz";
let max_tokens = 10;
let (truncated, original) = truncate_output_to_tokens(s, max_tokens);
assert!(truncated.starts_with("abcde"));
assert!(truncated.ends_with("vwxyz"));
assert_eq!(original, Some(s.chars().count()));
fn truncate_middle_enforces_token_budget() {
let s = "alpha beta gamma delta epsilon zeta eta theta iota kappa";
let max_tokens = 12;
let source = TruncationSource::Policy(TruncationPolicy::Tokens(max_tokens));
let (out, original) = truncate_with_token_budget(s, max_tokens, source);
assert!(out.contains("tokens truncated"));
assert_eq!(original, Some(approx_token_count(s) as u64));
assert!(out.len() < s.len(), "truncated output should be shorter");
}
#[test]
fn truncate_middle_handles_utf8_content() {
let s = "😀😀😀😀😀😀😀😀😀😀\nsecond line with text\n";
let max_tokens = 8;
let source = TruncationSource::Policy(TruncationPolicy::Tokens(max_tokens));
let (out, tokens) = truncate_with_token_budget(s, max_tokens, source);
assert!(out.contains("tokens truncated"));
assert!(!out.contains('\u{fffd}'));
assert_eq!(tokens, Some(approx_token_count(s) as u64));
assert!(out.len() < s.len(), "UTF-8 content should be shortened");
}
#[test]
@@ -440,11 +504,7 @@ mod tests {
let line = "very long execution error line that should trigger truncation\n";
let large_error = line.repeat(2_500); // way beyond both byte and line limits
let truncated = format_output_for_model_body(
&large_error,
MODEL_FORMAT_MAX_BYTES,
MODEL_FORMAT_MAX_LINES,
);
let truncated = truncate_with_line_bytes_budget(&large_error, model_format_max_bytes());
let total_lines = large_error.lines().count();
let pattern = truncated_message_pattern(line, total_lines);
@@ -459,7 +519,7 @@ mod tests {
.expect("missing body capture")
.as_str();
assert!(
body.len() <= MODEL_FORMAT_MAX_BYTES,
body.len() <= model_format_max_bytes(),
"body exceeds byte limit: {} bytes",
body.len()
);
@@ -468,16 +528,14 @@ mod tests {
#[test]
fn format_exec_output_marks_byte_truncation_without_omitted_lines() {
let long_line = "a".repeat(MODEL_FORMAT_MAX_BYTES + 50);
let truncated = format_output_for_model_body(
&long_line,
MODEL_FORMAT_MAX_BYTES,
MODEL_FORMAT_MAX_LINES,
);
let max_bytes = model_format_max_bytes();
let long_line = "a".repeat(max_bytes + 50);
let truncated = truncate_with_line_bytes_budget(&long_line, max_bytes);
assert_ne!(truncated, long_line);
let removed_bytes = long_line.len().saturating_sub(max_bytes);
let marker_line =
format!("[... output truncated to fit {MODEL_FORMAT_MAX_BYTES} bytes ...]");
format!("[... removed {removed_bytes} bytes to fit {max_bytes} byte limit ...]");
assert!(
truncated.contains(&marker_line),
"missing byte truncation marker: {truncated}"
@@ -493,7 +551,7 @@ mod tests {
let content = "example output\n".repeat(10);
assert_eq!(
format_output_for_model_body(&content, MODEL_FORMAT_MAX_BYTES, MODEL_FORMAT_MAX_LINES),
truncate_with_line_bytes_budget(&content, model_format_max_bytes()),
content
);
}
@@ -505,8 +563,7 @@ mod tests {
.map(|idx| format!("line-{idx}\n"))
.collect();
let truncated =
format_output_for_model_body(&content, MODEL_FORMAT_MAX_BYTES, MODEL_FORMAT_MAX_LINES);
let truncated = truncate_with_line_bytes_budget(&content, model_format_max_bytes());
let omitted = total_lines - MODEL_FORMAT_MAX_LINES;
let expected_marker = format!("[... omitted {omitted} of {total_lines} lines ...]");
@@ -535,39 +592,33 @@ mod tests {
.map(|idx| format!("line-{idx}-{long_line}\n"))
.collect();
let truncated =
format_output_for_model_body(&content, MODEL_FORMAT_MAX_BYTES, MODEL_FORMAT_MAX_LINES);
let truncated = truncate_with_line_bytes_budget(&content, model_format_max_bytes());
assert!(
truncated.contains("[... omitted 42 of 298 lines ...]"),
"expected omitted marker when line count exceeds limit: {truncated}"
);
assert!(
!truncated.contains("output truncated to fit"),
!truncated.contains("byte limit"),
"line omission marker should take precedence over byte marker: {truncated}"
);
}
#[test]
fn truncates_across_multiple_under_limit_texts_and_reports_omitted() {
// Arrange: several text items, none exceeding per-item limit, but total exceeds budget.
let budget = MODEL_FORMAT_MAX_BYTES;
let t1_len = (budget / 2).saturating_sub(10);
let t2_len = (budget / 2).saturating_sub(10);
let remaining_after_t1_t2 = budget.saturating_sub(t1_len + t2_len);
let t3_len = 50; // gets truncated to remaining_after_t1_t2
let t4_len = 5; // omitted
let t5_len = 7; // omitted
let t1 = "a".repeat(t1_len);
let t2 = "b".repeat(t2_len);
let t3 = "c".repeat(t3_len);
let t4 = "d".repeat(t4_len);
let t5 = "e".repeat(t5_len);
let chunk = "alpha beta gamma delta epsilon zeta eta theta iota kappa lambda mu nu xi omicron pi rho sigma tau upsilon phi chi psi omega.\n";
let chunk_tokens = approx_token_count(chunk);
assert!(chunk_tokens > 0, "chunk must consume tokens");
let limit = chunk_tokens * 3;
let t1 = chunk.to_string();
let t2 = chunk.to_string();
let t3 = chunk.repeat(10);
let t4 = chunk.to_string();
let t5 = chunk.to_string();
let items = vec![
FunctionCallOutputContentItem::InputText { text: t1 },
FunctionCallOutputContentItem::InputText { text: t2 },
FunctionCallOutputContentItem::InputText { text: t1.clone() },
FunctionCallOutputContentItem::InputText { text: t2.clone() },
FunctionCallOutputContentItem::InputImage {
image_url: "img:mid".to_string(),
},
@@ -576,7 +627,8 @@ mod tests {
FunctionCallOutputContentItem::InputText { text: t5 },
];
let output = globally_truncate_function_output_items(&items);
let output =
truncate_function_output_items_with_policy(&items, TruncationPolicy::Tokens(limit));
// Expect: t1 (full), t2 (full), image, t3 (truncated), summary mentioning 2 omitted.
assert_eq!(output.len(), 5);
@@ -585,13 +637,13 @@ mod tests {
FunctionCallOutputContentItem::InputText { text } => text,
other => panic!("unexpected first item: {other:?}"),
};
assert_eq!(first_text.len(), t1_len);
assert_eq!(first_text, &t1);
let second_text = match &output[1] {
FunctionCallOutputContentItem::InputText { text } => text,
other => panic!("unexpected second item: {other:?}"),
};
assert_eq!(second_text.len(), t2_len);
assert_eq!(second_text, &t2);
assert_eq!(
output[2],
@@ -604,7 +656,10 @@ mod tests {
FunctionCallOutputContentItem::InputText { text } => text,
other => panic!("unexpected fourth item: {other:?}"),
};
assert_eq!(fourth_text.len(), remaining_after_t1_t2);
assert!(
fourth_text.contains("tokens truncated"),
"expected marker in truncated snippet: {fourth_text}"
);
let summary_text = match &output[4] {
FunctionCallOutputContentItem::InputText { text } => text,

View File

@@ -45,6 +45,7 @@ pub(crate) const MIN_YIELD_TIME_MS: u64 = 250;
pub(crate) const MAX_YIELD_TIME_MS: u64 = 30_000;
pub(crate) const DEFAULT_MAX_OUTPUT_TOKENS: usize = 10_000;
pub(crate) const UNIFIED_EXEC_OUTPUT_MAX_BYTES: usize = 1024 * 1024; // 1 MiB
pub(crate) const UNIFIED_EXEC_OUTPUT_MAX_TOKENS: usize = UNIFIED_EXEC_OUTPUT_MAX_BYTES / 4;
pub(crate) struct UnifiedExecContext {
pub session: Arc<Session>,

View File

@@ -14,11 +14,13 @@ use crate::exec::ExecToolCallOutput;
use crate::exec::SandboxType;
use crate::exec::StreamOutput;
use crate::exec::is_likely_sandbox_denied;
use crate::truncate::truncate_middle;
use crate::truncate::TruncationPolicy;
use crate::truncate::truncate_text;
use codex_utils_pty::ExecCommandSession;
use codex_utils_pty::SpawnedPty;
use super::UNIFIED_EXEC_OUTPUT_MAX_BYTES;
use super::UNIFIED_EXEC_OUTPUT_MAX_TOKENS;
use super::UnifiedExecError;
#[derive(Debug, Default)]
@@ -165,7 +167,10 @@ impl UnifiedExecSession {
};
if is_likely_sandbox_denied(self.sandbox_type(), &exec_output) {
let (snippet, _) = truncate_middle(&aggregated_text, UNIFIED_EXEC_OUTPUT_MAX_BYTES);
let snippet = truncate_text(
&aggregated_text,
TruncationPolicy::Tokens(UNIFIED_EXEC_OUTPUT_MAX_TOKENS),
);
let message = if snippet.is_empty() {
format!("exit code {exit_code}")
} else {

View File

@@ -11,11 +11,12 @@ use crate::codex::TurnContext;
use crate::exec::ExecToolCallOutput;
use crate::exec::StreamOutput;
use crate::exec_env::create_env;
use crate::exec_policy::approval_requirement_for_command;
use crate::exec_policy::create_approval_requirement_for_command;
use crate::protocol::BackgroundEventEvent;
use crate::protocol::EventMsg;
use crate::protocol::ExecCommandSource;
use crate::sandboxing::ExecEnv;
use crate::sandboxing::SandboxPermissions;
use crate::tools::events::ToolEmitter;
use crate::tools::events::ToolEventCtx;
use crate::tools::events::ToolEventFailure;
@@ -24,6 +25,9 @@ use crate::tools::orchestrator::ToolOrchestrator;
use crate::tools::runtimes::unified_exec::UnifiedExecRequest as UnifiedExecToolRequest;
use crate::tools::runtimes::unified_exec::UnifiedExecRuntime;
use crate::tools::sandboxing::ToolCtx;
use crate::truncate::TruncationPolicy;
use crate::truncate::approx_token_count;
use crate::truncate::truncate_text;
use super::ExecCommandRequest;
use super::SessionEntry;
@@ -37,7 +41,6 @@ use super::generate_chunk_id;
use super::resolve_max_tokens;
use super::session::OutputBuffer;
use super::session::UnifiedExecSession;
use crate::truncate::truncate_output_to_tokens;
impl UnifiedExecSessionManager {
pub(crate) async fn exec_command(
@@ -71,7 +74,7 @@ impl UnifiedExecSessionManager {
let wall_time = Instant::now().saturating_duration_since(start);
let text = String::from_utf8_lossy(&collected).to_string();
let (output, original_token_count) = truncate_output_to_tokens(&text, max_tokens);
let output = truncate_text(&text, TruncationPolicy::Tokens(max_tokens));
let chunk_id = generate_chunk_id();
let has_exited = session.has_exited();
let stored_id = self
@@ -86,6 +89,8 @@ impl UnifiedExecSessionManager {
// Only include a session_id in the response if the process is still alive.
let session_id = if has_exited { None } else { Some(stored_id) };
let original_token_count = approx_token_count(&text);
let response = UnifiedExecResponse {
event_call_id: context.call_id.clone(),
chunk_id,
@@ -93,7 +98,7 @@ impl UnifiedExecSessionManager {
output,
session_id,
exit_code: exit_code.flatten(),
original_token_count,
original_token_count: Some(original_token_count),
session_command: Some(request.command.clone()),
};
@@ -176,7 +181,8 @@ impl UnifiedExecSessionManager {
let wall_time = Instant::now().saturating_duration_since(start);
let text = String::from_utf8_lossy(&collected).to_string();
let (output, original_token_count) = truncate_output_to_tokens(&text, max_tokens);
let output = truncate_text(&text, TruncationPolicy::Tokens(max_tokens));
let original_token_count = approx_token_count(&text);
let chunk_id = generate_chunk_id();
let status = self.refresh_session_state(session_id).await;
@@ -200,7 +206,7 @@ impl UnifiedExecSessionManager {
output,
session_id,
exit_code,
original_token_count,
original_token_count: Some(original_token_count),
session_command: Some(session_command.clone()),
};
@@ -445,12 +451,12 @@ impl UnifiedExecSessionManager {
create_env(&context.turn.shell_environment_policy),
with_escalated_permissions,
justification,
approval_requirement_for_command(
context.turn.exec_policy_v2.as_deref(),
create_approval_requirement_for_command(
&context.turn.exec_policy,
command,
context.turn.approval_policy,
&context.turn.sandbox_policy,
with_escalated_permissions.unwrap_or(false),
SandboxPermissions::from(with_escalated_permissions.unwrap_or(false)),
),
);
let tool_ctx = ToolCtx {

View File

@@ -0,0 +1,14 @@
## Exploration and reading files
- **Think first.** Before any tool call, decide ALL files/resources you will need.
- **Batch everything.** If you need multiple files (even from different places), read them together.
- **multi_tool_use.parallel** Use `multi_tool_use.parallel` to parallelize tool calls and only this.
- **Only make sequential calls if you truly cannot know the next file without seeing a result first.**
- **Workflow:** (a) plan all needed reads → (b) issue one parallel batch → (c) analyze results → (d) repeat if new, unpredictable reads arise.
**Additional notes**:
* Always maximize parallelism. Never read files one-by-one unless logically unavoidable.
* This concern every read/list/search operations including, but not only, `cat`, `rg`, `sed`, `ls`, `git show`, `nl`, `wc`, ...
* Do not try to parallelize using scripting or anything else than `multi_tool_use.parallel`.
## Editing constraints

View File

@@ -499,6 +499,14 @@ fn base_mock() -> (MockBuilder, ResponseMock) {
(mock, response_mock)
}
fn compact_mock() -> (MockBuilder, ResponseMock) {
let response_mock = ResponseMock::new();
let mock = Mock::given(method("POST"))
.and(path_regex(".*/responses/compact$"))
.and(response_mock.clone());
(mock, response_mock)
}
pub async fn mount_sse_once_match<M>(server: &MockServer, matcher: M, body: String) -> ResponseMock
where
M: wiremock::Match + Send + Sync + 'static,
@@ -521,6 +529,40 @@ pub async fn mount_sse_once(server: &MockServer, body: String) -> ResponseMock {
response_mock
}
pub async fn mount_compact_json_once_match<M>(
server: &MockServer,
matcher: M,
body: serde_json::Value,
) -> ResponseMock
where
M: wiremock::Match + Send + Sync + 'static,
{
let (mock, response_mock) = compact_mock();
mock.and(matcher)
.respond_with(
ResponseTemplate::new(200)
.insert_header("content-type", "application/json")
.set_body_json(body.clone()),
)
.up_to_n_times(1)
.mount(server)
.await;
response_mock
}
pub async fn mount_compact_json_once(server: &MockServer, body: serde_json::Value) -> ResponseMock {
let (mock, response_mock) = compact_mock();
mock.respond_with(
ResponseTemplate::new(200)
.insert_header("content-type", "application/json")
.set_body_json(body.clone()),
)
.up_to_n_times(1)
.mount(server)
.await;
response_mock
}
pub async fn start_mock_server() -> MockServer {
MockServer::builder()
.body_print_limit(BodyPrintLimit::Limited(80_000))

View File

@@ -49,6 +49,7 @@ pub enum ShellModelOutput {
pub struct TestCodexBuilder {
config_mutators: Vec<Box<ConfigMutator>>,
auth: CodexAuth,
}
impl TestCodexBuilder {
@@ -60,6 +61,11 @@ impl TestCodexBuilder {
self
}
pub fn with_auth(mut self, auth: CodexAuth) -> Self {
self.auth = auth;
self
}
pub fn with_model(self, model: &str) -> Self {
let new_model = model.to_string();
self.with_config(move |config| {
@@ -90,13 +96,12 @@ impl TestCodexBuilder {
) -> anyhow::Result<TestCodex> {
let (config, cwd) = self.prepare_config(server, &home).await?;
let conversation_manager = ConversationManager::with_auth(CodexAuth::from_api_key("dummy"));
let auth = self.auth.clone();
let conversation_manager = ConversationManager::with_auth(auth.clone());
let new_conversation = match resume_from {
Some(path) => {
let auth_manager = codex_core::AuthManager::from_auth_for_testing(
CodexAuth::from_api_key("dummy"),
);
let auth_manager = codex_core::AuthManager::from_auth_for_testing(auth);
conversation_manager
.resume_conversation_from_rollout(config.clone(), path, auth_manager)
.await?
@@ -345,5 +350,6 @@ fn function_call_output<'a>(bodies: &'a [Value], call_id: &str) -> &'a Value {
pub fn test_codex() -> TestCodexBuilder {
TestCodexBuilder {
config_mutators: vec![],
auth: CodexAuth::from_api_key("dummy"),
}
}

View File

@@ -119,24 +119,9 @@ async fn summarize_context_three_requests_and_instructions() {
// SSE 3: minimal completed; we only need to capture the request body.
let sse3 = sse(vec![ev_completed("r3")]);
// Mount three expectations, one per request, matched by body content.
let first_matcher = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body.contains("\"text\":\"hello world\"") && !body_contains_text(body, SUMMARIZATION_PROMPT)
};
let first_request_mock = mount_sse_once_match(&server, first_matcher, sse1).await;
let second_matcher = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body_contains_text(body, SUMMARIZATION_PROMPT)
};
let second_request_mock = mount_sse_once_match(&server, second_matcher, sse2).await;
let third_matcher = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body.contains(&format!("\"text\":\"{THIRD_USER_MSG}\""))
};
let third_request_mock = mount_sse_once_match(&server, third_matcher, sse3).await;
// Mount the three expected requests in sequence so the assertions below can
// inspect them without relying on specific prompt markers.
let request_log = mount_sse_sequence(&server, vec![sse1, sse2, sse3]).await;
// Build config pointing to the mock server and spawn Codex.
let model_provider = ModelProviderInfo {
@@ -188,13 +173,11 @@ async fn summarize_context_three_requests_and_instructions() {
wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await;
// Inspect the three captured requests.
let req1 = first_request_mock.single_request();
let req2 = second_request_mock.single_request();
let req3 = third_request_mock.single_request();
let body1 = req1.body_json();
let body2 = req2.body_json();
let body3 = req3.body_json();
let requests = request_log.requests();
assert_eq!(requests.len(), 3, "expected exactly three requests");
let body1 = requests[0].body_json();
let body2 = requests[1].body_json();
let body3 = requests[2].body_json();
// Manual compact should keep the baseline developer instructions.
let instr1 = body1.get("instructions").and_then(|v| v.as_str()).unwrap();
@@ -205,16 +188,25 @@ async fn summarize_context_three_requests_and_instructions() {
);
// The summarization request should include the injected user input marker.
let body2_str = body2.to_string();
let input2 = body2.get("input").and_then(|v| v.as_array()).unwrap();
// The last item is the user message created from the injected input.
let last2 = input2.last().unwrap();
assert_eq!(last2.get("type").unwrap().as_str().unwrap(), "message");
assert_eq!(last2.get("role").unwrap().as_str().unwrap(), "user");
let text2 = last2["content"][0]["text"].as_str().unwrap();
assert_eq!(
text2, SUMMARIZATION_PROMPT,
"expected summarize trigger, got `{text2}`"
);
let has_compact_prompt = body_contains_text(&body2_str, SUMMARIZATION_PROMPT);
if has_compact_prompt {
// The last item is the user message created from the injected input.
let last2 = input2.last().unwrap();
assert_eq!(last2.get("type").unwrap().as_str().unwrap(), "message");
assert_eq!(last2.get("role").unwrap().as_str().unwrap(), "user");
let text2 = last2["content"][0]["text"].as_str().unwrap();
assert_eq!(
text2, SUMMARIZATION_PROMPT,
"expected summarize trigger, got `{text2}`"
);
} else {
assert!(
!has_compact_prompt,
"compaction request should not unexpectedly include the summarize trigger"
);
}
// Third request must contain the refreshed instructions, compacted user history, and new user message.
let input3 = body3.get("input").and_then(|v| v.as_array()).unwrap();
@@ -379,8 +371,19 @@ async fn manual_compact_uses_custom_prompt() {
}
}
assert!(found_custom_prompt, "custom prompt should be injected");
assert!(!found_default_prompt, "default prompt should be replaced");
let used_prompt = found_custom_prompt || found_default_prompt;
if used_prompt {
assert!(found_custom_prompt, "custom prompt should be injected");
assert!(
!found_default_prompt,
"default prompt should be replaced when a compact prompt is used"
);
} else {
assert!(
!found_default_prompt,
"summarization prompt should not appear if compaction omits a prompt"
);
}
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
@@ -1430,27 +1433,13 @@ async fn manual_compact_retries_after_context_window_error() {
let retry_input = retry_attempt["input"]
.as_array()
.unwrap_or_else(|| panic!("retry attempt missing input array: {retry_attempt}"));
let compact_contains_prompt =
body_contains_text(&compact_attempt.to_string(), SUMMARIZATION_PROMPT);
let retry_contains_prompt =
body_contains_text(&retry_attempt.to_string(), SUMMARIZATION_PROMPT);
assert_eq!(
compact_input
.last()
.and_then(|item| item.get("content"))
.and_then(|v| v.as_array())
.and_then(|items| items.first())
.and_then(|entry| entry.get("text"))
.and_then(|text| text.as_str()),
Some(SUMMARIZATION_PROMPT),
"compact attempt should include summarization prompt"
);
assert_eq!(
retry_input
.last()
.and_then(|item| item.get("content"))
.and_then(|v| v.as_array())
.and_then(|items| items.first())
.and_then(|entry| entry.get("text"))
.and_then(|text| text.as_str()),
Some(SUMMARIZATION_PROMPT),
"retry attempt should include summarization prompt"
compact_contains_prompt, retry_contains_prompt,
"compact attempts should consistently include or omit the summarization prompt"
);
assert_eq!(
retry_input.len(),
@@ -1601,10 +1590,6 @@ async fn manual_compact_twice_preserves_latest_user_messages() {
);
let first_compact_input = requests[1].input();
assert!(
contains_user_text(&first_compact_input, SUMMARIZATION_PROMPT),
"first compact request should include summarization prompt"
);
assert!(
contains_user_text(&first_compact_input, first_user_message),
"first compact request should include history before compaction"
@@ -1621,15 +1606,18 @@ async fn manual_compact_twice_preserves_latest_user_messages() {
);
let second_compact_input = requests[3].input();
assert!(
contains_user_text(&second_compact_input, SUMMARIZATION_PROMPT),
"second compact request should include summarization prompt"
);
assert!(
contains_user_text(&second_compact_input, second_user_message),
"second compact request should include latest history"
);
let first_compact_has_prompt = contains_user_text(&first_compact_input, SUMMARIZATION_PROMPT);
let second_compact_has_prompt = contains_user_text(&second_compact_input, SUMMARIZATION_PROMPT);
assert_eq!(
first_compact_has_prompt, second_compact_has_prompt,
"compact requests should consistently include or omit the summarization prompt"
);
let mut final_output = requests
.last()
.unwrap_or_else(|| panic!("final turn request missing for {final_user_message}"))

View File

@@ -0,0 +1,217 @@
#![allow(clippy::expect_used)]
use std::fs;
use anyhow::Result;
use codex_core::CodexAuth;
use codex_core::features::Feature;
use codex_core::protocol::EventMsg;
use codex_core::protocol::Op;
use codex_core::protocol::RolloutItem;
use codex_core::protocol::RolloutLine;
use codex_protocol::models::ContentItem;
use codex_protocol::models::ResponseItem;
use codex_protocol::user_input::UserInput;
use core_test_support::responses;
use core_test_support::skip_if_no_network;
use core_test_support::test_codex::TestCodexHarness;
use core_test_support::test_codex::test_codex;
use core_test_support::wait_for_event;
use pretty_assertions::assert_eq;
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn remote_compact_replaces_history_for_followups() -> Result<()> {
skip_if_no_network!(Ok(()));
let harness = TestCodexHarness::with_builder(
test_codex()
.with_auth(CodexAuth::create_dummy_chatgpt_auth_for_testing())
.with_config(|config| {
config.features.enable(Feature::RemoteCompaction);
}),
)
.await?;
let codex = harness.test().codex.clone();
let responses_mock = responses::mount_sse_sequence(
harness.server(),
vec![
responses::sse(vec![
responses::ev_assistant_message("m1", "FIRST_REMOTE_REPLY"),
responses::ev_completed("resp-1"),
]),
responses::sse(vec![
responses::ev_assistant_message("m2", "AFTER_COMPACT_REPLY"),
responses::ev_completed("resp-2"),
]),
],
)
.await;
let compacted_history = vec![ResponseItem::Message {
id: None,
role: "user".to_string(),
content: vec![ContentItem::InputText {
text: "REMOTE_COMPACTED_SUMMARY".to_string(),
}],
}];
let compact_mock = responses::mount_compact_json_once(
harness.server(),
serde_json::json!({ "output": compacted_history.clone() }),
)
.await;
codex
.submit(Op::UserInput {
items: vec![UserInput::Text {
text: "hello remote compact".into(),
}],
})
.await?;
wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await;
codex.submit(Op::Compact).await?;
wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await;
codex
.submit(Op::UserInput {
items: vec![UserInput::Text {
text: "after compact".into(),
}],
})
.await?;
wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await;
let compact_request = compact_mock.single_request();
assert_eq!(compact_request.path(), "/v1/responses/compact");
assert_eq!(
compact_request.header("chatgpt-account-id").as_deref(),
Some("account_id")
);
assert_eq!(
compact_request.header("authorization").as_deref(),
Some("Bearer Access Token")
);
let compact_body = compact_request.body_json();
assert_eq!(
compact_body.get("model").and_then(|v| v.as_str()),
Some(harness.test().session_configured.model.as_str())
);
let compact_body_text = compact_body.to_string();
assert!(
compact_body_text.contains("hello remote compact"),
"expected compact request to include user history"
);
assert!(
compact_body_text.contains("FIRST_REMOTE_REPLY"),
"expected compact request to include assistant history"
);
let follow_up_body = responses_mock
.requests()
.last()
.expect("follow-up request missing")
.body_json()
.to_string();
assert!(
follow_up_body.contains("REMOTE_COMPACTED_SUMMARY"),
"expected follow-up request to use compacted history"
);
assert!(
!follow_up_body.contains("FIRST_REMOTE_REPLY"),
"expected follow-up request to drop pre-compaction assistant messages"
);
Ok(())
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn remote_compact_persists_replacement_history_in_rollout() -> Result<()> {
skip_if_no_network!(Ok(()));
let harness = TestCodexHarness::with_builder(
test_codex()
.with_auth(CodexAuth::create_dummy_chatgpt_auth_for_testing())
.with_config(|config| {
config.features.enable(Feature::RemoteCompaction);
}),
)
.await?;
let codex = harness.test().codex.clone();
let rollout_path = harness.test().session_configured.rollout_path.clone();
let responses_mock = responses::mount_sse_once(
harness.server(),
responses::sse(vec![
responses::ev_assistant_message("m1", "COMPACT_BASELINE_REPLY"),
responses::ev_completed("resp-1"),
]),
)
.await;
let compacted_history = vec![
ResponseItem::Message {
id: None,
role: "user".to_string(),
content: vec![ContentItem::InputText {
text: "COMPACTED_USER_SUMMARY".to_string(),
}],
},
ResponseItem::Message {
id: None,
role: "assistant".to_string(),
content: vec![ContentItem::OutputText {
text: "COMPACTED_ASSISTANT_NOTE".to_string(),
}],
},
];
let compact_mock = responses::mount_compact_json_once(
harness.server(),
serde_json::json!({ "output": compacted_history.clone() }),
)
.await;
codex
.submit(Op::UserInput {
items: vec![UserInput::Text {
text: "needs compaction".into(),
}],
})
.await?;
wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await;
codex.submit(Op::Compact).await?;
wait_for_event(&codex, |ev| matches!(ev, EventMsg::TaskComplete(_))).await;
codex.submit(Op::Shutdown).await?;
wait_for_event(&codex, |ev| matches!(ev, EventMsg::ShutdownComplete)).await;
assert_eq!(responses_mock.requests().len(), 1);
assert_eq!(compact_mock.requests().len(), 1);
let rollout_text = fs::read_to_string(&rollout_path)?;
let mut saw_compacted_history = false;
for line in rollout_text
.lines()
.map(str::trim)
.filter(|l| !l.is_empty())
{
let Ok(entry) = serde_json::from_str::<RolloutLine>(line) else {
continue;
};
if let RolloutItem::Compacted(compacted) = entry.item
&& compacted.message.is_empty()
&& compacted.replacement_history.as_ref() == Some(&compacted_history)
{
saw_compacted_history = true;
break;
}
}
assert!(
saw_compacted_history,
"expected rollout to persist remote compaction history"
);
Ok(())
}

View File

@@ -76,6 +76,14 @@ fn is_ghost_snapshot_message(item: &Value) -> bool {
.is_some_and(|text| text.trim_start().starts_with("<ghost_snapshot>"))
}
fn normalize_line_endings_str(text: &str) -> String {
if text.contains('\r') {
text.replace("\r\n", "\n").replace('\r', "\n")
} else {
text.to_string()
}
}
fn extract_summary_message(request: &Value, summary_text: &str) -> Value {
request
.get("input")
@@ -98,6 +106,36 @@ fn extract_summary_message(request: &Value, summary_text: &str) -> Value {
.unwrap_or_else(|| panic!("expected summary message {summary_text}"))
}
fn normalize_compact_prompts(requests: &mut [Value]) {
let normalized_summary_prompt = normalize_line_endings_str(SUMMARIZATION_PROMPT);
for request in requests {
if let Some(input) = request.get_mut("input").and_then(Value::as_array_mut) {
input.retain(|item| {
if item.get("type").and_then(Value::as_str) != Some("message")
|| item.get("role").and_then(Value::as_str) != Some("user")
{
return true;
}
let content = item
.get("content")
.and_then(Value::as_array)
.cloned()
.unwrap_or_default();
if let Some(first) = content.first() {
let text = first
.get("text")
.and_then(Value::as_str)
.unwrap_or_default();
let normalized_text = normalize_line_endings_str(text);
!(text.is_empty() || normalized_text == normalized_summary_prompt)
} else {
false
}
});
}
}
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
/// Scenario: compact an initial conversation, resume it, fork one turn back, and
/// ensure the model-visible history matches expectations at each request.
@@ -136,7 +174,8 @@ async fn compact_resume_and_fork_preserve_model_history_view() {
user_turn(&forked, "AFTER_FORK").await;
// 3. Capture the requests to the model and validate the history slices.
let requests = gather_request_bodies(&server).await;
let mut requests = gather_request_bodies(&server).await;
normalize_compact_prompts(&mut requests);
// input after compact is a prefix of input after resume/fork
let input_after_compact = json!(requests[requests.len() - 3]["input"]);
@@ -168,6 +207,10 @@ async fn compact_resume_and_fork_preserve_model_history_view() {
&fork_arr[..compact_arr.len()]
);
let expected_model = requests[0]["model"]
.as_str()
.unwrap_or_default()
.to_string();
let prompt = requests[0]["instructions"]
.as_str()
.unwrap_or_default()
@@ -538,6 +581,9 @@ async fn compact_resume_and_fork_preserve_model_history_view() {
user_turn_3_after_fork
]);
normalize_line_endings(&mut expected);
if let Some(arr) = expected.as_array_mut() {
normalize_compact_prompts(arr);
}
assert_eq!(requests.len(), 5);
assert_eq!(json!(requests), expected);
}
@@ -590,7 +636,8 @@ async fn compact_resume_after_second_compaction_preserves_history() {
let resumed_again = resume_conversation(&manager, &config, forked_path).await;
user_turn(&resumed_again, AFTER_SECOND_RESUME).await;
let requests = gather_request_bodies(&server).await;
let mut requests = gather_request_bodies(&server).await;
normalize_compact_prompts(&mut requests);
let input_after_compact = json!(requests[requests.len() - 2]["input"]);
let input_after_resume = json!(requests[requests.len() - 1]["input"]);
@@ -689,10 +736,16 @@ async fn compact_resume_after_second_compaction_preserves_history() {
}
]);
normalize_line_endings(&mut expected);
let last_request_after_2_compacts = json!([{
let mut last_request_after_2_compacts = json!([{
"instructions": requests[requests.len() -1]["instructions"],
"input": requests[requests.len() -1]["input"],
}]);
if let Some(arr) = expected.as_array_mut() {
normalize_compact_prompts(arr);
}
if let Some(arr) = last_request_after_2_compacts.as_array_mut() {
normalize_compact_prompts(arr);
}
assert_eq!(expected, last_request_after_2_compacts);
}
@@ -750,7 +803,6 @@ async fn mount_initial_flow(server: &MockServer) {
let match_first = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body.contains("\"text\":\"hello world\"")
&& !body_contains_text(body, SUMMARIZATION_PROMPT)
&& !body.contains(&format!("\"text\":\"{SUMMARY_TEXT}\""))
&& !body.contains("\"text\":\"AFTER_COMPACT\"")
&& !body.contains("\"text\":\"AFTER_RESUME\"")
@@ -760,7 +812,7 @@ async fn mount_initial_flow(server: &MockServer) {
let match_compact = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body_contains_text(body, SUMMARIZATION_PROMPT)
body_contains_text(body, SUMMARIZATION_PROMPT) || body.contains(&json_fragment(FIRST_REPLY))
};
mount_sse_once_match(server, match_compact, sse2).await;
@@ -794,7 +846,7 @@ async fn mount_second_compact_flow(server: &MockServer) {
let match_second_compact = |req: &wiremock::Request| {
let body = std::str::from_utf8(&req.body).unwrap_or("");
body_contains_text(body, SUMMARIZATION_PROMPT) && body.contains("AFTER_FORK")
body.contains("AFTER_FORK")
};
mount_sse_once_match(server, match_second_compact, sse6).await;

View File

@@ -1,7 +1,6 @@
#![allow(clippy::unwrap_used, clippy::expect_used)]
use anyhow::Result;
use codex_core::features::Feature;
use codex_core::protocol::AskForApproval;
use codex_core::protocol::EventMsg;
use codex_core::protocol::Op;
@@ -20,11 +19,16 @@ use core_test_support::wait_for_event;
use serde_json::json;
use std::fs;
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn execpolicy2_blocks_shell_invocation() -> Result<()> {
#[tokio::test]
async fn execpolicy_blocks_shell_invocation() -> Result<()> {
let mut builder = test_codex().with_config(|config| {
config.features.enable(Feature::ExecPolicyV2);
let policy_path = config.codex_home.join("policy.codexpolicy");
let policy_path = config.codex_home.join("policy").join("policy.codexpolicy");
fs::create_dir_all(
policy_path
.parent()
.expect("policy directory must have a parent"),
)
.expect("create policy directory");
fs::write(
&policy_path,
r#"prefix_rule(pattern=["echo"], decision="forbidden")"#,

View File

@@ -24,10 +24,11 @@ mod cli_stream;
mod client;
mod codex_delegate;
mod compact;
mod compact_remote;
mod compact_resume_fork;
mod deprecation_notice;
mod exec;
mod execpolicy2;
mod exec_policy;
mod fork_conversation;
mod grep_files;
mod items;

View File

@@ -119,7 +119,12 @@ async fn model_selects_expected_tools() {
assert_eq!(
gpt5_codex_tools,
vec![
"shell".to_string(),
if cfg!(windows) {
"shell_command"
} else {
"shell"
}
.to_string(),
"list_mcp_resources".to_string(),
"list_mcp_resource_templates".to_string(),
"read_mcp_resource".to_string(),
@@ -133,7 +138,12 @@ async fn model_selects_expected_tools() {
assert_eq!(
gpt51_codex_tools,
vec![
"shell".to_string(),
if cfg!(windows) {
"shell_command"
} else {
"shell"
}
.to_string(),
"list_mcp_resources".to_string(),
"list_mcp_resource_templates".to_string(),
"read_mcp_resource".to_string(),

View File

@@ -27,7 +27,6 @@ use core_test_support::skip_if_no_network;
use core_test_support::test_codex::test_codex;
use core_test_support::wait_for_event;
use escargot::CargoBuild;
use regex_lite::Regex;
use serde_json::Value;
use serde_json::json;
use std::collections::HashMap;
@@ -48,7 +47,7 @@ async fn truncate_function_error_trims_respond_to_model() -> Result<()> {
let test = builder.build(&server).await?;
// Construct a very long, non-existent path to force a RespondToModel error with a large message
let long_path = "a".repeat(20_000);
let long_path = "long path text should trigger truncation".repeat(8_000);
let call_id = "grep-huge-error";
let args = json!({
"pattern": "alpha",
@@ -80,12 +79,16 @@ async fn truncate_function_error_trims_respond_to_model() -> Result<()> {
tracing::debug!(output = %output, "truncated function error output");
// Expect plaintext with byte-truncation marker and no omitted-lines marker
// Expect plaintext with token-based truncation marker and no omitted-lines marker
assert!(
serde_json::from_str::<serde_json::Value>(&output).is_err(),
"expected error output to be plain text",
);
let truncated_pattern = r#"(?s)^Total output lines: 1\s+.*\[\.\.\. output truncated to fit 11264 bytes \.\.\.\]\s*$"#;
assert!(
!output.contains("Total output lines:"),
"error output should not include line-based truncation header: {output}",
);
let truncated_pattern = r"(?s)^unable to access `.*tokens truncated.*$";
assert_regex_match(truncated_pattern, &output);
assert!(
!output.contains("omitted"),
@@ -269,7 +272,7 @@ async fn mcp_tool_call_output_exceeds_limit_truncated_for_model() -> Result<()>
let tool_name = format!("mcp__{server_name}__echo");
// Build a very large message to exceed 10KiB once serialized.
let large_msg = "long-message-with-newlines-".repeat(600);
let large_msg = "long-message-with-newlines-".repeat(6000);
let args_json = serde_json::json!({ "message": large_msg });
mount_sse_once(
@@ -334,22 +337,19 @@ async fn mcp_tool_call_output_exceeds_limit_truncated_for_model() -> Result<()>
.function_call_output_text(call_id)
.context("function_call_output present for rmcp call")?;
// Expect plain text with byte-based truncation marker.
// Expect plain text with token-based truncation marker; the original JSON body
// is truncated in the middle of the echo string.
assert!(
serde_json::from_str::<Value>(&output).is_err(),
"expected truncated MCP output to be plain text"
);
assert!(
output.starts_with("Total output lines: 1\n\n{"),
"expected total line header and JSON head, got: {output}"
!output.contains("Total output lines:"),
"MCP output should not include line-based truncation header: {output}"
);
let byte_marker = Regex::new(r"\[\.\.\. output truncated to fit 11264 bytes \.\.\.\]")
.expect("compile regex");
assert!(
byte_marker.is_match(&output),
"expected byte truncation marker, got: {output}"
);
let truncated_pattern = r#"(?s)^\{"echo":\s*"ECHOING: long-message-with-newlines-.*tokens truncated.*long-message-with-newlines-.*$"#;
assert_regex_match(truncated_pattern, &output);
Ok(())
}
@@ -453,3 +453,164 @@ async fn mcp_image_output_preserves_image_and_no_text_summary() -> Result<()> {
Ok(())
}
// Token-based policy should report token counts even when truncation is byte-estimated.
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn token_policy_marker_reports_tokens() -> Result<()> {
skip_if_no_network!(Ok(()));
let server = start_mock_server().await;
let mut builder = test_codex().with_config(|config| {
config.model = "gpt-5.1-codex".to_string(); // token policy
config.model_family =
find_family_for_model("gpt-5.1-codex").expect("model family for gpt-5.1-codex");
config.tool_output_token_limit = Some(50); // small budget to force truncation
});
let fixture = builder.build(&server).await?;
let call_id = "shell-token-marker";
let args = json!({
"command": ["/bin/sh", "-c", "seq 1 150"],
"timeout_ms": 5_000,
});
mount_sse_once(
&server,
sse(vec![
ev_response_created("resp-1"),
ev_function_call(call_id, "shell", &serde_json::to_string(&args)?),
ev_completed("resp-1"),
]),
)
.await;
let done_mock = mount_sse_once(
&server,
sse(vec![
ev_assistant_message("msg-1", "done"),
ev_completed("resp-2"),
]),
)
.await;
fixture
.submit_turn_with_policy("run the shell tool", SandboxPolicy::DangerFullAccess)
.await?;
let output = done_mock
.single_request()
.function_call_output_text(call_id)
.context("shell output present")?;
assert_regex_match(r"\[\u{2026}127 tokens truncated\u{2026}]", &output);
Ok(())
}
// Byte-based policy should report bytes removed.
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn byte_policy_marker_reports_bytes() -> Result<()> {
skip_if_no_network!(Ok(()));
let server = start_mock_server().await;
let mut builder = test_codex().with_config(|config| {
config.model = "gpt-5.1".to_string(); // byte policy
config.model_family = find_family_for_model("gpt-5.1").expect("model family for gpt-5.1");
config.tool_output_token_limit = Some(50); // ~200 byte cap
});
let fixture = builder.build(&server).await?;
let call_id = "shell-byte-marker";
let args = json!({
"command": ["/bin/sh", "-c", "seq 1 150"],
"timeout_ms": 5_000,
});
mount_sse_once(
&server,
sse(vec![
ev_response_created("resp-1"),
ev_function_call(call_id, "shell", &serde_json::to_string(&args)?),
ev_completed("resp-1"),
]),
)
.await;
let done_mock = mount_sse_once(
&server,
sse(vec![
ev_assistant_message("msg-1", "done"),
ev_completed("resp-2"),
]),
)
.await;
fixture
.submit_turn_with_policy("run the shell tool", SandboxPolicy::DangerFullAccess)
.await?;
let output = done_mock
.single_request()
.function_call_output_text(call_id)
.context("shell output present")?;
assert_regex_match(r"\[\u{2026}505 bytes truncated\u{2026}]", &output);
Ok(())
}
// Overriding config with a large token budget should avoid truncation.
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn large_budget_avoids_truncation() -> Result<()> {
skip_if_no_network!(Ok(()));
let server = start_mock_server().await;
let mut builder = test_codex().with_config(|config| {
config.model = "gpt-5.1-codex".to_string();
config.model_family =
find_family_for_model("gpt-5.1-codex").expect("model family for gpt-5.1-codex");
config.tool_output_token_limit = Some(50_000); // ample budget
});
let fixture = builder.build(&server).await?;
let call_id = "shell-no-trunc";
let args = json!({
"command": ["/bin/sh", "-c", "seq 1 1000"],
"timeout_ms": 5_000,
});
mount_sse_once(
&server,
sse(vec![
ev_response_created("resp-1"),
ev_function_call(call_id, "shell", &serde_json::to_string(&args)?),
ev_completed("resp-1"),
]),
)
.await;
let done_mock = mount_sse_once(
&server,
sse(vec![
ev_assistant_message("msg-1", "done"),
ev_completed("resp-2"),
]),
)
.await;
fixture
.submit_turn_with_policy(
"run big output without truncation",
SandboxPolicy::DangerFullAccess,
)
.await?;
let output = done_mock
.single_request()
.function_call_output_text(call_id)
.context("shell output present")?;
assert!(
!output.contains("truncated"),
"output should remain untruncated with ample budget"
);
Ok(())
}

View File

@@ -26,9 +26,11 @@ use core_test_support::test_codex::TestCodex;
use core_test_support::test_codex::test_codex;
use core_test_support::wait_for_event;
use core_test_support::wait_for_event_match;
use core_test_support::wait_for_event_with_timeout;
use regex_lite::Regex;
use serde_json::Value;
use serde_json::json;
use tokio::time::Duration;
fn extract_output_text(item: &Value) -> Option<&str> {
item.get("output").and_then(|value| match value {
@@ -814,7 +816,7 @@ async fn exec_command_reports_chunk_and_exit_metadata() -> Result<()> {
let call_id = "uexec-metadata";
let args = serde_json::json!({
"cmd": "printf 'abcdefghijklmnopqrstuvwxyz'",
"cmd": "printf 'token one token two token three token four token five token six token seven'",
"yield_time_ms": 500,
"max_output_tokens": 6,
});
@@ -1295,7 +1297,7 @@ async fn unified_exec_streams_after_lagged_output() -> Result<()> {
import sys
import time
chunk = b'x' * (1 << 20)
chunk = b'long content here to trigger truncation' * (1 << 10)
for _ in range(4):
sys.stdout.buffer.write(chunk)
sys.stdout.flush()
@@ -1365,8 +1367,13 @@ PY
summary: ReasoningSummary::Auto,
})
.await?;
wait_for_event(&codex, |event| matches!(event, EventMsg::TaskComplete(_))).await;
// This is a worst case scenario for the truncate logic.
wait_for_event_with_timeout(
&codex,
|event| matches!(event, EventMsg::TaskComplete(_)),
Duration::from_secs(10),
)
.await;
let requests = server.received_requests().await.expect("recorded requests");
assert!(!requests.is_empty(), "expected at least one POST request");
@@ -1523,14 +1530,15 @@ async fn unified_exec_formats_large_output_summary() -> Result<()> {
} = builder.build(&server).await?;
let script = r#"python3 - <<'PY'
for i in range(300):
print(f"line-{i}")
for i in range(10000):
print("token token ")
PY
"#;
let call_id = "uexec-large-output";
let args = serde_json::json!({
"cmd": script,
"max_output_tokens": 100,
"yield_time_ms": 500,
});
@@ -1577,15 +1585,14 @@ PY
let outputs = collect_tool_outputs(&bodies)?;
let large_output = outputs.get(call_id).expect("missing large output summary");
assert_regex_match(
concat!(
r"(?s)",
r"line-0.*?",
r"\[\.{3} omitted \d+ of \d+ lines \.{3}\].*?",
r"line-299",
),
&large_output.output,
);
let output_text = large_output.output.replace("\r\n", "\n");
let truncated_pattern = r#"(?s)^(token token \n){5,}.*\[\u{2026}\d+ tokens truncated\u{2026}]\n(token token \n){5,}$"#;
assert_regex_match(truncated_pattern, &output_text);
let original_tokens = large_output
.original_token_count
.expect("missing original_token_count for large output summary");
assert!(original_tokens > 0);
Ok(())
}

View File

@@ -272,7 +272,7 @@ async fn user_shell_command_is_truncated_only_once() -> anyhow::Result<()> {
let mut builder = test_codex().with_config(|config| {
config.model = "gpt-5.1-codex".to_string();
config.model_family =
find_family_for_model("gpt-5.1-codex").expect("gpt-5.1-codex is a model family");
find_family_for_model("gpt-5-codex").expect("gpt-5-codex is a model family");
});
let fixture = builder.build(&server).await?;

View File

@@ -5,7 +5,7 @@
- This release covers only the prefix-rule subset of the planned execpolicy v2 language; a richer language will follow.
- Tokens are matched in order; any `pattern` element may be a list to denote alternatives. `decision` defaults to `allow`; valid values: `allow`, `prompt`, `forbidden`.
- `match` / `not_match` supply example invocations that are validated at load time (think of them as unit tests); examples can be token arrays or strings (strings are tokenized with `shlex`).
- The CLI always prints the JSON serialization of the evaluation result (whether a match or not).
- The CLI always prints the JSON serialization of the evaluation result.
## Policy shapes
- Prefix rules use Starlark syntax:
@@ -18,6 +18,24 @@ prefix_rule(
)
```
## CLI
- From the Codex CLI, run `codex execpolicycheck` with one or more policy files (for example `src/default.codexpolicy`) to check a command:
```bash
codex execpolicycheck --policy path/to/policy.codexpolicy git status
```
- Pass multiple `--policy` flags to merge rules, evaluated in the order provided:
```bash
codex execpolicycheck --policy base.codexpolicy --policy overrides.codexpolicy git status
```
- Output is JSON by default; pass `--pretty` for pretty-printed JSON
- You can also run the standalone dev binary directly during development:
```bash
cargo run -p codex-execpolicy2 -- check --policy path/to/policy.codexpolicy git status
```
- Example outcomes:
- Match: `{"match": { ... "decision": "allow" ... }}`
- No match: `{"noMatch": {}}`
## Response shapes
- Match:
```json
@@ -38,22 +56,8 @@ prefix_rule(
- No match:
```json
"noMatch"
{"noMatch": {}}
```
- `matchedRules` lists every rule whose prefix matched the command; `matchedPrefix` is the exact prefix that matched.
- The effective `decision` is the strictest severity across all matches (`forbidden` > `prompt` > `allow`).
## CLI
- Provide one or more policy files (for example `src/default.codexpolicy`) to check a command:
```bash
cargo run -p codex-execpolicy2 -- check --policy path/to/policy.codexpolicy git status
```
- Pass multiple `--policy` flags to merge rules, evaluated in the order provided:
```bash
cargo run -p codex-execpolicy2 -- check --policy base.codexpolicy --policy overrides.codexpolicy git status
```
- Output is newline-delimited JSON by default; pass `--pretty` for pretty-printed JSON if desired.
- Example outcomes:
- Match: `{"match": { ... "decision": "allow" ... }}`
- No match: `"noMatch"`

View File

@@ -0,0 +1,64 @@
use std::fs;
use std::path::PathBuf;
use anyhow::Context;
use anyhow::Result;
use clap::Parser;
use crate::Evaluation;
use crate::Policy;
use crate::PolicyParser;
/// Arguments for evaluating a command against one or more execpolicy files.
#[derive(Debug, Parser, Clone)]
pub struct ExecPolicyCheckCommand {
/// Paths to execpolicy files to evaluate (repeatable).
#[arg(short, long = "policy", value_name = "PATH", required = true)]
pub policies: Vec<PathBuf>,
/// Pretty-print the JSON output.
#[arg(long)]
pub pretty: bool,
/// Command tokens to check against the policy.
#[arg(
value_name = "COMMAND",
required = true,
trailing_var_arg = true,
allow_hyphen_values = true
)]
pub command: Vec<String>,
}
impl ExecPolicyCheckCommand {
/// Load the policies for this command, evaluate the command, and render JSON output.
pub fn run(&self) -> Result<String> {
let policy = load_policies(&self.policies)?;
let evaluation = policy.check(&self.command);
format_evaluation_json(&evaluation, self.pretty)
}
}
pub fn format_evaluation_json(evaluation: &Evaluation, pretty: bool) -> Result<String> {
if pretty {
serde_json::to_string_pretty(evaluation).map_err(Into::into)
} else {
serde_json::to_string(evaluation).map_err(Into::into)
}
}
pub fn load_policies(policy_paths: &[PathBuf]) -> Result<Policy> {
let mut parser = PolicyParser::new();
for policy_path in policy_paths {
let policy_file_contents = fs::read_to_string(policy_path)
.with_context(|| format!("failed to read policy at {}", policy_path.display()))?;
let policy_identifier = policy_path.to_string_lossy().to_string();
parser
.parse(&policy_identifier, &policy_file_contents)
.with_context(|| format!("failed to parse policy at {}", policy_path.display()))?;
}
Ok(parser.build())
}

View File

@@ -1,5 +1,6 @@
pub mod decision;
pub mod error;
pub mod execpolicycheck;
pub mod parser;
pub mod policy;
pub mod rule;
@@ -7,6 +8,7 @@ pub mod rule;
pub use decision::Decision;
pub use error::Error;
pub use error::Result;
pub use execpolicycheck::ExecPolicyCheckCommand;
pub use parser::PolicyParser;
pub use policy::Evaluation;
pub use policy::Policy;

View File

@@ -1,66 +1,24 @@
use std::fs;
use std::path::PathBuf;
use anyhow::Context;
use anyhow::Result;
use clap::Parser;
use codex_execpolicy2::PolicyParser;
use codex_execpolicy2::ExecPolicyCheckCommand;
/// CLI for evaluating exec policies
#[derive(Parser)]
#[command(name = "codex-execpolicy2")]
enum Cli {
/// Evaluate a command against a policy.
Check {
#[arg(short, long = "policy", value_name = "PATH", required = true)]
policies: Vec<PathBuf>,
/// Pretty-print the JSON output.
#[arg(long)]
pretty: bool,
/// Command tokens to check.
#[arg(
value_name = "COMMAND",
required = true,
trailing_var_arg = true,
allow_hyphen_values = true
)]
command: Vec<String>,
},
Check(ExecPolicyCheckCommand),
}
fn main() -> Result<()> {
let cli = Cli::parse();
match cli {
Cli::Check {
policies,
command,
pretty,
} => cmd_check(policies, command, pretty),
Cli::Check(cmd) => cmd_check(cmd),
}
}
fn cmd_check(policy_paths: Vec<PathBuf>, args: Vec<String>, pretty: bool) -> Result<()> {
let policy = load_policies(&policy_paths)?;
let eval = policy.check(&args);
let json = if pretty {
serde_json::to_string_pretty(&eval)?
} else {
serde_json::to_string(&eval)?
};
fn cmd_check(cmd: ExecPolicyCheckCommand) -> Result<()> {
let json = cmd.run()?;
println!("{json}");
Ok(())
}
fn load_policies(policy_paths: &[PathBuf]) -> Result<codex_execpolicy2::Policy> {
let mut parser = PolicyParser::new();
for policy_path in policy_paths {
let policy_file_contents = fs::read_to_string(policy_path)
.with_context(|| format!("failed to read policy at {}", policy_path.display()))?;
let policy_identifier = policy_path.to_string_lossy().to_string();
parser.parse(&policy_identifier, &policy_file_contents)?;
}
Ok(parser.build())
}

View File

@@ -15,6 +15,10 @@ impl Policy {
Self { rules_by_program }
}
pub fn empty() -> Self {
Self::new(MultiMap::new())
}
pub fn rules(&self) -> &MultiMap<String, RuleRef> {
&self.rules_by_program
}
@@ -23,9 +27,9 @@ impl Policy {
let rules = match cmd.first() {
Some(first) => match self.rules_by_program.get_vec(first) {
Some(rules) => rules,
None => return Evaluation::NoMatch,
None => return Evaluation::NoMatch {},
},
None => return Evaluation::NoMatch,
None => return Evaluation::NoMatch {},
};
let matched_rules: Vec<RuleMatch> =
@@ -35,7 +39,7 @@ impl Policy {
decision,
matched_rules,
},
None => Evaluation::NoMatch,
None => Evaluation::NoMatch {},
}
}
@@ -48,7 +52,7 @@ impl Policy {
.into_iter()
.flat_map(|command| match self.check(command.as_ref()) {
Evaluation::Match { matched_rules, .. } => matched_rules,
Evaluation::NoMatch => Vec::new(),
Evaluation::NoMatch { .. } => Vec::new(),
})
.collect();
@@ -57,7 +61,7 @@ impl Policy {
decision,
matched_rules,
},
None => Evaluation::NoMatch,
None => Evaluation::NoMatch {},
}
}
}
@@ -65,7 +69,7 @@ impl Policy {
#[derive(Clone, Debug, Eq, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub enum Evaluation {
NoMatch,
NoMatch {},
Match {
decision: Decision,
#[serde(rename = "matchedRules")]

View File

@@ -10,6 +10,7 @@ use codex_execpolicy2::rule::PatternToken;
use codex_execpolicy2::rule::PrefixPattern;
use codex_execpolicy2::rule::PrefixRule;
use pretty_assertions::assert_eq;
use serde_json::json;
fn tokens(cmd: &[&str]) -> Vec<String> {
cmd.iter().map(std::string::ToString::to_string).collect()
@@ -60,6 +61,14 @@ prefix_rule(
);
}
#[test]
fn serializes_no_match_as_object() {
let serialized =
serde_json::to_value(&Evaluation::NoMatch {}).expect("should serialize evaluation");
assert_eq!(json!({"noMatch": {}}), serialized);
}
#[test]
fn parses_multiple_policy_files() {
let first_policy = r#"
@@ -288,7 +297,7 @@ prefix_rule(
"color.status=always",
"status",
]));
assert_eq!(Evaluation::NoMatch, no_match_eval);
assert_eq!(Evaluation::NoMatch {}, no_match_eval);
}
#[test]

View File

@@ -132,6 +132,9 @@ pub enum ResponseItem {
GhostSnapshot {
ghost_commit: GhostCommit,
},
CompactionSummary {
encrypted_content: String,
},
#[serde(other)]
Other,
}
@@ -379,7 +382,7 @@ impl Serialize for FunctionCallOutputPayload {
where
S: Serializer,
{
tracing::error!("Payload: {:?}", self);
tracing::debug!("Function call output payload: {:?}", self);
if let Some(items) = &self.content_items {
items.serialize(serializer)
} else {

View File

@@ -1126,6 +1126,8 @@ pub enum RolloutItem {
#[derive(Serialize, Deserialize, Clone, Debug, JsonSchema, TS)]
pub struct CompactedItem {
pub message: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub replacement_history: Option<Vec<ResponseItem>>,
}
impl From<CompactedItem> for ResponseItem {

View File

@@ -50,6 +50,7 @@ use tokio::sync::Mutex;
use crate::find_codex_home::find_codex_home;
const KEYRING_SERVICE: &str = "Codex MCP Credentials";
const REFRESH_SKEW_MILLIS: u64 = 30_000;
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct StoredOAuthTokens {
@@ -57,6 +58,8 @@ pub struct StoredOAuthTokens {
pub url: String,
pub client_id: String,
pub token_response: WrappedOAuthTokenResponse,
#[serde(default)]
pub expires_at: Option<u64>,
}
/// Determine where Codex should store and read MCP credentials.
@@ -113,6 +116,22 @@ pub(crate) fn has_oauth_tokens(
Ok(load_oauth_tokens(server_name, url, store_mode)?.is_some())
}
fn refresh_expires_in_from_timestamp(tokens: &mut StoredOAuthTokens) {
let Some(expires_at) = tokens.expires_at else {
return;
};
match expires_in_from_timestamp(expires_at) {
Some(seconds) => {
let duration = Duration::from_secs(seconds);
tokens.token_response.0.set_expires_in(Some(&duration));
}
None => {
tokens.token_response.0.set_expires_in(None);
}
}
}
fn load_oauth_tokens_from_keyring_with_fallback_to_file<K: KeyringStore>(
keyring_store: &K,
server_name: &str,
@@ -137,8 +156,9 @@ fn load_oauth_tokens_from_keyring<K: KeyringStore>(
let key = compute_store_key(server_name, url)?;
match keyring_store.load(KEYRING_SERVICE, &key) {
Ok(Some(serialized)) => {
let tokens: StoredOAuthTokens = serde_json::from_str(&serialized)
let mut tokens: StoredOAuthTokens = serde_json::from_str(&serialized)
.context("failed to deserialize OAuth tokens from keyring")?;
refresh_expires_in_from_timestamp(&mut tokens);
Ok(Some(tokens))
}
Ok(None) => Ok(None),
@@ -286,13 +306,24 @@ impl OAuthPersistor {
match maybe_credentials {
Some(credentials) => {
let mut last_credentials = self.inner.last_credentials.lock().await;
let new_token_response = WrappedOAuthTokenResponse(credentials.clone());
let same_token = last_credentials
.as_ref()
.map(|prev| prev.token_response == new_token_response)
.unwrap_or(false);
let expires_at = if same_token {
last_credentials.as_ref().and_then(|prev| prev.expires_at)
} else {
compute_expires_at_millis(&credentials)
};
let stored = StoredOAuthTokens {
server_name: self.inner.server_name.clone(),
url: self.inner.url.clone(),
client_id,
token_response: WrappedOAuthTokenResponse(credentials.clone()),
token_response: new_token_response,
expires_at,
};
let mut last_credentials = self.inner.last_credentials.lock().await;
if last_credentials.as_ref() != Some(&stored) {
save_oauth_tokens(&self.inner.server_name, &stored, self.inner.store_mode)?;
*last_credentials = Some(stored);
@@ -317,6 +348,30 @@ impl OAuthPersistor {
Ok(())
}
pub(crate) async fn refresh_if_needed(&self) -> Result<()> {
let expires_at = {
let guard = self.inner.last_credentials.lock().await;
guard.as_ref().and_then(|tokens| tokens.expires_at)
};
if !token_needs_refresh(expires_at) {
return Ok(());
}
{
let manager = self.inner.authorization_manager.clone();
let guard = manager.lock().await;
guard.refresh_token().await.with_context(|| {
format!(
"failed to refresh OAuth tokens for server {}",
self.inner.server_name
)
})?;
}
self.persist_if_needed().await
}
}
const FALLBACK_FILENAME: &str = ".credentials.json";
@@ -366,19 +421,14 @@ fn load_oauth_tokens_from_file(server_name: &str, url: &str) -> Result<Option<St
token_response.set_scopes(Some(scopes.into_iter().map(Scope::new).collect()));
}
if let Some(expires_at) = entry.expires_at
&& let Some(seconds) = expires_in_from_timestamp(expires_at)
{
let duration = Duration::from_secs(seconds);
token_response.set_expires_in(Some(&duration));
}
let stored = StoredOAuthTokens {
let mut stored = StoredOAuthTokens {
server_name: entry.server_name.clone(),
url: entry.server_url.clone(),
client_id: entry.client_id.clone(),
token_response: WrappedOAuthTokenResponse(token_response),
expires_at: entry.expires_at,
};
refresh_expires_in_from_timestamp(&mut stored);
return Ok(Some(stored));
}
@@ -391,6 +441,9 @@ fn save_oauth_tokens_to_file(tokens: &StoredOAuthTokens) -> Result<()> {
let mut store = read_fallback_file()?.unwrap_or_default();
let token_response = &tokens.token_response.0;
let expires_at = tokens
.expires_at
.or_else(|| compute_expires_at_millis(token_response));
let refresh_token = token_response
.refresh_token()
.map(|token| token.secret().to_string());
@@ -403,7 +456,7 @@ fn save_oauth_tokens_to_file(tokens: &StoredOAuthTokens) -> Result<()> {
server_url: tokens.url.clone(),
client_id: tokens.client_id.clone(),
access_token: token_response.access_token().secret().to_string(),
expires_at: compute_expires_at_millis(token_response),
expires_at,
refresh_token,
scopes,
};
@@ -427,7 +480,7 @@ fn delete_oauth_tokens_from_file(key: &str) -> Result<bool> {
Ok(removed)
}
fn compute_expires_at_millis(response: &OAuthTokenResponse) -> Option<u64> {
pub(crate) fn compute_expires_at_millis(response: &OAuthTokenResponse) -> Option<u64> {
let expires_in = response.expires_in()?;
let now = SystemTime::now()
.duration_since(UNIX_EPOCH)
@@ -454,6 +507,19 @@ fn expires_in_from_timestamp(expires_at: u64) -> Option<u64> {
}
}
fn token_needs_refresh(expires_at: Option<u64>) -> bool {
let Some(expires_at) = expires_at else {
return false;
};
let now = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap_or_else(|_| Duration::from_secs(0))
.as_millis() as u64;
now.saturating_add(REFRESH_SKEW_MILLIS) >= expires_at
}
fn compute_store_key(server_name: &str, server_url: &str) -> Result<String> {
let mut payload = JsonMap::new();
payload.insert(
@@ -589,8 +655,9 @@ mod tests {
store.save(KEYRING_SERVICE, &key, &serialized)?;
let loaded =
super::load_oauth_tokens_from_keyring(&store, &tokens.server_name, &tokens.url)?;
assert_eq!(loaded, Some(expected));
super::load_oauth_tokens_from_keyring(&store, &tokens.server_name, &tokens.url)?
.expect("tokens should load from keyring");
assert_tokens_match_without_expiry(&loaded, &expected);
Ok(())
}
@@ -750,6 +817,43 @@ mod tests {
Ok(())
}
#[test]
fn refresh_expires_in_from_timestamp_restores_future_durations() {
let mut tokens = sample_tokens();
let expires_at = tokens.expires_at.expect("expires_at should be set");
tokens.token_response.0.set_expires_in(None);
super::refresh_expires_in_from_timestamp(&mut tokens);
let actual = tokens
.token_response
.0
.expires_in()
.expect("expires_in should be restored")
.as_secs();
let expected = super::expires_in_from_timestamp(expires_at)
.expect("expires_at should still be in the future");
let diff = actual.abs_diff(expected);
assert!(diff <= 1, "expires_in drift too large: diff={diff}");
}
#[test]
fn refresh_expires_in_from_timestamp_clears_expired_tokens() {
let mut tokens = sample_tokens();
let now = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap_or_else(|_| Duration::from_secs(0));
let expired_at = now.as_millis() as u64;
tokens.expires_at = Some(expired_at.saturating_sub(1000));
let duration = Duration::from_secs(600);
tokens.token_response.0.set_expires_in(Some(&duration));
super::refresh_expires_in_from_timestamp(&mut tokens);
assert!(tokens.token_response.0.expires_in().is_none());
}
fn assert_tokens_match_without_expiry(
actual: &StoredOAuthTokens,
expected: &StoredOAuthTokens,
@@ -757,6 +861,7 @@ mod tests {
assert_eq!(actual.server_name, expected.server_name);
assert_eq!(actual.url, expected.url);
assert_eq!(actual.client_id, expected.client_id);
assert_eq!(actual.expires_at, expected.expires_at);
assert_token_response_match_without_expiry(
&actual.token_response,
&expected.token_response,
@@ -803,12 +908,14 @@ mod tests {
]));
let expires_in = Duration::from_secs(3600);
response.set_expires_in(Some(&expires_in));
let expires_at = super::compute_expires_at_millis(&response);
StoredOAuthTokens {
server_name: "test-server".to_string(),
url: "https://example.test".to_string(),
client_id: "client-id".to_string(),
token_response: WrappedOAuthTokenResponse(response),
expires_at,
}
}
}

View File

@@ -17,6 +17,7 @@ use urlencoding::decode;
use crate::OAuthCredentialsStoreMode;
use crate::StoredOAuthTokens;
use crate::WrappedOAuthTokenResponse;
use crate::oauth::compute_expires_at_millis;
use crate::save_oauth_tokens;
use crate::utils::apply_default_headers;
use crate::utils::build_default_headers;
@@ -91,11 +92,13 @@ pub async fn perform_oauth_login(
let credentials =
credentials_opt.ok_or_else(|| anyhow!("OAuth provider did not return credentials"))?;
let expires_at = compute_expires_at_millis(&credentials);
let stored = StoredOAuthTokens {
server_name: server_name.to_string(),
url: server_url.to_string(),
client_id,
token_response: WrappedOAuthTokenResponse(credentials),
expires_at,
};
save_oauth_tokens(server_name, &stored, store_mode)?;

View File

@@ -267,6 +267,7 @@ impl RmcpClient {
params: Option<ListToolsRequestParams>,
timeout: Option<Duration>,
) -> Result<ListToolsResult> {
self.refresh_oauth_if_needed().await;
let service = self.service().await?;
let rmcp_params = params
.map(convert_to_rmcp::<_, PaginatedRequestParam>)
@@ -284,6 +285,7 @@ impl RmcpClient {
params: Option<ListResourcesRequestParams>,
timeout: Option<Duration>,
) -> Result<ListResourcesResult> {
self.refresh_oauth_if_needed().await;
let service = self.service().await?;
let rmcp_params = params
.map(convert_to_rmcp::<_, PaginatedRequestParam>)
@@ -301,6 +303,7 @@ impl RmcpClient {
params: Option<ListResourceTemplatesRequestParams>,
timeout: Option<Duration>,
) -> Result<ListResourceTemplatesResult> {
self.refresh_oauth_if_needed().await;
let service = self.service().await?;
let rmcp_params = params
.map(convert_to_rmcp::<_, PaginatedRequestParam>)
@@ -318,6 +321,7 @@ impl RmcpClient {
params: ReadResourceRequestParams,
timeout: Option<Duration>,
) -> Result<ReadResourceResult> {
self.refresh_oauth_if_needed().await;
let service = self.service().await?;
let rmcp_params: ReadResourceRequestParam = convert_to_rmcp(params)?;
let fut = service.read_resource(rmcp_params);
@@ -333,6 +337,7 @@ impl RmcpClient {
arguments: Option<serde_json::Value>,
timeout: Option<Duration>,
) -> Result<CallToolResult> {
self.refresh_oauth_if_needed().await;
let service = self.service().await?;
let params = CallToolRequestParams { arguments, name };
let rmcp_params: CallToolRequestParam = convert_to_rmcp(params)?;
@@ -371,6 +376,14 @@ impl RmcpClient {
warn!("failed to persist OAuth tokens: {error}");
}
}
async fn refresh_oauth_if_needed(&self) {
if let Some(runtime) = self.oauth_persistor().await
&& let Err(error) = runtime.refresh_if_needed().await
{
warn!("failed to refresh OAuth tokens: {error}");
}
}
}
async fn create_oauth_transport_and_runtime(

View File

@@ -23,8 +23,12 @@ use codex_core::AuthManager;
use codex_core::ConversationManager;
use codex_core::config::Config;
use codex_core::config::edit::ConfigEditsBuilder;
#[cfg(target_os = "windows")]
use codex_core::features::Feature;
use codex_core::model_family::find_family_for_model;
use codex_core::protocol::FinalOutput;
#[cfg(target_os = "windows")]
use codex_core::protocol::Op;
use codex_core::protocol::SessionSource;
use codex_core::protocol::TokenUsage;
use codex_core::protocol_config_types::ReasoningEffort as ReasoningEffortConfig;
@@ -220,7 +224,7 @@ impl App {
let enhanced_keys_supported = tui.enhanced_keys_supported();
let chat_widget = match resume_selection {
let mut chat_widget = match resume_selection {
ResumeSelection::StartFresh | ResumeSelection::Exit => {
let init = crate::chatwidget::ChatWidgetInit {
config: config.clone(),
@@ -263,6 +267,8 @@ impl App {
}
};
chat_widget.maybe_prompt_windows_sandbox_enable();
let file_search = FileSearchManager::new(config.cwd.clone(), app_event_tx.clone());
#[cfg(not(debug_assertions))]
let upgrade_version = crate::updates::get_upgrade_version(&config);
@@ -537,8 +543,71 @@ impl App {
AppEvent::OpenFeedbackConsent { category } => {
self.chat_widget.open_feedback_consent(category);
}
AppEvent::ShowWindowsAutoModeInstructions => {
self.chat_widget.open_windows_auto_mode_instructions();
AppEvent::OpenWindowsSandboxEnablePrompt { preset } => {
self.chat_widget.open_windows_sandbox_enable_prompt(preset);
}
AppEvent::EnableWindowsSandboxForAuto { preset } => {
#[cfg(target_os = "windows")]
{
let profile = self.active_profile.as_deref();
let feature_key = Feature::WindowsSandbox.key();
match ConfigEditsBuilder::new(&self.config.codex_home)
.with_profile(profile)
.set_feature_enabled(feature_key, true)
.apply()
.await
{
Ok(()) => {
self.config.set_windows_sandbox_globally(true);
self.chat_widget.clear_forced_auto_mode_downgrade();
if let Some((sample_paths, extra_count, failed_scan)) =
self.chat_widget.world_writable_warning_details()
{
self.app_event_tx.send(
AppEvent::OpenWorldWritableWarningConfirmation {
preset: Some(preset.clone()),
sample_paths,
extra_count,
failed_scan,
},
);
} else {
self.app_event_tx.send(AppEvent::CodexOp(
Op::OverrideTurnContext {
cwd: None,
approval_policy: Some(preset.approval),
sandbox_policy: Some(preset.sandbox.clone()),
model: None,
effort: None,
summary: None,
},
));
self.app_event_tx
.send(AppEvent::UpdateAskForApprovalPolicy(preset.approval));
self.app_event_tx
.send(AppEvent::UpdateSandboxPolicy(preset.sandbox.clone()));
self.chat_widget.add_info_message(
"Enabled the Windows sandbox feature and switched to Auto mode."
.to_string(),
None,
);
}
}
Err(err) => {
tracing::error!(
error = %err,
"failed to enable Windows sandbox feature"
);
self.chat_widget.add_error_message(format!(
"Failed to enable the Windows sandbox feature: {err}"
));
}
}
}
#[cfg(not(target_os = "windows"))]
{
let _ = preset;
}
}
AppEvent::PersistModelSelection { model, effort } => {
let profile = self.active_profile.as_deref();
@@ -593,6 +662,13 @@ impl App {
| codex_core::protocol::SandboxPolicy::ReadOnly
);
self.config.sandbox_policy = policy.clone();
#[cfg(target_os = "windows")]
if !matches!(policy, codex_core::protocol::SandboxPolicy::ReadOnly)
|| codex_core::get_platform_sandbox().is_some()
{
self.config.forced_auto_mode_downgraded_on_windows = false;
}
self.chat_widget.set_sandbox_policy(policy);
// If sandbox policy becomes workspace-write or read-only, run the Windows world-writable scan.
@@ -868,7 +944,6 @@ mod tests {
fn make_test_app() -> App {
let (chat_widget, app_event_tx, _rx, _op_rx) = make_chatwidget_manual_with_sender();
let config = chat_widget.config_ref().clone();
let server = Arc::new(ConversationManager::with_auth(CodexAuth::from_api_key(
"Test API Key",
)));

View File

@@ -91,9 +91,17 @@ pub(crate) enum AppEvent {
failed_scan: bool,
},
/// Show Windows Subsystem for Linux setup instructions for auto mode.
/// Prompt to enable the Windows sandbox feature before using Auto mode.
#[cfg_attr(not(target_os = "windows"), allow(dead_code))]
ShowWindowsAutoModeInstructions,
OpenWindowsSandboxEnablePrompt {
preset: ApprovalPreset,
},
/// Enable the Windows sandbox feature and switch to Auto mode.
#[cfg_attr(not(target_os = "windows"), allow(dead_code))]
EnableWindowsSandboxForAuto {
preset: ApprovalPreset,
},
/// Update the current approval policy in the running app and widget.
UpdateAskForApprovalPolicy(AskForApproval),

View File

@@ -95,8 +95,6 @@ use crate::history_cell::HistoryCell;
use crate::history_cell::McpToolCallCell;
use crate::history_cell::PlainHistoryCell;
use crate::markdown::append_markdown;
#[cfg(target_os = "windows")]
use crate::onboarding::WSL_INSTRUCTIONS;
use crate::render::Insets;
use crate::render::renderable::ColumnRenderable;
use crate::render::renderable::FlexRenderable;
@@ -2172,40 +2170,16 @@ impl ChatWidget {
let mut items: Vec<SelectionItem> = Vec::new();
let presets: Vec<ApprovalPreset> = builtin_approval_presets();
#[cfg(target_os = "windows")]
let header_renderable: Box<dyn Renderable> = if self
.config
.forced_auto_mode_downgraded_on_windows
{
use ratatui_macros::line;
let mut header = ColumnRenderable::new();
header.push(line![
"Codex forced your settings back to Read Only on this Windows machine.".bold()
]);
header.push(line![
"To re-enable Auto mode, run Codex inside Windows Subsystem for Linux (WSL) or enable Full Access manually.".dim()
]);
Box::new(header)
} else {
Box::new(())
};
let forced_windows_read_only = self.config.forced_auto_mode_downgraded_on_windows
&& codex_core::get_platform_sandbox().is_none();
#[cfg(not(target_os = "windows"))]
let header_renderable: Box<dyn Renderable> = Box::new(());
let forced_windows_read_only = false;
for preset in presets.into_iter() {
let is_current =
current_approval == preset.approval && current_sandbox == preset.sandbox;
let name = preset.label.to_string();
let description_text = preset.description;
let description = if cfg!(target_os = "windows")
&& preset.id == "auto"
&& codex_core::get_platform_sandbox().is_none()
{
Some(format!(
"{description_text}\nRequires Windows Subsystem for Linux (WSL). Show installation instructions..."
))
} else {
Some(description_text.to_string())
};
let description = Some(description_text.to_string());
let requires_confirmation = preset.id == "full-access"
&& !self
.config
@@ -2223,53 +2197,16 @@ impl ChatWidget {
#[cfg(target_os = "windows")]
{
if codex_core::get_platform_sandbox().is_none() {
vec![Box::new(|tx| {
tx.send(AppEvent::ShowWindowsAutoModeInstructions);
let preset_clone = preset.clone();
vec![Box::new(move |tx| {
tx.send(AppEvent::OpenWindowsSandboxEnablePrompt {
preset: preset_clone.clone(),
});
})]
} else if !self
.config
.notices
.hide_world_writable_warning
.unwrap_or(false)
&& self.windows_world_writable_flagged()
} else if let Some((sample_paths, extra_count, failed_scan)) =
self.world_writable_warning_details()
{
let preset_clone = preset.clone();
// Compute sample paths for the warning popup.
let mut env_map: std::collections::HashMap<String, String> =
std::collections::HashMap::new();
for (k, v) in std::env::vars() {
env_map.insert(k, v);
}
let (sample_paths, extra_count, failed_scan) =
match codex_windows_sandbox::preflight_audit_everyone_writable(
&self.config.cwd,
&env_map,
Some(self.config.codex_home.as_path()),
) {
Ok(paths) if !paths.is_empty() => {
fn normalize_windows_path_for_display(
p: &std::path::Path,
) -> String {
let canon = dunce::canonicalize(p)
.unwrap_or_else(|_| p.to_path_buf());
canon.display().to_string().replace('/', "\\")
}
let as_strings: Vec<String> = paths
.iter()
.map(|p| normalize_windows_path_for_display(p))
.collect();
let samples: Vec<String> =
as_strings.iter().take(3).cloned().collect();
let extra = if as_strings.len() > samples.len() {
as_strings.len() - samples.len()
} else {
0
};
(samples, extra, false)
}
Err(_) => (Vec::new(), 0, true),
_ => (Vec::new(), 0, false),
};
vec![Box::new(move |tx| {
tx.send(AppEvent::OpenWorldWritableWarningConfirmation {
preset: Some(preset_clone.clone()),
@@ -2300,10 +2237,17 @@ impl ChatWidget {
}
self.bottom_pane.show_selection_view(SelectionViewParams {
title: Some("Select Approval Mode".to_string()),
title: Some(
if forced_windows_read_only {
"Select approval mode (Codex changed your permissions to Read Only because the Windows sandbox is off)"
.to_string()
} else {
"Select Approval Mode".to_string()
},
),
footer_hint: Some(standard_popup_hint_line()),
items,
header: header_renderable,
header: Box::new(()),
..Default::default()
});
}
@@ -2328,20 +2272,22 @@ impl ChatWidget {
}
#[cfg(target_os = "windows")]
fn windows_world_writable_flagged(&self) -> bool {
use std::collections::HashMap;
let mut env_map: HashMap<String, String> = HashMap::new();
for (k, v) in std::env::vars() {
env_map.insert(k, v);
}
match codex_windows_sandbox::preflight_audit_everyone_writable(
&self.config.cwd,
&env_map,
Some(self.config.codex_home.as_path()),
) {
Ok(paths) => !paths.is_empty(),
Err(_) => true,
pub(crate) fn world_writable_warning_details(&self) -> Option<(Vec<String>, usize, bool)> {
if self
.config
.notices
.hide_world_writable_warning
.unwrap_or(false)
{
return None;
}
codex_windows_sandbox::world_writable_warning_details(self.config.codex_home.as_path())
}
#[cfg(not(target_os = "windows"))]
#[allow(dead_code)]
pub(crate) fn world_writable_warning_details(&self) -> Option<(Vec<String>, usize, bool)> {
None
}
pub(crate) fn open_full_access_confirmation(&mut self, preset: ApprovalPreset) {
@@ -2426,7 +2372,6 @@ impl ChatWidget {
SandboxPolicy::ReadOnly => "Read-Only mode",
_ => "Auto mode",
};
let title_line = Line::from("Unprotected directories found").bold();
let info_line = if failed_scan {
Line::from(vec![
"We couldn't complete the world-writable scan, so protections cannot be verified. "
@@ -2443,7 +2388,6 @@ impl ChatWidget {
.fg(Color::Red),
])
};
header_children.push(Box::new(title_line));
header_children.push(Box::new(
Paragraph::new(vec![info_line]).wrap(Wrap { trim: false }),
));
@@ -2452,8 +2396,9 @@ impl ChatWidget {
// Show up to three examples and optionally an "and X more" line.
let mut lines: Vec<Line> = Vec::new();
lines.push(Line::from("Examples:").bold());
lines.push(Line::from(""));
for p in &sample_paths {
lines.push(Line::from(format!(" - {p}")));
lines.push(Line::from(format!(" - {p}")));
}
if extra_count > 0 {
lines.push(Line::from(format!("and {extra_count} more")));
@@ -2521,21 +2466,33 @@ impl ChatWidget {
}
#[cfg(target_os = "windows")]
pub(crate) fn open_windows_auto_mode_instructions(&mut self) {
pub(crate) fn open_windows_sandbox_enable_prompt(&mut self, preset: ApprovalPreset) {
use ratatui_macros::line;
let mut header = ColumnRenderable::new();
header.push(line![
"Auto mode requires Windows Subsystem for Linux (WSL2).".bold()
"Auto mode requires the experimental Windows sandbox.".bold(),
" Turn it on to enable sandboxed commands on Windows."
]);
header.push(line!["Run Codex inside WSL to enable sandboxed commands."]);
header.push(line![""]);
header.push(Paragraph::new(WSL_INSTRUCTIONS).wrap(Wrap { trim: false }));
let preset_clone = preset;
let items = vec![SelectionItem {
name: "Back".to_string(),
name: "Turn on Windows sandbox and use Auto mode".to_string(),
description: Some(
"Return to the approval mode list. Auto mode stays disabled outside WSL."
"Adds enable_experimental_windows_sandbox = true to config.toml and switches to Auto mode."
.to_string(),
),
actions: vec![Box::new(move |tx| {
tx.send(AppEvent::EnableWindowsSandboxForAuto {
preset: preset_clone.clone(),
});
})],
dismiss_on_select: true,
..Default::default()
}, SelectionItem {
name: "Go Back".to_string(),
description: Some(
"Stay on read-only or full access without enabling the sandbox feature."
.to_string(),
),
actions: vec![Box::new(|tx| {
@@ -2555,7 +2512,31 @@ impl ChatWidget {
}
#[cfg(not(target_os = "windows"))]
pub(crate) fn open_windows_auto_mode_instructions(&mut self) {}
pub(crate) fn open_windows_sandbox_enable_prompt(&mut self, _preset: ApprovalPreset) {}
#[cfg(target_os = "windows")]
pub(crate) fn maybe_prompt_windows_sandbox_enable(&mut self) {
if self.config.forced_auto_mode_downgraded_on_windows
&& codex_core::get_platform_sandbox().is_none()
&& let Some(preset) = builtin_approval_presets()
.into_iter()
.find(|preset| preset.id == "auto")
{
self.open_windows_sandbox_enable_prompt(preset);
}
}
#[cfg(not(target_os = "windows"))]
pub(crate) fn maybe_prompt_windows_sandbox_enable(&mut self) {}
#[cfg(target_os = "windows")]
pub(crate) fn clear_forced_auto_mode_downgrade(&mut self) {
self.config.forced_auto_mode_downgraded_on_windows = false;
}
#[cfg(not(target_os = "windows"))]
#[allow(dead_code)]
pub(crate) fn clear_forced_auto_mode_downgrade(&mut self) {}
/// Set the approval policy in the widget's config copy.
pub(crate) fn set_approval_policy(&mut self, policy: AskForApproval) {
@@ -2564,7 +2545,16 @@ impl ChatWidget {
/// Set the sandbox policy in the widget's config copy.
pub(crate) fn set_sandbox_policy(&mut self, policy: SandboxPolicy) {
#[cfg(target_os = "windows")]
let should_clear_downgrade = !matches!(policy, SandboxPolicy::ReadOnly)
|| codex_core::get_platform_sandbox().is_some();
self.config.sandbox_policy = policy;
#[cfg(target_os = "windows")]
if should_clear_downgrade {
self.config.forced_auto_mode_downgraded_on_windows = false;
}
}
pub(crate) fn set_full_access_warning_acknowledged(&mut self, acknowledged: bool) {

View File

@@ -4,14 +4,10 @@ expression: popup
---
Select Approval Mode
1. Read Only (current) Codex can read files and answer questions. Codex
requires approval to make edits, run commands, or
access network.
2. Auto Codex can read files, make edits, and run commands
in the workspace. Codex requires approval to work
outside the workspace or access network.
3. Full Access Codex can read files, make edits, and run commands
with network access, without approval. Exercise
caution.
1. Read Only (current) Requires approval to edit files and run commands.
2. Agent Read and edit files, and run commands.
3. Agent (full access) Codex can edit files outside this workspace and run
commands with network access. Exercise caution when
using.
Press enter to confirm or esc to go back

View File

@@ -4,16 +4,10 @@ expression: popup
---
Select Approval Mode
1. Read Only (current) Codex can read files and answer questions. Codex
requires approval to make edits, run commands, or
access network.
2. Auto Codex can read files, make edits, and run commands
in the workspace. Codex requires approval to work
outside the workspace or access network.
Requires Windows Subsystem for Linux (WSL). Show
installation instructions...
3. Full Access Codex can read files, make edits, and run commands
with network access, without approval. Exercise
caution.
1. Read Only (current) Requires approval to edit files and run commands.
2. Agent Read and edit files, and run commands.
3. Agent (full access) Codex can edit files outside this workspace and run
commands with network access. Exercise caution when
using.
Press enter to confirm or esc to go back

View File

@@ -58,6 +58,11 @@ use tempfile::tempdir;
use tokio::sync::mpsc::error::TryRecvError;
use tokio::sync::mpsc::unbounded_channel;
#[cfg(target_os = "windows")]
fn set_windows_sandbox_enabled(enabled: bool) {
codex_core::set_windows_sandbox_enabled(enabled);
}
fn test_config() -> Config {
// Use base defaults to avoid depending on host state.
Config::load_from_base_config_with_overrides(
@@ -1456,28 +1461,6 @@ fn approvals_selection_popup_snapshot() {
assert_snapshot!("approvals_selection_popup", popup);
}
#[test]
fn approvals_popup_includes_wsl_note_for_auto_mode() {
let (mut chat, _rx, _op_rx) = make_chatwidget_manual();
if cfg!(target_os = "windows") {
chat.config.forced_auto_mode_downgraded_on_windows = true;
}
chat.open_approvals_popup();
let popup = render_bottom_popup(&chat, 80);
assert_eq!(
popup.contains("Requires Windows Subsystem for Linux (WSL)"),
cfg!(target_os = "windows"),
"expected auto preset description to mention WSL requirement only on Windows, popup: {popup}"
);
assert_eq!(
popup.contains("Codex forced your settings back to Read Only on this Windows machine."),
cfg!(target_os = "windows") && chat.config.forced_auto_mode_downgraded_on_windows,
"expected downgrade notice only when auto mode is forced off on Windows, popup: {popup}"
);
}
#[test]
fn full_access_confirmation_popup_snapshot() {
let (mut chat, _rx, _op_rx) = make_chatwidget_manual();
@@ -1494,18 +1477,41 @@ fn full_access_confirmation_popup_snapshot() {
#[cfg(target_os = "windows")]
#[test]
fn windows_auto_mode_instructions_popup_lists_install_steps() {
fn windows_auto_mode_prompt_requests_enabling_sandbox_feature() {
let (mut chat, _rx, _op_rx) = make_chatwidget_manual();
chat.open_windows_auto_mode_instructions();
let preset = builtin_approval_presets()
.into_iter()
.find(|preset| preset.id == "auto")
.expect("auto preset");
chat.open_windows_sandbox_enable_prompt(preset);
let popup = render_bottom_popup(&chat, 120);
assert!(
popup.contains("wsl --install"),
"expected WSL instructions popup to include install command, popup: {popup}"
popup.contains("experimental Windows sandbox"),
"expected auto mode prompt to mention enabling the sandbox feature, popup: {popup}"
);
}
#[cfg(target_os = "windows")]
#[test]
fn startup_prompts_for_windows_sandbox_when_auto_requested() {
let (mut chat, _rx, _op_rx) = make_chatwidget_manual();
set_windows_sandbox_enabled(false);
chat.config.forced_auto_mode_downgraded_on_windows = true;
chat.maybe_prompt_windows_sandbox_enable();
let popup = render_bottom_popup(&chat, 120);
assert!(
popup.contains("Turn on Windows sandbox and use Auto mode"),
"expected startup prompt to offer enabling the sandbox: {popup}"
);
set_windows_sandbox_enabled(true);
}
#[test]
fn model_reasoning_selection_popup_snapshot() {
let (mut chat, _rx, _op_rx) = make_chatwidget_manual();

View File

@@ -1,7 +1,7 @@
use std::path::Path;
use std::path::PathBuf;
use codex_core::bash::extract_bash_command;
use codex_core::parse_command::extract_shell_command;
use dirs::home_dir;
use shlex::try_join;
@@ -10,7 +10,7 @@ pub(crate) fn escape_command(command: &[String]) -> String {
}
pub(crate) fn strip_bash_lc_and_escape(command: &[String]) -> String {
if let Some((_, script)) = extract_bash_command(command) {
if let Some((_, script)) = extract_shell_command(command) {
return script.to_string();
}
escape_command(command)

View File

@@ -86,7 +86,7 @@ mod wrapping;
#[cfg(test)]
pub mod test_backend;
use crate::onboarding::WSL_INSTRUCTIONS;
use crate::onboarding::TrustDirectorySelection;
use crate::onboarding::onboarding_screen::OnboardingScreenArgs;
use crate::onboarding::onboarding_screen::run_onboarding_app;
use crate::tui::Tui;
@@ -389,20 +389,13 @@ async fn run_ratatui_app(
);
let login_status = get_login_status(&initial_config);
let should_show_trust_screen = should_show_trust_screen(&initial_config);
let should_show_windows_wsl_screen =
cfg!(target_os = "windows") && !initial_config.windows_wsl_setup_acknowledged;
let should_show_onboarding = should_show_onboarding(
login_status,
&initial_config,
should_show_trust_screen,
should_show_windows_wsl_screen,
);
let should_show_onboarding =
should_show_onboarding(login_status, &initial_config, should_show_trust_screen);
let config = if should_show_onboarding {
let onboarding_result = run_onboarding_app(
OnboardingScreenArgs {
show_login_screen: should_show_login_screen(login_status, &initial_config),
show_windows_wsl_screen: should_show_windows_wsl_screen,
show_trust_screen: should_show_trust_screen,
login_status,
auth_manager: auth_manager.clone(),
@@ -421,21 +414,12 @@ async fn run_ratatui_app(
update_action: None,
});
}
if onboarding_result.windows_install_selected {
restore();
session_log::log_session_end();
let _ = tui.terminal.clear();
if let Err(err) = writeln!(std::io::stdout(), "{WSL_INSTRUCTIONS}") {
tracing::error!("Failed to write WSL instructions: {err}");
}
return Ok(AppExitInfo {
token_usage: codex_core::protocol::TokenUsage::default(),
conversation_id: None,
update_action: None,
});
}
// if the user acknowledged windows or made any trust decision, reload the config accordingly
if should_show_windows_wsl_screen || onboarding_result.directory_trust_decision.is_some() {
// if the user acknowledged windows or made an explicit decision ato trust the directory, reload the config accordingly
if onboarding_result
.directory_trust_decision
.map(|d| d == TrustDirectorySelection::Trust)
.unwrap_or(false)
{
load_config_or_exit(cli_kv_overrides, overrides).await
} else {
initial_config
@@ -584,7 +568,7 @@ async fn load_config_or_exit(
/// show the trust screen.
fn should_show_trust_screen(config: &Config) -> bool {
if cfg!(target_os = "windows") && get_platform_sandbox().is_none() {
// If the experimental sandbox is not enabled, Native Windows cannot enforce sandboxed write access without WSL; skip the trust prompt entirely.
// If the experimental sandbox is not enabled, Native Windows cannot enforce sandboxed write access; skip the trust prompt entirely.
return false;
}
if config.did_user_set_custom_approval_policy_or_sandbox_mode {
@@ -599,12 +583,7 @@ fn should_show_onboarding(
login_status: LoginStatus,
config: &Config,
show_trust_screen: bool,
show_windows_wsl_screen: bool,
) -> bool {
if show_windows_wsl_screen {
return true;
}
if show_trust_screen {
return true;
}
@@ -628,7 +607,6 @@ mod tests {
use codex_core::config::ConfigOverrides;
use codex_core::config::ConfigToml;
use codex_core::config::ProjectConfig;
use codex_core::set_windows_sandbox_enabled;
use serial_test::serial;
use tempfile::TempDir;
@@ -643,7 +621,7 @@ mod tests {
)?;
config.did_user_set_custom_approval_policy_or_sandbox_mode = false;
config.active_project = ProjectConfig { trust_level: None };
set_windows_sandbox_enabled(false);
config.set_windows_sandbox_globally(false);
let should_show = should_show_trust_screen(&config);
if cfg!(target_os = "windows") {
@@ -670,7 +648,7 @@ mod tests {
)?;
config.did_user_set_custom_approval_policy_or_sandbox_mode = false;
config.active_project = ProjectConfig { trust_level: None };
set_windows_sandbox_enabled(true);
config.set_windows_sandbox_globally(true);
let should_show = should_show_trust_screen(&config);
if cfg!(target_os = "windows") {

View File

@@ -3,6 +3,3 @@ pub mod onboarding_screen;
mod trust_directory;
pub use trust_directory::TrustDirectorySelection;
mod welcome;
mod windows;
pub(crate) use windows::WSL_INSTRUCTIONS;

View File

@@ -20,7 +20,6 @@ use crate::onboarding::auth::SignInState;
use crate::onboarding::trust_directory::TrustDirectorySelection;
use crate::onboarding::trust_directory::TrustDirectoryWidget;
use crate::onboarding::welcome::WelcomeWidget;
use crate::onboarding::windows::WindowsSetupWidget;
use crate::tui::FrameRequester;
use crate::tui::Tui;
use crate::tui::TuiEvent;
@@ -30,7 +29,6 @@ use std::sync::RwLock;
#[allow(clippy::large_enum_variant)]
enum Step {
Windows(WindowsSetupWidget),
Welcome(WelcomeWidget),
Auth(AuthModeWidget),
TrustDirectory(TrustDirectoryWidget),
@@ -56,12 +54,10 @@ pub(crate) struct OnboardingScreen {
request_frame: FrameRequester,
steps: Vec<Step>,
is_done: bool,
windows_install_selected: bool,
should_exit: bool,
}
pub(crate) struct OnboardingScreenArgs {
pub show_windows_wsl_screen: bool,
pub show_trust_screen: bool,
pub show_login_screen: bool,
pub login_status: LoginStatus,
@@ -71,14 +67,12 @@ pub(crate) struct OnboardingScreenArgs {
pub(crate) struct OnboardingResult {
pub directory_trust_decision: Option<TrustDirectorySelection>,
pub windows_install_selected: bool,
pub should_exit: bool,
}
impl OnboardingScreen {
pub(crate) fn new(tui: &mut Tui, args: OnboardingScreenArgs) -> Self {
let OnboardingScreenArgs {
show_windows_wsl_screen,
show_trust_screen,
show_login_screen,
login_status,
@@ -91,9 +85,6 @@ impl OnboardingScreen {
let codex_home = config.codex_home;
let cli_auth_credentials_store_mode = config.cli_auth_credentials_store_mode;
let mut steps: Vec<Step> = Vec::new();
if show_windows_wsl_screen {
steps.push(Step::Windows(WindowsSetupWidget::new(codex_home.clone())));
}
steps.push(Step::Welcome(WelcomeWidget::new(
!matches!(login_status, LoginStatus::NotAuthenticated),
tui.frame_requester(),
@@ -138,7 +129,6 @@ impl OnboardingScreen {
request_frame: tui.frame_requester(),
steps,
is_done: false,
windows_install_selected: false,
should_exit: false,
}
}
@@ -200,10 +190,6 @@ impl OnboardingScreen {
.flatten()
}
pub fn windows_install_selected(&self) -> bool {
self.windows_install_selected
}
pub fn should_exit(&self) -> bool {
self.should_exit
}
@@ -249,14 +235,6 @@ impl KeyboardHandler for OnboardingScreen {
}
}
};
if self
.steps
.iter()
.any(|step| matches!(step, Step::Windows(widget) if widget.exit_requested()))
{
self.windows_install_selected = true;
self.is_done = true;
}
self.request_frame.schedule_frame();
}
@@ -338,7 +316,6 @@ impl WidgetRef for &OnboardingScreen {
impl KeyboardHandler for Step {
fn handle_key_event(&mut self, key_event: KeyEvent) {
match self {
Step::Windows(widget) => widget.handle_key_event(key_event),
Step::Welcome(widget) => widget.handle_key_event(key_event),
Step::Auth(widget) => widget.handle_key_event(key_event),
Step::TrustDirectory(widget) => widget.handle_key_event(key_event),
@@ -347,7 +324,6 @@ impl KeyboardHandler for Step {
fn handle_paste(&mut self, pasted: String) {
match self {
Step::Windows(_) => {}
Step::Welcome(_) => {}
Step::Auth(widget) => widget.handle_paste(pasted),
Step::TrustDirectory(widget) => widget.handle_paste(pasted),
@@ -358,7 +334,6 @@ impl KeyboardHandler for Step {
impl StepStateProvider for Step {
fn get_step_state(&self) -> StepState {
match self {
Step::Windows(w) => w.get_step_state(),
Step::Welcome(w) => w.get_step_state(),
Step::Auth(w) => w.get_step_state(),
Step::TrustDirectory(w) => w.get_step_state(),
@@ -369,9 +344,6 @@ impl StepStateProvider for Step {
impl WidgetRef for Step {
fn render_ref(&self, area: Rect, buf: &mut Buffer) {
match self {
Step::Windows(widget) => {
widget.render_ref(area, buf);
}
Step::Welcome(widget) => {
widget.render_ref(area, buf);
}
@@ -451,7 +423,6 @@ pub(crate) async fn run_onboarding_app(
}
Ok(OnboardingResult {
directory_trust_decision: onboarding_screen.directory_trust_decision(),
windows_install_selected: onboarding_screen.windows_install_selected(),
should_exit: onboarding_screen.should_exit(),
})
}

View File

@@ -1,205 +0,0 @@
use std::path::PathBuf;
use codex_core::config::edit::ConfigEditsBuilder;
use crossterm::event::KeyCode;
use crossterm::event::KeyEvent;
use crossterm::event::KeyEventKind;
use ratatui::buffer::Buffer;
use ratatui::layout::Rect;
use ratatui::prelude::Widget;
use ratatui::style::Color;
use ratatui::style::Stylize;
use ratatui::text::Line;
use ratatui::widgets::Paragraph;
use ratatui::widgets::WidgetRef;
use ratatui::widgets::Wrap;
use crate::onboarding::onboarding_screen::KeyboardHandler;
use crate::onboarding::onboarding_screen::StepStateProvider;
use super::onboarding_screen::StepState;
pub(crate) const WSL_INSTRUCTIONS: &str = r#"Install WSL2 by opening PowerShell as Administrator and running:
# Install WSL using the default Linux distribution (Ubuntu).
# See https://learn.microsoft.com/en-us/windows/wsl/install for more info
wsl --install
# Restart your computer, then start a shell inside of Windows Subsystem for Linux
wsl
# Install Node.js in WSL via nvm
# Documentation: https://learn.microsoft.com/en-us/windows/dev-environment/javascript/nodejs-on-wsl
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/master/install.sh | bash && export NVM_DIR="$HOME/.nvm" && \. "$NVM_DIR/nvm.sh"
nvm install 22
# Install and run Codex in WSL
npm install --global @openai/codex
codex
# Additional details and instructions for how to install and run Codex in WSL:
https://developers.openai.com/codex/windows"#;
pub(crate) struct WindowsSetupWidget {
pub codex_home: PathBuf,
pub selection: Option<WindowsSetupSelection>,
pub highlighted: WindowsSetupSelection,
pub error: Option<String>,
exit_requested: bool,
}
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WindowsSetupSelection {
Continue,
Install,
}
impl WindowsSetupWidget {
pub fn new(codex_home: PathBuf) -> Self {
Self {
codex_home,
selection: None,
highlighted: WindowsSetupSelection::Install,
error: None,
exit_requested: false,
}
}
fn handle_continue(&mut self) {
self.highlighted = WindowsSetupSelection::Continue;
match ConfigEditsBuilder::new(&self.codex_home)
.set_windows_wsl_setup_acknowledged(true)
.apply_blocking()
{
Ok(()) => {
self.selection = Some(WindowsSetupSelection::Continue);
self.exit_requested = false;
self.error = None;
}
Err(err) => {
tracing::error!("Failed to persist Windows onboarding acknowledgement: {err:?}");
self.error = Some(format!("Failed to update config: {err}"));
self.selection = None;
}
}
}
fn handle_install(&mut self) {
self.highlighted = WindowsSetupSelection::Install;
self.selection = Some(WindowsSetupSelection::Install);
self.exit_requested = true;
}
pub fn exit_requested(&self) -> bool {
self.exit_requested
}
}
impl WidgetRef for &WindowsSetupWidget {
fn render_ref(&self, area: Rect, buf: &mut Buffer) {
let mut lines: Vec<Line> = vec![
Line::from(vec![
"> ".into(),
"To use all Codex features, we recommend running Codex in Windows Subsystem for Linux (WSL2)".bold(),
]),
Line::from(vec![" ".into(), "WSL allows Codex to run Agent mode in a sandboxed environment with better data protections in place.".into()]),
Line::from(vec![" ".into(), "Learn more: https://developers.openai.com/codex/windows".into()]),
Line::from(""),
];
let create_option =
|idx: usize, option: WindowsSetupSelection, text: &str| -> Line<'static> {
if self.highlighted == option {
Line::from(format!("> {}. {text}", idx + 1)).cyan()
} else {
Line::from(format!(" {}. {}", idx + 1, text))
}
};
lines.push(create_option(
0,
WindowsSetupSelection::Install,
"Exit and install WSL2",
));
lines.push(create_option(
1,
WindowsSetupSelection::Continue,
"Continue anyway",
));
lines.push("".into());
if let Some(error) = &self.error {
lines.push(Line::from(format!(" {error}")).fg(Color::Red));
lines.push("".into());
}
lines.push(Line::from(vec![" Press Enter to continue".dim()]));
Paragraph::new(lines)
.wrap(Wrap { trim: false })
.render(area, buf);
}
}
impl KeyboardHandler for WindowsSetupWidget {
fn handle_key_event(&mut self, key_event: KeyEvent) {
if key_event.kind == KeyEventKind::Release {
return;
}
match key_event.code {
KeyCode::Up | KeyCode::Char('k') => {
self.highlighted = WindowsSetupSelection::Install;
}
KeyCode::Down | KeyCode::Char('j') => {
self.highlighted = WindowsSetupSelection::Continue;
}
KeyCode::Char('1') => self.handle_install(),
KeyCode::Char('2') => self.handle_continue(),
KeyCode::Enter => match self.highlighted {
WindowsSetupSelection::Install => self.handle_install(),
WindowsSetupSelection::Continue => self.handle_continue(),
},
_ => {}
}
}
}
impl StepStateProvider for WindowsSetupWidget {
fn get_step_state(&self) -> StepState {
match self.selection {
Some(WindowsSetupSelection::Continue) => StepState::Hidden,
Some(WindowsSetupSelection::Install) => StepState::Complete,
None => StepState::InProgress,
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::TempDir;
#[test]
fn windows_step_hidden_after_continue() {
let temp_dir = TempDir::new().expect("temp dir");
let mut widget = WindowsSetupWidget::new(temp_dir.path().to_path_buf());
assert_eq!(widget.get_step_state(), StepState::InProgress);
widget.handle_continue();
assert_eq!(widget.get_step_state(), StepState::Hidden);
assert!(!widget.exit_requested());
}
#[test]
fn windows_step_complete_after_install_selection() {
let temp_dir = TempDir::new().expect("temp dir");
let mut widget = WindowsSetupWidget::new(temp_dir.path().to_path_buf());
widget.handle_install();
assert_eq!(widget.get_step_state(), StepState::Complete);
assert!(widget.exit_requested());
}
}

View File

@@ -0,0 +1,144 @@
use std::ffi::OsString;
use std::path::Path;
use crate::GitToolingError;
use crate::operations::ensure_git_repository;
use crate::operations::resolve_head;
use crate::operations::resolve_repository_root;
use crate::operations::run_git_for_stdout;
/// Returns the merge-base commit between `HEAD` and the provided branch, if both exist.
///
/// The function mirrors `git merge-base HEAD <branch>` but returns `Ok(None)` when
/// the repository has no `HEAD` yet or when the branch cannot be resolved.
pub fn merge_base_with_head(
repo_path: &Path,
branch: &str,
) -> Result<Option<String>, GitToolingError> {
ensure_git_repository(repo_path)?;
let repo_root = resolve_repository_root(repo_path)?;
let head = match resolve_head(repo_root.as_path())? {
Some(head) => head,
None => return Ok(None),
};
let branch_ref = match run_git_for_stdout(
repo_root.as_path(),
vec![
OsString::from("rev-parse"),
OsString::from("--verify"),
OsString::from(branch),
],
None,
) {
Ok(rev) => rev,
Err(GitToolingError::GitCommand { .. }) => return Ok(None),
Err(other) => return Err(other),
};
let merge_base = run_git_for_stdout(
repo_root.as_path(),
vec![
OsString::from("merge-base"),
OsString::from(head),
OsString::from(branch_ref),
],
None,
)?;
Ok(Some(merge_base))
}
#[cfg(test)]
mod tests {
use super::merge_base_with_head;
use crate::GitToolingError;
use pretty_assertions::assert_eq;
use std::path::Path;
use std::process::Command;
use tempfile::tempdir;
fn run_git_in(repo_path: &Path, args: &[&str]) {
let status = Command::new("git")
.current_dir(repo_path)
.args(args)
.status()
.expect("git command");
assert!(status.success(), "git command failed: {args:?}");
}
fn run_git_stdout(repo_path: &Path, args: &[&str]) -> String {
let output = Command::new("git")
.current_dir(repo_path)
.args(args)
.output()
.expect("git command");
assert!(output.status.success(), "git command failed: {args:?}");
String::from_utf8_lossy(&output.stdout).trim().to_string()
}
fn init_test_repo(repo_path: &Path) {
run_git_in(repo_path, &["init", "--initial-branch=main"]);
run_git_in(repo_path, &["config", "core.autocrlf", "false"]);
}
fn commit(repo_path: &Path, message: &str) {
run_git_in(
repo_path,
&[
"-c",
"user.name=Tester",
"-c",
"user.email=test@example.com",
"commit",
"-m",
message,
],
);
}
#[test]
fn merge_base_returns_shared_commit() -> Result<(), GitToolingError> {
let temp = tempdir()?;
let repo = temp.path();
init_test_repo(repo);
std::fs::write(repo.join("base.txt"), "base\n")?;
run_git_in(repo, &["add", "base.txt"]);
commit(repo, "base commit");
run_git_in(repo, &["checkout", "-b", "feature"]);
std::fs::write(repo.join("feature.txt"), "feature change\n")?;
run_git_in(repo, &["add", "feature.txt"]);
commit(repo, "feature commit");
run_git_in(repo, &["checkout", "main"]);
std::fs::write(repo.join("main.txt"), "main change\n")?;
run_git_in(repo, &["add", "main.txt"]);
commit(repo, "main commit");
run_git_in(repo, &["checkout", "feature"]);
let expected = run_git_stdout(repo, &["merge-base", "HEAD", "main"]);
let merge_base = merge_base_with_head(repo, "main")?;
assert_eq!(merge_base, Some(expected));
Ok(())
}
#[test]
fn merge_base_returns_none_when_branch_missing() -> Result<(), GitToolingError> {
let temp = tempdir()?;
let repo = temp.path();
init_test_repo(repo);
std::fs::write(repo.join("tracked.txt"), "tracked\n")?;
run_git_in(repo, &["add", "tracked.txt"]);
commit(repo, "initial");
let merge_base = merge_base_with_head(repo, "missing-branch")?;
assert_eq!(merge_base, None);
Ok(())
}
}

View File

@@ -2,6 +2,7 @@ use std::fmt;
use std::path::PathBuf;
mod apply;
mod branch;
mod errors;
mod ghost_commits;
mod operations;
@@ -13,6 +14,7 @@ pub use apply::apply_git_patch;
pub use apply::extract_paths_from_patch;
pub use apply::parse_git_apply_output;
pub use apply::stage_paths;
pub use branch::merge_base_with_head;
pub use errors::GitToolingError;
pub use ghost_commits::CreateGhostCommitOptions;
pub use ghost_commits::create_ghost_commit;

View File

@@ -122,6 +122,11 @@ impl Tokenizer {
}
}
impl fmt::Debug for Tokenizer {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "Tokenizer {{ inner: <CoreBPE> }}")
}
}
#[cfg(test)]
mod tests {
use super::*;

View File

@@ -1,6 +1,7 @@
use crate::token::world_sid;
use crate::winutil::to_wide;
use anyhow::Result;
use std::collections::HashMap;
use std::collections::HashSet;
use std::ffi::c_void;
use std::path::Path;
@@ -275,6 +276,35 @@ pub fn audit_everyone_writable(
);
Ok(Vec::new())
}
fn normalize_windows_path_for_display(p: impl AsRef<Path>) -> String {
let canon = dunce::canonicalize(p.as_ref()).unwrap_or_else(|_| p.as_ref().to_path_buf());
canon.display().to_string().replace('/', "\\")
}
pub fn world_writable_warning_details(
codex_home: impl AsRef<Path>,
) -> Option<(Vec<String>, usize, bool)> {
let cwd = match std::env::current_dir() {
Ok(cwd) => cwd,
Err(_) => return Some((Vec::new(), 0, true)),
};
let env_map: HashMap<String, String> = std::env::vars().collect();
match audit_everyone_writable(&cwd, &env_map, Some(codex_home.as_ref())) {
Ok(paths) if paths.is_empty() => None,
Ok(paths) => {
let as_strings: Vec<String> = paths
.iter()
.map(normalize_windows_path_for_display)
.collect();
let sample_paths: Vec<String> = as_strings.iter().take(3).cloned().collect();
let extra_count = as_strings.len().saturating_sub(sample_paths.len());
Some((sample_paths, extra_count, false))
}
Err(_) => Some((Vec::new(), 0, true)),
}
}
// Fast mask-based check: does the DACL contain any ACCESS_ALLOWED ACE for
// Everyone that includes generic or specific write bits? Skips inherit-only
// ACEs (do not apply to the current object).

View File

@@ -6,6 +6,8 @@ macro_rules! windows_modules {
windows_modules!(acl, allow, audit, cap, env, logging, policy, token, winutil);
#[cfg(target_os = "windows")]
pub use audit::world_writable_warning_details;
#[cfg(target_os = "windows")]
pub use windows_impl::preflight_audit_everyone_writable;
#[cfg(target_os = "windows")]
@@ -18,6 +20,8 @@ pub use stub::preflight_audit_everyone_writable;
#[cfg(not(target_os = "windows"))]
pub use stub::run_windows_sandbox_capture;
#[cfg(not(target_os = "windows"))]
pub use stub::world_writable_warning_details;
#[cfg(not(target_os = "windows"))]
pub use stub::CaptureResult;
#[cfg(target_os = "windows")]
@@ -455,4 +459,10 @@ mod stub {
) -> Result<CaptureResult> {
bail!("Windows sandbox is only available on Windows")
}
pub fn world_writable_warning_details(
_codex_home: impl AsRef<Path>,
) -> Option<(Vec<String>, usize, bool)> {
None
}
}

View File

@@ -925,6 +925,7 @@ Valid values:
| `model_provider` | string | Provider id from `model_providers` (default: `openai`). |
| `model_context_window` | number | Context window tokens. |
| `model_max_output_tokens` | number | Max output tokens. |
| `tool_output_token_limit` | number | Token budget for stored function/tool outputs in history (default: 2,560 tokens). |
| `approval_policy` | `untrusted` \| `on-failure` \| `on-request` \| `never` | When to prompt for approval. |
| `sandbox_mode` | `read-only` \| `workspace-write` \| `danger-full-access` | OS sandbox policy. |
| `sandbox_workspace_write.writable_roots` | array<string> | Extra writable roots in workspacewrite. |

View File

@@ -33,6 +33,7 @@ model_provider = "openai"
# model_context_window = 128000 # tokens; default: auto for model
# model_max_output_tokens = 8192 # tokens; default: auto for model
# model_auto_compact_token_limit = 0 # disable/override auto; default: model family specific
# tool_output_token_limit = 10000 # tokens stored per tool output; default: 10000 for gpt-5.1-codex
################################################################################
# Reasoning & Verbosity (Responses API capable models)

View File

@@ -8,7 +8,7 @@ In 2021, OpenAI released Codex, an AI system designed to generate code from natu
### Which models are supported?
We recommend using Codex with GPT-5 Codex, our best coding model. The default reasoning level is medium, and you can upgrade to high for complex tasks with the `/model` command.
We recommend using Codex with GPT-5.1 Codex, our best coding model. The default reasoning level is medium, and you can upgrade to high for complex tasks with the `/model` command.
You can also use older models by using API-based auth and launching codex with the `--model` flag.