Compare commits

5 Commits

Author SHA1 Message Date
jimmyfraiture
63222a45b5 One more step 2025-10-08 13:31:35 +01:00
jimmyfraiture
cf880c77d6 Fix 1 2025-10-08 11:51:32 +01:00
jimmyfraiture
39bec6b245 Merge remote-tracking branch 'origin/main' into jif/subsession-1
# Conflicts:
#	codex-rs/core/src/model_family.rs
#	codex-rs/core/src/tools/spec.rs
2025-10-08 11:35:02 +01:00
jimmyfraiture
ad8e6db038 Add test 2025-10-06 16:39:47 +01:00
jimmyfraiture
4967cfbebd V1 2025-10-06 15:53:38 +01:00
18 changed files with 1529 additions and 9 deletions

@@ -6,6 +6,15 @@ You are Codex, based on GPT-5. You are running as a coding agent in the Codex CL
- Always set the `workdir` param when using the shell function. Do not use `cd` unless absolutely necessary.
- When searching for text or files, prefer using `rg` or `rg --files` respectively because `rg` is much faster than alternatives like `grep`. (If the `rg` command is not found, then use alternatives.)
## Subsessions
- Favor the subsession tools when a focused task would otherwise cost significant context in the main conversation. This is especially useful when you need to read a lot of text/logs for a very temporary step in your main task.
- Spawn a subsession to run or repair tests, handle lint fixes, or perform targeted investigations required before moving further (e.g., "find where X is defined") and wait for its summary before continuing.
- When delegating, state the goal clearly so the subsession can act autonomously, and return to the main session once the requested work is complete.
- EVERY test/fix loop must be run within a subsession, even small ones.
- Before running any long command (fmt/fix/test) or reading large files, open a subsession dedicated to that task, wait for its summary, and only then continue in the main thread. The goal of the main thread is to stay a light orchestrator.
- Example workflow: call `create_session` with a clear instruction such as "Run cargo test -p codex-core and fix all failures," then monitor progress with `wait_session`, which returns the subsession's summary, and keep progressing from there. If the subsession seems to be blocked, you can stop it with `cancel_session`.
## Editing constraints
- Default to ASCII when editing or creating files. Only introduce non-ASCII or other Unicode characters when there is a clear justification and the file already uses them.

185
codex-rs/core/session.md Normal file
@@ -0,0 +1,185 @@
# SubSessions: Spawn and Await Child Conversations
This document proposes a first design to let a model spawn a new conversation (a new “session”) from within an existing `codex-core` session, and later await its completion to retrieve the final assistant message.
## Goals
- Give the model two new capabilities via tools:
- `create_session(session_type, prompt) -> { session_id }`
- `wait_session(session_id, timeout_ms) -> { result }`
- The spawned child session runs independently and returns ONLY its last assistant message as the result.
- Allow the caller to customize the child's developer prompt, model, and tools.
- Keep parent and child conversations isolated in history, state, and rollout.
- Make it easy to add specialized child session profiles (e.g., linter fixer, math solver).
## Non-goals (initial version):
- Cross-session message streaming to the parent while the child is running.
- Bidirectional piping of tool output between sessions.
- Multi-turn orchestration inside the child beyond the standard Codex task loop.
## User-Facing Interface (tools)
- `create_session`
- Inputs:
- `session_type: SessionType` — an enum string (phase 1 presets below). In phase 1, this maps to a fixed developer prompt + model; tools are inherited from the parent's normal configuration.
- `prompt: String` — initial user message for the child.
- Output: `{ session_id: String }` — a UUID v4 formatted string. Internally this will be the child `ConversationId`.
- `wait_session`
- Inputs:
- `session_id: String`
- `timeout_ms: i32` — total time to wait; `<= 0` means “do not wait”.
- Output: `{ result: String }` — last assistant message of the child session. If the child failed or timed out, return a tool error.
- `cancel_session`
- Inputs:
- `session_id: String`
- Output: `{ cancelled: boolean }` — true if the child was still running and is now cancelled. Errors if unknown `session_id`.
Notes:
- These are exposed as function tools so the model can orchestrate subworkflows.
- `create_session` returns immediately after queuing the child run.
- `wait_session` may block the current turn until completion or timeout.
## High-Level Flow
1. Parent model calls `create_session(...)` during a turn.
2. Core spawns a new Codex conversation using a derived `Config`:
- Phase 1 mapping from `session_type` enum to a profile with:
- `base_instructions` override set from the profile's developer prompt.
- `model` set from the profile's configured model.
- Tools: in phase 1, inherit the parent's normal tool configuration unchanged.
- Inherit `cwd`, `sandbox_policy`, and telemetry from parent.
- Approval policy: in phase 1, child sessions run without interactive approvals (see “Approvals & Safety (Phase 1)” below).
3. Core starts the child by submitting a single `UserInput` (or `UserTurn`) with `prompt`.
4. A background driver consumes the child's events until `TaskComplete`, capturing `last_agent_message`.
5. `create_session` returns `{ session_id }` to the parent turn.
6. Later, the parent calls `wait_session(session_id, timeout_ms)` to obtain the `{ result }` string (the child's final assistant message), or a timeout/error.
7. The parent may cancel a running child with `cancel_session(session_id)`.
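The flow above can be sketched with plain threads and channels. This is a toy model, not the proposed implementation: `ToySessions`, its `u64` ids, and the fake "child run" are all hypothetical stand-ins for the `codex-core` conversation machinery, and the result can be read only once here.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Toy stand-in for the parent-side registry of child sessions (hypothetical).
struct ToySessions {
    next_id: u64,
    children: HashMap<u64, Receiver<String>>,
}

impl ToySessions {
    fn new() -> Self {
        Self { next_id: 0, children: HashMap::new() }
    }

    /// Mirrors `create_session`: start the child in the background and
    /// return its id immediately.
    fn create_session(&mut self, prompt: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        let (tx, rx) = mpsc::channel();
        let prompt = prompt.to_string();
        thread::spawn(move || {
            // Stand-in for driving the child conversation to `TaskComplete`.
            let last_message = format!("done: {prompt}");
            let _ = tx.send(last_message);
        });
        self.children.insert(id, rx);
        id
    }

    /// Mirrors `wait_session`: return the child's final message or an error.
    fn wait_session(&self, id: u64, timeout: Duration) -> Result<String, String> {
        let rx = self
            .children
            .get(&id)
            .ok_or_else(|| format!("unknown session {id}"))?;
        match rx.recv_timeout(timeout) {
            Ok(message) => Ok(message),
            Err(RecvTimeoutError::Timeout) => Err(format!(
                "session {id} did not complete within {}ms",
                timeout.as_millis()
            )),
            // The sender was dropped without a message (or was already read).
            Err(RecvTimeoutError::Disconnected) => Err(format!("session {id} failed")),
        }
    }
}
```

The point of the sketch is the split between a cheap, immediate `create_session` and a separately bounded `wait_session`; the parent decides when (and for how long) to block.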
## Architecture
### New Runtime Service: SubsessionManager
Add a per-session orchestrator responsible for child conversations:
- Stored on `SessionServices` as `subsessions: SubsessionManager`.
- Holds `HashMap<ConversationId, ChildState>` guarded by `tokio::sync::Mutex`.
- Spawns and monitors child conversations; exposes:
- `spawn_child(config, prompt) -> ConversationId`
- `wait_child(id, timeout) -> Result<String, ChildError>`
- `cancel_child(id) -> bool`
- `abort_all_children()` for cleanup on parent drop/interrupt.
Child state lifecycle:
- `Pending { handle }` — background task running; `handle` joins to finish.
- `Done { result: Option<String> }` — captured last assistant message; `None` if no assistant message was produced.
- `Failed { error }` — terminal error captured as string.
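A minimal sketch of this lifecycle as a state machine follows. It omits the task handle carried by `Pending`, and the rule that a terminal state is never overwritten is an assumption of the sketch rather than something the design states.

```rust
/// Simplified child state; the `Pending` variant in the design would also
/// carry the background task handle.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ChildState {
    Pending,
    Done { result: Option<String> },
    Failed { error: String },
}

impl ChildState {
    /// `Done` and `Failed` are terminal.
    fn is_terminal(&self) -> bool {
        !matches!(self, ChildState::Pending)
    }

    /// Applies `next` only while the child is still pending
    /// (assumption: terminal states are sticky).
    fn transition(self, next: ChildState) -> ChildState {
        if self.is_terminal() { self } else { next }
    }

    /// What a waiter would observe: `None` while pending, otherwise the outcome.
    fn outcome(&self) -> Option<Result<Option<String>, String>> {
        match self {
            ChildState::Pending => None,
            ChildState::Done { result } => Some(Ok(result.clone())),
            ChildState::Failed { error } => Some(Err(error.clone())),
        }
    }
}
```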
### Using Existing Building Blocks
- Conversation creation: reuse `ConversationManager::new_conversation(config)` internally or a lighter inline `Codex::spawn`, then wrap in `CodexConversation`.
- Child run loop: submit a `UserInput`/`UserTurn` with the provided `prompt` and consume events until `EventMsg::TaskComplete(TaskCompleteEvent)`; use the embedded `last_agent_message` as the result. Fall back to scanning the final turn's `ResponseItem`s if needed.
- Rollout: each child conversation records its own rollout file via its own `RolloutRecorder`. For origin, see “SessionSource for SubSessions” below.
- Parent observability: use `EventMsg::BackgroundEvent` to optionally notify the parent UI when a child is created/completed.
### Session types (Phase 1)
Represented as an enum string from the model, mapped server-side in `codex-core` to profiles:
- `tester` — a strict, concise test-writing assistant.
- `mathematician` — a reasoning-optimized assistant focused on math problems.
- `linter_fixer` — an assistant focused on fixing lint issues.
- `default` — fallback; mirrors the parent's model and uses a small generic task prompt.
Each profile supplies:
- `developer_instructions: String` — appended as base instructions override.
- `model_name: String` — full model id to use in the child.
Tools are inherited from the parent in phase 1; we will add per-profile tool curation in phase 2.
### Tools and Configuration Mapping (Phase 2 future)
We will add optional per-profile tool surface selection by mapping a profile's `tools` to a `ToolsConfig` subset (mirroring `core/src/tools/spec.rs`). MCP tool allowlists would also be supported.
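One way such a subset selection could look is shown below. This is only a sketch: `curate_tools` and the plain string tool names are hypothetical and do not reflect the actual `ToolsConfig` type.

```rust
/// Returns the child's tool names. With no allowlist (the phase 1 behavior)
/// everything is inherited; otherwise only allowed names are kept, in the
/// parent's order. Hypothetical helper for illustration.
fn curate_tools(parent_tools: &[&str], allowlist: Option<&[&str]>) -> Vec<String> {
    parent_tools
        .iter()
        .filter(|name| allowlist.map_or(true, |allowed| allowed.contains(*name)))
        .map(|name| name.to_string())
        .collect()
}
```

Filtering the parent's list, rather than building the child's list from scratch, means a profile can never grant a tool the parent does not already have.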
## Error Handling and Timeouts
- `create_session` errors if the child fails to spawn; otherwise always returns a `session_id`.
- `wait_session`:
- If `timeout_ms <= 0`, behave as a non-blocking check: return an error if not completed; otherwise return the result.
- If timed out, return a tool error like: `"session {id} did not complete within {timeout_ms}ms"`.
- If the child task fails, return the captured error string.
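The `wait_session` rules above can be illustrated with a standard-library `Mutex` and `Condvar`. The `Slot` type is a hypothetical stand-in for the per-child record, and a synchronous condition variable is used here only to keep the sketch dependency-free; the real manager is async.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Shared slot that a background child fills in once (hypothetical type).
struct Slot {
    outcome: Mutex<Option<Result<String, String>>>,
    ready: Condvar,
}

impl Slot {
    fn new() -> Arc<Self> {
        Arc::new(Self { outcome: Mutex::new(None), ready: Condvar::new() })
    }

    /// Called by the child driver when the run finishes or fails.
    fn complete(&self, outcome: Result<String, String>) {
        *self.outcome.lock().unwrap() = Some(outcome);
        self.ready.notify_all();
    }

    /// `timeout_ms <= 0` is a non-blocking check; otherwise block for at most
    /// `timeout_ms` milliseconds.
    fn wait(&self, id: &str, timeout_ms: i64) -> Result<String, String> {
        let guard = self.outcome.lock().unwrap();
        if timeout_ms <= 0 {
            return (*guard)
                .clone()
                .unwrap_or_else(|| Err(format!("session {id} is still running")));
        }
        // Sleep until an outcome is stored or the timeout elapses.
        let (guard, _) = self
            .ready
            .wait_timeout_while(guard, Duration::from_millis(timeout_ms as u64), |o| o.is_none())
            .unwrap();
        (*guard)
            .clone()
            .unwrap_or_else(|| Err(format!("session {id} did not complete within {timeout_ms}ms")))
    }
}
```

Because the outcome is checked under the same lock the waiter sleeps on, a completion that lands just before the wait starts is still observed.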
## Isolation, Security, and Policies
- The child inherits the parent's `cwd` and `sandbox_policy`. Shell execution remains sandboxed as in the parent.
- In phase 1 the child inherits the parent's tool surface unchanged; restricting it to `SessionType.tools` is planned for phase 2.
- No history is shared; the child starts with initial context built from its own `ConfigureSession` only (environment context and developer instructions).
### Approvals & Safety (Phase 1)
- Child sessions do not request interactive approvals in phase 1.
- Implementation: force the child's `approval_policy` to a non-interactive mode (e.g., equivalent of “never escalate”), regardless of the parent session's policy.
- We can add opt-in approval behaviors to specific session types in phase 2.
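The inherit-and-override rule can be sketched as a small pure function. `ToyConfig`, `ApprovalPolicy`, and `child_config` are simplified, hypothetical stand-ins for the real `Config` and its fields.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq, Eq)]
enum ApprovalPolicy {
    OnRequest,
    Never,
}

/// Toy subset of the session configuration (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
struct ToyConfig {
    cwd: PathBuf,
    sandbox: String,
    approval_policy: ApprovalPolicy,
    model: String,
    base_instructions: Option<String>,
}

/// Derives a child config: `cwd` and sandbox are inherited, the prompt and
/// (optionally) the model come from the profile, and approvals are forced
/// to the non-interactive mode regardless of the parent's policy.
fn child_config(parent: &ToyConfig, developer_instructions: &str, model: Option<&str>) -> ToyConfig {
    ToyConfig {
        approval_policy: ApprovalPolicy::Never,
        model: model.map(str::to_string).unwrap_or_else(|| parent.model.clone()),
        base_instructions: Some(developer_instructions.to_string()),
        ..parent.clone()
    }
}
```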
## Events and Rollout
- Each child conversation has its own rollout path under `sessions/YYYY/MM/DD/...` with its own `SessionMeta` and `TurnContext` items.
- The parent session emits `BackgroundEvent` messages:
- On create: `"spawned child session {id} with model {model_name}"`.
- On complete: `"child session {id} completed"`.
- On cancel: `"child session {id} cancelled"`.
### SessionSource for SubSessions
Today, `SessionMeta.source` distinguishes origins like `Cli`, `VSCode`, `Exec`, and `Mcp` (see `protocol/src/protocol.rs`). Adding `SessionSource::SubSession` would:
- Let rollouts clearly identify runs that were spawned by another session, enabling filtering, analytics, and UI affordances (e.g., “show only child runs”).
- Help group parent/child runs in future UX without relying on naming conventions or directory structure.
Tradeoffs:
- Requires updating the `protocol` crate (Rust + generated TS) and any consumers that switch over `SessionSource`.
- Backward compatibility: default unknown values to `Unknown` in older clients; new servers can safely emit `subsession`.
Phase 1 proposal: keep using an existing source (e.g., `Exec`) for minimal surface change, but reserve the enum value and wire it shortly after to avoid churn across downstreams.
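The backward-compatibility point can be made concrete with a sketch of how an older client might parse the source field. The enum name and the lowercase wire strings are assumptions for illustration; only `subsession` appears in the text above.

```rust
/// Origins an older client knows about. There is deliberately no sub-session
/// variant: this models a consumer built before the new value existed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LegacySessionSource {
    Cli,
    VSCode,
    Exec,
    Mcp,
    Unknown,
}

/// Parses a wire string, mapping anything unrecognized to `Unknown`.
/// The wire names are assumed for this sketch.
fn parse_legacy_source(raw: &str) -> LegacySessionSource {
    match raw {
        "cli" => LegacySessionSource::Cli,
        "vscode" => LegacySessionSource::VSCode,
        "exec" => LegacySessionSource::Exec,
        "mcp" => LegacySessionSource::Mcp,
        _ => LegacySessionSource::Unknown,
    }
}
```

With a catch-all arm like this, a newer server emitting `subsession` degrades to `Unknown` in old clients instead of failing to parse.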
## Relationship to Review Threads (Phase 3)
- Review mode today uses an isolated in-memory thread (no parent history) inside the same session/task, then emits `ExitedReviewMode` with structured output.
- Subsessions generalize isolation by giving a fully separate conversation with its own lifecycle, model, and tool surface, and an explicit await mechanism.
- We can later reimplement review as a preconfigured `SessionType` template if desired. (phase 3)
## Testing Strategy
- Unit tests for `SubsessionManager`:
- Spawns a child, captures `last_agent_message`, handles failure.
- Timeout behavior and non-blocking checks.
- `abort_all_children()` on parent drop/interrupt.
- Integration tests exercising the tools:
- Model calls `create_session` then `wait_session` and receives the child result.
- Cancellation via `cancel_session`.
- Nested spawning: a subsession spawns a subsession itself.
## Incremental Implementation Plan (Phase 1)
1. Add `subsessions` module to `codex-core` with `SubsessionManager` on `SessionServices`; APIs: spawn/wait/cancel/abort_all.
2. Implement background driver that runs a child conversation to `TaskComplete` and stores `{ result }`.
3. Add tool specs and handlers:
- `create_session(session_type, prompt)` → spawn via profile mapping and return id.
- `wait_session(session_id, timeout_ms)` → await completion with timeout and return result.
- `cancel_session(session_id)` → cancel a running child.
4. Phase 1: inherit tools from parent; force approval policy to non-interactive.
5. Emit optional `BackgroundEvent` diagnostics.
6. Add enum profiles and developer prompts for initial types (tester, mathematician, linter_fixer, default).
## Module Layout (Phase 1)
Inside `codex-core`:
- `core/src/subsessions/mod.rs` — manager, child state, profile mappings.
- `core/src/tools/handlers/subsessions.rs` — handlers for `create_session`, `wait_session`, `cancel_session`.
- Minimal changes in `core/src/state/service.rs` to attach the manager.
- No changes required to `tui` for phase 1; no UI exposure.
## Coding Guidelines Note
- Reuse existing primitives and flows (Codex spawn, submission loop, rollout).
- Prefer refactoring shared logic over duplicating or introducing ad hoc hacks; any unavoidable interim workaround should be accompanied by a clear TODO comment and a follow-up refactor task.

@@ -451,6 +451,10 @@ impl ModelClient {
pub fn get_auth_manager(&self) -> Option<Arc<AuthManager>> {
self.auth_manager.clone()
}
pub fn get_config(&self) -> Arc<Config> {
self.config.clone()
}
}
enum StreamAttemptError {

@@ -98,6 +98,7 @@ use crate::rollout::RolloutRecorderParams;
use crate::shell;
use crate::state::ActiveTurn;
use crate::state::SessionServices;
use crate::subsessions::SubsessionManager;
use crate::tasks::CompactTask;
use crate::tasks::RegularTask;
use crate::tasks::ReviewTask;
@@ -474,6 +475,7 @@ impl Session {
turn_context.cwd.clone(),
config.codex_linux_sandbox_exe.clone(),
)),
subsessions: SubsessionManager::new(),
};
let sess = Arc::new(Session {
@@ -1087,6 +1089,10 @@ impl Session {
pub async fn interrupt_task(self: &Arc<Self>) {
info!("interrupt received: abort current task, if any");
self.abort_all_tasks(TurnAbortReason::Interrupted).await;
self.services
.subsessions
.abort_all_children(Arc::clone(self))
.await;
}
fn interrupt_task_sync(&self) {
@@ -1443,6 +1449,10 @@ async fn submission_loop(
}
Op::Shutdown => {
sess.abort_all_tasks(TurnAbortReason::Interrupted).await;
sess.services
.subsessions
.abort_all_children(Arc::clone(&sess))
.await;
info!("Shutting down Codex instance");
// Gracefully flush and shutdown rollout recorder on session end so tests
@@ -2773,6 +2783,7 @@ mod tests {
turn_context.cwd.clone(),
None,
)),
subsessions: SubsessionManager::new(),
};
let session = Session {
conversation_id,
@@ -2846,6 +2857,7 @@ mod tests {
config.cwd.clone(),
None,
)),
subsessions: SubsessionManager::new(),
};
let session = Arc::new(Session {
conversation_id,

@@ -64,6 +64,7 @@ pub(crate) mod safety;
pub mod seatbelt;
pub mod shell;
pub mod spawn;
pub(crate) mod subsessions;
pub mod terminal;
mod tools;
pub mod turn_diff_tracker;

@@ -100,6 +100,7 @@ pub fn find_family_for_model(mut slug: &str) -> Option<ModelFamily> {
supports_reasoning_summaries: true,
uses_local_shell_tool: true,
needs_special_apply_patch_instructions: true,
experimental_supported_tools: vec!["subsession".to_string()],
)
} else if slug.starts_with("gpt-4.1") {
model_family!(
@@ -123,6 +124,8 @@ pub fn find_family_for_model(mut slug: &str) -> Option<ModelFamily> {
"list_dir".to_string(),
"read_file".to_string(),
"test_sync_tool".to_string(),
"subsession".to_string(),
],
supports_parallel_tool_calls: true,
)
@@ -139,6 +142,7 @@ pub fn find_family_for_model(mut slug: &str) -> Option<ModelFamily> {
"grep_files".to_string(),
"list_dir".to_string(),
"read_file".to_string(),
"subsession".to_string(),
],
supports_parallel_tool_calls: true,
)
@@ -151,6 +155,9 @@ pub fn find_family_for_model(mut slug: &str) -> Option<ModelFamily> {
reasoning_summary_format: ReasoningSummaryFormat::Experimental,
base_instructions: GPT_5_CODEX_INSTRUCTIONS.to_string(),
apply_patch_tool_type: Some(ApplyPatchToolType::Freeform),
experimental_supported_tools: vec![
"subsession".to_string(),
],
)
} else if slug.starts_with("gpt-5") {
model_family!(

@@ -2,8 +2,10 @@ use crate::RolloutRecorder;
use crate::exec_command::ExecSessionManager;
use crate::executor::Executor;
use crate::mcp_connection_manager::McpConnectionManager;
use crate::subsessions::SubsessionManager;
use crate::unified_exec::UnifiedExecSessionManager;
use crate::user_notification::UserNotifier;
use std::sync::Arc;
use tokio::sync::Mutex;
pub(crate) struct SessionServices {
@@ -15,4 +17,5 @@ pub(crate) struct SessionServices {
pub(crate) user_shell: crate::shell::Shell,
pub(crate) show_raw_agent_reasoning: bool,
pub(crate) executor: Executor,
pub(crate) subsessions: Arc<SubsessionManager>,
}

@@ -0,0 +1,45 @@
use codex_protocol::ConversationId;
use thiserror::Error;
#[derive(Debug, Error, Clone, PartialEq, Eq)]
pub enum SubsessionError {
#[error("unknown session {session_id}")]
UnknownSession { session_id: String },
#[error("session {session_id} is still running")]
Pending { session_id: String },
#[error("session {session_id} timed out after {timeout_ms}ms")]
Timeout { session_id: String, timeout_ms: u64 },
#[error("failed to spawn child session: {message}")]
SpawnFailed { message: String },
#[error("child session {session_id} cancelled")]
Cancelled { session_id: String },
#[error("missing auth manager for child sessions")]
MissingAuthManager,
}
impl SubsessionError {
pub(crate) fn unknown(id: &ConversationId) -> Self {
Self::UnknownSession {
session_id: id.to_string(),
}
}
pub(crate) fn pending(id: &ConversationId) -> Self {
Self::Pending {
session_id: id.to_string(),
}
}
pub(crate) fn cancelled(id: &ConversationId) -> Self {
Self::Cancelled {
session_id: id.to_string(),
}
}
pub(crate) fn timeout(id: &ConversationId, timeout_ms: u64) -> Self {
Self::Timeout {
session_id: id.to_string(),
timeout_ms,
}
}
}

@@ -0,0 +1,462 @@
use std::collections::HashMap;
use std::sync::Arc;
use std::time::Duration;
use codex_protocol::ConversationId;
use codex_protocol::protocol::AgentMessageEvent;
use codex_protocol::protocol::Event;
use codex_protocol::protocol::EventMsg;
use codex_protocol::protocol::InputItem;
use codex_protocol::protocol::TaskCompleteEvent;
use tokio::sync::Mutex;
use tokio::sync::Notify;
use tokio::sync::watch;
use tokio::task::JoinHandle;
use tracing::warn;
use crate::codex::Codex;
use crate::codex::Session;
use crate::codex::TurnContext;
use crate::config::Config;
use crate::model_family::find_family_for_model;
use crate::protocol::AskForApproval;
use crate::protocol::InitialHistory;
use crate::protocol::Op;
use crate::protocol::SessionSource;
use crate::subsessions::error::SubsessionError;
use crate::subsessions::profile::SessionType;
use crate::subsessions::profile::SubsessionProfile;
pub(crate) type ChildResult = Result<Option<String>, SubsessionError>;
#[derive(Debug)]
enum ChildStatus {
Pending,
Done(Option<String>),
Failed(SubsessionError),
Cancelled,
}
struct ChildRecord {
status: Mutex<ChildStatus>,
notify: Notify,
cancel_tx: watch::Sender<bool>,
handle: Mutex<Option<JoinHandle<()>>>,
}
impl ChildRecord {
fn new(cancel_tx: watch::Sender<bool>) -> Self {
Self {
status: Mutex::new(ChildStatus::Pending),
notify: Notify::new(),
cancel_tx,
handle: Mutex::new(None),
}
}
async fn update(&self, status: ChildStatus) {
let mut guard = self.status.lock().await;
*guard = status;
self.notify.notify_waiters();
}
async fn status(&self) -> ChildStatus {
let guard = self.status.lock().await;
match &*guard {
ChildStatus::Pending => ChildStatus::Pending,
ChildStatus::Done(value) => ChildStatus::Done(value.clone()),
ChildStatus::Failed(err) => ChildStatus::Failed(err.clone()),
ChildStatus::Cancelled => ChildStatus::Cancelled,
}
}
async fn set_handle(&self, handle: JoinHandle<()>) {
let mut guard = self.handle.lock().await;
*guard = Some(handle);
}
async fn send_cancel(&self) {
if self.cancel_tx.send(true).is_err() {
warn!("subsession cancellation receiver already dropped");
}
let mut guard = self.handle.lock().await;
let _ = guard.take();
// Drop the handle so the task can observe the cancellation signal and shut down cleanly.
}
}
pub(crate) struct SubsessionManager {
children: Mutex<HashMap<ConversationId, Arc<ChildRecord>>>,
}
impl SubsessionManager {
pub(crate) fn new() -> Arc<Self> {
Arc::new(Self {
children: Mutex::new(HashMap::new()),
})
}
pub(crate) async fn spawn_child(
self: &Arc<Self>,
session: Arc<Session>,
turn: Arc<TurnContext>,
session_type: SessionType,
prompt: String,
) -> Result<ConversationId, SubsessionError> {
let profile = SubsessionProfile::for_session_type(session_type);
let parent_config = turn.client.get_config();
let child_config = build_child_config(parent_config.as_ref(), turn.as_ref(), &profile);
let model_name = child_config.model.clone();
let auth_manager = turn
.client
.get_auth_manager()
.ok_or(SubsessionError::MissingAuthManager)?;
let (cancel_tx, cancel_rx) = watch::channel(false);
let manager = Arc::clone(self);
let spawn_result = Codex::spawn(
child_config,
auth_manager,
InitialHistory::New,
SessionSource::Exec,
)
.await
.map_err(|err| SubsessionError::SpawnFailed {
message: format!("{err:#}"),
})?;
let conversation_id = spawn_result.conversation_id;
let codex = spawn_result.codex;
let record = Arc::new(ChildRecord::new(cancel_tx));
{
let mut guard = manager.children.lock().await;
guard.insert(conversation_id, Arc::clone(&record));
}
session
.notify_background_event(
"subsessions",
format!("spawned child session {conversation_id} with model {model_name}"),
)
.await;
let driver_session = Arc::clone(&session);
let driver_conversation_id = conversation_id;
let handle = tokio::spawn(async move {
let result = run_child_conversation(
driver_session.clone(),
codex,
driver_conversation_id,
prompt,
cancel_rx,
)
.await;
let status = match result {
Ok(value) => ChildStatus::Done(value),
Err(err) => match err {
SubsessionError::Cancelled { .. } => ChildStatus::Cancelled,
other => ChildStatus::Failed(other),
},
};
manager.finish_child(conversation_id, status).await;
});
record.set_handle(handle).await;
Ok(conversation_id)
}
pub(crate) async fn wait_child(
&self,
conversation_id: &ConversationId,
timeout: Option<Duration>,
) -> Result<Option<String>, SubsessionError> {
let record = self
.lookup(conversation_id)
.await
.ok_or_else(|| SubsessionError::unknown(conversation_id))?;
let mut status = record.status().await;
if matches!(status, ChildStatus::Pending) {
match timeout {
Some(duration) if duration == Duration::ZERO => {
return Err(SubsessionError::pending(conversation_id));
}
Some(duration) => {
let notified = tokio::time::timeout(duration, record.notify.notified()).await;
if notified.is_err() {
return Err(SubsessionError::timeout(
conversation_id,
duration.as_millis() as u64,
));
}
}
None => record.notify.notified().await,
}
status = record.status().await;
}
match status {
ChildStatus::Pending => Err(SubsessionError::pending(conversation_id)),
ChildStatus::Done(result) => Ok(result),
ChildStatus::Cancelled => Err(SubsessionError::cancelled(conversation_id)),
ChildStatus::Failed(err) => Err(err),
}
}
pub(crate) async fn cancel_child(
&self,
conversation_id: &ConversationId,
session: Arc<Session>,
) -> Result<bool, SubsessionError> {
let record = match self.lookup(conversation_id).await {
Some(rec) => rec,
None => return Err(SubsessionError::unknown(conversation_id)),
};
match record.status().await {
ChildStatus::Pending => {
record.send_cancel().await;
record.update(ChildStatus::Cancelled).await;
session
.notify_background_event(
"subsessions",
format!("child session {conversation_id} cancelled"),
)
.await;
Ok(true)
}
_ => Ok(false),
}
}
pub(crate) async fn abort_all_children(&self, session: Arc<Session>) {
if self.cancel_pending_children().await {
session
.notify_background_event("subsessions", "aborted all child sessions")
.await;
}
}
async fn finish_child(&self, conversation_id: ConversationId, status: ChildStatus) {
if let Some(record) = self.lookup(&conversation_id).await {
record.update(status).await;
} else {
warn!(%conversation_id, "dropping result for unknown child session");
}
}
async fn lookup(&self, conversation_id: &ConversationId) -> Option<Arc<ChildRecord>> {
let guard = self.children.lock().await;
guard.get(conversation_id).cloned()
}
async fn cancel_pending_children(&self) -> bool {
let children = {
let guard = self.children.lock().await;
guard.values().cloned().collect::<Vec<_>>()
};
let mut cancelled_any = false;
for child in children {
if matches!(child.status().await, ChildStatus::Pending) {
child.send_cancel().await;
child.update(ChildStatus::Cancelled).await;
cancelled_any = true;
}
}
cancelled_any
}
}
async fn run_child_conversation(
session: Arc<Session>,
mut codex: Codex,
conversation_id: ConversationId,
prompt: String,
mut cancel_rx: watch::Receiver<bool>,
) -> ChildResult {
let submit_input = Op::UserInput {
items: vec![InputItem::Text {
text: prompt.clone(),
}],
};
if let Err(err) = codex.submit(submit_input).await {
return Err(SubsessionError::SpawnFailed {
message: format!("failed to submit child input: {err:#}"),
});
}
let mut last_agent_message: Option<String> = None;
loop {
tokio::select! {
changed = cancel_rx.changed() => {
if changed.is_ok() && *cancel_rx.borrow() {
let _ = codex.submit(Op::Shutdown).await;
return Err(SubsessionError::cancelled(&conversation_id));
}
}
event = codex.next_event() => {
let event = event.map_err(|err| SubsessionError::SpawnFailed {
message: format!("child session stream error: {err:#}"),
})?;
match handle_child_event(
&session,
&mut codex,
&conversation_id,
event,
&mut last_agent_message,
)
.await? {
EventProgress::Continue => continue,
EventProgress::Completed => return Ok(last_agent_message.clone()),
}
}
}
}
}
enum EventProgress {
Continue,
Completed,
}
async fn handle_child_event(
session: &Arc<Session>,
codex: &mut Codex,
conversation_id: &ConversationId,
event: Event,
last_agent_message: &mut Option<String>,
) -> Result<EventProgress, SubsessionError> {
match event.msg {
EventMsg::AgentMessage(AgentMessageEvent { message }) => {
*last_agent_message = Some(message);
Ok(EventProgress::Continue)
}
EventMsg::TaskComplete(TaskCompleteEvent {
last_agent_message: msg,
}) => {
if msg.is_some() {
*last_agent_message = msg;
}
let _ = codex.submit(Op::Shutdown).await;
session
.notify_background_event(
"subsessions",
format!("child session {conversation_id} completed"),
)
.await;
Ok(EventProgress::Continue)
}
EventMsg::ShutdownComplete => Ok(EventProgress::Completed),
EventMsg::Error(err) => Err(SubsessionError::SpawnFailed {
message: format!("child session error: {}", err.message),
}),
_ => Ok(EventProgress::Continue),
}
}
fn build_child_config(
parent_config: &Config,
turn: &TurnContext,
profile: &SubsessionProfile,
) -> Config {
let mut config = parent_config.clone();
config.cwd = turn.cwd.clone();
config.approval_policy = AskForApproval::Never;
config.sandbox_policy = turn.sandbox_policy.clone();
config.shell_environment_policy = turn.shell_environment_policy.clone();
config.base_instructions = Some(profile.developer_instructions.to_string());
config.model = profile
.model_name
.map(std::string::ToString::to_string)
.unwrap_or_else(|| parent_config.model.clone());
if let Some(family) = find_family_for_model(&config.model) {
config.model_family = family;
}
config.model_reasoning_effort = turn.client.get_reasoning_effort();
config.model_reasoning_summary = turn.client.get_reasoning_summary();
config
}
#[cfg(test)]
mod tests {
use super::*;
use pretty_assertions::assert_eq;
use tokio::sync::oneshot;
use tokio::sync::watch;
use tokio::time::timeout;
#[tokio::test]
async fn cancel_pending_children_only_updates_pending_records() {
let manager = SubsessionManager::new();
let (pending_tx, _pending_rx) = watch::channel(false);
let pending_record = Arc::new(ChildRecord::new(pending_tx));
let pending_id = ConversationId::default();
{
let mut guard = manager.children.lock().await;
guard.insert(pending_id, Arc::clone(&pending_record));
}
let (done_tx, _done_rx) = watch::channel(false);
let done_record = Arc::new(ChildRecord::new(done_tx));
done_record
.update(ChildStatus::Done(Some("final".to_string())))
.await;
let done_id = ConversationId::new();
{
let mut guard = manager.children.lock().await;
guard.insert(done_id, Arc::clone(&done_record));
}
let cancelled = manager.cancel_pending_children().await;
assert!(cancelled, "pending record should be cancelled");
assert!(matches!(
pending_record.status().await,
ChildStatus::Cancelled
));
match done_record.status().await {
ChildStatus::Done(value) => assert_eq!(value.as_deref(), Some("final")),
status => panic!("expected done status, got {status:?}"),
}
}
#[tokio::test]
async fn send_cancel_allows_task_to_observe_shutdown() {
let (cancel_tx, mut cancel_rx) = watch::channel(false);
let record = Arc::new(ChildRecord::new(cancel_tx));
let (observed_tx, observed_rx) = oneshot::channel();
let handle = tokio::spawn(async move {
loop {
if *cancel_rx.borrow() {
let _ = observed_tx.send(());
break;
}
if cancel_rx.changed().await.is_err() {
break;
}
}
});
record.set_handle(handle).await;
record.send_cancel().await;
let observed = timeout(Duration::from_millis(200), observed_rx).await;
assert!(
observed.is_ok(),
"task should observe cancellation before the handle is dropped"
);
assert!(
observed.unwrap().is_ok(),
"task should report cancellation observation"
);
}
}

@@ -0,0 +1,7 @@
mod error;
mod manager;
mod profile;
pub(crate) use error::SubsessionError;
pub(crate) use manager::SubsessionManager;
pub(crate) use profile::SessionType;

@@ -0,0 +1,109 @@
use crate::config::GPT_5_CODEX_MEDIUM_MODEL;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SessionType {
Tester,
Mathematician,
LinterFixer,
Default,
}
impl SessionType {
pub fn as_str(self) -> &'static str {
match self {
SessionType::Tester => "tester",
SessionType::Mathematician => "mathematician",
SessionType::LinterFixer => "linter_fixer",
SessionType::Default => "default",
}
}
}
impl std::fmt::Display for SessionType {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.write_str((*self).as_str())
}
}
impl std::str::FromStr for SessionType {
type Err = ();
fn from_str(value: &str) -> Result<Self, Self::Err> {
match value {
"tester" => Ok(SessionType::Tester),
"mathematician" => Ok(SessionType::Mathematician),
"linter_fixer" => Ok(SessionType::LinterFixer),
"default" => Ok(SessionType::Default),
_ => Err(()),
}
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SubsessionProfile {
pub session_type: SessionType,
pub developer_instructions: &'static str,
pub model_name: Option<&'static str>,
}
impl SubsessionProfile {
pub fn for_session_type(session_type: SessionType) -> Self {
match session_type {
SessionType::Tester => Self {
session_type,
developer_instructions: TESTER_PROMPT,
model_name: Some(GPT_5_CODEX_MEDIUM_MODEL),
},
SessionType::Mathematician => Self {
session_type,
developer_instructions: MATHEMATICIAN_PROMPT,
model_name: Some(GPT_5_CODEX_MEDIUM_MODEL),
},
SessionType::LinterFixer => Self {
session_type,
developer_instructions: LINTER_FIXER_PROMPT,
model_name: Some(GPT_5_CODEX_MEDIUM_MODEL),
},
SessionType::Default => Self {
session_type,
developer_instructions: DEFAULT_PROMPT,
model_name: None,
},
}
}
}
const MAIN_PROMPT: &str = include_str!("../../gpt_5_codex_prompt.md");
const TESTER_PROMPT: &str = "\
You are a focused software testing assistant. Generate precise, minimal, and \
actionable tests that directly validate the described behavior. When clarifying \
requirements, ask only what is necessary.";
const MATHEMATICIAN_PROMPT: &str = "\
You are a detail-oriented mathematical reasoning assistant. Solve problems with \
clear derivations, keep intermediate notes concise, and prefer exact symbolic \
results when practical.";
const LINTER_FIXER_PROMPT: &str = include_str!("profiles/linter.md");
const DEFAULT_PROMPT: &str = "\
You are a compact subsession assistant. Provide direct, implementation-ready \
answers for the given request without rehashing unrelated project context.";
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn session_type_roundtrips() {
for ty in [
SessionType::Tester,
SessionType::Mathematician,
SessionType::LinterFixer,
SessionType::Default,
] {
let parsed: SessionType = ty.as_str().parse().expect("parse");
assert_eq!(parsed, ty);
}
}
}
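
Each profile pairs a session type with its own developer instructions, so a small check that the prompts stay distinct can catch a copy-paste slip in the match arms. The following is a minimal std-only sketch, not the crate's code: the prompt strings are placeholders, and `developer_instructions`, `ALL_SESSION_TYPES`, and `prompts_are_distinct` are hypothetical names introduced for illustration.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionType {
    Tester,
    Mathematician,
    LinterFixer,
    Default,
}

// Placeholder prompt texts (assumption): the real constants live in the module above.
const TESTER_PROMPT: &str = "tester instructions";
const MATHEMATICIAN_PROMPT: &str = "mathematician instructions";
const LINTER_FIXER_PROMPT: &str = "linter fixer instructions";
const DEFAULT_PROMPT: &str = "default instructions";

const ALL_SESSION_TYPES: [SessionType; 4] = [
    SessionType::Tester,
    SessionType::Mathematician,
    SessionType::LinterFixer,
    SessionType::Default,
];

/// Maps a session type to the prompt it should run with.
fn developer_instructions(session_type: SessionType) -> &'static str {
    match session_type {
        SessionType::Tester => TESTER_PROMPT,
        SessionType::Mathematician => MATHEMATICIAN_PROMPT,
        SessionType::LinterFixer => LINTER_FIXER_PROMPT,
        SessionType::Default => DEFAULT_PROMPT,
    }
}

/// Returns true when every session type maps to its own prompt text.
fn prompts_are_distinct() -> bool {
    let unique: HashSet<&str> = ALL_SESSION_TYPES
        .iter()
        .map(|ty| developer_instructions(*ty))
        .collect();
    unique.len() == ALL_SESSION_TYPES.len()
}
```

In the real module the same idea would be an extra `#[test]` next to `session_type_roundtrips`, comparing `SubsessionProfile::for_session_type(ty).developer_instructions` across all variants.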

@@ -0,0 +1,57 @@
You are Codex, based on GPT-5. You are running as a coding agent in the Codex CLI on a user's computer.
## General
- The arguments to `shell` will be passed to execvp(). Most terminal commands should be prefixed with ["bash", "-lc"].
- Always set the `workdir` param when using the shell function. Do not use `cd` unless absolutely necessary.
- When searching for text or files, prefer using `rg` or `rg --files` respectively because `rg` is much faster than alternatives like `grep`. (If the `rg` command is not found, then use alternatives.)
## Editing constraints
- Default to ASCII when editing or creating files. Only introduce non-ASCII or other Unicode characters when there is a clear justification and the file already uses them.
- Add succinct code comments that explain what is going on if code is not self-explanatory. You should not add comments like "Assigns the value to the variable", but a brief comment might be useful ahead of a complex code block that the user would otherwise have to spend time parsing out. Usage of these comments should be rare.
- Try to use apply_patch for single file edits, but it is fine to explore other options to make the edit if it does not work well. Do not use apply_patch for changes that are auto-generated (e.g. generating package.json or running a lint or format command like gofmt) or when scripting is more efficient (such as search and replacing a string across a codebase).
- You may be in a dirty git worktree.
* NEVER revert existing changes you did not make unless explicitly requested, since these changes were made by the user.
* If asked to make a commit or code edits and there are unrelated changes to your work or changes that you didn't make in those files, don't revert those changes.
* If the changes are in files you've touched recently, you should read carefully and understand how you can work with the changes rather than reverting them.
* If the changes are in unrelated files, just ignore them and don't revert them.
- While you are working, you might notice unexpected changes that you didn't make. If this happens, STOP IMMEDIATELY and ask the user how they would like to proceed.
- **NEVER** use destructive commands like `git reset --hard` or `git checkout --` unless specifically requested or approved by the user.
## Presenting your work and final message
You are producing plain text that will later be styled by the CLI. Follow these rules exactly. Formatting should make results easy to scan, but not feel mechanical. Use judgment to decide how much structure adds value.
- Default: be very concise; friendly coding teammate tone.
- Ask only when needed; suggest ideas; mirror the user's style.
- For substantial work, summarize clearly; follow final-answer formatting.
- Skip heavy formatting for simple confirmations.
- Don't dump large files you've written; reference paths only.
- No "save/copy this file" - User is on the same machine.
- Offer logical next steps (tests, commits, build) briefly; add verify steps if you couldn't do something.
- For code changes:
* Lead with a quick explanation of the change, and then give more details on the context covering where and why a change was made. Do not start this explanation with "summary", just jump right in.
* If there are natural next steps the user may want to take, suggest them at the end of your response. Do not make suggestions if there are no natural next steps.
* When suggesting multiple options, use numeric lists for the suggestions so the user can quickly respond with a single number.
- The user does not see command execution outputs. When asked to show the output of a command (e.g. `git show`), relay the important details in your answer or summarize the key lines so the user understands the result.
### Final answer structure and style guidelines
- Plain text; CLI handles styling. Use structure only when it helps scanability.
- Headers: optional; short Title Case (1-3 words) wrapped in **…**; no blank line before the first bullet; add only if they truly help.
- Bullets: use - ; merge related points; keep to one line when possible; 4-6 per list ordered by importance; keep phrasing consistent.
- Monospace: backticks for commands/paths/env vars/code ids and inline examples; use for literal keyword bullets; never combine with **.
- Code samples or multi-line snippets should be wrapped in fenced code blocks; include an info string as often as possible.
- Structure: group related bullets; order sections general → specific → supporting; for subsections, start with a bolded keyword bullet, then items; match complexity to the task.
- Tone: collaborative, concise, factual; present tense, active voice; self-contained; no "above/below"; parallel wording.
- Don'ts: no nested bullets/hierarchies; no ANSI codes; don't cram unrelated keywords; keep keyword lists short—wrap/reformat if long; avoid naming formatting styles in answers.
- Adaptation: code explanations → precise, structured with code refs; simple tasks → lead with outcome; big changes → logical walkthrough + rationale + next actions; casual one-offs → plain sentences, no headers/bullets.
- File References: When referencing files in your response, make sure to include the relevant start line and always follow the below rules:
* Use inline code to make file paths clickable.
* Each reference should have a standalone path, even if it's the same file.
* Accepted: absolute, workspace-relative, a/ or b/ diff prefixes, or bare filename/suffix.
* Line/column (1-based, optional): :line[:column] or #Lline[Ccolumn] (column defaults to 1).
* Do not use URIs like file://, vscode://, or https://.
* Do not provide a range of lines.
* Examples: src/app.ts, src/app.ts:42, b/server/index.js#L10, C:\repo\project\main.rs:12:5
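
The reference formats listed above can be read mechanically. The following is a minimal std-only sketch of how a consumer might parse them, assuming the position suffix is always the last thing in the string; `FileRef`, `parse_number`, `split_trailing_number`, and `parse_file_ref` are hypothetical names, not the CLI's actual parser.

```rust
/// A parsed file reference: the path plus an optional 1-based (line, column).
#[derive(Debug, PartialEq, Eq)]
struct FileRef {
    path: String,
    position: Option<(u32, u32)>,
}

/// Parses a non-empty run of ASCII digits.
fn parse_number(text: &str) -> Option<u32> {
    if text.is_empty() || !text.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    text.parse().ok()
}

/// Splits `text` at the last `sep` when the part after it is a number.
fn split_trailing_number<'a>(text: &'a str, sep: &str) -> Option<(&'a str, u32)> {
    let (head, tail) = text.rsplit_once(sep)?;
    Some((head, parse_number(tail)?))
}

fn parse_file_ref(reference: &str) -> FileRef {
    // `path#Lline[Ccolumn]` form; the column defaults to 1 when absent.
    if let Some((path, suffix)) = reference.rsplit_once("#L") {
        let (line_text, column_text) = match suffix.split_once('C') {
            Some((line, column)) => (line, Some(column)),
            None => (suffix, None),
        };
        let line = parse_number(line_text);
        let column = match column_text {
            Some(text) => parse_number(text),
            None => Some(1),
        };
        if let (Some(line), Some(column)) = (line, column) {
            return FileRef {
                path: path.to_string(),
                position: Some((line, column)),
            };
        }
    }
    // `path:line[:column]` form, read from the right so that a Windows
    // drive prefix such as `C:\` stays part of the path.
    if let Some((head, last)) = split_trailing_number(reference, ":") {
        let (path, position) = match split_trailing_number(head, ":") {
            Some((path, line)) => (path, (line, last)),
            None => (head, (last, 1)),
        };
        return FileRef {
            path: path.to_string(),
            position: Some(position),
        };
    }
    // Bare path without any position.
    FileRef {
        path: reference.to_string(),
        position: None,
    }
}
```

Reading the colon form from the right is a design choice: it lets `C:\repo\project\main.rs:12:5` resolve to line 12, column 5 without special-casing drive letters.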

@@ -6,6 +6,7 @@ mod mcp;
mod plan;
mod read_file;
mod shell;
mod subsessions;
mod test_sync;
mod unified_exec;
mod view_image;
@@ -20,6 +21,7 @@ pub use mcp::McpHandler;
pub use plan::PlanHandler;
pub use read_file::ReadFileHandler;
pub use shell::ShellHandler;
pub use subsessions::SubsessionsHandler;
pub use test_sync::TestSyncHandler;
pub use unified_exec::UnifiedExecHandler;
pub use view_image::ViewImageHandler;

@@ -0,0 +1,181 @@
use std::sync::Arc;
use std::time::Duration;
use async_trait::async_trait;
use codex_protocol::ConversationId;
use serde::Deserialize;
use crate::function_tool::FunctionCallError;
use crate::subsessions::SessionType;
use crate::subsessions::SubsessionError;
use crate::tools::context::ToolInvocation;
use crate::tools::context::ToolOutput;
use crate::tools::context::ToolPayload;
use crate::tools::registry::ToolHandler;
use crate::tools::registry::ToolKind;
pub struct SubsessionsHandler;
#[derive(Deserialize)]
struct CreateSessionArgs {
session_type: String,
prompt: String,
}
#[derive(Deserialize)]
struct WaitSessionArgs {
session_id: String,
timeout_ms: Option<i32>,
}
#[derive(Deserialize)]
struct CancelSessionArgs {
session_id: String,
}
#[async_trait]
impl ToolHandler for SubsessionsHandler {
fn kind(&self) -> ToolKind {
ToolKind::Function
}
async fn handle(&self, invocation: ToolInvocation) -> Result<ToolOutput, FunctionCallError> {
let ToolInvocation {
session,
turn,
tool_name,
payload,
..
} = invocation;
let arguments = match payload {
ToolPayload::Function { arguments } => arguments,
_ => {
return Err(FunctionCallError::RespondToModel(format!(
"{tool_name} received unsupported payload"
)));
}
};
match tool_name.as_str() {
"create_session" => {
let args: CreateSessionArgs = parse_args(&arguments)?;
handle_create_session(session, turn, args).await
}
"wait_session" => {
let args: WaitSessionArgs = parse_args(&arguments)?;
handle_wait_session(session, args).await
}
"cancel_session" => {
let args: CancelSessionArgs = parse_args(&arguments)?;
handle_cancel_session(session, args).await
}
_ => Err(FunctionCallError::RespondToModel(format!(
"unsupported subsession tool {tool_name}"
))),
}
}
}
fn parse_args<T: for<'de> Deserialize<'de>>(arguments: &str) -> Result<T, FunctionCallError> {
serde_json::from_str(arguments).map_err(|err| {
FunctionCallError::RespondToModel(format!("failed to parse arguments: {err}"))
})
}
async fn handle_create_session(
session: Arc<crate::codex::Session>,
turn: Arc<crate::codex::TurnContext>,
args: CreateSessionArgs,
) -> Result<ToolOutput, FunctionCallError> {
let session_type = args.session_type.parse::<SessionType>().map_err(|_| {
FunctionCallError::RespondToModel(format!("unknown session_type {}", args.session_type))
})?;
let subsessions = Arc::clone(&session.services.subsessions);
let conversation_id = subsessions
.spawn_child(session, turn, session_type, args.prompt)
.await
.map_err(map_subsession_err)?;
let payload = serde_json::json!({ "session_id": conversation_id.to_string() });
Ok(ToolOutput::Function {
content: payload.to_string(),
success: Some(true),
})
}
async fn handle_wait_session(
session: Arc<crate::codex::Session>,
args: WaitSessionArgs,
) -> Result<ToolOutput, FunctionCallError> {
let conversation_id = parse_session_id(&args.session_id)?;
let subsessions = Arc::clone(&session.services.subsessions);
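// A missing timeout stays `None`; non-positive values become a zero
// duration so the call only checks status without waiting.
let timeout = args.timeout_ms.map(|value| {
if value > 0 {
Duration::from_millis(value as u64)
} else {
Duration::ZERO
}
});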
let result = subsessions
.wait_child(&conversation_id, timeout)
.await
.map_err(map_subsession_err)?;
let payload = serde_json::json!({ "result": result });
Ok(ToolOutput::Function {
content: payload.to_string(),
success: Some(true),
})
}
async fn handle_cancel_session(
session: Arc<crate::codex::Session>,
args: CancelSessionArgs,
) -> Result<ToolOutput, FunctionCallError> {
let conversation_id = parse_session_id(&args.session_id)?;
let subsessions = Arc::clone(&session.services.subsessions);
let cancelled = subsessions
.cancel_child(&conversation_id, Arc::clone(&session))
.await
.map_err(map_subsession_err)?;
let payload = serde_json::json!({ "cancelled": cancelled });
Ok(ToolOutput::Function {
content: payload.to_string(),
success: Some(true),
})
}
fn parse_session_id(value: &str) -> Result<ConversationId, FunctionCallError> {
ConversationId::from_string(value).map_err(|err| {
FunctionCallError::RespondToModel(format!("invalid session_id {value}: {err}"))
})
}
fn map_subsession_err(err: SubsessionError) -> FunctionCallError {
FunctionCallError::RespondToModel(err.to_string())
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn rejects_invalid_session_id() {
let err = parse_session_id("not-a-uuid").expect_err("invalid id");
let FunctionCallError::RespondToModel(message) = err else {
panic!("expected respond error");
};
assert!(message.contains("invalid session_id"));
}
#[test]
fn parses_valid_session_id() {
let id = ConversationId::default();
let parsed = parse_session_id(&id.to_string()).expect("parse id");
assert_eq!(parsed.to_string(), id.to_string());
}
}
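
The timeout handling in `handle_wait_session` distinguishes three cases: no `timeout_ms` at all, a non-positive value, and a positive value. The following std-only sketch isolates that conversion so it can be tested on its own. `timeout_from_ms` is a hypothetical helper name, and what `wait_child` does with `None` (presumably waiting without a deadline) is an assumption, since that function is not part of this diff.

```rust
use std::time::Duration;

/// Converts an optional millisecond count into an optional wait duration.
fn timeout_from_ms(timeout_ms: Option<i32>) -> Option<Duration> {
    timeout_ms.map(|ms| {
        // `u64::try_from` fails for negative values, which fall back to zero.
        u64::try_from(ms)
            .map(Duration::from_millis)
            .unwrap_or(Duration::ZERO)
    })
}
```

Using `u64::try_from` instead of an `as` cast keeps the sign handling explicit, while the observable behavior matches the `if value > 0` branch in the handler.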

@@ -13,6 +13,8 @@ use serde_json::json;
use std::collections::BTreeMap;
use std::collections::HashMap;
const SUBSESSION_TOOL_FLAG: &str = "subsession";
#[derive(Debug, Clone)]
pub enum ConfigShellToolType {
Default,
@@ -258,6 +260,88 @@ fn create_view_image_tool() -> ToolSpec {
})
}
fn create_create_session_tool() -> ToolSpec {
let mut properties = BTreeMap::new();
properties.insert(
"session_type".to_string(),
JsonSchema::String {
description: Some(
"Session profile to use (tester, mathematician, linter_fixer, default)".to_string(),
),
},
);
properties.insert(
"prompt".to_string(),
JsonSchema::String {
description: Some("Initial user prompt for the child session".to_string()),
},
);
ToolSpec::Function(ResponsesApiTool {
name: "create_session".to_string(),
description: "Spawn a child Codex session with the requested profile.".to_string(),
strict: false,
parameters: JsonSchema::Object {
properties,
required: Some(vec!["session_type".to_string(), "prompt".to_string()]),
additional_properties: Some(false.into()),
},
})
}
fn create_wait_session_tool() -> ToolSpec {
let mut properties = BTreeMap::new();
properties.insert(
"session_id".to_string(),
JsonSchema::String {
description: Some(
"Identifier returned from create_session identifying the child session".to_string(),
),
},
);
properties.insert(
"timeout_ms".to_string(),
JsonSchema::Number {
description: Some(
"Milliseconds to wait; values <= 0 check status without waiting.".to_string(),
),
},
);
ToolSpec::Function(ResponsesApiTool {
name: "wait_session".to_string(),
description: "Wait for a child session to finish and return its final assistant message."
.to_string(),
strict: false,
parameters: JsonSchema::Object {
properties,
required: Some(vec!["session_id".to_string()]),
additional_properties: Some(false.into()),
},
})
}
fn create_cancel_session_tool() -> ToolSpec {
let mut properties = BTreeMap::new();
properties.insert(
"session_id".to_string(),
JsonSchema::String {
description: Some("Identifier for the child session to cancel".to_string()),
},
);
ToolSpec::Function(ResponsesApiTool {
name: "cancel_session".to_string(),
description: "Cancel a pending child session if it is still running.".to_string(),
strict: false,
parameters: JsonSchema::Object {
properties,
required: Some(vec!["session_id".to_string()]),
additional_properties: Some(false.into()),
},
})
}
fn create_test_sync_tool() -> ToolSpec {
let mut properties = BTreeMap::new();
properties.insert(
@@ -680,6 +764,10 @@ pub(crate) fn build_specs(
let apply_patch_handler = Arc::new(ApplyPatchHandler);
let view_image_handler = Arc::new(ViewImageHandler);
let mcp_handler = Arc::new(McpHandler);
let has_subsession_tool = config
.experimental_supported_tools
.iter()
.any(|tool| tool == SUBSESSION_TOOL_FLAG);
if config.experimental_unified_exec_tool {
builder.push_spec(create_unified_exec_tool());
@@ -710,6 +798,17 @@ pub(crate) fn build_specs(
builder.register_handler("container.exec", shell_handler.clone());
builder.register_handler("local_shell", shell_handler);
if has_subsession_tool {
use crate::tools::handlers::SubsessionsHandler;
let subsessions_handler = Arc::new(SubsessionsHandler);
builder.push_spec(create_create_session_tool());
builder.push_spec(create_wait_session_tool());
builder.push_spec(create_cancel_session_tool());
builder.register_handler("create_session", subsessions_handler.clone());
builder.register_handler("wait_session", subsessions_handler.clone());
builder.register_handler("cancel_session", subsessions_handler);
}
if config.plan_tool {
builder.push_spec(PLAN_TOOL.clone());
builder.register_handler("update_plan", plan_handler);
@@ -858,7 +957,15 @@ mod tests {
assert_eq_tool_names(
&tools,
&["unified_exec", "update_plan", "web_search", "view_image"],
&[
"unified_exec",
"create_session",
"wait_session",
"cancel_session",
"update_plan",
"web_search",
"view_image",
],
);
}
@@ -935,6 +1042,11 @@ mod tests {
.any(|tool| tool_name(&tool.spec) == "grep_files")
);
assert!(tools.iter().any(|tool| tool_name(&tool.spec) == "list_dir"));
assert!(
tools
.iter()
.any(|tool| tool_name(&tool.spec) == "create_session")
);
}
#[test]
@@ -998,8 +1110,9 @@ mod tests {
],
);
let tool = find_tool(&tools, "test_server/do_something_cool");
assert_eq!(
tools[3].spec,
tool.spec,
ToolSpec::Function(ResponsesApiTool {
name: "test_server/do_something_cool".to_string(),
parameters: JsonSchema::Object {
@@ -1167,8 +1280,9 @@ mod tests {
],
);
let tool = find_tool(&tools, "dash/search");
assert_eq!(
tools[4].spec,
tool.spec,
ToolSpec::Function(ResponsesApiTool {
name: "dash/search".to_string(),
parameters: JsonSchema::Object {
@@ -1233,8 +1347,9 @@ mod tests {
"dash/paginate",
],
);
let tool = find_tool(&tools, "dash/paginate");
assert_eq!(
tools[4].spec,
tool.spec,
ToolSpec::Function(ResponsesApiTool {
name: "dash/paginate".to_string(),
parameters: JsonSchema::Object {
@@ -1297,8 +1412,9 @@ mod tests {
"dash/tags",
],
);
let tool = find_tool(&tools, "dash/tags");
assert_eq!(
tools[4].spec,
tool.spec,
ToolSpec::Function(ResponsesApiTool {
name: "dash/tags".to_string(),
parameters: JsonSchema::Object {
@@ -1364,8 +1480,9 @@ mod tests {
"dash/value",
],
);
let tool = find_tool(&tools, "dash/value");
assert_eq!(
tools[4].spec,
tool.spec,
ToolSpec::Function(ResponsesApiTool {
name: "dash/value".to_string(),
parameters: JsonSchema::Object {
@@ -1469,8 +1586,9 @@ mod tests {
],
);
let tool = find_tool(&tools, "test_server/do_something_cool");
assert_eq!(
tools[4].spec,
tool.spec,
ToolSpec::Function(ResponsesApiTool {
name: "test_server/do_something_cool".to_string(),
parameters: JsonSchema::Object {
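
Two ideas run through this file's changes: the subsession trio is registered only when the `subsession` flag appears in `experimental_supported_tools`, and the tests now look tools up by name instead of by index, so inserting new tools does not shift the assertions. The following is a minimal std-only sketch of both, under simplifying assumptions: `Tool` and `build_tools` are hypothetical stand-ins for the real spec types and `build_specs`, and this `find_tool` returns an `Option`, which may differ from the test helper used in the diff.

```rust
/// Hypothetical stand-in for the real tool spec type; only the name matters here.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Tool {
    name: String,
}

const SUBSESSION_TOOL_FLAG: &str = "subsession";

/// Builds the tool list, adding the subsession trio only when the flag is present.
fn build_tools(experimental_supported_tools: &[&str], plan_tool: bool) -> Vec<Tool> {
    let mut names = vec!["shell"];
    if experimental_supported_tools
        .iter()
        .any(|tool| *tool == SUBSESSION_TOOL_FLAG)
    {
        names.extend(["create_session", "wait_session", "cancel_session"]);
    }
    if plan_tool {
        names.push("update_plan");
    }
    names
        .into_iter()
        .map(|name| Tool {
            name: name.to_string(),
        })
        .collect()
}

/// Looks a tool up by name so assertions do not depend on list position.
fn find_tool<'a>(tools: &'a [Tool], name: &str) -> Option<&'a Tool> {
    tools.iter().find(|tool| tool.name == name)
}
```

With index-based assertions such as `tools[4].spec`, adding three tools in the middle of the list would require renumbering every test; the name-based lookup avoids that.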

@@ -25,6 +25,7 @@ mod seatbelt;
mod shell_serialization;
mod stream_error_allows_next_turn;
mod stream_no_completed;
mod subsessions;
mod tool_harness;
mod tool_parallelism;
mod tools;

@@ -93,7 +93,12 @@ async fn model_selects_expected_tools() {
let codex_tools = collect_tool_identifiers_for_model("codex-mini-latest").await;
assert_eq!(
codex_tools,
vec!["local_shell".to_string()],
vec![
"local_shell".to_string(),
"create_session".to_string(),
"wait_session".to_string(),
"cancel_session".to_string(),
],
"codex-mini-latest should expose the local shell tool",
);
@@ -107,7 +112,21 @@ async fn model_selects_expected_tools() {
let gpt5_codex_tools = collect_tool_identifiers_for_model("gpt-5-codex").await;
assert_eq!(
gpt5_codex_tools,
vec!["shell".to_string(), "apply_patch".to_string(),],
vec!["shell".to_string(), "apply_patch".to_string()],
"gpt-5-codex should expose the apply_patch tool",
);
let test_codex_tools = collect_tool_identifiers_for_model("test-gpt-5-codex").await;
assert_eq!(
test_codex_tools,
vec![
"shell".to_string(),
"create_session".to_string(),
"wait_session".to_string(),
"cancel_session".to_string(),
"read_file".to_string(),
"test_sync_tool".to_string(),
],
"test-gpt-5-codex should expose subsession helpers along with test utilities",
);
}

@@ -0,0 +1,298 @@
use anyhow::Result;
use codex_core::model_family::find_family_for_model;
use codex_core::protocol::AskForApproval;
use codex_core::protocol::EventMsg;
use codex_core::protocol::InputItem;
use codex_core::protocol::Op;
use codex_core::protocol::SandboxPolicy;
use codex_protocol::config_types::ReasoningSummary;
use core_test_support::responses;
use core_test_support::responses::ev_assistant_message;
use core_test_support::responses::ev_completed;
use core_test_support::responses::ev_response_created;
use core_test_support::responses::sse;
use core_test_support::responses::start_mock_server;
use core_test_support::skip_if_no_network;
use core_test_support::test_codex::TestCodex;
use core_test_support::test_codex::test_codex;
use core_test_support::wait_for_event;
use serde_json::Value;
use tokio::test;
use wiremock::matchers::any;
use wiremock::matchers::body_string_contains;
#[allow(clippy::expect_used)]
async fn collect_tool_names(model: &str) -> Result<Vec<String>> {
let server = start_mock_server().await;
let model_owned = model.to_string();
let TestCodex {
codex,
cwd,
session_configured,
..
} = test_codex()
.with_config(move |config| {
config.model = model_owned.clone();
config.model_family =
find_family_for_model(&model_owned).expect("model family available for test");
})
.build(&server)
.await?;
let response = sse(vec![
ev_response_created("resp-1"),
ev_assistant_message("msg-1", "done"),
ev_completed("resp-1"),
]);
responses::mount_sse_once_match(&server, any(), response).await;
codex
.submit(Op::UserTurn {
items: vec![InputItem::Text {
text: "ping".into(),
}],
final_output_json_schema: None,
cwd: cwd.path().to_path_buf(),
approval_policy: AskForApproval::Never,
sandbox_policy: SandboxPolicy::DangerFullAccess,
model: session_configured.model.clone(),
effort: None,
summary: ReasoningSummary::Auto,
})
.await?;
wait_for_event(&codex, |event| matches!(event, EventMsg::TaskComplete(_))).await;
let requests = server.received_requests().await.expect("recorded requests");
let first_body = requests
.first()
.ok_or_else(|| anyhow::anyhow!("expected at least one request"))?
.body_json::<Value>()?;
let tool_names = first_body
.get("tools")
.and_then(|tools| tools.as_array())
.map(|entries| {
entries
.iter()
.filter_map(|entry| entry.get("name").and_then(Value::as_str))
.map(str::to_string)
.collect::<Vec<_>>()
})
.unwrap_or_default();
Ok(tool_names)
}
#[test(flavor = "multi_thread", worker_threads = 2)]
async fn codex_models_expose_subsession_tools() -> Result<()> {
skip_if_no_network!(Ok(()));
let tools = collect_tool_names("codex-mini-latest").await?;
assert!(
tools.contains(&"create_session".to_string())
&& tools.contains(&"wait_session".to_string())
&& tools.contains(&"cancel_session".to_string()),
"expected subsession tool trio in {tools:?}"
);
Ok(())
}
#[test(flavor = "multi_thread", worker_threads = 2)]
async fn test_models_expose_subsession_tools() -> Result<()> {
skip_if_no_network!(Ok(()));
let tools = collect_tool_names("test-gpt-5-codex").await?;
assert!(
tools.contains(&"create_session".to_string())
&& tools.contains(&"wait_session".to_string())
&& tools.contains(&"cancel_session".to_string()),
"expected subsession tool trio in {tools:?}"
);
Ok(())
}
#[test(flavor = "multi_thread", worker_threads = 2)]
async fn gpt5_codex_models_do_not_expose_subsession_tools() -> Result<()> {
skip_if_no_network!(Ok(()));
let tools = collect_tool_names("gpt-5-codex").await?;
assert!(
!tools.contains(&"create_session".to_string()),
"unexpected subsession tools in {tools:?}"
);
Ok(())
}
#[test(flavor = "multi_thread", worker_threads = 2)]
async fn subsession_can_apply_patch_to_workspace() -> Result<()> {
skip_if_no_network!(Ok(()));
let server = start_mock_server().await;
let mut builder = test_codex().with_config(|config| {
// Ensure subsession tools are exposed and apply_patch is available in child.
config.model = "test-gpt-5-codex".to_string();
config.model_family =
find_family_for_model("test-gpt-5-codex").expect("model family available for test");
config.include_apply_patch_tool = true;
});
let TestCodex {
codex,
cwd,
session_configured,
..
} = builder.build(&server).await?;
// Parent turn 1: ask to spawn a subsession to create a file.
// The parent model will call create_session with a prompt instructing the child.
let parent_first = sse(vec![
ev_response_created("resp-parent-1"),
responses::ev_function_call(
"create-session-1",
"create_session",
&serde_json::json!({
"session_type": "default",
"prompt": "Create a file named subsession.txt with the exact contents 'Hello from subsession'",
})
.to_string(),
),
ev_completed("resp-parent-1"),
]);
responses::mount_sse_once_match(&server, any(), parent_first).await;
// Child turn 1: upon spawn, the child will call apply_patch to create the file.
// Match on the subsession instructions to route this to the child conversation.
let child_first = sse(vec![
ev_response_created("resp-child-1"),
responses::ev_apply_patch_function_call(
"apply-patch-child-1",
r#"*** Begin Patch
*** Add File: subsession.txt
+Hello from subsession
*** End Patch"#,
),
ev_completed("resp-child-1"),
]);
responses::mount_sse_once_match(
&server,
body_string_contains("You are a compact subsession assistant"),
child_first,
)
.await;
// Parent follow-up: after the tool result is returned, the parent may send a
// subsequent request. Provide a simple assistant message to close the turn.
let parent_second = sse(vec![
ev_assistant_message("msg-parent-2", "subsession started"),
ev_completed("resp-parent-2"),
]);
responses::mount_sse_once_match(
&server,
body_string_contains("\"function_call_output\""),
parent_second,
)
.await;
// Child follow-up: after apply_patch executes, the child continues and then finishes.
let child_second = sse(vec![
ev_assistant_message("msg-child-2", "done"),
ev_completed("resp-child-2"),
]);
responses::mount_sse_once_match(
&server,
body_string_contains("You are a compact subsession assistant"),
child_second,
)
.await;
// Kick off the parent turn which should spawn the subsession.
let session_model = session_configured.model.clone();
codex
.submit(Op::UserTurn {
items: vec![InputItem::Text {
text: "please spawn a subsession to create a file".into(),
}],
final_output_json_schema: None,
cwd: cwd.path().to_path_buf(),
approval_policy: AskForApproval::Never,
sandbox_policy: SandboxPolicy::DangerFullAccess,
model: session_model,
effort: None,
summary: ReasoningSummary::Auto,
})
.await?;
// Capture the child session id from the background event.
let mut child_id: Option<String> = None;
wait_for_event(&codex, |event| match event {
EventMsg::BackgroundEvent(ev) => {
if let Some((_, after)) = ev.message.split_once("spawned child session ")
&& let Some((id, _)) = after.split_once(' ')
{
child_id = Some(id.to_string());
return true;
}
false
}
_ => false,
})
.await;
// Wait for the child to complete and emit its final background event.
// This makes the file write deterministic before we assert.
let _ = wait_for_event(&codex, |event| match event {
EventMsg::BackgroundEvent(ev) => {
if let Some(id) = child_id.as_deref() {
return ev
.message
.contains(&format!("child session {id} completed"));
}
false
}
_ => false,
})
.await;
// Debug: inspect recorded requests to confirm routing during failures.
if let Some(requests) = server.received_requests().await {
for (i, req) in requests.iter().enumerate() {
if let Ok(body) = req.body_json::<Value>() {
let instr = body
.get("instructions")
.and_then(Value::as_str)
.unwrap_or("");
let has_apply = body
.get("tools")
.and_then(Value::as_array)
.map(|a| {
a.iter().any(|t| {
t.get("function")
.and_then(|f| f.get("name"))
.and_then(Value::as_str)
== Some("apply_patch")
|| t.get("name").and_then(Value::as_str) == Some("apply_patch")
})
})
.unwrap_or(false);
let has_fn_output = body
.get("input")
.and_then(Value::as_array)
.map(|a| {
a.iter().any(|it| {
it.get("type").and_then(Value::as_str) == Some("function_call_output")
})
})
.unwrap_or(false);
eprintln!(
"req#{i}: instr_has_subsession_prompt={} tools_include_apply_patch={} has_fn_output={}",
instr.contains("compact subsession assistant"),
has_apply,
has_fn_output,
);
}
}
}
// Verify the file created by the subsession exists with the expected contents.
let created_path = cwd.path().join("subsession.txt");
let contents = std::fs::read_to_string(&created_path)
.unwrap_or_else(|e| panic!("failed reading {}: {e}", created_path.display()));
assert_eq!(contents, "Hello from subsession\n");
Ok(())
}
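
The test above recovers the child session id by splitting background event messages. The following std-only sketch pulls that string handling into two small helpers. The message shapes ("spawned child session <id> ..." and "child session <id> completed") are taken from the substrings the test matches on; anything beyond those substrings is an assumption, and `extract_child_id` and `is_completion_for` are hypothetical names.

```rust
/// Extracts the id that follows "spawned child session " in a background message.
fn extract_child_id(message: &str) -> Option<&str> {
    let (_, after) = message.split_once("spawned child session ")?;
    // Take the next whitespace-delimited token, whether or not more text follows.
    after.split_whitespace().next()
}

/// Reports whether a message announces completion of the given child.
fn is_completion_for(message: &str, child_id: &str) -> bool {
    message.contains(&format!("child session {child_id} completed"))
}
```

Unlike `split_once(' ')` in the test, `split_whitespace().next()` also accepts a message that ends right after the id, which makes the helper slightly more tolerant of wording changes.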