Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786)

## Summary
- Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed
in core, emitting plan deltas plus a plan `ThreadItem`, while stripping
tags from normal assistant output.
- Persist plan items and rebuild them on resume so proposed plans show
in thread history.
- Wire plan items/deltas through app-server protocol v2 and render a
dedicated proposed-plan view in the TUI, including the “Implement this
plan?” prompt only when a plan item is present.

## Changes

### Core (`codex-rs/core`)
- Added a generic, line-based tag parser that buffers each line until it
can disprove a tag prefix; implements auto-close on `finish()` for
unterminated tags. `codex-rs/core/src/tagged_block_parser.rs`
- Refactored proposed plan parsing to wrap the generic parser.
`codex-rs/core/src/proposed_plan_parser.rs`
- In plan mode, stream assistant deltas as:
  - **Normal text** → `AgentMessageContentDelta`
  - **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion  
  (`codex-rs/core/src/codex.rs`)
- Final plan item content is derived from the completed assistant
message (authoritative), not necessarily the concatenated deltas.
- Strips `<proposed_plan>` blocks from assistant text in plan mode so
tags don’t appear in normal messages.
(`codex-rs/core/src/stream_events_utils.rs`)
- Persist `ItemCompleted` events only for plan items for rollout replay.
(`codex-rs/core/src/rollout/policy.rs`)
- Guard `update_plan` tool in Plan Mode with a clear error message.
(`codex-rs/core/src/tools/handlers/plan.rs`)
- Updated Plan Mode prompt to:  
  - keep `<proposed_plan>` out of non-final reasoning/preambles  
  - require exact tag formatting  
  - allow only one `<proposed_plan>` block per turn  
  (`codex-rs/core/templates/collaboration_mode/plan.md`)

### Protocol / App-server protocol
- Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items.
(`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`)
- Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with
EXPERIMENTAL markers and note that deltas may not match the final plan
item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`)
- Added plan delta route in app-server protocol common mapping.
(`codex-rs/app-server-protocol/src/protocol/common.rs`)
- Rebuild plan items from persisted `ItemCompleted` events on resume.
(`codex-rs/app-server-protocol/src/protocol/thread_history.rs`)

### App-server
- Forward plan deltas to v2 clients and map core plan items to v2 plan
items. (`codex-rs/app-server/src/bespoke_event_handling.rs`,
`codex-rs/app-server/src/codex_message_processor.rs`)
- Added v2 plan item tests.
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`)

### TUI
- Added a dedicated proposed plan history cell with special background
and padding, and moved “• Proposed Plan” outside the highlighted block.
(`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`)
- Only show “Implement this plan?” when a plan item exists.
(`codex-rs/tui/src/chatwidget.rs`,
`codex-rs/tui/src/chatwidget/tests.rs`)

<img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM"
src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286"
/>

### Docs / Misc
- Updated protocol docs to mention plan deltas.
(`codex-rs/docs/protocol_v1.md`)
- Minor plumbing updates in exec/debug clients to tolerate plan deltas.
(`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`)

## Tests
- Added core integration tests:
  - Plan mode strips plan from agent messages.
  - Missing `</proposed_plan>` closes at end-of-message.  
  (`codex-rs/core/tests/suite/items.rs`)
- Added unit tests for generic tag parser (prefix buffering, non-tag
lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`)
- Existing app-server plan item tests in v2.
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`)

## Notes / Behavior
- Plan output no longer appears in standard assistant text in Plan Mode;
it streams via `PlanDelta` and completes as a `TurnItem::Plan`.
- The final plan item content is authoritative and may diverge from
streamed deltas (documented as experimental).
- Reasoning summaries are not filtered; prompt instructs the model not
to include `<proposed_plan>` outside the final plan message.

## Codex Author
`codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
This commit is contained in:
Charley Cunningham
2026-01-30 10:59:30 -08:00
committed by GitHub
parent 40bf11bd52
commit ec4a2d07e4
36 changed files with 2021 additions and 42 deletions

View File

@@ -30,6 +30,7 @@ use crate::rollout::session_index;
use crate::stream_events_utils::HandleOutputCtx;
use crate::stream_events_utils::handle_non_tool_response_item;
use crate::stream_events_utils::handle_output_item_done;
use crate::stream_events_utils::last_assistant_message_from_item;
use crate::terminal;
use crate::transport_manager::TransportManager;
use crate::truncate::TruncationPolicy;
@@ -44,6 +45,7 @@ use codex_protocol::config_types::Settings;
use codex_protocol::config_types::WebSearchMode;
use codex_protocol::dynamic_tools::DynamicToolResponse;
use codex_protocol::dynamic_tools::DynamicToolSpec;
use codex_protocol::items::PlanItem;
use codex_protocol::items::TurnItem;
use codex_protocol::items::UserMessageItem;
use codex_protocol::models::BaseInstructions;
@@ -127,6 +129,9 @@ use crate::mentions::collect_explicit_app_paths;
use crate::mentions::collect_tool_mentions_from_messages;
use crate::model_provider_info::CHAT_WIRE_API_DEPRECATION_SUMMARY;
use crate::project_doc::get_user_instructions;
use crate::proposed_plan_parser::ProposedPlanParser;
use crate::proposed_plan_parser::ProposedPlanSegment;
use crate::proposed_plan_parser::extract_proposed_plan_text;
use crate::protocol::AgentMessageContentDeltaEvent;
use crate::protocol::AgentReasoningSectionBreakEvent;
use crate::protocol::ApplyPatchApprovalRequestEvent;
@@ -139,6 +144,7 @@ use crate::protocol::EventMsg;
use crate::protocol::ExecApprovalRequestEvent;
use crate::protocol::McpServerRefreshConfig;
use crate::protocol::Op;
use crate::protocol::PlanDeltaEvent;
use crate::protocol::RateLimitSnapshot;
use crate::protocol::ReasoningContentDeltaEvent;
use crate::protocol::ReasoningRawContentDeltaEvent;
@@ -482,6 +488,7 @@ pub(crate) struct TurnContext {
pub(crate) developer_instructions: Option<String>,
pub(crate) compact_prompt: Option<String>,
pub(crate) user_instructions: Option<String>,
pub(crate) collaboration_mode_kind: ModeKind,
pub(crate) personality: Option<Personality>,
pub(crate) approval_policy: AskForApproval,
pub(crate) sandbox_policy: SandboxPolicy,
@@ -682,6 +689,7 @@ impl Session {
developer_instructions: session_configuration.developer_instructions.clone(),
compact_prompt: session_configuration.compact_prompt.clone(),
user_instructions: session_configuration.user_instructions.clone(),
collaboration_mode_kind: session_configuration.collaboration_mode.mode,
personality: session_configuration.personality,
approval_policy: session_configuration.approval_policy.value(),
sandbox_policy: session_configuration.sandbox_policy.get().clone(),
@@ -3196,6 +3204,7 @@ async fn spawn_review_thread(
developer_instructions: None,
user_instructions: None,
compact_prompt: parent_turn_context.compact_prompt.clone(),
collaboration_mode_kind: parent_turn_context.collaboration_mode_kind,
personality: parent_turn_context.personality,
approval_policy: parent_turn_context.approval_policy,
sandbox_policy: parent_turn_context.sandbox_policy.clone(),
@@ -3310,6 +3319,7 @@ pub(crate) async fn run_turn(
let total_usage_tokens = sess.get_total_token_usage().await;
let event = EventMsg::TurnStarted(TurnStartedEvent {
model_context_window: turn_context.client.get_model_context_window(),
collaboration_mode_kind: turn_context.collaboration_mode_kind,
});
sess.send_event(&turn_context, event).await;
if total_usage_tokens >= auto_compact_limit {
@@ -3759,6 +3769,381 @@ struct SamplingRequestResult {
last_agent_message: Option<String>,
}
/// Ephemeral per-response state for streaming a single proposed plan.
/// This is intentionally not persisted or stored in session/state since it
/// only exists while a response is actively streaming. The final plan text
/// is extracted from the completed assistant message.
/// Tracks a single proposed plan item across a streaming response.
struct ProposedPlanItemState {
item_id: String,
started: bool,
completed: bool,
}
/// Per-item plan parsers so we can buffer text while detecting `<proposed_plan>`
/// tags without ever mixing buffered lines across item ids.
struct PlanParsers {
assistant: HashMap<String, ProposedPlanParser>,
}
impl PlanParsers {
fn new() -> Self {
Self {
assistant: HashMap::new(),
}
}
fn assistant_parser_mut(&mut self, item_id: &str) -> &mut ProposedPlanParser {
self.assistant
.entry(item_id.to_string())
.or_insert_with(ProposedPlanParser::new)
}
fn take_assistant_parser(&mut self, item_id: &str) -> Option<ProposedPlanParser> {
self.assistant.remove(item_id)
}
fn drain_assistant_parsers(&mut self) -> Vec<(String, ProposedPlanParser)> {
self.assistant.drain().collect()
}
}
/// Aggregated state used only while streaming a plan-mode response.
/// Includes per-item parsers, deferred agent message bookkeeping, and the plan item lifecycle.
struct PlanModeStreamState {
/// Per-item parsers for assistant streams in plan mode.
plan_parsers: PlanParsers,
/// Agent message items started by the model but deferred until we see non-plan text.
pending_agent_message_items: HashMap<String, TurnItem>,
/// Agent message items whose start notification has been emitted.
started_agent_message_items: HashSet<String>,
/// Leading whitespace buffered until we see non-whitespace text for an item.
leading_whitespace_by_item: HashMap<String, String>,
/// Tracks plan item lifecycle while streaming plan output.
plan_item_state: ProposedPlanItemState,
}
impl PlanModeStreamState {
fn new(turn_id: &str) -> Self {
Self {
plan_parsers: PlanParsers::new(),
pending_agent_message_items: HashMap::new(),
started_agent_message_items: HashSet::new(),
leading_whitespace_by_item: HashMap::new(),
plan_item_state: ProposedPlanItemState::new(turn_id),
}
}
}
impl ProposedPlanItemState {
fn new(turn_id: &str) -> Self {
Self {
item_id: format!("{turn_id}-plan"),
started: false,
completed: false,
}
}
async fn start(&mut self, sess: &Session, turn_context: &TurnContext) {
if self.started || self.completed {
return;
}
self.started = true;
let item = TurnItem::Plan(PlanItem {
id: self.item_id.clone(),
text: String::new(),
});
sess.emit_turn_item_started(turn_context, &item).await;
}
async fn push_delta(&mut self, sess: &Session, turn_context: &TurnContext, delta: &str) {
if self.completed {
return;
}
if delta.is_empty() {
return;
}
let event = PlanDeltaEvent {
thread_id: sess.conversation_id.to_string(),
turn_id: turn_context.sub_id.clone(),
item_id: self.item_id.clone(),
delta: delta.to_string(),
};
sess.send_event(turn_context, EventMsg::PlanDelta(event))
.await;
}
async fn complete_with_text(
&mut self,
sess: &Session,
turn_context: &TurnContext,
text: String,
) {
if self.completed || !self.started {
return;
}
self.completed = true;
let item = TurnItem::Plan(PlanItem {
id: self.item_id.clone(),
text,
});
sess.emit_turn_item_completed(turn_context, item).await;
}
}
/// In plan mode we defer agent message starts until the parser emits non-plan
/// text. The parser buffers each line until it can rule out a tag prefix, so
/// plan-only outputs never show up as empty assistant messages.
async fn maybe_emit_pending_agent_message_start(
sess: &Session,
turn_context: &TurnContext,
state: &mut PlanModeStreamState,
item_id: &str,
) {
if state.started_agent_message_items.contains(item_id) {
return;
}
if let Some(item) = state.pending_agent_message_items.remove(item_id) {
sess.emit_turn_item_started(turn_context, &item).await;
state
.started_agent_message_items
.insert(item_id.to_string());
}
}
/// Agent messages are text-only today; concatenate all text entries.
fn agent_message_text(item: &codex_protocol::items::AgentMessageItem) -> String {
item.content
.iter()
.map(|entry| match entry {
codex_protocol::items::AgentMessageContent::Text { text } => text.as_str(),
})
.collect()
}
/// Split the stream into normal assistant text vs. proposed plan content.
/// Normal text becomes AgentMessage deltas; plan content becomes PlanDelta +
/// TurnItem::Plan.
async fn handle_plan_segments(
sess: &Session,
turn_context: &TurnContext,
state: &mut PlanModeStreamState,
item_id: &str,
segments: Vec<ProposedPlanSegment>,
) {
for segment in segments {
match segment {
ProposedPlanSegment::Normal(delta) => {
if delta.is_empty() {
continue;
}
let has_non_whitespace = delta.chars().any(|ch| !ch.is_whitespace());
if !has_non_whitespace && !state.started_agent_message_items.contains(item_id) {
let entry = state
.leading_whitespace_by_item
.entry(item_id.to_string())
.or_default();
entry.push_str(&delta);
continue;
}
let delta = if !state.started_agent_message_items.contains(item_id) {
if let Some(prefix) = state.leading_whitespace_by_item.remove(item_id) {
format!("{prefix}{delta}")
} else {
delta
}
} else {
delta
};
maybe_emit_pending_agent_message_start(sess, turn_context, state, item_id).await;
let event = AgentMessageContentDeltaEvent {
thread_id: sess.conversation_id.to_string(),
turn_id: turn_context.sub_id.clone(),
item_id: item_id.to_string(),
delta,
};
sess.send_event(turn_context, EventMsg::AgentMessageContentDelta(event))
.await;
}
ProposedPlanSegment::ProposedPlanStart => {
if !state.plan_item_state.completed {
state.plan_item_state.start(sess, turn_context).await;
}
}
ProposedPlanSegment::ProposedPlanDelta(delta) => {
if !state.plan_item_state.completed {
if !state.plan_item_state.started {
state.plan_item_state.start(sess, turn_context).await;
}
state
.plan_item_state
.push_delta(sess, turn_context, &delta)
.await;
}
}
ProposedPlanSegment::ProposedPlanEnd => {}
}
}
}
/// Flush any buffered proposed-plan segments when a specific assistant message ends.
async fn flush_proposed_plan_segments_for_item(
sess: &Session,
turn_context: &TurnContext,
state: &mut PlanModeStreamState,
item_id: &str,
) {
let Some(mut parser) = state.plan_parsers.take_assistant_parser(item_id) else {
return;
};
let segments = parser.finish();
if segments.is_empty() {
return;
}
handle_plan_segments(sess, turn_context, state, item_id, segments).await;
}
/// Flush any remaining assistant plan parsers when the response completes.
async fn flush_proposed_plan_segments_all(
sess: &Session,
turn_context: &TurnContext,
state: &mut PlanModeStreamState,
) {
for (item_id, mut parser) in state.plan_parsers.drain_assistant_parsers() {
let segments = parser.finish();
if segments.is_empty() {
continue;
}
handle_plan_segments(sess, turn_context, state, &item_id, segments).await;
}
}
/// Emit completion for plan items by parsing the finalized assistant message.
async fn maybe_complete_plan_item_from_message(
sess: &Session,
turn_context: &TurnContext,
state: &mut PlanModeStreamState,
item: &ResponseItem,
) {
if let ResponseItem::Message { role, content, .. } = item
&& role == "assistant"
{
let mut text = String::new();
for entry in content {
if let ContentItem::OutputText { text: chunk } = entry {
text.push_str(chunk);
}
}
if let Some(plan_text) = extract_proposed_plan_text(&text) {
if !state.plan_item_state.started {
state.plan_item_state.start(sess, turn_context).await;
}
state
.plan_item_state
.complete_with_text(sess, turn_context, plan_text)
.await;
}
}
}
/// Emit a completed agent message in plan mode, respecting deferred starts.
async fn emit_agent_message_in_plan_mode(
sess: &Session,
turn_context: &TurnContext,
agent_message: codex_protocol::items::AgentMessageItem,
state: &mut PlanModeStreamState,
) {
let agent_message_id = agent_message.id.clone();
let text = agent_message_text(&agent_message);
if text.trim().is_empty() {
state.pending_agent_message_items.remove(&agent_message_id);
state.started_agent_message_items.remove(&agent_message_id);
return;
}
maybe_emit_pending_agent_message_start(sess, turn_context, state, &agent_message_id).await;
if !state
.started_agent_message_items
.contains(&agent_message_id)
{
let start_item = state
.pending_agent_message_items
.remove(&agent_message_id)
.unwrap_or_else(|| {
TurnItem::AgentMessage(codex_protocol::items::AgentMessageItem {
id: agent_message_id.clone(),
content: Vec::new(),
})
});
sess.emit_turn_item_started(turn_context, &start_item).await;
state
.started_agent_message_items
.insert(agent_message_id.clone());
}
sess.emit_turn_item_completed(turn_context, TurnItem::AgentMessage(agent_message))
.await;
state.started_agent_message_items.remove(&agent_message_id);
}
/// Emit completion for a plan-mode turn item, handling agent messages specially.
async fn emit_turn_item_in_plan_mode(
sess: &Session,
turn_context: &TurnContext,
turn_item: TurnItem,
previously_active_item: Option<&TurnItem>,
state: &mut PlanModeStreamState,
) {
match turn_item {
TurnItem::AgentMessage(agent_message) => {
emit_agent_message_in_plan_mode(sess, turn_context, agent_message, state).await;
}
_ => {
if previously_active_item.is_none() {
sess.emit_turn_item_started(turn_context, &turn_item).await;
}
sess.emit_turn_item_completed(turn_context, turn_item).await;
}
}
}
/// Handle a completed assistant response item in plan mode, returning true if handled.
async fn handle_assistant_item_done_in_plan_mode(
sess: &Session,
turn_context: &TurnContext,
item: &ResponseItem,
state: &mut PlanModeStreamState,
previously_active_item: Option<&TurnItem>,
last_agent_message: &mut Option<String>,
) -> bool {
if let ResponseItem::Message { role, .. } = item
&& role == "assistant"
{
maybe_complete_plan_item_from_message(sess, turn_context, state, item).await;
if let Some(turn_item) = handle_non_tool_response_item(item, true).await {
emit_turn_item_in_plan_mode(
sess,
turn_context,
turn_item,
previously_active_item,
state,
)
.await;
}
sess.record_conversation_items(turn_context, std::slice::from_ref(item))
.await;
if let Some(agent_message) = last_assistant_message_from_item(item, true) {
*last_agent_message = Some(agent_message);
}
return true;
}
false
}
async fn drain_in_flight(
in_flight: &mut FuturesOrdered<BoxFuture<'static, CodexResult<ResponseInputItem>>>,
sess: Arc<Session>,
@@ -3795,10 +4180,6 @@ async fn try_run_sampling_request(
prompt: &Prompt,
cancellation_token: CancellationToken,
) -> CodexResult<SamplingRequestResult> {
// TODO: If we need to guarantee the persisted mode always matches the prompt used for this
// turn, capture it in TurnContext at creation time. Using SessionConfiguration here avoids
// duplicating model settings on TurnContext, but a later Op could update the session config
// before this write occurs.
let collaboration_mode = sess.current_collaboration_mode().await;
let rollout_item = RolloutItem::TurnContext(TurnContextItem {
cwd: turn_context.cwd.clone(),
@@ -3843,6 +4224,8 @@ async fn try_run_sampling_request(
let mut last_agent_message: Option<String> = None;
let mut active_item: Option<TurnItem> = None;
let mut should_emit_turn_diff = false;
let plan_mode = turn_context.collaboration_mode_kind == ModeKind::Plan;
let mut plan_mode_state = plan_mode.then(|| PlanModeStreamState::new(&turn_context.sub_id));
let receiving_span = trace_span!("receiving_stream");
let outcome: CodexResult<SamplingRequestResult> = loop {
let handle_responses = trace_span!(
@@ -3881,6 +4264,33 @@ async fn try_run_sampling_request(
ResponseEvent::Created => {}
ResponseEvent::OutputItemDone(item) => {
let previously_active_item = active_item.take();
if let Some(state) = plan_mode_state.as_mut() {
if let Some(previous) = previously_active_item.as_ref() {
let item_id = previous.id();
if matches!(previous, TurnItem::AgentMessage(_)) {
flush_proposed_plan_segments_for_item(
&sess,
&turn_context,
state,
&item_id,
)
.await;
}
}
if handle_assistant_item_done_in_plan_mode(
&sess,
&turn_context,
&item,
state,
previously_active_item.as_ref(),
&mut last_agent_message,
)
.await
{
continue;
}
}
let mut ctx = HandleOutputCtx {
sess: sess.clone(),
turn_context: turn_context.clone(),
@@ -3900,8 +4310,17 @@ async fn try_run_sampling_request(
needs_follow_up |= output_result.needs_follow_up;
}
ResponseEvent::OutputItemAdded(item) => {
if let Some(turn_item) = handle_non_tool_response_item(&item).await {
sess.emit_turn_item_started(&turn_context, &turn_item).await;
if let Some(turn_item) = handle_non_tool_response_item(&item, plan_mode).await {
if let Some(state) = plan_mode_state.as_mut()
&& matches!(turn_item, TurnItem::AgentMessage(_))
{
let item_id = turn_item.id();
state
.pending_agent_message_items
.insert(item_id, turn_item.clone());
} else {
sess.emit_turn_item_started(&turn_context, &turn_item).await;
}
active_item = Some(turn_item);
}
}
@@ -3925,6 +4344,9 @@ async fn try_run_sampling_request(
response_id: _,
token_usage,
} => {
if let Some(state) = plan_mode_state.as_mut() {
flush_proposed_plan_segments_all(&sess, &turn_context, state).await;
}
sess.update_token_usage_info(&turn_context, token_usage.as_ref())
.await;
should_emit_turn_diff = true;
@@ -3940,14 +4362,25 @@ async fn try_run_sampling_request(
// In review child threads, suppress assistant text deltas; the
// UI will show a selection popup from the final ReviewOutput.
if let Some(active) = active_item.as_ref() {
let event = AgentMessageContentDeltaEvent {
thread_id: sess.conversation_id.to_string(),
turn_id: turn_context.sub_id.clone(),
item_id: active.id(),
delta: delta.clone(),
};
sess.send_event(&turn_context, EventMsg::AgentMessageContentDelta(event))
.await;
let item_id = active.id();
if let Some(state) = plan_mode_state.as_mut()
&& matches!(active, TurnItem::AgentMessage(_))
{
let segments = state
.plan_parsers
.assistant_parser_mut(&item_id)
.parse(&delta);
handle_plan_segments(&sess, &turn_context, state, &item_id, segments).await;
} else {
let event = AgentMessageContentDeltaEvent {
thread_id: sess.conversation_id.to_string(),
turn_id: turn_context.sub_id.clone(),
item_id,
delta,
};
sess.send_event(&turn_context, EventMsg::AgentMessageContentDelta(event))
.await;
}
} else {
error_or_panic("OutputTextDelta without active item".to_string());
}