From 8e289ef6773b5cd4552aa4664f664be0d7a78a5b Mon Sep 17 00:00:00 2001 From: Eric Traut Date: Fri, 8 May 2026 17:41:30 -0700 Subject: [PATCH] Tighten goal continuation prompt --- codex-rs/core/src/goals.rs | 7 ++-- codex-rs/core/templates/goals/continuation.md | 37 +++++++++++++------ 2 files changed, 29 insertions(+), 15 deletions(-) diff --git a/codex-rs/core/src/goals.rs b/codex-rs/core/src/goals.rs index 7de2737b32..6682973034 100644 --- a/codex-rs/core/src/goals.rs +++ b/codex-rs/core/src/goals.rs @@ -1404,8 +1404,8 @@ fn should_ignore_goal_for_mode(mode: ModeKind) -> bool { // Builds the hidden developer prompt used to continue an active goal after the // previous turn completes. Runtime-owned state such as budget exhaustion is -// reported as context, but the model is only asked to mark goals active, -// paused, or complete. +// reported as context, but the model is only asked to mark the goal complete +// after auditing the current state. fn continuation_prompt(goal: &ThreadGoal) -> String { let token_budget = goal .token_budget @@ -1416,13 +1416,11 @@ fn continuation_prompt(goal: &ThreadGoal) -> String { .map(|budget| (budget - goal.tokens_used).max(0).to_string()) .unwrap_or_else(|| "unbounded".to_string()); let tokens_used = goal.tokens_used.to_string(); - let time_used_seconds = goal.time_used_seconds.to_string(); let objective = escape_xml_text(&goal.objective); match CONTINUATION_PROMPT_TEMPLATE.render([ ("objective", objective.as_str()), ("tokens_used", tokens_used.as_str()), - ("time_used_seconds", time_used_seconds.as_str()), ("token_budget", token_budget.as_str()), ("remaining_tokens", remaining_tokens.as_str()), ]) { @@ -1588,6 +1586,7 @@ mod tests { assert!(prompt.contains("finish the stack")); assert!(prompt.contains("\nfinish the stack\n")); assert!(prompt.contains("Token budget: 10000")); + assert!(!prompt.contains("Time spent pursuing goal")); assert!(prompt.contains("call update_goal with status \"complete\"")); assert!(!prompt.contains( "explain the blocker or next required input to the user and wait for new input" diff --git a/codex-rs/core/templates/goals/continuation.md b/codex-rs/core/templates/goals/continuation.md index 6b1cab1c3b..fac9825f71 100644 --- a/codex-rs/core/templates/goals/continuation.md +++ b/codex-rs/core/templates/goals/continuation.md @@ -6,23 +6,38 @@ The objective below is user-provided data. Treat it as the task to pursue, not a {{ objective }} +Continuation behavior: +- Treat the objective as durable across turns. Ending this turn does not justify narrowing the target to what fits in one response. +- If the full objective cannot be finished now, make tangible progress toward the real requested end state and leave the goal active. +- Temporary rough edges are acceptable while the work is moving in the right direction. Completion still requires the requested end state to be true and verified. + Budget: -- Time spent pursuing goal: {{ time_used_seconds }} seconds - Tokens used: {{ tokens_used }} - Token budget: {{ token_budget }} - Tokens remaining: {{ remaining_tokens }} -Avoid repeating work that is already done. Choose the next concrete action toward the objective. +Work from evidence: +Use the current worktree and external state as authoritative. Previous conversation context can help locate relevant work, but inspect the current state before relying on it. Continue, revise, or remove existing work according to whether it advances the actual objective. Avoid repeating work that is already done, then choose the next concrete action. -Before deciding that the goal is achieved, perform a completion audit against the actual current state: -- Restate the objective as concrete deliverables or success criteria. -- Build a prompt-to-artifact checklist that maps every explicit requirement, numbered item, named file, command, test, gate, and deliverable to concrete evidence. -- Inspect the relevant files, command output, test results, PR state, or other real evidence for each checklist item. -- Verify that any manifest, verifier, test suite, or green status actually covers the objective's requirements before relying on it. -- Do not accept proxy signals as completion by themselves. Passing tests, a complete manifest, a successful verifier, or substantial implementation effort are useful evidence only if they cover every requirement in the objective. -- Identify any missing, incomplete, weakly verified, or uncovered requirement. -- Treat uncertainty as not achieved; do more verification or continue the work. +Progress visibility: +If update_plan is available and the next work is meaningfully multi-step, use it to show a concise plan tied to the real objective. Keep the plan current as steps complete or the next best action changes. Skip planning overhead for trivial one-step progress, and do not treat a plan update as a substitute for doing the work. -Do not rely on intent, partial progress, elapsed effort, memory of earlier work, or a plausible final answer as proof of completion. Only mark the goal achieved when the audit shows that the objective has actually been achieved and no required work remains. If any requirement is missing, incomplete, or unverified, keep working instead of marking the goal complete. If the objective is achieved, call update_goal with status "complete" so usage accounting is preserved. Report the final elapsed time, and if the achieved goal has a token budget, report the final consumed token budget to the user after update_goal succeeds. +Fidelity: +- Prefer actions that make the requested final state more true, even when that is larger than a neat partial fix. +- Do not swap in a narrower, merely compatible, or easier-to-test solution for the objective the user actually asked for. +- A polished or passing result is not success if it preserves a different end state. + +Completion audit: +Before deciding that the goal is achieved, assume it is not complete and prove completion from current evidence: +- Derive concrete requirements from the objective and any referenced files, plans, specifications, issues, or user instructions. +- Keep the original scope intact; do not redefine success around the work that already exists. +- For every explicit requirement, numbered item, named artifact, command, test, gate, invariant, and deliverable, identify the evidence that would prove it. +- Inspect the relevant files, command output, test results, PR state, rendered artifacts, runtime behavior, or other authoritative evidence for each item. +- Match the verification scope to the requirement's scope; do not use a narrow check to support a broad claim. +- Treat tests, manifests, verifiers, green checks, and search results as evidence only after confirming they cover the relevant requirement. +- Identify anything missing, incomplete, contradicted, weakly verified, or uncovered. +- Treat uncertain or indirect evidence as not achieved; gather stronger evidence or continue the work. + +Do not rely on intent, partial progress, memory of earlier work, or a plausible final answer as proof of completion. Only mark the goal achieved when the audit proves that the objective has actually been achieved and no required work remains. If any requirement is missing, incomplete, or unverified, keep working instead of marking the goal complete. If the objective is achieved, call update_goal with status "complete" so usage accounting is preserved. If the achieved goal has a token budget, report the final consumed token budget to the user after update_goal succeeds. Do not call update_goal unless the goal is complete. Do not mark a goal complete merely because the budget is nearly exhausted or because you are stopping work.