## Why
Users have requested the ability to edit a goal's objective after a goal
has been created. This PR exposes a new `/goal edit` command in the TUI
to address this request.
In the process of implementing this, I also noticed an existing bug in
the goal runtime. When a goal's objective is updated through the
`thread/goal/set` app server API, the goal runtime didn't emit a new
steering prompt to tell the agent about the new objective. This PR also
fixes this hole.
## What Changed
- Adds `/goal edit` in the TUI, opening an edit box prefilled with the
current goal objective.
- Keeps active and paused goals in their current state, resets completed
goals to active, keeps budget-limited goals budget-limited, and
preserves the existing token budget.
- Changes the existing `thread/goal/set` behavior so editing an
objective preserves goal accounting instead of resetting it. The older
reset-on-new-objective behavior was left over from before
`thread/goal/clear`; clients that need to reset accounting can now clear
the existing goal and create a new one.
- Reuses the existing goal set API path; this does not add or change
app-server protocol surface area.
- Adds a dedicated goal runtime steering prompt when an externally
persisted goal mutation changes the objective, so active turns receive
the updated objective.
## Validation
- Make sure `/goal edit` returns an error if no goal currently exists
- Make sure `/goal edit` displays an edit box that can be optionally
canceled with no side effects
- Make sure that an edited goal results in a steer so the agent starts
pursuing the new objective
- Make sure the new objective is reflected in the goal if you use
`/goal` to display the goal summary
- Make sure that `/goal edit` doesn't reset the token budget, time/token
accounting on the updated goal
## Summary
This PR updates the goal continuation prompt to address feedback from
early adopters. There are two primary changes:
1. Goal continuation and budget-limit steering prompts now use hidden
user-context messages instead of hidden developer messages.
2. The goal continuation prompt is refined to improve the model's
ability to fully complete the active goal rather than stop at a smaller
or merely passing subset.
The user-message transition is important for two reasons. First, it
eliminates an issue where older steering messages could be responded to
again after a new turn. Second, it works better with compaction because
user messages are treated differently from developer messages during
compaction.
The prompt refinements make persistence explicit, ground work in current
evidence, encourage `update_plan` for multi-step progress visibility,
and require stronger completion audits before calling `update_goal`. It
also removes the elapsed-time reporting in the prompt; I saw evidence
that this was causing the model to shortcut work as it became nervous
about time.
These changes were tested with evals. Chriss4123 has also been running
independent evals in
[#19910](https://github.com/openai/codex/issues/19910), and many of the
improvements in this PR were suggested by him.
## Verification
- Tested with evals.
- Added and updated focused `codex-core` coverage for hidden goal user
context, continuation and budget-limit request shape, prompt rendering,
and objective delimiter escaping.
Tool suggest still misfires when model needs tool_search, updating the
prompts to further disambiguate it:
- [x] rename it from `tool_suggest` to `request_plugin_install`
- [x] rephrase "suggestion" to "install" in the tool descriptions.
- [x] disambiguate "the tool" vs "the plugin/connector".
Tested with the Codex App and verified it still works.
## Why
`/goal` is supposed to keep Codex working until the goal is actually
done. The previous continuation logic had two ways to stop early: the
continuation prompt told the model to wait for new input when it felt
blocked, and the runtime suppressed another continuation turn after a
continuation finished without any tool calls.
That made goals stop short even when the agent could still keep making
progress (I received a few reports of this from users). It also relied
on a brittle heuristic that treated "no registry tool calls" as
equivalent to "should stop."
## What changed
- removed the continuation prompt sentence that told the model to stop
and wait for new input when it could not continue productively
- removed the goal runtime suppression heuristic that stopped
auto-continuation after a no-tool continuation turn
- deleted the continuation-activity bookkeeping and left `tool_calls` as
telemetry only
- added focused regressions for the two intended behaviors: completed
no-tool continuation turns still continue, while `request_user_input`
keeps the existing turn open instead of spawning a new continuation
## Summary
- Tighten `tool_suggest` guidance so it prefers explicit plugin install
requests, while still allowing a connector install when the relevant
plugin is already installed and a needed connector from that plugin is
missing.
- Tell the model not to call `tool_suggest` in parallel with other
tools.
## Testing
- `cargo test -p codex-tools tool_suggest`
- `cargo test -p codex-core tool_suggest`
## Why
This PR make the `morpheus` agent (memory phase 2) use a git diff to
start it's consolidation. The workflow is the following:
1. The agent acquire a lock
2. If `.codex/memories` does not exist or is not a git root, initialize
everything (and make a first empty commit)
3. Update `raw_memories.md` and `rollout_summaries/` as before.
Basically we select max N phase 1 memories based on a given policy
4. We use git (`gix`) to get a diff between the current state of
`.codex/memories` and the last commit.
5. Dump the diff in `phase2_workspace_diff.md`
6. Spawn `morpheus` and point it to `phase2_workspace_diff.md`
7. Wait for `morpheus` to be done
8. Re-create a new `.git` and make one single commit on it. We do this
because we don't want to preserve history through `.git` and this is
cheap anyway
9. We release the lock
On top of this, we keep the retry policies etc etc
The goals of this new workflow are:
* Better support of any memory extensions such as `chronicle`
* Allow the user to manually edit memories and this will be considered
by the phase 2 agent
As a follow-up we will need to add support for user's edition while
`morpheus` is running
## What Changed
- Added memory workspace helpers that prepare the git baseline, compute
the diff, write `phase2_workspace_diff.md`, and reset the baseline after
successful consolidation.
- Updated Phase 2 to sync current inputs into `raw_memories.md` and
`rollout_summaries/`, prune old extension resources, skip clean
workspaces, and run the consolidation subagent only when the workspace
has changes.
- Tightened Phase 2 job ownership around long-running consolidation with
heartbeats and an ownership check before resetting the baseline.
- Simplified the prompt and state APIs so DB watermarks are bookkeeping,
while workspace dirtiness decides whether consolidation work exists.
- Updated the memory pipeline README and tests for workspace diffs,
extension-resource cleanup, pollution-driven forgetting, selection
ranking, and baseline persistence.
## Verification
- Added/updated coverage in `core/src/memories/tests.rs`,
`core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`,
and `core/tests/suite/memories.rs`.
---------
Co-authored-by: Codex <noreply@openai.com>
Adds the core runtime behavior for active goals on top of the model
tools from PR 3.
## Why
A long-running goal should be a core runtime concern, not something
every client has to implement. Core owns the turn lifecycle, tool
completion boundaries, interruptions, resume behavior, and token usage,
so it is the right place to account progress, enforce budgets, and
decide when to continue work.
## What changed
- Centralized goal lifecycle side effects behind
`Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
- Starts goal continuation turns only when the session is idle; pending
user input and mailbox work take priority.
- Accounts token and wall-clock usage at turn, tool, mutation,
interrupt, and resume boundaries; `get_thread_goal` remains read-only.
- Preserves sub-second wall-clock remainder across accounting boundaries
so long-running goals do not drift downward over time.
- Treats token budget exhaustion as a soft stop by marking the goal
`budget_limited` and injecting wrap-up steering instead of aborting the
active turn.
- Suppresses budget steering when `update_goal` marks a goal complete.
- Pauses active goals on interrupt and auto-reactivates paused goals
when a thread resumes outside plan mode.
- Suppresses repeated automatic continuation when a continuation turn
makes no tool calls.
- Added continuation and budget-limit prompt templates.
## Verification
- Added focused core coverage for continuation scheduling, accounting
boundaries, budget-limit steering, completion accounting, interrupt
pause behavior, resume auto-activation, and wall-clock remainder
accounting.
## Summary
- set the realtime v2 server VAD silence delay to 500ms
- update the default realtime 1.5 backend prompt to the v4 text
- keep the session payload and prompt rendering tests aligned with those
changes
## Why
- the VAD change gives the voice path a longer pause before ending the
user's turn
- the prompt change makes the default bundled realtime prompt match the
current v4 content
## Validation
- `cargo +1.93.0 test -p codex-core realtime_prompt --manifest-path
/tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
- `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
codex-api
realtime_v2_session_update_includes_background_agent_tool_and_handoff_output_item
--manifest-path
/tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
- `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
codex-app-server --test all
'suite::v2::realtime_conversation::realtime_webrtc_start_emits_sdp_notification'
--manifest-path /tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml
-- --exact`
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
Encourages realtime prompt handling to delegate user requests to the
backend agent by default when repo inspection, commands, implementation,
or validation may help.
Co-authored-by: Codex <noreply@openai.com>
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
- Adds a core-owned realtime backend prompt template and preparation
path.
- Makes omitted realtime start prompts use the core default, while null
or empty prompts intentionally send empty instructions.
- Covers the core realtime path and app-server v2 path with integration
coverage.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- split `models-manager` out of `core` and add `ModelsManagerConfig`
plus `Config::to_models_manager_config()` so model metadata paths stop
depending on `core::Config`
- move login-owned/auth-owned code out of `core` into `codex-login`,
move model provider config into `codex-model-provider-info`, move API
bridge mapping into `codex-api`, move protocol-owned types/impls into
`codex-protocol`, and move response debug helpers into a dedicated
`response-debug-context` crate
- move feedback tag emission into `codex-feedback`, relocate tests to
the crates that now own the code, and keep broad temporary re-exports so
this PR avoids a giant import-only rewrite
## Major moves and decisions
- created `codex-models-manager` as the owner for model
cache/catalog/config/model info logic, including the new
`ModelsManagerConfig` struct
- created `codex-model-provider-info` as the owner for provider config
parsing/defaults and kept temporary `codex-login`/`codex-core`
re-exports for old import paths
- moved `api_bridge` error mapping + `CoreAuthProvider` into
`codex-api`, while `codex-login::api_bridge` temporarily re-exports
those symbols and keeps the `auth_provider_from_auth` wrapper
- moved `auth_env_telemetry` and `provider_auth` ownership to
`codex-login`
- moved `CodexErr` ownership to `codex-protocol::error`, plus
`StreamOutput`, `bytes_to_string_smart`, and network policy helpers to
protocol-owned modules
- created `codex-response-debug-context` for
`extract_response_debug_context`, `telemetry_transport_error_message`,
and related response-debug plumbing instead of leaving that behavior in
`core`
- moved `FeedbackRequestTags`, `emit_feedback_request_tags`, and
`emit_feedback_request_tags_with_auth_env` to `codex-feedback`
- deferred removal of temporary re-exports and the mechanical import
rewrites to a stacked follow-up PR so this PR stays reviewable
## Test moves
- moved auth refresh coverage from `core/tests/suite/auth_refresh.rs` to
`login/tests/suite/auth_refresh.rs`
- moved text encoding coverage from
`core/tests/suite/text_encoding_fix.rs` to
`protocol/src/exec_output_tests.rs`
- moved model info override coverage from
`core/tests/suite/model_info_overrides.rs` to
`models-manager/src/model_info_overrides_tests.rs`
---------
Co-authored-by: Codex <noreply@openai.com>
- [x] Polish tool suggest prompts to distinguish between missing
connectors and discoverable plugins, and be very precise about the
triggering conditions.
- rename the multi-agent tool name the model sees to wait_agent
- update the model-facing prompts and tool descriptions to match
---------
Co-authored-by: Codex <noreply@openai.com>
- [x] Add mentions of connectors because model always think in connector
terms in its CoT.
- [x] Suppress list_mcp_resources in favor of tool search for available
apps.
## Summary
- update `codex-rs/core/templates/memories/stage_one_system.md` so phase
1 captures stronger user-preference signals, richer task summaries, and
cwd provenance without branch-specific fields
- update `codex-rs/core/templates/memories/consolidation.md` so phase 2
keeps separate sections for user preferences, reusable knowledge, and
failure shields while staying cwd-aware but branchless
- document the `codex` prompt-template maintenance rule in
`codex-rs/core/src/memories/README.md`: the undated templates are
canonical here and should be edited in place
## Testing
- cargo test -p codex-core memories --manifest-path codex-rs/Cargo.toml
## Why
to support a new bring your own search tool in Responses
API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search)
we migrating our bm25 search tool to use official way to execute search
on client and communicate additional tools to the model.
## What
- replace the legacy `search_tool_bm25` flow with client-executed
`tool_search`
- add protocol, SSE, history, and normalization support for
`tool_search_call` and `tool_search_output`
- return namespaced Codex Apps search results and wire namespaced
follow-up tool calls back into MCP dispatch
## Why
- tighten Codex memory-read behavior around stale facts and conflicting
memory
- encode the risk-of-drift vs verification-effort decision rule directly
in the read-path prompt
- make partial stale-detail updates explicit so correcting only the
answer is not treated as sufficient
## What changed
- update `codex-rs/core/templates/memories/read_path.md`
- add guidance for when to verify cheap local facts vs when to answer
from older memory with visible provenance
- strengthen same-turn `MEMORY.md` updates when stored concrete details
are stale
## Notes
- this is based on some staleness eval work
## Summary
- allow `request_user_input` in Default collaboration mode as well as
Plan
- update the Default-mode instructions to prefer assumptions first and
use `request_user_input` only when a question is unavoidable
- update request_user_input and app-server tests to match the new
Default-mode behavior
- refactor collaboration-mode availability plumbing into
`CollaborationModesConfig` for future mode-related flags
## Codex author
`codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`
Add a stream parser to extract citations (and others) from a stream.
This support cases where markers are split in differen tokens.
Codex never manage to make this code work so everything was done
manually. Please review correctly and do not touch this part of the code
without a very clear understanding of it
## Summary
- tighten the memory-use decision boundary so agents skip memory only
for clearly self-contained asks
- make the quick memory pass more explicit and bounded (including a
lightweight search budget)
- add structured `<memory_citation>` requirements and examples for final
replies
- clarify memory update guidance and end-state wording for memory lookup
## Why
The previous template was directionally correct, but still left room for
inconsistent memory lookup behavior and citation formatting. This change
makes the default behavior, quick-pass scope, and citation output
contract much more explicit.
## Testing
- not run (prompt/template text change only)
Co-authored-by: jif-oai <jif@openai.com>