## Summary
- Tighten `tool_suggest` guidance so it prefers explicit plugin install
requests, while still allowing a connector install when the relevant
plugin is already installed and a needed connector from that plugin is
missing.
- Tell the model not to call `tool_suggest` in parallel with other
tools.
## Testing
- `cargo test -p codex-tools tool_suggest`
- `cargo test -p codex-core tool_suggest`
## Why
This PR make the `morpheus` agent (memory phase 2) use a git diff to
start it's consolidation. The workflow is the following:
1. The agent acquire a lock
2. If `.codex/memories` does not exist or is not a git root, initialize
everything (and make a first empty commit)
3. Update `raw_memories.md` and `rollout_summaries/` as before.
Basically we select max N phase 1 memories based on a given policy
4. We use git (`gix`) to get a diff between the current state of
`.codex/memories` and the last commit.
5. Dump the diff in `phase2_workspace_diff.md`
6. Spawn `morpheus` and point it to `phase2_workspace_diff.md`
7. Wait for `morpheus` to be done
8. Re-create a new `.git` and make one single commit on it. We do this
because we don't want to preserve history through `.git` and this is
cheap anyway
9. We release the lock
On top of this, we keep the retry policies etc etc
The goals of this new workflow are:
* Better support of any memory extensions such as `chronicle`
* Allow the user to manually edit memories and this will be considered
by the phase 2 agent
As a follow-up we will need to add support for user's edition while
`morpheus` is running
## What Changed
- Added memory workspace helpers that prepare the git baseline, compute
the diff, write `phase2_workspace_diff.md`, and reset the baseline after
successful consolidation.
- Updated Phase 2 to sync current inputs into `raw_memories.md` and
`rollout_summaries/`, prune old extension resources, skip clean
workspaces, and run the consolidation subagent only when the workspace
has changes.
- Tightened Phase 2 job ownership around long-running consolidation with
heartbeats and an ownership check before resetting the baseline.
- Simplified the prompt and state APIs so DB watermarks are bookkeeping,
while workspace dirtiness decides whether consolidation work exists.
- Updated the memory pipeline README and tests for workspace diffs,
extension-resource cleanup, pollution-driven forgetting, selection
ranking, and baseline persistence.
## Verification
- Added/updated coverage in `core/src/memories/tests.rs`,
`core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`,
and `core/tests/suite/memories.rs`.
---------
Co-authored-by: Codex <noreply@openai.com>
Adds the core runtime behavior for active goals on top of the model
tools from PR 3.
## Why
A long-running goal should be a core runtime concern, not something
every client has to implement. Core owns the turn lifecycle, tool
completion boundaries, interruptions, resume behavior, and token usage,
so it is the right place to account progress, enforce budgets, and
decide when to continue work.
## What changed
- Centralized goal lifecycle side effects behind
`Session::goal_runtime_apply(GoalRuntimeEvent::...)`.
- Starts goal continuation turns only when the session is idle; pending
user input and mailbox work take priority.
- Accounts token and wall-clock usage at turn, tool, mutation,
interrupt, and resume boundaries; `get_thread_goal` remains read-only.
- Preserves sub-second wall-clock remainder across accounting boundaries
so long-running goals do not drift downward over time.
- Treats token budget exhaustion as a soft stop by marking the goal
`budget_limited` and injecting wrap-up steering instead of aborting the
active turn.
- Suppresses budget steering when `update_goal` marks a goal complete.
- Pauses active goals on interrupt and auto-reactivates paused goals
when a thread resumes outside plan mode.
- Suppresses repeated automatic continuation when a continuation turn
makes no tool calls.
- Added continuation and budget-limit prompt templates.
## Verification
- Added focused core coverage for continuation scheduling, accounting
boundaries, budget-limit steering, completion accounting, interrupt
pause behavior, resume auto-activation, and wall-clock remainder
accounting.
## Summary
- set the realtime v2 server VAD silence delay to 500ms
- update the default realtime 1.5 backend prompt to the v4 text
- keep the session payload and prompt rendering tests aligned with those
changes
## Why
- the VAD change gives the voice path a longer pause before ending the
user's turn
- the prompt change makes the default bundled realtime prompt match the
current v4 content
## Validation
- `cargo +1.93.0 test -p codex-core realtime_prompt --manifest-path
/tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
- `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
codex-api
realtime_v2_session_update_includes_background_agent_tool_and_handoff_output_item
--manifest-path
/tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml`
- `CARGO_TARGET_DIR=/tmp/codex-pr-v4-target cargo +1.93.0 test -p
codex-app-server --test all
'suite::v2::realtime_conversation::realtime_webrtc_start_emits_sdp_notification'
--manifest-path /tmp/codex-realtime-v2-vad-prompt-v4/codex-rs/Cargo.toml
-- --exact`
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
Encourages realtime prompt handling to delegate user requests to the
backend agent by default when repo inspection, commands, implementation,
or validation may help.
Co-authored-by: Codex <noreply@openai.com>
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
- Adds a core-owned realtime backend prompt template and preparation
path.
- Makes omitted realtime start prompts use the core default, while null
or empty prompts intentionally send empty instructions.
- Covers the core realtime path and app-server v2 path with integration
coverage.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- split `models-manager` out of `core` and add `ModelsManagerConfig`
plus `Config::to_models_manager_config()` so model metadata paths stop
depending on `core::Config`
- move login-owned/auth-owned code out of `core` into `codex-login`,
move model provider config into `codex-model-provider-info`, move API
bridge mapping into `codex-api`, move protocol-owned types/impls into
`codex-protocol`, and move response debug helpers into a dedicated
`response-debug-context` crate
- move feedback tag emission into `codex-feedback`, relocate tests to
the crates that now own the code, and keep broad temporary re-exports so
this PR avoids a giant import-only rewrite
## Major moves and decisions
- created `codex-models-manager` as the owner for model
cache/catalog/config/model info logic, including the new
`ModelsManagerConfig` struct
- created `codex-model-provider-info` as the owner for provider config
parsing/defaults and kept temporary `codex-login`/`codex-core`
re-exports for old import paths
- moved `api_bridge` error mapping + `CoreAuthProvider` into
`codex-api`, while `codex-login::api_bridge` temporarily re-exports
those symbols and keeps the `auth_provider_from_auth` wrapper
- moved `auth_env_telemetry` and `provider_auth` ownership to
`codex-login`
- moved `CodexErr` ownership to `codex-protocol::error`, plus
`StreamOutput`, `bytes_to_string_smart`, and network policy helpers to
protocol-owned modules
- created `codex-response-debug-context` for
`extract_response_debug_context`, `telemetry_transport_error_message`,
and related response-debug plumbing instead of leaving that behavior in
`core`
- moved `FeedbackRequestTags`, `emit_feedback_request_tags`, and
`emit_feedback_request_tags_with_auth_env` to `codex-feedback`
- deferred removal of temporary re-exports and the mechanical import
rewrites to a stacked follow-up PR so this PR stays reviewable
## Test moves
- moved auth refresh coverage from `core/tests/suite/auth_refresh.rs` to
`login/tests/suite/auth_refresh.rs`
- moved text encoding coverage from
`core/tests/suite/text_encoding_fix.rs` to
`protocol/src/exec_output_tests.rs`
- moved model info override coverage from
`core/tests/suite/model_info_overrides.rs` to
`models-manager/src/model_info_overrides_tests.rs`
---------
Co-authored-by: Codex <noreply@openai.com>
- [x] Polish tool suggest prompts to distinguish between missing
connectors and discoverable plugins, and be very precise about the
triggering conditions.
- rename the multi-agent tool name the model sees to wait_agent
- update the model-facing prompts and tool descriptions to match
---------
Co-authored-by: Codex <noreply@openai.com>
- [x] Add mentions of connectors because model always think in connector
terms in its CoT.
- [x] Suppress list_mcp_resources in favor of tool search for available
apps.
## Summary
- update `codex-rs/core/templates/memories/stage_one_system.md` so phase
1 captures stronger user-preference signals, richer task summaries, and
cwd provenance without branch-specific fields
- update `codex-rs/core/templates/memories/consolidation.md` so phase 2
keeps separate sections for user preferences, reusable knowledge, and
failure shields while staying cwd-aware but branchless
- document the `codex` prompt-template maintenance rule in
`codex-rs/core/src/memories/README.md`: the undated templates are
canonical here and should be edited in place
## Testing
- cargo test -p codex-core memories --manifest-path codex-rs/Cargo.toml
## Why
to support a new bring your own search tool in Responses
API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search)
we migrating our bm25 search tool to use official way to execute search
on client and communicate additional tools to the model.
## What
- replace the legacy `search_tool_bm25` flow with client-executed
`tool_search`
- add protocol, SSE, history, and normalization support for
`tool_search_call` and `tool_search_output`
- return namespaced Codex Apps search results and wire namespaced
follow-up tool calls back into MCP dispatch
## Why
- tighten Codex memory-read behavior around stale facts and conflicting
memory
- encode the risk-of-drift vs verification-effort decision rule directly
in the read-path prompt
- make partial stale-detail updates explicit so correcting only the
answer is not treated as sufficient
## What changed
- update `codex-rs/core/templates/memories/read_path.md`
- add guidance for when to verify cheap local facts vs when to answer
from older memory with visible provenance
- strengthen same-turn `MEMORY.md` updates when stored concrete details
are stale
## Notes
- this is based on some staleness eval work
## Summary
- allow `request_user_input` in Default collaboration mode as well as
Plan
- update the Default-mode instructions to prefer assumptions first and
use `request_user_input` only when a question is unavoidable
- update request_user_input and app-server tests to match the new
Default-mode behavior
- refactor collaboration-mode availability plumbing into
`CollaborationModesConfig` for future mode-related flags
## Codex author
`codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`
Add a stream parser to extract citations (and others) from a stream.
This support cases where markers are split in differen tokens.
Codex never manage to make this code work so everything was done
manually. Please review correctly and do not touch this part of the code
without a very clear understanding of it
## Summary
- tighten the memory-use decision boundary so agents skip memory only
for clearly self-contained asks
- make the quick memory pass more explicit and bounded (including a
lightweight search budget)
- add structured `<memory_citation>` requirements and examples for final
replies
- clarify memory update guidance and end-state wording for memory lookup
## Why
The previous template was directionally correct, but still left room for
inconsistent memory lookup behavior and citation formatting. This change
makes the default behavior, quick-pass scope, and citation output
contract much more explicit.
## Testing
- not run (prompt/template text change only)
Co-authored-by: jif-oai <jif@openai.com>
## Summary
- Add `rollout_summary_file: <generated>.md` to each thread header in
`raw_memories.md` so Phase 2 can reliably reference the canonical
rollout summary filename.
- Update the memory prompts/templates (`stage_one_system`,
`consolidation`, `read_path`) for the new task-oriented raw-memory /
MEMORY.md schema and stronger consolidation guidance.
## Details
- `codex-rs/core/src/memories/storage.rs`
- Writes the generated `rollout_summary_file` path into the per-thread
metadata header when rebuilding `raw_memories.md`.
- `codex-rs/core/src/memories/tests.rs`
- Verifies the canonical `rollout_summary_file` header is present and
ordered after `updated_at`/`cwd` in `raw_memories.md`.
- Verifies task-structured raw-memory content is preserved while the
canonical header is added.
- `codex-rs/core/templates/memories/*.md`
- Updates the stage-1 raw-memory format to task-grouped sections
(`task`, `task_group`, `task_outcome`).
- Updates Phase 2 consolidation guidance around recency (`updated_at`),
task-oriented `MEMORY.md` blocks, and richer evidence-backed
consolidation.
- Tweaks the quick memory pass wording to emphasize topics/workflows in
addition to keywords.
## Testing
- `cargo test -p codex-core memories`
## Summary
- Require revised `<proposed_plan>` blocks in the same planning session
to be complete replacements, not partial/delta plans.
- Scope that cumulative replacement rule to the current planning session
only.
- Clarify that after leaving Plan mode (for example switching to Default
mode to implement) or when explicitly asked for a new plan, the model
should produce a new self-contained plan without inheriting prior plan
blocks unless requested.
## Testing
- Not run (prompt/template text-only change).