Commit Graph

1637 Commits

Author SHA1 Message Date
xl-openai
fdd0cd1de9 feat: support multiple rate limits (#11260)
Added multi-limit support end-to-end by carrying limit_name in
rate-limit snapshots and handling multiple buckets instead of only
codex.
Extended /usage client parsing to consume additional_rate_limits
Updated TUI /status and in-memory state to store/render per-limit
snapshots
Extended app-server rate-limit read response: kept rate_limits and added
rate_limits_by_name.
Adjusted usage-limit error messaging for non-default codex limit buckets
2026-02-10 20:09:31 -08:00
Celia Chen
641d5268fa chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
Problem:
1. turn id is constructed in-memory;
2. on resuming threads, turn_id might not be unique;
3. client cannot no the boundary of a turn from rollout files easily.

This PR does three things:
1. persist `task_started` and `task_complete` events;
1. persist `turn_id` in rollout turn events;
5. generate turn_id as unique uuids instead of incrementing it in
memory.

This helps us resolve the issue of clients wanting to have unique turn
ids for resuming a thread, and knowing the boundry of each turn in
rollout files.

example debug logs
```
2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
```

backward compatibility:
if you try to resume an old session without task_started and
task_complete event populated, the following happens:
- If you resume and do nothing: those reconstructed historical IDs can
differ next time you resume.
- If you resume and send a new turn: the new turn gets a fresh UUID from
live submission flow and is persisted, so that new turn’s ID is stable
on later resumes.
I think this behavior is fine, because we only care about deterministic
turn id once a turn is triggered.
2026-02-11 03:56:01 +00:00
pakrym-oai
4473147985 Do not resend output items in incremental websockets connections (#11383)
In the incremental websocket output items are already part of the
context, no need to send them again and duplicate.
2026-02-10 19:38:08 -08:00
Dylan Hurd
cc8c293378 fix(exec-policy) No empty command lists (#11397)
## Summary
This should rarely, if ever, happen in practice. But regardless, we
should never provide an empty list of `commands` to ExecPolicy. This PR
is almost entirely adding test around these cases.

## Testing
- [x] Adds a bunch of unit tests for this
2026-02-10 19:22:23 -08:00
Michael Bolin
b68a84ee8e Remove deterministic_process_ids feature to avoid duplicate codex-core builds (#11393)
## Why

`codex-core` enabled `deterministic_process_ids` through a self
dev-dependency.
That forced a second feature-resolved build of the same crate, which
increased
compile time and test latency.

## What Changed

- Removed the `deterministic_process_ids` feature from
`codex-rs/core/Cargo.toml`.
- Removed the self dev-dependency on `codex-core` that enabled that
feature.
- Removed the Bazel `deterministic_process_ids` crate feature for
`codex-core`.
- Added a test-only `AtomicBool` override in unified exec process-id
allocation.
- Added a test-support setter for that override and re-exported it from
`codex-core`.
- Enabled deterministic process IDs in integration tests via
`core_test_support` ctor.

## Behavior

- Production behavior remains random process IDs.
- Unit tests remain deterministic via `cfg(test)`.
- Integration tests remain deterministic via explicit test-support
initialization.

## Validation

- `just fmt`
- `cargo test -p codex-core unified_exec::`
- `cargo test -p codex-core --test all unified_exec -- --test-threads=1`
- `cargo tree -p codex-core -e features` (verified the removed feature
path)
2026-02-10 19:07:01 -08:00
pakrym-oai
c68999ee6d Prefer websocket transport when model opts in (#11386)
Summary
- add a `prefer_websockets` field to `ModelInfo`, defaulting to `false`
in all fixtures and constructors
- wire the new flag into websocket selection so models that opt in
always use websocket transport even when the feature gate is off

Testing
- Not run (not requested)
2026-02-10 18:50:48 -08:00
Michael Bolin
d44f4205fb chore: rename codex-command to codex-shell-command (#11378)
This addresses some post-merge feedback on
https://github.com/openai/codex/pull/11361:

- crate rename
- reuse `detect_shell_type()` utility
2026-02-10 17:03:46 -08:00
jif-oai
87bbfc50a1 feat: prevent double backfill (#11377)
## Summary

Add a DB-backed lease to prevent duplicate `.sqlite` backfill workers
from running concurrently.

### What changed
- Added StateRuntime::try_claim_backfill(lease_seconds) that atomically
claims backfill only when:
  - backfill is not complete, and
  - no fresh running worker currently owns it.
- Updated backfill_sessions to use the claim API and exit early when
another worker already holds the lease.
- Added runtime tests covering:
  - singleton claim behavior,
  - stale lease takeover,
  - claim blocked after complete.
- Set backfill lease to 900s in production and 1s in tests.

### Why

This avoids duplicate backfill work and reduces backfill status churn
under concurrent startup, while preserving
current best-effort fallback behavior.
2026-02-11 00:24:20 +00:00
jif-oai
674799d356 feat: mem v2 - PR6 (consolidation) (#11374) 2026-02-11 00:02:57 +00:00
jif-oai
2c9be54c9a feat: mem v2 - PR5 (#11372) 2026-02-10 23:22:55 +00:00
viyatb-oai
1d47927aa0 Enable SOCKS defaults for common local network proxy use cases (#11362)
## Summary
- enable local-use defaults in network proxy settings: SOCKS5 on, SOCKS5
UDP on, upstream proxying on, and local binding on
- add a regression test that asserts the full
`NetworkProxySettings::default()` baseline
- Fixed managed listener reservation behavior.
Before: we always reserved a loopback SOCKS listener, even when
enable_socks5 = false.
Now: SOCKS listener is only reserved when SOCKS is enabled.
- Fixed /debug-config env output for SOCKS-disabled sessions.
ALL_PROXY now shows the HTTP proxy URL when SOCKS is disabled (instead
of incorrectly showing socks5h://...).


## Validation
- just fmt
- cargo test -p codex-network-proxy
- cargo clippy -p codex-network-proxy --all-targets
2026-02-10 15:13:52 -08:00
jif-oai
623d3f4071 feat: mem v2 - PR4 (#11369)
# Memories migration plan (simplified global workflow)

## Target behavior

- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.

## Current code map

- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.

## PR plan

## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)

- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.

Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.

## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)

- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.

Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.

## PR 3: Remove per-cwd memories and move to one global memory root

- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).

Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.

## PR 4: Phase-2 global lock simplification and cleanup

- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.

Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.

## PR 5: Final cleanup and docs

- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.

Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.

## Notes on sequencing

- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
2026-02-10 23:10:35 +00:00
Michael Bolin
d8f9bb65e2 # Split command parsing/safety out of codex-core into new codex-command (#11361)
`codex-core` had accumulated command parsing and command safety logic
(`bash`, `powershell`, `parse_command`, and `command_safety`) that is
logically cohesive but orthogonal to most core session/runtime logic.
Keeping this code in `codex-core` made the crate increasingly monolithic
and raised iteration cost for unrelated core changes.

This change extracts that surface into a dedicated crate,
`codex-command`, while preserving existing `codex_core::...` call sites
via re-exports.

## Why this refactor

During analysis, command parsing/safety stood out as a good first split
because it has:

- a clear domain boundary (shell parsing + safety classification)
- relatively self-contained dependencies (notably `tree-sitter` /
`tree-sitter-bash`)
- a meaningful standalone test surface (`134` tests moved with the
crate)
- many downstream uses that benefit from independent compilation and
caching

The practical problem was build latency from a large `codex-core`
compile/test graph. Clean-build timings before and after this split
showed measurable wins:

- `cargo check -p codex-core`: `57.08s` -> `53.54s` (~`6.2%` faster)
- `cargo test -p codex-core --no-run`: `2m39.9s` -> `2m20s` (~`12.4%`
faster)
- `codex-core lib` compile unit: `57.18s` -> `49.67s` (~`13.1%` faster)
- `codex-core lib(test)` compile unit: `60.87s` -> `53.21s` (~`12.6%`
faster)

This gives a concrete reduction in core build overhead without changing
behavior.

## What changed

### New crate

- Added `codex-rs/command` as workspace crate `codex-command`.
- Added:
  - `command/src/lib.rs`
  - `command/src/bash.rs`
  - `command/src/powershell.rs`
  - `command/src/parse_command.rs`
  - `command/src/command_safety/*`
  - `command/src/shell_detect.rs`
  - `command/BUILD.bazel`

### Code moved out of `codex-core`

- Moved modules from `core/src` into `command/src`:
  - `bash.rs`
  - `powershell.rs`
  - `parse_command.rs`
  - `command_safety/*`

### Dependency graph updates

- Added workspace member/dependency entries for `codex-command` in
`codex-rs/Cargo.toml`.
- Added `codex-command` dependency to `codex-rs/core/Cargo.toml`.
- Removed `tree-sitter` and `tree-sitter-bash` from `codex-core` direct
deps (now owned by `codex-command`).

### API compatibility for callers

To avoid immediate downstream churn, `codex-core` now re-exports the
moved modules/functions:

- `codex_command::bash`
- `codex_command::powershell`
- `codex_command::parse_command`
- `codex_command::is_safe_command`
- `codex_command::is_dangerous_command`

This keeps existing `codex_core::...` paths working while enabling
gradual migration to direct `codex-command` usage.

### Internal decoupling detail

- Added `command::shell_detect` so moved `bash`/`powershell` logic no
longer depends on core shell internals.
- Adjusted PowerShell helper visibility in `codex-command` for existing
core test usage (`UTF8` prefix helper + executable discovery functions).

## Validation

- `just fmt`
- `just fix -p codex-command -p codex-core`
- `cargo test -p codex-command` (`134` passed)
- `cargo test -p codex-core --no-run`
- `cargo test -p codex-core shell_command_handler`

## Notes / follow-up

This commit intentionally prioritizes boundary extraction and
compatibility. A follow-up can migrate downstream crates to depend
directly on `codex-command` (instead of through `codex-core` re-exports)
to realize additional incremental build wins.
2026-02-10 14:43:16 -08:00
jif-oai
3419660767 feat: mem v2 - PR3 (#11366)
# Memories migration plan (simplified global workflow)

## Target behavior

- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.

## Current code map

- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.

## PR plan

## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)

- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.

Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.

## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)

- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.

Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.

## PR 3: Remove per-cwd memories and move to one global memory root

- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).

Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.

## PR 4: Phase-2 global lock simplification and cleanup

- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.

Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.

## PR 5: Final cleanup and docs

- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.

Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.

## Notes on sequencing

- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
2026-02-10 22:12:50 +00:00
jif-oai
0229dc5ccf feat: mem v2 - PR2 (#11365)
# Memories migration plan (simplified global workflow)

## Target behavior

- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.

## Current code map

- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.

## PR plan

## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)

- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.

Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.

## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)

- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.

Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.

## PR 3: Remove per-cwd memories and move to one global memory root

- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).

Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.

## PR 4: Phase-2 global lock simplification and cleanup

- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.

Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.

## PR 5: Final cleanup and docs

- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.

Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.

## Notes on sequencing

- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
2026-02-10 21:50:53 +00:00
jif-oai
07da740c8a feat: mem v2 - PR1 (#11364)
# Memories migration plan (simplified global workflow)

## Target behavior

- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.

## Current code map

- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.

## PR plan

## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)

- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.

Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.

## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)

- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.

Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.

## PR 3: Remove per-cwd memories and move to one global memory root

- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).

Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.

## PR 4: Phase-2 global lock simplification and cleanup

- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.

Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.

## PR 5: Final cleanup and docs

- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.

Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.

## Notes on sequencing

- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
2026-02-10 21:29:06 +00:00
jif-oai
a6e9469fa4 chore: unify memory job flow (#11334) 2026-02-10 20:26:39 +00:00
Ahmed Ibrahim
5e01450963 Strip unsupported images from prompt history to guard against model switch (#11349)
- Make `ContextManager::for_prompt` modality-aware and strip input_image
content when the active model is text-only.
- Added a test for multi-model -> text-only model switch
2026-02-10 11:58:00 -08:00
iceweasel-oai
82f93a13b2 include sandbox (seatbelt, elevated, etc.) as in turn metadata header (#10946)
This will help us understand retention/usage for folks who use the
Windows (or any other) sandboxes
2026-02-10 19:50:07 +00:00
viyatb-oai
62d0f302fd fix(core): canonicalize wrapper approvals and support heredoc prefix … (#10941)
## Summary
- Reduced repeated approvals for equivalent wrapper commands and fixed
execpolicy matching for heredoc-style shell invocations, with minimal
behavior change and fail-closed defaults.

## Fixes
1. Canonicalized approval matching for wrappers so equivalent commands
map to the same approval intent.
2. Added heredoc-aware prefix extraction for execpolicy so commands like
`python3 <<'PY' ... PY` match rules such as `prefix_rule(["python3"],
...)`.
3. Kept fallback behavior conservative: if parsing is ambiguous,
existing prompt behavior is preserved.

## Edge Cases Covered
- Wrapper path/name differences: `/bin/bash` vs `bash`, `/bin/zsh` vs
`zsh`.
- Shell modes: `-c` and `-lc`.
- Heredoc forms: quoted delimiter (`<<'PY'`) and unquoted delimiter (`<<
PY`).
- Multi-command heredoc scripts are rejected by the fallback
- Non-heredoc redirections (`>`, etc.) are not treated as heredoc prefix
matches.
- Complex scripts still fall back to prior behavior rather than
expanding permissions.

---------

Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>
2026-02-10 11:46:40 -08:00
pakrym-oai
e4b5384539 Extract tool building (#11337)
Make it clear what input go into building tools and allow for easy reuse
for pre-warm request
2026-02-10 11:45:23 -08:00
Ahmed Ibrahim
9c4656000f Sanitize MCP image output for text-only models (#11346)
- Replace image blocks in MCP tool results with a text placeholder when
the active model does not accept image input.
- Add an e2e rmcp test to verify sanitized tool output is what gets sent
back to the model.
2026-02-10 11:25:32 -08:00
Ahmed Ibrahim
6e96e4837e Always expose view_image and return unsupported image-input error (#11336)
- Keep `view_image` in the advertised tool list for all models.
- Return a clear error when the current model does not support image
inputs, and cover it with a unit test.
2026-02-10 11:25:12 -08:00
jif-oai
847a6092e6 fix: reduce usage of open_if_present (#11344) 2026-02-10 19:25:07 +00:00
pakrym-oai
0639c33892 Compare full request for websockets incrementality (#11343)
Tools can dynamically change mid-turn now. We need to be more thorough
about reusing incremental connections.
2026-02-10 19:14:36 +00:00
Michael Bolin
548afa5749 core: remove stale apply_patch SandboxPolicy TODO in seatbelt (#11345)
The `TODO` in `core/src/seatbelt.rs` claimed that `apply_patch` still needed to honor `SandboxPolicy`. That was true when the comment was added, but it is no longer true.

Analysis:
- The TODO was introduced in #1762, when seatbelt code was split out of `exec.rs`.
- `apply_patch` sandboxing was later implemented in #1705.
- Today, `apply_patch` calls are routed through the tool orchestrator and delegated to `ApplyPatchRuntime`, which executes via `execute_env()` using the active sandbox attempt policy.
- On macOS, the sandbox transform path for that execution still builds seatbelt args with `create_seatbelt_command_args(command, policy, sandbox_policy_cwd)`, so the same `SandboxPolicy` gates `apply_patch` writes and network behavior.

Because this behavior is already enforced, the TODO is stale and removing it avoids implying missing sandbox coverage where none exists.

No functional behavior change; comment-only cleanup.
2026-02-10 19:10:02 +00:00
guinness-oai
099ed802b2 Treat first rollout session_meta as canonical thread identity (#11241)
During thread/fork, the new rollout includes the fork’s own session_meta
plus copied history that can contain older session_meta entries from the
source thread. thread/list was overwriting metadata on later
session_meta lines, so a fork could be reported with the source thread’s
thread_id. This fix only uses the first session_meta, so the fork keeps
its own ID.
2026-02-10 10:32:11 -08:00
Matthew Zeng
48e415bdef [apps] Improve app installation flow. (#11249)
- [x] Add buttons to start the installation flow and verify installation
completes.
- [x] Hard refresh apps list when the /apps view opens.
2026-02-10 17:59:43 +00:00
Shijie Rao
c4b771a16f Fix: update parallel tool call exec approval to approve on request id (#11162)
### Summary

In parallel tool call, exec command approvals were not approved at
request level but at a turn level. i.e. when a single request is
approved, the system currently treats all requests in turn as approved.

### Before

https://github.com/user-attachments/assets/d50ed129-b3d2-4b2f-97fa-8601eb11f6a8

### After

https://github.com/user-attachments/assets/36528a43-a4aa-4775-9e12-f13287ef19fc
2026-02-10 09:38:00 -08:00
pakrym-oai
3322b99900 Remove ApiPrompt (#11265)
Keep things simple and build a full Responses API request request right
in the model client
2026-02-10 16:12:31 +00:00
jif-oai
e57892b211 feat: phase 2 consolidation (#11306)
Consolidation phase of memories

Cleaning and better handling of concurrency
2026-02-10 14:31:16 +00:00
jif-oai
d735df1f50 Extract hooks into dedicated crate (#11311)
Summary
- move `core/src/hooks` implementation into a new `codex-hooks` crate
with its own manifest
- update `codex-rs` workspace and `codex-core` crate to depend on the
extracted `hooks` crate and wire up the shared APIs
- ensure references, modules, and lockfile reflect the new crate layout

Testing
- Not run (not requested)
2026-02-10 13:42:17 +00:00
jif-oai
1d5eba0090 feat: align memory phase 1 and make it stronger (#11300)
## Align with the new phase-1 design

Basically we know run phase 1 in parallel by considering:
* Max 64 rollouts
* Max 1 month old
* Consider the most recent first

This PR also adds stronger parallelization capabilities by detecting
stale jobs, retry policies, ownership of computation to prevent double
computations etc etc
2026-02-10 13:42:09 +00:00
jif-oai
223fadc760 Fix spawn_agent input type (#11304) 2026-02-10 12:16:39 +00:00
jif-oai
87ccc5bbae feat: add connector capabilities to sub-agents (#11191) 2026-02-10 11:53:01 +00:00
jif-oai
6049ff02a0 memories: add extraction and prompt module foundation (#11200)
## Summary
- add the new `core/src/memories` module (phase-one parsing, rollout
filtering, storage, selection, prompts)
- add Askama-backed memory templates for stage-one input/system and
consolidation prompts
- add module tests for parsing, filtering, path bucketing, and summary
maintenance

## Testing
- just fmt
- cargo test -p codex-core --lib memories::
2026-02-10 10:10:24 +00:00
Michael Bolin
44ebf4588f feat: retain NetworkProxy, when appropriate (#11207)
As of this PR, `SessionServices` retains a
`Option<StartedNetworkProxy>`, if appropriate.

Now the `network` field on `Config` is `Option<NetworkProxySpec>`
instead of `Option<NetworkProxy>`.

Over in `Session::new()`, we invoke `NetworkProxySpec::start_proxy()` to
create the `StartedNetworkProxy`, which is a new struct that retains the
`NetworkProxy` as well as the `NetworkProxyHandle`. (Note that `Drop` is
implemented for `NetworkProxyHandle` to ensure the proxies are shutdown
when it is dropped.)

The `NetworkProxy` from the `StartedNetworkProxy` is threaded through to
the appropriate places.


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/11207).
* #11285
* __->__ #11207
2026-02-10 02:09:23 -08:00
alexsong-oai
9fded117ac feat: support configurable metric_exporter (#10940) 2026-02-10 08:14:28 +00:00
viyatb-oai
3391e5ea86 feat(sandbox): enforce proxy-aware network routing in sandbox (#11113)
## Summary
- expand proxy env injection to cover common tool env vars
(`HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY`/`NO_PROXY` families +
tool-specific variants)
- harden macOS Seatbelt network policy generation to route through
inferred loopback proxy endpoints and fail closed when proxy env is
malformed
- thread proxy-aware Linux sandbox flags and add minimal bwrap netns
isolation hook for restricted non-proxy runs
- add/refresh tests for proxy env wiring, Seatbelt policy generation,
and Linux sandbox argument wiring
2026-02-10 07:44:21 +00:00
alexsong-oai
91704c5672 feat: add SkillPolicy to skill metadata and support allow_implicit_invocation (#11244)
Tested by setting the policy in agents/openai.yaml to true, false, and
leaving it unset (default).
```
policy:
  allow_implicit_invocation: false
```
<img width="847" height="289" alt="Screenshot 2026-02-09 at 3 42 41 PM"
src="https://github.com/user-attachments/assets/d3476264-3355-47cf-894a-4ffba53e3481"
/>
2026-02-09 23:13:27 -08:00
Matthew Zeng
005e040f97 [apps] Add thread_id param to optionally load thread config for apps feature check. (#11279)
- [x] Add thread_id param to optionally load thread config for apps
feature check
2026-02-09 23:10:26 -08:00
Eric Traut
bb974c78de Disable dynamic model refresh for custom model providers (#11239)
The dynamic model refresh feature (`https://api.openai.com/v1/models`
endpoint) is currently gated on a runtime check for an auth method other
than API Key. It should be gated on a check specifically for ChatGPT
Auth because some custom model providers (e.g. for local models) use no
auth mechanism. A call to `self.auth_manager.auth_mode()` will return
`None` in this case.

Addresses #11213
2026-02-09 21:36:09 -08:00
Owen Lin
53741013ab fix(app-server): for external auth, replace id_token with chatgpt_acc… (#11240)
…ount_id and chatgpt_plan_type

### Summary
Following up on external auth mode which was introduced here:
https://github.com/openai/codex/pull/10012

Turns out some clients have a differently shaped ID token and don't have
a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
So, let's replace `id_token` param with `chatgpt_account_id` and
`chatgpt_plan_type` (optional) when initializing the external ChatGPT
auth mode (`account/login/start` with `chatgptAuthTokens`).

The client was able to test end-to-end with a Codex build from this
branch and verified it worked!
2026-02-09 20:48:58 -08:00
Michael Bolin
862ab63071 chore: change ConfigState so it no longer depends on a single config.toml file for reloading (#11262)
If anything, it should depend on `ConfigLayerStack`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/11262).
* #11207
* __->__ #11262
2026-02-09 19:26:39 -08:00
Ahmed Ibrahim
d1df3bd63b Revert "Revert "Update models.json"" (#11256)
Reverts openai/codex#11255
2026-02-09 19:22:41 -08:00
Ahmed Ibrahim
a1abd53b6a Remove offline fallback for models (#11238)
# External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.
2026-02-09 16:58:54 -08:00
Dylan Hurd
d65f09b913 fix(feature) UnderDevelopment feature must be off (#11242)
## Summary
1. Bump RemoteModels to Stable
2. Assert that all UnderDevelopment features are off by default

## Testing
- [x] Added unit test
2026-02-09 15:14:15 -08:00
Ahmed Ibrahim
481145e959 Use longest remote model prefix matching (#11228)
Match model metadata by longest matching remote slug prefix before local
fallback.

- Update `get_model_info` to prefer the most specific remote slug prefix
for the requested model.
- Add an integration test to assert `gpt-5.3-codex-test` resolves to
`gpt-5.3-codex` over `gpt-5.3`.
2026-02-09 15:05:56 -08:00
Matthew Zeng
d90df4761b [apps] Add gated instructions for Apps. (#10924)
- [x] Add gated instructions for Apps.
2026-02-09 14:48:09 -08:00
jif-oai
ffd4bd345c feat: tie shell snapshot to cwd (#11231)
Fix for this: https://github.com/openai/codex/issues/11223

Basically we tie the shell snapshot to a `cwd` to handle `cwd`-based env
setups
2026-02-09 22:14:39 +00:00