mirror of
https://github.com/openai/codex.git
synced 2026-05-15 00:32:51 +00:00
## Why SQLite state was still being opened from consumer paths, including lazy `OnceCell`-backed thread-store call sites. That let one process construct multiple state DB connections for the same Codex home, which makes SQLite lock contention and `database is locked` failures much easier to hit. State DB lifetime should be chosen by main-like entrypoints and tests, then passed through explicitly. Consumers should use the supplied `Option<StateDbHandle>` or `StateDbHandle` and keep their existing filesystem fallback or error behavior when no handle is available. The startup path also needs to keep the rollout crate in charge of SQLite state initialization. Opening `codex_state::StateRuntime` directly bypasses rollout metadata backfill, so entrypoints should initialize through `codex_rollout::state_db` and receive a handle only after required rollout backfills have completed. ## What Changed - Initialize the state DB in main-like entrypoints for CLI, TUI, app-server, exec, MCP server, and the thread-manager sample. - Pass `Option<StateDbHandle>` through `ThreadManager`, `LocalThreadStore`, app-server processors, TUI app wiring, rollout listing/recording, personality migration, shell snapshot cleanup, session-name lookup, and memory/device-key consumers. - Remove the lazy local state DB wrapper from the thread store so non-test consumers use only the supplied handle or their existing fallback path. - Make `codex_rollout::state_db::init` the local state startup path: it opens/migrates SQLite, runs rollout metadata backfill when needed, waits for concurrent backfill workers up to a bounded timeout, verifies completion, and then returns the initialized handle. - Keep optional/non-owning SQLite helpers, such as remote TUI local reads, as open-only paths that do not run startup backfill. - Switch app-server startup from direct `codex_state::StateRuntime::init` to the rollout state initializer so app-server cannot skip rollout backfill. - Collapse split rollout lookup/list APIs so callers use the normal methods with an optional state handle instead of `_with_state_db` variants. - Restore `getConversationSummary(ThreadId)` to delegate through `ThreadStore::read_thread` instead of a LocalThreadStore-specific rollout path special case. - Keep DB-backed rollout path lookup keyed on the DB row and file existence, without imposing the filesystem filename convention on existing DB rows. - Verify readable DB-backed rollout paths against `session_meta.id` before returning them, so a stale SQLite row that points at another thread's JSONL falls back to filesystem search and read-repairs the DB row. - Keep `debug prompt-input` filesystem-only so a one-off debug command does not initialize or backfill SQLite state just to print prompt input. - Keep goal-session test Codex homes alive only in the goal-specific helper, rather than leaking tempdirs from the shared session test helper. - Update tests and call sites to pass explicit state handles where DB behavior is expected and explicit `None` where filesystem-only behavior is intended. ## Validation - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout -p codex-thread-store -p codex-app-server -p codex-core -p codex-tui -p codex-exec -p codex-cli --tests` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout state_db_` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout find_thread_path` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout find_thread_path -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout try_init_ -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-rollout` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo clippy -p codex-rollout --lib -- -D warnings` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-thread-store read_thread_falls_back_when_sqlite_path_points_to_another_thread -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-thread-store` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core shell_snapshot` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all personality_migration` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find` - `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find::find_prefers_sqlite_path_by_id -- --nocapture` - `RUST_MIN_STACK=8388608 CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core --test all rollout_list_find -- --nocapture` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-core interrupt_accounts_active_goal_before_pausing` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-app-server get_auth_status -- --test-threads=1` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo test -p codex-app-server --lib` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db cargo check -p codex-rollout -p codex-app-server --tests` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p codex-thread-store -p codex-core -p codex-app-server -p codex-tui -p codex-exec -p codex-cli` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout -p codex-app-server` - `CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-rollout` - `CODEX_SKIP_VENDORED_BWRAP=1 CARGO_TARGET_DIR=/tmp/codex-target-state-db just fix -p codex-core` - `just argument-comment-lint -p codex-core` - `just argument-comment-lint -p codex-rollout` Focused coverage added in `codex-rollout`: - `recorder::tests::state_db_init_backfills_before_returning` verifies the rollout metadata row exists before startup init returns. - `state_db::tests::try_init_waits_for_concurrent_startup_backfill` verifies startup waits for another worker to finish backfill instead of disabling the handle for the process. - `state_db::tests::try_init_times_out_waiting_for_stuck_startup_backfill` verifies startup does not hang indefinitely on a stuck backfill lease. - `tests::find_thread_path_accepts_existing_state_db_path_without_canonical_filename` verifies DB-backed lookup accepts valid existing rollout paths even when the filename does not include the thread UUID. - `tests::find_thread_path_falls_back_when_db_path_points_to_another_thread` verifies DB-backed lookup ignores a stale row whose existing path belongs to another thread and read-repairs the row after filesystem fallback. Focused coverage updated in `codex-core`: - `rollout_list_find::find_prefers_sqlite_path_by_id` now uses a DB-preferred rollout file with matching `session_meta.id`, so it still verifies that valid SQLite paths win without depending on stale/empty rollout contents. `cargo test -p codex-app-server thread_list_respects_search_term_filter -- --test-threads=1 --nocapture` was attempted locally but timed out waiting for the app-server test harness `initialize` response before reaching the changed thread-list code path. `bazel test //codex-rs/thread-store:thread-store-unit-tests --test_output=errors` was attempted locally after the thread-store fix, but this container failed before target analysis while fetching `v8+` through BuildBuddy/direct GitHub. The equivalent local crate coverage, including `cargo test -p codex-thread-store`, passes. A plain local `cargo check -p codex-rollout -p codex-app-server --tests` also requires system `libcap.pc` for `codex-linux-sandbox`; the follow-up app-server check above used `CODEX_SKIP_VENDORED_BWRAP=1` in this container.
1053 lines
34 KiB
Rust
1053 lines
34 KiB
Rust
use std::future::Future;
|
|
use std::io::ErrorKind;
|
|
use std::mem::swap;
|
|
use std::path::Path;
|
|
use std::path::PathBuf;
|
|
use std::process::Command;
|
|
use std::sync::Arc;
|
|
use std::sync::atomic::AtomicU64;
|
|
use std::sync::atomic::Ordering;
|
|
use std::time::Duration;
|
|
|
|
use anyhow::Context;
|
|
use anyhow::Result;
|
|
use anyhow::anyhow;
|
|
use codex_config::CloudRequirementsLoader;
|
|
use codex_core::CodexThread;
|
|
use codex_core::ThreadManager;
|
|
use codex_core::config::Config;
|
|
use codex_core::shell::Shell;
|
|
use codex_core::shell::get_shell_by_model_provided_path;
|
|
use codex_core::thread_store_from_config;
|
|
use codex_exec_server::CreateDirectoryOptions;
|
|
use codex_exec_server::ExecutorFileSystem;
|
|
use codex_exec_server::RemoveOptions;
|
|
use codex_features::Feature;
|
|
use codex_login::CodexAuth;
|
|
use codex_model_provider_info::ModelProviderInfo;
|
|
use codex_model_provider_info::built_in_model_providers;
|
|
use codex_models_manager::bundled_models_response;
|
|
use codex_protocol::config_types::ServiceTier;
|
|
use codex_protocol::models::PermissionProfile;
|
|
use codex_protocol::openai_models::ModelsResponse;
|
|
use codex_protocol::protocol::AskForApproval;
|
|
use codex_protocol::protocol::EventMsg;
|
|
use codex_protocol::protocol::Op;
|
|
use codex_protocol::protocol::RealtimeConversationVersion as RealtimeWsVersion;
|
|
use codex_protocol::protocol::SandboxPolicy;
|
|
use codex_protocol::protocol::SessionConfiguredEvent;
|
|
use codex_protocol::protocol::SessionSource;
|
|
use codex_protocol::protocol::TurnEnvironmentSelection;
|
|
use codex_protocol::user_input::UserInput;
|
|
use codex_utils_absolute_path::AbsolutePathBuf;
|
|
use futures::future::BoxFuture;
|
|
use serde_json::Value;
|
|
use tempfile::TempDir;
|
|
use wiremock::MockServer;
|
|
|
|
use crate::PathBufExt;
|
|
use crate::TempDirExt;
|
|
use crate::get_remote_test_env;
|
|
use crate::load_default_config_for_test;
|
|
use crate::load_default_config_for_test_with_cloud_requirements;
|
|
use crate::responses::WebSocketTestServer;
|
|
use crate::responses::output_value_to_text;
|
|
use crate::responses::start_mock_server;
|
|
use crate::streaming_sse::StreamingSseServer;
|
|
use crate::wait_for_event_match;
|
|
use crate::wait_for_event_with_timeout;
|
|
use wiremock::Match;
|
|
use wiremock::matchers::path_regex;
|
|
|
|
type ConfigMutator = dyn FnOnce(&mut Config) + Send;
|
|
type PreBuildHook = dyn FnOnce(&Path) + Send + 'static;
|
|
type WorkspaceSetup = dyn FnOnce(AbsolutePathBuf, Arc<dyn ExecutorFileSystem>) -> BoxFuture<'static, Result<()>>
|
|
+ Send;
|
|
const TEST_MODEL_WITH_EXPERIMENTAL_TOOLS: &str = "test-gpt-5.1-codex";
|
|
const REMOTE_EXEC_SERVER_URL_ENV_VAR: &str = "CODEX_TEST_REMOTE_EXEC_SERVER_URL";
|
|
static REMOTE_TEST_INSTANCE_COUNTER: AtomicU64 = AtomicU64::new(0);
|
|
const SUBMIT_TURN_COMPLETE_TIMEOUT: Duration = Duration::from_secs(30);
|
|
|
|
#[derive(Debug)]
|
|
pub struct TestEnv {
|
|
environment: codex_exec_server::Environment,
|
|
cwd: AbsolutePathBuf,
|
|
local_cwd_temp_dir: Option<Arc<TempDir>>,
|
|
remote_container_name: Option<String>,
|
|
}
|
|
|
|
impl TestEnv {
|
|
pub async fn local() -> Result<Self> {
|
|
let local_cwd_temp_dir = Arc::new(TempDir::new()?);
|
|
let cwd = local_cwd_temp_dir.abs();
|
|
let environment =
|
|
codex_exec_server::Environment::create_for_tests(/*exec_server_url*/ None)?;
|
|
Ok(Self {
|
|
environment,
|
|
cwd,
|
|
local_cwd_temp_dir: Some(local_cwd_temp_dir),
|
|
remote_container_name: None,
|
|
})
|
|
}
|
|
|
|
pub fn cwd(&self) -> &AbsolutePathBuf {
|
|
&self.cwd
|
|
}
|
|
|
|
pub fn environment(&self) -> &codex_exec_server::Environment {
|
|
&self.environment
|
|
}
|
|
|
|
pub fn exec_server_url(&self) -> Option<&str> {
|
|
self.environment.exec_server_url()
|
|
}
|
|
|
|
fn local_cwd_temp_dir(&self) -> Option<Arc<TempDir>> {
|
|
self.local_cwd_temp_dir.clone()
|
|
}
|
|
}
|
|
|
|
impl Drop for TestEnv {
|
|
fn drop(&mut self) {
|
|
if let Some(container_name) = &self.remote_container_name {
|
|
let script = format!("rm -rf {}", self.cwd.as_path().display());
|
|
let _ = docker_command_capture_stdout(["exec", container_name, "sh", "-lc", &script]);
|
|
}
|
|
}
|
|
}
|
|
|
|
pub async fn test_env() -> Result<TestEnv> {
|
|
match get_remote_test_env() {
|
|
Some(remote_env) => {
|
|
let websocket_url = remote_exec_server_url()?;
|
|
let environment =
|
|
codex_exec_server::Environment::create_for_tests(Some(websocket_url))?;
|
|
let cwd = remote_aware_cwd_path();
|
|
environment
|
|
.get_filesystem()
|
|
.create_directory(
|
|
&cwd,
|
|
CreateDirectoryOptions { recursive: true },
|
|
/*sandbox*/ None,
|
|
)
|
|
.await?;
|
|
Ok(TestEnv {
|
|
environment,
|
|
cwd,
|
|
local_cwd_temp_dir: None,
|
|
remote_container_name: Some(remote_env.container_name),
|
|
})
|
|
}
|
|
None => TestEnv::local().await,
|
|
}
|
|
}
|
|
|
|
fn remote_aware_cwd_path() -> AbsolutePathBuf {
|
|
PathBuf::from(format!(
|
|
"/tmp/codex-core-test-cwd-{}",
|
|
remote_test_instance_id()
|
|
))
|
|
.abs()
|
|
}
|
|
|
|
fn remote_exec_server_url() -> Result<String> {
|
|
let listen_url = std::env::var(REMOTE_EXEC_SERVER_URL_ENV_VAR).with_context(|| {
|
|
format!("{REMOTE_EXEC_SERVER_URL_ENV_VAR} must be set for remote tests")
|
|
})?;
|
|
let listen_url = listen_url.trim();
|
|
if listen_url.is_empty() {
|
|
return Err(anyhow!(
|
|
"{REMOTE_EXEC_SERVER_URL_ENV_VAR} must not be empty"
|
|
));
|
|
}
|
|
Ok(listen_url.to_string())
|
|
}
|
|
|
|
fn remote_test_instance_id() -> String {
|
|
let instance = REMOTE_TEST_INSTANCE_COUNTER.fetch_add(1, Ordering::Relaxed);
|
|
format!("{}-{instance}", std::process::id())
|
|
}
|
|
|
|
fn docker_command_capture_stdout<const N: usize>(args: [&str; N]) -> Result<String> {
|
|
let output = Command::new("docker")
|
|
.args(args)
|
|
.output()
|
|
.with_context(|| format!("run docker {args:?}"))?;
|
|
if !output.status.success() {
|
|
return Err(anyhow!(
|
|
"docker {:?} failed: stdout={} stderr={}",
|
|
args,
|
|
String::from_utf8_lossy(&output.stdout).trim(),
|
|
String::from_utf8_lossy(&output.stderr).trim()
|
|
));
|
|
}
|
|
String::from_utf8(output.stdout).context("docker stdout must be utf-8")
|
|
}
|
|
|
|
/// A collection of different ways the model can output an apply_patch call
|
|
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
|
|
pub enum ApplyPatchModelOutput {
|
|
Freeform,
|
|
Function,
|
|
Shell,
|
|
ShellViaHeredoc,
|
|
ShellCommandViaHeredoc,
|
|
}
|
|
|
|
/// A collection of different ways the model can output an apply_patch call
|
|
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
|
|
pub enum ShellModelOutput {
|
|
Shell,
|
|
ShellCommand,
|
|
LocalShell,
|
|
// UnifiedExec has its own set of tests
|
|
}
|
|
|
|
/// Returns the permission fields required by `Op::UserTurn` for tests that
|
|
/// construct the op directly.
|
|
pub fn turn_permission_fields(
|
|
permission_profile: PermissionProfile,
|
|
cwd: &Path,
|
|
) -> (SandboxPolicy, Option<PermissionProfile>) {
|
|
let sandbox_policy = permission_profile
|
|
.to_legacy_sandbox_policy(cwd)
|
|
.unwrap_or_else(|_| SandboxPolicy::new_read_only_policy());
|
|
(sandbox_policy, Some(permission_profile))
|
|
}
|
|
|
|
pub struct TestCodexBuilder {
|
|
config_mutators: Vec<Box<ConfigMutator>>,
|
|
auth: CodexAuth,
|
|
pre_build_hooks: Vec<Box<PreBuildHook>>,
|
|
workspace_setups: Vec<Box<WorkspaceSetup>>,
|
|
home: Option<Arc<TempDir>>,
|
|
cloud_requirements: Option<CloudRequirementsLoader>,
|
|
user_shell_override: Option<Shell>,
|
|
exec_server_url: Option<String>,
|
|
}
|
|
|
|
impl TestCodexBuilder {
|
|
pub fn with_config<T>(mut self, mutator: T) -> Self
|
|
where
|
|
T: FnOnce(&mut Config) + Send + 'static,
|
|
{
|
|
self.config_mutators.push(Box::new(mutator));
|
|
self
|
|
}
|
|
|
|
pub fn with_auth(mut self, auth: CodexAuth) -> Self {
|
|
self.auth = auth;
|
|
self
|
|
}
|
|
|
|
pub fn with_model(self, model: &str) -> Self {
|
|
let new_model = model.to_string();
|
|
self.with_config(move |config| {
|
|
config.model = Some(new_model);
|
|
})
|
|
}
|
|
|
|
pub fn with_pre_build_hook<F>(mut self, hook: F) -> Self
|
|
where
|
|
F: FnOnce(&Path) + Send + 'static,
|
|
{
|
|
self.pre_build_hooks.push(Box::new(hook));
|
|
self
|
|
}
|
|
|
|
pub fn with_workspace_setup<F, Fut>(mut self, setup: F) -> Self
|
|
where
|
|
F: FnOnce(AbsolutePathBuf, Arc<dyn ExecutorFileSystem>) -> Fut + Send + 'static,
|
|
Fut: Future<Output = Result<()>> + Send + 'static,
|
|
{
|
|
self.workspace_setups
|
|
.push(Box::new(move |cwd, fs| Box::pin(setup(cwd, fs))));
|
|
self
|
|
}
|
|
|
|
pub fn with_home(mut self, home: Arc<TempDir>) -> Self {
|
|
self.home = Some(home);
|
|
self
|
|
}
|
|
|
|
pub fn with_cloud_requirements(mut self, cloud_requirements: CloudRequirementsLoader) -> Self {
|
|
self.cloud_requirements = Some(cloud_requirements);
|
|
self
|
|
}
|
|
|
|
pub fn with_user_shell(mut self, user_shell: Shell) -> Self {
|
|
self.user_shell_override = Some(user_shell);
|
|
self
|
|
}
|
|
|
|
pub fn with_exec_server_url(mut self, exec_server_url: impl Into<String>) -> Self {
|
|
self.exec_server_url = Some(exec_server_url.into());
|
|
self
|
|
}
|
|
|
|
pub fn with_windows_cmd_shell(self) -> Self {
|
|
if cfg!(windows) {
|
|
self.with_user_shell(get_shell_by_model_provided_path(&PathBuf::from("cmd.exe")))
|
|
} else {
|
|
self
|
|
}
|
|
}
|
|
|
|
pub async fn build(&mut self, server: &wiremock::MockServer) -> anyhow::Result<TestCodex> {
|
|
let home = match self.home.clone() {
|
|
Some(home) => home,
|
|
None => Arc::new(TempDir::new()?),
|
|
};
|
|
let base_url = format!("{}/v1", server.uri());
|
|
let test_env = TestEnv::local().await?;
|
|
Box::pin(self.build_with_home_and_base_url(base_url, home, /*resume_from*/ None, test_env))
|
|
.await
|
|
}
|
|
|
|
pub async fn build_remote_aware(
|
|
&mut self,
|
|
server: &wiremock::MockServer,
|
|
) -> anyhow::Result<TestCodex> {
|
|
let home = match self.home.clone() {
|
|
Some(home) => home,
|
|
None => Arc::new(TempDir::new()?),
|
|
};
|
|
let base_url = format!("{}/v1", server.uri());
|
|
let test_env = test_env().await?;
|
|
Box::pin(self.build_with_home_and_base_url(base_url, home, /*resume_from*/ None, test_env))
|
|
.await
|
|
}
|
|
|
|
pub async fn build_with_streaming_server(
|
|
&mut self,
|
|
server: &StreamingSseServer,
|
|
) -> anyhow::Result<TestCodex> {
|
|
let base_url = server.uri();
|
|
let home = match self.home.clone() {
|
|
Some(home) => home,
|
|
None => Arc::new(TempDir::new()?),
|
|
};
|
|
let test_env = TestEnv::local().await?;
|
|
Box::pin(self.build_with_home_and_base_url(
|
|
format!("{base_url}/v1"),
|
|
home,
|
|
/*resume_from*/ None,
|
|
test_env,
|
|
))
|
|
.await
|
|
}
|
|
|
|
pub async fn build_with_websocket_server(
|
|
&mut self,
|
|
server: &WebSocketTestServer,
|
|
) -> anyhow::Result<TestCodex> {
|
|
let base_url = format!("{}/v1", server.uri());
|
|
let home = match self.home.clone() {
|
|
Some(home) => home,
|
|
None => Arc::new(TempDir::new()?),
|
|
};
|
|
let base_url_clone = base_url.clone();
|
|
self.config_mutators.push(Box::new(move |config| {
|
|
config.model_provider.base_url = Some(base_url_clone);
|
|
config.model_provider.supports_websockets = true;
|
|
config.experimental_realtime_ws_model = Some("realtime-test-model".to_string());
|
|
config.realtime.version = RealtimeWsVersion::V1;
|
|
}));
|
|
let test_env = TestEnv::local().await?;
|
|
Box::pin(self.build_with_home_and_base_url(base_url, home, /*resume_from*/ None, test_env))
|
|
.await
|
|
}
|
|
|
|
pub async fn resume(
|
|
&mut self,
|
|
server: &wiremock::MockServer,
|
|
home: Arc<TempDir>,
|
|
rollout_path: PathBuf,
|
|
) -> anyhow::Result<TestCodex> {
|
|
let base_url = format!("{}/v1", server.uri());
|
|
let test_env = TestEnv::local().await?;
|
|
Box::pin(self.build_with_home_and_base_url(base_url, home, Some(rollout_path), test_env))
|
|
.await
|
|
}
|
|
|
|
async fn build_with_home_and_base_url(
|
|
&mut self,
|
|
base_url: String,
|
|
home: Arc<TempDir>,
|
|
resume_from: Option<PathBuf>,
|
|
test_env: TestEnv,
|
|
) -> anyhow::Result<TestCodex> {
|
|
let (config, fallback_cwd) = self
|
|
.prepare_config(base_url, &home, test_env.cwd().clone())
|
|
.await?;
|
|
let exec_server_url = self
|
|
.exec_server_url
|
|
.clone()
|
|
.or_else(|| test_env.exec_server_url().map(str::to_owned));
|
|
let local_runtime_paths = codex_exec_server::ExecServerRuntimePaths::new(
|
|
std::env::current_exe()?,
|
|
/*codex_linux_sandbox_exe*/ None,
|
|
)?;
|
|
let environment_manager = Arc::new(
|
|
codex_exec_server::EnvironmentManager::create_for_tests(
|
|
exec_server_url,
|
|
local_runtime_paths,
|
|
)
|
|
.await,
|
|
);
|
|
let file_system = test_env.environment().get_filesystem();
|
|
let mut workspace_setups = vec![];
|
|
swap(&mut self.workspace_setups, &mut workspace_setups);
|
|
for setup in workspace_setups {
|
|
setup(config.cwd.clone(), Arc::clone(&file_system)).await?;
|
|
}
|
|
let cwd = test_env.local_cwd_temp_dir().unwrap_or(fallback_cwd);
|
|
Box::pin(self.build_from_config(
|
|
config,
|
|
cwd,
|
|
home,
|
|
resume_from,
|
|
test_env,
|
|
environment_manager,
|
|
))
|
|
.await
|
|
}
|
|
|
|
async fn build_from_config(
|
|
&mut self,
|
|
config: Config,
|
|
cwd: Arc<TempDir>,
|
|
home: Arc<TempDir>,
|
|
resume_from: Option<PathBuf>,
|
|
test_env: TestEnv,
|
|
environment_manager: Arc<codex_exec_server::EnvironmentManager>,
|
|
) -> anyhow::Result<TestCodex> {
|
|
let auth = self.auth.clone();
|
|
let state_db = codex_core::init_state_db(&config).await;
|
|
let thread_manager = if config.model_catalog.is_some() {
|
|
ThreadManager::new(
|
|
&config,
|
|
codex_core::test_support::auth_manager_from_auth(auth.clone()),
|
|
SessionSource::Exec,
|
|
Arc::clone(&environment_manager),
|
|
/*analytics_events_client*/ None,
|
|
thread_store_from_config(&config, state_db.clone()),
|
|
state_db.clone(),
|
|
)
|
|
} else {
|
|
codex_core::test_support::thread_manager_with_models_provider_home_and_state(
|
|
auth.clone(),
|
|
config.model_provider.clone(),
|
|
config.codex_home.to_path_buf(),
|
|
Arc::clone(&environment_manager),
|
|
state_db.clone(),
|
|
)
|
|
};
|
|
let thread_manager = Arc::new(thread_manager);
|
|
let user_shell_override = self.user_shell_override.clone();
|
|
|
|
let new_conversation = match (resume_from, user_shell_override) {
|
|
(Some(path), Some(user_shell_override)) => {
|
|
let auth_manager = codex_core::test_support::auth_manager_from_auth(auth);
|
|
Box::pin(
|
|
codex_core::test_support::resume_thread_from_rollout_with_user_shell_override(
|
|
thread_manager.as_ref(),
|
|
config.clone(),
|
|
path,
|
|
auth_manager,
|
|
user_shell_override,
|
|
),
|
|
)
|
|
.await?
|
|
}
|
|
(Some(path), None) => {
|
|
let auth_manager = codex_core::test_support::auth_manager_from_auth(auth);
|
|
Box::pin(thread_manager.resume_thread_from_rollout(
|
|
config.clone(),
|
|
path,
|
|
auth_manager,
|
|
/*parent_trace*/ None,
|
|
))
|
|
.await?
|
|
}
|
|
(None, Some(user_shell_override)) => {
|
|
Box::pin(
|
|
codex_core::test_support::start_thread_with_user_shell_override(
|
|
thread_manager.as_ref(),
|
|
config.clone(),
|
|
user_shell_override,
|
|
),
|
|
)
|
|
.await?
|
|
}
|
|
(None, None) => Box::pin(thread_manager.start_thread(config.clone())).await?,
|
|
};
|
|
|
|
Ok(TestCodex {
|
|
home,
|
|
cwd,
|
|
config,
|
|
codex: new_conversation.thread,
|
|
session_configured: new_conversation.session_configured,
|
|
thread_manager,
|
|
_test_env: test_env,
|
|
})
|
|
}
|
|
|
|
async fn prepare_config(
|
|
&mut self,
|
|
base_url: String,
|
|
home: &TempDir,
|
|
cwd_override: AbsolutePathBuf,
|
|
) -> anyhow::Result<(Config, Arc<TempDir>)> {
|
|
let model_provider = ModelProviderInfo {
|
|
base_url: Some(base_url),
|
|
// Most core tests use SSE-only mock servers, so keep websocket transport off unless
|
|
// a test explicitly opts into websocket coverage.
|
|
supports_websockets: false,
|
|
..built_in_model_providers(/*openai_base_url*/ None)["openai"].clone()
|
|
};
|
|
let cwd = Arc::new(TempDir::new()?);
|
|
for hook in self.pre_build_hooks.drain(..) {
|
|
hook(home.path());
|
|
}
|
|
let mut config = if let Some(cloud_requirements) = self.cloud_requirements.take() {
|
|
load_default_config_for_test_with_cloud_requirements(home, cloud_requirements).await
|
|
} else {
|
|
load_default_config_for_test(home).await
|
|
};
|
|
config.cwd = cwd_override;
|
|
config.model_provider = model_provider;
|
|
if let Ok(path) = codex_utils_cargo_bin::cargo_bin("codex") {
|
|
config.codex_self_exe = Some(path);
|
|
} else if let Ok(path) = codex_utils_cargo_bin::cargo_bin("codex-exec") {
|
|
// `codex-exec` also supports `--codex-run-as-apply-patch`, so use it
|
|
// when the multitool binary is not available in test builds.
|
|
config.codex_self_exe = Some(path);
|
|
} else if let Ok(exe) = std::env::current_exe()
|
|
&& let Some(bin_dir) = exe.parent().and_then(|parent| parent.parent())
|
|
{
|
|
let codex = bin_dir.join("codex");
|
|
let codex_exec = bin_dir.join("codex-exec");
|
|
if codex.is_file() {
|
|
config.codex_self_exe = Some(codex);
|
|
} else if codex_exec.is_file() {
|
|
config.codex_self_exe = Some(codex_exec);
|
|
}
|
|
}
|
|
|
|
let mut mutators = vec![];
|
|
swap(&mut self.config_mutators, &mut mutators);
|
|
for mutator in mutators {
|
|
mutator(&mut config);
|
|
}
|
|
ensure_test_model_catalog(&mut config)?;
|
|
|
|
if config.include_apply_patch_tool {
|
|
config.features.enable(Feature::ApplyPatchFreeform)?;
|
|
} else {
|
|
config.features.disable(Feature::ApplyPatchFreeform)?;
|
|
}
|
|
|
|
Ok((config, cwd))
|
|
}
|
|
}
|
|
|
|
fn ensure_test_model_catalog(config: &mut Config) -> Result<()> {
|
|
if config.model.as_deref() != Some(TEST_MODEL_WITH_EXPERIMENTAL_TOOLS)
|
|
|| config.model_catalog.is_some()
|
|
{
|
|
return Ok(());
|
|
}
|
|
|
|
let bundled_models = bundled_models_response()
|
|
.unwrap_or_else(|err| panic!("bundled models.json should parse: {err}"));
|
|
let mut model = bundled_models
|
|
.models
|
|
.iter()
|
|
.find(|candidate| candidate.slug == "gpt-5.2")
|
|
.cloned()
|
|
.unwrap_or_else(|| panic!("missing bundled model gpt-5.2"));
|
|
model.slug = TEST_MODEL_WITH_EXPERIMENTAL_TOOLS.to_string();
|
|
model.display_name = TEST_MODEL_WITH_EXPERIMENTAL_TOOLS.to_string();
|
|
model.experimental_supported_tools = vec!["test_sync_tool".to_string()];
|
|
config.model_catalog = Some(ModelsResponse {
|
|
models: vec![model],
|
|
});
|
|
Ok(())
|
|
}
|
|
|
|
pub struct TestCodex {
|
|
pub home: Arc<TempDir>,
|
|
pub cwd: Arc<TempDir>,
|
|
pub codex: Arc<CodexThread>,
|
|
pub session_configured: SessionConfiguredEvent,
|
|
pub config: Config,
|
|
pub thread_manager: Arc<ThreadManager>,
|
|
_test_env: TestEnv,
|
|
}
|
|
|
|
impl TestCodex {
|
|
pub fn cwd_path(&self) -> &Path {
|
|
self.cwd.path()
|
|
}
|
|
|
|
pub fn codex_home_path(&self) -> &Path {
|
|
self.config.codex_home.as_path()
|
|
}
|
|
|
|
pub fn workspace_path(&self, rel: impl AsRef<Path>) -> PathBuf {
|
|
self.cwd_path().join(rel)
|
|
}
|
|
|
|
pub fn executor_environment(&self) -> &TestEnv {
|
|
&self._test_env
|
|
}
|
|
|
|
pub fn fs(&self) -> Arc<dyn ExecutorFileSystem> {
|
|
self._test_env.environment().get_filesystem()
|
|
}
|
|
|
|
pub async fn submit_turn(&self, prompt: &str) -> Result<()> {
|
|
self.submit_turn_with_permission_profile(prompt, PermissionProfile::Disabled)
|
|
.await
|
|
}
|
|
|
|
pub async fn submit_turn_with_permission_profile(
|
|
&self,
|
|
prompt: &str,
|
|
permission_profile: PermissionProfile,
|
|
) -> Result<()> {
|
|
self.submit_turn_with_approval_and_permission_profile(
|
|
prompt,
|
|
AskForApproval::Never,
|
|
permission_profile,
|
|
)
|
|
.await
|
|
}
|
|
|
|
pub async fn submit_turn_with_policy(
|
|
&self,
|
|
prompt: &str,
|
|
sandbox_policy: SandboxPolicy,
|
|
) -> Result<()> {
|
|
self.submit_turn_with_policies(prompt, AskForApproval::Never, sandbox_policy)
|
|
.await
|
|
}
|
|
|
|
pub async fn submit_turn_with_service_tier(
|
|
&self,
|
|
prompt: &str,
|
|
service_tier: Option<ServiceTier>,
|
|
) -> Result<()> {
|
|
self.submit_turn_with_permission_profile_context(
|
|
prompt,
|
|
AskForApproval::Never,
|
|
PermissionProfile::Disabled,
|
|
Some(service_tier),
|
|
/*environments*/ None,
|
|
)
|
|
.await
|
|
}
|
|
|
|
pub async fn submit_turn_with_policies(
|
|
&self,
|
|
prompt: &str,
|
|
approval_policy: AskForApproval,
|
|
sandbox_policy: SandboxPolicy,
|
|
) -> Result<()> {
|
|
let permission_profile = PermissionProfile::from_legacy_sandbox_policy_for_cwd(
|
|
&sandbox_policy,
|
|
self.config.cwd.as_path(),
|
|
);
|
|
self.submit_turn_with_context(
|
|
prompt,
|
|
approval_policy,
|
|
permission_profile,
|
|
/*service_tier*/ None,
|
|
/*environments*/ None,
|
|
)
|
|
.await
|
|
}
|
|
|
|
pub async fn submit_turn_with_approval_and_permission_profile(
|
|
&self,
|
|
prompt: &str,
|
|
approval_policy: AskForApproval,
|
|
permission_profile: PermissionProfile,
|
|
) -> Result<()> {
|
|
self.submit_turn_with_permission_profile_context(
|
|
prompt,
|
|
approval_policy,
|
|
permission_profile,
|
|
/*service_tier*/ None,
|
|
/*environments*/ None,
|
|
)
|
|
.await
|
|
}
|
|
|
|
pub async fn submit_turn_with_environments(
|
|
&self,
|
|
prompt: &str,
|
|
environments: Option<Vec<TurnEnvironmentSelection>>,
|
|
) -> Result<()> {
|
|
self.submit_turn_with_permission_profile_context(
|
|
prompt,
|
|
AskForApproval::Never,
|
|
PermissionProfile::Disabled,
|
|
/*service_tier*/ None,
|
|
environments,
|
|
)
|
|
.await
|
|
}
|
|
|
|
async fn submit_turn_with_permission_profile_context(
|
|
&self,
|
|
prompt: &str,
|
|
approval_policy: AskForApproval,
|
|
permission_profile: PermissionProfile,
|
|
service_tier: Option<Option<ServiceTier>>,
|
|
environments: Option<Vec<TurnEnvironmentSelection>>,
|
|
) -> Result<()> {
|
|
self.submit_turn_with_context(
|
|
prompt,
|
|
approval_policy,
|
|
permission_profile,
|
|
service_tier,
|
|
environments,
|
|
)
|
|
.await
|
|
}
|
|
|
|
async fn submit_turn_with_context(
|
|
&self,
|
|
prompt: &str,
|
|
approval_policy: AskForApproval,
|
|
permission_profile: PermissionProfile,
|
|
service_tier: Option<Option<ServiceTier>>,
|
|
environments: Option<Vec<TurnEnvironmentSelection>>,
|
|
) -> Result<()> {
|
|
let (sandbox_policy, permission_profile) =
|
|
turn_permission_fields(permission_profile, self.config.cwd.as_path());
|
|
let session_model = self.session_configured.model.clone();
|
|
self.codex
|
|
.submit(Op::UserTurn {
|
|
environments,
|
|
items: vec![UserInput::Text {
|
|
text: prompt.into(),
|
|
text_elements: Vec::new(),
|
|
}],
|
|
final_output_json_schema: None,
|
|
cwd: self.config.cwd.to_path_buf(),
|
|
approval_policy,
|
|
approvals_reviewer: None,
|
|
sandbox_policy,
|
|
permission_profile,
|
|
model: session_model,
|
|
effort: None,
|
|
summary: None,
|
|
service_tier,
|
|
collaboration_mode: None,
|
|
personality: None,
|
|
})
|
|
.await?;
|
|
|
|
let turn_id = wait_for_event_match(&self.codex, |event| match event {
|
|
EventMsg::TurnStarted(event) => Some(event.turn_id.clone()),
|
|
_ => None,
|
|
})
|
|
.await;
|
|
wait_for_event_with_timeout(
|
|
&self.codex,
|
|
|event| match event {
|
|
EventMsg::TurnComplete(event) => event.turn_id == turn_id,
|
|
_ => false,
|
|
},
|
|
SUBMIT_TURN_COMPLETE_TIMEOUT,
|
|
)
|
|
.await;
|
|
Ok(())
|
|
}
|
|
}
|
|
|
|
pub struct TestCodexHarness {
|
|
server: MockServer,
|
|
test: TestCodex,
|
|
}
|
|
|
|
impl TestCodexHarness {
|
|
pub async fn new() -> Result<Self> {
|
|
Self::with_builder(test_codex()).await
|
|
}
|
|
|
|
pub async fn with_config(mutator: impl FnOnce(&mut Config) + Send + 'static) -> Result<Self> {
|
|
Self::with_builder(test_codex().with_config(mutator)).await
|
|
}
|
|
|
|
pub async fn with_builder(mut builder: TestCodexBuilder) -> Result<Self> {
|
|
let server = start_mock_server().await;
|
|
let test = builder.build(&server).await?;
|
|
Ok(Self { server, test })
|
|
}
|
|
|
|
pub async fn with_remote_aware_builder(mut builder: TestCodexBuilder) -> Result<Self> {
|
|
let server = start_mock_server().await;
|
|
let test = builder.build_remote_aware(&server).await?;
|
|
Ok(Self { server, test })
|
|
}
|
|
|
|
pub fn server(&self) -> &MockServer {
|
|
&self.server
|
|
}
|
|
|
|
pub fn test(&self) -> &TestCodex {
|
|
&self.test
|
|
}
|
|
|
|
pub fn cwd(&self) -> &Path {
|
|
self.test.config.cwd.as_path()
|
|
}
|
|
|
|
pub fn path(&self, rel: impl AsRef<Path>) -> PathBuf {
|
|
self.path_abs(rel).into_path_buf()
|
|
}
|
|
|
|
pub fn path_abs(&self, rel: impl AsRef<Path>) -> AbsolutePathBuf {
|
|
self.test.config.cwd.join(rel)
|
|
}
|
|
|
|
pub async fn write_file(
|
|
&self,
|
|
rel: impl AsRef<Path>,
|
|
contents: impl AsRef<[u8]>,
|
|
) -> Result<()> {
|
|
let abs_path = self.path_abs(rel);
|
|
if let Some(parent) = abs_path.parent() {
|
|
self.test
|
|
.fs()
|
|
.create_directory(
|
|
&parent,
|
|
CreateDirectoryOptions { recursive: true },
|
|
/*sandbox*/ None,
|
|
)
|
|
.await?;
|
|
}
|
|
self.test
|
|
.fs()
|
|
.write_file(&abs_path, contents.as_ref().to_vec(), /*sandbox*/ None)
|
|
.await?;
|
|
Ok(())
|
|
}
|
|
|
|
pub async fn read_file_text(&self, rel: impl AsRef<Path>) -> Result<String> {
|
|
Ok(self
|
|
.test
|
|
.fs()
|
|
.read_file_text(&self.path_abs(rel), /*sandbox*/ None)
|
|
.await?)
|
|
}
|
|
|
|
pub async fn create_dir_all(&self, rel: impl AsRef<Path>) -> Result<()> {
|
|
self.test
|
|
.fs()
|
|
.create_directory(
|
|
&self.path_abs(rel),
|
|
CreateDirectoryOptions { recursive: true },
|
|
/*sandbox*/ None,
|
|
)
|
|
.await?;
|
|
Ok(())
|
|
}
|
|
|
|
pub async fn path_exists(&self, rel: impl AsRef<Path>) -> Result<bool> {
|
|
self.abs_path_exists(&self.path_abs(rel)).await
|
|
}
|
|
|
|
pub async fn remove_abs_path(&self, path: &AbsolutePathBuf) -> Result<()> {
|
|
self.test
|
|
.fs()
|
|
.remove(
|
|
path,
|
|
RemoveOptions {
|
|
recursive: false,
|
|
force: true,
|
|
},
|
|
/*sandbox*/ None,
|
|
)
|
|
.await?;
|
|
Ok(())
|
|
}
|
|
|
|
pub async fn abs_path_exists(&self, path: &AbsolutePathBuf) -> Result<bool> {
|
|
match self.test.fs().get_metadata(path, /*sandbox*/ None).await {
|
|
Ok(_) => Ok(true),
|
|
Err(err) if err.kind() == ErrorKind::NotFound => Ok(false),
|
|
Err(err) => Err(err.into()),
|
|
}
|
|
}
|
|
|
|
pub async fn submit(&self, prompt: &str) -> Result<()> {
|
|
// Box the submit-and-wait path so callers do not inline the full turn
|
|
// future into their own async state.
|
|
Box::pin(self.test.submit_turn(prompt)).await
|
|
}
|
|
|
|
pub async fn submit_with_policy(
|
|
&self,
|
|
prompt: &str,
|
|
sandbox_policy: SandboxPolicy,
|
|
) -> Result<()> {
|
|
self.test
|
|
.submit_turn_with_policy(prompt, sandbox_policy)
|
|
.await
|
|
}
|
|
|
|
pub async fn submit_with_permission_profile(
|
|
&self,
|
|
prompt: &str,
|
|
permission_profile: PermissionProfile,
|
|
) -> Result<()> {
|
|
self.test
|
|
.submit_turn_with_permission_profile(prompt, permission_profile)
|
|
.await
|
|
}
|
|
|
|
pub async fn request_bodies(&self) -> Vec<Value> {
|
|
let path_matcher = path_regex(".*/responses$");
|
|
self.server
|
|
.received_requests()
|
|
.await
|
|
.expect("mock server should not fail")
|
|
.into_iter()
|
|
.filter(|req| path_matcher.matches(req))
|
|
.map(|req| {
|
|
req.body_json::<Value>()
|
|
.expect("request body to be valid JSON")
|
|
})
|
|
.collect()
|
|
}
|
|
|
|
pub async fn function_call_output_value(&self, call_id: &str) -> Value {
|
|
let bodies = self.request_bodies().await;
|
|
function_call_output(&bodies, call_id).clone()
|
|
}
|
|
|
|
pub async fn function_call_stdout(&self, call_id: &str) -> String {
|
|
self.function_call_output_value(call_id)
|
|
.await
|
|
.get("output")
|
|
.and_then(Value::as_str)
|
|
.expect("output string")
|
|
.to_string()
|
|
}
|
|
|
|
pub async fn custom_tool_call_output(&self, call_id: &str) -> String {
|
|
let bodies = self.request_bodies().await;
|
|
custom_tool_call_output_text(&bodies, call_id)
|
|
}
|
|
|
|
pub async fn apply_patch_output(
|
|
&self,
|
|
call_id: &str,
|
|
output_type: ApplyPatchModelOutput,
|
|
) -> String {
|
|
// Box the awaited output helpers so callers do not inline request
|
|
// capture and response parsing into their own async state.
|
|
match output_type {
|
|
ApplyPatchModelOutput::Freeform => {
|
|
Box::pin(self.custom_tool_call_output(call_id)).await
|
|
}
|
|
ApplyPatchModelOutput::Function
|
|
| ApplyPatchModelOutput::Shell
|
|
| ApplyPatchModelOutput::ShellViaHeredoc
|
|
| ApplyPatchModelOutput::ShellCommandViaHeredoc => {
|
|
Box::pin(self.function_call_stdout(call_id)).await
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
fn custom_tool_call_output<'a>(bodies: &'a [Value], call_id: &str) -> &'a Value {
|
|
for body in bodies {
|
|
if let Some(items) = body.get("input").and_then(Value::as_array) {
|
|
for item in items {
|
|
if item.get("type").and_then(Value::as_str) == Some("custom_tool_call_output")
|
|
&& item.get("call_id").and_then(Value::as_str) == Some(call_id)
|
|
{
|
|
return item;
|
|
}
|
|
}
|
|
}
|
|
}
|
|
panic!("custom_tool_call_output {call_id} not found");
|
|
}
|
|
|
|
fn custom_tool_call_output_text(bodies: &[Value], call_id: &str) -> String {
|
|
let output = custom_tool_call_output(bodies, call_id)
|
|
.get("output")
|
|
.unwrap_or_else(|| panic!("custom_tool_call_output {call_id} missing output"));
|
|
output_value_to_text(output)
|
|
.unwrap_or_else(|| panic!("custom_tool_call_output {call_id} missing text output"))
|
|
}
|
|
|
|
fn function_call_output<'a>(bodies: &'a [Value], call_id: &str) -> &'a Value {
|
|
for body in bodies {
|
|
if let Some(items) = body.get("input").and_then(Value::as_array) {
|
|
for item in items {
|
|
if item.get("type").and_then(Value::as_str) == Some("function_call_output")
|
|
&& item.get("call_id").and_then(Value::as_str) == Some(call_id)
|
|
{
|
|
return item;
|
|
}
|
|
}
|
|
}
|
|
}
|
|
panic!("function_call_output {call_id} not found");
|
|
}
|
|
|
|
pub fn test_codex() -> TestCodexBuilder {
|
|
TestCodexBuilder {
|
|
config_mutators: vec![],
|
|
auth: CodexAuth::from_api_key("dummy"),
|
|
pre_build_hooks: vec![],
|
|
workspace_setups: vec![],
|
|
home: None,
|
|
cloud_requirements: None,
|
|
user_shell_override: None,
|
|
exec_server_url: None,
|
|
}
|
|
}
|
|
|
|
#[cfg(test)]
|
|
mod tests {
|
|
use super::*;
|
|
use pretty_assertions::assert_eq;
|
|
use serde_json::json;
|
|
|
|
#[test]
|
|
fn custom_tool_call_output_text_returns_output_text() {
|
|
let bodies = vec![json!({
|
|
"input": [{
|
|
"type": "custom_tool_call_output",
|
|
"call_id": "call-1",
|
|
"output": "hello"
|
|
}]
|
|
})];
|
|
|
|
assert_eq!(custom_tool_call_output_text(&bodies, "call-1"), "hello");
|
|
}
|
|
|
|
#[test]
|
|
#[should_panic(expected = "custom_tool_call_output call-2 missing output")]
|
|
fn custom_tool_call_output_text_panics_when_output_is_missing() {
|
|
let bodies = vec![json!({
|
|
"input": [{
|
|
"type": "custom_tool_call_output",
|
|
"call_id": "call-2"
|
|
}]
|
|
})];
|
|
|
|
let _ = custom_tool_call_output_text(&bodies, "call-2");
|
|
}
|
|
}
|