Mirror of https://github.com/openai/codex.git (synced 2026-04-24 22:54:54 +00:00)
## 🐛 Problem

Users running commands with non-ASCII characters (like Russian text "пример") in Windows/WSL environments experience garbled text in VSCode's shell preview window, with Unicode replacement characters (�) appearing instead of the actual text.

**Issue**: https://github.com/openai/codex/issues/6178

## 🔧 Root Cause

The issue was in `StreamOutput<Vec<u8>>::from_utf8_lossy()` method in `codex-rs/core/src/exec.rs`, which used `String::from_utf8_lossy()` to convert shell output bytes to strings. This function immediately replaces any invalid UTF-8 byte sequences with replacement characters, without attempting to decode using other common encodings.

In Windows/WSL environments, shell output often uses encodings like:

- Windows-1252 (common Windows encoding)
- Latin-1/ISO-8859-1 (extended ASCII)

## 🛠️ Solution

Replaced the simple `String::from_utf8_lossy()` call with intelligent encoding detection via a new `bytes_to_string_smart()` function that tries multiple encoding strategies:

1. **UTF-8** (fast path for valid UTF-8)
2. **Windows-1252** (handles Windows-specific characters in 0x80-0x9F range)
3. **Latin-1** (fallback for extended ASCII)
4. **Lossy UTF-8** (final fallback, same as before)

## 📁 Changes

### New Files

- `codex-rs/core/src/text_encoding.rs` - Smart encoding detection module
- `codex-rs/core/tests/suite/text_encoding_fix.rs` - Integration tests

### Modified Files

- `codex-rs/core/src/lib.rs` - Added text_encoding module
- `codex-rs/core/src/exec.rs` - Updated StreamOutput::from_utf8_lossy()
- `codex-rs/core/tests/suite/mod.rs` - Registered new test module

## ✅ Testing

- **5 unit tests** covering UTF-8, Windows-1252, Latin-1, and fallback scenarios
- **2 integration tests** simulating the exact Issue #6178 scenario
- **Demonstrates improvement** over the previous `String::from_utf8_lossy()` approach

All tests pass:

```bash
cargo test -p codex-core text_encoding
cargo test -p codex-core test_shell_output_encoding_issue_6178
```

## 🎯 Impact

- ✅ **Eliminates garbled text** in VSCode shell preview for non-ASCII content
- ✅ **Supports Windows/WSL environments** with proper encoding detection
- ✅ **Zero performance impact** for UTF-8 text (fast path)
- ✅ **Backward compatible** - UTF-8 content works exactly as before
- ✅ **Handles edge cases** with robust fallback mechanism

## 🧪 Test Scenarios

The fix has been tested with:

- Russian text ("пример")
- Windows-1252 quotation marks (“test”)
- Latin-1 accented characters ("café")
- Mixed encoding content
- Invalid byte sequences (graceful fallback)

## 📋 Checklist

- [X] Addresses the reported issue
- [X] Includes comprehensive tests
- [X] Maintains backward compatibility
- [X] Follows project coding conventions
- [X] No breaking changes

---------

Co-authored-by: Josh McKinney <joshka@openai.com>
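The fallback order listed under Solution can be illustrated with a small standard-library-only sketch. This is not the `bytes_to_string_smart()` function from the change itself; the names `decode_shell_output`, `decode_windows_1252`, and `CP1252_HIGH` are hypothetical, and the hand-written table stands in for whatever decoding the real module uses. The sketch assumes a strict Windows-1252 step that rejects the five bytes the code page leaves unassigned. Because a Latin-1 decode accepts every byte, the lossy UTF-8 step is never reached in this version and is left out.

```rust
/// Windows-1252 mappings for bytes 0x80..=0x9F. `None` marks the five
/// bytes that the code page leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D).
const CP1252_HIGH: [Option<char>; 32] = [
    Some('\u{20AC}'), None, Some('\u{201A}'), Some('\u{0192}'),
    Some('\u{201E}'), Some('\u{2026}'), Some('\u{2020}'), Some('\u{2021}'),
    Some('\u{02C6}'), Some('\u{2030}'), Some('\u{0160}'), Some('\u{2039}'),
    Some('\u{0152}'), None, Some('\u{017D}'), None,
    None, Some('\u{2018}'), Some('\u{2019}'), Some('\u{201C}'),
    Some('\u{201D}'), Some('\u{2022}'), Some('\u{2013}'), Some('\u{2014}'),
    Some('\u{02DC}'), Some('\u{2122}'), Some('\u{0161}'), Some('\u{203A}'),
    Some('\u{0153}'), None, Some('\u{017E}'), Some('\u{0178}'),
];

/// Strict Windows-1252 decode (hypothetical helper): returns `None` if the
/// input contains a byte that Windows-1252 does not assign.
fn decode_windows_1252(bytes: &[u8]) -> Option<String> {
    bytes
        .iter()
        .map(|&b| match b {
            0x80..=0x9F => CP1252_HIGH[(b - 0x80) as usize],
            // Outside 0x80..=0x9F, Windows-1252 agrees with Latin-1.
            _ => Some(b as char),
        })
        .collect()
}

/// Hypothetical stand-in for the fallback chain described above.
fn decode_shell_output(bytes: &[u8]) -> String {
    // 1. Fast path: the bytes are already valid UTF-8.
    if let Ok(s) = std::str::from_utf8(bytes) {
        return s.to_owned();
    }
    // 2. Windows-1252, rejected if an unassigned byte appears.
    if let Some(s) = decode_windows_1252(bytes) {
        return s;
    }
    // 3. Latin-1: each byte maps to the code point with the same value,
    //    so this step cannot fail.
    bytes.iter().map(|&b| b as char).collect()
}
```

Under these assumptions, valid UTF-8 such as `"пример".as_bytes()` is returned unchanged, while the byte sequence `[0x93, b't', b'e', b's', b't', 0x94]` is not valid UTF-8 and decodes through the Windows-1252 step to the word `test` wrapped in curly quotation marks.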
62 lines
1.3 KiB
Rust
```rust
// Aggregates all former standalone integration tests as modules.
use codex_arg0::arg0_dispatch;
use ctor::ctor;
use tempfile::TempDir;

// This code runs before any other tests are run.
// It allows the test binary to behave like codex and dispatch to apply_patch and codex-linux-sandbox
// based on the arg0.
// NOTE: this doesn't work on ARM
#[ctor]
pub static CODEX_ALIASES_TEMP_DIR: TempDir = unsafe {
    #[allow(clippy::unwrap_used)]
    arg0_dispatch().unwrap()
};

#[cfg(not(target_os = "windows"))]
mod abort_tasks;
#[cfg(not(target_os = "windows"))]
mod apply_patch_cli;
#[cfg(not(target_os = "windows"))]
mod approvals;
mod auth_refresh;
mod cli_stream;
mod client;
mod codex_delegate;
mod compact;
mod compact_remote;
mod compact_resume_fork;
mod deprecation_notice;
mod exec;
mod exec_policy;
mod fork_conversation;
mod grep_files;
mod items;
mod json_result;
mod list_dir;
mod live_cli;
mod model_overrides;
mod model_tools;
mod otel;
mod prompt_caching;
mod quota_exceeded;
mod read_file;
mod resume;
mod review;
mod rmcp_client;
mod rollout_list_find;
mod seatbelt;
mod shell_serialization;
mod stream_error_allows_next_turn;
mod stream_no_completed;
mod text_encoding_fix;
mod tool_harness;
mod tool_parallelism;
mod tools;
mod truncation;
mod undo;
mod unified_exec;
mod user_notification;
mod user_shell_cmd;
mod view_image;
```
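The comment near the top of this file says the test binary dispatches to `apply_patch` and `codex-linux-sandbox` based on arg0. The idea of choosing behavior from the name a binary was invoked under can be sketched as below. This is only an illustration and does not reproduce `codex_arg0::arg0_dispatch`; the `Arg0Mode` enum and `mode_for_arg0` function are hypothetical names introduced here.

```rust
use std::path::Path;

/// Hypothetical set of behaviors selectable through the invoked name.
#[derive(Debug, PartialEq, Eq)]
enum Arg0Mode {
    ApplyPatch,
    LinuxSandbox,
    Main,
}

/// Picks a mode from the final path component of arg0 (hypothetical helper).
fn mode_for_arg0(arg0: &str) -> Arg0Mode {
    // Only the file name matters, so an alias reached through a temp
    // directory or a symlink resolves the same way as a bare name.
    let exe_name = Path::new(arg0)
        .file_name()
        .and_then(|name| name.to_str())
        .unwrap_or("");
    match exe_name {
        "apply_patch" => Arg0Mode::ApplyPatch,
        "codex-linux-sandbox" => Arg0Mode::LinuxSandbox,
        _ => Arg0Mode::Main,
    }
}
```

A real entry point would typically read the name from `std::env::args().next()` and pass it to a function like this before doing any other work.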