js_repl: block wrapped payload prefixes in grammar (#12300)

## Summary

Tighten the `js_repl` freeform Lark grammar to block the most common
malformed payload wrappers before they reach runtime validation.

## What Changed

- Replaced the overly permissive `js_repl` freeform grammar
(`start: /[\s\S]*/`) with a structured grammar that still supports:
  - plain JS source
  - optional first-line `// codex-js-repl:` pragma followed by JS source
- Added grammar-level filtering for common bad payload shapes by
rejecting inputs whose first significant token starts with:
  - `{` (JSON object wrapper like `{"code":"..."}`)
  - `"` (quoted code string)
  - `` ``` `` (markdown code fences)
- Implemented the grammar without regex lookahead/lookbehind because the
API-side Lark regex engine does not support look-around.
- Added a unit test to validate the grammar shape and guard against
reintroducing unsupported look-around (it asserts the definition contains
no `(?!`).
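
For reviewers who want to reason about the "first significant token" rule without a regex engine, here is a minimal sketch of the same check in plain Rust. It mirrors the `PLAIN_JS_SOURCE` / `JS_SOURCE` terminal pattern only, not the full Lark grammar. The function name `first_token_allowed` is hypothetical and not part of this change, and it assumes `\s` behaves like Rust's Unicode whitespace, which may not match the API-side engine exactly.

```rust
/// Hypothetical helper: hand-written equivalent of the terminal pattern
/// `(?:\s*)(?:[^\s{\"`]|`[^`]|``[^`])[\s\S]*`.
/// Assumes `\s` matches the same characters as `char::is_whitespace`.
fn first_token_allowed(src: &str) -> bool {
    // Skip leading whitespace, then inspect the first significant characters.
    let mut chars = src.trim_start().chars();
    match chars.next() {
        // Empty or whitespace-only input: the pattern needs one real character.
        None => false,
        // `{` looks like a JSON wrapper, `"` like a quoted code string.
        Some('{') | Some('"') => false,
        // One or two backticks are allowed when followed by a non-backtick
        // (for example a template literal); a third backtick opens a fence.
        Some('`') => match chars.next() {
            None => false,
            Some('`') => matches!(chars.next(), Some(c) if c != '`'),
            Some(_) => true,
        },
        // Any other first character is accepted.
        Some(_) => true,
    }
}
```

Under these assumptions, `first_token_allowed("console.log(1)")` returns `true`, while a JSON wrapper, a double-quoted string, or a leading markdown fence returns `false`.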

## Why

`js_repl` is a freeform tool, but the model sometimes emits wrapped
payloads (JSON, quoted strings, markdown fences) instead of raw
JavaScript. We already reject those at runtime, but this change moves
the constraint into the tool grammar so the model is less likely to
generate invalid tool-call payloads in the first place.

## Testing

- `cargo test -p codex-core js_repl_freeform_grammar_blocks_common_non_js_prefixes`
- `cargo test -p codex-core parse_freeform_args_rejects_`
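
The unit test in this change checks the grammar text for `(?!` only. If a broader guard were wanted, one possible sketch is a purely textual scan for all four look-around openers. The names `LOOKAROUND_OPENERS` and `find_lookaround` are hypothetical and not part of this change, and a textual scan would also flag an escaped `\(?=`, so treat it as a heuristic.

```rust
/// Hypothetical list of regex look-around openers to scan for:
/// positive/negative lookahead and positive/negative lookbehind.
const LOOKAROUND_OPENERS: [&str; 4] = ["(?=", "(?!", "(?<=", "(?<!"];

/// Returns the first look-around opener found in a grammar definition,
/// or `None` when the text contains none of them.
fn find_lookaround(definition: &str) -> Option<&'static str> {
    LOOKAROUND_OPENERS
        .iter()
        .copied()
        .find(|opener| definition.contains(*opener))
}
```

A test could then assert `find_lookaround(&format.definition).is_none()` instead of checking a single substring.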

## Notes

- This intentionally over-blocks a few uncommon valid JS starts (for
example top-level `{ ... }` blocks or top-level quoted directives like
`"use strict";`) in exchange for preventing the common wrapped-payload
mistakes.
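
To make the over-blocking concrete, the sketch below names which of the three blocked prefixes an input starts with. It is illustrative only: `blocked_prefix` is a hypothetical helper, it ignores edge cases such as empty input or a lone trailing backtick, and it assumes leading whitespace is skipped the same way as in the grammar.

```rust
/// Hypothetical helper: reports which blocked prefix (if any) the first
/// significant token of `src` starts with.
fn blocked_prefix(src: &str) -> Option<&'static str> {
    let s = src.trim_start();
    if s.starts_with('{') {
        Some("brace")
    } else if s.starts_with('"') {
        Some("double quote")
    } else if s.starts_with("```") {
        Some("code fence")
    } else {
        None
    }
}
```

With this helper, `"use strict";` and a top-level `{ let x = 1; }` block are both flagged, while forms such as `'use strict';` or `;{ let x = 1; }` start with a character outside the blocked set and are not.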



#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12300
-  `2` https://github.com/openai/codex/pull/12275
-  `3` https://github.com/openai/codex/pull/12205
-  `4` https://github.com/openai/codex/pull/12185
-  `5` https://github.com/openai/codex/pull/10673
This commit is contained in:
Curtis 'Fjord' Hawthorne
2026-02-20 10:47:07 -08:00
committed by GitHub
parent e8afaed502
commit 73fd939296


@@ -1068,7 +1068,24 @@ fn create_list_dir_tool() -> ToolSpec {
}
fn create_js_repl_tool() -> ToolSpec {
-const JS_REPL_FREEFORM_GRAMMAR: &str = r#"start: /[\s\S]*/"#;
+// Keep JS input freeform, but block the most common malformed payload shapes
+// (JSON wrappers, quoted strings, and markdown fences) before they reach the
+// runtime `reject_json_or_quoted_source` validation. The API's regex engine
+// does not support look-around, so this uses a "first significant token"
+// pattern rather than negative lookaheads.
+const JS_REPL_FREEFORM_GRAMMAR: &str = r#"
+start: pragma_source | plain_source
+pragma_source: PRAGMA_LINE NEWLINE js_source
+plain_source: PLAIN_JS_SOURCE
+js_source: JS_SOURCE
+PRAGMA_LINE: /[ \t]*\/\/ codex-js-repl:[^\r\n]*/
+NEWLINE: /\r?\n/
+PLAIN_JS_SOURCE: /(?:\s*)(?:[^\s{\"`]|`[^`]|``[^`])[\s\S]*/
+JS_SOURCE: /(?:\s*)(?:[^\s{\"`]|`[^`]|``[^`])[\s\S]*/
+"#;
ToolSpec::Freeform(FreeformTool {
name: "js_repl".to_string(),
@@ -1922,6 +1939,21 @@ mod tests {
assert_contains_tool_names(&tools, &["js_repl", "js_repl_reset"]);
}
+#[test]
+fn js_repl_freeform_grammar_blocks_common_non_js_prefixes() {
+let ToolSpec::Freeform(FreeformTool { format, .. }) = create_js_repl_tool() else {
+panic!("js_repl should use a freeform tool spec");
+};
+assert_eq!(format.syntax, "lark");
+assert!(format.definition.contains("PRAGMA_LINE"));
+assert!(format.definition.contains("`[^`]"));
+assert!(format.definition.contains("``[^`]"));
+assert!(format.definition.contains("PLAIN_JS_SOURCE"));
+assert!(format.definition.contains("codex-js-repl:"));
+assert!(!format.definition.contains("(?!"));
+}
fn assert_model_tools(
model_slug: &str,
features: &Features,