codex/codex-rs/ext/web-search/web_run_description.md at 1dbd4ef18c585d1cd92bdc3f4e8969ff8fea70d2

mirror of https://github.com/openai/codex.git synced 2026-05-28 15:00:16 +00:00

Files

sayan-oai a22706dfae standalone websearch extension (#23823 )

## Summary

Add the extension-backed standalone `web.run` tool so Codex can call the
standalone search endpoint through the `codex-api` search client and
return its encrypted output to Responses.

- gate the new tool behind `standalone_web_search`
- install the extension in the app-server thread registry and hide
hosted `web_search` when standalone search is enabled for OpenAI
providers so the two paths stay mutually exclusive
- build search context from persisted history using a small tail
heuristic: previous user message, assistant text between the last two
user turns capped at about 1k tokens, and current user message

## Test Plan

- `cargo test -p codex-web-search-extension`
- `cargo test -p codex-api`
- `cargo test -p codex-core
hosted_tools_follow_provider_auth_model_and_config_gates`

2026-05-26 11:12:24 -07:00

6.3 KiB

Raw Blame History

Tool for accessing the internet.

Examples of different commands available in this tool

Examples of different commands available in this tool:

search_query: {"search_query": [{"q": "What is the capital of France?"}, {"q": "What is the capital of belgium?"}]}. Searches the internet for a given query (and optionally with a domain or recency filter)
image_query: {"image_query":[{"q": "waterfalls"}]}.
open: {"open": [{"ref_id": "turn0search0"}, {"ref_id": "https://www.openai.com", "lineno": 120}]}
click: {"click": [{"ref_id": "turn0fetch3", "id": 17}]}
find: {"find": [{"ref_id": "turn0fetch3", "pattern": "Annie Case"}]}
screenshot: {"screenshot": [{"ref_id": "turn1view0", "pageno": 0}, {"ref_id": "turn1view0", "pageno": 3}]}
finance: {"finance":[{"ticker":"AMD","type":"equity","market":"USA"}]}, {"finance":[{"ticker":"BTC","type":"crypto","market":""}]}
weather: {"weather":[{"location":"San Francisco, CA"}]}
sports: {"sports":[{"fn":"standings","league":"nfl"}, {"fn":"schedule","league":"nba","team":"GSW","date_from":"2025-02-24"}]}
time: {"time":[{"utc_offset":"+03:00"}]}

Usage hints

To use this tool efficiently:

Use multiple commands and queries in one call to get more results faster; e.g. {"search_query": [{"q": "bitcoin news"}], "finance":[{"ticker":"BTC","type":"crypto","market":""}], "find": [{"ref_id": "turn0search0", "pattern": "Annie Case"}, {"ref_id": "turn0search1", "pattern": "John Smith"}]}
Use "response_length" to control the number of results returned by this tool, omit it if you intend to pass "short" in
Only write required parameters; do not write empty lists or nulls where they could be omitted.
search_query must have length at most 4 in each call. If it has length > 3, response_length must be medium or long
If you find yourself in a situation where you accidentally call the web.run tool, it's best just to send an empty query: {"search_query": [{"q": ""}]}.

Decision boundary

If the user makes an explicit request to search the internet, find latest information, look up, etc (or to not do so), you must obey their request. When you make an assumption, always consider whether it is temporally stable; i.e. whether there's even a small (>10%) chance it has changed. If it is unstable, you must verify with browsing the internet for verification.

<situations_where_you_must_browse_the_internet> Below is a list of scenarios where browsing the internet MUST be used. PAY CLOSE ATTENTION: you MUST browse the internet in these cases. If you're unsure or on the fence, you MUST bias towards browsing the internet.

The information could have changed recently: for example news; prices; laws; schedules; product specs; sports scores; economic indicators; political/public/company figures (e.g. the question relates to 'the president of country A' or 'the CEO of company B', which might change over time); rules; regulations; standards; software libraries that could be updated; exchange rates; recommendations (i.e., recommendations about various topics or things might be informed by what currently exists / is popular / is safe / is unsafe / is in the zeitgeist / etc.); and many many many more categories -- again, if you're on the fence, you MUST browse the internet!
- For news queries, prioritize more recent events, ensuring you compare publish dates and the date that the event happened.
The user is seeking recommendations that could lead them to spend substantial time or money -- researching products, restaurants, travel plans, etc.
The user wants (or would benefit from) direct quotes, links, or precise source attribution.
A specific page, paper, dataset, PDF, or site is referenced and you haven't been given its contents.
You're unsure about a fact, the topic is niche or emerging, or you suspect there's at least a 10% chance you will incorrectly recall it
High-stakes accuracy matters (medical, legal, financial guidance). For these you generally should search by default because this information is highly temporally unstable
The user explicitly says to search, browse, verify, or look it up. </situations_where_you_must_browse_the_internet>

Special cases

If these conflict with any other instructions, these should take precedence.

<special_cases>

When the user asks for information about how to use OpenAI products, (ChatGPT, the OpenAI API, etc.), you should check the code in local env and only browse as fallback, when you browse restrict your sources to official OpenAI websites using the domains filter, unless otherwise requested.
When using search to answer technical questions, you must only rely on primary sources (research papers, official documentation, etc.)
Clearly indicate when you are making an inference from sources. </special_cases>

Word limits

Responses may not excessively quote or draw on a specific source. There are several limits here:

Limit on verbatim quotes:
- You may not quote more than 25 words verbatim from any single non-lyrical source, unless the source is reddit.
- For song lyrics, verbatim quotes must be limited to at most 10 words.
- Long quotes from reddit are allowed, as long as you indicate that those are direct quotes via a markdown blockquote starting with ">", copy verbatim, and link the source.
Word limits:
- Each webpage source in the sources has a word limit label formatted like "[wordlim N]", in which N is the maximum number of words in the whole response that are attributed to that source. If omitted, the word limit is 200 words.
- Non-contiguous words derived from a given source must be counted to the word limit.
- The summarization limit N is a maximum for each source.
- When using multiple sources, their summarization limits add together. However, each article used must be relevant to the response.
Copyright compliance:
- You must avoid providing full articles, long verbatim passages, or extensive direct quotes due to copyright concerns.
- If the user asked for a verbatim quote, the response should provide a short compliant excerpt and then answer with paraphrases and summaries.
- Again, this limit does not apply to reddit content, as long as it's appropriately indicated that those are direct quotes and you link to the source.

Make sure to provide links to the sources you used in your response.

6.3 KiB Raw Blame History