gemini-cli/docs/cli/model-routing.md at 26f050ff10dd90a8c4e6e125effe79d0272aa474


mirror of https://github.com/google-gemini/gemini-cli.git synced 2026-02-01 22:48:03 +00:00


David Huntsperger 26f050ff10 Updated ToC on docs intro; updated title casing to match Google style (#13717 )

2025-12-01 19:38:48 +00:00


Model routing

Gemini CLI includes a model routing feature that automatically switches to a fallback model in case of a model failure. This feature is enabled by default and provides resilience when the primary model is unavailable.

How it works

Model routing is a fallback mechanism; it does not choose a model based on prompt complexity. The process has four steps:

1. Model failure: If the currently selected model fails to respond (for example, due to a server error or other issue), the CLI initiates the fallback process.
2. User consent: The CLI asks whether you want to switch to the fallback model. This is handled by the fallbackModelHandler.
3. Fallback activation: If you consent, the CLI activates fallback mode by calling config.setFallbackMode(true).
4. Model switch: On the next request, the CLI uses the DEFAULT_GEMINI_FLASH_MODEL as the fallback model. This is handled by the resolveModel function in packages/cli/src/zed-integration/zedIntegration.ts, which checks whether isInFallbackMode() is true.
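
The steps above can be sketched roughly as follows. This is an illustrative model of the flow, not the CLI's actual implementation: the SketchConfig class, handleModelFailure, resolveModelSketch, the synchronous askConsent callback, and both model-name strings are assumptions made for the example. Only setFallbackMode and isInFallbackMode mirror names mentioned in the text.

```typescript
// Placeholder model names; the real CLI uses DEFAULT_GEMINI_FLASH_MODEL as the fallback.
const PRIMARY_MODEL = "primary-model";
const FALLBACK_MODEL = "fallback-flash-model";

// Hypothetical stand-in for the CLI's config object.
class SketchConfig {
  private fallbackMode = false;
  private selectedModel: string;

  constructor(selectedModel: string) {
    this.selectedModel = selectedModel;
  }

  setFallbackMode(active: boolean): void {
    this.fallbackMode = active;
  }

  isInFallbackMode(): boolean {
    return this.fallbackMode;
  }

  getSelectedModel(): string {
    return this.selectedModel;
  }
}

// Step 4: pick the model for the next request based on the fallback flag.
function resolveModelSketch(config: SketchConfig): string {
  return config.isInFallbackMode() ? FALLBACK_MODEL : config.getSelectedModel();
}

// Steps 1-3: called after a model failure. Asks for consent and, if given,
// activates fallback mode. Returns true only when fallback was newly activated.
// The consent callback is synchronous here purely to keep the sketch short.
function handleModelFailure(
  config: SketchConfig,
  askConsent: (fallbackModel: string) => boolean,
): boolean {
  if (config.isInFallbackMode()) {
    return false; // already on the fallback model; nothing further to switch to
  }
  if (!askConsent(FALLBACK_MODEL)) {
    return false; // user declined; keep the selected model
  }
  config.setFallbackMode(true);
  return true;
}
```

In this sketch, declining the prompt leaves the selected model in place, while consenting changes only what resolveModelSketch returns for later requests. Whether the real CLI also retries the failed request is not stated here, so the sketch does not retry.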

Model selection precedence

The model used by Gemini CLI is determined by the following order of precedence:

1. --model command-line flag: A model specified with the --model flag when launching the CLI is always used.
2. GEMINI_MODEL environment variable: If the --model flag is not used, the CLI uses the model specified in the GEMINI_MODEL environment variable.
3. model.name in settings.json: If neither of the above is set, the CLI uses the model specified in the model.name property of your settings.json file.
4. Default model: If none of the above is set, the CLI uses the default model, which is auto.
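
The precedence order can be expressed as a first-match lookup. The following is a minimal sketch under stated assumptions: the ModelSources interface, the selectModel function, and the choice to treat empty strings as unset are all hypothetical and are not taken from the CLI's source. Only the order of the sources and the auto default come from the list above.

```typescript
// Hypothetical container for the three user-controlled sources.
interface ModelSources {
  cliFlag?: string; // value passed with --model
  envVar?: string; // value of GEMINI_MODEL
  settingsModelName?: string; // model.name from settings.json
}

// The documented default when nothing else is set.
const DEFAULT_MODEL = "auto";

// Returns the first source that is set, checked in precedence order.
// Treating blank strings as unset is an assumption of this sketch.
function selectModel(sources: ModelSources): string {
  const candidates = [
    sources.cliFlag,
    sources.envVar,
    sources.settingsModelName,
  ];
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate;
    }
  }
  return DEFAULT_MODEL;
}
```

For example, selectModel({ envVar: "model-from-env", settingsModelName: "model-from-settings" }) returns "model-from-env", because the environment variable outranks settings.json when no flag is given.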
