feat(llm): cache-policy auto-placement (#26786)

Kit Langton
2026-05-10 22:09:55 -04:00
committed by GitHub
parent ce66b191d1
commit 942630eb4a
13 changed files with 721 additions and 5 deletions

packages/llm/README.md

@@ -0,0 +1,130 @@
# @opencode-ai/llm
Schema-first LLM core for opencode. One typed request, response, event, and tool language; provider quirks live in adapters, not in calling code.
```ts
import { Effect } from "effect"
import { LLM, LLMClient } from "@opencode-ai/llm"
import { OpenAI } from "@opencode-ai/llm/providers"
const model = OpenAI.model("gpt-4o-mini", { apiKey: process.env.OPENAI_API_KEY })
const request = LLM.request({
  model,
  system: "You are concise.",
  prompt: "Say hello in one short sentence.",
  generation: { maxTokens: 40 },
})

const program = Effect.gen(function* () {
  const response = yield* LLMClient.generate(request)
  console.log(response.text)
})
```
Run `LLMClient.stream(request)` instead of `generate` when you want incremental `LLMEvent`s. The event stream is provider-neutral — same shape across OpenAI Chat, OpenAI Responses, Anthropic Messages, Gemini, Bedrock Converse, and any OpenAI-compatible deployment.
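A minimal streaming sketch, reusing `request` from the example above. It assumes `LLMEvent` is importable from the package root and that text events expose their incremental payload as `event.text`; both names are illustrative rather than confirmed API.
```ts
import { Effect, Stream } from "effect"
import { LLMClient, LLMEvent } from "@opencode-ai/llm"

// Filter the provider-neutral event stream down to text events and print
// each delta as it arrives.
const streamed = LLMClient.stream(request).pipe(
  Stream.filter(LLMEvent.is.text),
  Stream.runForEach((event) =>
    // `event.text` is an assumed field name for the incremental text payload.
    Effect.sync(() => process.stdout.write(event.text)),
  ),
)
```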
## Public API
- **`LLM.request({...})`** — build a provider-neutral `LLMRequest`. Accepts ergonomic inputs (`system: string`, `prompt: string`) that normalize into the canonical Schema classes.
- **`LLM.generate` / `LLM.stream`** — re-exported from `LLMClient` for one-import use.
- **`LLM.user(...)` / `LLM.assistant(...)` / `LLM.toolMessage(...)`** — message constructors.
- **`LLM.toolCall(...)` / `LLM.toolResult(...)` / `LLM.toolDefinition(...)`** — tool-related parts.
- **`LLMClient.prepare(request)`** — compile a request through protocol body construction, validation, and HTTP preparation without sending. Useful for inspection and testing.
- **`LLMEvent.is.*`** — typed guards (`is.text`, `is.toolCall`, `is.requestFinish`, …) for filtering streams.
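For example, `LLMClient.prepare` makes the compiled provider body easy to inspect without any network traffic. A sketch, reusing `model` from the quick-start; the prepared result exposes the validated wire body as `prepared.body`, the field the package's own tests read:
```ts
import { Effect } from "effect"
import { LLM, LLMClient } from "@opencode-ai/llm"

const inspect = Effect.gen(function* () {
  const prepared = yield* LLMClient.prepare(
    LLM.request({ model, system: "You are concise.", prompt: "hi" }),
  )
  // The validated provider body, built and checked but never sent.
  console.log(JSON.stringify(prepared.body, null, 2))
})
```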
## Caching
Prompt caching is unified across providers. Mark content with a `CacheHint` and each protocol translates it to its wire format (`cache_control` on Anthropic, `cachePoint` on Bedrock; OpenAI's implicit caching needs no markers).
### Auto placement
The simplest path is `cache: "auto"` on the request:
```ts
LLM.request({
  model,
  system,
  messages,
  tools,
  cache: "auto",
})
```
`"auto"` places three breakpoints — last tool definition, last system part, latest user message. The last-user-message boundary is the load-bearing detail: in a tool-use loop, a single user turn expands into many assistant/tool round-trips, all sharing that prefix. Caching at that boundary lets every intra-turn API call hit.
On OpenAI and Gemini `"auto"` is a no-op (their wire formats don't accept inline markers — both use implicit caching). On Anthropic and Bedrock it emits provider-native cache markers.
### Granular policy
```ts
cache: {
  tools?: boolean,
  system?: boolean,
  messages?: "latest-user-message" | "latest-assistant" | { tail: number },
  ttlSeconds?: number, // ≥ 3600 → 1h on Anthropic/Bedrock; else 5m
}
```
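As a point of reference, `"auto"` resolves to `{ tools: true, system: true, messages: "latest-user-message" }`. A sketch of a more custom policy, caching the tool and system prefixes plus the last two message boundaries with a one-hour TTL:
```ts
LLM.request({
  model,
  system,
  messages,
  tools,
  cache: {
    tools: true,
    system: true,
    messages: { tail: 2 }, // mark the last two message boundaries
    ttlSeconds: 3600, // ≥ 3600 maps to 1h markers on Anthropic/Bedrock
  },
})
```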
### Manual hints
Inline `CacheHint` on any text / system / tool / tool-result part overrides automatic placement. The auto policy preserves manual hints; it only fills gaps.
```ts
LLM.request({
  model,
  system: [
    { type: "text", text: "stable system prompt", cache: { type: "ephemeral" } },
  ],
  ...
})
```
### Provider behavior table
| Protocol | `cache: "auto"` |
|---|---|
| Anthropic Messages | emits up to 3 `cache_control` markers (4-breakpoint cap enforced) |
| Bedrock Converse | emits up to 3 `cachePoint` blocks (4-breakpoint cap enforced) |
| OpenAI Chat / Responses | no-op (implicit caching above 1024 tokens) |
| Gemini | no-op (implicit caching on 2.5+; explicit `CachedContent` is out-of-band) |
Normalized cache usage is read back into `response.usage.cacheReadInputTokens` and `cacheWriteInputTokens` across every provider.
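A sketch of reading those counters back after a call; whether a given request reports cache writes, reads, or neither depends on the provider and on how warm the cache is:
```ts
const cached = Effect.gen(function* () {
  const response = yield* LLMClient.generate(
    LLM.request({ model, system, messages, tools, cache: "auto" }),
  )
  // Normalized across providers regardless of each wire format's field names.
  const { cacheReadInputTokens, cacheWriteInputTokens } = response.usage
  console.log({ cacheReadInputTokens, cacheWriteInputTokens })
})
```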
## Providers
Each provider exports a `model(...)` helper that records identity, protocol, capabilities, auth, and defaults.
```ts
import { Anthropic } from "@opencode-ai/llm/providers"
const model = Anthropic.model("claude-sonnet-4-6", {
  apiKey: process.env.ANTHROPIC_API_KEY,
})
```
Included providers: OpenAI, Anthropic, Google (Gemini), Amazon Bedrock, Azure OpenAI, Cloudflare, GitHub Copilot, OpenRouter, xAI, plus generic OpenAI-compatible helpers for DeepSeek, Cerebras, Groq, Fireworks, Together, etc.
## Provider options & HTTP overlays
Three escape hatches in order of stability:
1. **`generation`** — portable knobs (`maxTokens`, `temperature`, `topP`, `topK`, penalties, seed, stop).
2. **`providerOptions: { <provider>: {...} }`** — typed-at-the-facade provider-specific knobs (OpenAI `promptCacheKey`, Anthropic `thinking`, Gemini `thinkingConfig`, OpenRouter routing).
3. **`http: { body, headers, query }`** — last-resort serializable overlays merged into the final HTTP request. Reach for this only when a stable typed path doesn't yet exist.
Model-level defaults are overridden by request-level values for each axis.
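A sketch combining all three on one request. The `openai` key under `providerOptions` and the specific header and value strings are illustrative, not prescriptive:
```ts
LLM.request({
  model,
  prompt: "Summarize the design in two sentences.",
  // 1. Portable generation knobs.
  generation: { maxTokens: 200, temperature: 0.2 },
  // 2. Provider-specific options, typed at the facade.
  providerOptions: {
    openai: { promptCacheKey: "docs-summary" },
  },
  // 3. Last-resort serializable overlay merged into the final HTTP request.
  http: { headers: { "x-request-tag": "docs-summary" } },
})
```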
## Routes
Adding a new model or deployment is usually 5–15 lines using `Route.make({ protocol, transport, ... })`. The four orthogonal pieces are protocol (body construction + stream parsing), transport (endpoint + auth + framing + encoding), defaults, and capabilities. See `AGENTS.md` for the architectural detail.
## Effect
This package is built on Effect. Public methods return `Effect` or `Stream`; provide `LLMClient.layer` (the default registers every shipped route) for runtime dispatch. The example at `example/tutorial.ts` is a runnable walkthrough.
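A sketch of running the quick-start `program` end to end, providing the default layer before execution:
```ts
import { Effect } from "effect"
import { LLMClient } from "@opencode-ai/llm"

// LLMClient.layer registers every shipped route and satisfies the client's
// runtime requirements; then the program can be run as a Promise.
const main = program.pipe(Effect.provide(LLMClient.layer))
await Effect.runPromise(main)
```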
## See also
- `AGENTS.md` — architecture, route construction, contributor guide
- `example/tutorial.ts` — runnable end-to-end walkthrough
- `test/provider/*.test.ts` — fixture-first protocol tests; `*.recorded.test.ts` files cover live cassettes


@@ -0,0 +1,120 @@
// Apply an `LLMRequest.cache` policy by injecting `CacheHint`s onto the parts
// the policy designates. Runs once at compile time, before the per-protocol
// body builder, so the existing inline-hint lowering path handles the rest.
//
// The default `"auto"` shape places one breakpoint at the last tool definition,
// one at the last system part, and one at the latest user message. This
// matches what production agent harnesses (LangChain's caching middleware,
// kern-ai's 10x cost-reduction playbook) converge on for tool-use loops: the
// latest user message stays put while a single turn explodes into many
// assistant/tool round-trips, so caching at that boundary lets every
// intra-turn API call hit the prefix.
//
// Manual `cache: CacheHint` placements on individual parts are preserved —
// this function only fills gaps the caller left empty.
import { CacheHint, type CachePolicy, type CachePolicyObject } from "./schema/options"
import { LLMRequest, Message, ToolDefinition, type ContentPart } from "./schema/messages"
const AUTO: CachePolicyObject = {
tools: true,
system: true,
messages: "latest-user-message",
}
const NONE: CachePolicyObject = {}
// Resolution rules:
// - undefined → "none" (opt-in default so the policy never changes wire
// shape for existing callers; downstream code can flip to
// `cache: "auto"` once they audit the placement choices).
// - "auto" → the recommended policy: tools + system + latest user msg.
// - "none" → no auto placement; manual `CacheHint`s still flow.
// - object form → exactly what the caller asked for.
const resolve = (policy: CachePolicy | undefined): CachePolicyObject => {
if (policy === undefined || policy === "none") return NONE
if (policy === "auto") return AUTO
return policy
}
// Only these protocols accept inline cache markers on the wire. OpenAI's
// implicit prefix caching and Gemini's implicit + out-of-band CachedContent
// ignore them, so the policy pass is skipped for everything else; emitting
// hints there would be harmless but pointless.
const RESPECTS_INLINE_HINTS = new Set(["anthropic-messages", "bedrock-converse"])
const makeHint = (ttlSeconds: number | undefined): CacheHint =>
ttlSeconds !== undefined ? new CacheHint({ type: "ephemeral", ttlSeconds }) : new CacheHint({ type: "ephemeral" })
const markLastTool = (
tools: ReadonlyArray<ToolDefinition>,
hint: CacheHint,
): ReadonlyArray<ToolDefinition> => {
if (tools.length === 0) return tools
const last = tools.length - 1
if (tools[last]!.cache) return tools
return tools.map((tool, i) => (i === last ? new ToolDefinition({ ...tool, cache: hint }) : tool))
}
const markLastSystem = (system: LLMRequest["system"], hint: CacheHint): LLMRequest["system"] => {
if (system.length === 0) return system
const last = system.length - 1
if (system[last]!.cache) return system
return system.map((part, i) => (i === last ? { ...part, cache: hint } : part))
}
const lastIndexOfRole = (messages: ReadonlyArray<Message>, role: Message["role"]): number =>
messages.findLastIndex((m) => m.role === role)
// Mark the last text part of `messages[index]`. If no text part exists, mark
// the last content part regardless of type — that's the breakpoint position
// in tool-result-only messages too.
const markMessageAt = (
messages: ReadonlyArray<Message>,
index: number,
hint: CacheHint,
): ReadonlyArray<Message> => {
if (index < 0 || index >= messages.length) return messages
const target = messages[index]!
if (target.content.length === 0) return messages
const lastTextIndex = target.content.findLastIndex((part) => part.type === "text")
const markAt = lastTextIndex >= 0 ? lastTextIndex : target.content.length - 1
const existing = target.content[markAt]!
if ("cache" in existing && existing.cache) return messages
const nextContent = target.content.map((part, i) =>
i === markAt ? ({ ...part, cache: hint } as ContentPart) : part,
)
const next = new Message({ ...target, content: nextContent })
// Single pass over `messages`, substituting the one updated entry. This runs
// on every request, and long conversations make the array large, so avoid
// `.map()` here; its closure dispatch and identity copies show up in profiling.
const result = messages.slice()
result[index] = next
return result
}
const markMessages = (
messages: ReadonlyArray<Message>,
strategy: NonNullable<CachePolicyObject["messages"]>,
hint: CacheHint,
): ReadonlyArray<Message> => {
if (messages.length === 0) return messages
if (strategy === "latest-user-message") return markMessageAt(messages, lastIndexOfRole(messages, "user"), hint)
if (strategy === "latest-assistant") return markMessageAt(messages, lastIndexOfRole(messages, "assistant"), hint)
const start = Math.max(0, messages.length - strategy.tail)
let next = messages
for (let i = start; i < messages.length; i++) next = markMessageAt(next, i, hint)
return next
}
export const applyCachePolicy = (request: LLMRequest): LLMRequest => {
if (!RESPECTS_INLINE_HINTS.has(request.model.route)) return request
const policy = resolve(request.cache)
if (!policy.tools && !policy.system && !policy.messages) return request
const hint = makeHint(policy.ttlSeconds)
const tools = policy.tools ? markLastTool(request.tools, hint) : request.tools
const system = policy.system ? markLastSystem(request.system, hint) : request.system
const messages = policy.messages ? markMessages(request.messages, policy.messages, hint) : request.messages
if (tools === request.tools && system === request.system && messages === request.messages) return request
return LLMRequest.update(request, { tools, system, messages })
}


@@ -8,6 +8,7 @@ import type { Transport, TransportRuntime } from "./transport"
import { WebSocketExecutor } from "./transport"
import type { Service as WebSocketExecutorService } from "./transport/websocket"
import type { Protocol } from "./protocol"
import { applyCachePolicy } from "../cache-policy"
import * as ProviderShared from "../protocols/shared"
import * as ToolRuntime from "../tool-runtime"
import type { Tools } from "../tool"
@@ -400,7 +401,7 @@ export function make<Body, Prepared, Frame, Event, State>(
// validated provider body plus transport-private prepared data, but does not
// execute transport.
const compile = Effect.fn("LLM.compile")(function* (request: LLMRequest) {
const resolved = resolveRequestOptions(request)
const resolved = applyCachePolicy(resolveRequestOptions(request))
const route = registeredRoute(resolved.model.route)
if (!route) return yield* noRoute(resolved.model)


@@ -1,6 +1,6 @@
import { Schema } from "effect"
import { JsonSchema, MessageRole, ProviderMetadata } from "./ids"
import { CacheHint, GenerationOptions, HttpOptions, ModelRef, ProviderOptions } from "./options"
import { CacheHint, CachePolicy, GenerationOptions, HttpOptions, ModelRef, ProviderOptions } from "./options"
const isRecord = (value: unknown): value is Record<string, unknown> =>
typeof value === "object" && value !== null && !Array.isArray(value)
@@ -206,6 +206,7 @@ export class LLMRequest extends Schema.Class<LLMRequest>("LLM.Request")({
providerOptions: Schema.optional(ProviderOptions),
http: Schema.optional(HttpOptions),
responseFormat: Schema.optional(ResponseFormat),
cache: Schema.optional(CachePolicy),
metadata: Schema.optional(Schema.Record(Schema.String, Schema.Unknown)),
}) {}
@@ -223,6 +224,7 @@ export namespace LLMRequest {
providerOptions: request.providerOptions,
http: request.http,
responseFormat: request.responseFormat,
cache: request.cache,
metadata: request.metadata,
})


@@ -200,3 +200,35 @@ export class CacheHint extends Schema.Class<CacheHint>("LLM.CacheHint")({
type: Schema.Literals(["ephemeral", "persistent"]),
ttlSeconds: Schema.optional(Schema.Number),
}) {}
// Auto-placement policy for prompt caching. The protocol-neutral lowering step
// reads this and injects `CacheHint`s at the configured boundaries; the
// per-protocol body builders then translate those hints into wire markers as
// usual. `"auto"` is the recommended default for agent loops — it places one
// breakpoint at the last tool definition, one at the last system part, and one
// at the latest user message. The combination of provider invalidation
// hierarchy (tools → system → messages) and Anthropic/Bedrock's 20-block
// lookback means three trailing breakpoints reliably cover the static prefix.
//
// Pass `"none"` to opt out entirely (the legacy behavior). Pass the granular
// object form to override individual choices.
export const CachePolicyObject = Schema.Struct({
tools: Schema.optional(Schema.Boolean),
system: Schema.optional(Schema.Boolean),
messages: Schema.optional(
Schema.Union([
Schema.Literal("latest-user-message"),
Schema.Literal("latest-assistant"),
Schema.Struct({ tail: Schema.Number }),
]),
),
ttlSeconds: Schema.optional(Schema.Number),
})
export type CachePolicyObject = Schema.Schema.Type<typeof CachePolicyObject>
export const CachePolicy = Schema.Union([
Schema.Literal("auto"),
Schema.Literal("none"),
CachePolicyObject,
])
export type CachePolicy = Schema.Schema.Type<typeof CachePolicy>


@@ -0,0 +1,262 @@
import { describe, expect, test } from "bun:test"
import { Effect } from "effect"
import { CacheHint, LLM } from "../src"
import { LLMClient } from "../src/route"
import * as AnthropicMessages from "../src/protocols/anthropic-messages"
import * as BedrockConverse from "../src/protocols/bedrock-converse"
import * as Gemini from "../src/protocols/gemini"
import * as OpenAIChat from "../src/protocols/openai-chat"
import { applyCachePolicy } from "../src/cache-policy"
import { it } from "./lib/effect"
const anthropicModel = AnthropicMessages.model({
id: "claude-sonnet-4-5",
baseURL: "https://api.anthropic.test/v1/",
headers: { "x-api-key": "test" },
})
const bedrockModel = BedrockConverse.model({
id: "anthropic.claude-3-5-sonnet-20241022-v2:0",
credentials: { region: "us-east-1", accessKeyId: "fixture", secretAccessKey: "fixture" },
})
const openaiModel = OpenAIChat.model({
id: "gpt-4o-mini",
baseURL: "https://api.openai.test/v1/",
headers: { authorization: "Bearer test" },
})
const geminiModel = Gemini.model({
id: "gemini-2.5-flash",
baseURL: "https://generativelanguage.test/v1beta/",
headers: { "x-goog-api-key": "test" },
})
describe("applyCachePolicy", () => {
it.effect("undefined cache leaves the request untouched (opt-in default)", () =>
Effect.gen(function* () {
const prepared = yield* LLMClient.prepare(
LLM.request({
model: anthropicModel,
system: "You are concise.",
prompt: "hi",
}),
)
expect(prepared.body).toMatchObject({
system: [{ type: "text", text: "You are concise.", cache_control: undefined }],
})
}),
)
it.effect("'auto' marks the last tool, last system part, and latest user message on Anthropic", () =>
Effect.gen(function* () {
const prepared = yield* LLMClient.prepare(
LLM.request({
model: anthropicModel,
system: "Sys A",
tools: [{ name: "t1", description: "t1", inputSchema: { type: "object", properties: {} } }],
messages: [
LLM.user("first user"),
LLM.assistant("assistant reply"),
LLM.user("latest user message"),
],
cache: "auto",
}),
)
expect(prepared.body).toMatchObject({
tools: [{ name: "t1", cache_control: { type: "ephemeral" } }],
system: [{ type: "text", text: "Sys A", cache_control: { type: "ephemeral" } }],
messages: [
{ role: "user", content: [{ type: "text", text: "first user" }] },
{ role: "assistant", content: [{ type: "text", text: "assistant reply" }] },
{
role: "user",
content: [{ type: "text", text: "latest user message", cache_control: { type: "ephemeral" } }],
},
],
})
}),
)
it.effect("'auto' is a no-op on OpenAI (implicit caching protocol)", () =>
Effect.gen(function* () {
const prepared = yield* LLMClient.prepare(
LLM.request({
model: openaiModel,
system: "Sys",
prompt: "hi",
cache: "auto",
}),
)
const body = prepared.body as { messages: Array<{ content: unknown }> }
// OpenAI doesn't accept cache_control on messages — policy must skip.
const flat = JSON.stringify(body)
expect(flat).not.toContain("cache_control")
expect(flat).not.toContain("cachePoint")
}),
)
it.effect("'auto' is a no-op on Gemini (out-of-band caching protocol)", () =>
Effect.gen(function* () {
const prepared = yield* LLMClient.prepare(
LLM.request({
model: geminiModel,
system: "Sys",
prompt: "hi",
cache: "auto",
}),
)
const flat = JSON.stringify(prepared.body)
expect(flat).not.toContain("cache_control")
expect(flat).not.toContain("cachePoint")
}),
)
it.effect("'auto' on Bedrock emits cachePoint markers in the right places", () =>
Effect.gen(function* () {
const prepared = yield* LLMClient.prepare(
LLM.request({
model: bedrockModel,
system: "Sys",
tools: [{ name: "t1", description: "t1", inputSchema: { type: "object", properties: {} } }],
messages: [LLM.user("first user"), LLM.assistant("reply"), LLM.user("latest user")],
cache: "auto",
}),
)
expect(prepared.body).toMatchObject({
toolConfig: {
tools: [{ toolSpec: { name: "t1" } }, { cachePoint: { type: "default" } }],
},
system: [{ text: "Sys" }, { cachePoint: { type: "default" } }],
messages: [
{ role: "user", content: [{ text: "first user" }] },
{ role: "assistant", content: [{ text: "reply" }] },
{ role: "user", content: [{ text: "latest user" }, { cachePoint: { type: "default" } }] },
],
})
}),
)
it.effect("'none' disables auto placement even when manual hints exist", () =>
Effect.gen(function* () {
const prepared = yield* LLMClient.prepare(
LLM.request({
model: anthropicModel,
system: "Sys",
tools: [{ name: "t1", description: "t1", inputSchema: { type: "object", properties: {} } }],
prompt: "hi",
cache: "none",
}),
)
expect(prepared.body).toMatchObject({
tools: [{ name: "t1", cache_control: undefined }],
system: [{ type: "text", text: "Sys", cache_control: undefined }],
})
}),
)
it.effect("granular object form: tools-only marks just tools", () =>
Effect.gen(function* () {
const prepared = yield* LLMClient.prepare(
LLM.request({
model: anthropicModel,
system: "Sys",
tools: [{ name: "t1", description: "t1", inputSchema: { type: "object", properties: {} } }],
prompt: "hi",
cache: { tools: true },
}),
)
expect(prepared.body).toMatchObject({
tools: [{ name: "t1", cache_control: { type: "ephemeral" } }],
system: [{ type: "text", text: "Sys", cache_control: undefined }],
})
}),
)
it.effect("auto policy preserves manual CacheHints on other parts", () =>
Effect.gen(function* () {
const prepared = yield* LLMClient.prepare(
LLM.request({
model: anthropicModel,
system: [
{ type: "text", text: "first system", cache: new CacheHint({ type: "ephemeral", ttlSeconds: 3600 }) },
{ type: "text", text: "last system" },
],
prompt: "hi",
cache: "auto",
}),
)
const body = prepared.body as { system: Array<{ text: string; cache_control?: unknown }> }
expect(body.system[0]?.cache_control).toEqual({ type: "ephemeral", ttl: "1h" })
expect(body.system[1]?.cache_control).toEqual({ type: "ephemeral" })
}),
)
it.effect("ttlSeconds in the policy flows through to wire markers", () =>
Effect.gen(function* () {
const prepared = yield* LLMClient.prepare(
LLM.request({
model: anthropicModel,
system: "Sys",
prompt: "hi",
cache: { system: true, ttlSeconds: 3600 },
}),
)
expect(prepared.body).toMatchObject({
system: [{ type: "text", text: "Sys", cache_control: { type: "ephemeral", ttl: "1h" } }],
})
}),
)
it.effect("messages: { tail: 2 } marks the last 2 message boundaries", () =>
Effect.gen(function* () {
const prepared = yield* LLMClient.prepare(
LLM.request({
model: anthropicModel,
messages: [LLM.user("u1"), LLM.assistant("a1"), LLM.user("u2"), LLM.assistant("a2")],
cache: { messages: { tail: 2 } },
}),
)
const body = prepared.body as { messages: Array<{ content: Array<{ cache_control?: unknown }> }> }
expect(body.messages[0]?.content[0]?.cache_control).toBeUndefined()
expect(body.messages[1]?.content[0]?.cache_control).toBeUndefined()
expect(body.messages[2]?.content[0]?.cache_control).toEqual({ type: "ephemeral" })
expect(body.messages[3]?.content[0]?.cache_control).toEqual({ type: "ephemeral" })
}),
)
it.effect("'latest-assistant' marks the last assistant message", () =>
Effect.gen(function* () {
const prepared = yield* LLMClient.prepare(
LLM.request({
model: anthropicModel,
messages: [LLM.user("u1"), LLM.assistant("a1"), LLM.user("u2")],
cache: { messages: "latest-assistant" },
}),
)
const body = prepared.body as { messages: Array<{ content: Array<{ cache_control?: unknown }> }> }
expect(body.messages[0]?.content[0]?.cache_control).toBeUndefined()
expect(body.messages[1]?.content[0]?.cache_control).toEqual({ type: "ephemeral" })
expect(body.messages[2]?.content[0]?.cache_control).toBeUndefined()
}),
)
test("returns the same request reference when policy is a no-op (pure function)", () => {
const request = LLM.request({
model: anthropicModel,
prompt: "hi",
})
expect(applyCachePolicy(request)).toBe(request)
})
})

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -28,7 +28,12 @@ const recorded = recordedTests({
provider: "anthropic",
protocol: "anthropic-messages",
requires: ["ANTHROPIC_API_KEY"],
options: { redactor: Redactor.defaults({ requestHeaders: { allow: ["content-type", "anthropic-version"] } }) },
// Two identical requests in one cassette — match by recording order so the
// second call replays the cached-hit interaction.
options: {
dispatch: "sequential",
redactor: Redactor.defaults({ requestHeaders: { allow: ["content-type", "anthropic-version"] } }),
},
})
describe("Anthropic Messages cache recorded", () => {


@@ -35,6 +35,9 @@ const recorded = recordedTests({
provider: "amazon-bedrock",
protocol: "bedrock-converse",
requires: ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
// Two identical requests in one cassette — match by recording order so the
// second call replays the cached-hit interaction.
options: { dispatch: "sequential" },
})
describe("Bedrock Converse cache recorded", () => {


@@ -8,7 +8,7 @@ import { recordedTests } from "../recorded-test"
const model = Gemini.model({
id: "gemini-2.5-flash",
apiKey: process.env.GEMINI_API_KEY ?? "fixture",
apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY ?? process.env.GEMINI_API_KEY ?? "fixture",
})
// Gemini does implicit prefix caching on 2.5+ models above ~1024 tokens. The
@@ -28,7 +28,10 @@ const recorded = recordedTests({
prefix: "gemini-cache",
provider: "google",
protocol: "gemini",
requires: ["GEMINI_API_KEY"],
requires: ["GOOGLE_GENERATIVE_AI_API_KEY"],
// Two identical requests in one cassette — match by recording order so the
// second call replays the cached-hit interaction.
options: { dispatch: "sequential" },
})
describe("Gemini cache recorded", () => {


@@ -29,6 +29,9 @@ const recorded = recordedTests({
provider: "openai",
protocol: "openai-responses",
requires: ["OPENAI_API_KEY"],
// Two identical requests in one cassette — match by recording order so the
// second call replays the cached-hit interaction, not the cold-miss one.
options: { dispatch: "sequential" },
})
describe("OpenAI Responses cache recorded", () => {