Parallelize with generalist agent.

2026-02-01 22:48:03 +00:00 · 2026-01-23 09:33:36 -08:00
parent 189d7d8cec
commit fa1709d0e3
4 changed files with 22 additions and 16 deletions
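
The prompt changes below ask the main agent to split a task into independent sub-tasks and hand each one to the 'generalist' agent in parallel. A minimal sketch of that fan-out pattern, assuming hypothetical helpers (`SubTask`, `RunAgent`, `buildSubTasks`, `runInParallel` are illustrative names and not part of this commit or of the real agent API), might look like this:

```typescript
// Hypothetical types for illustration only; the real agent invocation API is not shown in this diff.
interface SubTask {
  project: string;
  prompt: string;
}
type RunAgent = (agentName: string, prompt: string) => Promise<string>;

// Group file paths by their top-level directory, treating each directory as an independent project.
function groupByProject(files: string[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const file of files) {
    const project = file.split('/')[0];
    if (!groups[project]) {
      groups[project] = [];
    }
    groups[project].push(file);
  }
  return groups;
}

// Build one self-contained sub-task prompt per project.
function buildSubTasks(files: string[]): SubTask[] {
  const groups = groupByProject(files);
  return Object.keys(groups).map((project) => ({
    project,
    prompt: `Fix all linter errors in ${project} (files: ${groups[project].join(', ')}).`,
  }));
}

// Dispatch every sub-task to the 'generalist' agent at once and wait for all results.
function runInParallel(tasks: SubTask[], runAgent: RunAgent): Promise<string[]> {
  return Promise.all(tasks.map((task) => runAgent('generalist', task.prompt)));
}
```

Grouping by top-level directory is only one possible notion of independence; it mirrors the multi-project layout used in the eval case (`project-a/...`), and a real planner would need to confirm that the sub-tasks do not touch shared files.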
--- a/evals/subagents.eval.ts
+++ b/evals/subagents.eval.ts
@@ -53,8 +53,7 @@ describe('subagent eval test cases', () => {

  evalTest('ALWAYS_PASSES', {
    name: 'should fix linter errors in multiple projects',
-    prompt:
-      'Fix all linter errors by delegating to the codebase investigator in parallel. Do not run any shell commands to verify.',
+    prompt: 'Fix all linter errors.',
    files: {
      'project-a/eslint.config.js': `
        module.exports = [
--- a/packages/core/src/agents/codebase-investigator.ts
+++ b/packages/core/src/agents/codebase-investigator.ts
@@ -10,6 +10,8 @@ import {
  GREP_TOOL_NAME,
  LS_TOOL_NAME,
  READ_FILE_TOOL_NAME,
+  SHELL_TOOL_NAME,
+  WEB_FETCH_TOOL_NAME,
 } from '../tools/tool-names.js';
 import {
  DEFAULT_THINKING_MODE,
@@ -61,8 +63,8 @@ export const CodebaseInvestigatorAgent = (
    name: 'codebase_investigator',
    kind: 'local',
    displayName: 'Codebase Investigator Agent',
-    description: `The specialized tool for codebase analysis, architectural mapping, and understanding system-wide dependencies.
-    Invoke this tool for tasks like vague requests, bug root-cause analysis, system refactoring, comprehensive feature implementation or to answer questions about the codebase that require investigation.
+    description: `The specialized tool for codebase analysis, architectural mapping, understanding system-wide dependencies, and VERIFYING fixes.
+    Invoke this tool for tasks like vague requests, bug root-cause analysis, system refactoring, comprehensive feature implementation or to answer questions about the codebase that require investigation or final verification.
    It returns a structured report with key file paths, symbols, and actionable architectural insights.`,
    inputConfig: {
      inputSchema: {
@@ -109,12 +111,14 @@ export const CodebaseInvestigatorAgent = (
    },

    toolConfig: {
-      // Grant access only to read-only tools.
+      // Grant access to investigation tools.
      tools: [
        LS_TOOL_NAME,
        READ_FILE_TOOL_NAME,
        GLOB_TOOL_NAME,
        GREP_TOOL_NAME,
+        SHELL_TOOL_NAME,
+        WEB_FETCH_TOOL_NAME,
      ],
    },

@@ -139,7 +143,8 @@ You operate in a non-interactive loop and must reason based on the information p
 1.  **DEEP ANALYSIS, NOT JUST FILE FINDING:** Your goal is to understand the *why* behind the code. Don't just list files; explain their purpose and the role of their key components. Your final report should empower another agent to make a correct and complete fix.
 2.  **SYSTEMATIC & CURIOUS EXPLORATION:** Start with high-value clues (like tracebacks or ticket numbers) and broaden your search as needed. Think like a senior engineer doing a code review. An initial file contains clues (imports, function calls, puzzling logic). **If you find something you don't understand, you MUST prioritize investigating it until it is clear.** Treat confusion as a signal to dig deeper.
 3.  **HOLISTIC & PRECISE:** Your goal is to find the complete and minimal set of locations that need to be understood or changed. Do not stop until you are confident you have considered the side effects of a potential fix (e.g., type errors, breaking changes to callers, opportunities for code reuse).
-4.  **Web Search:** You are allowed to use the \`web_fetch\` tool to research libraries, language features, or concepts you don't understand (e.g., "what does gettext.translation do with localedir=None?").
+4.  **Tool Usage:** You are allowed to use the \`run_shell_command\` tool to run linters, tests, or other diagnostic commands to gather information or verify that issues are resolved. Do NOT use it to perform implementation changes.
+5.  **Web Search:** You are allowed to use the \`web_fetch\` tool to research libraries, language features, or concepts you don't understand (e.g., "what does gettext.translation do with localedir=None?").
 </RULES>
 ---
 ## Scratchpad Management
--- a/packages/core/src/agents/generalist-agent.ts
+++ b/packages/core/src/agents/generalist-agent.ts
@@ -23,8 +23,9 @@ export const GeneralistAgent = (
  kind: 'local',
  name: 'generalist',
  displayName: 'Generalist Agent',
-  description:
-    "A general-purpose AI agent with access to all tools. Use it for complex tasks that don't fit into other specialized agents.",
+  description: `A general-purpose AI agent with access to all tools.
+    - ALWAYS use it to break up and parallelize independent pieces of a larger task, when possible.
+    `,
  experimental: true,
  inputConfig: {
    inputSchema: {
--- a/packages/core/src/core/prompts.ts
+++ b/packages/core/src/core/prompts.ts
@@ -123,7 +123,7 @@ export function getCoreSystemPrompt(
  const enableCodebaseInvestigator = config
    .getToolRegistry()
    .getAllToolNames()
-    .includes(CodebaseInvestigatorAgent.name);
+    .includes('codebase_investigator');

  const enableWriteTodosTool = config
    .getToolRegistry()
@@ -162,6 +162,7 @@ export function getCoreSystemPrompt(
 - **Idiomatic Changes:** When editing, understand the local context (imports, functions/classes) to ensure your changes integrate naturally and idiomatically.
 - **Comments:** Add code comments sparingly. Focus on *why* something is done, especially for complex logic, rather than *what* is done. Only add high-value comments if necessary for clarity or if requested by the user. Do not edit comments that are separate from the code you are changing. *NEVER* talk to the user or describe your changes through comments.
 - **Proactiveness:** Fulfill the user's request thoroughly. When adding features or fixing bugs, this includes adding tests to ensure quality. Consider all created files, especially tests, to be permanent artifacts unless the user says otherwise.
+- **Fixing vs. Disabling:** When asked to fix errors (including linter errors) or bugs, you MUST actually resolve the underlying issue. Do NOT use workarounds like disabling linter rules (e.g., via comments like \`eslint-disable\`), commenting out problematic code, suppressing warnings, or merely hiding the symptoms unless explicitly instructed to do so. If the fix involves removing forbidden code (like a \`console.log\` statement), you MUST remove the code entirely rather than commenting it out.
 - ${interactiveMode ? `**Confirm Ambiguity/Expansion:** Do not take significant actions beyond the clear scope of the request without confirming with the user. If asked *how* to do something, explain first, don't just do it.` : `**Handle Ambiguity/Expansion:** Do not take significant actions beyond the clear scope of the request.`}
 - **Explaining Changes:** After completing a code modification or file operation *do not* provide summaries unless asked.
 - **Do Not revert changes:** Do not revert changes to the codebase unless asked to do so by the user. Only revert changes made by you if they have resulted in an error or if the user has explicitly asked you to revert the changes.${
@@ -190,23 +191,23 @@ ${config.getAgentRegistry().getDirectoryContext()}${skillsPrompt}`,
 When requested to perform tasks like fixing bugs, adding features, refactoring, or explaining code, follow this sequence:
 1. **Understand:** Think about the user's request and the relevant codebase context. Use '${GREP_TOOL_NAME}' and '${GLOB_TOOL_NAME}' search tools extensively (in parallel if independent) to understand file structures, existing code patterns, and conventions.
 Use '${READ_FILE_TOOL_NAME}' to understand context and validate any assumptions you may have. If you need to read multiple files, you should make multiple parallel calls to '${READ_FILE_TOOL_NAME}'.
-2. **Plan:** Build a coherent and grounded (based on the understanding in step 1) plan for how you intend to resolve the user's task. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process. As part of the plan, you should use an iterative development process that includes writing unit tests to verify your changes. Use output logs or debug statements as part of this process to arrive at a solution.`,
+2. **Plan:** Build a coherent and grounded (based on the understanding in step 1) plan for how you intend to resolve the user's task. For tasks that can be broken down into independent sub-tasks, leverage the 'generalist' agent to parallelize their execution. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process. As part of the plan, you should use an iterative development process that includes writing unit tests to verify your changes. Use output logs or debug statements as part of this process to arrive at a solution.`,

      primaryWorkflows_prefix_ci: `
 # Primary Workflows

 ## Software Engineering Tasks
 When requested to perform tasks like fixing bugs, adding features, refactoring, or explaining code, follow this sequence:
-1. **Understand & Strategize:** Think about the user's request and the relevant codebase context. When the task involves **complex refactoring, codebase exploration or system-wide analysis**, your **first and primary action** must be to delegate to the '${CodebaseInvestigatorAgent.name}' agent using the '${CodebaseInvestigatorAgent.name}' tool. Use it to build a comprehensive understanding of the code, its structure, and dependencies. For **simple, targeted searches** (like finding a specific function name, file path, or variable declaration), you should use '${GREP_TOOL_NAME}' or '${GLOB_TOOL_NAME}' directly.
-2. **Plan:** Build a coherent and grounded (based on the understanding in step 1) plan for how you intend to resolve the user's task. If '${CodebaseInvestigatorAgent.name}' was used, do not ignore the output of the agent, you must use it as the foundation of your plan. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process. As part of the plan, you should use an iterative development process that includes writing unit tests to verify your changes. Use output logs or debug statements as part of this process to arrive at a solution.`,
+1. **Understand & Strategize:** Think about the user's request and the relevant codebase context. **CRITICAL:** When the task involves **complex refactoring, codebase exploration, system-wide analysis, or fixing issues across multiple files or projects (including linter errors)**, your **absolute first and primary action** MUST be to delegate to the 'codebase_investigator' agent using the 'codebase_investigator' tool. Do NOT attempt to explore the codebase or run any diagnostic commands (like linters) yourself. You MUST use 'codebase_investigator' to build the initial understanding. Furthermore, after applying fixes for such tasks, you MUST delegate to 'codebase_investigator' again to verify the solution before finishing. You are NOT allowed to perform the final verification yourself using manual commands; you MUST use 'codebase_investigator' for this purpose. For **simple, targeted searches** within a single known file, you may use '${GREP_TOOL_NAME}' or '${GLOB_TOOL_NAME}' directly.
+2. **Plan:** Build a coherent and grounded (based on the understanding in step 1) plan for how you intend to resolve the user's task. If '${CodebaseInvestigatorAgent.name}' was used, do not ignore the output of the agent, you must use it as the foundation of your plan. For tasks that can be broken down into independent sub-tasks, leverage the 'generalist' agent to parallelize their execution. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process. As part of the plan, you should use an iterative development process that includes writing unit tests to verify your changes. Use output logs or debug statements as part of this process to arrive at a solution.`,

      primaryWorkflows_prefix_ci_todo: `
 # Primary Workflows

 ## Software Engineering Tasks
 When requested to perform tasks like fixing bugs, adding features, refactoring, or explaining code, follow this sequence:
-1. **Understand & Strategize:** Think about the user's request and the relevant codebase context. When the task involves **complex refactoring, codebase exploration or system-wide analysis**, your **first and primary action** must be to delegate to the '${CodebaseInvestigatorAgent.name}' agent using the '${CodebaseInvestigatorAgent.name}' tool. Use it to build a comprehensive understanding of the code, its structure, and dependencies. For **simple, targeted searches** (like finding a specific function name, file path, or variable declaration), you should use '${GREP_TOOL_NAME}' or '${GLOB_TOOL_NAME}' directly.
-2. **Plan:** Build a coherent and grounded (based on the understanding in step 1) plan for how you intend to resolve the user's task. If '${CodebaseInvestigatorAgent.name}' was used, do not ignore the output of the agent, you must use it as the foundation of your plan. For complex tasks, break them down into smaller, manageable subtasks and use the \`${WRITE_TODOS_TOOL_NAME}\` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process. As part of the plan, you should use an iterative development process that includes writing unit tests to verify your changes. Use output logs or debug statements as part of this process to arrive at a solution.`,
+1. **Understand & Strategize:** Think about the user's request and the relevant codebase context. **CRITICAL:** When the task involves **complex refactoring, codebase exploration, system-wide analysis, or fixing issues across multiple files or projects (including linter errors)**, your **absolute first and primary action** MUST be to delegate to the 'codebase_investigator' agent using the 'codebase_investigator' tool. Do NOT attempt to explore the codebase or run any diagnostic commands (like linters) yourself. You MUST use 'codebase_investigator' to build the initial understanding. Furthermore, after applying fixes for such tasks, you MUST delegate to 'codebase_investigator' again to verify the solution before finishing. You are NOT allowed to perform the final verification yourself using manual commands; you MUST use 'codebase_investigator' for this purpose. For **simple, targeted searches** within a single known file, you may use '${GREP_TOOL_NAME}' or '${GLOB_TOOL_NAME}' directly.
+2. **Plan:** Build a coherent and grounded (based on the understanding in step 1) plan for how you intend to resolve the user's task. If '${CodebaseInvestigatorAgent.name}' was used, do not ignore the output of the agent, you must use it as the foundation of your plan. For complex tasks, break them down into smaller, manageable subtasks and use the \`${WRITE_TODOS_TOOL_NAME}\` tool to track your progress. When these subtasks are independent, leverage the 'generalist' agent to execute them in parallel, increasing efficiency. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process. As part of the plan, you should use an iterative development process that includes writing unit tests to verify your changes. Use output logs or debug statements as part of this process to arrive at a solution.`,

      primaryWorkflows_todo: `
 # Primary Workflows
@@ -214,7 +215,7 @@ When requested to perform tasks like fixing bugs, adding features, refactoring,
 ## Software Engineering Tasks
 When requested to perform tasks like fixing bugs, adding features, refactoring, or explaining code, follow this sequence:
 1. **Understand:** Think about the user's request and the relevant codebase context. Use '${GREP_TOOL_NAME}' and '${GLOB_TOOL_NAME}' search tools extensively (in parallel if independent) to understand file structures, existing code patterns, and conventions. Use '${READ_FILE_TOOL_NAME}' to understand context and validate any assumptions you may have. If you need to read multiple files, you should make multiple parallel calls to '${READ_FILE_TOOL_NAME}'.
-2. **Plan:** Build a coherent and grounded (based on the understanding in step 1) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the \`${WRITE_TODOS_TOOL_NAME}\` tool to track your progress. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process. As part of the plan, you should use an iterative development process that includes writing unit tests to verify your changes. Use output logs or debug statements as part of this process to arrive at a solution.`,
+2. **Plan:** Build a coherent and grounded (based on the understanding in step 1) plan for how you intend to resolve the user's task. For complex tasks, break them down into smaller, manageable subtasks and use the \`${WRITE_TODOS_TOOL_NAME}\` tool to track your progress. When these subtasks are independent, leverage the 'generalist' agent to execute them in parallel, increasing efficiency. Share an extremely concise yet clear plan with the user if it would help the user understand your thought process. As part of the plan, you should use an iterative development process that includes writing unit tests to verify your changes. Use output logs or debug statements as part of this process to arrive at a solution.`,
      primaryWorkflows_suffix: `3. **Implement:** Use the available tools (e.g., '${EDIT_TOOL_NAME}', '${WRITE_FILE_TOOL_NAME}' '${SHELL_TOOL_NAME}' ...) to act on the plan, strictly adhering to the project's established conventions (detailed under 'Core Mandates').
 4. **Verify (Tests):** If applicable and feasible, verify the changes using the project's testing procedures. Identify the correct test commands and frameworks by examining 'README' files, build/package configuration (e.g., 'package.json'), or existing test execution patterns. NEVER assume standard test commands. When executing test commands, prefer "run once" or "CI" modes to ensure the command terminates after completion.
 5. **Verify (Standards):** VERY IMPORTANT: After making code changes, execute the project-specific build, linting and type-checking commands (e.g., 'tsc', 'npm run lint', 'ruff check .') that you have identified for this project (or obtained from the user). This ensures code quality and adherence to standards.${interactiveMode ? " If unsure about these commands, you can ask the user if they'd like you to run them and if so how to." : ''}
@@ -286,7 +287,7 @@ IT IS CRITICAL TO FOLLOW THESE GUIDELINES TO AVOID EXCESSIVE TOKEN CONSUMPTION.
 - **Security First:** Always apply security best practices. Never introduce code that exposes, logs, or commits secrets, API keys, or other sensitive information.

 ## Tool Usage
-- **Parallelism:** Execute multiple independent tool calls in parallel when feasible (i.e. searching the codebase).
+- **Parallelism:** Execute multiple independent tool calls in parallel when feasible (i.e. searching the codebase). For tasks that can be decomposed into independent sub-tasks, leverage sub-agents like 'generalist' to parallelize execution and improve efficiency.
 - **Command Execution:** Use the '${SHELL_TOOL_NAME}' tool for running shell commands, remembering the safety rule to explain modifying commands first.
 ${(function () {
  if (interactiveMode) {