feat: Add swarm-coordination plugin for multi-agent conflict prevention

Implements three complementary patterns for coordinating multi-agent swarms:

1. Status Polling (Fix 1): Orchestrator periodically spawns status-checker
   agents to monitor swarm health, detect stuck agents, and identify
   conflicts early.

2. File Claiming (Fix 2): Agents claim file ownership before editing via
   a claims registry (.claude/file-claims.md). Prevents multiple agents
   from editing the same file simultaneously.

3. Checkpoint-Based Orchestration (Fix 5): Separates swarm execution into
   phases - planning (read-only), conflict detection, resolution, then
   implementation with monitoring.

Plugin contents:
- /swarm command for full orchestrated workflow
- status-checker agent (haiku, lightweight polling)
- conflict-detector agent (analyzes plans for overlaps)
- plan-reviewer agent (validates individual plans)
- swarm-patterns skill with comprehensive documentation
Claude
2025-12-12 01:43:30 +00:00
parent 2192c86c20
commit 2a0197e654
11 changed files with 1771 additions and 0 deletions

@@ -0,0 +1,7 @@
{
  "name": "swarm-coordination",
  "version": "1.0.0",
  "description": "Coordinates multi-agent swarms with status polling, file claiming, and checkpoint-based orchestration to prevent conflicts and enable proactive monitoring",
  "author": "Anthropic",
  "keywords": ["swarm", "multi-agent", "coordination", "orchestration", "parallel"]
}

@@ -0,0 +1,152 @@
# Swarm Coordination Plugin
Coordinate multi-agent swarms with conflict prevention, status polling, and checkpoint-based orchestration.
## The Problem
When multiple agents work in parallel on the same codebase, they can:
- Edit the same files simultaneously, creating conflicts
- Make changes that overwrite each other
- Get stuck in endless loops trying to "fix" each other's code
- Waste effort with duplicate work
## The Solution
This plugin implements three complementary coordination patterns:
### 1. Status Polling (Proactive Monitoring)
The orchestrator periodically spawns lightweight status-checker agents to monitor swarm health:
- Detect stuck or failed agents early
- Identify file conflicts as they emerge
- Enable dynamic load balancing
- Provide real-time progress visibility
### 2. File Claiming (Ownership Convention)
Agents claim file ownership before editing:
- Prevents multiple agents from editing the same file
- Clear ownership registry in `.claude/file-claims.md`
- Agents skip files claimed by others
- Claims released after completion
### 3. Checkpoint-Based Orchestration (Phased Execution)
Separate swarm execution into controlled phases:
1. **Planning** - Agents analyze and plan (read-only, parallel)
2. **Review** - Detect conflicts before implementation
3. **Resolution** - Resolve conflicts with user input
4. **Implementation** - Execute with monitoring
5. **Verification** - Validate results
## Quick Start
### Using the `/swarm` Command
```
/swarm Implement user authentication with JWT tokens and session management
```
The command will guide you through:
1. Initializing coordination files
2. Launching planning agents
3. Reviewing and resolving conflicts
4. Executing implementation with monitoring
5. Verifying completion
### Manual Coordination
For custom workflows, use the individual components:
1. Create coordination files:
   - `.claude/swarm-status.json`
   - `.claude/file-claims.md`
   - `.claude/swarm-plans/`
2. Include file claiming instructions in agent prompts
3. Launch status-checker periodically during execution
## Plugin Contents
### Commands
- `/swarm [task]` - Full orchestrated swarm workflow
### Agents
- `status-checker` - Monitors swarm health (haiku, fast)
- `conflict-detector` - Analyzes plans for conflicts
- `plan-reviewer` - Validates individual agent plans
### Skills
- `swarm-patterns` - Documentation and examples
## Coordination Files
### `.claude/swarm-status.json`
```json
{
  "swarm_id": "feature-impl-001",
  "task": "Implement new feature",
  "phase": "implementing",
  "agents": {
    "agent-1": {"status": "working"},
    "agent-2": {"status": "completed"}
  }
}
```
### `.claude/file-claims.md`
```markdown
| Agent ID | File Path | Claimed At | Status |
|----------|-----------|------------|--------|
| agent-1 | src/api/handler.ts | 2025-01-15T10:00:00Z | claimed |
| agent-2 | src/db/schema.ts | 2025-01-15T10:00:00Z | released |
```
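
A minimal sketch of how an orchestrator or agent might read this registry, assuming the four-column table layout shown above. `Claim`, `parse_claims`, and `may_edit` are hypothetical names used for illustration; the plugin itself does not ship this code.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    agent_id: str
    file_path: str
    claimed_at: str
    status: str


def parse_claims(markdown: str) -> list[Claim]:
    """Extract claim rows from a file-claims.md table, skipping header and separator rows."""
    claims = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 4:
            continue  # unexpected shape, ignore rather than guess
        if cells[0] == "Agent ID" or set(cells[0]) <= {"-", ":"}:
            continue  # header row or |---| separator row
        claims.append(Claim(*cells))
    return claims


def may_edit(claims: list[Claim], agent_id: str, file_path: str) -> bool:
    """True only if this agent holds an active ("claimed") claim on the file."""
    return any(
        c.agent_id == agent_id and c.file_path == file_path and c.status == "claimed"
        for c in claims
    )
```

Under these assumptions, a file with no row, a row owned by another agent, or a `released` row all answer `False`, which matches the "skip files claimed by others" convention.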
## Best Practices
1. **Always use planning phase** - Never skip to implementation
2. **Resolve all conflicts** - Don't proceed with overlapping claims
3. **Poll regularly** - Every 30-60 seconds during execution
4. **Use haiku for status checks** - Fast and cheap
5. **Release claims promptly** - Don't hold after completion
## When to Use
Use this plugin when:
- Multiple agents need to work on the same codebase
- Tasks require parallel execution for speed
- You've experienced agent conflicts before
- You need visibility into swarm progress
## When NOT to Use
Skip this plugin when:
- A single agent is sufficient
- Agents work on completely separate codebases
- Tasks are purely read-only (no file modifications)
## Troubleshooting
### Agents Still Conflict
- Ensure all agents include file claiming instructions
- Verify conflict detection ran before implementation
- Check that claims registry is being read
### Status Checker Shows Stuck Agents
- Check agent logs for errors
- Consider increasing the timeout
- Reassign the work if the agent does not recover
### Claims Not Releasing
- Verify agent completion is being tracked
- Manually update claims if needed
- Check for orchestrator errors
## Learn More
See the `swarm-patterns` skill for detailed documentation:
- `references/status-polling.md` - Polling patterns
- `references/file-claiming.md` - Claiming conventions
- `references/checkpoint-flow.md` - Phased orchestration
- `examples/simple-swarm.md` - Complete example

@@ -0,0 +1,108 @@
---
name: conflict-detector
description: Analyzes agent implementation plans to detect file conflicts before execution. Used in checkpoint-based orchestration to review plans and identify overlapping file edits.
tools: Read, Glob, Grep
model: sonnet
color: orange
---
You are an expert conflict analyst specializing in detecting potential file conflicts between multiple agent implementation plans.
## Core Mission
Review planned changes from multiple agents and identify any files that would be modified by more than one agent, enabling conflict resolution BEFORE implementation begins.
## Analysis Process
**1. Gather Plans**
- Read `.claude/swarm-plans/` directory for all agent plans
- Parse each plan to extract:
  - Files to be created
  - Files to be modified
  - Files to be deleted
  - Dependencies on other files
**2. Build File Map**
Create a mapping of file → agents planning to touch it:
```
src/api/handler.ts → [agent-1 (modify), agent-3 (modify)]
src/utils/helper.ts → [agent-2 (create)]
src/types/index.ts → [agent-1 (modify), agent-2 (modify), agent-3 (modify)]
```
**3. Identify Conflicts**
- **Direct conflicts**: Multiple agents modifying same file
- **Creation conflicts**: Multiple agents creating same file
- **Dependency conflicts**: Agent B depends on a file that Agent A will modify
- **Deletion conflicts**: One agent modifies a file that another agent will delete
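
Steps 2 and 3 can be sketched as follows. This is an illustration only: the `plans` dictionary shape and the function names `build_file_map` and `find_conflicts` are assumptions, and dependency conflicts are left out because they need the plans' dependency sections rather than file lists alone.

```python
from collections import defaultdict


def build_file_map(plans):
    """Map each file path to the (agent, action) pairs that plan to touch it.

    `plans` is assumed to look like:
        {"agent-1": {"modify": [...], "create": [...], "delete": [...]}}
    """
    file_map = defaultdict(list)
    for agent_id, plan in plans.items():
        for action in ("create", "modify", "delete"):
            for path in plan.get(action, []):
                file_map[path].append((agent_id, action))
    return dict(file_map)


def find_conflicts(file_map):
    """Return files touched by more than one agent, labelled by conflict kind."""
    conflicts = {}
    for path, touches in file_map.items():
        agents = {agent for agent, _ in touches}
        if len(agents) < 2:
            continue  # a single owner is never a conflict
        actions = {action for _, action in touches}
        if "delete" in actions:
            kind = "deletion"
        elif actions == {"create"}:
            kind = "creation"
        else:
            kind = "direct"
        conflicts[path] = {"kind": kind, "agents": sorted(agents)}
    return conflicts
```
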
**4. Assess Severity**
- **Critical**: Same function/class being modified differently
- **Major**: Same file, different sections
- **Minor**: Related files that might have import issues
- **Info**: Same directory but different files
**5. Generate Resolution Strategies**
For each conflict, suggest:
- Which agent should handle the file
- How to sequence the work
- Alternative approaches to avoid conflict
## Output Format
```markdown
## Conflict Analysis Report
### Summary
- Total files planned for modification: [N]
- Files with conflicts: [N]
- Critical conflicts: [N]
- Agents analyzed: [list]
### Critical Conflicts (Must Resolve)
#### Conflict 1: `src/api/handler.ts`
**Agents involved**: agent-1, agent-3
**Nature**: Both agents plan to modify the `handleRequest` function
**Agent-1 plan**: Add authentication check
**Agent-3 plan**: Add rate limiting wrapper
**Resolution options**:
1. **Sequence**: Have agent-1 complete first, then agent-3 builds on top
2. **Merge**: Combine both changes into a single agent's scope
3. **Split**: Agent-1 handles auth in middleware, agent-3 handles rate limiting in handler
**Recommended**: Option 1 - Sequential execution
---
### Major Conflicts (Should Review)
[Similar format]
### Minor Conflicts (Informational)
[Similar format]
### Conflict-Free Assignments
These agents can proceed in parallel without issues:
- agent-2: Only touches `src/utils/` (no overlap)
- agent-4: Only touches `tests/` (no overlap)
### Recommended Execution Order
1. **Parallel batch 1**: agent-2, agent-4 (no conflicts)
2. **Sequential**: agent-1 (depends on nothing, blocks agent-3)
3. **Sequential**: agent-3 (depends on agent-1 completion)
```
## Quality Standards
- Every conflict includes specific file paths
- Resolution options are actionable
- Recommended execution order is provided
- False positives minimized (understand semantic conflicts, not just file overlap)
## Edge Cases
- **No plans found**: Report "No agent plans to analyze"
- **No conflicts**: Report "All agents have non-overlapping scopes"
- **Circular dependencies**: Flag as critical, require manual resolution
- **Unclear plan scope**: Flag for clarification rather than assuming

@@ -0,0 +1,124 @@
---
name: plan-reviewer
description: Reviews an individual agent's implementation plan for completeness, feasibility, and clarity. Used during the planning phase of checkpoint-based orchestration.
tools: Read, Glob, Grep
model: sonnet
color: blue
---
You are an expert plan reviewer specializing in validating implementation plans for autonomous agents.
## Core Mission
Review an agent's implementation plan to ensure it is complete, feasible, and specific enough to execute without ambiguity. Flag issues before the agent begins implementation.
## Review Process
**1. Parse Plan Structure**
- Verify plan follows expected format
- Check all required sections are present
- Ensure file lists are explicit
**2. Validate Scope**
- Files to modify are clearly listed with full paths
- Changes are described with enough detail
- No vague statements like "update as needed"
**3. Check Feasibility**
- Files mentioned actually exist (or creation is explicit)
- Dependencies are identified
- No impossible or conflicting requirements
**4. Assess Risk**
- High-risk changes flagged (deleting files, changing interfaces)
- Breaking changes identified
- Rollback complexity noted
**5. Verify Completeness**
- All aspects of the task are addressed
- Edge cases considered
- Testing approach included (if applicable)
## Plan Format Expected
```markdown
## Agent Plan: [agent-id]
### Task Summary
[What this agent will accomplish]
### Files to Modify
- `path/to/file1.ts`: [Description of changes]
- `path/to/file2.ts`: [Description of changes]
### Files to Create
- `path/to/new-file.ts`: [Purpose and contents summary]
### Files to Delete
- `path/to/old-file.ts`: [Reason for deletion]
### Dependencies
- Requires: [files/features this depends on]
- Blocks: [what cannot proceed until this completes]
### Implementation Steps
1. [Step 1]
2. [Step 2]
...
### Risks and Mitigations
- [Risk]: [Mitigation]
```
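Because the format above is regular, the file lists can be pulled out mechanically. The sketch below assumes each file entry is a bullet that starts with a backticked path, as in the template; `extract_planned_files` is a hypothetical helper, not something this agent is given as a tool.

```python
import re

# Section headings from the plan format mapped to an action label.
SECTION_TO_ACTION = {
    "Files to Modify": "modify",
    "Files to Create": "create",
    "Files to Delete": "delete",
}

# Matches a bullet whose first token is a backticked path, e.g. "- `src/a.ts`: ..."
BULLET_PATH = re.compile(r"^-\s+`([^`]+)`")


def extract_planned_files(plan_markdown):
    """Return {"modify": [...], "create": [...], "delete": [...]} for one plan."""
    files = {action: [] for action in SECTION_TO_ACTION.values()}
    current = None
    for raw in plan_markdown.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            # Any heading switches section; unknown headings switch collection off.
            current = SECTION_TO_ACTION.get(line.lstrip("#").strip())
            continue
        match = BULLET_PATH.match(line)
        if current and match:
            files[current].append(match.group(1))
    return files
```

A plan whose file sections come back empty is a candidate for the "Empty plan" or "vague file reference" findings described in this document.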
## Output Format
```markdown
## Plan Review: [agent-id]
### Overall Assessment: [APPROVED|NEEDS_REVISION|REJECTED]
### Checklist
- [x] Clear task summary
- [x] Explicit file list
- [ ] Missing: dependency identification
- [x] Feasible changes
- [ ] Issue: vague step description
### Issues Found
#### Critical (Must Fix)
1. **Vague file reference**: "update the handler" - which handler? Specify full path.
2. **Missing dependency**: Plan modifies `types/index.ts` but doesn't list it
#### Warnings (Should Address)
1. **High-risk change**: Deleting `utils/legacy.ts` - confirm no other imports
2. **Missing test plan**: No testing approach specified
#### Suggestions (Optional)
1. Consider breaking step 3 into smaller sub-steps
2. Add rollback strategy for interface changes
### Required Changes for Approval
1. Specify exact file path for "handler"
2. Add `types/index.ts` to files list
3. Confirm deletion safety for legacy file
### Approved File Claims
If approved, agent may claim:
- `src/api/auth.ts`
- `src/middleware/validate.ts`
```
## Quality Standards
- Review is thorough but fast (plans should be concise)
- Issues are specific with suggested fixes
- Approval status is clear and actionable
- File claims are explicit for coordination
## Edge Cases
- **Empty plan**: Reject with "No plan content found"
- **Overly broad scope**: Flag and suggest breaking into multiple agents
- **Conflicts with other plans**: Defer to conflict-detector agent
- **Already-implemented changes**: Flag as potential duplicate work

@@ -0,0 +1,81 @@
---
name: status-checker
description: Monitors swarm progress by reading status files, identifying conflicts, stuck agents, and overall health. Launch periodically during swarm execution to enable proactive coordination.
tools: Read, Glob, Grep
model: haiku
color: cyan
---
You are an expert swarm health monitor specializing in tracking multi-agent coordination status.
## Core Mission
Quickly assess swarm health by reading status files and identifying any issues that require orchestrator intervention.
## Status Check Process
**1. Read Swarm Status**
- Read `.claude/swarm-status.json` for current agent states
- Check timestamps to identify stale/stuck agents (>2 minutes without update)
- Note which agents are active, completed, or failed
**2. Check File Claims**
- Read `.claude/file-claims.md` for current file ownership
- Identify any conflicts (multiple agents claiming same file)
- Note stale claims (agent completed but claim not released)
**3. Analyze Progress**
- Calculate overall completion percentage
- Identify bottlenecks (agents waiting on others)
- Detect circular dependencies or deadlocks
**4. Identify Issues**
- **Conflicts**: Multiple agents editing same files
- **Stuck Agents**: No progress for >2 minutes
- **Failed Agents**: Agents that reported errors
- **Stale Claims**: File claims from completed agents
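
The stuck-agent rule can be sketched as below. It assumes each agent entry carries a `last_update` ISO 8601 timestamp and a `status` of `working`, as in the status file format used by the `swarm-patterns` skill; `find_stuck_agents` and `STUCK_AFTER` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the rule above: no update for more than 2 minutes.
STUCK_AFTER = timedelta(minutes=2)


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; older Python versions reject a trailing 'Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_stuck_agents(status, now=None):
    """Return working agents whose last update is older than STUCK_AFTER."""
    now = now or datetime.now(timezone.utc)
    stuck = []
    for agent_id, info in status.get("agents", {}).items():
        if info.get("status") != "working" or "last_update" not in info:
            continue  # completed/failed agents are not "stuck"
        idle = now - parse_timestamp(info["last_update"])
        if idle > STUCK_AFTER:
            stuck.append({
                "id": agent_id,
                "last_update": info["last_update"],
                "duration_seconds": int(idle.total_seconds()),
            })
    return stuck
```
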
## Output Format
Return a JSON status report:
```json
{
  "timestamp": "[current time]",
  "overall_health": "healthy|warning|critical",
  "completion_percentage": [0-100],
  "active_agents": [
    {"id": "agent-1", "task": "description", "status": "working", "last_update": "timestamp"}
  ],
  "completed_agents": ["agent-2", "agent-3"],
  "issues": {
    "conflicts": [
      {"file": "path/to/file.ts", "agents": ["agent-1", "agent-4"], "severity": "critical"}
    ],
    "stuck_agents": [
      {"id": "agent-5", "last_update": "timestamp", "duration_seconds": 180}
    ],
    "stale_claims": [
      {"file": "path/to/file.ts", "agent": "agent-2", "reason": "agent completed"}
    ]
  },
  "recommendations": [
    {"action": "pause", "target": "agent-4", "reason": "file conflict with agent-1"},
    {"action": "reassign", "target": "agent-5", "reason": "stuck for 3 minutes"}
  ]
}
```
## Quality Standards
- Fast execution (this runs frequently, keep it lightweight)
- Accurate conflict detection (no false positives)
- Clear, actionable recommendations
- Machine-readable JSON output for orchestrator parsing
## Edge Cases
- **No status file exists**: Report as "no swarm active"
- **Empty status file**: Report as "swarm initializing"
- **All agents completed**: Report healthy with 100% completion
- **Multiple critical issues**: Prioritize by severity (conflicts > stuck > stale)
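
One way to apply the severity ordering from the last edge case, shown as a sketch. It assumes the `issues` object has the three keys used in the report format above; `prioritize_issues` is a hypothetical name.

```python
# Order taken from the edge case above: conflicts > stuck > stale.
SEVERITY_ORDER = ("conflicts", "stuck_agents", "stale_claims")


def prioritize_issues(issues):
    """Flatten the report's `issues` object into one list, most severe kind first."""
    ordered = []
    for kind in SEVERITY_ORDER:
        for issue in issues.get(kind, []):
            ordered.append({"kind": kind, **issue})
    return ordered
```

Issues of the same kind keep the order in which they appear in the report.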

@@ -0,0 +1,287 @@
---
description: Coordinate multi-agent swarm with conflict prevention, status polling, and checkpoint-based orchestration
argument-hint: [task description]
---
# Coordinated Swarm Orchestration
You are orchestrating a multi-agent swarm to complete a complex task. Follow this checkpoint-based workflow to prevent conflicts and enable proactive monitoring.
## Task Description
$ARGUMENTS
---
## Phase 1: Initialization
**Goal**: Set up swarm coordination infrastructure
**Actions**:
1. Create coordination files:
- `.claude/swarm-status.json` - Agent status tracking
- `.claude/file-claims.md` - File ownership registry
- `.claude/swarm-plans/` - Directory for agent plans
2. Initialize status file:
```json
{
  "swarm_id": "[generated-id]",
  "task": "[task description]",
  "started": "[timestamp]",
  "phase": "planning",
  "agents": {}
}
```
3. Initialize file claims:
```markdown
# File Claims Registry
| Agent ID | File Path | Claimed At | Status |
|----------|-----------|------------|--------|
```
4. Create todo list tracking all phases
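
Steps 1 to 3 could be scripted roughly as follows. This is a sketch under assumptions: `init_coordination` is a hypothetical helper, the caller supplies the swarm ID, and the timestamp format mirrors the examples elsewhere in this plugin.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

CLAIMS_HEADER = (
    "# File Claims Registry\n"
    "\n"
    "| Agent ID | File Path | Claimed At | Status |\n"
    "|----------|-----------|------------|--------|\n"
)


def init_coordination(project_root, swarm_id, task):
    """Create .claude/swarm-plans/, swarm-status.json and file-claims.md."""
    claude_dir = Path(project_root) / ".claude"
    (claude_dir / "swarm-plans").mkdir(parents=True, exist_ok=True)
    status = {
        "swarm_id": swarm_id,
        "task": task,
        "started": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "phase": "planning",
        "agents": {},
    }
    (claude_dir / "swarm-status.json").write_text(
        json.dumps(status, indent=2) + "\n", encoding="utf-8"
    )
    # Start with an empty registry: header rows only, no claims yet.
    (claude_dir / "file-claims.md").write_text(CLAIMS_HEADER, encoding="utf-8")
    return status
```
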
---
## Phase 2: Planning (Parallel, Read-Only)
**Goal**: Have multiple agents analyze the codebase and create implementation plans WITHOUT making changes
**Actions**:
1. Launch 2-4 planning agents in parallel, depending on task complexity. Each agent should:
- Analyze a different aspect of the task
- Create a detailed implementation plan
- List ALL files they intend to modify/create/delete
- Identify dependencies on other files or agents
- **CRITICAL**: Agents must NOT edit any files - planning only
2. Each agent writes their plan to `.claude/swarm-plans/[agent-id].md`:
```markdown
## Agent Plan: [agent-id]
### Task Summary
[What this agent will accomplish]
### Files to Modify
- `path/to/file.ts`: [Description of changes]
### Files to Create
- `path/to/new-file.ts`: [Purpose]
### Dependencies
- Requires: [what this depends on]
- Blocks: [what depends on this]
### Implementation Steps
1. [Step 1]
2. [Step 2]
```
3. Update swarm status as agents complete:
```json
{
  "agents": {
    "agent-1": {"status": "plan_complete", "plan_file": ".claude/swarm-plans/agent-1.md"}
  }
}
```
---
## Phase 3: Conflict Detection
**Goal**: Review all plans and identify conflicts before implementation
**Actions**:
1. Wait for ALL planning agents to complete
2. Read all plans from `.claude/swarm-plans/`
3. Launch the **conflict-detector** agent to analyze all plans
4. Review the conflict report
**If conflicts found**:
- Present conflict report to user
- Ask for resolution preference:
  - **Sequence**: Execute conflicting agents one at a time
  - **Reassign**: Move conflicting files to single agent
  - **Manual**: User provides custom resolution
- Update plans based on resolution
- Re-run conflict detection to confirm resolution
**If no conflicts**:
- Proceed to Phase 4
---
## Phase 4: File Claiming
**Goal**: Register file ownership before implementation begins
**Actions**:
1. For each approved plan, register file claims in `.claude/file-claims.md`:
```markdown
| agent-1 | src/api/handler.ts | 2025-01-15T10:30:00Z | claimed |
| agent-1 | src/utils/auth.ts | 2025-01-15T10:30:00Z | claimed |
| agent-2 | src/db/queries.ts | 2025-01-15T10:30:00Z | claimed |
```
2. Determine execution order based on conflict analysis:
- **Parallel batch 1**: Agents with no conflicts or dependencies
- **Sequential queue**: Agents that must wait for others
3. Update swarm status:
```json
{
  "phase": "implementing",
  "execution_order": [
    {"batch": 1, "agents": ["agent-1", "agent-2"], "parallel": true},
    {"batch": 2, "agents": ["agent-3"], "parallel": false, "waits_for": ["agent-1"]}
  ]
}
```
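
An `execution_order` like the one above can be derived from "who waits for whom". The sketch below is one possible approach (layering agents by satisfied dependencies), not the plugin's own algorithm; `plan_batches` and the `waits_for` input shape are assumptions.

```python
def plan_batches(waits_for):
    """Group agents into ordered batches.

    `waits_for` maps each agent ID to the set of agent IDs it must wait for.
    Agents whose dependencies are all complete run together in one batch.
    """
    remaining = {agent: set(deps) for agent, deps in waits_for.items()}
    done = set()
    batches = []
    while remaining:
        ready = sorted(a for a, deps in remaining.items() if deps <= done)
        if not ready:
            # Nothing can start: a cycle or a dependency on an unknown agent.
            raise ValueError(f"circular or unsatisfiable dependencies: {sorted(remaining)}")
        entry = {"batch": len(batches) + 1, "agents": ready, "parallel": len(ready) > 1}
        blockers = sorted(set().union(*(remaining[a] for a in ready)))
        if blockers:
            entry["waits_for"] = blockers
        batches.append(entry)
        done.update(ready)
        for agent in ready:
            del remaining[agent]
    return batches
```
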
---
## Phase 5: Implementation with Monitoring
**Goal**: Execute implementation with proactive status monitoring
**Actions**:
1. Launch first batch of implementation agents
2. **Status Polling Loop** (every 30-60 seconds during execution):
   - Launch a **status-checker** agent (haiku model for speed)
   - Review status report
   - If issues detected:
     - **Conflict**: Pause later agent, let first complete
     - **Stuck agent**: Check logs, consider reassignment
     - **Failed agent**: Report to user, decide whether to retry or skip
3. As each agent completes:
- Update swarm status: `"status": "completed"`
- Release file claims in `.claude/file-claims.md`: change status to `released`
- Launch next queued agents that were waiting
4. **Agent Instructions** (include in each implementation agent's prompt):
```markdown
## Coordination Requirements
Before editing any file:
1. Read `.claude/file-claims.md`
2. Verify the file is claimed by YOU (your agent ID)
3. If claimed by another agent, SKIP and note in your results
4. If not claimed, DO NOT edit - report the missing claim
After completing work:
1. Report your final status to the orchestrator
2. Report files modified for claim release
If you encounter a conflict:
1. STOP editing the conflicted file
2. Report the conflict immediately
3. Wait for orchestrator resolution
```
---
## Phase 6: Verification
**Goal**: Verify swarm completed successfully
**Actions**:
1. Check all agents completed:
- Read final swarm status
- Verify all planned files were modified
- Check for any orphaned claims
2. Run integration checks:
- Build/compile if applicable
- Run tests if applicable
- Check for import/type errors
3. Clean up coordination files:
- Archive swarm status to `.claude/swarm-history/`
- Clear file claims
- Remove plan files
---
## Phase 7: Summary
**Goal**: Report swarm execution results
**Actions**:
1. Summarize:
- Total agents launched
- Files modified/created/deleted
- Conflicts detected and resolved
- Issues encountered
- Total execution time
2. Present to user:
- What was accomplished
- Any items requiring follow-up
- Suggested next steps
---
## Error Handling
**Agent Failure**:
1. Log failure in swarm status
2. Release failed agent's file claims
3. Ask user: retry, skip, or abort swarm
**Unresolvable Conflict**:
1. Pause all conflicting agents
2. Present options to user
3. Wait for manual resolution
**Stuck Swarm**:
1. If no progress for 5+ minutes, alert user
2. Provide diagnostic information
3. Offer to abort and roll back
---
## File Claim Convention (For All Agents)
Include this instruction block in every implementation agent's system prompt:
```markdown
## File Claiming Protocol
You are part of a coordinated swarm. Follow these rules strictly:
1. **Before ANY file edit**:
- Read `.claude/file-claims.md`
- Find your agent ID in the registry
- Only edit files claimed by YOUR agent ID
2. **If file is claimed by another agent**:
- DO NOT edit the file
- Note in your results: "Skipped [file] - claimed by [other-agent]"
- Continue with other work
3. **If file is not in claims registry**:
- DO NOT edit the file
- Report: "Cannot edit [file] - not in approved claims"
- This indicates a planning oversight
4. **Report your progress**:
- After each significant step, report your status so the orchestrator can track it
- If you encounter issues, report them clearly
```
---
## Status Polling Schedule
During Phase 5, launch status-checker agent:
- After initial batch launch: wait 30 seconds, then check
- During active execution: check every 45-60 seconds
- After agent completion: immediate check to launch next batch
- On any reported issue: immediate check
Use **haiku model** for status-checker to minimize latency and cost.
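
The schedule above can be expressed as a small lookup. This is only a sketch: the event names (`batch_launched`, `running`, `agent_completed`, `issue_reported`) and the function `next_check_delay` are hypothetical labels for the four cases listed.

```python
import random


def next_check_delay(event):
    """Return how many seconds to wait before launching the next status-checker."""
    if event in ("agent_completed", "issue_reported"):
        return 0.0  # immediate check
    if event == "batch_launched":
        return 30.0  # first check after the initial batch launch
    if event == "running":
        # Any value in the 45-60 second window satisfies the schedule.
        return random.uniform(45.0, 60.0)
    raise ValueError(f"unknown event: {event!r}")
```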

@@ -0,0 +1,80 @@
# Swarm Coordination Patterns
Comprehensive guidance for coordinating multi-agent swarms to prevent conflicts and enable proactive monitoring.
## When to Activate
Activate this skill when:
- Orchestrating multiple agents working on the same codebase
- Implementing features that require parallel agent execution
- Designing workflows where agents might edit overlapping files
- Debugging swarm coordination issues
## Core Concepts
### The Problem with Uncoordinated Swarms
When multiple agents work in parallel without coordination:
1. **File Conflicts**: Multiple agents edit the same file simultaneously
2. **Overwritten Changes**: Agents' changes overwrite each other
3. **Endless Loops**: Agents "fix" each other's code in circles
4. **Wasted Work**: Duplicate effort on same files
### Three-Pillar Solution
This skill teaches three complementary patterns:
1. **Status Polling (Fix 1)**: Orchestrator proactively monitors agent progress
2. **File Claiming (Fix 2)**: Agents claim ownership before editing
3. **Checkpoint Orchestration (Fix 5)**: Plan first, detect conflicts, then implement
## Key Files
### Coordination Files
- `.claude/swarm-status.json` - Central status tracking
- `.claude/file-claims.md` - File ownership registry
- `.claude/swarm-plans/` - Agent implementation plans
### Status File Format
```json
{
  "swarm_id": "swarm-20250115-abc123",
  "task": "Implement user authentication",
  "started": "2025-01-15T10:00:00Z",
  "phase": "implementing",
  "agents": {
    "auth-impl": {"status": "working", "last_update": "2025-01-15T10:05:00Z"},
    "db-schema": {"status": "completed", "last_update": "2025-01-15T10:03:00Z"}
  },
  "execution_order": [
    {"batch": 1, "agents": ["db-schema"], "parallel": false},
    {"batch": 2, "agents": ["auth-impl", "api-routes"], "parallel": true}
  ]
}
```
### File Claims Format
```markdown
# File Claims Registry
| Agent ID | File Path | Claimed At | Status |
|----------|-----------|------------|--------|
| auth-impl | src/auth/handler.ts | 2025-01-15T10:00:00Z | claimed |
| auth-impl | src/auth/types.ts | 2025-01-15T10:00:00Z | claimed |
| db-schema | src/db/schema.ts | 2025-01-15T10:00:00Z | released |
```
## References
- `references/status-polling.md` - Detailed polling patterns
- `references/file-claiming.md` - File ownership conventions
- `references/checkpoint-flow.md` - Phase-based orchestration
- `examples/simple-swarm.md` - Basic two-agent swarm
- `examples/complex-swarm.md` - Multi-phase feature implementation
## Quick Start
1. Use `/swarm [task]` command for full orchestrated flow
2. For manual coordination, create the three coordination files
3. Include file claiming instructions in all implementation agents
4. Launch status-checker every 30-60 seconds during execution

@@ -0,0 +1,260 @@
# Simple Swarm Example
A two-agent swarm implementing a feature with coordinated file claiming.
## Scenario
Task: Add user authentication to an Express API
## Initial Setup
### Swarm Status File
`.claude/swarm-status.json`:
```json
{
  "swarm_id": "auth-feature-001",
  "task": "Add user authentication",
  "started": "2025-01-15T10:00:00Z",
  "phase": "initialized",
  "agents": {}
}
```
### File Claims Registry
`.claude/file-claims.md`:
```markdown
# File Claims Registry
Last updated: 2025-01-15T10:00:00Z
Swarm ID: auth-feature-001
| Agent ID | File Path | Claimed At | Status |
|----------|-----------|------------|--------|
```
## Phase 1: Planning
Launch two planning agents:
**Agent 1 Prompt**:
```
Analyze the codebase and create an implementation plan for:
Adding JWT token validation middleware
You are in PLANNING MODE - DO NOT modify any files.
Output a structured plan with all files you need to modify.
```
**Agent 2 Prompt**:
```
Analyze the codebase and create an implementation plan for:
Adding user login/logout API endpoints
You are in PLANNING MODE - DO NOT modify any files.
Output a structured plan with all files you need to modify.
```
### Agent 1 Plan Output
`.claude/swarm-plans/jwt-middleware.md`:
```markdown
## Agent Plan: jwt-middleware
### Task Summary
Implement JWT token validation middleware for protected routes.
### Files to Modify
- `src/middleware/index.ts`: Export new auth middleware
### Files to Create
- `src/middleware/auth.ts`: JWT validation middleware
- `src/types/auth.ts`: Token payload types
### Dependencies
- Requires: None
- Blocks: Protected routes need this middleware
### Implementation Steps
1. Create auth types
2. Implement JWT validation middleware
3. Export from middleware index
```
### Agent 2 Plan Output
`.claude/swarm-plans/auth-endpoints.md`:
```markdown
## Agent Plan: auth-endpoints
### Task Summary
Implement login and logout API endpoints.
### Files to Modify
- `src/routes/index.ts`: Add auth routes
- `src/middleware/index.ts`: Import auth utilities
### Files to Create
- `src/routes/auth.ts`: Login/logout endpoints
- `src/services/auth.ts`: Authentication service
### Dependencies
- Requires: JWT middleware for logout validation
- Blocks: None
### Implementation Steps
1. Create auth service
2. Implement auth routes
3. Update routes index
4. Update middleware index
```
## Phase 2: Conflict Detection
Analyzing plans:
```
File: src/middleware/index.ts
- jwt-middleware: modify (export new middleware)
- auth-endpoints: modify (import auth utilities)
→ CONFLICT DETECTED
```
### Conflict Report
```markdown
## Conflict Analysis
### Conflicts Found: 1
#### Conflict 1: src/middleware/index.ts
Agents: jwt-middleware, auth-endpoints
Nature: Both agents plan to modify this file
- jwt-middleware: Add export for auth middleware
- auth-endpoints: Import auth utilities
**Resolution Options**:
1. Sequential: jwt-middleware first, then auth-endpoints
2. Merge: Have jwt-middleware handle all middleware/index.ts changes
```
## Phase 3: Resolution
**Chosen Resolution**: Option 1 - Sequential execution
Updated execution plan:
- Batch 1: jwt-middleware (no dependencies)
- Batch 2: auth-endpoints (after jwt-middleware completes)
## Phase 4: File Claiming
Updated `.claude/file-claims.md`:
```markdown
# File Claims Registry
Last updated: 2025-01-15T10:05:00Z
Swarm ID: auth-feature-001
| Agent ID | File Path | Claimed At | Status |
|----------|-----------|------------|--------|
| jwt-middleware | src/middleware/auth.ts | 2025-01-15T10:05:00Z | claimed |
| jwt-middleware | src/middleware/index.ts | 2025-01-15T10:05:00Z | claimed |
| jwt-middleware | src/types/auth.ts | 2025-01-15T10:05:00Z | claimed |
| auth-endpoints | src/routes/auth.ts | 2025-01-15T10:05:00Z | pending |
| auth-endpoints | src/routes/index.ts | 2025-01-15T10:05:00Z | pending |
| auth-endpoints | src/services/auth.ts | 2025-01-15T10:05:00Z | pending |
```
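Note: auth-endpoints claims are "pending" until jwt-middleware completes. Its claim on `src/middleware/index.ts` is not listed yet because jwt-middleware holds that file until it finishes.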
## Phase 5: Implementation
### Batch 1: jwt-middleware
Launch jwt-middleware agent with implementation instructions:
```
Execute your approved plan from .claude/swarm-plans/jwt-middleware.md
Only modify files claimed by jwt-middleware in .claude/file-claims.md
```
**Status after 45 seconds** (from status-checker):
```json
{
  "overall_health": "healthy",
  "agents": {
    "jwt-middleware": {"status": "working", "progress": "Creating middleware"}
  }
}
```
**Status after 2 minutes**:
```json
{
  "overall_health": "healthy",
  "agents": {
    "jwt-middleware": {"status": "completed"}
  }
}
```
### Release Claims & Activate Batch 2
Updated `.claude/file-claims.md`:
```markdown
| jwt-middleware | src/middleware/auth.ts | 2025-01-15T10:05:00Z | released |
| jwt-middleware | src/middleware/index.ts | 2025-01-15T10:05:00Z | released |
| jwt-middleware | src/types/auth.ts | 2025-01-15T10:05:00Z | released |
| auth-endpoints | src/routes/auth.ts | 2025-01-15T10:07:00Z | claimed |
| auth-endpoints | src/routes/index.ts | 2025-01-15T10:07:00Z | claimed |
| auth-endpoints | src/services/auth.ts | 2025-01-15T10:07:00Z | claimed |
| auth-endpoints | src/middleware/index.ts | 2025-01-15T10:07:00Z | claimed |
```
Note: auth-endpoints now also claims middleware/index.ts after jwt-middleware released it.
### Batch 2: auth-endpoints
Launch auth-endpoints agent with implementation instructions.
**Status after 3 minutes**:
```json
{
  "overall_health": "healthy",
  "agents": {
    "jwt-middleware": {"status": "completed"},
    "auth-endpoints": {"status": "completed"}
  },
  "completion_percentage": 100
}
```
## Phase 6: Verification
Checks:
- [x] jwt-middleware completed
- [x] auth-endpoints completed
- [x] All files modified as planned
- [x] No orphaned claims
- [x] Build succeeds
- [x] Tests pass
## Phase 7: Summary
```markdown
## Swarm Completion Report
### Task: Add user authentication
### Duration: 8 minutes
### Agents: 2
### Files Created
- src/middleware/auth.ts
- src/types/auth.ts
- src/routes/auth.ts
- src/services/auth.ts
### Files Modified
- src/middleware/index.ts
- src/routes/index.ts
### Conflicts Resolved
- 1 conflict on src/middleware/index.ts (sequential resolution)
### Status: SUCCESS
```

@@ -0,0 +1,287 @@
# Checkpoint-Based Orchestration
A phased approach to swarm execution that prevents conflicts through planning, review, and controlled implementation.
## Overview
Checkpoint-based orchestration separates swarm execution into distinct phases:
1. **Planning** - Agents analyze and plan (read-only)
2. **Review** - Orchestrator detects conflicts
3. **Resolution** - Conflicts resolved before implementation
4. **Claiming** - Files assigned to agents
5. **Implementation** - Agents execute plans
6. **Verification** - Results validated
## Why Checkpoints?
### Without Checkpoints
```
Launch agents → Agents work in parallel → CONFLICT! →
Agents overwrite each other → Endless fix loops → Chaos
```
### With Checkpoints
```
Launch planning agents → Collect plans → Detect conflicts →
Resolve conflicts → Claim files → Sequential/parallel execution → Success
```
## Phase Details
### Phase 1: Planning (Parallel, Read-Only)
**Purpose**: Gather implementation plans without making changes
**Key Rules**:
- Agents may READ any file
- Agents must NOT WRITE any file
- Each agent produces a structured plan
**Agent Instructions**:
```markdown
You are in PLANNING MODE. Analyze the codebase and create an implementation plan.
CRITICAL RESTRICTIONS:
- DO NOT use Edit, Write, or any file modification tools
- DO NOT execute commands that modify files
- ONLY use Read, Glob, Grep for analysis
Your output must be a structured plan listing:
- All files you need to modify (with full paths)
- All files you need to create
- All files you need to delete
- Dependencies on other components
- Step-by-step implementation approach
```
**Plan Format**:
```markdown
## Agent Plan: [agent-id]
### Task Summary
[1-2 sentence description of what this agent will accomplish]
### Files to Modify
- `src/auth/handler.ts`: Add validateToken() function and update handleRequest()
- `src/types/auth.ts`: Add TokenPayload interface
### Files to Create
- `src/auth/tokens.ts`: Token generation and validation utilities
### Files to Delete
- `src/auth/legacy-auth.ts`: Replaced by new implementation
### Dependencies
- **Requires**: Database schema must include users table
- **Blocks**: API routes cannot be updated until auth is complete
### Implementation Steps
1. Create TokenPayload interface in types
2. Implement token utilities in new file
3. Update handler with validation logic
4. Remove legacy file after verification
### Estimated Scope
- Files touched: 4
- Lines added: ~150
- Lines removed: ~80
- Risk level: Medium (touching auth system)
```
### Phase 2: Conflict Detection
**Purpose**: Identify overlapping file edits before they happen
**Process**:
1. Collect all agent plans
2. Build file → agent mapping
3. Identify conflicts:
   - Same file modified by multiple agents
   - File deleted by one agent and modified by another
   - Same file created by multiple agents
   - Dependency cycles between agents
**Conflict Types**:
| Type | Severity | Example |
|------|----------|---------|
| Same file modify | Critical | agent-1 and agent-2 both modify handler.ts |
| Create collision | Critical | Both agents create utils/helper.ts |
| Delete + Modify | Critical | agent-1 deletes file agent-2 modifies |
| Dependency cycle | Critical | agent-1 waits for agent-2, agent-2 waits for agent-1 |
| Same directory | Warning | Both agents add files to src/utils/ |
| Import chain | Info | agent-1's file imports from agent-2's file |
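A minimal sketch of the detection pass, assuming each plan has already been parsed into sets of file paths. The `Plan` class, `detect_conflicts`, and the type labels are hypothetical names for illustration, not part of the plugin. The sketch covers only the three file-level critical types, not dependency cycles or the warning/info rows.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical parsed form of one agent's plan."""
    agent: str
    modify: set[str] = field(default_factory=set)
    create: set[str] = field(default_factory=set)
    delete: set[str] = field(default_factory=set)


def detect_conflicts(plans: list[Plan]) -> list[dict]:
    """Return file-level conflicts found across all plans."""
    # Build the file -> {agent: actions} mapping.
    actions: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for plan in plans:
        for kind in ("modify", "create", "delete"):
            for path in getattr(plan, kind):
                actions[path][plan.agent].add(kind)

    conflicts = []
    for path in sorted(actions):
        by_agent = actions[path]
        if len(by_agent) < 2:
            continue  # A file touched by a single agent is not a conflict.
        kinds = set().union(*by_agent.values())
        if "delete" in kinds and len(kinds) > 1:
            conflict_type = "delete_modify"  # Delete combined with any other action.
        elif kinds == {"create"}:
            conflict_type = "create_collision"
        else:
            conflict_type = "same_file_modify"  # Any other overlap is reported here.
        conflicts.append(
            {"file": path, "agents": sorted(by_agent), "type": conflict_type}
        )
    return conflicts
```

Every entry returned here would map to a Critical row in the table above, so the orchestrator should not move on to claiming while the list is non-empty.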
### Phase 3: Resolution
**Purpose**: Resolve all conflicts before implementation begins
**Resolution Strategies**:
**Sequential Execution**:
```markdown
Conflict: agent-1 and agent-2 both modify src/api/index.ts
Resolution: Execute sequentially
- Execution order: agent-1 first, then agent-2
- agent-2 will see agent-1's changes before starting
```
**Scope Reassignment**:
```markdown
Conflict: agent-1 (auth) and agent-2 (logging) both modify middleware.ts
Resolution: Reassign to single agent
- Expand agent-1's scope to include logging changes
- Remove middleware.ts from agent-2's plan
```
**File Splitting**:
```markdown
Conflict: agent-1 and agent-2 both modify large config.ts
Resolution: Split the file
- Create config/auth.ts (agent-1)
- Create config/db.ts (agent-2)
- Update config/index.ts to re-export
```
**User Decision**:
```markdown
Conflict: Complex dependency between agent-1 and agent-3
Resolution: Present to user
"Agents 1 and 3 have interleaved dependencies. Options:
1. Merge into single agent
2. Manual sequencing with intermediate reviews
3. Redesign the task split"
```
### Phase 4: File Claiming
**Purpose**: Register file ownership before implementation
**Process**:
1. For each resolved plan, register claims
2. Update `.claude/file-claims.md`
3. Determine execution batches
**Execution Order Determination**:
```markdown
Given resolved plans:
- agent-1: No dependencies
- agent-2: No dependencies
- agent-3: Depends on agent-1
- agent-4: Depends on agent-2 and agent-3
Execution order:
Batch 1 (parallel): agent-1, agent-2
Batch 2 (after batch 1): agent-3
Batch 3 (after batch 2): agent-4
```
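The batching above can be computed by repeatedly taking every agent whose dependencies have all finished. The following is a sketch under the assumption that dependencies are available as a mapping from agent ID to the set of agent IDs it waits for; `execution_batches` is a hypothetical helper name.

```python
def execution_batches(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group agents into batches; each batch depends only on earlier batches."""
    remaining = {agent: set(deps) for agent, deps in depends_on.items()}
    done: set[str] = set()
    batches: list[list[str]] = []
    while remaining:
        # An agent is ready once all of its dependencies have completed.
        ready = sorted(a for a, deps in remaining.items() if deps <= done)
        if not ready:
            # Nothing can start: a dependency cycle or an unknown dependency.
            raise ValueError(f"Unresolvable dependencies among: {sorted(remaining)}")
        batches.append(ready)
        done.update(ready)
        for agent in ready:
            del remaining[agent]
    return batches


plans = {
    "agent-1": set(),
    "agent-2": set(),
    "agent-3": {"agent-1"},
    "agent-4": {"agent-2", "agent-3"},
}
print(execution_batches(plans))
```

A `ValueError` here corresponds to the "Dependency cycle" conflict type and should send the swarm back to resolution rather than on to implementation.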
### Phase 5: Implementation with Monitoring
**Purpose**: Execute plans with status tracking
**Process**:
1. Launch batch 1 agents
2. Start polling loop (every 30-60 seconds)
3. As agents complete:
   - Release their file claims
   - Launch dependent agents
4. Handle issues as detected:
   - Stuck agents → investigate/reassign
   - Conflicts → pause and resolve
   - Failures → report and decide
**Agent Instructions for Implementation**:
```markdown
You are now in IMPLEMENTATION MODE. Execute your approved plan.
Your approved plan is in: .claude/swarm-plans/[your-agent-id].md
Your claimed files are in: .claude/file-claims.md
RULES:
1. Only modify files that are claimed by YOUR agent ID
2. Follow your plan exactly - do not expand scope
3. If you need to modify an unclaimed file, STOP and report
4. Update progress by completing your assigned tasks
```
### Phase 6: Verification
**Purpose**: Validate swarm completed successfully
**Checks**:
- [ ] All agents reported completion
- [ ] All planned files were modified
- [ ] No orphaned file claims
- [ ] Build succeeds (if applicable)
- [ ] Tests pass (if applicable)
- [ ] No unexpected files modified
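The file-related checks reduce to set comparisons. A minimal sketch, assuming the orchestrator can produce three sets of paths: files named in approved plans, files actually changed, and files whose claims still have status `claimed`. The function names and report keys are hypothetical.

```python
def verification_report(
    planned: set[str], modified: set[str], active_claims: set[str]
) -> dict[str, list[str]]:
    """Summarize file-level verification problems; empty lists mean success."""
    return {
        "not_modified": sorted(planned - modified),  # Planned but never touched.
        "unexpected": sorted(modified - planned),    # Touched but never planned.
        "orphaned_claims": sorted(active_claims),    # Claims never released.
    }


def passed(report: dict[str, list[str]]) -> bool:
    """True when every file-level check came back clean."""
    return not any(report.values())
```

Build and test results are outside the scope of this sketch and would be checked separately.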
## Checkpoint Gates
Each phase has a gate that must pass before proceeding:
| Gate | Condition | Failure Action |
|------|-----------|----------------|
| Planning → Review | All planning agents completed | Wait or timeout |
| Review → Resolution | Conflict report generated | Re-run detection |
| Resolution → Claiming | All conflicts resolved | Return to resolution |
| Claiming → Implementation | All files claimed, no overlaps | Fix claim issues |
| Implementation → Verification | All agents completed | Investigate failures |
| Verification → Complete | All checks pass | Fix issues or report |
## State Machine
```
┌─────────────┐
│ INITIALIZED │
└──────┬──────┘
       │ Start swarm
       ▼
┌─────────────┐
│  PLANNING   │◄────────────────┐
└──────┬──────┘                 │
       │ All plans received     │
       ▼                        │
┌─────────────┐                 │
│  REVIEWING  │                 │
└──────┬──────┘                 │
       │ Conflicts identified   │
       ▼                        │
┌─────────────┐                 │
│  RESOLVING  │─────────────────┘
└──────┬──────┘  Need re-plan
       │ All resolved
       ▼
┌─────────────┐
│  CLAIMING   │
└──────┬──────┘
       │ Files assigned
       ▼
┌─────────────┐
│IMPLEMENTING │◄───┐
└──────┬──────┘    │
       │           │ Next batch
       ▼           │
┌─────────────┐    │
│  VERIFYING  │────┘
└──────┬──────┘  More batches
       │ All verified
       ▼
┌─────────────┐
│  COMPLETED  │
└─────────────┘
```
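The state machine can be encoded as a transition table so that the orchestrator rejects any jump the diagram does not allow. This is a sketch only; `TRANSITIONS` and `advance` are hypothetical names, and the edges are transcribed from the diagram above.

```python
# Allowed transitions, transcribed from the state machine diagram.
TRANSITIONS: dict[str, set[str]] = {
    "INITIALIZED": {"PLANNING"},
    "PLANNING": {"REVIEWING"},
    "REVIEWING": {"RESOLVING"},
    "RESOLVING": {"CLAIMING", "PLANNING"},       # PLANNING when a re-plan is needed
    "CLAIMING": {"IMPLEMENTING"},
    "IMPLEMENTING": {"VERIFYING"},
    "VERIFYING": {"COMPLETED", "IMPLEMENTING"},  # IMPLEMENTING for the next batch
    "COMPLETED": set(),
}


def advance(current: str, target: str) -> str:
    """Return the new state, or raise if the diagram has no such edge."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"Illegal transition: {current} -> {target}")
    return target
```

For example, `advance("RESOLVING", "PLANNING")` is accepted, while `advance("PLANNING", "IMPLEMENTING")` raises because it would skip review, resolution, and claiming.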
## Benefits
1. **No Conflicts**: Overlapping edits are detected and resolved before implementation begins
2. **Visibility**: Know exactly what each agent will do
3. **Control**: Orchestrator maintains full oversight
4. **Recovery**: Can roll back or adjust between phases
5. **Efficiency**: Parallel execution where safe, sequential where needed

View File

@@ -0,0 +1,233 @@
# File Claiming Convention
A coordination protocol where agents claim file ownership before editing to prevent conflicts.
## Overview
File claiming is a simple but effective convention:
1. Before editing any file, the agent checks the claims registry
2. If the file is claimed by the agent itself, proceed
3. If the file is claimed by another agent, or is not in the registry, skip and report
4. After completion, release claims
## The Claims Registry
Location: `.claude/file-claims.md`
### Format
```markdown
# File Claims Registry
Last updated: 2025-01-15T10:30:00Z
Swarm ID: swarm-20250115-abc123
| Agent ID | File Path | Claimed At | Status |
|----------|-----------|------------|--------|
| auth-impl | src/auth/handler.ts | 2025-01-15T10:00:00Z | claimed |
| auth-impl | src/auth/types.ts | 2025-01-15T10:00:00Z | claimed |
| auth-impl | src/auth/middleware.ts | 2025-01-15T10:00:00Z | claimed |
| db-agent | src/db/schema.ts | 2025-01-15T10:00:00Z | released |
| db-agent | src/db/queries.ts | 2025-01-15T10:00:00Z | released |
```
### Status Values
| Status | Meaning |
|--------|---------|
| `claimed` | Agent is actively working on this file |
| `released` | Agent completed, file available |
| `conflict` | Multiple agents claimed (needs resolution) |
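Because the registry is a plain markdown table, it can be read with a few lines of string handling. The sketch below assumes the four-column format shown above; `Claim`, `parse_claims`, and `double_claimed` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    agent: str
    path: str
    claimed_at: str
    status: str


def parse_claims(markdown: str) -> list[Claim]:
    """Extract claim rows from the registry, skipping header and separator rows."""
    claims = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # Title, metadata, or blank line.
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 4 or cells[0] == "Agent ID" or set(cells[0]) <= {"-"}:
            continue  # Malformed row, header row, or |----| separator.
        claims.append(Claim(*cells))
    return claims


def double_claimed(claims: list[Claim]) -> dict[str, list[str]]:
    """Map each file with more than one active claim to its claiming agents."""
    owners: dict[str, set[str]] = {}
    for claim in claims:
        if claim.status == "claimed":
            owners.setdefault(claim.path, set()).add(claim.agent)
    return {path: sorted(agents) for path, agents in owners.items() if len(agents) > 1}
```

Any file returned by `double_claimed` is a candidate for the `conflict` status.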
## Agent Instructions
Include this block in every implementation agent's system prompt:
```markdown
## File Claiming Protocol
You are part of a coordinated swarm. Follow these rules strictly:
### Before ANY File Edit
1. Read `.claude/file-claims.md`
2. Find the file you want to edit in the registry
3. Check the claim status:
**If claimed by YOUR agent ID** → Proceed with edit
**If claimed by ANOTHER agent** → DO NOT edit, report:
"Skipped [file] - claimed by [other-agent]"
**If file NOT in registry** → DO NOT edit, report:
"Cannot edit [file] - not in approved claims"
### During Execution
- Only edit files explicitly claimed by you
- If you discover a need to edit an unclaimed file, report it
- Do not modify the claims registry yourself
### After Completion
Report all files you modified so claims can be released.
```
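The three-way check in the protocol can be written as a small decision function. This sketch assumes the active claims have been loaded into a mapping from file path to owning agent ID; `edit_decision` is a hypothetical name, and the report strings mirror the ones in the protocol above.

```python
def edit_decision(claims: dict[str, str], agent_id: str, path: str) -> tuple[bool, str]:
    """Decide whether agent_id may edit path.

    claims maps file path -> owning agent ID for rows with status `claimed`.
    """
    owner = claims.get(path)
    if owner is None:
        return False, f"Cannot edit {path} - not in approved claims"
    if owner != agent_id:
        return False, f"Skipped {path} - claimed by {owner}"
    return True, f"Proceed with edit of {path}"


claims = {"src/auth/handler.ts": "auth-impl", "src/db/schema.ts": "db-agent"}
print(edit_decision(claims, "auth-impl", "src/db/schema.ts"))
```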
## Claim Lifecycle
```
┌─────────────────────────────────────────────────────────┐
│                     PLANNING PHASE                      │
│  Agent creates plan → Lists files to modify             │
└────────────────────────┬────────────────────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────┐
│                   CONFLICT DETECTION                    │
│  Orchestrator reviews all plans → Identifies overlaps   │
│  Resolves conflicts → Determines execution order        │
└────────────────────────┬────────────────────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────┐
│                   CLAIM REGISTRATION                    │
│  Orchestrator writes claims to registry                 │
│  Each file → exactly one agent                          │
└────────────────────────┬────────────────────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────┐
│                     IMPLEMENTATION                      │
│  Agents check registry before each edit                 │
│  Only edit files claimed by self                        │
└────────────────────────┬────────────────────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────┐
│                      CLAIM RELEASE                      │
│  Agent completes → Reports to orchestrator              │
│  Orchestrator marks claims as "released"                │
└─────────────────────────────────────────────────────────┘
```
## Conflict Resolution Strategies
When multiple agents need the same file:
### Strategy 1: Sequential Execution
```markdown
Conflict: agent-1 and agent-3 both need src/api/handler.ts
Resolution:
- agent-1 claims file, executes first
- After agent-1 completes, release claim
- agent-3 claims file, executes second
```
### Strategy 2: Scope Partition
```markdown
Conflict: agent-1 and agent-2 both need src/types/index.ts
Resolution:
- Split file into src/types/auth.ts and src/types/user.ts
- agent-1 claims auth.ts
- agent-2 claims user.ts
- Update index.ts to re-export (claimed by orchestrator)
```
### Strategy 3: Merge Responsibility
```markdown
Conflict: agent-1 (auth) and agent-2 (validation) both need middleware.ts
Resolution:
- Expand agent-1's scope to include validation changes
- Remove middleware.ts from agent-2's plan
- agent-1 handles all middleware changes
```
### Strategy 4: Section-Based Claims
```markdown
Conflict: Multiple agents need same config file
Resolution:
- Claim specific sections rather than whole file
- agent-1 claims: config.ts lines 1-50 (auth section)
- agent-2 claims: config.ts lines 51-100 (db section)
- Requires careful merge at end
```
## Handling Violations
### Agent Edits Unclaimed File
```markdown
Detected: agent-2 modified src/utils/helper.ts (not in claims)
Response:
1. Flag as violation in status report
2. Options:
a. Add retroactive claim if no conflict
b. Revert change if conflicts with another agent
c. Pause agent and request clarification
```
### Agent Edits Another's File
```markdown
Detected: agent-2 modified src/auth/handler.ts (claimed by agent-1)
Response:
1. CRITICAL violation
2. Pause agent-2 immediately
3. Check if agent-1's work is corrupted
4. Options:
a. Revert agent-2's changes
b. Have agent-1 re-do affected work
c. Manual merge by orchestrator
```
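Both kinds of violation can be found by comparing what each agent actually edited against the active claims. A minimal sketch, assuming the orchestrator already knows which files each agent changed (how that is collected is out of scope here). `find_violations` is a hypothetical name, and the `warning` severity for unclaimed edits is an assumption; the text above only marks edits to another agent's file as critical.

```python
def find_violations(claims: dict[str, str], edits: dict[str, set[str]]) -> list[dict]:
    """Compare edits (agent -> edited paths) against claims (path -> owner)."""
    violations = []
    for agent in sorted(edits):
        for path in sorted(edits[agent]):
            owner = claims.get(path)
            if owner is None:
                # Edit to a file nobody claimed: points to a planning gap.
                violations.append(
                    {"agent": agent, "file": path, "kind": "unclaimed", "severity": "warning"}
                )
            elif owner != agent:
                # Edit to a file owned by a different agent.
                violations.append(
                    {"agent": agent, "file": path, "kind": "claimed_by_other",
                     "owner": owner, "severity": "critical"}
                )
    return violations
```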
## Best Practices
1. **Register claims BEFORE launching agents** - Not during
2. **One file, one owner** - Never have overlapping claims
3. **Include all touched files** - Even read-heavy files if modified
4. **Release promptly** - Don't hold claims after completion
5. **Verify at completion** - Check all claimed files were handled
6. **Track unclaimed edits** - They indicate planning gaps
## Claims Registry Management
### Creating the Registry
```markdown
# File Claims Registry
Last updated: [timestamp]
Swarm ID: [swarm-id]
| Agent ID | File Path | Claimed At | Status |
|----------|-----------|------------|--------|
```
### Adding Claims (Orchestrator Only)
```markdown
| new-agent | src/new/file.ts | [timestamp] | claimed |
```
### Releasing Claims
Change status from `claimed` to `released`:
```markdown
| agent-id | src/file.ts | [timestamp] | released |
```
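Releasing every claim held by one agent is a row-by-row rewrite of the table. The sketch below assumes the four-column format from this document and a hypothetical helper name, `release_claims`. It does not refresh the `Last updated` line, which the orchestrator would need to do separately.

```python
def release_claims(registry: str, agent_id: str) -> str:
    """Return the registry text with agent_id's `claimed` rows set to `released`."""
    updated = []
    for line in registry.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            cells = [cell.strip() for cell in stripped.strip("|").split("|")]
            if len(cells) == 4 and cells[0] == agent_id and cells[3] == "claimed":
                cells[3] = "released"
                line = "| " + " | ".join(cells) + " |"
        updated.append(line)  # Non-matching lines pass through unchanged.
    return "\n".join(updated)
```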
### Cleaning Up
After swarm completion:
1. Archive registry to `.claude/swarm-history/`
2. Delete or clear current registry
3. Ready for next swarm

View File

@@ -0,0 +1,152 @@
# Status Polling Pattern
Proactive orchestrator monitoring for swarm health and conflict detection.
## Overview
Instead of fire-and-forget agent launching, the orchestrator periodically spawns lightweight "status checker" agents to monitor swarm progress and identify issues early.
## Why Polling Matters
Without polling:
- Orchestrator has no visibility into agent progress
- Conflicts discovered only after damage is done
- Stuck agents waste time until final timeout
- No opportunity for mid-execution corrections
With polling:
- Near-real-time visibility into agent status (at most one polling interval old)
- Conflicts detected and resolved quickly
- Stuck agents identified and reassigned
- Dynamic load balancing possible
## Polling Schedule
### Recommended Intervals
| Phase | Interval | Reason |
|-------|----------|--------|
| Initial launch | 30 seconds | Catch early failures fast |
| Active execution | 45-60 seconds | Balance visibility vs overhead |
| Near completion | 30 seconds | Ensure clean handoffs |
| Post-completion | Immediate | Verify success, launch next batch |
### Adaptive Polling
Adjust frequency based on:
- **More frequent**: High-conflict swarms, many parallel agents
- **Less frequent**: Simple tasks, sequential execution
- **Immediate**: After any agent reports an issue
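One way to combine the interval table with the adaptive rules is a small lookup function. This is a sketch; the phase keys, the `high_conflict` flag, and the choice of 45 versus 60 seconds within the documented 45-60 range are assumptions made for illustration.

```python
# Base intervals in seconds, taken from the Recommended Intervals table.
BASE_INTERVALS = {
    "initial_launch": 30,
    "active_execution": 60,
    "near_completion": 30,
    "post_completion": 0,
}


def next_poll_interval(
    phase: str, issue_reported: bool = False, high_conflict: bool = False
) -> int:
    """Return how many seconds to wait before the next status check."""
    if issue_reported:
        return 0  # Poll immediately after any agent reports an issue.
    if phase == "active_execution" and high_conflict:
        return 45  # Lower end of the 45-60 second range.
    return BASE_INTERVALS[phase]
```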
## Status Checker Agent
The status-checker agent is designed for fast, lightweight execution:
```yaml
model: haiku # Fast and cheap
tools: Read, Glob, Grep # Read-only, no edits
```
### What It Checks
1. **Agent Status**
   - Last update timestamp
   - Current task progress
   - Reported errors or warnings
2. **File Claims**
   - Ownership conflicts
   - Stale claims from completed agents
   - Unclaimed files being edited
3. **Overall Health**
   - Completion percentage
   - Estimated time remaining
   - Bottlenecks and blockers
### Output Format
```json
{
"timestamp": "2025-01-15T10:35:00Z",
"overall_health": "warning",
"completion_percentage": 65,
"issues": {
"conflicts": [{
"file": "src/api/handler.ts",
"agents": ["agent-1", "agent-3"],
"severity": "critical"
}],
"stuck_agents": [{
"id": "agent-2",
"last_update": "2025-01-15T10:30:00Z",
"duration_seconds": 300
}]
},
"recommendations": [
{"action": "pause", "target": "agent-3", "reason": "resolve conflict"}
]
}
```
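The `stuck_agents` entries follow directly from the timestamps: an agent counts as stuck when its last update is older than some threshold. The sketch below illustrates that arithmetic. The 300-second threshold and the function names are assumptions, since the plugin does not state a fixed cutoff.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stuck_agents(
    last_updates: dict[str, str], now: str, threshold_seconds: int = 300
) -> list[dict]:
    """List agents whose last update is at least threshold_seconds old."""
    now_dt = parse_ts(now)
    result = []
    for agent, last_update in sorted(last_updates.items()):
        idle = int((now_dt - parse_ts(last_update)).total_seconds())
        if idle >= threshold_seconds:
            result.append(
                {"id": agent, "last_update": last_update, "duration_seconds": idle}
            )
    return result
```

With the timestamps from the report above (`now` of 10:35:00 and a last update of 10:30:00), agent-2 is reported with `duration_seconds` of 300.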
## Responding to Status Reports
### Healthy Status
```json
{"overall_health": "healthy"}
```
- Continue execution
- Schedule next poll at normal interval
### Warning Status
```json
{"overall_health": "warning", "issues": {...}}
```
- Review specific issues
- Take corrective action if needed
- Increase polling frequency temporarily
### Critical Status
```json
{"overall_health": "critical", "issues": {...}}
```
- Pause affected agents immediately
- Resolve conflicts before continuing
- Consider notifying user for input
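The three responses can be sketched as a function that turns a status report into a list of orchestrator actions. The action strings and the `plan_response` name are hypothetical. Treating a report with no `overall_health` field as critical, and pausing the last agent listed in a conflict, are assumptions rather than documented behavior.

```python
import json


def plan_response(report_json: str) -> list[str]:
    """Translate a status-checker report into orchestrator actions."""
    report = json.loads(report_json)
    health = report.get("overall_health", "critical")  # Assumption: missing means critical.
    if health == "healthy":
        return ["continue"]

    issues = report.get("issues", {})
    actions = []
    for conflict in issues.get("conflicts", []):
        later_agent = conflict["agents"][-1]  # Assumption: last listed agent yields.
        actions.append(f"pause {later_agent}: conflict on {conflict['file']}")
    for stuck in issues.get("stuck_agents", []):
        actions.append(
            f"investigate {stuck['id']}: no update for {stuck['duration_seconds']}s"
        )
    # Critical reports escalate to the user; warnings tighten the polling loop.
    actions.append("notify user" if health == "critical" else "increase polling frequency")
    return actions
```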
## Implementation Example
```markdown
## During Implementation Phase
1. Launch batch 1 agents (agent-1, agent-2)
2. Wait 30 seconds
3. Launch status-checker agent
4. If healthy: continue, schedule next check in 45 seconds
5. If issues:
   - Conflicts: Pause later agent, let first complete
   - Stuck: Check logs, consider timeout or reassignment
   - Failed: Report to user, decide on retry/skip
6. Repeat until all agents complete
```
## Polling vs Event-Driven
| Approach | Pros | Cons |
|----------|------|------|
| Polling | Simple, no agent modification needed | Some latency in detection |
| Events | Immediate detection | Requires agent cooperation |
This plugin uses polling because:
- Works with any agent without modification
- Orchestrator maintains full control
- Simpler implementation
- Haiku model makes polling cheap
## Best Practices
1. **Use haiku for status checks** - Fast and cheap
2. **Don't poll too frequently** - 30 seconds minimum between routine polls
3. **Act on issues promptly** - Don't just log and continue
4. **Track polling history** - Useful for debugging
5. **Combine with file claims** - Polling detects, claims prevent