Productivity & Tasks by @chair4ce

swarm

Parallel task execution using Gemini Flash workers

Swarm: Cut Your LLM Costs by 200x

Turn your expensive model into an affordable daily driver. Offload the boring stuff (parallel, batch, and research work) to Gemini Flash workers at a fraction of the cost.

At a Glance

| 30 tasks via | Time | Cost |
|---|---|---|
| Opus (sequential) | ~30s | ~$0.50 |
| Swarm (parallel) | ~1s | ~$0.003 |

When to Use

Swarm is ideal for:

  • 3+ independent tasks (research, summaries, comparisons)
  • Comparing or researching multiple subjects
  • Multiple URLs to fetch/analyze
  • Batch processing (documents, entities, facts)
  • Complex analysis needing multiple perspectives → use chain

Quick Reference

# Check daemon (do this every session)
swarm status

# Start if not running
swarm start

# Parallel prompts
swarm parallel "What is X?" "What is Y?" "What is Z?"

# Research multiple subjects
swarm research "OpenAI" "Anthropic" "Mistral" --topic "AI safety"

# Discover capabilities
swarm capabilities

Execution Modes

Parallel (v1.0)

N prompts → N workers simultaneously. Best for independent tasks.

swarm parallel "prompt1" "prompt2" "prompt3"
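Conceptually, parallel mode is a fan-out with a cap on in-flight calls (compare max_concurrent_api in the config). The sketch below shows that pattern in plain Node.js; `runParallel` and `fakeWorker` are hypothetical names, the worker is a stand-in for a Gemini Flash call, and this is not the daemon's actual scheduler.

```javascript
// Minimal sketch: run N prompts through an async worker with at most `limit` calls in flight.
async function runParallel(prompts, worker, limit = 16) {
  const results = new Array(prompts.length);
  let next = 0; // index of the next prompt to hand out

  // Each lane keeps claiming prompts until none are left.
  // A rejected worker call rejects the whole run.
  async function lane() {
    while (next < prompts.length) {
      const i = next++;
      results[i] = await worker(prompts[i]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, prompts.length) }, () => lane());
  await Promise.all(lanes);
  return results; // same order as the input prompts
}

// Hypothetical worker: answers after a short delay instead of calling an LLM.
const fakeWorker = (p) =>
  new Promise((resolve) => setTimeout(() => resolve(`answer to: ${p}`), 10));

runParallel(['What is X?', 'What is Y?', 'What is Z?'], fakeWorker, 2).then(console.log);
```

Results come back in input order even when calls finish out of order, because each lane writes to the index it claimed.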

Research (v1.1)

Multi-phase: search → fetch → analyze. Uses Google Search grounding.

swarm research "Buildertrend" "Jobber" --topic "pricing 2026"
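The three phases can be pictured as a per-subject pipeline, with subjects processed concurrently. This sketch assumes injected phase functions (`search`, `fetchPages`, `analyze`, all hypothetical stubs here); the daemon's real grounding calls and result shapes are not described in this document.

```javascript
// Minimal sketch of a multi-phase research run with injected (hypothetical) phase functions.
// Phases run in order for each subject; different subjects run concurrently.
async function research(subjects, topic, { search, fetchPages, analyze }) {
  return Promise.all(
    subjects.map(async (subject) => {
      const hits = await search(`${subject} ${topic}`);      // phase 1: find sources
      const pages = await fetchPages(hits);                  // phase 2: pull content
      const analysis = await analyze(subject, topic, pages); // phase 3: analyze
      return { subject, analysis };
    })
  );
}

// Hypothetical offline stubs so the pipeline can be exercised without network access.
const stubs = {
  search: async (q) => [`https://example.com/?q=${encodeURIComponent(q)}`],
  fetchPages: async (urls) => urls.map((u) => `content of ${u}`),
  analyze: async (subject, topic, pages) => `${subject} on ${topic}: ${pages.length} source(s)`,
};

research(['Buildertrend', 'Jobber'], 'pricing 2026', stubs).then(console.log);
```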

Chain (v1.3): Refinement Pipelines

Data flows through multiple stages, each with a different perspective/filter. Stages run in sequence; tasks within a stage run in parallel.

Stage modes:

  • parallel: N inputs → N workers (same perspective)
  • single: merged input → 1 worker
  • fan-out: 1 input → N workers with DIFFERENT perspectives
  • reduce: N inputs → 1 synthesized output
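A sketch of how the four stage modes could compose: each stage maps an array of inputs to an array of outputs, and stages run one after another. `tagWorker` is a hypothetical worker that only labels its input, and the stage objects (`mode`, `perspective`, `perspectives`) are illustrative, not the pipeline.json schema.

```javascript
// Sketch of chain stage modes. `callWorker(perspective, input)` is a hypothetical async
// stand-in for one Flash worker call.
async function runStage(stage, inputs, callWorker) {
  switch (stage.mode) {
    case 'parallel': // N inputs -> N workers, same perspective
      return Promise.all(inputs.map((x) => callWorker(stage.perspective, x)));
    case 'single': // merged input -> 1 worker
      return [await callWorker(stage.perspective, inputs.join('\n\n'))];
    case 'fan-out': // 1 input -> N workers, one per perspective
      return Promise.all(stage.perspectives.map((p) => callWorker(p, inputs[0])));
    case 'reduce': { // N inputs -> 1 synthesized output; inputs are numbered so they stay distinct
      const numbered = inputs.map((x, i) => `Input ${i + 1}:\n${x}`).join('\n\n');
      return [await callWorker(stage.perspective ?? 'synthesizer', numbered)];
    }
    default:
      throw new Error(`unknown stage mode: ${stage.mode}`);
  }
}

// Stages run in sequence; each stage's outputs become the next stage's inputs.
async function runChain(stages, data, callWorker) {
  let current = [data];
  for (const stage of stages) current = await runStage(stage, current, callWorker);
  return current;
}

// Hypothetical worker that tags its input with the perspective name.
const tagWorker = async (perspective, input) => `[${perspective}] ${input}`;

runChain(
  [
    { mode: 'fan-out', perspectives: ['analyst', 'critic'] },
    { mode: 'reduce', perspective: 'synthesizer' },
  ],
  'market data',
  tagWorker
).then(console.log);
```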

Auto-chain: describe what you want, get an optimal pipeline:

curl -X POST http://localhost:9999/chain/auto \
  -d '{"task":"Find business opportunities","data":"...market data...","depth":"standard"}'

Manual chain:

swarm chain pipeline.json
# or
echo '{"stages":[...]}' | swarm chain --stdin

Depth presets: quick (2 stages), standard (4), deep (6), exhaustive (8)

Built-in perspectives: extractor, filter, enricher, analyst, synthesizer, challenger, optimizer, strategist, researcher, critic

Preview without executing:

curl -X POST http://localhost:9999/chain/preview \
  -d '{"task":"...","depth":"standard"}'
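A small helper for building the auto-chain and preview request bodies, checking depth against the presets listed above. The field names (task, data, depth) come from the curl examples; the helper name, the validation, and defaulting depth to standard are assumptions of this sketch.

```javascript
// Sketch: build a JSON body for /chain/auto or /chain/preview.
// Depth presets and their stage counts are taken from the list above.
const DEPTH_STAGES = { quick: 2, standard: 4, deep: 6, exhaustive: 8 };

function chainBody({ task, data, depth = 'standard' }) {
  if (typeof task !== 'string' || task.length === 0) throw new Error('task is required');
  // hasOwnProperty avoids accepting inherited names such as "toString".
  if (!Object.prototype.hasOwnProperty.call(DEPTH_STAGES, depth)) {
    throw new Error(`unknown depth "${depth}"; expected one of ${Object.keys(DEPTH_STAGES).join(', ')}`);
  }
  const body = { task, depth };
  if (data !== undefined) body.data = data; // the preview example sends no data
  return JSON.stringify(body);
}

console.log(chainBody({ task: 'Find business opportunities', data: '...market data...' }));
```

The returned string can be used as the -d payload or as the body of a POST request.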

Benchmark (v1.3)

Compare single vs parallel vs chain on the same task with LLM-as-judge scoring.

curl -X POST http://localhost:9999/benchmark \
  -d '{"task":"Analyze X","data":"...","depth":"standard"}'

Scores on 6 FLASK dimensions: accuracy (2x weight), depth (1.5x), completeness, coherence, actionability (1.5x), nuance.
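If the six dimension scores are combined as a weighted average (an assumption; the aggregation formula is not documented here), the listed weights work out as below. Completeness, coherence, and nuance are assumed to carry weight 1, so the weights sum to 8.

```javascript
// Sketch: weighted average of the six benchmark dimensions.
// Weights follow the list above; unlisted multipliers are assumed to be 1.
const WEIGHTS = {
  accuracy: 2,
  depth: 1.5,
  completeness: 1,
  coherence: 1,
  actionability: 1.5,
  nuance: 1,
};

function weightedScore(scores) {
  let total = 0;
  let weightSum = 0;
  for (const [dim, w] of Object.entries(WEIGHTS)) {
    if (typeof scores[dim] !== 'number') throw new Error(`missing score: ${dim}`);
    total += w * scores[dim];
    weightSum += w;
  }
  return total / weightSum;
}

// Accuracy counts twice as much as nuance, so a strong accuracy score lifts the result.
console.log(weightedScore({ accuracy: 9, depth: 7, completeness: 8, coherence: 8, actionability: 6, nuance: 5 })); // → 7.3125
```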

Capabilities Discovery (v1.3)

Lets the orchestrator discover what execution modes are available:

swarm capabilities
# or
curl http://localhost:9999/capabilities

Prompt Cache (v1.3.2)

LRU cache for LLM responses. 212x speedup on cache hits (parallel), 514x on chains.

  • Keyed by hash of instruction + input + perspective
  • 500 entries max, 1 hour TTL
  • Skips web search tasks (need fresh data)
  • Persists to disk across daemon restarts
  • Per-task bypass: set task.cache = false

# View cache stats
curl http://localhost:9999/cache

# Clear cache
curl -X DELETE http://localhost:9999/cache

Cache stats are also shown by swarm status.
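The cache rules above (hash key over instruction + input + perspective, 500-entry LRU, one-hour TTL, web-search tasks skipped, per-task opt-out) can be sketched with a Map, which iterates in insertion order. This is an illustration, not the daemon's code: disk persistence is omitted, and the task field names other than cache (instruction, input, perspective, webSearch) are assumptions.

```javascript
const crypto = require('crypto');

// Sketch of an LRU + TTL prompt cache. Map keeps insertion order,
// so its first key is always the least recently used entry.
class PromptCache {
  constructor({ maxEntries = 500, ttlMs = 60 * 60 * 1000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock for testing
    this.map = new Map();
  }

  // Hash of instruction + input + perspective. Hashing a JSON array (rather than a plain
  // concatenation) keeps ("ab", "c") and ("a", "bc") from colliding.
  static key(task) {
    const material = JSON.stringify([task.instruction, task.input, task.perspective ?? null]);
    return crypto.createHash('sha256').update(material).digest('hex');
  }

  // Web-search tasks need fresh data; cache === false is the per-task bypass.
  static cacheable(task) {
    return task.cache !== false && !task.webSearch;
  }

  get(task) {
    if (!PromptCache.cacheable(task)) return undefined;
    const k = PromptCache.key(task);
    const entry = this.map.get(k);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.ttlMs) {
      this.map.delete(k); // expired
      return undefined;
    }
    this.map.delete(k); // re-insert to mark as most recently used
    this.map.set(k, entry);
    return entry.value;
  }

  set(task, value) {
    if (!PromptCache.cacheable(task)) return;
    const k = PromptCache.key(task);
    this.map.delete(k);
    this.map.set(k, { value, storedAt: this.now() });
    if (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value); // evict least recently used
    }
  }
}
```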

Stage Retry (v1.3.2)

If tasks fail within a chain stage, only the failed tasks get retried (not the whole stage). Default: 1 retry. Configurable per-phase via phase.retries or globally via options.stageRetries.
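A sketch of that retry rule: run the whole stage once, then re-run only the failures, up to `retries` extra attempts. `runStageWithRetry` and `runTask` are hypothetical names, and a single `retries` argument stands in for phase.retries / options.stageRetries.

```javascript
// Sketch: retry only the failed tasks of a stage, not the whole stage.
async function runStageWithRetry(tasks, runTask, retries = 1) {
  const results = new Array(tasks.length);
  let pending = tasks.map((_, i) => i); // indexes still to (re)run

  for (let attempt = 0; attempt <= retries && pending.length > 0; attempt++) {
    const settled = await Promise.allSettled(pending.map((i) => runTask(tasks[i])));
    const stillFailing = [];
    settled.forEach((s, j) => {
      const i = pending[j];
      if (s.status === 'fulfilled') {
        results[i] = { ok: true, value: s.value };
      } else {
        results[i] = { ok: false, error: s.reason };
        stillFailing.push(i);
      }
    });
    pending = stillFailing; // only these go into the next attempt
  }
  return results;
}
```

Promise.allSettled is used instead of Promise.all so one failing task does not discard the results of the tasks that succeeded alongside it.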

Cost Tracking (v1.3.1)

All endpoints include cost data in their final complete event:

  • session: current daemon session totals
  • daily: persisted across restarts, accumulates all day

swarm status        # Shows session + daily cost
swarm savings       # Monthly savings report
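One way to keep the two totals described above: a session counter that resets when the process restarts, and a daily counter keyed by calendar date and checked against the max_daily_spend limit from the config. This sketch keeps the daily total in memory only (the daemon persists it across restarts), and the class and method names are hypothetical.

```javascript
// Sketch of session + daily cost tracking with a daily spend cap.
class CostTracker {
  constructor({ maxDailySpend = 10.0, today = () => new Date().toISOString().slice(0, 10) } = {}) {
    this.maxDailySpend = maxDailySpend;           // mirrors cost.max_daily_spend in the config
    this.today = today;                           // injectable date source (YYYY-MM-DD, UTC)
    this.session = 0;                             // resets whenever the process restarts
    this.daily = { date: this.today(), total: 0 }; // in memory here; persisted by the daemon
  }

  record(usd) {
    const date = this.today();
    if (date !== this.daily.date) this.daily = { date, total: 0 }; // new day: start over
    this.session += usd;
    this.daily.total += usd;
  }

  // True if spending `estimateUsd` more today stays within the cap.
  canSpend(estimateUsd) {
    const spentToday = this.today() === this.daily.date ? this.daily.total : 0;
    return spentToday + estimateUsd <= this.maxDailySpend;
  }
}
```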

Web Search (v1.1)

Workers search the live web via Google Search grounding (Gemini only, no extra cost).

# Research uses web search by default
swarm research "Subject" --topic "angle"

# Parallel with web search
curl -X POST http://localhost:9999/parallel \
  -d '{"prompts":["Current price of X?"],"options":{"webSearch":true}}'
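The same /parallel request can be sent from Node.js with the global fetch (Node 18+). The body mirrors the curl example above; the helper names and the explicit Content-Type header are assumptions, and because replies may arrive as streamed events, the sketch returns raw text rather than parsing JSON.

```javascript
// Sketch: POST a JSON body to the local swarm daemon (Node 18+ global fetch).
const BASE_URL = 'http://localhost:9999'; // port taken from the curl examples

// Body shape copied from the curl example; webSearch defaults to off,
// matching web_search.parallel_default: false in the sample config.
function parallelBody(prompts, { webSearch = false } = {}) {
  if (!Array.isArray(prompts) || prompts.length === 0) {
    throw new Error('prompts must be a non-empty array');
  }
  return { prompts, options: { webSearch } };
}

async function postJson(path, body) {
  const res = await fetch(`${BASE_URL}${path}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${path} returned HTTP ${res.status}`);
  return res.text(); // raw text: the reply format is not specified here
}

// Example (requires a running daemon):
// postJson('/parallel', parallelBody(['Current price of X?'], { webSearch: true })).then(console.log);
```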

JavaScript API

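const os = require('os');
const path = require('path');

// require() does not expand "~", so resolve the skill's lib directory from the home directory.
const LIB = path.join(os.homedir(), 'clawd/skills/node-scaling/lib');
const { parallel, research } = require(LIB);
const { SwarmClient } = require(path.join(LIB, 'client'));

// CommonJS has no top-level await, so wrap the calls in an async function.
async function main() {
  // Simple parallel
  const results = await parallel(['prompt1', 'prompt2', 'prompt3']);
  console.log(results);

  // Client with streaming
  const client = new SwarmClient();
  for await (const event of client.parallel(['prompt1', 'prompt2'])) console.log(event);
  for await (const event of client.research(['OpenAI', 'Anthropic'], 'AI safety')) console.log(event);

  // Chain
  const chained = await client.chainSync({ task: 'Find business opportunities', data: '...market data...', depth: 'standard' });
  console.log(chained);
}

main().catch(console.error);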

Daemon Management

swarm start              # Start daemon (background)
swarm stop               # Stop daemon
swarm status             # Status, cost, cache stats
swarm restart            # Restart daemon
swarm savings            # Monthly savings report
swarm logs [N]           # Last N lines of daemon log

Performance (v1.3.2)

| Mode | Tasks | Time | Notes |
|---|---|---|---|
| Parallel (simple) | 5 | ~700ms | 142ms/task effective |
| Parallel (stress) | 10 | ~1.2s | 123ms/task effective |
| Chain (standard) | 5 | ~14s | 3-stage multi-perspective |
| Chain (quick) | 2 | ~3s | 2-stage extract+synthesize |
| Cache hit | any | ~3-5ms | 200-500x speedup |
| Research (web) | 2 | ~15s | Google grounding latency |

Config

Location: ~/.config/clawdbot/node-scaling.yaml

node_scaling:
  enabled: true
  limits:
    max_nodes: 16
    max_concurrent_api: 16
  provider:
    name: gemini
    model: gemini-2.0-flash
  web_search:
    enabled: true
    parallel_default: false
  cost:
    max_daily_spend: 10.00

Troubleshooting

| Issue | Fix |
|---|---|
| Daemon not running | swarm start |
| No API key | Set GEMINI_API_KEY or run npm run setup |
| Rate limited | Lower max_concurrent_api in config |
| Web search not working | Ensure provider.name is gemini and web_search.enabled is true |
| Stale cached results | curl -X DELETE http://localhost:9999/cache |
| Chain too slow | Use depth: "quick" or check context size |

Structured Output (v1.3.7)

Force JSON output with schema validation: zero parse failures on structured tasks.

# With built-in schema
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Extract entities from: Tim Cook announced iPhone 17","schema":"entities"}'

# With custom schema
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Classify this text","data":"...","schema":{"type":"object","properties":{"category":{"type":"string"}}}}'

# JSON mode (no schema, just force JSON)
curl -X POST http://localhost:9999/structured \
  -d '{"prompt":"Return a JSON object with name, age, city for a fictional person"}'

# List available schemas
curl http://localhost:9999/structured/schemas

Built-in schemas: entities, summary, comparison, actions, classification, qa

Uses Gemini's native structured output (responseMimeType: application/json plus responseSchema) to guarantee JSON output, then validates the response against the schema.
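Gemini constrains the output at generation time; the follow-up validation step can be pictured as a check like the one below. This is a deliberately minimal validator for a small JSON Schema subset (type, properties, required, items), enough for the custom-schema example above; it is not the daemon's validator, and a full JSON Schema library would be the usual choice in production.

```javascript
// Minimal validator for a small JSON Schema subset: type, properties, required, items.
// Returns a list of error messages; an empty list means the value passed.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
const typeOf = (v) => (Array.isArray(v) ? 'array' : v === null ? 'null' : typeof v);

function validate(value, schema, path = '$') {
  if (schema.type) {
    const actual = typeOf(value);
    const ok = schema.type === actual || (schema.type === 'integer' && Number.isInteger(value));
    if (!ok) return [`${path}: expected ${schema.type}, got ${actual}`];
  }
  const errors = [];
  if (schema.type === 'object') {
    for (const key of schema.required ?? []) {
      if (!has(value, key)) errors.push(`${path}.${key}: missing required property`);
    }
    for (const [key, sub] of Object.entries(schema.properties ?? {})) {
      if (has(value, key)) errors.push(...validate(value[key], sub, `${path}.${key}`));
    }
  }
  if (schema.type === 'array' && schema.items) {
    value.forEach((item, i) => errors.push(...validate(item, schema.items, `${path}[${i}]`)));
  }
  return errors;
}

// Parse the model's text, then check it against the custom schema from the example.
const schema = { type: 'object', properties: { category: { type: 'string' } } };
console.log(validate(JSON.parse('{"category":"finance"}'), schema)); // → []
console.log(validate(JSON.parse('{"category":42}'), schema));        // one type error for $.category
```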

Majority Voting (v1.3.7)

Same prompt → N parallel executions → pick the best answer. Higher accuracy on factual/analytical tasks.

# Judge strategy (LLM picks best; most reliable)
curl -X POST http://localhost:9999/vote \
  -d '{"prompt":"What are the key factors in SaaS pricing?","n":3,"strategy":"judge"}'

# Similarity strategy (consensus; zero extra cost)
curl -X POST http://localhost:9999/vote \
  -d '{"prompt":"What year was Python released?","n":3,"strategy":"similarity"}'

# Longest strategy (heuristic; zero extra cost)
curl -X POST http://localhost:9999/vote \
  -d '{"prompt":"Explain recursion","n":3,"strategy":"longest"}'

Strategies:

  • judge: LLM scores all candidates on accuracy/completeness/clarity/actionability, picks winner (N+1 calls)
  • similarity: Jaccard word-set similarity, picks consensus answer (N calls, zero extra cost)
  • longest: picks longest response as heuristic for thoroughness (N calls, zero extra cost)
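The two zero-cost strategies are simple enough to sketch directly. The similarity version scores each candidate by its average Jaccard similarity to the other candidates (|A ∩ B| / |A ∪ B| over lowercase word sets) and keeps the highest; the tokenization and tie-breaking (earliest candidate wins) are assumptions, not necessarily what the daemon does.

```javascript
// Word set of a response: lowercase alphanumeric tokens (tokenization is an assumption).
const wordSet = (text) => new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);

// Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 1 when both sets are empty.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  for (const w of a) if (b.has(w)) intersection++;
  return intersection / (a.size + b.size - intersection);
}

// "similarity": pick the candidate that agrees most with the others (the consensus answer).
function pickBySimilarity(candidates) {
  const sets = candidates.map(wordSet);
  let best = 0;
  let bestScore = -Infinity;
  sets.forEach((s, i) => {
    const others = sets.filter((_, j) => j !== i);
    const score = others.reduce((sum, o) => sum + jaccard(s, o), 0) / Math.max(others.length, 1);
    if (score > bestScore) { bestScore = score; best = i; }
  });
  return candidates[best];
}

// "longest": pick the longest response as a proxy for thoroughness.
const pickLongest = (candidates) => candidates.reduce((a, b) => (b.length > a.length ? b : a));

const answers = ['Python was released in 1991.', 'Python was first released in 1991.', 'It came out in 1989.'];
console.log(pickBySimilarity(answers)); // prints: Python was released in 1991.
console.log(pickLongest(answers));      // prints: Python was first released in 1991.
```

The outlier ("1989") shares only one word with the other two answers, so its consensus score is low and it can never win the similarity vote.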

When to use: Factual questions, critical decisions, or any task where accuracy > speed.

| Strategy | Calls | Extra Cost | Quality |
|---|---|---|---|
| similarity | N | $0 | Good (consensus) |
| longest | N | $0 | Decent (heuristic) |
| judge | N+1 | ~$0.0001 | Best (LLM-scored) |

Self-Reflection (v1.3.5)

Optional critic pass after chain/skeleton output. Scores 5 dimensions, auto-refines if below threshold.

# Add reflect:true to any chain or skeleton request
curl -X POST http://localhost:9999/chain/auto \
  -d '{"task":"Analyze the AI chip market","data":"...","reflect":true}'

curl -X POST http://localhost:9999/skeleton \
  -d '{"task":"Write a market analysis","reflect":true}'

Measured results: reflection improved a weak output from a 5.0 → 7.6 average score, and Skeleton + reflect scored 9.4/10.
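The reflect option describes a score-then-refine loop. A sketch with hypothetical `critique` and `refine` functions standing in for the critic and refiner calls; the threshold (7), the single refinement round, and the dimension names in the stub are assumptions, since the daemon's actual values are not given in this document.

```javascript
// Sketch of a reflect pass: score the draft, rewrite it if the average is below a threshold.
// `critique` and `refine` are hypothetical async stand-ins for LLM calls.
async function reflect(draft, { critique, refine, threshold = 7, maxRounds = 1 }) {
  let output = draft;
  for (let round = 0; ; round++) {
    const scores = await critique(output); // e.g. five 0-10 dimension scores
    const values = Object.values(scores);
    const avg = values.reduce((sum, v) => sum + v, 0) / values.length;
    // Stop when the output is good enough or the refinement budget is spent.
    if (avg >= threshold || round >= maxRounds) return { output, avg, rounds: round };
    output = await refine(output, scores); // the critic's scores guide the rewrite
  }
}

// Hypothetical stubs: text containing "refined" scores 8 on every dimension, anything else 5.
const critique = async (text) => {
  const v = text.includes('refined') ? 8 : 5;
  return { accuracy: v, depth: v, completeness: v, coherence: v, actionability: v };
};
const refine = async (text) => `${text} (refined)`;

reflect('draft analysis', { critique, refine }).then(console.log);
```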

Skeleton-of-Thought (v1.3.6)

Generate outline → expand each section in parallel → merge into coherent document. Best for long-form content.

curl -X POST http://localhost:9999/skeleton \
  -d '{"task":"Write a comprehensive guide to SaaS pricing","maxSections":6,"reflect":true}'
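The outline → expand → merge flow, sketched with hypothetical `outline` and `expand` functions standing in for LLM calls. The section cap mirrors the maxSections field in the request above; the merge here is plain concatenation in outline order, whereas the daemon's merge step is described as producing a coherent document and may do more.

```javascript
// Sketch of Skeleton-of-Thought: one outline call, parallel expansion of each section, then a merge.
async function skeletonOfThought(task, { outline, expand, maxSections = 6 }) {
  const headings = (await outline(task)).slice(0, maxSections);           // 1. skeleton
  const bodies = await Promise.all(headings.map((h) => expand(task, h))); // 2. expand concurrently
  // 3. Naive merge: each heading followed by its body, in outline order.
  return headings.map((h, i) => `${h}\n\n${bodies[i]}`).join('\n\n');
}

// Hypothetical stubs for a dry run without an LLM.
const outlineStub = async () => ['Why pricing matters', 'Pricing models', 'Discounting', 'Packaging'];
const expandStub = async (task, heading) => `Notes on "${heading}" for: ${task}`;

skeletonOfThought('a guide to SaaS pricing', { outline: outlineStub, expand: expandStub, maxSections: 3 })
  .then(console.log);
```

Because the section expansions are independent, total latency is roughly one outline call plus the slowest expansion, rather than the sum of all expansions.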

Performance: 14,478 chars in 21s (675 chars/sec), 5.1x more content than chain at 2.9x higher throughput.

| Metric | Chain | Skeleton-of-Thought | Winner |
|---|---|---|---|
| Output size | 2,856 chars | 14,478 chars | SoT (5.1x) |
| Throughput | 234 chars/sec | 675 chars/sec | SoT (2.9x) |
| Duration | 12s | 21s | Chain (faster) |
| Quality (w/ reflect) | ~7-8/10 | 9.4/10 | SoT |

When to use what:

  • SoT → long-form content, reports, guides, docs (anything with natural sections)
  • Chain → analysis, research, adversarial review (anything needing multiple perspectives)
  • Parallel → independent tasks, batch processing
  • Structured → entity extraction, classification, any task needing reliable JSON
  • Voting → factual accuracy, critical decisions, consensus-building

API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /status | Detailed status + cost + cache |
| GET | /capabilities | Discover execution modes |
| POST | /parallel | Execute N prompts in parallel |
| POST | /research | Multi-phase web research |
| POST | /skeleton | Skeleton-of-Thought (outline → expand → merge) |
| POST | /chain | Manual chain pipeline |
| POST | /chain/auto | Auto-build + execute chain |
| POST | /chain/preview | Preview chain without executing |
| POST | /chain/template | Execute pre-built template |
| POST | /structured | Forced JSON with schema validation |
| GET | /structured/schemas | List built-in schemas |
| POST | /vote | Majority voting (best-of-N) |
| POST | /benchmark | Quality comparison test |
| GET | /templates | List chain templates |
| GET | /cache | Cache statistics |
| DELETE | /cache | Clear cache |

Cost Comparison

| Model | Cost per 1M tokens | Relative |
|---|---|---|
| Claude Opus 4 | ~$15 input / $75 output | 1x |
| GPT-4o | ~$2.50 input / $10 output | ~7x cheaper |
| Gemini Flash | ~$0.075 input / $0.30 output | 200x cheaper |

Cache hits are essentially free (~3-5ms, no API call).
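The relative figures follow from dividing per-token prices. The sketch below prices a workload under each model using only the list prices in the table; the 1M-input / 200K-output workload is an arbitrary example, not a measured swarm run.

```javascript
// Per-1M-token list prices in USD, copied from the table above.
const PRICES = {
  'claude-opus-4': { input: 15, output: 75 },
  'gpt-4o': { input: 2.5, output: 10 },
  'gemini-flash': { input: 0.075, output: 0.3 },
};

// Cost in USD of a workload measured in tokens.
function workloadCost(model, inputTokens, outputTokens) {
  const p = PRICES[model];
  return (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
}

const opus = workloadCost('claude-opus-4', 1_000_000, 200_000);
for (const model of Object.keys(PRICES)) {
  const cost = workloadCost(model, 1_000_000, 200_000);
  console.log(`${model}: $${cost.toFixed(4)} (Opus cost / this cost = ${(opus / cost).toFixed(1)}x)`);
}
```

For this 5:1 input/output mix, Opus costs $30 and Flash $0.135, so Flash comes out about 222x cheaper: between the 200x input-price ratio and the 250x output-price ratio.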