
Singleshot Prompt Testing & Optimization Skill

Description

Single-shot prompt testing for measuring and optimizing token usage, cost, and latency.

Installation

brew tap vincentzhangz/singleshot
brew install singleshot

Or: cargo install singleshot
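
After installing, a quick sanity check with the ping subcommand (covered under Quick Reference below) confirms the binary and your API key work; this assumes OPENAI_API_KEY is already exported:

# Verify the install and provider credentials
singleshot ping -P openai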

When to Use

  • Testing new prompts before wiring them into openclaw
  • Benchmarking prompt variations for token efficiency
  • Comparing model performance and costs
  • Validating prompt outputs before production

Core Commands

Always use the -d (detail) and -r (report) flags for efficiency analysis:

# Basic test with full metrics
singleshot chat -p "Your prompt" -P openai -d -r report.md

# Test with config file
singleshot chat -l config.md -d -r report.md

# Compare providers
singleshot chat -p "Test" -P openai -m gpt-4o-mini -d -r openai.md
singleshot chat -p "Test" -P anthropic -m claude-sonnet-4-20250514 -d -r anthropic.md

# Batch test variations
for config in *.md; do
  singleshot chat -l "$config" -d -r "report-${config%.md}.md"
done
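
After a batch run, one grep pulls the headline numbers out of every report at once (assuming the field names shown under Report Metrics below):

# Summarize cost and token totals across all generated reports
grep -H "Total Cost" report-*.md
grep -H "Total Tokens" report-*.md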

Report Analysis Workflow

1. Generate Baseline

singleshot chat -p "Your prompt" -P openai -d -r baseline.md
cat baseline.md

2. Optimize & Compare

# Create optimized version, test, and compare
cat > optimized.md << 'EOF'
openai
---model---
gpt-4o-mini
---max_tokens---
200
---system---
Expert. Be concise.
---prompt---
Your optimized prompt
EOF

singleshot chat -l optimized.md -d -r optimized-report.md

# Compare metrics
echo "Baseline:" && grep -E "(Tokens|Cost)" baseline.md
echo "Optimized:" && grep -E "(Tokens|Cost)" optimized-report.md

Report Metrics

Reports contain:

## Token Usage
- Input Tokens: 245
- Output Tokens: 180
- Total Tokens: 425

## Cost (estimated)
- Input Cost: $0.00003675
- Output Cost: $0.000108
- Total Cost: $0.00014475

## Timing
- Time to First Token: 0.45s
- Total Time: 1.23s
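
For larger comparisons, the same fields can be collected into a CSV for a spreadsheet. A sketch, again assuming the exact field names above (files without metrics, such as config files, yield empty columns):

# Build a cost/token summary CSV from every report in the directory
echo "report,total_tokens,total_cost" > summary.csv
for f in *.md; do
  tokens=$(grep "Total Tokens" "$f" | awk '{print $NF}')
  cost=$(grep "Total Cost" "$f" | tr -d '$' | awk '{print $NF}')
  echo "$f,$tokens,$cost" >> summary.csv
done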

Optimization Strategies

  1. Test with cheaper models first (see the sketch after this list):

    singleshot chat -p "Test" -P openai -m gpt-4o-mini -d -r report.md

  2. Reduce tokens:

    • Shorten system prompts
    • Use --max-tokens to limit output
    • Add "be concise" to the system prompt

  3. Test locally (free):

    singleshot chat -p "Test" -P ollama -m llama3.2 -d -r report.md
    

Example: Full Optimization

# Step 1: Baseline (verbose)
singleshot chat \
  -p "How do I write a Rust function to add two numbers?" \
  -s "You are an expert Rust programmer with 10 years experience" \
  -P openai -d -r v1.md

# Step 2: Read metrics
cat v1.md
# Expected: ~130 input tokens, ~400 output tokens

# Step 3: Optimized version
singleshot chat \
  -p "Rust function: add(a: i32, b: i32) -> i32" \
  -s "Rust expert. Code only." \
  -P openai --max-tokens 100 -d -r v2.md

# Step 4: Compare
echo "=== COMPARISON ==="
grep "Total Cost" v1.md v2.md
grep "Total Tokens" v1.md v2.md

Quick Reference

# Test with full details
singleshot chat -p "prompt" -P openai -d -r report.md

# Extract metrics
grep -E "(Input|Output|Total)" report.md

# Compare reports
diff report1.md report2.md

# Vision test
singleshot chat -p "Describe" -i image.jpg -P openai -d -r report.md

# List models
singleshot models -P openai

# Test connection
singleshot ping -P openai

Environment Variables

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-..."
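
Rather than exporting keys in every shell, a common pattern is to keep them in a local .env file and source it before testing (.env is a convention here, not a file singleshot reads itself):

# Load keys from a local .env file (keep it out of version control)
set -a          # export every variable assigned while sourcing
source .env
set +a
singleshot ping -P openai   # confirm the key is picked up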

Best Practices

  1. Always use -d for detailed token metrics
  2. Always use -r to save reports
  3. Always cat reports to analyze metrics
  4. Test variations and compare costs
  5. Set --max-tokens to control costs
  6. Use gpt-4o-mini for testing (cheaper); the wrapper sketch after this list bakes in practices 1, 2, and 6
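
Practices 1, 2, and 6 can be baked into a small wrapper so they are impossible to forget; sstest is a hypothetical helper name, not part of singleshot:

# Hypothetical wrapper: always -d, always -r, defaults to the cheap model
sstest() {
  local prompt="$1" name="${2:-report}"
  singleshot chat -p "$prompt" -P openai -m gpt-4o-mini -d -r "${name}.md"
}

# Usage: sstest "Your prompt" v2   (writes v2.md)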

Troubleshooting

  • No metrics: Ensure -d flag is used
  • No report file: Ensure -r flag is used
  • High costs: Switch to gpt-4o-mini or Ollama
  • Connection issues: Run singleshot ping -P <provider>
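
When several providers are configured, a quick loop checks them all in one pass (openrouter as a provider name is an assumption, inferred from the OPENROUTER_API_KEY variable above):

# Ping every configured provider
for p in openai anthropic openrouter ollama; do
  echo "== $p =="
  singleshot ping -P "$p"
done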