# Multi-LLM - Intelligent Model Switching

**Trigger Command:** `multi llm`

**Default Behavior:** Always use Claude Opus 4.5 (the strongest model). Local model selection is activated only when the message contains the `multi llm` command.
## What's New in v1.1.0

- Renamed trigger from `mlti llm` to `multi llm` (clearer naming)
- Enhanced model existence checking with a fallback chain
- Added detailed usage examples and troubleshooting
- Improved task detection patterns
## Usage

### Default Mode (without command)

```
Help me write a Python function  -> Uses Claude Opus 4.5
Analyze this code                -> Uses Claude Opus 4.5
```

### Multi-Model Mode (with command)

```
multi llm Help me write a Python function  -> Selects qwen2.5-coder:32b
multi llm Analyze this math proof          -> Selects deepseek-r1:70b
multi llm Translate to Chinese             -> Selects glm4:9b
```
## Command Format

| Command | Description |
|---|---|
| `multi llm` | Activate intelligent model selection |
| `multi llm coding` | Force coding model |
| `multi llm reasoning` | Force reasoning model |
| `multi llm chinese` | Force Chinese model |
| `multi llm general` | Force general model |
## Model Mapping

**Primary Model (Default):** `github-copilot/claude-opus-4.5`

**Local Models** (when `multi llm` is triggered):

| Task Type | Model | Size | Best For |
|---|---|---|---|
| Coding | qwen2.5-coder:32b | 19 GB | Code generation, debugging, refactoring |
| Reasoning | deepseek-r1:70b | 42 GB | Math, logic, complex analysis |
| Chinese | glm4:9b | 5.5 GB | Translation, summaries, quick tasks |
| General | qwen3:32b | 20 GB | General purpose, fallback |
## Fallback Chain

If the selected model is unavailable, the system tries alternatives in order (a minimal sketch of the walk follows the list):

- Coding: `qwen2.5-coder:32b` -> `qwen2.5-coder:14b` -> `qwen3:32b`
- Reasoning: `deepseek-r1:70b` -> `deepseek-r1:32b` -> `qwen3:32b`
- Chinese: `glm4:9b` -> `qwen3:8b` -> `qwen3:32b`
- General: `qwen3:32b` -> `qwen3:14b` -> `qwen3:8b`
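One way to implement such a walk is to compare each candidate against the models that `ollama list` reports as installed. The helper below is a sketch under that assumption; the function name and wiring are illustrative, not the skill's actual code:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print the first installed model from a fallback chain.
pick_available() {
  local installed
  installed="$(ollama list | awk 'NR > 1 { print $1 }')"  # skip header row
  for model in "$@"; do
    if grep -qx "$model" <<< "$installed"; then
      echo "$model"
      return 0
    fi
  done
  return 1  # nothing in the chain is installed
}

# Example: the coding chain from the list above.
pick_available qwen2.5-coder:32b qwen2.5-coder:14b qwen3:32b
```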
## Detection Logic

```
User Input
    |
    v
Contains "multi llm"?
    |
    +-- No  -> Use Claude Opus 4.5 (default)
    |
    +-- Yes -> Task Type Detection
                    |
    +---------------+---------------+---------------+
    v               v               v               v
 Coding         Reasoning        Chinese         General
    |               |               |               |
    v               v               v               v
qwen2.5-        deepseek-        glm4:9b        qwen3:32b
coder:32b       r1:70b
```
## Task Detection Keywords

| Category | Keywords (EN) | Keywords (CN) |
|---|---|---|
| Coding | code, debug, function, script, api, bug, refactor, python, java, javascript | 代码, 编程, 函数, 调试, 重构 |
| Reasoning | analysis, proof, logic, math, solve, algorithm, evaluate | 推理, 分析, 证明, 逻辑, 数学, 计算, 算法 |
| Chinese | translate, summary | 翻译, 总结, 摘要, 简单, 快速 |
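These keywords suggest a simple first-match classifier. The sketch below assumes case-insensitive substring matching and a fixed tie-breaking order (coding, then reasoning, then Chinese, then general); neither detail is confirmed by the skill's scripts, so treat it as illustrative:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map a prompt to a task type via keyword matching.
detect_task() {
  local prompt
  prompt="$(tr '[:upper:]' '[:lower:]' <<< "$1")"  # case-insensitive match
  local coding="code|debug|function|script|api|bug|refactor|python|java|javascript|代码|编程|函数|调试|重构"
  local reasoning="analysis|proof|logic|math|solve|algorithm|evaluate|推理|分析|证明|逻辑|数学|计算|算法"
  local chinese="translate|summary|翻译|总结|摘要|简单|快速"

  if   [[ "$prompt" =~ $coding ]];    then echo coding
  elif [[ "$prompt" =~ $reasoning ]]; then echo reasoning
  elif [[ "$prompt" =~ $chinese ]];   then echo chinese
  else                                     echo general
  fi
}

detect_task "multi llm Write a Python function"  # -> coding
```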
## Examples

### Example 1: Coding Task

```
# Input
multi llm Write a Python function to calculate fibonacci

# Output
Selected: qwen2.5-coder:32b
Reason: Detected coding task (keywords: python, function)
```

### Example 2: Math Analysis

```
# Input
multi llm reasoning Prove that sqrt(2) is irrational

# Output
Selected: deepseek-r1:70b
Reason: Force command 'reasoning' used
```

### Example 3: Quick Translation

```
# Input
multi llm 把这段话翻译成英文

# Output
Selected: glm4:9b
Reason: Detected Chinese lightweight task (keywords: 翻译)
```

(The input asks: "Translate this passage into English.")

### Example 4: Default (No trigger)

```
# Input
Write a REST API with authentication

# Output
Selected: claude-opus-4.5
Reason: Default model (no 'multi llm' trigger)
```
## Prerequisites

- Ollama must be installed and running:

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Start the Ollama service
ollama serve

# Pull the required models
ollama pull qwen2.5-coder:32b
ollama pull deepseek-r1:70b
ollama pull glm4:9b
ollama pull qwen3:32b
```

- Check available models:

```bash
ollama list
```
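If you want to verify the full mapping in one pass, a small helper like the following can pull whatever is missing. This is a convenience sketch, not part of the skill; only the model names come from the tables above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull any of the skill's models that are not installed.
required=(qwen2.5-coder:32b deepseek-r1:70b glm4:9b qwen3:32b)
installed="$(ollama list | awk 'NR > 1 { print $1 }')"

for model in "${required[@]}"; do
  if ! grep -qx "$model" <<< "$installed"; then
    echo "Missing $model, pulling..."
    ollama pull "$model"
  fi
done
```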
## Troubleshooting

### Model not found

```bash
# Check whether the model exists
ollama list | grep "qwen2.5-coder"

# Pull the missing model
ollama pull qwen2.5-coder:32b
```

### Ollama not running

```bash
# Check service status
curl -s http://localhost:11434/api/tags

# Start Ollama
ollama serve &
```

### Slow response

- Large models (70b) require significant RAM/VRAM
- Consider smaller variants, e.g. `deepseek-r1:32b` instead of `deepseek-r1:70b`

### Wrong model selected

- Use force commands: `multi llm coding`, `multi llm reasoning`
- Check whether the keywords match your task type
## Files in This Skill

```
multi-llm/
├── SKILL.md              # This documentation
└── scripts/
    ├── select-model.sh   # Model selection logic
    └── fallback-demo.sh  # Interactive demo script
```
## Integration

### With OpenCode/ClaudeCode

The trigger `multi llm` is detected in your message. Simply prefix your request:

```
multi llm [your request here]
```

### Programmatic Usage

```bash
# Get the recommended model for a task
./scripts/select-model.sh "multi llm write a sorting algorithm"
# Output: qwen2.5-coder:32b

# Demo with an actual model call
./scripts/fallback-demo.sh --force-local "explain recursion"
```
## Author

- GitHub: @leohan123123

## License

MIT