Video Message
Generate avatar video messages from text or audio. Outputs as Telegram video notes (circular format).
Installation
npm install -g openclaw-avatarcam
Configuration
Configure in TOOLS.md:
### Video Message (avatarcam)
- avatar: default.vrm
- background: #00FF00
Settings Reference
| Setting | Default | Description |
|---|---|---|
| avatar | default.vrm | VRM avatar file path |
| background | #00FF00 | Color (hex) or image path |
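
The settings can be read programmatically with a few lines of Python. This is a minimal sketch, assuming the `- key: value` layout under the `### Video Message (avatarcam)` heading shown in the Configuration block; `read_avatarcam_settings` is a hypothetical helper and not part of avatarcam.

```python
import re

# Defaults taken from the Settings Reference table.
DEFAULTS = {"avatar": "default.vrm", "background": "#00FF00"}


def read_avatarcam_settings(tools_md: str) -> dict:
    """Parse `- key: value` lines under the avatarcam heading (assumed layout)."""
    settings = dict(DEFAULTS)
    in_section = False
    for line in tools_md.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Enter the section on the avatarcam heading, leave on any other heading.
            in_section = "avatarcam" in stripped.lower()
            continue
        if not in_section:
            continue
        match = re.match(r"-\s*(\w+)\s*:\s*(.+)", stripped)
        if match and match.group(1) in DEFAULTS:
            settings[match.group(1)] = match.group(2).strip()
    return settings
```

Missing keys fall back to the documented defaults, so an empty or absent section still yields a usable configuration.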
Prerequisites
System Dependencies
| Platform | Command |
|---|---|
| macOS | brew install ffmpeg |
| Linux | sudo apt-get install -y xvfb xauth ffmpeg |
| Windows | Install ffmpeg and add to PATH |
| Docker | See Docker section below |
Note: macOS and Windows don't need xvfb; they have native display support.
Docker Users
Add to OPENCLAW_DOCKER_APT_PACKAGES:
build-essential procps curl file git ca-certificates xvfb xauth libgbm1 libxss1 libatk1.0-0 libatk-bridge2.0-0 libgdk-pixbuf2.0-0 libgtk-3-0 libasound2 libnss3 ffmpeg
Usage
# With color background
avatarcam --audio voice.mp3 --output video.mp4 --background "#00FF00"
# With image background
avatarcam --audio voice.mp3 --output video.mp4 --background "./bg.png"
# With custom avatar
avatarcam --audio voice.mp3 --output video.mp4 --avatar "./custom.vrm"
Sending as Video Note
Use OpenClaw's message tool with asVideoNote:
message action=send filePath=/tmp/video.mp4 asVideoNote=true
Workflow
- Read config from TOOLS.md (avatar, background)
- Generate TTS if given text: `tts text="..."` → audio path
- Run avatarcam with audio + settings → MP4 output
- Send as video note via `message action=send filePath=... asVideoNote=true`
- Return NO_REPLY after sending
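
The ordering of these steps can be sketched as a single function. The `tts` and `message` tools belong to the agent runtime, not to Python, so the sketch takes them as injected callables. All names here (`send_video_message`, `tts`, `render`, `send`) are hypothetical stand-ins, and the caller is assumed to have already read the settings from TOOLS.md.

```python
from typing import Callable, Optional


def send_video_message(
    settings: dict,
    tts: Callable[[str], str],
    render: Callable[[str, str, dict], None],
    send: Callable[[str], None],
    text: Optional[str] = None,
    audio_path: Optional[str] = None,
    output_path: str = "/tmp/video.mp4",
) -> str:
    """Run TTS (if needed), render the video, send it, and return NO_REPLY."""
    if audio_path is None:
        if text is None:
            raise ValueError("need either text or audio_path")
        audio_path = tts(text)  # text -> audio path
    render(audio_path, output_path, settings)  # audio + settings -> MP4
    send(output_path)  # delivered as a video note by the caller's send function
    return "NO_REPLY"
```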
Example Flow
User: "Send me a video message saying hello"
# 1. TTS
tts text="Hello! How are you today?" → /tmp/voice.mp3
# 2. Generate video
avatarcam --audio /tmp/voice.mp3 --output /tmp/video.mp4 --background "#00FF00"
# 3. Send as video note
message action=send filePath=/tmp/video.mp4 asVideoNote=true
# 4. Reply
NO_REPLY
Technical Details
| Setting | Value |
|---|---|
| Resolution | 384x384 (square) |
| Frame rate | 30fps constant |
| Max duration | 60 seconds |
| Video codec | H.264 (libx264) |
| Audio codec | AAC |
| Quality | CRF 18 (high quality) |
| Container | MP4 |
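
Because of the 60-second limit, it may be worth checking the audio length before rendering. The sketch below reads the duration with `ffprobe`, which ships with ffmpeg. How avatarcam itself handles longer input is not documented here, so the check is a precaution only. `probe_duration_seconds` and `fits_video_note` are hypothetical helper names.

```python
import subprocess

MAX_DURATION_SECONDS = 60  # "Max duration" from the table above


def probe_duration_seconds(path: str) -> float:
    """Return the media duration in seconds as reported by ffprobe."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return float(result.stdout.strip())


def fits_video_note(duration_seconds: float) -> bool:
    """True when the clip is non-empty and within the 60-second limit."""
    return 0 < duration_seconds <= MAX_DURATION_SECONDS
```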
Processing Pipeline
- Electron renders VRM avatar with lip sync at 1280x720
- WebM captured via `canvas.captureStream(30)`
- FFmpeg processes: crop → fps normalize → scale → encode
- Message tool sends via Telegram `sendVideoNote` API
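
The FFmpeg stage can be approximated with a standard filter chain. This is a sketch under stated assumptions, not avatarcam's actual command: it assumes a centered square crop of the 1280x720 capture, and it assumes the audio is muxed from the original audio file as a second input. The codec and quality values come from the Technical Details table. The function names are hypothetical.

```python
from typing import List


def square_crop_filter(width: int, height: int, size: int = 384, fps: int = 30) -> str:
    """Build a crop -> fps -> scale filter string (centered crop is an assumption)."""
    side = min(width, height)
    x = (width - side) // 2
    y = (height - side) // 2
    return f"crop={side}:{side}:{x}:{y},fps={fps},scale={size}:{size}"


def build_ffmpeg_command(capture: str, audio: str, output: str) -> List[str]:
    """Return an ffmpeg argv that encodes a square H.264/AAC MP4."""
    return [
        "ffmpeg", "-y",
        "-i", capture,  # WebM capture (video)
        "-i", audio,    # original audio (assumed to be muxed at this stage)
        "-map", "0:v:0", "-map", "1:a:0",
        "-vf", square_crop_filter(1280, 720),
        "-c:v", "libx264", "-crf", "18",
        "-c:a", "aac",
        "-shortest",    # stop at the shorter of the two inputs
        output,
    ]
```

For a 1280x720 frame the crop keeps the middle 720x720 region, offset 280 pixels from the left edge, before scaling down to 384x384.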
Platform Support
| Platform | Display | Notes |
|---|---|---|
| macOS | Native Quartz | No extra deps |
| Linux | xvfb (headless) | apt install xvfb |
| Windows | Native | No extra deps |
Headless Rendering
Avatarcam auto-detects headless environments:
- Uses `xvfb-run` when `$DISPLAY` is not set (Linux only)
- macOS/Windows use native display
- GPU stall warnings are safe to ignore
- Generation time: ~1.5x realtime (20s audio → 30s processing)
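
The detection rule can be pictured as follows. Avatarcam performs this check itself, so the sketch is illustrative only. The use of `xvfb-run -a` (auto-select a free display number) is an assumption about how the wrapper is invoked, and both function names are hypothetical.

```python
import os
import sys
from typing import List, Mapping, Optional


def wrap_for_headless(
    cmd: List[str],
    platform: str = sys.platform,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Prefix cmd with xvfb-run on Linux when no DISPLAY is set."""
    env = os.environ if environ is None else environ
    if platform.startswith("linux") and not env.get("DISPLAY"):
        return ["xvfb-run", "-a", *cmd]
    return list(cmd)  # macOS/Windows, or Linux with a display


def estimate_processing_seconds(audio_seconds: float, factor: float = 1.5) -> float:
    """Rough processing-time estimate from the ~1.5x realtime figure."""
    return audio_seconds * factor
```

With the documented factor, `estimate_processing_seconds(20)` gives 30.0, matching the 20s audio example above.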
Notes
- Config is read from TOOLS.md
- Clean up temp files after sending: `rm /tmp/video*.mp4`
- For regular video (not circular), omit `asVideoNote=true`
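
If the cleanup runs from a script rather than a shell, the same pattern can be applied with the standard library. A minimal sketch; `cleanup_temp_videos` is a hypothetical helper, and the default pattern mirrors the `rm` command above.

```python
import glob
import os
from typing import List


def cleanup_temp_videos(pattern: str = "/tmp/video*.mp4") -> List[str]:
    """Delete files matching the pattern and return the removed paths."""
    removed = []
    for path in sorted(glob.glob(pattern)):
        os.remove(path)
        removed.append(path)
    return removed
```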