Source Code

Video Message

Generate avatar video messages from text or audio. Outputs as Telegram video notes (circular format).

Installation

npm install -g openclaw-avatarcam

Configuration

Configure in TOOLS.md:

### Video Message (avatarcam)
- avatar: default.vrm
- background: #00FF00

Settings Reference

Setting	Default	Description
`avatar`	`default.vrm`	VRM avatar file path
`background`	`#00FF00`	Color (hex) or image path

Prerequisites

System Dependencies

Platform	Command
macOS	`brew install ffmpeg`
Linux	`sudo apt-get install -y xvfb xauth ffmpeg`
Windows	Install ffmpeg and add to PATH
Docker	See Docker section below

Note: macOS and Windows don't need xvfb — they have native display support.

Docker Users

Add to OPENCLAW_DOCKER_APT_PACKAGES:

build-essential procps curl file git ca-certificates xvfb xauth libgbm1 libxss1 libatk1.0-0 libatk-bridge2.0-0 libgdk-pixbuf2.0-0 libgtk-3-0 libasound2 libnss3 ffmpeg

Usage

# With color background
avatarcam --audio voice.mp3 --output video.mp4 --background "#00FF00"

# With image background
avatarcam --audio voice.mp3 --output video.mp4 --background "./bg.png"

# With custom avatar
avatarcam --audio voice.mp3 --output video.mp4 --avatar "./custom.vrm"

Sending as Video Note

Use OpenClaw's message tool with asVideoNote:

message action=send filePath=/tmp/video.mp4 asVideoNote=true

Workflow

Read config from TOOLS.md (avatar, background)
Generate TTS if given text: tts text="..." → audio path
Run avatarcam with audio + settings → MP4 output
Send as video note via message action=send filePath=... asVideoNote=true
Return NO_REPLY after sending

Example Flow

User: "Send me a video message saying hello"

# 1. TTS
tts text="Hello! How are you today?" → /tmp/voice.mp3

# 2. Generate video
avatarcam --audio /tmp/voice.mp3 --output /tmp/video.mp4 --background "#00FF00"

# 3. Send as video note
message action=send filePath=/tmp/video.mp4 asVideoNote=true

# 4. Reply
NO_REPLY

Technical Details

Setting	Value
Resolution	384x384 (square)
Frame rate	30fps constant
Max duration	60 seconds
Video codec	H.264 (libx264)
Audio codec	AAC
Quality	CRF 18 (high quality)
Container	MP4

Processing Pipeline

Electron renders VRM avatar with lip sync at 1280x720
WebM captured via canvas.captureStream(30)
FFmpeg processes: crop → fps normalize → scale → encode
Message tool sends via Telegram sendVideoNote API

Platform Support

Platform	Display	Notes
macOS	Native Quartz	No extra deps
Linux	xvfb (headless)	`apt install xvfb`
Windows	Native	No extra deps

Headless Rendering

Avatarcam auto-detects headless environments:

Uses xvfb-run when $DISPLAY is not set (Linux only)
macOS/Windows use native display
GPU stall warnings are safe to ignore
Generation time: ~1.5x realtime (20s audio ≈ 30s processing)

Notes

Config is read from TOOLS.md
Clean up temp files after sending: rm /tmp/video*.mp4
For regular video (not circular), omit asVideoNote=true

avatar-video-messages