Communication by @globalcaos

whatsapp-ultimate

Complete WhatsApp integration: messages, media, polls, stickers, voice notes, reactions, FTS5 history search, voice transcription. Native Baileys, zero Docker.


WhatsApp Ultimate

Your AI agent on WhatsApp: not a chatbot, a real presence.

Send messages, voice notes, polls, stickers, and reactions. Search years of chat history instantly. Manage groups, transcribe voice messages, and control exactly who talks to your agent and when. Native Baileys integration: zero Docker, zero external services, zero monthly fees.

This isn't a wrapper around a REST API. This is your agent living inside WhatsApp as a first-class participant.


Why This Skill Exists

Every other WhatsApp integration we found was either:

  • A webhook relay that could send text and... that's it
  • A Docker container you had to babysit
  • A Business API wrapper requiring Meta approval and a separate phone number
  • A CLI tool that couldn't search history or manage groups

We built what we actually needed: an agent that can do everything a human can do on WhatsApp, from sending a thumbs-up reaction to pulling 3 years of chat history into a searchable database. And we made it secure enough to share a phone number with family.


What You Get

24 Distinct Actions

Category | What Your Agent Can Do
--- | ---
Messaging | Text, images, videos, documents, voice notes, GIFs, polls, stickers
Interactions | React with any emoji, reply/quote, edit sent messages, unsend/delete
Groups | Create, rename, set icon/description, add/remove/promote/demote members, invite links
History | Full-text search (SQLite + FTS5), date filters, sender filters, bulk import
Voice | Transcribe incoming voice notes, send metallic TTS replies
Security | 3-rule group gate, DM prefix gate, per-conversation access control

What Makes This Different

🔒 Strict 3-Rule Group Gate: The #1 problem with AI agents in WhatsApp groups is that they respond to everything. Someone shares a photo? The agent chimes in. A family member sends a meme? The agent analyzes it. We fixed this with three rules that must ALL pass before your agent opens its mouth:

  1. Is this an allowed group? You whitelist which group chats the agent responds in. The agent sees all chats (for history search, context, and awareness), but only triggers a response in approved groups.
  2. Is this person authorized? Even in an allowed group, only specific phone numbers can trigger the agent. Your cousin's random messages? Ignored.
  3. Did they say the magic word? The message must start with your trigger prefix (e.g. "Jarvis"). No prefix, no response. Photos, stickers, memes, forwarded chains: all silently ignored.

No bypasses, no exceptions, no "but the owner sent media so let it through." Your agent stays silent until explicitly addressed by name, by someone you trust, in a chat you approved.

🤔↔🧐 Thinking Heartbeat: WhatsApp's linked-device API can't show "typing..." in groups (Baileys #866). We solved it: the agent reacts with 🤔 instantly, alternates to 🧐, and removes the reaction when the reply is ready. Your users always know the agent is working. No other WhatsApp skill does this.

🎤 Voice-First Design: Voice notes are transcribed before prefix checking. Say "Jarvis, what's the weather?" in a voice note and it works exactly like text. The transcript is checked against triggerPrefix, and the agent responds with a metallic voice reply using local TTS. Zero cloud costs. Pair with the sherpa-onnx-tts skill for the full JARVIS effect, or use jarvis-voice for a ready-made metallic voice pipeline.

📚 Searchable History: Every message is stored in SQLite with FTS5 full-text search. Import years of old chats from WhatsApp exports. Ask your agent "what did Sarah say about the deadline last month?" and get an instant answer. Combine with agent-memory-ultimate for cognitive recall that spans WhatsApp, email, calendar, and more.

🔄 Full History Resync: Pull your entire WhatsApp history (3+ years, 17K+ messages) into the local database with a single re-link. No manual exports needed.


Quick Start

Prerequisites

  • OpenClaw with WhatsApp channel configured
  • WhatsApp account linked via QR code (openclaw whatsapp login)

Minimal Config

{
  "channels": {
    "whatsapp": {
      "dmPolicy": "allowlist",
      "allowFrom": ["+1234567890"],
      "triggerPrefix": "jarvis",
      "messagePrefix": "🤖",
      "responsePrefix": "🤖"
    }
  }
}

That's it. Your agent now responds only to your messages, only when you say "Jarvis", and every reply is tagged with 🤖 so you always know who's talking.


Messaging

Send Text

message action=send channel=whatsapp to="+34612345678" message="Hello!"

Send Media (Image/Video/Document)

message action=send channel=whatsapp to="+34612345678" message="Check this out" filePath=/path/to/image.jpg

Supported: JPG, PNG, GIF, MP4, PDF, DOC, etc.

Send Poll

message action=poll channel=whatsapp to="+34612345678" pollQuestion="What time?" pollOption=["3pm", "4pm", "5pm"]

Send Sticker

message action=sticker channel=whatsapp to="+34612345678" filePath=/path/to/sticker.webp

Must be WebP format, ideally 512x512.

Send Voice Note

message action=send channel=whatsapp to="+34612345678" filePath=/path/to/audio.ogg asVoice=true

Critical: Use OGG/Opus format. MP3 may not play correctly on WhatsApp.
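
If your audio starts out as MP3 or WAV, a small wrapper around ffmpeg can produce the OGG/Opus file before sending. The sketch below reuses the ffmpeg flags from the Troubleshooting section; the function names `opus_command` and `to_voice_note` are made up for illustration and are not part of the skill.

```python
import shutil
import subprocess
from pathlib import Path


def opus_command(src: str, dst: str, bitrate: str = "64k") -> list[str]:
    """Build the ffmpeg argument list for an OGG/Opus conversion."""
    return ["ffmpeg", "-y", "-i", src, "-c:a", "libopus", "-b:a", bitrate, dst]


def to_voice_note(src: str) -> str:
    """Convert src to OGG/Opus next to the original and return the new path.

    Assumes src is not already an .ogg file, since the output would
    otherwise overwrite the input.
    """
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    dst = str(Path(src).with_suffix(".ogg"))
    subprocess.run(opus_command(src, dst), check=True)
    return dst
```

Pass the returned path as filePath with asVoice=true.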

Send GIF

message action=send channel=whatsapp to="+34612345678" filePath=/path/to/animation.mp4 gifPlayback=true

Convert GIF to MP4 first (WhatsApp requires this):

ffmpeg -i input.gif -movflags faststart -pix_fmt yuv420p -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" output.mp4 -y

Interactions

Reactions

# Add reaction
message action=react channel=whatsapp chatJid="34612345678@s.whatsapp.net" messageId="ABC123" emoji="🚀"

# Remove reaction
message action=react channel=whatsapp chatJid="34612345678@s.whatsapp.net" messageId="ABC123" remove=true

Reply/Quote

message action=reply channel=whatsapp to="34612345678@s.whatsapp.net" replyTo="QUOTED_MSG_ID" message="Replying to this!"

Edit & Unsend

# Edit (own messages only)
message action=edit channel=whatsapp chatJid="34612345678@s.whatsapp.net" messageId="ABC123" message="Updated text"

# Unsend/delete
message action=unsend channel=whatsapp chatJid="34612345678@s.whatsapp.net" messageId="ABC123"

Group Management

Full group lifecycle: create, configure, manage members, and control access.

# Create group
message action=group-create channel=whatsapp name="Project Team" participants=["+34612345678"]

# Rename / set icon / set description
message action=renameGroup channel=whatsapp groupId="123@g.us" name="New Name"
message action=setGroupIcon channel=whatsapp groupId="123@g.us" filePath=/path/to/icon.jpg
message action=setGroupDescription channel=whatsapp groupJid="123@g.us" description="Team chat"

# Manage members
message action=addParticipant channel=whatsapp groupId="123@g.us" participant="+34612345678"
message action=removeParticipant channel=whatsapp groupId="123@g.us" participant="+34612345678"
message action=promoteParticipant channel=whatsapp groupJid="123@g.us" participants=["+34612345678"]
message action=demoteParticipant channel=whatsapp groupJid="123@g.us" participants=["+34612345678"]

# Invite links
message action=getInviteCode channel=whatsapp groupJid="123@g.us"
message action=revokeInviteCode channel=whatsapp groupJid="123@g.us"

# Group info
message action=getGroupInfo channel=whatsapp groupJid="123@g.us"

# Leave group
message action=leaveGroup channel=whatsapp groupId="123@g.us"

🔒 Access Control (v2.0)

The most granular WhatsApp access control available for any AI agent. Because the last thing you want is your agent responding to your mother-in-law's photos with a treatise on prenuptial agreements.

The 3-Rule Gate (Groups)

Every group message must pass ALL three rules:

Rule | Check | Configured By
--- | --- | ---
1. Allowed Chat | Is this group in the allowlist? | groupPolicy + group JIDs in groupAllowFrom
2. Authorized Sender | Is this person allowed to talk to the agent? | Phone numbers in groupAllowFrom
3. Trigger Prefix | Does the message start with "Jarvis" (or @mention, or reply-to-bot)? | triggerPrefix

No bypasses. Photos, videos, stickers, documents: all silently ignored unless the sender explicitly addresses the agent by name. Owner slash commands (/new, /status) pass without prefix.
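
To make the gate concrete, here is a minimal sketch of the three checks as a single function. It is not the plugin's actual code: the `GroupMessage` type and `passes_group_gate` are hypothetical, prefix matching is assumed to be case-insensitive, and the owner slash-command exception is left out.

```python
from dataclasses import dataclass


@dataclass
class GroupMessage:
    chat_jid: str                 # e.g. "120363409030785922@g.us"
    sender: str                   # e.g. "+34612345678"
    text: str                     # text or voice transcript; "" for bare media
    mentions_bot: bool = False
    replies_to_bot: bool = False


def passes_group_gate(msg: GroupMessage, config: dict) -> bool:
    """Return True only if all three rules pass (allowlist policy only)."""
    if config.get("groupPolicy") != "allowlist":
        raise ValueError("this sketch only models groupPolicy='allowlist'")
    allow = set(config.get("groupAllowFrom", []))
    prefix = config.get("triggerPrefix", "").lower()

    # Rule 1: the group itself must be in the allowlist.
    if msg.chat_jid not in allow:
        return False
    # Rule 2: the sender's number must be in the allowlist.
    if msg.sender not in allow:
        return False
    # Rule 3: the agent must be addressed by prefix, @mention, or reply.
    addressed = bool(prefix) and msg.text.lstrip().lower().startswith(prefix)
    return addressed or msg.mentions_bot or msg.replies_to_bot
```

A photo with no caption arrives with empty text, so it fails rule 3 and is ignored, which matches the behavior described above.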

DM Prefix Gate

The same triggerPrefix applies to DMs too. Messages without the prefix are silently dropped. Voice notes are transcribed first, then checked.
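
DM handling combines the dmPolicy setting with the prefix gate. The sketch below shows one plausible ordering: policy first, then prefix. The function name and the "process" / "pair" / "drop" return values are illustrative, and treating numbers in allowFrom as already paired under "pairing" is an assumption.

```python
def dm_decision(sender: str, text: str, config: dict) -> str:
    """Return "process", "pair", or "drop" for an incoming direct message."""
    policy = config["dmPolicy"]
    if policy not in ("open", "allowlist", "pairing", "disabled"):
        raise ValueError(f"unknown dmPolicy: {policy!r}")
    known = sender in config.get("allowFrom", [])

    if policy == "disabled":
        return "drop"
    if policy == "allowlist" and not known:
        return "drop"
    if policy == "pairing" and not known:
        return "pair"  # unknown senders get the pairing prompt

    # Prefix gate: text is the message body or a voice-note transcript.
    prefix = config.get("triggerPrefix", "")
    if prefix and not text.lstrip().lower().startswith(prefix.lower()):
        return "drop"
    return "process"
```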

Configuration

{
  "channels": {
    "whatsapp": {
      "dmPolicy": "allowlist",
      "allowFrom": ["+34612345678", "+14155551234"],
      "groupPolicy": "allowlist",
      "groupAllowFrom": [
        "+34612345678",
        "+14155551234",
        "120363409030785922@g.us"
      ],
      "triggerPrefix": "jarvis",
      "messagePrefix": "🤖",
      "responsePrefix": "🤖"
    }
  }
}

DM Policy | Behavior
--- | ---
"open" | Anyone can DM
"allowlist" | Only numbers in allowFrom
"pairing" | Unknown senders get pairing prompt
"disabled" | No DMs accepted

Group Policy | Behavior
--- | ---
"open" | Responds to mentions in any group
"allowlist" | Only from senders in groupAllowFrom
"disabled" | Ignores all group messages

Self-Chat Mode

{ "channels": { "whatsapp": { "selfChatMode": true } } }

Talk to your agent through your "Note to Self" chat.


🤔 Thinking Heartbeat

The problem: WhatsApp linked devices can't show "typing..." in groups. This is a WhatsApp server-side limitation, confirmed in Baileys #866.

Our solution: The agent reacts with 🤔 instantly (<100ms), alternates to 🧐 every second, and removes the reaction when the reply arrives. It doubles as a watchdog: if the reaction freezes on one emoji, something is hung.

Works in groups ✅ and DMs ✅.
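
The heartbeat pattern can be sketched with asyncio. This is an illustration of the idea, not the plugin's code: `react` is a hypothetical callback that sets the reaction on the triggering message, with an empty string assumed to mean "remove the reaction".

```python
import asyncio
from typing import Awaitable, Callable

React = Callable[[str], Awaitable[None]]  # hypothetical reaction sender


async def with_thinking_heartbeat(
    work: Awaitable[str], react: React, interval: float = 1.0
) -> str:
    """Show alternating reactions while `work` runs, then clear them."""
    emojis = ("🤔", "🧐")
    await react(emojis[0])  # acknowledge immediately

    async def pulse() -> None:
        i = 1
        while True:
            await asyncio.sleep(interval)
            await react(emojis[i % 2])
            i += 1

    task = asyncio.create_task(pulse())
    try:
        return await work
    finally:
        task.cancel()
        # Wait for the pulse task to finish cancelling before cleaning up.
        await asyncio.gather(task, return_exceptions=True)
        await react("")  # remove the reaction once the reply is ready
```

Because the cleanup sits in a finally block, the reaction is removed even if generating the reply raises. If the event loop itself stalls, the emoji stops alternating, which is the watchdog effect described above.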


📚 Message History & Search

Every message is stored in SQLite with FTS5 full-text search. Import old chats. Search by keyword, sender, date, or chat.

# Search by keyword
whatsapp_history action=search query="meeting tomorrow"

# Filter by chat
whatsapp_history action=search chat="Family Group" limit=50

# What did I say?
whatsapp_history action=search fromMe=true query="I promised"

# Filter by sender
whatsapp_history action=search sender="John" limit=20

# Date range
whatsapp_history action=search since="2026-01-01" until="2026-02-01"

# Database stats
whatsapp_history action=stats
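
For readers curious how searches like these map onto SQLite, here is a self-contained FTS5 sketch. The table layout is an assumption made for illustration (the skill's real schema is not documented here), and it requires a Python build whose SQLite includes FTS5, which most do.

```python
import sqlite3

# Hypothetical schema: only the message body is indexed for full-text search.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE messages USING fts5("
    "chat UNINDEXED, sender UNINDEXED, body, ts UNINDEXED)"
)
conn.executemany(
    "INSERT INTO messages (chat, sender, body, ts) VALUES (?, ?, ?, ?)",
    [
        ("Family Group", "John", "meeting tomorrow at 10", "2026-01-15"),
        ("Family Group", "Sarah", "the deadline moved to Friday", "2026-01-20"),
        ("Project Team", "John", "lunch tomorrow?", "2026-02-03"),
    ],
)


def search(query, sender=None, since=None, until=None, limit=20):
    """Full-text search with optional sender and ISO-date filters."""
    sql = "SELECT chat, sender, body, ts FROM messages WHERE messages MATCH ?"
    params = [query]
    if sender is not None:
        sql += " AND sender = ?"
        params.append(sender)
    if since is not None:
        sql += " AND ts >= ?"
        params.append(since)
    if until is not None:
        sql += " AND ts < ?"
        params.append(until)
    sql += " ORDER BY rank LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


print(search("meeting tomorrow"))
print(search("tomorrow", sender="John", since="2026-02-01"))
```

FTS5 treats space-separated words as an implicit AND, so "meeting tomorrow" only matches messages containing both words. Queries containing FTS5 syntax characters (quotes, parentheses, and so on) would need escaping before being passed to MATCH.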

Import Historical Chats

  1. Export from phone: Settings → Chats → Export chat → Without media
  2. Import:
whatsapp_history action=import path="/path/to/exports"
whatsapp_history action=import path="/path/to/chat.txt" chatName="Family Group"
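
WhatsApp exports are plain text with one timestamped line per message. The exact layout varies by platform and locale, so the sketch below assumes one common shape, `M/D/YY, HH:MM - Sender: text`, and is only meant to show what an importer has to deal with (multi-line messages and system notices). The regexes and `parse_export` are illustrative, not the skill's parser.

```python
import re

# Assumed line shape: "1/15/26, 09:31 - John: meeting tomorrow"
STAMP = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, ")
LINE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2,4}), "
    r"(\d{1,2}:\d{2}(?:\s?[APap][Mm])?) - "
    r"([^:]+): (.*)$"
)


def parse_export(text: str) -> list[dict]:
    """Parse export text into message dicts, merging continuation lines."""
    messages: list[dict] = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            date, time, sender, body = m.groups()
            messages.append(
                {"date": date, "time": time, "sender": sender, "body": body}
            )
        elif STAMP.match(line):
            continue  # timestamped system notice without a sender
        elif messages:
            messages[-1]["body"] += "\n" + line  # continuation of previous message
    return messages
```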

Full History Resync

Pull 3+ years of history with a single re-link:

curl -X POST http://localhost:18789/api/whatsapp/resync

Then scan the QR code. In testing: 17,609 messages across 1,229 chats spanning 3+ years.

Database: ~/.openclaw/data/whatsapp-history.db (SQLite + WAL mode)


🎤 Voice Pipeline

Incoming Voice Notes

Voice notes are transcribed before prefix checking:

Voice note → Download OGG → Transcribe (Whisper) → Check triggerPrefix → Process

Say "Jarvis, what's on my calendar?" and the transcript is checked, the prefix matches, and the agent responds. No prefix? Silently dropped after transcription.
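
The ordering matters: transcription happens first, and the prefix check runs on the transcript. A minimal sketch of that flow, with the transcriber passed in as a plain function so any speech-to-text backend can be swapped in (`strip_trigger` and `handle_voice_note` are hypothetical names, and the word-boundary check is an assumption):

```python
from typing import Callable, Optional


def strip_trigger(text: str, prefix: str) -> Optional[str]:
    """Return the command after the prefix, or None if the prefix is missing."""
    stripped = text.lstrip()
    if not prefix or not stripped.lower().startswith(prefix.lower()):
        return None
    rest = stripped[len(prefix):]
    if rest and rest[0].isalnum():
        return None  # "Jarvisson ..." is not addressed to "jarvis"
    return rest.lstrip(" ,:.!?").strip()


def handle_voice_note(
    audio_path: str, transcribe: Callable[[str], str], prefix: str
) -> Optional[str]:
    """Transcribe first, then apply the same prefix gate used for text."""
    transcript = transcribe(audio_path)
    return strip_trigger(transcript, prefix)
```

A return value of None means the note is dropped; anything else is the command to process.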

Outgoing Metallic Voice

Send JARVIS-style voice replies with local TTS:

# Generate metallic voice note
jarvis-wa "Systems nominal, sir." /tmp/reply.ogg

# Send as WhatsApp voice note
message action=send channel=whatsapp target="+1234567890" filePath=/tmp/reply.ogg asVoice=true

Effects chain: 2x speed → +5% pitch → flanger → 15ms echo → high-pass 200Hz → treble +6dB

Requires sherpa-onnx-tts. See also jarvis-voice for the full speaker + webchat voice pipeline.


🔄 Offline Recovery

Gateway down? Messages aren't lost. WhatsApp delivers missed messages on reconnect, and OpenClaw processes them automatically (6-hour recovery window). Recovered messages are tagged [OFFLINE RECOVERY] so your agent can batch-review instead of blindly acting on stale requests.
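
As a rough illustration of the recovery window, the sketch below tags a missed message if it was sent within six hours of the reconnect and skips it otherwise. The cutoff semantics (age measured against reconnect time, older messages skipped) are an assumption, and `tag_recovered` is a made-up helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RECOVERY_WINDOW = timedelta(hours=6)
TAG = "[OFFLINE RECOVERY]"


def tag_recovered(
    text: str, sent_at: datetime, reconnected_at: datetime
) -> Optional[str]:
    """Return the tagged text, or None if the message is outside the window."""
    if reconnected_at - sent_at > RECOVERY_WINDOW:
        return None
    return f"{TAG} {text}"


now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
print(tag_recovered("Jarvis, remind me at 5", now - timedelta(hours=2), now))
```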


Download & Transcribe Media

The history database stores full WAMessage protos including media encryption keys. Download any voice message, image, or document:

Media Type | Proto Field | Content Type
--- | --- | ---
Voice/Audio | audioMessage | "audio"
Image | imageMessage | "image"
Video | videoMessage | "video"
Document | documentMessage | "document"
Sticker | stickerMessage | "sticker"

Media URLs expire, so download soon after receiving, or ensure the WhatsApp socket is connected for re-fetch.
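
The table above amounts to a lookup from proto field to content type. A small sketch, assuming the stored message content has been decoded into a dict keyed by proto field name (`media_content_type` is a hypothetical helper):

```python
from typing import Optional

# Proto field -> content type, as listed in the table above.
MEDIA_FIELDS = {
    "audioMessage": "audio",
    "imageMessage": "image",
    "videoMessage": "video",
    "documentMessage": "document",
    "stickerMessage": "sticker",
}


def media_content_type(message: dict) -> Optional[tuple[str, str]]:
    """Return (proto_field, content_type) for a media message, else None."""
    for field, content_type in MEDIA_FIELDS.items():
        if field in message:
            return field, content_type
    return None
```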


Pairs Well With

Build a complete AI assistant stack:

Skill | What It Adds
--- | ---
agent-memory-ultimate | Cognitive memory: your agent remembers WhatsApp conversations across sessions
sherpa-onnx-tts | Local text-to-speech engine for metallic voice replies
jarvis-voice | Full JARVIS voice pipeline: webchat speakers + WhatsApp voice notes
openai-whisper | Local speech-to-text for voice note transcription (no API costs)
agent-boundaries-ultimate | Safety framework for agents with messaging access
shell-security-ultimate | Command classification before your agent runs anything dangerous
gog | Google Workspace: your agent reads Gmail/Calendar and reports via WhatsApp
outlook-hack | Outlook email access: draft replies, check calendar, all via WhatsApp
ai-humor-ultimate | 12 humor patterns: make your agent's WhatsApp replies actually fun
youtube | YouTube transcripts: "Jarvis, summarize this video" works in WhatsApp

Comparison

Feature | whatsapp-ultimate | wacli | whatsapp-business
--- | --- | --- | ---
Native integration | ✅ Zero deps | ❌ Go CLI binary | ❌ External API + key
Actions | 24+ | ~6 | ~10
Polls | ✅ | ❌ | ❌
Stickers | ✅ | ❌ | ❌
Voice notes | ✅ | ❌ | ❌
Reactions | ✅ | ❌ | ❌
Reply/Quote/Edit/Unsend | ✅ | ❌ | ❌
Full group management | ✅ | ❌ | ❌
Thinking indicator | ✅ 🤔↔🧐 | ❌ | ❌
3-rule group gate | ✅ | ❌ | ❌
DM prefix gate | ✅ | ❌ | ❌
Voice transcription → prefix check | ✅ | ❌ | ❌
SQLite history + FTS5 | ✅ | ✅ (sync) | ❌
Chat export import | ✅ | ❌ | ❌
Full history resync | ✅ | ❌ | ❌
Offline recovery | ✅ | ❌ | ❌
Personal WhatsApp | ✅ | ✅ | ❌ (Business only)
Monthly cost | $0 | $0 | $$ (Meta pricing)

JID Reference

Type | Format | Example
--- | --- | ---
Individual | <number>@s.whatsapp.net | 34612345678@s.whatsapp.net
Group | <id>@g.us | 123456789012345678@g.us

OpenClaw auto-converts phone numbers to JID format when using to=.
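
Some actions take chatJid rather than to=, so it can help to build the JID yourself. A minimal sketch of the conversion, following the formats in the table above (`to_jid` is an illustrative helper, not an OpenClaw function):

```python
def to_jid(target: str) -> str:
    """Convert a phone number to an individual JID; pass existing JIDs through."""
    if "@" in target:
        return target  # already a JID, e.g. a group "<id>@g.us"
    digits = "".join(ch for ch in target if ch in "0123456789")
    if not digits:
        raise ValueError(f"no digits in target: {target!r}")
    return f"{digits}@s.whatsapp.net"


print(to_jid("+34612345678"))
```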


Troubleshooting

Messages from contacts not reaching agent → Add them to allowFrom (not just groupAllowFrom). Group and DM access are separate.

Voice notes won't play → Use OGG/Opus: ffmpeg -i input.mp3 -c:a libopus -b:a 64k output.ogg

Agent responds to everything in groups → Set triggerPrefix: "jarvis" and ensure groupPolicy: "allowlist".

No typing indicator in groups → This is a WhatsApp limitation. The 🤔 thinking reaction is your indicator.


Architecture

Your Agent → OpenClaw message tool → WhatsApp Channel Plugin → Baileys → WhatsApp Servers

No external services. No Docker. No CLI tools. Direct protocol integration via Baileys.



License

MIT. Part of OpenClaw.

Built by people who actually use their AI agent on WhatsApp every day.