โ† Back to Media & Streaming
Media & Streaming by @matusvojtek

tubescribe

YouTube video summarizer with speaker detection, formatted

0
Source Code

TubeScribe ๐ŸŽฌ

Turn any YouTube video into a polished document + audio summary.

Drop a YouTube link โ†’ get a beautiful transcript with speaker labels, key quotes, timestamps that link back to the video, and an audio summary you can listen to on the go.

๐Ÿ’ธ Free & No Paid APIs

  • No subscriptions or API keys โ€” works out of the box
  • Local processing โ€” transcription, speaker detection, and TTS run on your machine
  • Network access โ€” fetching from YouTube (captions, metadata, comments) requires internet
  • No data uploaded โ€” nothing is sent to external services; all processing stays on your machine
  • Safe sub-agent โ€” spawned sub-agent has strict instructions: no software installation, no network calls beyond YouTube

โœจ Features

  • ๐Ÿ“„ Transcript with summary and key quotes โ€” Export as DOCX, HTML, or Markdown
  • ๐ŸŽฏ Smart Speaker Detection โ€” Automatically identifies participants
  • ๐Ÿ”Š Audio Summaries โ€” Listen to key points (MP3/WAV)
  • ๐Ÿ“ Clickable Timestamps โ€” Every quote links directly to that moment in the video
  • ๐Ÿ’ฌ YouTube Comments โ€” Viewer sentiment analysis and best comments
  • ๐Ÿ“‹ Queue Support โ€” Send multiple links, they get processed in order
  • ๐Ÿš€ Non-Blocking Workflow โ€” Conversation continues while video processes in background

๐ŸŽฌ Works With Any Video

  • Interviews & podcasts (multi-speaker detection)
  • Lectures & tutorials (single speaker)
  • Music videos (lyrics extraction)
  • News & documentaries
  • Any YouTube content with captions

Quick Start

When user sends a YouTube URL:

  1. Spawn sub-agent with the full pipeline task immediately
  2. Reply: "๐ŸŽฌ TubeScribe is processing โ€” I'll let you know when it's ready!"
  3. Continue conversation (don't wait!)
  4. Sub-agent notification will announce completion with title and details

DO NOT BLOCK โ€” spawn and move on instantly.

First-Time Setup

Run setup to check dependencies and configure defaults:

python skills/tubescribe/scripts/setup.py

This checks: summarize CLI, pandoc, ffmpeg, Kokoro TTS

Full Workflow (Single Sub-Agent)

Spawn ONE sub-agent that does the entire pipeline:

sessions_spawn(
    task=f"""
## TubeScribe: Process {youtube_url}

โš ๏ธ CRITICAL: Do NOT install any software.
No pip, brew, curl, venv, or binary downloads.
If a tool is missing, STOP and report what's needed.

Run the COMPLETE pipeline โ€” do not stop until all steps are done.

### Step 1: Extract
```bash
python3 skills/tubescribe/scripts/tubescribe.py "{youtube_url}"

Note the Source and Output paths printed by the script. Use those exact paths in subsequent steps.

Step 2: Read source JSON

Read the Source path from Step 1 output and note:

  • metadata.title (for filename)
  • metadata.video_id
  • metadata.channel, upload_date, duration_string

Step 3: Create formatted markdown

Write to the Output path from Step 1:

  1. # **<title>**

  1. Video info block โ€” Channel, Date, Duration, URL (clickable). Empty line between each field.

  1. ## **Participants** โ€” table with bold headers:
    | **Name** | **Role** | **Description** |
    |----------|----------|-----------------|
    

  1. ## **Summary** โ€” 3-5 paragraphs of prose

  1. ## **Key Quotes** โ€” 5 best with clickable YouTube timestamps. Format each as:
    "Quote text here." - [12:34](https://www.youtube.com/watch?v=ID&t=754s)
    
    "Another quote." - [25:10](https://www.youtube.com/watch?v=ID&t=1510s)
    
    Use regular dash -, NOT em dash โ€”. Do NOT use blockquotes >. Plain paragraphs only.

  1. ## **Viewer Sentiment** (if comments exist)

  1. ## **Best Comments** (if comments exist) โ€” Top 5, NO lines between them:
    Comment text here.
    
    *- โ–ฒ 123 @AuthorName*
    
    Next comment text here.
    
    *- โ–ฒ 45 @AnotherAuthor*
    
    Attribution line: dash + italic. Just blank line between comments, NO --- separators.

  1. ## **Full Transcript** โ€” merge segments, speaker labels, clickable timestamps

Step 4: Create DOCX

Clean the title for filename (remove special chars), then:

pandoc <output_path> -o ~/Documents/TubeScribe/<safe_title>.docx

Step 5: Generate audio

Write the summary text to a temp file, then use TubeScribe's built-in audio generation:

# Write summary to temp file (use python3 to write, avoids shell escaping issues)
python3 -c "
text = '''YOUR SUMMARY TEXT HERE'''
with open('<temp_dir>/tubescribe_<video_id>_summary.txt', 'w') as f:
    f.write(text)
"

# Generate audio (auto-detects engine, voice, format from config)
python3 skills/tubescribe/scripts/tubescribe.py \
  --generate-audio <temp_dir>/tubescribe_<video_id>_summary.txt \
  --audio-output ~/Documents/TubeScribe/<safe_title>_summary

This reads ~/.tubescribe/config.json and uses the configured TTS engine (mlx/kokoro/builtin), voice blend, and speed automatically. Output format (mp3/wav) comes from config.

Step 6: Cleanup

python3 skills/tubescribe/scripts/tubescribe.py --cleanup <video_id>

Step 7: Open folder

open ~/Documents/TubeScribe/

Report

Tell what was created: DOCX name, MP3 name + duration, video stats. """, label="tubescribe", runTimeoutSeconds=900, cleanup="delete" )


**After spawning, reply immediately:**
> ๐ŸŽฌ TubeScribe is processing - I'll let you know when it's ready!
Then continue the conversation. The sub-agent notification announces completion.

## Configuration

Config file: `~/.tubescribe/config.json`

```json
{
  "output": {
    "folder": "~/Documents/TubeScribe",
    "open_folder_after": true,
    "open_document_after": false,
    "open_audio_after": false
  },
  "document": {
    "format": "docx",
    "engine": "pandoc"
  },
  "audio": {
    "enabled": true,
    "format": "mp3",
    "tts_engine": "mlx"
  },
  "mlx_audio": {
    "path": "~/.openclaw/tools/mlx-audio",
    "model": "mlx-community/Kokoro-82M-bf16",
    "voice": "af_heart",
    "lang_code": "a",
    "speed": 1.05
  },
  "kokoro": {
    "path": "~/.openclaw/tools/kokoro",
    "voice_blend": { "af_heart": 0.6, "af_sky": 0.4 },
    "speed": 1.05
  },
  "processing": {
    "subagent_timeout": 600,
    "cleanup_temp_files": true
  }
}

Output Options

Option Default Description
output.folder ~/Documents/TubeScribe Where to save files
output.open_folder_after true Open output folder when done
output.open_document_after false Auto-open generated document
output.open_audio_after false Auto-open generated audio summary

Document Options

Option Default Values Description
document.format docx docx, html, md Output format
document.engine pandoc pandoc Converter for DOCX (falls back to HTML)

Audio Options

Option Default Values Description
audio.enabled true true, false Generate audio summary
audio.format mp3 mp3, wav Audio format (mp3 needs ffmpeg)
audio.tts_engine mlx mlx, kokoro, builtin TTS engine (mlx = fastest on Apple Silicon)

MLX-Audio Options (preferred on Apple Silicon)

Option Default Description
mlx_audio.path ~/.openclaw/tools/mlx-audio mlx-audio venv location
mlx_audio.model mlx-community/Kokoro-82M-bf16 MLX model to use
mlx_audio.voice af_heart Voice preset (used if no voice_blend)
mlx_audio.voice_blend {af_heart: 0.6, af_sky: 0.4} Custom voice mix (weighted blend)
mlx_audio.lang_code a Language code (a=US English)
mlx_audio.speed 1.05 Playback speed (1.0 = normal, 1.05 = 5% faster)

Kokoro PyTorch Options (fallback)

Option Default Description
kokoro.path ~/.openclaw/tools/kokoro Kokoro repo location
kokoro.voice_blend {af_heart: 0.6, af_sky: 0.4} Custom voice mix
kokoro.speed 1.05 Playback speed (1.0 = normal, 1.05 = 5% faster)

Processing Options

Option Default Description
processing.subagent_timeout 600 Seconds for sub-agent (increase for long videos)
processing.cleanup_temp_files true Remove /tmp files after completion

Comment Options

Option Default Description
comments.max_count 50 Number of comments to fetch
comments.timeout 90 Timeout for comment fetching (seconds)

Queue Options

Option Default Description
queue.stale_minutes 30 Consider a processing job stale after this many minutes

Output Structure

~/Documents/TubeScribe/
โ”œโ”€โ”€ {Video Title}.html         # Formatted document (or .docx / .md)
โ””โ”€โ”€ {Video Title}_summary.mp3  # Audio summary (or .wav)

After generation, opens the folder (not individual files) so you can access everything.

Dependencies

Required:

  • summarize CLI โ€” brew install steipete/tap/summarize
  • Python 3.8+

Optional (better quality):

  • pandoc โ€” DOCX output: brew install pandoc
  • ffmpeg โ€” MP3 audio: brew install ffmpeg
  • yt-dlp โ€” YouTube comments: brew install yt-dlp
  • mlx-audio โ€” Fastest TTS on Apple Silicon: pip install mlx-audio (uses MLX backend for Kokoro)
  • Kokoro TTS โ€” PyTorch fallback: see https://github.com/hexgrad/kokoro

yt-dlp Search Paths

TubeScribe checks these locations (in order):

Priority Path Source
1 which yt-dlp System PATH
2 /opt/homebrew/bin/yt-dlp Homebrew (Apple Silicon)
3 /usr/local/bin/yt-dlp Homebrew (Intel) / Linux
4 ~/.local/bin/yt-dlp pip install --user
5 ~/.local/pipx/venvs/yt-dlp/bin/yt-dlp pipx
6 ~/.openclaw/tools/yt-dlp/yt-dlp TubeScribe auto-install

If not found, setup downloads a standalone binary to the tools directory. The tools directory version doesn't conflict with system installations.

Queue Handling

When user sends multiple YouTube URLs while one is processing:

Check Before Starting

python skills/tubescribe/scripts/tubescribe.py --queue-status

If Already Processing

# Add to queue instead of starting parallel processing
python skills/tubescribe/scripts/tubescribe.py --queue-add "NEW_URL"
# โ†’ Replies: "๐Ÿ“‹ Added to queue (position 2)"

After Completion

# Check if more in queue
python skills/tubescribe/scripts/tubescribe.py --queue-next
# โ†’ Automatically pops and processes next URL

Queue Commands

Command Description
--queue-status Show what's processing + queued items
--queue-add URL Add URL to queue
--queue-next Process next item from queue
--queue-clear Clear entire queue

Batch Processing (multiple URLs at once)

python skills/tubescribe/scripts/tubescribe.py url1 url2 url3

Processes all URLs sequentially with a summary at the end.

Error Handling

The script detects and reports these errors with clear messages:

Error Message
Invalid URL โŒ Not a valid YouTube URL
Private video โŒ Video is private โ€” can't access
Video removed โŒ Video not found or removed
No captions โŒ No captions available for this video
Age-restricted โŒ Age-restricted video โ€” can't access without login
Region-blocked โŒ Video blocked in your region
Live stream โŒ Live streams not supported โ€” wait until it ends
Network error โŒ Network error โ€” check your connection
Timeout โŒ Request timed out โ€” try again later

When an error occurs, report it to the user and don't proceed with that video.

Tips

  • For long videos (>30 min), increase sub-agent timeout to 900s
  • Speaker detection works best with clear interview/podcast formats
  • Single-speaker videos (tutorials, lectures) skip speaker labels automatically
  • Timestamps link directly to YouTube at that moment
  • Use batch mode for multiple videos: tubescribe url1 url2 url3