Video Ad Analyzer

AI-powered video content extraction using Google Gemini Vision.

What This Skill Does

Frame Extraction: Smart sampling with scene change detection
OCR Text Detection: Extract text overlays using EasyOCR
Audio Transcription: Convert speech to text with Google Cloud Speech
AI Scene Analysis: Describe each scene using Gemini Vision
Native Video Analysis: Direct video understanding for longer content
Thumbnail Generation: Auto-generate a thumbnail from the first frame

Setup

1. Environment Variables

# Required for Gemini Vision
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Audio transcription uses the same credentials;
# the Speech-to-Text API must be enabled in that service account's project

2. Dependencies

pip install opencv-python pillow easyocr ffmpeg-python google-cloud-speech vertexai google-api-python-client

Also requires ffmpeg and ffprobe to be installed on the system and available on PATH.
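
A quick preflight check can catch a missing binary or credentials file before any video is processed. The sketch below is not part of the skill; check_prerequisites is a hypothetical helper that only uses the standard library, and it assumes the two requirements listed above (ffmpeg/ffprobe on PATH and GOOGLE_APPLICATION_CREDENTIALS pointing at a file).

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper, not part of the skill. `env` and `which` are
    injectable so the check can be exercised without touching the system.
    """
    env = os.environ if env is None else env
    problems = []

    # ffmpeg and ffprobe must be discoverable on PATH.
    for tool in ("ffmpeg", "ffprobe"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    # The credentials variable must be set and point at an existing file.
    cred_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not cred_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isfile(cred_path):
        problems.append(f"credentials file does not exist: {cred_path}")

    return problems
```

Calling check_prerequisites() with no arguments inspects the real environment; printing the returned list before creating the extractor gives a clearer error than a failure deep inside a transcription call.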

Usage

Basic Video Analysis

from scripts.video_extractor import VideoExtractor
from scripts.models import ExtractedVideoContent
import vertexai
from vertexai.generative_models import GenerativeModel

# Initialize Vertex AI
vertexai.init(project="your-project-id", location="us-central1")
gemini_model = GenerativeModel("gemini-1.5-flash")

# Create extractor
extractor = VideoExtractor(gemini_model=gemini_model)

# Analyze video
result = extractor.extract_content("/path/to/video.mp4")

print(f"Duration: {result.duration}s")
print(f"Scenes: {len(result.scene_timeline)}")
print(f"Text overlays: {len(result.text_timeline)}")
print(f"Transcript: {result.transcript[:200]}...")

Extract Only Frames

frames, timestamps, text_timeline, scene_timeline, thumbnail = extractor.extract_smart_frames(
    "/path/to/video.mp4",
    scene_interval=2,    # Check for scene changes every 2s
    text_interval=0.5    # Check for text every 0.5s
)
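
To illustrate what a scene-change check at each scene_interval might involve, here is a minimal sketch of histogram-based change detection. It is an assumption-heavy illustration, not the skill's implementation: the source states a threshold of 65 but not the metric, so this sketch assumes a 0-100 scale (the percentage of histogram mass that differs), a hypothetical bin count of 16, and frames given as flat lists of 0-255 grayscale values so it runs without OpenCV.

```python
from typing import List, Sequence

BINS = 16  # hypothetical bin count; the source does not state one


def gray_histogram(pixels: Sequence[int], bins: int = BINS) -> List[float]:
    """Normalized histogram of 0-255 grayscale values (sums to 1.0)."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]


def histogram_difference(a: Sequence[int], b: Sequence[int]) -> float:
    """Percentage (0-100) of histogram mass that differs between two frames."""
    ha, hb = gray_histogram(a), gray_histogram(b)
    # Half the L1 distance between normalized histograms lies in [0, 1].
    return 50.0 * sum(abs(x - y) for x, y in zip(ha, hb))


def detect_scene_changes(frames, timestamps, threshold=65.0):
    """Return timestamps where a frame differs from the current scene's
    reference frame by more than `threshold` (assumed 0-100 scale)."""
    if not frames:
        return []
    changes = [timestamps[0]]  # the first frame always opens a scene
    reference = frames[0]
    for frame, ts in zip(frames[1:], timestamps[1:]):
        if histogram_difference(reference, frame) > threshold:
            changes.append(ts)
            reference = frame  # later frames are compared to the new scene
    return changes
```

Comparing against the first frame of the current scene, rather than the immediately preceding frame, is a design choice of this sketch: it keeps slow fades from slipping under the threshold one small step at a time.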

Analyze Images

# Works with images too
result = extractor.extract_content("/path/to/image.jpg")
print(result.scene_timeline[0]['description'])

Output Structure

ExtractedVideoContent(
    video_path="/path/to/video.mp4",
    duration=30.5,
    transcript="Here's what we found...",
    text_timeline=[
        {"at": 0.0, "text": ["Download Now"]},
        {"at": 5.5, "text": ["50% Off Today"]}
    ],
    scene_timeline=[
        {"timestamp": 0.0, "description": "Woman using phone app..."},
        {"timestamp": 2.0, "description": "Product showcase with features..."}
    ],
    thumbnail_url="/static/thumbnails/video_thumb.jpg",
    extraction_complete=True
)
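
Note that the two timelines use different keys for time: text entries use "at" and scene entries use "timestamp". The sketch below shows one way to merge them into a single chronological list for downstream use. merge_timelines and the "time", "kind", and "content" keys are hypothetical names introduced here for illustration; only the input shapes come from the structure above.

```python
def merge_timelines(text_timeline, scene_timeline):
    """Combine text overlays and scene descriptions into one time-sorted list."""
    events = []
    for entry in text_timeline:
        # Text entries use the "at" key and hold a list of strings.
        events.append({
            "time": entry["at"],
            "kind": "text",
            "content": " / ".join(entry["text"]),
        })
    for entry in scene_timeline:
        # Scene entries use the "timestamp" key and hold a description.
        events.append({
            "time": entry["timestamp"],
            "kind": "scene",
            "content": entry["description"],
        })
    # At equal times, scenes sort before text so context precedes overlays.
    return sorted(events, key=lambda e: (e["time"], e["kind"] != "scene"))


text_timeline = [
    {"at": 0.0, "text": ["Download Now"]},
    {"at": 5.5, "text": ["50% Off Today"]},
]
scene_timeline = [
    {"timestamp": 0.0, "description": "Woman using phone app..."},
    {"timestamp": 2.0, "description": "Product showcase with features..."},
]

for event in merge_timelines(text_timeline, scene_timeline):
    print(f'{event["time"]:>5.1f}s  {event["kind"]:<5}  {event["content"]}')
```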

Key Features

Feature	Description
Scene Detection	Histogram-based change detection (threshold=65)
OCR Confidence	Tiered thresholds (0.5 high, 0.3 low)
AI Proofreading	Gemini cleans up OCR errors
Source Reconciliation	Merges OCR + Vision text intelligently
Native Video	Direct Gemini analysis for files under 20 MB
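
The tiered OCR thresholds can be pictured with a short sketch. It assumes detections in the (bbox, text, confidence) form that EasyOCR's readtext returns, and it assumes the two tiers mean "accept" (0.5 and above) and "keep as tentative" (0.3 up to 0.5), with anything lower dropped. How the skill actually treats each tier is not documented here, so tier_ocr_results is a hypothetical helper, not the skill's code.

```python
HIGH_CONFIDENCE = 0.5  # from the table above
LOW_CONFIDENCE = 0.3   # from the table above


def tier_ocr_results(detections, high=HIGH_CONFIDENCE, low=LOW_CONFIDENCE):
    """Split (bbox, text, confidence) detections into accepted and tentative text.

    Assumed semantics: >= high is accepted, >= low is tentative (for example,
    a candidate for later proofreading), and anything lower is discarded.
    """
    accepted, tentative = [], []
    for _bbox, text, confidence in detections:
        text = text.strip()
        if not text:
            continue  # skip whitespace-only detections
        if confidence >= high:
            accepted.append(text)
        elif confidence >= low:
            tentative.append(text)
        # below `low`: treated as noise and dropped
    return accepted, tentative


sample = [
    (None, "Download Now", 0.92),
    (None, "5O% Off", 0.41),   # plausible OCR slip: letter O for zero
    (None, "~~", 0.12),
    (None, "   ", 0.90),
]
accepted, tentative = tier_ocr_results(sample)
print(accepted)   # ['Download Now']
print(tentative)  # ['5O% Off']
```

Keeping a tentative tier instead of a single cutoff leaves room for a later cleanup step, such as the AI proofreading row above, to rescue text that OCR read with low confidence.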

Prompts

Customize AI behavior by editing prompts in the prompts/ folder:

scene_analysis.md - Frame analysis prompts
scene_reconciliation.md - Scene enrichment prompts

Common Questions This Answers

"What text appears in this video ad?"
"Describe each scene in this creative"
"What does the narrator say?"
"Extract the call-to-action from this ad"
