Shopping & E-commerce by @kylelol

clawdbites

Extract recipes from Instagram reels

Instagram Recipe Extractor

Extract recipes from Instagram reels using a multi-layered approach:

  1. Caption parsing: instant; check the description first
  2. Audio transcription: Whisper (local, no API key)
  3. Frame analysis: vision model for on-screen text

No Instagram login required. Works on public reels.

When to Use

  • User sends an Instagram reel link
  • User mentions "recipe from Instagram" or "save this reel"
  • User wants to extract recipe details from a video post

How It Works (MANDATORY FLOW)

ALWAYS follow this complete flow. Do not stop after the caption if instructions are missing:

  1. User sends Instagram reel URL
  2. Extract metadata using yt-dlp (--dump-json)
  3. Parse the caption for recipe details
  4. Check completeness: Does caption have BOTH ingredients AND instructions?
    • ✅ YES: Present the recipe
    • ❌ NO (missing instructions or incomplete): Automatically proceed to audio transcription; do NOT stop or ask the user
  5. If audio transcription needed:
    • Download video: yt-dlp -o "/tmp/reel.mp4" "URL"
    • Extract audio: ffmpeg -y -i /tmp/reel.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 /tmp/reel.wav
    • Transcribe: whisper /tmp/reel.wav --model base --output_format txt --output_dir /tmp
    • Merge caption ingredients with audio instructions
  6. Present clean, formatted recipe (combining caption + audio as needed)
  7. User decides what to do (save to notes, add to wishlist, etc.)

Completeness check heuristics:

  • Has ingredients = contains 3+ quantity+item patterns (e.g., "1 cup flour", "2 lbs chicken")
  • Has instructions = contains action verbs (blend, cook, bake, mix, pour, add) + sequence OR numbered steps
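
The two heuristics above could be approximated with regular expressions. The following is a minimal sketch, not the skill's actual implementation: the unit list, the sequence words, and all function names are assumptions chosen for illustration.

```python
import re

# Hypothetical unit list; extend it for the captions you actually see.
UNITS = r"(?:cups?|tbsp|tsp|oz|lbs?|g|kg|ml|cloves?|cans?)"
# A quantity, a unit, then at least one word (e.g. "1 cup flour").
QUANTITY_ITEM = re.compile(rf"\b\d+(?:[./]\d+)?\s*{UNITS}\b\s+\w+", re.IGNORECASE)
ACTION_VERBS = re.compile(r"\b(blend|cook|bake|mix|pour|add)\b", re.IGNORECASE)
# Assumed sequence markers; the source only says "sequence".
SEQUENCE_WORDS = re.compile(r"\b(first|then|next|after|finally)\b", re.IGNORECASE)
NUMBERED_STEP = re.compile(r"^\s*\d+[.)]\s+\S", re.MULTILINE)


def has_ingredients(text):
    """True when the text contains 3+ quantity+item patterns."""
    return len(QUANTITY_ITEM.findall(text)) >= 3


def has_instructions(text):
    """True for numbered steps, or action verbs combined with sequence words."""
    if NUMBERED_STEP.search(text):
        return True
    return bool(ACTION_VERBS.search(text) and SEQUENCE_WORDS.search(text))


def caption_is_complete(text):
    """The caption needs BOTH ingredients and instructions."""
    return has_ingredients(text) and has_instructions(text)
```

When `caption_is_complete` returns False, the flow above would continue to audio transcription rather than stopping.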

Extraction Command

yt-dlp --dump-json "https://www.instagram.com/reel/SHORTCODE/" 2>/dev/null

Key fields from JSON output:

  • description: the caption containing the recipe
  • uploader: the creator's name
  • channel: the creator's handle
  • webpage_url: the original URL
  • like_count: a popularity indicator
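
As a rough illustration, the fields listed above could be pulled out of the command's output like this. The sketch assumes the stdout of `yt-dlp --dump-json` has already been captured as a string; the function name is hypothetical, and any field absent from the JSON is returned as `None`.

```python
import json

# Field names taken from the list above.
FIELDS = ("description", "uploader", "channel", "webpage_url", "like_count")


def extract_reel_fields(dump_json_output):
    """Parse yt-dlp's JSON output and keep only the fields this skill uses."""
    data = json.loads(dump_json_output)
    return {name: data.get(name) for name in FIELDS}
```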

Recipe Parsing

Look for these patterns in the caption:

Macros:

  • "X Calories | Xg P | Xg C | Xg F"
  • "Macros per serving"
  • "Cal/Protein/Carbs/Fat"

Ingredients:

  • Lines starting with quantities (1 cup, 2 tbsp, 24oz)
  • Lines with measurement units
  • Emoji bullet points (🥩 🌽 🧀 etc.)

Sections:

  • "For the [component]:"
  • "Ingredients:"
  • "Instructions:"
  • "Directions:"
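
To make the macro pattern concrete, here is one possible regular expression for the "X Calories | Xg P | Xg C | Xg F" form. It is a sketch under assumptions: it only handles the pipe-separated layout, and the function name is illustrative. Other layouts ("Cal/Protein/Carbs/Fat") would need their own patterns.

```python
import re

# Matches e.g. "585 Calories | 56g P | 25g C | 28g F" (also "Protein", "Carbs", "Fat").
MACRO_LINE = re.compile(
    r"(?P<calories>\d+)\s*(?:calories|cal)\b\s*\|\s*"
    r"(?P<protein>\d+)\s*g\s*P\w*\s*\|\s*"
    r"(?P<carbs>\d+)\s*g\s*C\w*\s*\|\s*"
    r"(?P<fat>\d+)\s*g\s*F\w*",
    re.IGNORECASE,
)


def parse_macros(caption):
    """Return a dict of integer macros, or None when no macro line is found."""
    match = MACRO_LINE.search(caption)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}
```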

Output Format

Present extracted recipe cleanly:

## [Recipe Name]
*From @[handle]*

**Macros (per serving):** X cal | Xg P | Xg C | Xg F

### Ingredients
- [ingredient 1]
- [ingredient 2]
...

### Instructions
1. [step 1]
2. [step 2]
...

---
Source: [original URL]
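
A small formatter along these lines could produce the layout above. This is an illustrative sketch: the function name and its parameters are assumptions, and the macros line is simply omitted when no macros were extracted.

```python
def render_recipe(name, handle, ingredients, instructions, source_url, macros=None):
    """Render a recipe in the output format shown above."""
    lines = [f"## {name}", f"*From @{handle.lstrip('@')}*", ""]
    if macros:
        lines += [
            "**Macros (per serving):** "
            f"{macros['calories']} cal | {macros['protein']}g P | "
            f"{macros['carbs']}g C | {macros['fat']}g F",
            "",
        ]
    lines += ["### Ingredients"] + [f"- {item}" for item in ingredients] + [""]
    lines += ["### Instructions"]
    lines += [f"{i}. {step}" for i, step in enumerate(instructions, start=1)]
    lines += ["", "---", f"Source: {source_url}"]
    return "\n".join(lines)
```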

User Actions After Extraction

Let the user decide what to do:

  • "Save to my recipes" → Save to Apple Notes (if meal-planner skill available)
  • "Add to wishlist" → Save to memory/recipe-wishlist.json
  • "Just show me" → Display only, no save
  • "Plan this for next week" → Hand off to meal-planner skill

Wishlist Storage

Optional storage for recipes user wants to try later:

memory/recipe-wishlist.json:

{
  "recipes": [
    {
      "name": "Recipe Name",
      "source": "instagram",
      "sourceUrl": "https://instagram.com/reel/...",
      "handle": "@creator",
      "addedDate": "2026-01-26",
      "tried": false,
      "macros": {
        "calories": 585,
        "protein": 56,
        "carbs": 25,
        "fat": 28,
        "servings": 3
      },
      "ingredients": [...],
      "instructions": [...]
    }
  ]
}
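
Appending to the wishlist file might look like the sketch below. The function name is hypothetical, and skipping duplicates by `sourceUrl` is an assumption; the source does not say how duplicates should be handled.

```python
import json
from pathlib import Path


def add_to_wishlist(path, entry):
    """Append a recipe entry to the wishlist JSON; return False if already present."""
    path = Path(path)
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    else:
        data = {"recipes": []}
    # Assumption: two entries with the same sourceUrl are the same recipe.
    if any(r.get("sourceUrl") == entry.get("sourceUrl") for r in data["recipes"]):
        return False
    data["recipes"].append(entry)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return True
```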

Error Handling

If yt-dlp fails:

  • Check if URL is valid Instagram reel format
  • The account may be private; inform the user
  • Suggest user paste caption text manually as fallback
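
The URL check in the first bullet could be done with a pattern like the one below. This is a sketch, not a definitive rule: it assumes reel links use a `/reel/`, `/reels/`, or `/p/` path followed by a shortcode, and the function name is illustrative.

```python
import re

# Assumed URL shapes; adjust if Instagram links in practice look different.
REEL_URL = re.compile(
    r"^https?://(?:www\.)?instagram\.com/(?:reel|reels|p)/"
    r"(?P<shortcode>[A-Za-z0-9_-]+)/?(?:\?.*)?$"
)


def reel_shortcode(url):
    """Return the shortcode when the URL looks like a reel link, else None."""
    match = REEL_URL.match(url.strip())
    return match.group("shortcode") if match else None
```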

If no recipe found in caption (IMPORTANT):

After extracting, scan the caption for recipe indicators:

  • Ingredient quantities (numbers + units like oz, cups, tbsp, lbs)
  • Recipe sections ("For the...", "Ingredients:", "Instructions:")
  • Cooking verbs (bake, cook, sauté, mix, combine)
  • Macro information (calories, protein, carbs, fat)

If none found, tell the user clearly:

"I pulled the caption but it doesn't look like the recipe is there. It might just be a teaser, or the recipe may only be shown in the video itself. Here's what the caption says:

[show caption]

A few options:

  1. Check the comments β€” sometimes creators post recipes there
  2. Check their bio link β€” might lead to the full recipe
  3. Describe what you saw in the video and I can help find a similar recipe"

Recipe detection heuristics:

HAS_RECIPE if caption contains:
- 3+ ingredient-like patterns (quantity + food item)
- OR "recipe" + ingredient list
- OR macro breakdown + ingredients
- OR numbered/bulleted instructions

NO_RECIPE if caption is:
- Mostly hashtags
- Just a description/teaser
- Under 100 characters
- No quantities or measurements
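
One way to turn these rules into a classifier is sketched below. It covers only part of the list (length, hashtag ratio, quantity count, numbered or bulleted steps), and the order of the checks, the 50% hashtag threshold, and the unit list are assumptions rather than anything stated in the source.

```python
import re

QUANTITY = re.compile(
    r"\b\d+(?:[./]\d+)?\s*(?:cups?|tbsp|tsp|oz|lbs?|g|kg|ml)\b", re.IGNORECASE
)
# Numbered ("1." / "1)") or bulleted ("-", "•", "*") lines.
STEP = re.compile(r"^\s*(?:\d+[.)]|[-•*])\s+\S", re.MULTILINE)


def classify_caption(caption):
    """Return "HAS_RECIPE" or "NO_RECIPE" using a subset of the rules above."""
    words = caption.split()
    hashtag_ratio = sum(w.startswith("#") for w in words) / max(len(words), 1)
    # Cheap NO_RECIPE signals first: very short or mostly hashtags.
    if len(caption) < 100 or hashtag_ratio > 0.5:
        return "NO_RECIPE"
    if len(QUANTITY.findall(caption)) >= 3 or STEP.search(caption):
        return "HAS_RECIPE"
    return "NO_RECIPE"
```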

Integration with meal-planner

The meal-planner skill can reference this skill:

  • When planning meals, check wishlist for untried recipes
  • Suggest wishlist recipes that match pantry items
  • Mark recipes as "tried" after they're used in a meal plan

Audio Transcription (V2): MANDATORY FALLBACK

When the caption is missing instructions, ALWAYS transcribe the audio automatically. Do not stop and ask the user; just do it. This is the most common case, since creators often put ingredients in captions but speak the instructions.

Step 1: Download video

yt-dlp -o "/tmp/reel.mp4" "https://instagram.com/reel/XXX"

Step 2: Extract audio

ffmpeg -i /tmp/reel.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 /tmp/reel.wav

Step 3: Transcribe with Whisper

whisper /tmp/reel.wav --model base --output_format txt --output_dir /tmp

Step 4: Parse transcript for recipe

Look for cooking instructions and any ingredients mentioned verbally.
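
The first three steps can be chained from a script. The sketch below reuses the commands shown above and assumes `yt-dlp`, `ffmpeg`, and `whisper` are all on the PATH; the function names are hypothetical. It also assumes Whisper writes its transcript to `reel.txt` in the output directory, which is worth verifying against your installed version.

```python
import subprocess


def transcription_commands(url, workdir="/tmp", model="base"):
    """Build the download, audio-extraction, and transcription commands."""
    video = f"{workdir}/reel.mp4"
    audio = f"{workdir}/reel.wav"
    return [
        ["yt-dlp", "-o", video, url],
        ["ffmpeg", "-y", "-i", video, "-vn", "-acodec", "pcm_s16le",
         "-ar", "16000", "-ac", "1", audio],
        ["whisper", audio, "--model", model,
         "--output_format", "txt", "--output_dir", workdir],
    ]


def transcribe_reel(url, workdir="/tmp"):
    """Run each step in order, stopping on the first failure."""
    for command in transcription_commands(url, workdir):
        subprocess.run(command, check=True)
    # Assumed output location: <workdir>/reel.txt
    with open(f"{workdir}/reel.txt", encoding="utf-8") as handle:
        return handle.read()
```

Passing the commands as argument lists avoids shell quoting problems with URLs that contain `?` or `&`.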

Inference for Missing Measurements

ALWAYS infer quantities when not provided. Never present a recipe without amounts; estimate based on context and standard package sizes.

Vague Language → Specific Amounts

What they say          | Infer
-----------------------|--------------------------
"some chicken"         | ~1 lb
"a bit of garlic"      | 2-3 cloves
"handful of spinach"   | ~2 cups
"drizzle of oil"       | 1-2 tbsp
"season to taste"      | ½ tsp salt, ¼ tsp pepper
"splash of soy sauce"  | 1-2 tbsp
"a few tablespoons"    | 2-3 tbsp
"some rice"            | 1 cup dry
"cheese on top"        | ½-1 cup shredded
"diced onion"          | 1 medium onion
"bell peppers"         | 2 peppers

Standard Package Sizes (when item mentioned without amount)

Ingredient          | Standard Package     | Infer
--------------------|----------------------|----------------------
Puff pastry         | 17oz sheet           | 1 sheet
Ground beef/turkey  | 1 lb pack            | 1 lb
Chicken breast      | ~1.5 lb pack         | 1.5 lbs
Sausage links       | 14oz / 4-5 links     | 1 package
Bacon               | 12oz / 12 slices     | ½ package (6 slices)
Shredded cheese     | 8oz bag              | 1-2 cups
Tortillas           | 8-10 count           | 1 package
Canned beans        | 15oz can             | 1 can
Broth/stock         | 32oz carton          | 1-2 cups
Pasta               | 16oz box             | 8oz (half box)
Rice                | 2 lb bag             | 1-2 cups dry

Context-Aware Scaling

By recipe type:

  • Stir fry for 2 → 1 lb protein, 4 cups veggies
  • Soup/stew → 1.5-2 lbs protein, 4 cups broth
  • Sheet pan meal → 1.5 lbs protein, 3-4 cups veggies
  • Appetizers → smaller portions, estimate ~12-15 pieces per batch

By servings mentioned:

  • "Serves 4" → Scale standard amounts for 4
  • "Meal prep for the week" → Assume 5-8 servings
  • No servings mentioned → Default to 4 servings

By protein target (if user has macro goals):

  • 40-50g protein per serving → ~6-8oz cooked meat per portion
  • Scale recipe protein accordingly
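
The serving rules above can be expressed as a small helper. This is a sketch with hypothetical function names; it returns a (min, max) range so that "meal prep for the week" can keep its 5-8 spread, and the phrase matching is deliberately simple.

```python
import re


def infer_servings(text):
    """Return a (min, max) servings range based on the rules above."""
    match = re.search(r"\bserves\s+(\d+)\b", text, re.IGNORECASE)
    if match:
        count = int(match.group(1))
        return (count, count)
    if re.search(r"\bmeal\s*prep\b.*\bweek\b", text, re.IGNORECASE):
        return (5, 8)
    # No servings mentioned: default to 4.
    return (4, 4)


def scale_amount(amount, base_servings, target_servings):
    """Scale a quantity linearly from one serving count to another."""
    return amount * target_servings / base_servings
```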

Output Format

Always present inferred amounts clearly:

### Ingredients
- 1 lb ground turkey *(estimated)*
- 1 medium onion, diced *(estimated)*
- 2 cups broth *(estimated based on typical soup)*

Mark inferred quantities with *(estimated)* so the user can tell which amounts came from the source and which were inferred.
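
A lookup from vague phrases to estimated amounts, with the marker applied automatically, could be sketched as follows. Only a few rows of the vague-language table are included, and the function name and its two-argument shape are assumptions for illustration.

```python
# A subset of the vague-language table above.
VAGUE_AMOUNTS = {
    "some chicken": "~1 lb",
    "a bit of garlic": "2-3 cloves",
    "handful of spinach": "~2 cups",
    "drizzle of oil": "1-2 tbsp",
    "some rice": "1 cup dry",
}


def estimated_line(phrase, item):
    """Return a marked ingredient line, or None when the phrase is unknown."""
    amount = VAGUE_AMOUNTS.get(phrase.strip().lower())
    if amount is None:
        return None
    return f"- {amount} {item} *(estimated)*"
```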

Combined Extraction Flow

1. TRY CAPTION (instant)
   └── yt-dlp --dump-json → parse description
   └── Recipe found? → DONE ✅
   └── Check for "pinned" / "in comments" / "check comments" → FLAG
   
2. IF FLAGGED: CHECK FOR CREATOR COMMENT
   └── Look through comments for creator's username
   └── If creator comment found with recipe → DONE ✅
   └── If not found → continue + notify user

3. TRY AUDIO (30-60 sec)
   └── Download video
   └── Extract audio with ffmpeg
   └── Transcribe with Whisper (base model)
   └── Parse transcript for recipe
   └── Infer missing measurements
   └── Recipe found? → DONE ✅

4. PRESENT RESULTS + PROMPT IF NEEDED
   └── Show what was extracted from audio
   └── If "pinned" was flagged, tell user:
       "The creator mentioned the full recipe is pinned in the comments.
        I extracted what I could from the audio, but if you want the 
        exact measurements, paste the pinned comment here and I'll 
        merge it with what I found."
   
5. TRY FRAME ANALYSIS (if audio incomplete)
   └── Extract 5-8 key frames with ffmpeg
   └── Send to Claude vision
   └── Ask: "Extract any recipe text, ingredients, or measurements shown"
   └── Merge findings with audio transcript
   
6. FALLBACK (nothing found)
   └── Inform user: "Recipe wasn't in caption or audio/video"
   └── Offer: search for similar recipe based on video title/description
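
The staged structure of this flow can be sketched as a loop over extraction stages that stops once both ingredients and instructions are available. This is a simplified illustration: the stage callables are hypothetical stand-ins for caption parsing, audio transcription, and frame analysis, and the pinned-comment flag and user prompts are left out.

```python
def extract_recipe(url, stages):
    """Run (name, stage) pairs in order until the recipe is complete.

    Each stage takes the URL and returns a dict that may contain
    "ingredients" and/or "instructions" lists.
    """
    merged = {"ingredients": [], "instructions": [], "sources": []}
    for name, stage in stages:
        result = stage(url) or {}
        for key in ("ingredients", "instructions"):
            # Earlier stages win; later stages only fill what is missing.
            if not merged[key] and result.get(key):
                merged[key] = result[key]
                if name not in merged["sources"]:
                    merged["sources"].append(name)
        if merged["ingredients"] and merged["instructions"]:
            break
    return merged
```

An empty result for both keys after all stages corresponds to the fallback step, where the user is told nothing was found.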

Frame Analysis

Extract key frames and analyze with vision model.

Extract frames:

# Extract 1 frame every 5 seconds
ffmpeg -i /tmp/reel.mp4 -vf "fps=1/5" /tmp/frame_%02d.jpg

# Or extract every 30th frame (about one frame per second for 30 fps video)
ffmpeg -i /tmp/reel.mp4 -vf "select='not(mod(n,30))'" -vsync vfr /tmp/frame_%02d.jpg
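
If a fixed number of roughly evenly spaced frames is wanted, one option is to derive the `fps` filter rate from the clip duration. The sketch below only builds the ffmpeg command. It assumes the duration in seconds is already known (for example from the `duration` field that yt-dlp's JSON output usually includes), and the function name and default frame count are illustrative.

```python
def frame_extraction_command(video, duration_seconds, frame_count=6,
                             out_pattern="/tmp/frame_%02d.jpg"):
    """Build an ffmpeg command that samples about frame_count frames."""
    if duration_seconds <= 0 or frame_count <= 0:
        raise ValueError("duration_seconds and frame_count must be positive")
    rate = frame_count / duration_seconds  # output frames per second
    return [
        "ffmpeg", "-y", "-i", video,
        "-vf", f"fps={rate:.4f}",
        "-frames:v", str(frame_count),  # cap the number of frames written
        out_pattern,
    ]
```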

Send to vision model: Use Claude's image analysis to read each frame:

  • Recipe cards / title screens
  • Ingredient lists shown on screen
  • Measurements in text overlays
  • Step-by-step instructions displayed

Vision prompt:

Analyze this frame from a cooking video. Extract any:
- Recipe name or title
- Ingredients with quantities
- Cooking instructions
- Nutritional information / macros
- Any other recipe-related text shown

If no recipe text is visible, respond with "No recipe text found."

Merge strategy:

  • Audio transcript = primary source (spoken instructions)
  • Frame analysis = supplement (exact measurements, recipe cards)
  • Combine both, prefer specific measurements from visual over inferred from audio

Pinned Comment Detection

Scan caption for these phrases (case-insensitive):

  • "recipe pinned"
  • "pinned in comments"
  • "check comments"
  • "in the comments"
  • "comment below"
  • "recipe below"
  • "full recipe in comments"
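
A case-insensitive check for the phrases above can be a plain substring scan, as in this sketch (the function name is hypothetical):

```python
# Phrases copied from the list above.
PINNED_PHRASES = (
    "recipe pinned",
    "pinned in comments",
    "check comments",
    "in the comments",
    "comment below",
    "recipe below",
    "full recipe in comments",
)


def mentions_pinned_recipe(caption):
    """True when the caption points the reader to the comments for the recipe."""
    text = caption.lower()
    return any(phrase in text for phrase in PINNED_PHRASES)
```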

If detected, flag and notify user after extraction:

"Heads up: the creator said the recipe is pinned in the comments. I got what I could from the audio, but yt-dlp can't access pinned comments without login. If you want the exact recipe, copy the pinned comment and send it to me, and I'll format it properly."

Requirements

  • yt-dlp: brew install yt-dlp
  • ffmpeg: brew install ffmpeg
  • whisper: pip3 install openai-whisper (runs locally, no API key)
  • No Instagram login required for public reels