Transcription Automation

A skill for automating audio and video transcription, from pre-processing through subtitles, meeting notes, and summaries.

Core Workflows

1. Transcription Pipeline

TRANSCRIPTION FLOW:
┌─────────────────┐
│  Audio/Video    │
│     Input       │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Pre-Processing │
│  - Convert      │
│  - Enhance      │
│  - Split        │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Transcription  │
│  - STT Engine   │
│  - Diarization  │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Post-Processing │
│  - Format       │
│  - Timestamps   │
│  - Speakers     │
└────────┬────────┘
         ▼
┌─────────────────┐
│     Output      │
│  - Text/SRT/VTT │
│  - Summary      │
└─────────────────┘
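
The Split step in pre-processing is not specified further here. As a rough sketch, assuming fixed-length chunks with a small overlap so that words at a boundary are not cut in half, chunk boundaries could be computed as follows. The function name, the 600-second chunk length, and the 5-second overlap are illustrative assumptions, not values defined by this skill.

```python
def chunk_bounds(duration, chunk_len=600.0, overlap=5.0):
    """Return (start, end) pairs in seconds that cover [0, duration]."""
    if chunk_len <= overlap:
        raise ValueError("chunk_len must be larger than overlap")
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        bounds.append((start, end))
        if end >= duration:
            break
        # Step back by the overlap so consecutive chunks share some audio.
        start = end - overlap
    return bounds


for start, end in chunk_bounds(1500.0):
    print(f"{start:7.1f} -> {end:7.1f}")
```

Each chunk can then be transcribed independently; timestamps from a chunk need its start offset added back before the results are merged.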

2. Transcription Configuration

transcription_config:
  engine: whisper  # whisper, assembly_ai, deepgram
  
  audio_settings:
    sample_rate: 16000
    channels: mono
    format: wav
    
  transcription:
    language: auto  # or specific: en, zh, es
    model: large  # tiny, base, small, medium, large
    task: transcribe  # transcribe or translate
    
  features:
    speaker_diarization: true
    word_timestamps: true
    punctuation: true
    profanity_filter: false
    
  output:
    formats:
      - txt
      - srt
      - vtt
      - json
    include_confidence: true
    include_timestamps: true
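
The audio_settings above (16 kHz, mono, WAV) map onto standard ffmpeg options. A minimal sketch, assuming ffmpeg is installed and on PATH; build_ffmpeg_command is a hypothetical helper and the file names are placeholders.

```python
def build_ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Return an ffmpeg argument list that converts src to mono 16 kHz audio."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", src,                 # input file (audio or video)
        "-vn",                     # drop any video stream
        "-ar", str(sample_rate),   # output sample rate in Hz
        "-ac", str(channels),      # output channel count (1 = mono)
        dst,                       # container is inferred from the .wav extension
    ]


cmd = build_ffmpeg_command("meeting.mp4", "meeting.wav")
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would perform the conversion; building the list separately keeps the sketch testable without ffmpeg present.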

Meeting Transcription

Meeting Notes Template

meeting_transcript:
  metadata:
    title: "{{meeting_title}}"
    date: "{{date}}"
    duration: "{{duration}}"
    attendees: "{{speakers}}"
    
  output_template: |
    # {{title}}
    
    **Date:** {{date}}
    **Duration:** {{duration}}
    **Attendees:** {{attendees}}
    
    ## Summary
    {{ai_summary}}
    
    ## Key Points
    {{#each key_points}}
    - {{this}}
    {{/each}}
    
    ## Action Items
    {{#each action_items}}
    - [ ] {{task}} - @{{assignee}} - Due: {{due_date}}
    {{/each}}
    
    ## Full Transcript
    {{#each segments}}
    **[{{timestamp}}] {{speaker}}:** {{text}}
    
    {{/each}}
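
The template above uses Handlebars-style placeholders. For readers without a Handlebars renderer, the same layout can be sketched in plain Python. This is an illustration only: render_meeting_notes is a hypothetical function, and the dictionary keys are assumed to mirror the placeholder names in the template.

```python
def render_meeting_notes(meta, summary, key_points, action_items, segments):
    """Build the meeting notes document as a single Markdown string."""
    lines = [
        f"# {meta['title']}",
        "",
        f"**Date:** {meta['date']}",
        f"**Duration:** {meta['duration']}",
        f"**Attendees:** {', '.join(meta['attendees'])}",
        "",
        "## Summary",
        summary,
        "",
        "## Key Points",
    ]
    lines += [f"- {point}" for point in key_points]
    lines += ["", "## Action Items"]
    lines += [
        f"- [ ] {item['task']} - @{item['assignee']} - Due: {item['due_date']}"
        for item in action_items
    ]
    lines += ["", "## Full Transcript"]
    for seg in segments:
        lines += [f"**[{seg['timestamp']}] {seg['speaker']}:** {seg['text']}", ""]
    return "\n".join(lines)


notes = render_meeting_notes(
    meta={"title": "Weekly Sync", "date": "2024-01-15",
          "duration": "45:32", "attendees": ["Alice", "Bob"]},
    summary="Reviewed the transcription rollout.",
    key_points=["Whisper chosen as the default engine"],
    action_items=[{"task": "Send notes", "assignee": "alice", "due_date": "2024-01-20"}],
    segments=[{"timestamp": "00:00:05", "speaker": "Alice", "text": "Welcome everyone."}],
)
print(notes)
```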

Speaker Diarization

diarization_config:
  min_speakers: 2
  max_speakers: 10
  
  speaker_labels:
    - name: "Speaker 1"
      voice_sample: "sample_1.wav"  # Optional
    - name: "Speaker 2"
      voice_sample: "sample_2.wav"
      
  output_format:
    speaker_prefix: true
    speaker_timestamps: true
    
  example_output: |
    [00:00:05] SPEAKER_1: Welcome everyone to today's meeting.
    [00:00:12] SPEAKER_2: Thanks for having us.
    [00:00:18] SPEAKER_1: Let's start with the agenda.
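
Diarization and transcription usually come back as two separate timelines: speaker turns and text segments. One common way to combine them, shown here as a sketch rather than as the method of any particular engine, is to give each text segment the speaker whose turn overlaps it the longest. The field names start, end, speaker, and text are assumptions for illustration.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best, best_overlap = turn["speaker"], overlap
        labeled.append({**seg, "speaker": best})
    return labeled


def hms(seconds):
    """Format seconds as HH:MM:SS, truncating fractions."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


turns = [
    {"start": 4.0, "end": 11.0, "speaker": "SPEAKER_1"},
    {"start": 11.0, "end": 17.0, "speaker": "SPEAKER_2"},
    {"start": 17.0, "end": 25.0, "speaker": "SPEAKER_1"},
]
segments = [
    {"start": 5.0, "end": 10.0, "text": "Welcome everyone to today's meeting."},
    {"start": 12.0, "end": 16.0, "text": "Thanks for having us."},
    {"start": 18.0, "end": 22.0, "text": "Let's start with the agenda."},
]
labeled = assign_speakers(segments, turns)
for seg in labeled:
    print(f"[{hms(seg['start'])}] {seg['speaker']}: {seg['text']}")
```

Segments that overlap no turn keep the label UNKNOWN, which corresponds to the "unknown speaker" case a quality report might flag.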

Subtitle Generation

SRT Format

subtitle_config:
  format: srt
  
  timing:
    max_duration: 7  # seconds per subtitle
    min_gap: 0.1     # seconds between subtitles
    chars_per_line: 42
    max_lines: 2
    
  style:
    case: sentence  # sentence, upper, lower
    numbers: words  # words, digits
    
  example_output: |
    1
    00:00:05,000 --> 00:00:08,500
    Welcome to today's presentation
    about transcription automation.
    
    2
    00:00:09,000 --> 00:00:12,000
    Let me start by explaining
    the basic concepts.
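
To show how the SRT layout above is produced, here is a small sketch that turns timed segments into SRT text and wraps lines at chars_per_line. It does not enforce max_lines or max_duration; cues that are too long would need to be split upstream. The function names and the segment field names are assumptions.

```python
import textwrap


def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments, chars_per_line=42):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        text = "\n".join(textwrap.wrap(seg["text"], width=chars_per_line))
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    return "\n".join(blocks)


segments = [
    {"start": 5.0, "end": 8.5,
     "text": "Welcome to today's presentation about transcription automation."},
    {"start": 9.0, "end": 12.0,
     "text": "Let me start by explaining the basic concepts."},
]
srt = to_srt(segments)
print(srt)
```

textwrap breaks greedily at word boundaries, so the line breaks may differ from hand-balanced subtitles such as the example above.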

VTT Format

vtt_config:
  format: vtt
  
  features:
    cue_settings: true
    styling: true
    
  example_output: |
    WEBVTT
    
    00:00:05.000 --> 00:00:08.500 align:center
    Welcome to today's presentation
    about transcription automation.
    
    00:00:09.000 --> 00:00:12.000 align:center
    <v Speaker 1>Let me start by explaining
    the basic concepts.
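
WebVTT differs from SRT mainly in its WEBVTT header, the period before milliseconds, and optional cue numbers. A minimal conversion sketch under those assumptions follows; srt_to_vtt is a hypothetical helper, and it adds no cue settings or voice tags.

```python
import re

# Matches an SRT timestamp such as 00:00:05,000
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert SRT text to WebVTT: add the header, drop counters, fix timestamps."""
    out = ["WEBVTT", ""]
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]  # the numeric counter is optional in WebVTT
        if not lines:
            continue
        lines[0] = SRT_TIME.sub(r"\1.\2", lines[0])  # comma -> period
        out.extend(lines + [""])
    return "\n".join(out)


sample = (
    "1\n00:00:05,000 --> 00:00:08,500\nHello\n\n"
    "2\n00:00:09,000 --> 00:00:12,000\nWorld\n"
)
vtt = srt_to_vtt(sample)
print(vtt)
```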

Integration Workflows

Zoom Integration

zoom_transcription:
  trigger:
    event: recording_completed
    
  workflow:
    - step: download_recording
      source: zoom_cloud
      
    - step: transcribe
      engine: whisper
      language: auto
      
    - step: diarize
      identify_speakers: true
      
    - step: generate_notes
      template: meeting_notes
      include_summary: true
      extract_action_items: true
      
    - step: distribute
      destinations:
        - notion_page
        - slack_channel
        - email_attendees
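
A workflow like the one above is a list of named steps with parameters. One way to execute such a list, sketched here with stub handlers, is to look each step up in a handler table and pass a shared context from one step to the next. run_workflow and both handlers are hypothetical; real handlers would call the Zoom API and a speech-to-text engine.

```python
def run_workflow(steps, handlers, context=None):
    """Run each step in order, passing a shared context dict between handlers."""
    context = dict(context or {})
    for step in steps:
        name = step["step"]
        params = {key: value for key, value in step.items() if key != "step"}
        if name not in handlers:
            raise KeyError(f"no handler registered for step {name!r}")
        context = handlers[name](context, **params)
    return context


# Stub handlers for illustration only.
def download_recording(context, source):
    return {**context, "audio": f"{source}/recording.m4a"}


def transcribe(context, engine, language):
    return {**context, "transcript": f"<{engine} transcript of {context['audio']}>"}


steps = [
    {"step": "download_recording", "source": "zoom_cloud"},
    {"step": "transcribe", "engine": "whisper", "language": "auto"},
]
handlers = {"download_recording": download_recording, "transcribe": transcribe}
result = run_workflow(steps, handlers)
print(result["transcript"])
```

Keeping handlers in a table means a new step type only needs a new entry, and an unknown step name fails loudly instead of being skipped.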

YouTube Integration

youtube_subtitles:
  trigger:
    event: video_uploaded
    
  workflow:
    - step: download_audio
      source: youtube_video
      
    - step: transcribe
      engine: whisper
      task: transcribe
      
    - step: generate_subtitles
      formats: [srt, vtt]
      
    - step: translate
      target_languages: [es, zh, ja, de, fr]
      
    - step: upload_subtitles
      destination: youtube
      as_cc: true

Podcast Processing

podcast_workflow:
  input:
    source: rss_feed
    format: audio/mp3
    
  processing:
    - transcribe:
        engine: whisper
        model: large
        
    - generate_chapters:
        detect_topics: true
        min_duration: 60  # seconds
        
    - create_show_notes:
        summarize: true
        extract_links: true
        highlight_quotes: true
        
    - create_searchable_index:
        full_text: true
        timestamps: true
        
  output:
    - transcript_txt
    - chapters_json
    - show_notes_md
    - search_index
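
The create_searchable_index step combines full text with timestamps. A simple way to picture it is an inverted index from each word to the start times of the segments that contain it. The sketch below assumes segments arrive in chronological order with start and text fields; build_index is a hypothetical name.

```python
import re
from collections import defaultdict


def build_index(segments):
    """Map each lowercase word to the start times of segments containing it."""
    index = defaultdict(list)
    for seg in segments:
        # set() so a word repeated within one segment is recorded once
        for word in set(re.findall(r"[a-z0-9']+", seg["text"].lower())):
            index[word].append(seg["start"])
    return dict(index)


segments = [
    {"start": 0.0, "text": "Whisper is an open model."},
    {"start": 42.0, "text": "Diarization labels speakers."},
    {"start": 95.0, "text": "We compare Whisper and others."},
]
index = build_index(segments)
print(index["whisper"])
```

The word pattern only covers ASCII letters and digits, so non-English episodes would need a different tokenizer.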

Language Support

Multi-Language Transcription

multilingual:
  auto_detect: true
  
  supported_languages:
    - code: en
      name: English
      model: large
      
    - code: zh
      name: Chinese
      model: large
      
    - code: es
      name: Spanish
      model: large
      
    - code: ja
      name: Japanese
      model: medium
      
  translation:
    enabled: true
    target: en
    preserve_original: true

Code-Switching

code_switching:
  enabled: true
  primary_language: en
  secondary_languages: [zh, es]
  
  output: |
    [00:01:23] The next topic is about 人工智能,
    which has been muy importante in recent years.
    
  handling:
    detect_language_per_segment: true
    tag_language_switches: true
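
The tag_language_switches option is not defined further here. As a rough heuristic sketch, runs of Han characters can be found by Unicode range and wrapped in a tag. This only separates scripts: it cannot tell Spanish from English, and Japanese kanji fall in the same range, so a real system would rely on per-segment language identification from the engine. The tag syntax is an assumption.

```python
import re

# CJK Unified Ideographs block (U+4E00 to U+9FFF)
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def tag_han_runs(text, tag="zh"):
    """Wrap each run of Han characters in <tag>...</tag>."""
    return HAN_RUN.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)


line = "The next topic is about 人工智能, which has been muy importante."
tagged = tag_han_runs(line)
print(tagged)
```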

Quality Enhancement

Post-Processing

post_processing:
  text_cleanup:
    - remove_filler_words: ["um", "uh", "like"]
    - fix_common_errors: true
    - normalize_numbers: true
    
  formatting:
    - add_punctuation: true
    - capitalize_sentences: true
    - paragraph_breaks: true
    
  speaker_attribution:
    - merge_short_segments: true
    - min_segment_duration: 1.0
    
  output_enhancement:
    - add_timestamps: true
    - highlight_keywords: true
    - generate_summary: true
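
Two of the text_cleanup and formatting options can be illustrated with regular expressions: removing filler words and capitalizing sentence starts. This is a minimal sketch, and clean_transcript is a hypothetical name. The default filler list leaves out "like" because it is also an ordinary word, and removing it blindly would damage sentences.

```python
import re


def clean_transcript(text, fillers=("um", "uh")):
    """Remove whole-word fillers, collapse whitespace, capitalize sentences."""
    pattern = r"\b(?:" + "|".join(map(re.escape, fillers)) + r")\b,?\s*"
    text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text).strip()
    # Uppercase the first letter of the text and of each sentence after . ! ?
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


cleaned = clean_transcript("um so we should uh ship it. then we test.")
print(cleaned)
```

The word-boundary anchors keep words such as "umbrella" or "album" intact.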

Accuracy Metrics

TRANSCRIPTION QUALITY REPORT
═══════════════════════════════════════

File: meeting_2024_01_15.mp3
Duration: 45:32
Engine: Whisper Large

METRICS:
Word Error Rate (WER):  4.2%
Character Error Rate:   2.8%
Confidence Score:       0.94

SPEAKER DIARIZATION:
Speakers Detected: 4
Diarization Accuracy: 91%

PROCESSING TIME:
Total: 8m 23s
Real-time Factor: 0.18x

DETECTED ISSUES:
• Low confidence at 12:34 (background noise)
• Overlapping speech at 23:45
• Unknown speaker at 34:12
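
For reference, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words, and real-time factor is processing time divided by audio duration. The sketch below computes both. The sentences are made-up examples; only the two durations come from the report above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Values below 1.0 mean faster than real time."""
    return processing_seconds / audio_seconds


wer = word_error_rate("the quick brown fox", "the quick brown dog")
rtf = real_time_factor(8 * 60 + 23, 45 * 60 + 32)
print(f"WER: {wer:.1%}  RTF: {rtf:.2f}x")
```

With the report's figures, 503 seconds of processing for 2,732 seconds of audio gives the 0.18x shown above.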

API Examples

OpenAI Whisper

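from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default
client = OpenAI()

# Transcribe audio with word- and segment-level timestamps
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",  # required for timestamp_granularities
        timestamp_granularities=["word", "segment"],
    )

# Access results: each segment carries its start time (seconds) and text
for segment in transcript.segments:
    print(f"[{segment.start:.2f}] {segment.text}")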

AssemblyAI

import assemblyai as aai

# Placeholder: replace with your own AssemblyAI API key
aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()

config = aai.TranscriptionConfig(
    speaker_labels=True,
    auto_chapters=True,
    entity_detection=True
)

transcript = transcriber.transcribe(
    "https://example.com/meeting.mp3",
    config=config
)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

Best Practices

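Quality Audio: Clean, low-noise input produces more accurate transcripts.
Choose the Right Model: Balance speed against accuracy; larger models are slower but more accurate.
Use Diarization: Label speakers so multi-person recordings stay readable.
Post-Process: Clean up the automated output before publishing it.
Verify Critical Content: Have a human review anything important.
Consider Privacy: Handle sensitive recordings and transcripts with care.
Store Efficiently: Compress audio and index transcripts for search.
Provide Context: Vocabulary hints such as names and jargon help recognition.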
