Audio Script Writer
Overview
Content transformation tool that converts written medical and scientific materials into professionally structured audio scripts suitable for podcasts, educational videos, audiobooks, and voiceover narration.
Key Capabilities:
- Format Conversion: Research papers → podcast scripts
- Spoken Word Optimization: Sentence restructuring for listening
- Pronunciation Guides: Medical terminology phonetic spelling
- Timing Estimation: Duration calculations for production planning
- Multi-Format Output: Podcast, video, lecture, audiobook templates
- Voice Direction: Tone, pace, and emphasis cues for narrators
When to Use
✅ Use this skill when:
- Creating medical education podcasts from journal articles
- Converting conference presentations to video scripts
- Developing audiobook versions of medical textbooks
- Scripting patient education audio materials
- Producing research summary videos for social media
- Adapting written case reports for audio case studies
- Creating voiceover scripts for e-learning modules
❌ Do NOT use when:
- Live presentation without script → Use improvisation
- Highly visual content (surgery videos) → Use visual-focused tools
- Interactive audio (Q&A format) → Use dialogue scripting tools
- Music or sound design planning → Use audio production software
- Voice recording itself → This creates scripts, not audio
Integration:
- Upstream:
abstract-summarizer(content condensation),lay-summary-gen(patient-friendly language) - Downstream:
medical-translation(multi-language scripts),voice-cloning-tool(AI narration)
Core Capabilities
1. Spoken Word Transformation
Convert written text to conversational audio style:
from scripts.audio_writer import AudioScriptWriter
writer = AudioScriptWriter()
# Transform written content
script = writer.convert_to_audio(
source_text=research_paper,
format="podcast", # podcast, video, lecture, audiobook
target_audience="medical_students",
duration_minutes=15
)
print(script.spoken_text)
# Converts: "The pathophysiology of diabetes mellitus involves..."
# To: "So what exactly happens in diabetes? Well, it all starts when..."
Transformation Rules:
| Written Style | Audio Style | Example |
|---|---|---|
| "Furthermore" | "Plus" | Less formal transitions |
| " et al." | "and their colleagues" | Expand abbreviations |
| Numbers in text | Spoken numbers | "15%" → "15 percent" |
| Long sentences | 15-20 word max | Break into digestible chunks |
| Passive voice | Active voice | "was observed" → "we saw" |
| Citations | Omit or footnote | "(Smith et al., 2024)" → [reference tone] |
2. Pronunciation Guide Generation
Create phonetic spelling for medical terms:
# Generate pronunciation guide
pronunciation = writer.create_pronunciation_guide(
text=script,
include_phonetic=True,
include_syllables=True
)
# Output:
# "Hyperlipidemia: hi-per-lip-i-DEE-mee-uh"
# "Metformin: met-FOR-min"
# "Atherosclerosis: ath-er-oh-skleh-ROH-sis"
Guide Elements:
- Phonetic Spelling: IPA or simplified phonetics
- Syllable Breaks: hy-per-ten-sion
- Emphasis Marking: Primary stress (CAPS), secondary stress
- Alternative Pronunciations: Regional variations (UK vs US)
- Sound-Alikes: "rhymes with..." for difficult terms
3. Timing and Pacing
Calculate speaking duration and mark pacing cues:
# Analyze timing
timing = writer.calculate_timing(
script=script,
speaking_rate="conversational", # slow, conversational, fast
include_pauses=True
)
print(f"Estimated duration: {timing.duration_minutes} minutes")
print(f"Word count: {timing.word_count}")
print(f"Pace: {timing.words_per_minute} WPM")
Speaking Rates:
| Style | WPM | Use Case |
|---|---|---|
| Slow/Educational | 120-130 | Patient education, complex topics |
| Conversational | 140-160 | Podcasts, general audience |
| Fast/News | 170-190 | Time-constrained content |
| Variable | Varies | Dynamic pacing with pauses |
Pacing Cues:
[BREATHE] - Brief pause for narrator
[PAUSE 2s] - Two-second pause for emphasis
[SLOW DOWN] - Reduce pace for key point
[SPEED UP] - Increase energy/excitement
[BEAT] - Dramatic pause
4. Multi-Format Templates
Generate scripts for different audio formats:
# Podcast episode
podcast = writer.create_podcast_script(
content=article,
episode_format="interview", # solo, interview, panel
include_intro_music=True,
ad_breaks=[5, 12] # minutes
)
# Educational video
video = writer.create_video_script(
content=lecture_slides,
visual_cues=True, # Mark where visuals change
b_roll_notes=True # Suggest supplemental footage
)
Format Types:
| Format | Characteristics | Best For |
|---|---|---|
| Podcast | Conversational, segments, ads | Long-form content, interviews |
| Video | Visual cues, B-roll notes | YouTube, educational platforms |
| Lecture | Structured, Q&A breaks | Online courses, training |
| Audiobook | Chapter markers, consistent tone | Textbooks, memoirs |
| News | Tight, factual, quick | Research briefs, updates |
Common Patterns
Pattern 1: Research Paper to Podcast
Scenario: Convert published study to 15-minute podcast episode.
# Convert paper to podcast script
python scripts/main.py \
--input paper.pdf \
--format podcast \
--duration 15 \
--style conversational \
--include-intro-outro \
--output podcast_script.txt
# Generate pronunciation guide
python scripts/main.py \
--input podcast_script.txt \
--generate-pronunciation \
--output pronunciation_guide.txt
Structure:
[INTRO MUSIC 5s]
HOST: Welcome to Medical Research Today. I'm your host...
[BREATHE]
HOST: Today we're diving into a fascinating study about...
[PAUSE]
HOST: So what did the researchers find? Well...
[BREATHE]
HOST: Dr. Smith, one of the study authors, explains...
[SOUND BITE: Interview clip]
...
[OUTRO MUSIC]
Pattern 2: Medical Lecture Recording
Scenario: Convert lecture notes to video script for online course.
# Create lecture script
lecture = writer.create_lecture_script(
notes=lecture_content,
duration=45, # minutes
break_intervals=[15, 30], # minutes for student breaks
interaction_points=True # "Pause and think..." prompts
)
# Add visual cues
script = writer.add_visual_cues(
script=lecture,
slide_transitions=True,
animation_notes=True
)
Lecture Elements:
- Learning objectives at start
- Periodic comprehension checks
- Break reminders
- Transition phrases between topics
- Summary and key takeaways
Pattern 3: Patient Education Audio
Scenario: Create audio guide for diabetes management.
# Patient-friendly script
patient_script = writer.create_patient_script(
medical_content=diabetes_guide,
reading_level=6, # 6th grade
empathetic_tone=True,
key_points_highlighted=True
)
# Slow, clear pacing
patient_script.adjust_pacing(
wpm=130,
pause_after_sentences=1.5 # seconds
)
Patient Script Features:
- Simple language (avoid medical jargon)
- Empathetic tone
- Clear action steps
- Reassuring statements
- Repetition of key points
Pattern 4: Conference Presentation to Video
Scenario: Adapt live presentation to YouTube video format.
# Convert presentation script
python scripts/main.py \
--input presentation_transcript.txt \
--format video \
--platform youtube \
--include-hooks true \
--engagement-cues true \
--output youtube_script.txt
YouTube Optimization:
- Hook in first 30 seconds
- Engagement questions for comments
- Call to action (subscribe, like)
- Timestamp markers for chapters
- B-roll suggestions for visual interest
Complete Workflow Example
From research paper to published podcast:
# Step 1: Extract and summarize content
python scripts/main.py \
--input paper.pdf \
--extract-key-points \
--output key_points.txt
# Step 2: Convert to audio script
python scripts/main.py \
--input key_points.txt \
--format podcast \
--duration 20 \
--output raw_script.txt
# Step 3: Add production elements
python scripts/main.py \
--input raw_script.txt \
--add-music-cues \
--add-sound-effects \
--add-pacing-marks \
--output production_script.txt
# Step 4: Generate pronunciation guide
python scripts/main.py \
--input production_script.txt \
--generate-pronunciation \
--output pronunciations.txt
# Step 5: Create timing breakdown
python scripts/main.py \
--input production_script.txt \
--calculate-timing \
--output timing_breakdown.txt
Python API:
from scripts.audio_writer import AudioScriptWriter
from scripts.pronunciation import PronunciationGuide
from scripts.timing import TimingCalculator
# Initialize
writer = AudioScriptWriter()
pronouncer = PronunciationGuide()
timing = TimingCalculator()
# Read source material
with open("research_article.txt", "r") as f:
content = f.read()
# Step 1: Convert to spoken format
script = writer.convert_to_audio(
text=content,
format="podcast",
target_duration=15, # minutes
audience="general_medical"
)
# Step 2: Add production elements
script_with_cues = writer.add_production_cues(
script=script,
music_stings=True,
transition_effects=True
)
# Step 3: Generate pronunciation guide
medical_terms = pronouncer.extract_terms(script_with_cues)
pronunciation_guide = pronouncer.create_guide(medical_terms)
# Step 4: Calculate timing
timing_analysis = timing.calculate(
script=script_with_cues,
speaking_rate=150 # WPM
)
# Export complete production package
writer.export_production_package(
script=script_with_cues,
pronunciation=pronunciation_guide,
timing=timing_analysis,
output_dir="podcast_production/"
)
Quality Checklist
Content Quality:
- Written content accurate and current
- Sources cited (even if not spoken)
- Medical facts verified by expert
- Appropriate for target audience level
- No confidential patient information
Audio Optimization:
- Sentences 15-20 words maximum
- Abbreviations expanded on first use
- Complex terms have pronunciation guides
- Active voice preferred over passive
- Transitions smooth and conversational
Production Quality:
- Timing realistic for content density
- Pacing cues appropriate for subject
- Music/sound cues marked clearly
- Pronunciation guide comprehensive
- Script formatted for easy reading
Before Recording:
- CRITICAL: Script read aloud for flow
- Difficult pronunciations practiced
- Timing tested with stopwatch
- Technical terms confirmed with subject expert
- Copyright cleared for any quoted material
Common Pitfalls
Content Issues:
-
❌ Too dense → Information overload for listeners
- ✅ Break complex topics into multiple episodes
-
❌ Visual dependencies → "As shown in Figure 3..."
- ✅ Describe visuals or omit visual-dependent content
-
❌ Citation overload → Every sentence has reference
- ✅ Save citations for show notes, not narration
Audio Issues:
-
❌ Written-style language → "Furthermore, the aforementioned..."
- ✅ Conversational: "Plus, this thing we talked about..."
-
❌ No pauses → Relentless information delivery
- ✅ Build in breathing room; let points sink in
-
❌ Ignoring pronunciation → Mispronounced medical terms
- ✅ Research and practice all technical terms
Production Issues:
-
❌ Underestimating time → 10 minutes of script takes 12+ to record
- ✅ Add 20% buffer for retakes and natural pacing
-
❌ Complex sentence structures → Tongue twisters for narrator
- ✅ Short sentences; avoid nested clauses
References
Available in references/ directory:
audio_writing_best_practices.md- Broadcast writing guidelinesmedical_pronunciation_guide.md- Common terms phoneticspodcast_production_standards.md- Industry format standardsaccessibility_guidelines.md- Inclusive audio contentplatform_requirements.md- YouTube, Spotify, Apple specsvoice_care_tips.md- Narrator health and performance
Scripts
Located in scripts/ directory:
main.py- CLI interface for script conversionaudio_writer.py- Core text-to-audio transformationpronunciation.py- Medical terminology phoneticstiming.py- Duration calculation and pacingformat_templates.py- Podcast, video, lecture templatesvoice_direction.py- Narrator cues and directionaccessibility.py- Alternative format generation
Limitations
- Voice Performance: Script is text only; actual delivery varies by narrator
- Accent Variations: Pronunciation guides may not match all dialects
- Cultural Context: Humor and references may not translate across cultures
- Copyright: Cannot use copyrighted material without permission
- Technical Accuracy: Does not verify medical content (input-dependent)
- Live Elements: Cannot script unscripted interviews or Q&A
Parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
--input, -i | string | - | No | Input text file path |
--output, -o | string | - | No | Output JSON file path (default: stdout) |
--text | string | - | No | Direct text input (alternative to --input) |
--duration, -d | int | 5 | No | Target duration in minutes |
--pace, -p | string | normal | No | Speaking pace (slow, normal, fast) |
--style, -s | string | conversational | No | Script style (conversational, formal, educational) |
Usage
Basic Usage
# Convert from file
python scripts/main.py --input article.txt --duration 5 --output script.json
# Direct text input
python scripts/main.py --text "Medical research findings..." --duration 3
# From stdin
cat article.txt | python scripts/main.py --duration 5 --style conversational
# With specific style and pace
python scripts/main.py --input paper.txt --style educational --pace slow
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python script executed locally | Low |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Low |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output saved only to specified location | Low |
Security Checklist
- No hardcoded credentials or API keys
- No unauthorized file system access
- Output does not expose sensitive information
- Prompt injection protections in place
- Input validation for file paths
- Output directory restricted to workspace
- Script execution in sandboxed environment
Prerequisites
# Python 3.7+
# No additional packages required (uses standard library)
Evaluation Criteria
Success Metrics
- Successfully converts text to audio-optimized script
- Expands abbreviations and converts numbers to words
- Calculates estimated duration based on word count
- Applies style-specific formatting
- Provides pronunciation notes for medical terms
Test Cases
- Basic Conversion: Convert text file → Returns audio script with metadata
- Abbreviation Handling: Text with "e.g., i.e., etc." → All expanded in output
- Number Conversion: Input with "1 in 4" → Output with "one in four"
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
- Add support for custom abbreviation dictionaries
- Integrate with text-to-speech engines
- Add multilingual support
🎙️ Pro Tip: The best audio scripts sound natural when spoken. Always read your script aloud before finalizing—if you stumble over a sentence, your narrator will too. Revise for the ear, not the eye.