Polish Transcriptions Skill

Transform raw, machine-generated transcriptions into polished, cognitively ordered Obsidian notes that are both readable and complete.

Objective

Convert poorly transcribed audio/video content (workshops, lectures, meetings, interviews) into well-structured, publication-ready documents while preserving 100% of the original information.

Core Principles

Zero Information Loss

> [!danger] Critical Requirement
> Never omit, summarize, or compress information from the original. Every detail, example, tangent, question, and answer must be preserved. The output should contain MORE structure, not LESS content.

Cognitive Reorganization

Transform stream-of-consciousness speech into logical document sections:

Speech Pattern	Transforms To
Topic jumping	Grouped sections with headers
Repetition	Single consolidated statement
Filler words/false starts	Clean prose
Tangents	Callouts or integrated context
Q&A interruptions	Blockquote dialogues

Semantic Structure Over Chronological Order

Reorganize content by meaning, not by when things were said. A 2-hour rambling lecture about three topics becomes three clean sections, even if the speaker jumped between them.
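
As a rough illustration of regrouping by meaning, the sketch below assumes each transcript segment has already been labeled with a topic during analysis. The function name `group_by_topic` and the sample segments are hypothetical, not part of the skill itself.

```python
def group_by_topic(segments):
    """Group (topic, text) pairs by topic.

    Topics keep the order in which they were first mentioned, and the
    segments inside each topic keep their spoken order, so nothing is dropped.
    """
    grouped = {}
    for topic, text in segments:
        grouped.setdefault(topic, []).append(text)
    return grouped


# Hypothetical segments from a speaker who jumps between two topics
segments = [
    ("databases", "A database stores data."),
    ("sql", "SQL queries relational databases."),
    ("databases", "Some databases are non-relational."),
]
sections = group_by_topic(segments)
```

Every segment still appears exactly once in the result; only the grouping changes, which is the point of semantic rather than chronological ordering.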

Transformation Process

Phase 1: Analysis

Before writing anything:

Read the entire transcript — Understand all topics covered
Identify main themes — What are the 3-7 core topics?
Categorize content types:
- Core instruction/information
- Examples and anecdotes
- Q&A interactions
- Meta-commentary (jokes, digressions)
- Action items or recommendations
Map relationships — Which topics depend on others?

Phase 2: Structure Design

Create a logical outline:

[Main Topic 1]

[Subtopic 1.1]

[Subtopic 1.2]

[Main Topic 2]

...

Use horizontal rules (---) to separate major topic shifts.

Phase 3: Content Transformation

Apply these transformations systematically:

Headers and Hierarchy

## Main Section

### Subsection

#### Point or Example

Dialogues and Q&A

Preserve speaker identities with blockquotes:

> Participant: How does X work?
> Instructor: X works like this...

For multi-turn exchanges:

> Student: First question
> Professor: Initial answer
> Student: Follow-up question
> Professor: Expanded answer
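
A minimal sketch of how dialogue turns could be rendered as a blockquote. The helper `format_dialogue`, the bold speaker labels, and the empty `>` line between turns are assumptions made for illustration; the empty quoted line keeps each turn in its own paragraph under standard Markdown rendering.

```python
def format_dialogue(turns):
    """Render (speaker, utterance) pairs as a single Markdown blockquote."""
    quoted = [f"> **{speaker}:** {utterance}" for speaker, utterance in turns]
    # A bare ">" between turns keeps them as separate paragraphs in one quote
    return "\n>\n".join(quoted)


turns = [
    ("Student", "What is SQL?"),
    ("Professor", "SQL is a query language for relational databases."),
]
dialogue = format_dialogue(turns)
print(dialogue)
```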

Callouts for Special Content

Content Type	Callout to Use
Key concept/principle	[!important]
Practical advice	[!tip] Recommendation
Warning/caution	[!warning]
Interesting aside	[!note]
Real-world example	[!example]
Quoted wisdom	[!quote]
Action items	[!todo]
Summary	[!abstract] or [!tldr]
Success/conclusion	[!success]
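
The sketch below shows one way to emit a callout as text, assuming the usual Obsidian layout of a `> [!type] Title` line followed by quoted body lines. The function `render_callout` is a hypothetical helper, not an Obsidian API.

```python
def render_callout(kind, title, body):
    """Return an Obsidian-style callout block as a string.

    kind  -- callout type such as "tip" or "warning"
    title -- optional title shown after the type (may be empty)
    body  -- callout text; blank lines become bare ">" lines
    """
    header = f"> [!{kind}] {title}".rstrip()
    quoted = [f"> {line}".rstrip() for line in body.splitlines()]
    return "\n".join([header, *quoted])


callout = render_callout("tip", "Recommendation", "Back up the vault.\n\nDo it weekly.")
print(callout)
```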

Tables for Structured Data

Convert comparison discussions into tables:

Column 1	Column 2	Column 3
Data 1	Data 2	Data 3
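
For reference, a small sketch that builds a pipe-delimited Markdown table from rows collected while reading the transcript. The helper name `to_markdown_table` is hypothetical, and the sketch assumes every row has the same number of cells as the header.

```python
def to_markdown_table(headers, rows):
    """Build a pipe-delimited Markdown table from a header list and row lists."""

    def fmt(cells):
        # Escape literal pipes so they do not split a cell
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    separator = "| " + " | ".join("---" for _ in headers) + " |"
    return "\n".join([fmt(headers), separator, *(fmt(row) for row in rows)])


table = to_markdown_table(
    ["Type", "Characteristics"],
    [["Relational", "Uses SQL"], ["Non-relational", "Flexible structures"]],
)
print(table)
```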

Lists for Enumerated Content

When the speaker lists things (even implicitly):

Item one
Item two
- Sub-item
Item three

Mermaid Diagrams for Processes

When a process or flow is described:

graph LR
    A[Step 1] --> B[Step 2]
    B --> C[Step 3]
    C --> D[Result]
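
If the steps of a process are already available as an ordered list, a left-to-right flowchart like the one above could be generated with a sketch such as this. The helper `steps_to_mermaid` and the node ids `S0`, `S1`, ... are hypothetical, and labels are assumed not to contain double quotes.

```python
def steps_to_mermaid(steps):
    """Turn an ordered list of step labels into a left-to-right Mermaid graph."""
    lines = ["graph LR"]
    if len(steps) == 1:
        # A single step has no edges, so declare the node on its own
        lines.append(f'    S0["{steps[0]}"]')
    for i in range(len(steps) - 1):
        lines.append(f'    S{i}["{steps[i]}"] --> S{i + 1}["{steps[i + 1]}"]')
    return "\n".join(lines)


diagram = steps_to_mermaid(["Collect", "Clean", "Analyze"])
print(diagram)
```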

Code Blocks for Technical Content

# Example code from the presentation
def example():
    return "formatted code"

Formatting Standards

Frontmatter

Always include appropriate YAML frontmatter:

---
date: YYYY-MM-DD
professor: "[[Speaker Name]]"   # or: speaker: "[[Speaker Name]]"
tags:                           # optional
  - workshop
  - topic
---
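
A minimal sketch of assembling this frontmatter programmatically. The function `build_frontmatter` and its parameters are hypothetical, and it assumes the speaker name contains no double quotes.

```python
from datetime import date


def build_frontmatter(day, speaker, role="speaker", tags=None):
    """Return a YAML frontmatter block with date, speaker/professor, and optional tags."""
    if role not in ("speaker", "professor"):
        raise ValueError("role must be 'speaker' or 'professor'")
    lines = ["---", f"date: {day.isoformat()}", f'{role}: "[[{speaker}]]"']
    if tags:
        lines.append("tags:")
        lines.extend(f"  - {tag}" for tag in tags)
    lines.append("---")
    return "\n".join(lines)


frontmatter = build_frontmatter(
    date(2025, 8, 8), "Carmen Marín", role="professor", tags=["workshop"]
)
print(frontmatter)
```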

Text Formatting

Purpose	Syntax	Example
Key terms (first mention)	**bold**	**machine learning**
Technical terms	`code`	`SQL`
Emphasis	*italic*	*very important*
Highlighting	==text==	==critical deadline==

Links

Create wikilinks for concepts that deserve their own notes:

This relates to [[machine learning]] and [[data science]].

Anti-Patterns (What NOT To Do)

❌ Summarizing

Bad:

The instructor talked about several data topics.

Good:

The instructor covered three main areas:

Data integration: consolidating information from multiple sources
Cleaning and transformation: sorting, cleansing, and preparing the data
Exploratory analysis: understanding patterns and behaviors

❌ Removing "Unimportant" Content

Bad:

(omitted anecdote about COVID impact)

Good:

> [!example] Real Case: The Impact of COVID-19
> At a bank where I worked, we had delinquency prediction models...

❌ Flattening Dialogue

Bad:

It was discussed that SQL is the main language.

Good:

> Student: What is SQL?
> Professor: SQL is the programming language of databases.

❌ Over-Structuring

Bad (a separate header for every level):

Data Definition

Type 1

Subtype A

Good (one header with a nested list):

Data Types

Type 1: Description
- Subtype A

Quality Checklist

Before delivering the polished document:

Information complete — All original content is present
Logical structure — Grouped by topic, not chronology
Frontmatter present — Date, speaker/professor, optional tags
Headers used correctly — H2 for sections, H3 for subsections
Dialogues preserved — Q&A in blockquote format with speaker names
Callouts appropriate — Important points in [!tip], [!important], etc.
Tables where helpful — Comparisons and structured data formatted
Mermaid diagrams — Processes visualized when described
Bold for key terms — First mention of important concepts
Wikilinks created — Concepts linked with [[concept]]
Horizontal rules — Major topic separations marked with ---
Clean prose — No filler words, false starts, or transcription artifacts
No orphan headers — Every header has content below it
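
Some of these checks can be approximated mechanically. The sketch below covers only three of them (frontmatter present, no orphan headers, no filler words) and is a rough aid rather than a substitute for reading the note. The function `check_note`, the filler-word pattern, and the definition of an orphan header (a header followed directly by a header of the same or higher level, or by nothing) are assumptions.

```python
import re

HEADER = re.compile(r"^(#{1,6}) \S")
FILLER = re.compile(r"\b(?:uh+|um+|ehh+)\b", re.IGNORECASE)


def check_note(text):
    """Return a list of checklist problems found in a polished note."""
    lines = text.splitlines()
    problems = []

    # Rough frontmatter test: the note must open with a '---' line
    if not lines or lines[0].strip() != "---":
        problems.append("missing frontmatter")

    # Record every non-blank line as (index, header level); body text is level 0.
    # Lines inside fenced code blocks are never treated as headers.
    in_fence = False
    content = []
    for i, line in enumerate(lines):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            content.append((i, 0))
            continue
        if not line.strip():
            continue
        match = None if in_fence else HEADER.match(line)
        content.append((i, len(match.group(1)) if match else 0))

    # A header is an orphan if nothing follows it, or if the next non-blank
    # line is a header of the same or a higher level
    for pos, (i, level) in enumerate(content):
        if level == 0:
            continue
        nxt = content[pos + 1][1] if pos + 1 < len(content) else None
        if nxt is None or 0 < nxt <= level:
            problems.append(f"orphan header on line {i + 1}")

    for i, line in enumerate(lines):
        if FILLER.search(line):
            problems.append(f"filler word on line {i + 1}")
    return problems
```

An empty result only means these three checks passed; completeness of information still has to be verified against the transcript.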

Example Transformation

Before (Raw Transcription)

okay so uhh let's look at the database stuff so a database is well like a place where you store things right? oh wait I forgot to tell you my name I'm Carmen uhh so as I was saying there are different types of databases some are relational others non-relational the relational ones use SQL which is a programming language well not exactly programming but it's used to query data so SQL stands for structured query language and it's used to run queries against the database...

After (Polished Document)

---
date: 2025-08-08
professor: "[[Carmen Marín]]"
---

Introduction to Databases

A database is a centralized store where data is kept and organized for later access and manipulation.

Types of Databases

Type	Characteristics
Relational	Uses SQL, tabular structure
Non-relational	NoSQL, flexible structures

SQL (Structured Query Language)

SQL is the standard language for interacting with relational databases. It supports querying, inserting, updating, and deleting data.

> [!note] Clarification
> Although SQL contains programming elements, it is technically a query language, not a general-purpose programming language.

Workflow Integration

Suggested Process

Read the obsidian-markdown skill first for syntax reference
Analyze the complete raw transcript
Outline the logical structure
Transform section by section
Review against the quality checklist
Verify no information was lost by comparing key facts
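
The final verification step can be partly supported by a script. The sketch below is a crude heuristic, not a proof of completeness: it treats numbers and capitalized words in the raw transcript as "key terms" and reports any that never appear in the polished note. The function `missing_key_terms` is hypothetical, and the heuristic is weak for transcripts that arrive entirely in lowercase.

```python
import re


def missing_key_terms(raw, polished):
    """List numbers and capitalized words from `raw` that are absent from `polished`."""
    polished_words = {w.lower() for w in re.findall(r"\w+", polished)}
    key_terms = {
        w for w in re.findall(r"\w+", raw) if w[0].isupper() or w[0].isdigit()
    }
    # Compare case-insensitively, since polishing may change capitalization
    return sorted(t for t in key_terms if t.lower() not in polished_words)


raw = "ok so Carmen said the bank had 12 models and they used SQL"
polished = "The bank had 12 models built with SQL."
missing = missing_key_terms(raw, polished)
print(missing)
```

Any term this reports is a prompt to reread that part of the transcript; an empty list does not by itself show that nothing was lost.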

Output Location

Polished transcriptions should be saved to the appropriate location in the user's vault, typically:

03 resources/ for workshops and external content
01 projects/.../classes/ for academic lectures
Same directory as source with a new filename

Success Criteria

A successfully polished transcription:

Reads like a well-written article — Not like speech
Contains all original information — Nothing omitted
Uses Obsidian features effectively — Callouts, tables, diagrams
Has clear cognitive structure — Easy to navigate and reference
Preserves speaker personality — Quotes and dialogues maintain voice
Is immediately usable — No further editing needed by user

References

Obsidian Callouts Documentation
Obsidian Properties and Frontmatter
Mermaid Diagram Syntax
CommonMark Specification
