Meeting To Text

Use this skill when the job is a local file-to-transcript workflow.

Do not use this skill if the user only wants audio extraction, a meeting summary, environment setup, or an explanation of the models.

Inputs To Collect

Always collect:

Output target rules:

If the target ends with .txt, write exactly to that file.
Otherwise treat it as a directory and write <source-stem>_transcript.txt inside it.
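
The two output target rules above can be sketched as a small path helper. This is only an illustration, not the entrypoint's actual code: the name resolve_transcript_path is hypothetical, and the case-sensitive check on .txt is an assumption.

```python
from pathlib import Path


def resolve_transcript_path(source: str, target: str) -> Path:
    """Hypothetical helper that mirrors the output target rules.

    Assumption: the ".txt" check is case-sensitive; the real script may differ.
    """
    if target.endswith(".txt"):
        # Rule 1: a target ending in .txt is the exact transcript file.
        return Path(target)
    # Rule 2: anything else is a directory that receives
    # <source-stem>_transcript.txt.
    return Path(target) / f"{Path(source).stem}_transcript.txt"
```

For example, a source named standup.mp4 with the target out would resolve to out/standup_transcript.txt, while the target out/notes.txt is used unchanged.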

Supported source types:

Read references/runtime_paths.md before running the script.

Run the bundled entrypoint with the local ASR environment:

& '<YOUR_CONDA_ENV_PYTHON_PATH>' 'C:\path\to\your\meeting-to-text\scripts\meeting_to_text.py' --input '<SOURCE_PATH>' --output '<OUTPUT_TARGET>'

If you need a stable temp location, add:

--work-dir '<YOUR_WORKSPACE_TEMP_PATH>'
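
If the entrypoint is launched from Python rather than typed into a shell, the same invocation can be assembled as an argument list. This is a minimal sketch: build_command is a hypothetical helper, and only the --input, --output, and --work-dir flags shown above are used.

```python
from typing import List, Optional


def build_command(
    python_path: str,
    script_path: str,
    source: str,
    output_target: str,
    work_dir: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: assemble the argument list for the entrypoint."""
    command = [python_path, script_path, "--input", source, "--output", output_target]
    if work_dir is not None:
        # Optional stable temp location.
        command += ["--work-dir", work_dir]
    return command
```

Passing the list to subprocess.run(command, capture_output=True, text=True) avoids shell quoting problems with paths that contain spaces.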

The script may print library noise before the final machine-readable result.

Always treat the last non-empty stdout line as the JSON result object.
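
One way to apply this rule is sketched below, assuming the script's stdout has already been captured as text. The function name parse_final_result is hypothetical.

```python
import json


def parse_final_result(stdout_text: str) -> dict:
    """Hypothetical helper: parse the last non-empty stdout line as JSON."""
    # Drop blank lines so trailing newlines do not hide the result.
    lines = [line for line in stdout_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("stdout contained no non-empty lines")
    # Earlier lines may be library noise; only the final one is parsed.
    result = json.loads(lines[-1])
    if not isinstance(result, dict):
        raise ValueError("final stdout line is not a JSON object")
    return result
```

A parse failure here raises ValueError (json.JSONDecodeError is a subclass), which should be treated as an error rather than as a successful run.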

Interpret results this way:

Exit code 0 with status "success": transcript file was created with no warnings.
Exit code 0 with status "warning": transcript file was created, but you must report the warnings and any skipped segments.
Non-zero exit code, or status "error": do not claim success; surface the warning list and the intended output path.
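
The three cases above can be expressed as a small classifier. This is a sketch only: classify_run is a hypothetical name, and treating a missing or unrecognized status as an error is a conservative assumption that the rules above do not state.

```python
from typing import Optional


def classify_run(exit_code: int, result: Optional[dict]) -> str:
    """Hypothetical helper: map exit code and status to an outcome."""
    status = (result or {}).get("status")
    if exit_code != 0 or status == "error":
        # Never claim success here, even if a status field says otherwise.
        return "error"
    if status == "warning":
        # Transcript exists, but warnings and skipped segments must be reported.
        return "warning"
    if status == "success":
        return "success"
    # Assumption: an unknown or missing status is handled as an error.
    return "error"
```

Checking the exit code first means a non-zero exit always wins over whatever status the JSON reports.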

Important fields in the final JSON:

The entrypoint already enforces the workflow. Do not rewrite the pipeline ad hoc in the conversation.

The script will:

normalize audio with FFmpeg instead of renaming extensions
use local SenseVoiceSmall for ASR
use local 3D-Speaker embeddings plus clustering for diarization
write a plain text transcript with timestamps and 说话人N (speaker N) labels
stop on diarization failure instead of silently emitting a non-speaker-separated transcript

On success, report:

On failure, report:

the exit code category
the warning message from the JSON result
whether the failure happened during validation, media normalization, diarization, transcription, or output writing

Read these only when needed: