senseaudio-meeting-summarizer

Transcribe meetings with SenseAudio ASR speaker diarization, timestamps, and meeting-note extraction workflows. Use when users need meeting transcription, meeting notes, speaker-separated transcripts, or action-item extraction from recordings.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "senseaudio-meeting-summarizer" with this command: npx skills add scikkk/meetingsummarizer

SenseAudio Meeting Summarizer

Transform meeting recordings into structured transcripts with speaker identification, timestamps, and local meeting summaries.

What This Skill Does

  • Transcribe recorded meetings with official SenseAudio ASR endpoints
  • Separate speakers with diarization on supported models
  • Generate word and segment timestamps for navigation
  • Produce local summaries and action-item candidates from transcripts
  • Support optional realtime transcription for live meetings
  • Export transcript and notes in text-friendly formats

Credential and Dependency Rules

  • Read the API key from SENSEAUDIO_API_KEY.
  • Send auth only as Authorization: Bearer <API_KEY>.
  • Do not place API keys in query parameters, logs, or saved examples.
  • If Python helpers are used, this skill expects python3, requests, and websockets.
  • This skill should work without any external LLM credentials.
  • If a user explicitly asks for LLM-based summarization, require a separately declared credential for that provider instead of assuming one exists.

Official ASR Constraints

Use the official SenseAudio ASR rules summarized below:

  • HTTP endpoint: POST https://api.senseaudio.cn/v1/audio/transcriptions
  • WebSocket endpoint: wss://api.senseaudio.cn/ws/v1/audio/transcriptions
  • File upload limit: <=10MB per request
  • Meeting-oriented HTTP model: sense-asr-pro
  • Realtime WebSocket model: sense-asr-deepthink
  • enable_speaker_diarization is supported only on sense-asr / sense-asr-pro
  • max_speakers is documented only for sense-asr-pro
  • enable_sentiment and timestamp_granularities[] are supported only on sense-asr / sense-asr-pro
  • WebSocket audio must be pcm, 16000Hz, mono

Recommended Workflow

  1. Validate the meeting asset:
  • Prefer clear audio with limited background noise.
  • Split files larger than 10MB before upload.
  1. Transcribe with the right mode:
  • Use HTTP sense-asr-pro for recorded meetings needing diarization and timestamps.
  • Use WebSocket sense-asr-deepthink only for live streaming scenarios.
  1. Request only needed features:
  • For meeting notes, use response_format=verbose_json.
  • Enable diarization, timestamps, and sentiment only when the user needs them.
  • Provide max_speakers only when known and using sense-asr-pro.
  1. Summarize locally first:
  • Build summaries, decisions, and action-item candidates from the transcript itself.
  • Keep the no-extra-credentials path as the default behavior.
  1. Handle sensitive output carefully:
  • Treat returned session_id, trace_id, and transcript contents as potentially sensitive.
  • Do not expose provider identifiers unless needed for debugging.

Minimal HTTP Transcription Helper

import os

import requests

API_KEY = os.environ["SENSEAUDIO_API_KEY"]
API_URL = "https://api.senseaudio.cn/v1/audio/transcriptions"


def transcribe_meeting(audio_file, max_speakers=None, language=None, target_language=None):
    with open(audio_file, "rb") as handle:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": handle},
            data={
                "model": "sense-asr-pro",
                "response_format": "verbose_json",
                "enable_speaker_diarization": "true",
                "enable_sentiment": "true",
                "enable_punctuation": "true",
                "timestamp_granularities[]": ["word", "segment"],
                **({"max_speakers": max_speakers} if max_speakers else {}),
                **({"language": language} if language else {}),
                **({"target_language": target_language} if target_language else {}),
            },
            timeout=300,
        )
    response.raise_for_status()
    return response.json()

Transcript Processing Pattern

  • Read text for the full transcript
  • Read segments for speaker-separated timeline entries
  • Use speaker, start, end, text, and optional sentiment fields when present
  • Use words only when word timestamps were requested

Local Summary Pattern

Generate notes from transcript structure without external services:

  • summary: 3-6 bullets capturing the meeting arc
  • decisions: statements containing agreements or final choices
  • action_items: statements with owners, deadlines, or explicit follow-ups
  • participants: derived from speaker labels
  • timeline: ordered segments with timestamps

Heuristics that work without an LLM:

  • Detect action items from patterns like will, need to, follow up, by Friday
  • Detect decisions from patterns like decided, agreed, we will, final choice
  • Aggregate speaker time by summing end - start

Realtime Meeting Pattern

For live meetings, use WebSocket only when streaming audio is actually available.

import asyncio
import json
import os

import websockets

API_KEY = os.environ["SENSEAUDIO_API_KEY"]
WS_URL = "wss://api.senseaudio.cn/ws/v1/audio/transcriptions"


async def transcribe_live_meeting(audio_stream):
    async with websockets.connect(
        WS_URL,
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        await ws.recv()
        await ws.send(json.dumps({
            "event": "task_start",
            "model": "sense-asr-deepthink",
            "audio_setting": {
                "sample_rate": 16000,
                "format": "pcm",
                "channel": 1,
            },
        }))

        async for audio_chunk in audio_stream:
            await ws.send(audio_chunk)

        await ws.send(json.dumps({"event": "task_finish"}))

Output Options

  • Full transcript with timestamps in txt or json
  • Meeting notes in md
  • Action-item list in json or csv
  • Speaker statistics in json
  • Optional sentiment timeline when requested

Error Handling

  • Poor audio quality: run an audio-quality check first or request a cleaner recording
  • Large files: split recordings into chunks under 10MB
  • Wrong language detection: set language explicitly on HTTP transcription
  • Too many speakers: provide max_speakers only with sense-asr-pro
  • Realtime failures: inspect WebSocket task_failed and base_resp.status_msg

Safety Notes

  • Do not assume any external summarization provider credential exists.
  • Do not add an LLM call unless the user asks for it and the skill metadata is updated to declare that credential.
  • Prefer local transcript-based summarization for the default path.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

Paired — Bluetooth Phone Bridge

Bridge an OpenClaw agent to the user's own phone via Bluetooth and ADB-over-USB. Provides SMS receive (MAP/MNS), SMS send (ADB autosend), outgoing calls (HFP...

Registry SourceRecently Updated
Automation

Stripe Setup

Add Stripe payments to any agent-built app. Covers checkout sessions, subscription billing, webhook handling, customer portal, and test-mode validation. Use...

Registry SourceRecently Updated
Automation

Memori

Long-term memory for OpenClaw agents using the Memori SDK. Automatically captures conversations and equips the agent with explicit tools to recall context ac...

Registry SourceRecently Updated
Automation

Email Excel Transfer

Automatyzuje workflow pobierania danych z email i wstawiania ich do arkuszy kalkulacyjnych. Użyj gdy użytkownik chce przenieść informacje z poczty do Excela....

Registry SourceRecently Updated