Real-time Speaker Diarization
Pipe in a microphone, phone, or conference feed and get each utterance tagged with a speaker as it is spoken. This is a WebSocket streaming endpoint: the moment a full utterance ends, the server pushes a partial_result. It suits live captions, call-center transcription, and meeting-room diarization.
Limits & Constraints
Exceeding any limit returns 413 Payload Too Large or 422 Validation Error; see §05 Error Codes.
Note: any other format combination is rejected and the connection is closed immediately (close code 1008). In the browser, you can create AudioContext({ sampleRate: 16000 }) and let the browser resample to 16 kHz automatically.
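Outside the browser, audio often arrives as floating-point samples and has to be packed into the only accepted format. The following is a minimal sketch, assuming the samples are already mono, already at 16 kHz, and normalized to [-1.0, 1.0]; the helper name float_to_pcm_s16le is hypothetical and not part of this API.

```python
import struct


def float_to_pcm_s16le(samples):
    """Pack float samples in [-1.0, 1.0] as 16-bit little-endian PCM bytes.

    Assumption: input is mono and already sampled at 16 kHz; this helper
    does not resample or downmix.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range samples
        ints.append(int(round(s * 32767)))  # scale to the signed 16-bit range
    # "<" = little-endian, "h" = signed 16-bit integer
    return struct.pack("<%dh" % len(ints), *ints)
```

Each sample becomes exactly two bytes, so the output length is always even, which matches the sample_width of 2 that the start message declares.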
Performance
Try It
Click the mic to start recording; each time you finish an utterance, a new segment appears on the right in real time. Click the mic again or send close to trigger the final result. For file uploads, see Speaker Diarization · File.
Protocol & Response
Start with the message sequence and field definitions below, then switch to the language tabs at the bottom for copy-paste-ready snippets. Don't need real-time? See Speaker Diarization · File mode.
WebSocket handshake → send a start JSON message → stream PCM binary frames (200 ms per frame recommended) → receive a partial_result each time VAD closes an utterance → send close to receive the final_result and end the session.
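The client-side half of that flow comes down to two pieces of data: the start message and the PCM stream cut into frames. Here is a minimal sketch of both, with no network code; the function names are hypothetical, and the frame size simply follows from 16 kHz × 1 channel × 2 bytes × 200 ms.

```python
import json

SAMPLE_RATE = 16000   # Hz, the only accepted rate
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)
FRAME_MS = 200        # recommended frame duration


def build_start_message():
    """Return the JSON text sent right after the WebSocket handshake."""
    return json.dumps({
        "type": "start",
        "sample_rate": SAMPLE_RATE,
        "channels": CHANNELS,
        "sample_width": SAMPLE_WIDTH,
        "format": "pcm_s16le",
    })


def frame_size_bytes(frame_ms=FRAME_MS):
    """Number of PCM bytes in one frame of the given duration."""
    return SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH * frame_ms // 1000


def split_frames(pcm, frame_ms=FRAME_MS):
    """Split raw PCM bytes into frames; the last frame may be shorter."""
    if len(pcm) % SAMPLE_WIDTH != 0:
        raise ValueError("PCM byte count must be even for 16-bit samples")
    size = frame_size_bytes(frame_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]
```

A client would send build_start_message() as a text frame, wait for the ready message, send each element of split_frames(...) as a binary frame, and finish with {"type":"close"}.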
Request fields
token (URL query parameter): required when AUTH_ENABLED=True. Pass it as ?token=<token>. The browser WebSocket API cannot set custom handshake headers, so Authorization is not available.
sample_rate: start message field. Must be 16000; other values are rejected with close code 1008.
channels: start message field. Must be 1 (mono).
sample_width: start message field. Byte width; must be 2 (16-bit PCM).
format: start message field. Must be "pcm_s16le" (16-bit little-endian PCM).
Message sequence
start (client → server): {"type":"start","sample_rate":16000,"channels":1,"sample_width":2,"format":"pcm_s16le"}. Only 16 kHz · mono · s16le is accepted.
ready (server → client): {"type":"ready","session_id":"..."}. Start pushing audio once this arrives.
partial_result (server → client): segments is a full snapshot of the session so far, not a delta; clients should replace their list wholesale rather than append.
close (client → server): {"type":"close"}. Triggers VAD finalization on pending audio.
final_result (server → client): same as partial_result; the session's terminal state. The server then closes the connection.
error (server → client): {"type":"error","detail":"..."}. The server then closes with a close code (see §05).
Response sample
// Pushed once per completed utterance; segments is the full snapshot of the current session, not a delta
{
  "type": "partial_result",
  "session_id": "6261ae2e072449438ee3d06f8e822ba1",
  "data": {
    "segments": [
      { "start_time": 0.0, "end_time": 4.81, "speaker_id": 0, "text": "" }
    ]
  },
  "stats": { "utterances": 1, "chunks": 6, "pipeline_ms": 29.5 }
}
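Because segments is a snapshot, the client-side handler should overwrite its local list on every result. The following is a minimal sketch of such a handler; the SessionState class is hypothetical, and it assumes final_result carries the same data.segments shape as partial_result, as described in the message sequence.

```python
import json


class SessionState:
    """Tracks the latest snapshot pushed by the server for one session."""

    def __init__(self):
        self.session_id = None
        self.segments = []
        self.finished = False

    def handle(self, raw):
        """Apply one server text message and return its type."""
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "ready":
            self.session_id = msg.get("session_id")
        elif kind in ("partial_result", "final_result"):
            # Replace, never append: each message is the full snapshot.
            self.segments = list(msg["data"]["segments"])
            self.finished = kind == "final_result"
        elif kind == "error":
            raise RuntimeError(msg.get("detail", "unknown error"))
        return kind
```

Appending instead of replacing would duplicate every earlier segment each time a new utterance closes.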
Code snippets
Error Codes
On failure the server first sends { "type": "error", "detail": "..." }, then closes the connection with one of the WebSocket close codes below. Clients should not auto-retry on codes in the 4xxx range.
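The retry rule just stated can be expressed as a small client-side policy. This is a sketch under assumptions rather than the service's own logic: treating 1000 and 1008 as non-retryable, retrying every other code, and the exponential backoff values are all illustrative choices, and both function names are hypothetical.

```python
def should_retry(close_code):
    """Decide whether a client may reconnect automatically after a close."""
    if close_code == 1000:
        return False  # normal closure: the session ended as intended
    if close_code == 1008:
        return False  # assumption: a rejected format fails again unchanged
    if 4000 <= close_code <= 4999:
        return False  # documented rule: never auto-retry on 4xxx codes
    return True       # assumption: other closes are treated as transient


def backoff_delays(attempts, base=0.5, cap=8.0):
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

A reconnect loop would call should_retry with the close code it observed and, if allowed, sleep for the next value from backoff_delays before opening a new session.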
Invalid audio payload. The PCM byte count is odd, or the format does not match.
Invalid streaming message. JSON parsing failed, or the fields do not match StreamingStartRequest.
?token= is missing or does not match. Clear the token and sign in again.
… close; the server returned final_result and then closed normally.
… start.
AI Integration: Copy the Prompt
Paste the prompt below into Claude / Cursor / ChatGPT and let the AI write your integration code. The prompt encodes the API contract, auth, retries, and error handling, which leaves little room for mistakes.
Integrate fast with AI
Paste it into the chat and add "implement this in my stack". It already covers edge cases: large-file chunking advice, rate-limit backoff, missing auth, and the model-loading state.