Speaker Diarization

Upload a multi-speaker audio file and the service returns timestamped, speaker-labeled segments. Useful for meeting minutes, podcast transcription, call quality assurance, and interview editing: who spoke, and when, in a single request.

POST /diarize · Stable · v1.2.4 · Updated 2026-04-23
01

Limits & Constraints

Exceeding any limit returns 413 Payload Too Large or 422 Validation Error; see §05 Error Codes.

Limit | Value | Notes
File size | 500 MB | Hard cap; ≤ 200 MB recommended
Duration | 2 hours | ≤ 60 min recommended; longer audio needs more memory
Formats | 5 types | WAV · MP3 · FLAC · M4A · OGG

Note: audio is internally resampled to 16 kHz mono, so input sample rate, channel count, and bit depth are unrestricted. The model does not perform ASR; the text field is always an empty string.
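A client can reject out-of-range uploads before spending bandwidth on them. The sketch below is a hypothetical helper (`preflight` is not part of the service) that checks the documented format list and size limits. It assumes "MB" means MiB and that the format can be inferred from the file extension; duration is not checked because that requires decoding the audio.

```python
import os

# Assumption: "MB" in the limits table means MiB (1024 * 1024 bytes).
MB = 1024 * 1024
HARD_CAP_BYTES = 500 * MB      # above this the server answers 413
RECOMMENDED_BYTES = 200 * MB   # soft guideline only
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def preflight(filename: str, size_bytes: int) -> list[str]:
    """Raise ValueError on a hard-limit violation; return soft warnings otherwise.

    Duration (2 h cap, 60 min recommended) is not checked here because it
    requires decoding the audio.
    """
    ext = os.path.splitext(filename)[1].lower()  # assumption: extension reflects format
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format {ext!r}; expected one of {sorted(ALLOWED_EXTENSIONS)}")
    if size_bytes > HARD_CAP_BYTES:
        raise ValueError(f"{size_bytes} bytes exceeds the 500 MB hard cap")
    warnings = []
    if size_bytes > RECOMMENDED_BYTES:
        warnings.append("file is larger than the recommended 200 MB")
    return warnings
```

Typical use: `preflight(path, os.path.getsize(path))` before uploading.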

02

Performance

Chart: end-to-end latency, audio duration → processing time (P50 and P99 series).

P50 (median): half of the requests finish faster and half slower. This is the typical experience.
P99 (tail): only 1% of requests are slower than this. Use it to judge whether the worst case is still acceptable.
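To compare your own measurements against these two series, you can compute P50 and P99 from recorded processing times. This minimal sketch uses the nearest-rank percentile definition (one of several common definitions; the page does not say which one its chart uses), and the latency values are made up for illustration.

```python
import math


def nearest_rank_percentile(samples: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Made-up processing times (seconds) for ten requests with similar audio length.
latencies = [2.1, 2.3, 2.2, 2.8, 2.4, 9.7, 2.5, 2.2, 2.6, 2.3]
p50 = nearest_rank_percentile(latencies, 50)  # 2.3: the typical request
p99 = nearest_rank_percentile(latencies, 99)  # 9.7: the single slow outlier sets the tail
```

With only ten samples, any percentile above P90 is simply the maximum, so collect many more requests before drawing conclusions about the tail.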
03

Try It

Upload an audio file (or drag it into the dashed box). Once the result returns, click the timeline or the segment list to jump playback to that point.

04

Request & Response

Read the request fields and response shape below, then copy a ready-to-run snippet from Code snippets at the end of this section. Need diarization while people are still speaking? See Speaker Diarization · Stream.

POST /diarize

Upload a single audio file and get back a list of speaker segments (segments[]) sorted by start time. The request body is multipart/form-data. This endpoint does not perform ASR; text is always an empty string.

Request fields

Field | Type | Required | Description
file | file | required | Audio file in WAV, MP3, FLAC, M4A, or OGG format. Any sample rate or channel layout; internally resampled to 16 kHz mono.
Authorization | header | optional | Required only when the server runs with AUTH_ENABLED=True. Format: Bearer <token>; obtain the token from POST /auth/login.

Response

200 OK · application/json · TranscriptionResponse
// segments[] is sorted by start_time ascending; text is always "" (this endpoint does not perform ASR)
{
  "segments": [
    {
      "start_time": 0.00,
      "end_time":   4.71,
      "speaker_id": 0,
      "text": ""
    },
    {
      "start_time": 5.21,
      "end_time":  11.22,
      "speaker_id": 1,
      "text": ""
    }
  ]
}

Response fields

Field | Type | Description
segments | array | Speaker-labeled segments, sorted by start time ascending.
segments[].start_time | float | Segment start in seconds, measured from time 0 of the audio.
segments[].end_time | float | Segment end in seconds. The segment's duration is end_time - start_time.
segments[].speaker_id | int | Speaker ID starting at 0; the same ID means the same speaker. Stable within a single request, not aligned across requests.
segments[].text | string | Always an empty string, since this endpoint does not perform ASR. Combine it with a separate ASR service if you need text.
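Because speaker_id is only stable within one response, per-speaker statistics have to be computed per request. A minimal sketch (the helper name is hypothetical) that sums end_time - start_time for each speaker, applied to the sample response above; the sort-order check is a defensive choice, not something the service requires.

```python
import json
from collections import defaultdict


def talk_time_by_speaker(response_text: str) -> dict[int, float]:
    """Sum segment durations (end_time - start_time) per speaker_id, in seconds."""
    segments = json.loads(response_text)["segments"]
    starts = [seg["start_time"] for seg in segments]
    if starts != sorted(starts):
        raise ValueError("segments[] is expected to be sorted by start_time")
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker_id"]] += seg["end_time"] - seg["start_time"]
    # Round for display; the IDs are only meaningful inside this one response.
    return {speaker: round(seconds, 2) for speaker, seconds in sorted(totals.items())}


sample = """{"segments": [
  {"start_time": 0.00, "end_time": 4.71, "speaker_id": 0, "text": ""},
  {"start_time": 5.21, "end_time": 11.22, "speaker_id": 1, "text": ""}
]}"""
print(talk_time_by_speaker(sample))  # {0: 4.71, 1: 6.01}
```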

Code snippets
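A minimal file-mode call using only the Python standard library. `build_multipart` and `diarize` are hypothetical helper names, the base URL is an assumption, and the 600 s timeout is a guess you should tune against the performance chart. The whole file is read into memory, which is fine at the recommended ≤ 200 MB but wasteful near the 500 MB cap. Non-2xx responses raise `urllib.error.HTTPError`, whose body carries the JSON `detail` message.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Optional


def build_multipart(path: str, data: bytes, field: str = "file") -> tuple[bytes, str]:
    """Encode a single file part as multipart/form-data; return (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    name = os.path.basename(path)
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def diarize(base_url: str, path: str, token: Optional[str] = None, timeout: float = 600.0) -> list[dict]:
    """POST one audio file to /diarize and return segments[] (sorted by start_time)."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(path, f.read())
    headers = {"Content-Type": content_type}
    if token:  # only needed when the server runs with AUTH_ENABLED=True
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(
        base_url.rstrip("/") + "/diarize", data=body, headers=headers, method="POST"
    )
    # Raises urllib.error.HTTPError on 4xx/5xx; the JSON error body carries "detail".
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["segments"]
```

Usage, with a hypothetical host: `diarize("http://localhost:8000", "meeting.wav")`, adding `token=...` when auth is enabled.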

05

Error Codes

A successful call returns 200 with segments[]. Errors are returned as application/json in the form { "detail": "..." }.

Status | Code | Meaning
400 | BAD_REQUEST | Missing file field or malformed multipart body.
401 | UNAUTHORIZED | Auth is enabled and the Authorization header is missing or the token does not match.
413 | PAYLOAD_TOO_LARGE | File exceeds the limits in §01.
422 | VALIDATION_ERROR | Field validation failed, usually a type or format error.
500 | INTERNAL_ERROR | Inference error. A retry usually recovers; contact ops if it persists.
503 | MODEL_NOT_LOADED | The model is still loading. Probe GET /readiness to check.
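The table implies a simple retry policy: 400, 401, 413, and 422 mean the request itself must change, while 500 usually recovers on retry and 503 clears once the model finishes loading. A minimal sketch with exponential backoff; `send` is a hypothetical callable you supply that performs one POST /diarize and returns the status code and parsed JSON body, and the attempt count and delays are assumptions rather than service recommendations.

```python
import time
from typing import Callable

# 500: transient inference error; 503: model still loading. Everything else
# in the error table is a client-side problem that a retry will not fix.
RETRYABLE = {500, 503}


def call_with_retry(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send` until it returns 200 or a non-retryable status.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE or attempt == max_attempts:
            raise RuntimeError(f"HTTP {status}: {body.get('detail', '')}")
        sleep(base_delay * 2 ** (attempt - 1))
```

For 503 you could instead poll GET /readiness before retrying; the page does not document that endpoint's response format, so the sketch simply backs off.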
06

AI Integration: Copy the Prompt

Paste the prompt below into Claude, Cursor, or ChatGPT and let the AI write your integration code. The prompt includes the endpoint contract, auth, retries, and error handling, so there is little room for mistakes.

AI-READY PROMPT
Covers: contract · auth · retry · code skeleton

Integrate fast with AI

Paste it into the chat and add "implement this in my stack". Edge cases already covered: advice on chunking large files, rate-limit backoff, missing auth, and the model-loading state.