Speaker Diarization

Upload a multi-speaker audio file and the service returns timestamped, speaker-labeled segments. It suits meeting minutes, podcast transcription, call QA, and interview editing: a single request tells you who spoke and when.

POST /diarize · Stable · v1.2.4 · Updated 2026-04-23

[Illustration: a 3:04 WAV file sent to POST /diarize; timeline ticks at 0:00, 0:46, 1:32, 2:18, 3:04; segments labeled Speaker A, Speaker B, Speaker C]

Limits & Constraints

Exceeding any limit returns 413 Payload Too Large or 422 Validation Error; see §05 Error Codes.

| Limit | Value | Notes |
| --- | --- | --- |
| File size | 500 MB | Hard cap; ≤ 200 MB recommended |
| Duration | 2 hours | ≤ 60 min recommended; longer clips need more memory |
| Formats | 5 types | WAV · MP3 · FLAC · M4A · OGG |

Note: audio is internally resampled to 16 kHz mono, so the input sample rate, channel count, and bit depth are unrestricted. The model does not perform ASR; the text field in the response is always an empty string.
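The limits above can be checked on the client before uploading, which avoids sending a large file only to get a 413 or 422 back. The following is a minimal sketch, not part of the service: the function name `preflight` is hypothetical, and it assumes "MB" means 2**20 bytes, which the page does not specify. Duration is not checked here because that requires decoding the audio.

```python
from pathlib import Path

# Assumption: 1 MB = 2**20 bytes; the page does not say which definition it uses.
MAX_BYTES = 500 * 1024 * 1024          # hard cap
RECOMMENDED_BYTES = 200 * 1024 * 1024  # recommended upper bound
ALLOWED_SUFFIXES = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def preflight(name: str, size_bytes: int) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a file about to be uploaded.

    Errors mean the upload would likely be rejected; warnings are advisory.
    """
    errors: list[str] = []
    warnings: list[str] = []

    suffix = Path(name).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        errors.append(f"unsupported format: {suffix or '(no extension)'}")

    if size_bytes > MAX_BYTES:
        errors.append("file exceeds the 500 MB hard cap")
    elif size_bytes > RECOMMENDED_BYTES:
        warnings.append("file is larger than the recommended 200 MB")

    return errors, warnings
```

A caller would typically pass `path.name` and `path.stat().st_size`, stop on any error, and log the warnings.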

Performance

[Chart: End-to-end latency · audio duration → processing time; series P50 and P99]

P50 · median: half of the requests are faster and half are slower. This is the typical experience.

P99 · tail: only 1% of requests are slower than this. Use it to judge whether the worst case is still acceptable.
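To reproduce these two figures from your own measurements, a nearest-rank percentile is enough. This is a sketch under the assumption that you have collected per-request processing times in seconds; the sample values below are made up for illustration, and the page does not say which percentile method its own chart uses.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Made-up processing times: 1.0 s, 2.0 s, ..., 100.0 s
latencies = [float(i) for i in range(1, 101)]
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
```

With these samples, exactly one value lies above `p99`, which matches the "only 1% of requests are slower" reading.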

Try It

Upload an audio file (or drag it into the dashed box). Once the result comes back, click the timeline or the list to jump to that point in playback.

POST /diarize · File mode · multipart/form-data · field: file. Need real-time? See the Stream page.


Request & Response

Read the request fields and response shape below first, then go to the language tabs at the bottom for copy-paste-ready snippets. Need diarization as you speak? See Speaker Diarization · Stream.

POST /diarize

Upload a single audio file and get back a list of speaker segments (segments[]), sorted by start time ascending. The request body is multipart/form-data. This endpoint does not perform ASR; the text field is always an empty string.

Request fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | | required | Audio file. Accepts wav / mp3 / flac / m4a / ogg. Any sample rate or channel layout is accepted; audio is internally resampled to 16 kHz mono. |
| Authorization | header | optional | Required only when the server has AUTH_ENABLED=True. Format: Bearer <token>. Obtain the token via POST /auth/login. |
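As an illustration of the two fields above, the sketch below builds the request with only the Python standard library. Several things here are assumptions rather than part of the documented contract: `BASE_URL` is a placeholder, `build_diarize_request` is a hypothetical helper, and the part's `Content-Type` of `application/octet-stream` is a guess at what the server tolerates. The filename is inserted as-is, so it should not contain double quotes.

```python
import uuid
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: replace with your deployment's address


def build_diarize_request(filename: str, audio: bytes,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /diarize request carrying the audio in the multipart field "file"."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",  # assumed part type
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if token is not None:
        # Only needed when the server runs with AUTH_ENABLED=True.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{BASE_URL}/diarize", data=body,
                                  headers=headers, method="POST")
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and decoding the JSON body. Building the request separately from sending it keeps the multipart logic easy to test without a running server.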

Response

200 OK · application/json · TranscriptionResponse

// segments[] is sorted by start_time ascending; text is always an empty string (this endpoint does not perform ASR)
{
  "segments": [
    {
      "start_time": 0.00,
      "end_time":   4.71,
      "speaker_id": 0,
      "text": ""
    },
    {
      "start_time": 5.21,
      "end_time":  11.22,
      "speaker_id": 1,
      "text": ""
    }
  ]
}

Response fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| segments | array | n/a | Speaker-labeled segments, sorted by start time ascending. |
| segments[].start_time | float | n/a | Segment start time in seconds, measured from the start of the audio. |
| segments[].end_time | float | n/a | Segment end time in seconds. Segment duration is end_time - start_time. |
| segments[].speaker_id | int | n/a | Speaker ID, starting at 0; the same ID means the same speaker. IDs are stable within a single request but are not aligned across requests. |
| segments[].text | string | n/a | Always empty, because this endpoint does diarization only. Pass the audio to an ASR service if you need text. |
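A common first use of these fields is totaling how long each speaker talked. The sketch below parses a response body shaped like the example above and sums end_time - start_time per speaker_id. The `Segment` dataclass and both helper names are illustrative, not types shipped by the service.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start_time: float
    end_time: float
    speaker_id: int
    text: str = ""

    @property
    def duration(self) -> float:
        # Duration as defined in the field table: end_time - start_time.
        return self.end_time - self.start_time


def parse_segments(payload: str) -> list[Segment]:
    """Parse a /diarize JSON body into Segment objects, ignoring unknown keys."""
    data = json.loads(payload)
    return [
        Segment(
            start_time=float(item["start_time"]),
            end_time=float(item["end_time"]),
            speaker_id=int(item["speaker_id"]),
            text=item.get("text", ""),
        )
        for item in data["segments"]
    ]


def talk_time(segments: list[Segment]) -> dict[int, float]:
    """Total speaking time in seconds per speaker_id."""
    totals: dict[int, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker_id] += seg.duration
    return dict(totals)


sample = (
    '{"segments": ['
    '{"start_time": 0.0, "end_time": 4.71, "speaker_id": 0, "text": ""},'
    '{"start_time": 5.21, "end_time": 11.22, "speaker_id": 1, "text": ""}]}'
)
segments = parse_segments(sample)
totals = talk_time(segments)
```

Because speaker IDs are not aligned across requests, totals like these should only be compared within a single response.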

Code snippets

Error Codes

A successful response returns 200 with segments[]. Error responses are application/json in the form { "detail": "..." }.

| Status | Code | Description |
| --- | --- | --- |
| 400 | BAD_REQUEST | The file field is missing or the multipart body is malformed. |
| 401 | UNAUTHORIZED | Auth is enabled but the Authorization header is missing or the token does not match. |
| 413 | PAYLOAD_TOO_LARGE | The file exceeds the limits in §01. |
| 422 | VALIDATION_ERROR | Field validation failed, usually because of a type or format issue. |
| 500 | INTERNAL_ERROR | Inference error. A retry usually recovers; contact ops if it persists. |
| 503 | MODEL_NOT_LOADED | The model is still loading. Use GET /readiness to probe. |
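The table suggests a simple client policy: retry 500 and 503, and fail immediately on the 4xx codes, since resending the same request will not fix them. The sketch below shows one way to do that with exponential backoff. The attempt count and delays are arbitrary choices, `DiarizeError` and `call_with_retry` are hypothetical names, and `send` stands in for whatever function performs the HTTP call and returns a `(status, json_body)` pair.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {500, 503}  # INTERNAL_ERROR and MODEL_NOT_LOADED


class DiarizeError(Exception):
    """Raised for a non-retryable status or when retries are exhausted."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def call_with_retry(send: Callable[[], Tuple[int, dict]],
                    max_attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call send() until it returns 200, retrying only 500 and 503."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE or attempt == max_attempts:
            # Error bodies have the form {"detail": "..."}.
            raise DiarizeError(status, str(body.get("detail", "")))
        sleep(base_delay * 2 ** (attempt - 1))  # waits 1x, 2x, 4x, ... base_delay
    raise RuntimeError("unreachable")
```

Injecting `sleep` keeps the function testable without real delays. For 503 specifically, polling GET /readiness before retrying may be a better fit than a blind wait.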

AI Integration: Copy the Prompt

Paste the prompt below into Claude / Cursor / ChatGPT and let the AI write your integration code. The prompt includes the interface contract, auth, retries, and error handling, which leaves the generated code little room to go wrong.


Integrate fast with AI

Paste it into the chat and add "implement in my stack". The prompt already covers edge cases: chunking advice for large files, rate-limit backoff, missing auth, and the model-loading state.