Message Types
1. Initial Message (Client to Server)
Purpose
Establish the transcription session with initial parameters.
Sent
Once per session, immediately after the WebSocket connection is established.
Format
JSON
Fields
multilingual
: Indicates whether multilingual transcription is enabled (boolean)
language
: Language code for transcription (string, e.g., "en" for English)
task
: Task to perform, e.g., "transcribe" or "translate" (string)
auth
: Authentication token or API key (string, optional)
Example request
{
  "multilingual": false,
  "language": "en",
  "task": "transcribe",
  "auth": "your-auth-token"
}
Example response
{
  "uid": "2c0b8c68-c708-4d44-9c0e-01b7871458a7",
  "message": "SERVER_READY"
}
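The handshake above could be sketched in Python roughly as follows. This is a minimal illustration, not the reference client: the helper names `build_initial_message` and `parse_server_ready` are hypothetical, and treating any reply other than "SERVER_READY" as an error is an assumption, since only that reply is shown here. The sketch covers message construction and validation only; sending and receiving over the socket is left to whichever WebSocket library is in use.

```python
import json
import uuid
from typing import Optional


def build_initial_message(language: str = "en", task: str = "transcribe",
                          multilingual: bool = False,
                          auth: Optional[str] = None) -> str:
    """Serialize the session-opening message as a JSON string (hypothetical helper)."""
    payload = {"multilingual": multilingual, "language": language, "task": task}
    if auth is not None:  # auth is documented as optional, so omit it when unset
        payload["auth"] = auth
    return json.dumps(payload)


def parse_server_ready(raw: str) -> str:
    """Return the session uid if the reply is SERVER_READY, else raise ValueError."""
    reply = json.loads(raw)
    # Assumption: anything other than SERVER_READY is treated as a failure.
    if reply.get("message") != "SERVER_READY":
        raise ValueError(f"unexpected reply: {reply!r}")
    uuid.UUID(reply["uid"])  # raises ValueError if uid is not a well-formed UUID
    return reply["uid"]
```

A client would send the string returned by `build_initial_message(...)` as the first text frame, then pass the first text frame it receives to `parse_server_ready` before streaming any audio.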
2. Audio Data (Client to Server)
Purpose
Stream audio data for transcription.
Sent
Continuously after the initial message, as audio is captured.
Format
Binary
Content
Raw audio samples as a Float32Array (32-bit floats), resampled to 16 kHz.
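For a non-browser client, preparing such a frame might look like the sketch below. Several things here are assumptions rather than documented behavior: the helper names `resample_linear` and `to_float32_frame` are hypothetical, the byte order is taken to be little-endian (what a Float32Array produces on common platforms), and the resampler is a naive linear interpolation without an anti-aliasing filter, so a production client would likely use a proper resampling library instead.

```python
import struct
from typing import List, Sequence

TARGET_RATE = 16_000  # sample rate named in this section


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate       # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def to_float32_frame(samples: Sequence[float]) -> bytes:
    """Pack samples as 32-bit floats, 4 bytes each (little-endian is assumed)."""
    return struct.pack(f"<{len(samples)}f", *samples)
```

Each captured chunk would be passed through `resample_linear` and then `to_float32_frame`, with the resulting bytes sent as one binary WebSocket frame.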
3. Transcription Results (Server to Client)
Purpose
Receive transcribed text from the server.
Format
JSON
Fields
uid
: Version 4 UUID
segments
: An array of segments, each containing the following fields:
  start
  : Start time (float)
  end
  : End time (float)
  text
  : Transcribed text (string)
Example response
{
  "uid": "2c0b8c68-c708-4d44-9c0e-01b7871458a7",
  "segments": [
    {"start": 0, "end": 1.2, "text": "Hello, how are you?"},
    {"start": 1.5, "end": 3, "text": "I'm good, thank you for asking."}
  ]
}
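A client might decode such a message along these lines. This is a sketch under stated assumptions: the `Segment` class and `parse_transcription` function are hypothetical names, and the timestamps are coerced with `float()` because the example above shows whole numbers such as `0` and `3` arriving as JSON integers.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    start: float
    end: float
    text: str


def parse_transcription(raw: str) -> Tuple[str, List[Segment]]:
    """Decode one server message into (uid, segments)."""
    data = json.loads(raw)
    segments = [
        # float() normalizes integer timestamps like 0 or 3 to floats
        Segment(float(s["start"]), float(s["end"]), s["text"])
        for s in data.get("segments", [])
    ]
    return data["uid"], segments
```

Joining the `text` fields of the returned segments with spaces yields the running transcript for display.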