Message Types

1. Initial Message (Client to Server)

Purpose

Establish the transcription session with initial parameters.

Sent

Once per session, immediately after the WebSocket connection is established.

Format

JSON

Fields

multilingual: Whether multilingual transcription is enabled (boolean)
language: Language code for transcription (string, e.g., "en" for English)
task: Task to perform, e.g., "transcribe" or "translate" (string)
auth: Authentication token or API key (string, optional)

Example request

{
    "multilingual": false,
    "language": "en",
    "task": "transcribe",
    "auth": "your-auth-token"
}

Example response

{
    "uid": "2c0b8c68-c708-4d44-9c0e-01b7871458a7",
    "message": "SERVER_READY"
}
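
As a rough illustration of the exchange above, the following Python sketch builds the initial message and checks the server's reply. The helper names build_initial_message and parse_server_ready are hypothetical, and treating any reply other than "SERVER_READY" as an error is an assumption, since no other reply values are listed here.

```python
import json
from typing import Optional


def build_initial_message(language: str = "en",
                          task: str = "transcribe",
                          multilingual: bool = False,
                          auth: Optional[str] = None) -> str:
    """Serialize the session-opening message as a JSON string."""
    payload = {
        "multilingual": multilingual,
        "language": language,
        "task": task,
    }
    # "auth" is documented as optional, so it is only included when provided.
    if auth is not None:
        payload["auth"] = auth
    return json.dumps(payload)


def parse_server_ready(raw: str) -> str:
    """Return the uid from the reply if the server reported SERVER_READY."""
    reply = json.loads(raw)
    # Assumption: anything other than SERVER_READY is treated as a failure.
    if reply.get("message") != "SERVER_READY":
        raise ValueError(f"unexpected reply: {reply!r}")
    return reply["uid"]
```

The JSON string returned by build_initial_message would be sent as the first text frame on the connection; the WebSocket client used to send it is left out of the sketch.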

2. Audio Data (Client to Server)

Purpose

Stream audio data for transcription.

Sent

Continuously after the initial message, as audio is captured.

Format

Binary

Content

Audio samples as 32-bit floats (a Float32Array in JavaScript), resampled to 16 kHz.
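
One way a non-browser client might prepare such a payload is sketched below in Python, using only the standard library. Several points are assumptions rather than documented behavior: that samples are normalized to the range -1.0 to 1.0, that the bytes are little-endian (as a Float32Array is on common platforms), and that naive linear interpolation is acceptable for resampling. The function names are hypothetical, and a production client would likely use a proper resampler with low-pass filtering.

```python
import struct
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = 16000) -> List[float]:
    """Resample by linear interpolation (naive: no anti-aliasing filter)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def pcm16_to_float32_bytes(samples: Sequence[int]) -> bytes:
    """Convert signed 16-bit PCM samples to packed float32 bytes."""
    # Assumption: the server expects floats normalized to [-1.0, 1.0).
    floats = [s / 32768.0 for s in samples]
    # "<" forces little-endian byte order (assumption, see lead-in).
    return struct.pack(f"<{len(floats)}f", *floats)
```

Each chunk of captured audio would be resampled if needed, packed, and sent as one binary frame.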

3. Transcription Results (Server to Client)

Purpose

Receive transcribed text from the server.

Format

JSON

Fields

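uid: Version 4 UUID (string)
segments: Array of segment objects, each containing a start time, an end time, and the transcribed text:
- start: Start time of the segment (float)
- end: End time of the segment (float)
- text: Transcribed text of the segment (string)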

Example response

{
    "uid": "2c0b8c68-c708-4d44-9c0e-01b7871458a7",
    "segments": [
        {"start": 0, "end": 1.2, "text": "Hello, how are you?"},
        {"start": 1.5, "end": 3, "text": "I'm good, thank you for asking."}
    ]
}
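
A client could decode a message of this shape as in the following Python sketch. The Segment class and parse_transcription function are hypothetical names introduced for illustration; only the field names come from the list above.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float
    end: float
    text: str


def parse_transcription(raw: str) -> Tuple[str, List[Segment]]:
    """Decode a transcription message into its uid and a list of segments."""
    msg = json.loads(raw)
    segments = [
        # JSON may carry whole numbers as ints (e.g. 0 or 3), so coerce to float.
        Segment(float(s["start"]), float(s["end"]), s["text"])
        for s in msg["segments"]
    ]
    return msg["uid"], segments
```

For the example response above, this returns the uid together with two Segment objects, the first spanning 0.0 to 1.2.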
