Message Types

1. Initial Message (Client to Server)

Purpose

Establish the transcription session with initial parameters.

Sent

Once per session, immediately after the WebSocket connection is established.

Format

JSON

Fields

  • multilingual: Whether multilingual transcription is enabled (boolean)
  • language: Language code for transcription (string, e.g., "en" for English)
  • task: Task to perform, e.g., "transcribe" or "translate" (string)
  • auth: Authentication token or API key (string, optional)

Example request

{
  "multilingual": false,
  "language": "en",
  "task": "transcribe",
  "auth": "your-auth-token"
}

Example response

{
  "uid": "2c0b8c68-c708-4d44-9c0e-01b7871458a7",
  "message": "SERVER_READY"
}
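
The handshake above could be sketched in Python as follows. This is a minimal illustration, not the reference client: the helper names build_initial_message and is_server_ready are hypothetical, and the actual WebSocket send/receive calls are left out because they depend on the client library in use.

```python
import json
from typing import Optional


def build_initial_message(language: str = "en",
                          task: str = "transcribe",
                          multilingual: bool = False,
                          auth: Optional[str] = None) -> str:
    """Serialize the session parameters as the JSON text of the first frame."""
    payload = {
        "multilingual": multilingual,
        "language": language,
        "task": task,
    }
    if auth is not None:
        # auth is documented as optional, so the key is omitted when no token is given
        payload["auth"] = auth
    return json.dumps(payload)


def is_server_ready(raw: str) -> bool:
    """Return True if a server reply has the shape of the SERVER_READY example."""
    reply = json.loads(raw)
    return reply.get("message") == "SERVER_READY" and "uid" in reply
```

A client would send the string returned by build_initial_message as a text frame right after connecting, then wait for a reply that passes is_server_ready before streaming audio.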

2. Audio Data (Client to Server)

Purpose

Stream audio data for transcription.

Sent

Continuously after the initial message, as audio is captured.

Format

Binary

Content

Raw audio samples as a Float32Array (32-bit floats), resampled to 16 kHz.
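
For a non-browser client, preparing one binary frame might look like the sketch below. It rests on assumptions the text above does not confirm: little-endian byte order (what a Float32Array produces on common platforms) and a naive linear-interpolation resampler with no anti-aliasing filter. The names resample_linear and to_float32_bytes are hypothetical; a production client would likely use a proper audio resampling library.

```python
import struct
from typing import List, Sequence

TARGET_RATE = 16_000  # Hz, the sample rate stated above


def resample_linear(samples: Sequence[float], source_rate: int,
                    target_rate: int = TARGET_RATE) -> List[float]:
    """Resample by linear interpolation (illustrative only, no low-pass filter)."""
    if source_rate == target_rate or len(samples) == 0:
        return list(samples)
    out_len = int(len(samples) * target_rate / source_rate)
    step = source_rate / target_rate  # input samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def to_float32_bytes(samples: Sequence[float]) -> bytes:
    """Pack samples as 32-bit floats; little-endian order is an assumption."""
    return struct.pack(f"<{len(samples)}f", *samples)
```

Each captured chunk would pass through resample_linear and then to_float32_bytes, and the resulting bytes would be sent as one binary WebSocket frame.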

3. Transcription Results (Server to Client)

Purpose

Receive transcribed text from the server.

Format

JSON

Fields

  • uid: Version 4 UUID (string)
  • segments: Array of segment objects, each with a start time, end time, and transcribed text:
    • start: float
    • end: float
    • text: string

Example response

{
  "uid": "2c0b8c68-c708-4d44-9c0e-01b7871458a7",
  "segments": [
    {"start": 0, "end": 1.2, "text": "Hello, how are you?"},
    {"start": 1.5, "end": 3, "text": "I'm good, thank you for asking."}
  ]
}
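
A client could decode such a message along these lines. The Segment class and parse_transcription function are hypothetical names chosen for illustration; the sketch assumes every segment carries all three fields and treats a missing segments array as empty.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float
    end: float
    text: str


def parse_transcription(raw: str) -> Tuple[str, List[Segment]]:
    """Decode a transcription message into its uid and a list of segments."""
    message = json.loads(raw)
    segments = [
        # JSON integers such as 0 or 3 are normalized to float
        Segment(float(s["start"]), float(s["end"]), s["text"])
        for s in message.get("segments", [])
    ]
    return message["uid"], segments
```

Joining the text of the returned segments in order yields the running transcript for the session identified by uid.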