
Protocol

Secure WebSocket (WSS)

Message Types

  1. Initial Message (Client to Server)
  2. Audio Data (Client to Server)
  3. Transcription Results (Server to Client)

For detailed message structures, see Message Types.

Audio Data Requirements

The server expects audio data to adhere to specific standards to ensure proper processing and transcription. Here's what you need to know:

Audio Format: S16LE

  • Bit Depth: 16 bits per sample.
  • Signed: Samples are signed integers, so they can represent both positive and negative values.
  • Endian: Little endian, where the least significant byte is stored first.
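To make the S16LE layout concrete, here is a minimal sketch using only Python's standard `struct` module. The helper names `float_to_s16le` and `s16le_to_ints` are hypothetical and are not part of the client library; the scaling factor of 32767 is one common convention for mapping floats in [-1.0, 1.0] to 16-bit integers, not something the server mandates.

```python
import struct


def float_to_s16le(samples):
    """Pack floats in [-1.0, 1.0] as 16-bit signed little-endian bytes."""
    ints = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # avoid overflow outside the int16 range
        ints.append(int(round(clipped * 32767)))
    # "<" selects little-endian byte order, "h" a signed 16-bit integer
    return struct.pack(f"<{len(ints)}h", *ints)


def s16le_to_ints(data):
    """Unpack S16LE bytes back into a list of Python integers."""
    count = len(data) // 2  # two bytes per sample
    return list(struct.unpack(f"<{count}h", data[: count * 2]))
```

Each sample occupies two bytes with the low byte first, so the integer 1 is stored as the bytes `01 00`.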

Channel Configuration:

  • The audio data must be mono (single channel). Stereo or multi-channel audio can introduce unnecessary complexity and data redundancy, so converting to mono ensures uniformity.
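If you already hold raw stereo PCM in memory, a downmix can be sketched as follows. This assumes the input is interleaved S16LE (left sample, right sample, left, right, ...), which is a common layout but not one this page specifies, and the function name `stereo_to_mono_s16le` is hypothetical. Averaging the two channels is one simple downmix strategy among several.

```python
import struct


def stereo_to_mono_s16le(data: bytes) -> bytes:
    """Average interleaved stereo S16LE frames into mono S16LE."""
    frame_count = len(data) // 4  # 2 channels x 2 bytes per sample
    samples = struct.unpack(f"<{frame_count * 2}h", data[: frame_count * 4])
    # Average each left/right pair; the result always fits in 16 bits
    mono = [
        (samples[i] + samples[i + 1]) // 2
        for i in range(0, len(samples), 2)
    ]
    return struct.pack(f"<{frame_count}h", *mono)
```

Any trailing bytes that do not form a complete stereo frame are dropped.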

Example Code for Resampling Audio

To help you meet these requirements, here's an example of how you can resample an audio file in Python. It uses the ffmpeg-python bindings (which require the FFmpeg executable to be installed), NumPy, and SciPy:

import os

import ffmpeg  # provided by the ffmpeg-python package
import numpy as np
import scipy.io.wavfile


def resample_audio(input_file: str, new_sample_rate: int = 16000) -> str:
    """
    Open an audio file, read it as mono waveform, resample if needed,
    and save the modified audio file.
    """
    try:
        # Use ffmpeg to decode the input to raw mono S16LE PCM at the target rate
        output, _ = (
            ffmpeg.input(input_file, threads=0)
            .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=new_sample_rate)
            .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
        )
    except ffmpeg.Error as e:
        raise RuntimeError(f"Error loading audio: {e.stderr.decode()}") from e

    # Interpret the raw bytes as 16-bit signed samples
    np_audio_buffer = np.frombuffer(output, dtype=np.int16)

    # Build the output name from the path without its extension, so that
    # directories or file names containing extra dots are handled correctly
    base_name, _ = os.path.splitext(input_file)
    modified_audio_file = f"{base_name}_modified.wav"
    scipy.io.wavfile.write(modified_audio_file, new_sample_rate, np_audio_buffer.astype(np.int16))
    return modified_audio_file

This function reads an audio file, converts it to mono, resamples it to the specified sample rate (default is 16000 Hz), and saves the result as a 16-bit PCM WAV file. The samples inside that file are in S16LE format, ready to be sent to the server in chunks.
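Because a WAV file starts with a header, it can be useful to check the file and pull out only the raw samples before streaming. The sketch below uses Python's standard `wave` module; the function name `read_pcm_frames` is hypothetical, and the default of 16000 Hz simply mirrors the default used on this page.

```python
import wave


def read_pcm_frames(wav_file, expected_rate: int = 16000) -> bytes:
    """Return the raw PCM frames of a WAV file after checking its format.

    wav_file may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("audio must be mono")
        if wav.getsampwidth() != 2:
            raise ValueError("audio must use 16-bit samples")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"audio must be sampled at {expected_rate} Hz")
        # readframes returns the sample data only, without the WAV header
        return wav.readframes(wav.getnframes())
```

A file that fails any of the checks raises `ValueError` instead of being passed on with the wrong format.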

Chunk Size Configuration:

import pyaudio


class Client:
    ...
    def __init__(self, server_host=None, server_port=None, api_key=None, multilingual=False, language=None, translate=False):
        """
        Initialize a Client instance for recording and streaming audio.
        """
        self.audio_chunk_size = 1024  # Chunk size in bytes
        self.audio_format = pyaudio.paInt16  # 16-bit signed integer PCM
        self.audio_channels = 1  # Mono
        self.audio_rate = 16000  # Sample rate in Hz
        ...

The chunk size is set to 1024 bytes, and the audio format is pyaudio.paInt16, which corresponds to 16-bit signed integer PCM format. The audio_channels is set to 1 for mono audio, and the audio_rate is set to 16000 Hz (16 kHz).
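To get a feel for these numbers, the sketch below computes how much audio one chunk holds and splits a PCM buffer into chunks of that size. Both helper names are hypothetical, and the calculation assumes the chunk size really is a byte count, as the code comment states. If the same value were passed to PyAudio as `frames_per_buffer`, which is counted in frames, each buffer would hold twice as many bytes for 16-bit mono audio.

```python
def chunk_duration_ms(chunk_bytes: int = 1024, sample_rate: int = 16000,
                      channels: int = 1, bytes_per_sample: int = 2) -> float:
    """Return the playback duration of one chunk in milliseconds."""
    frames = chunk_bytes // (channels * bytes_per_sample)
    return 1000 * frames / sample_rate


def iter_chunks(pcm: bytes, chunk_size: int = 1024):
    """Yield consecutive slices of pcm; the last one may be shorter."""
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]
```

With the settings above, 1024 bytes correspond to 512 samples, which is 32 ms of audio at 16 kHz.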