Pass your API key as a Bearer token in the Authorization header:
Authorization: Bearer YOUR_API_KEY
headers
type:object
model
type:enum
TTS model to use for this session
Available options: s1
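As a minimal sketch, the two handshake headers described above could be assembled as follows. The helper name `build_handshake_headers` and the `ALLOWED_MODELS` set are hypothetical; only the `Authorization` and `model` header names and the `s1` option come from this reference.

```python
# Hypothetical helper: builds the headers sent with the WebSocket handshake.
ALLOWED_MODELS = {"s1"}  # the only option listed in this reference


def build_handshake_headers(api_key: str, model: str = "s1") -> dict:
    """Return the Authorization and model headers for a TTS session."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    return {"Authorization": f"Bearer {api_key}", "model": model}


headers = build_handshake_headers("YOUR_API_KEY")
```

Validating the model value on the client is a design choice here, not a requirement stated by the reference; it simply surfaces a typo before the connection is attempted.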
Start TTS Session
type:object
Initiates a TTS streaming session with configuration.
This must be the first message sent after connecting. It contains all the
configuration for voice, audio format, and generation parameters.
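The ordering rule above can be enforced on the client side. The sketch below is illustrative only: the `event` and `request` keys, the `format` option, and the `Outbox` class are assumptions, not part of this reference. Only `chunk_length` is named in the text, and the value 200 is a placeholder. Messages are shown as plain dictionaries; the wire encoding is left out.

```python
from typing import Any


def start_event(chunk_length: int, **config: Any) -> dict:
    """Build a StartEvent-style payload (key names are assumptions)."""
    return {"event": "start", "request": {"chunk_length": chunk_length, **config}}


class Outbox:
    """Queues outgoing messages and rejects anything queued before the start message."""

    def __init__(self) -> None:
        self.messages: list = []

    def push(self, message: dict) -> None:
        # The first queued message must be the start message.
        if not self.messages and message.get("event") != "start":
            raise RuntimeError("the start message must be sent first")
        self.messages.append(message)


outbox = Outbox()
outbox.push(start_event(chunk_length=200, format="mp3"))  # illustrative values
outbox.push({"event": "text", "text": "Hello"})
```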
Send Text Chunk
type:object
Sends a chunk of text for synthesis.
You can send multiple TextEvent messages in sequence. The server will buffer
and synthesize text according to the chunk_length parameter from StartEvent.
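Sending text in sequence might look like the sketch below, which turns a stream of text pieces (for example, tokens from a language model) into one TextEvent-style message each. The `event` and `text` keys and the `text_events` helper are assumptions made for illustration.

```python
from typing import Iterable, Iterator


def text_events(pieces: Iterable[str]) -> Iterator[dict]:
    """Yield one text message per non-empty piece, preserving order."""
    for piece in pieces:
        if piece:  # skip empty strings so no empty messages are sent
            yield {"event": "text", "text": piece}


messages = list(text_events(["Hello, ", "", "world."]))
```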
Flush Buffered Text
type:object
Forces immediate synthesis of all buffered text.
Use this when you want audio generated immediately, without waiting for more
text or for the buffer to fill up. This helps keep latency low in interactive
applications.
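One possible flushing policy for an interactive application is to flush whenever a piece of text ends a sentence. This is a sketch of that policy, not a recommendation from the reference; the `event` and `text` keys and the `with_flush_on_sentence_end` helper are hypothetical.

```python
from typing import Iterable, Iterator

SENTENCE_ENDINGS = (".", "!", "?")


def with_flush_on_sentence_end(pieces: Iterable[str]) -> Iterator[dict]:
    """Yield text messages, adding a flush message after each sentence-ending piece."""
    for piece in pieces:
        yield {"event": "text", "text": piece}
        if piece.rstrip().endswith(SENTENCE_ENDINGS):
            yield {"event": "flush"}


messages = list(with_flush_on_sentence_end(["Hi", " there.", " More"]))
```

Flushing more often trades some buffering for earlier audio, so the right trigger depends on the application.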
End TTS Session
type:object
Signals the end of the text stream.
After sending this event, the server will finish synthesizing any remaining
buffered text and send a FinishEvent before closing the connection.
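Taken together, the client-to-server messages form a fixed order: start, any number of text messages, then stop. A minimal sketch of that order follows, assuming hypothetical `event`, `request`, and `text` keys and a hypothetical `outgoing_messages` helper.

```python
from typing import Iterable, Iterator


def outgoing_messages(start: dict, pieces: Iterable[str]) -> Iterator[dict]:
    """Yield the start message, one text message per piece, then the stop message."""
    yield start
    for piece in pieces:
        yield {"event": "text", "text": piece}
    # After this message the client only reads; it sends nothing further.
    yield {"event": "stop"}


start = {"event": "start", "request": {"chunk_length": 200}}  # illustrative value
sequence = list(outgoing_messages(start, ["Hello, ", "world."]))
```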
Audio Chunk
type:object
Contains generated audio bytes.
You will receive multiple AudioEvent messages as audio is generated. Each
message contains a chunk of audio in the format you specified. Concatenate
all chunks to get the complete audio.
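Concatenating the chunks can be sketched as below. The `event` and `audio` keys and the `collect_audio` helper are assumptions; the reference only states that each AudioEvent carries audio bytes.

```python
from typing import Iterable


def collect_audio(events: Iterable[dict]) -> bytes:
    """Join the audio bytes from all audio messages, in arrival order."""
    chunks = [event["audio"] for event in events if event.get("event") == "audio"]
    return b"".join(chunks)


received = [
    {"event": "audio", "audio": b"\x01\x02"},
    {"event": "audio", "audio": b"\x03"},
    {"event": "finish", "reason": "stop"},
]
audio = collect_audio(received)
```

For playback while the session is still running, each chunk could instead be handed to the audio output as it arrives rather than collected first.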
Session Complete
type:object
Signals that the TTS session has completed.
If reason='stop', synthesis completed successfully.
If reason='error', an error occurred; the client should handle it gracefully.
The WebSocket connection will close after this event.
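A client might handle the two outcomes as in this sketch. The `reason` values come from this reference; the `event` key, the `SynthesisError` class, and the `drain_until_finish` helper are hypothetical, and treating a missing finish message as a connection error is a design choice made here.

```python
from typing import Iterable


class SynthesisError(Exception):
    """Raised when the session finishes with reason='error'."""


def drain_until_finish(events: Iterable[dict]) -> str:
    """Consume messages until the finish message and return its reason."""
    for message in events:
        if message.get("event") != "finish":
            continue  # audio messages are handled elsewhere
        if message.get("reason") == "error":
            raise SynthesisError("the server reported an error")
        return message.get("reason", "")
    # The iterator ended without a finish message.
    raise ConnectionError("connection closed before the finish message")


reason = drain_until_finish([
    {"event": "audio", "audio": b"\x00"},
    {"event": "finish", "reason": "stop"},
])
```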