# Fish Audio Text-to-Speech

<Note>
  For best results, first upload reference audio with the [Voice Cloning](/docs/models/reference-fish-audio-voice-cloning) API, then call this API. Doing so improves voice quality and reduces latency.
</Note>

Fish Audio converts text to speech.

Supported audio formats:

* WAV / PCM
  * Sample rates: 8kHz, 16kHz, 24kHz, 32kHz, 44.1kHz
  * Default sample rate: 44.1kHz
  * 16-bit, mono

* MP3
  * Sample rates: 32kHz, 44.1kHz
  * Default sample rate: 44.1kHz
  * Mono
  * Bitrates: 64kbps, 128kbps (default), 192kbps

* Opus
  * Sample rate: 48kHz (default and only supported rate)
  * Mono
  * Bitrates: -1000 (auto), 24kbps, 32kbps (default), 48kbps, 64kbps
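The format constraints above can be encoded as a lookup table for client-side validation before a request is sent. This is a minimal sketch: `SUPPORTED` and `validate_output` are hypothetical names, not part of the API, and the sketch assumes the `sample_rate` parameter is expressed in Hz (for example, `44100` for 44.1kHz).

```python
from typing import Optional

# Supported output settings, transcribed from the list above (hypothetical client-side table).
# Sample rates are assumed to be in Hz; bitrates use the values accepted by
# `mp3_bitrate` / `opus_bitrate` (kbps, with -1000 meaning "auto" for Opus).
SUPPORTED = {
    "wav": {"sample_rates": {8000, 16000, 24000, 32000, 44100}, "bitrates": None},
    "pcm": {"sample_rates": {8000, 16000, 24000, 32000, 44100}, "bitrates": None},
    "mp3": {"sample_rates": {32000, 44100}, "bitrates": {64, 128, 192}},
    "opus": {"sample_rates": {48000}, "bitrates": {-1000, 24, 32, 48, 64}},
}


def validate_output(fmt: str, sample_rate: Optional[int] = None, bitrate: Optional[int] = None) -> None:
    """Raise ValueError if the combination is not listed as supported."""
    spec = SUPPORTED.get(fmt)
    if spec is None:
        raise ValueError(f"unsupported format: {fmt!r}")
    if sample_rate is not None and sample_rate not in spec["sample_rates"]:
        raise ValueError(f"{fmt} does not support sample rate {sample_rate} Hz")
    if bitrate is not None:
        # WAV/PCM have no bitrate parameter in this API.
        if spec["bitrates"] is None:
            raise ValueError(f"{fmt} has no bitrate setting")
        if bitrate not in spec["bitrates"]:
            raise ValueError(f"{fmt} does not support bitrate {bitrate}")


validate_output("mp3", 44100, 192)    # valid: no exception
validate_output("opus", 48000, -1000)  # valid: no exception
```

The server remains the authority on what it accepts; a local check like this only catches obvious mistakes earlier.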

## Request Headers

<ParamField header="Content-Type" type="string" required={true}>
  Enum: `application/json`
</ParamField>

<ParamField header="Authorization" type="string" required={true}>
  Bearer authentication format: Bearer \{\{API Key}}.
</ParamField>
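A request carrying both required headers can be assembled with Python's standard library. This is a sketch only: the endpoint URL is not given on this page, so it is left as a caller-supplied `url`, and `build_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_request(url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with the two required headers."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Bearer authentication: "Bearer " followed by the API key.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

To actually send the request, pass it to `urllib.request.urlopen(req)`. Note that `urllib` normalizes stored header names, so `Content-Type` is read back with `req.get_header("Content-type")`.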

## Request Body

<ParamField body="text" type="string" required={true}>
  The text to be converted to speech.
</ParamField>

<ParamField body="temperature" type="number" default={0.9}>
  Controls the randomness of speech generation. Higher values (e.g., 1.0) make the output more random, lower values (e.g., 0.1) make it more deterministic. We recommend `0.9` for the `s1` model.

  Required range: `0 <= x <= 1`
</ParamField>

<ParamField body="top_p" type="number" default={0.9}>
  Controls diversity through nucleus sampling. Lower values (e.g., 0.1) make the output more focused, higher values (e.g., 1.0) allow more diversity. We recommend `0.9` for the `s1` model.

  Required range: `0 <= x <= 1`
</ParamField>
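Both sampling parameters share the same `0 <= x <= 1` range and the same recommended value of `0.9` for the `s1` model. A small hypothetical helper (not part of the API) can enforce the range before the values are placed in the request body:

```python
def sampling_params(temperature: float = 0.9, top_p: float = 0.9) -> dict:
    """Return the sampling fields for the request body, enforcing the documented 0..1 ranges."""
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return {"temperature": temperature, "top_p": top_p}


body = {"text": "Hello from Fish Audio.", **sampling_params(temperature=0.7)}
```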

<ParamField body="references" type="ReferenceAudio · object[] | null">
  Reference audio for the voice. This field requires MessagePack serialization; when set, it overrides `reference_voices` and `reference_texts`.

  <Expandable title="properties">
    <ParamField body="audio" type="file" required={true}>
      Reference audio file.
    </ParamField>

    <ParamField body="text" type="string" required={true}>
      Reference text corresponding to the audio.
    </ParamField>
  </Expandable>
</ParamField>
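The `references` field pairs raw audio bytes with their transcripts. JSON has no binary type, which is why this field needs MessagePack. The sketch below only builds the payload structure; `build_reference_payload` is a hypothetical helper, and serializing with the third-party `msgpack` package (`msgpack.packb(payload)`) is an assumption about client tooling, not something this page specifies.

```python
def build_reference_payload(text: str, refs: list[tuple[bytes, str]]) -> dict:
    """Build a request body with inline reference audio (each item: audio bytes + transcript).

    The result contains raw bytes, so it cannot be sent as JSON; it must be
    MessagePack-encoded (e.g. with the third-party `msgpack` package, assumed here).
    """
    return {
        "text": text,
        "references": [{"audio": audio, "text": ref_text} for audio, ref_text in refs],
    }
```

In practice the audio bytes would come from a file, for example `Path("sample.wav").read_bytes()` with a hypothetical file name.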

<ParamField body="reference_id" type="string | null">
  Reference model ID for the voice.
</ParamField>

<ParamField body="prosody" type="ProsodyControl · object">
  Prosody control for the voice.

  <Expandable title="properties">
    <ParamField body="speed" type="number" default={1}>
      Voice speed control.
    </ParamField>

    <ParamField body="volume" type="number" default={0}>
      Voice volume control.
    </ParamField>
  </Expandable>
</ParamField>

<ParamField body="chunk_length" type="integer" default={200}>
  Length of the chunks the input text is split into during generation.

  Required range: `100 <= x <= 300`
</ParamField>

<ParamField body="normalize" type="boolean" default={true}>
  Whether to apply normalization. Enabling it reduces latency but may degrade the handling of numbers and dates.
</ParamField>

<ParamField body="format" type="enum<string>" default="mp3">
  Output audio format.

  Possible values: `wav`, `pcm`, `mp3`, `opus`
</ParamField>

<ParamField body="sample_rate" type="integer | null">
  Output sample rate. If omitted or `null`, the default sample rate for the selected `format` is used.
</ParamField>

<ParamField body="mp3_bitrate" type="enum<integer>" default={128}>
  MP3 bitrate in kbps, used when `format` is `mp3`.

  Possible values: `64`, `128`, `192`
</ParamField>

<ParamField body="opus_bitrate" type="enum<integer>" default={32}>
  Opus bitrate in kbps, used when `format` is `opus`. `-1000` selects the bitrate automatically.

  Possible values: `-1000`, `24`, `32`, `48`, `64`
</ParamField>

<ParamField body="latency" type="enum<string>" default="normal">
  Latency mode. `balanced` reduces latency but may lower output quality.

  Possible values: `normal`, `balanced`
</ParamField>
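The optional body fields above can be assembled into one JSON-serializable request body. This is a minimal sketch under stated assumptions: `build_tts_body` is a hypothetical helper, it applies the documented defaults, and it sends only the bitrate field matching the chosen `format` (whether the server ignores a mismatched bitrate field is not specified here).

```python
from typing import Optional


def build_tts_body(
    text: str,
    *,
    reference_id: Optional[str] = None,
    speed: float = 1.0,
    volume: float = 0.0,
    chunk_length: int = 200,
    normalize: bool = True,
    fmt: str = "mp3",
    sample_rate: Optional[int] = None,
    mp3_bitrate: int = 128,
    opus_bitrate: int = 32,
    latency: str = "normal",
) -> dict:
    """Assemble a text-to-speech request body using the documented defaults."""
    if not 100 <= chunk_length <= 300:
        raise ValueError("chunk_length must be in [100, 300]")
    if fmt not in ("wav", "pcm", "mp3", "opus"):
        raise ValueError(f"unsupported format: {fmt!r}")
    if latency not in ("normal", "balanced"):
        raise ValueError(f"unsupported latency: {latency!r}")
    body = {
        "text": text,
        "prosody": {"speed": speed, "volume": volume},
        "chunk_length": chunk_length,
        "normalize": normalize,
        "format": fmt,
        "latency": latency,
    }
    # Nullable fields are omitted entirely when not set.
    if reference_id is not None:
        body["reference_id"] = reference_id
    if sample_rate is not None:
        body["sample_rate"] = sample_rate
    # Only include the bitrate field that matches the chosen format.
    if fmt == "mp3":
        body["mp3_bitrate"] = mp3_bitrate
    elif fmt == "opus":
        body["opus_bitrate"] = opus_bitrate
    return body
```

For example, `build_tts_body("Hello!", reference_id="my-voice-id", fmt="opus", opus_bitrate=-1000)` requests automatically chosen Opus bitrate output for a voice model with the hypothetical ID `my-voice-id`.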

## Response

The API returns an audio stream in the format specified by the `format` parameter (default: `mp3`).
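Because the response is a stream, a client can write it to disk in fixed-size chunks instead of buffering the whole file in memory. This sketch assumes any binary file-like object with a `read(size)` method, such as the response returned by `urllib.request.urlopen`; `save_audio_stream` is a hypothetical helper name.

```python
from typing import BinaryIO


def save_audio_stream(stream: BinaryIO, path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy an audio byte stream to `path` chunk by chunk and return the number of bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # empty bytes means the stream is exhausted
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

Choose the file extension to match the requested `format`, for example `speech.mp3` for the default.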
