
# Voice Cloning

> Clone voices and create custom voice models with the Fish Audio Python SDK


## Prerequisites

<AccordionGroup>
  <Accordion icon="user-plus" title="Create a Fish Audio account">
    Sign up for a free Fish Audio account to get started with our API.

    1. Go to [fish.audio/auth/signup](https://fish.audio/auth/signup)
    2. Fill in your details to create an account, then complete the steps to verify it.
    3. Log in to your account and navigate to the [API section](https://fish.audio/app/api-keys)
  </Accordion>

  <Accordion icon="key" title="Get your API key">
    Once you have an account, you'll need an API key to authenticate your requests.

    1. Log in to your [Fish Audio Dashboard](https://fish.audio/app/api-keys/)
    2. Navigate to the API Keys section
    3. Click "Create New Key", give it a descriptive name, and set an expiration if desired
    4. Copy your key and store it securely

    <Warning>Keep your API key secret! Never commit it to version control or share it publicly.</Warning>
  </Accordion>
</AccordionGroup>
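
One way to act on the warning above is to keep the key in an environment variable instead of in source code. The helper below is a minimal sketch that uses only the standard library. The variable name `FISH_API_KEY` and the function `load_api_key` are assumptions made for illustration, so check the SDK reference for the name your client version actually reads.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "FISH_API_KEY",  # assumed variable name, not confirmed by this guide
) -> str:
    """Return the API key from the environment, or raise if it is missing."""
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running")
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error returned later by the API.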

## Instant Voice Cloning

Clone a voice on the fly, without creating a persistent model, by passing a [`ReferenceAudio`](/api-reference/sdk/python/types#referenceaudio-objects) to `tts.convert()`:

<CodeGroup>
  ```python Synchronous focus={6-15} theme={null}
  from fishaudio import FishAudio
  from fishaudio.types import ReferenceAudio
  from fishaudio.utils import play

  client = FishAudio()

  # Clone from reference audio
  with open("reference_voice.wav", "rb") as f:
      audio = client.tts.convert(
          text="This will sound like the reference voice",
          references=[ReferenceAudio(
              audio=f.read(),
              text="Text spoken in the reference audio"
          )]
      )
  play(audio)
  ```

  ```python Asynchronous focus={8-17} theme={null}
  import asyncio
  from fishaudio import AsyncFishAudio
  from fishaudio.types import ReferenceAudio
  from fishaudio.utils import play

  async def main():
      client = AsyncFishAudio()

      # Clone from reference audio
      with open("reference_voice.wav", "rb") as f:
          audio = await client.tts.convert(
              text="This will sound like the reference voice",
              references=[ReferenceAudio(
                  audio=f.read(),
                  text="Text spoken in the reference audio"
              )]
          )
      play(audio)

  asyncio.run(main())
  ```
</CodeGroup>

<Note>
  Instant voice cloning works well for one-off requests. If you plan to reuse the same voice, create a persistent voice model instead.
</Note>
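
Before sending a reference clip, it can help to confirm that the file is a readable WAV of a sensible length. The sketch below uses only the standard `wave` module. The helper names are hypothetical, and the length bounds are left to the caller because this guide does not state a recommended clip duration.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Return the playback length of in-memory WAV data, in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_length(data: bytes, min_seconds: float, max_seconds: float) -> bool:
    """True if the clip length falls inside the caller-chosen bounds."""
    return min_seconds <= wav_duration_seconds(data) <= max_seconds


def make_silence(seconds: int, rate: int = 8000) -> bytes:
    """Build a silent 16-bit mono WAV clip, used here only as a stand-in sample."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    return buffer.getvalue()
```

In practice you would pass the bytes read from `reference_voice.wav` to `check_reference_length()`; `make_silence()` exists only so the sketch can run without an audio file.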

## Multiple Reference Samples

Improve voice quality by providing multiple reference samples:

<CodeGroup>
  ```python Synchronous focus={6-21} theme={null}
  from fishaudio import FishAudio
  from fishaudio.types import ReferenceAudio
  from fishaudio.utils import play

  client = FishAudio()

  # Load multiple reference samples
  references = []
  samples = [
      ("sample1.wav", "First sample transcript"),
      ("sample2.wav", "Second sample transcript"),
      ("sample3.wav", "Third sample transcript")
  ]

  for audio_file, transcript in samples:
      with open(audio_file, "rb") as f:
          references.append(ReferenceAudio(
              audio=f.read(),
              text=transcript
          ))

  # Generate with multiple references
  audio = client.tts.convert(
      text="This voice is trained on multiple samples",
      references=references
  )
  play(audio)
  ```

  ```python Asynchronous focus={8-23} theme={null}
  import asyncio
  from fishaudio import AsyncFishAudio
  from fishaudio.types import ReferenceAudio
  from fishaudio.utils import play

  async def main():
      client = AsyncFishAudio()

      # Load multiple reference samples
      references = []
      samples = [
          ("sample1.wav", "First sample transcript"),
          ("sample2.wav", "Second sample transcript"),
          ("sample3.wav", "Third sample transcript")
      ]

      for audio_file, transcript in samples:
          with open(audio_file, "rb") as f:
              references.append(ReferenceAudio(
                  audio=f.read(),
                  text=transcript
              ))

      # Generate with multiple references
      audio = await client.tts.convert(
          text="This voice is trained on multiple samples",
          references=references
      )
      play(audio)

  asyncio.run(main())
  ```
</CodeGroup>
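
When the number of samples grows, hard-coding file names and transcripts becomes tedious. The sketch below assumes a hypothetical folder convention in which each `name.wav` has a matching `name.txt` transcript; neither the convention nor `load_reference_pairs` is part of the SDK.

```python
from pathlib import Path
from typing import List, Tuple


def load_reference_pairs(folder: str) -> List[Tuple[bytes, str]]:
    """Collect (audio bytes, transcript) pairs from name.wav / name.txt files."""
    pairs = []
    for wav_path in sorted(Path(folder).glob("*.wav")):
        txt_path = wav_path.with_suffix(".txt")
        if not txt_path.exists():
            raise FileNotFoundError(f"Missing transcript for {wav_path.name}")
        transcript = txt_path.read_text(encoding="utf-8").strip()
        pairs.append((wav_path.read_bytes(), transcript))
    return pairs
```

Each returned pair can then be wrapped as `ReferenceAudio(audio=audio, text=text)`, as in the examples above.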

## Creating Persistent Voice Models

Create a reusable voice model for consistent voice characteristics using [`voices.create()`](/api-reference/sdk/python/resources#create):

<CodeGroup>
  ```python Synchronous focus={5-20} theme={null}
  from fishaudio import FishAudio

  client = FishAudio()

  # Prepare voice samples
  voice_samples = []
  with open("voice1.wav", "rb") as f1:
      voice_samples.append(f1.read())
  with open("voice2.wav", "rb") as f2:
      voice_samples.append(f2.read())

  # Create voice model
  voice = client.voices.create(
      title="My Custom Voice",
      voices=voice_samples,
      description="A custom voice for my project",
      tags=["custom", "english"],
      visibility="private"
  )

  print(f"Created voice: {voice.id}")
  ```

  ```python Asynchronous focus={7-22} theme={null}
  import asyncio
  from fishaudio import AsyncFishAudio

  async def main():
      client = AsyncFishAudio()

      # Prepare voice samples
      voice_samples = []
      with open("voice1.wav", "rb") as f1:
          voice_samples.append(f1.read())
      with open("voice2.wav", "rb") as f2:
          voice_samples.append(f2.read())

      # Create voice model
      voice = await client.voices.create(
          title="My Custom Voice",
          voices=voice_samples,
          description="A custom voice for my project",
          tags=["custom", "english"],
          visibility="private"
      )

      print(f"Created voice: {voice.id}")

  asyncio.run(main())
  ```
</CodeGroup>

### With Transcripts

Supplying a transcript for each sample lets the system skip automatic speech recognition (ASR), which is faster and avoids transcription errors. Pass the transcripts through `texts`, in the same order as the audio in `voices`:

<CodeGroup>
  ```python Synchronous focus={5-27} theme={null}
  from fishaudio import FishAudio

  client = FishAudio()

  # Voice samples with transcripts
  samples = [
      ("voice1.wav", "This is the first sample"),
      ("voice2.wav", "This is the second sample"),
      ("voice3.wav", "This is the third sample")
  ]

  voices = []
  texts = []

  for audio_file, transcript in samples:
      with open(audio_file, "rb") as f:
          voices.append(f.read())
      texts.append(transcript)

  # Create voice with transcripts
  voice = client.voices.create(
      title="High Quality Voice",
      voices=voices,
      texts=texts,
      description="Voice with accurate transcripts",
      enhance_audio_quality=True
  )

  print(f"Created voice: {voice.id}")
  ```

  ```python Asynchronous focus={7-29} theme={null}
  import asyncio
  from fishaudio import AsyncFishAudio

  async def main():
      client = AsyncFishAudio()

      # Voice samples with transcripts
      samples = [
          ("voice1.wav", "This is the first sample"),
          ("voice2.wav", "This is the second sample"),
          ("voice3.wav", "This is the third sample")
      ]

      voices = []
      texts = []

      for audio_file, transcript in samples:
          with open(audio_file, "rb") as f:
              voices.append(f.read())
          texts.append(transcript)

      # Create voice with transcripts
      voice = await client.voices.create(
          title="High Quality Voice",
          voices=voices,
          texts=texts,
          description="Voice with accurate transcripts",
          enhance_audio_quality=True
      )

      print(f"Created voice: {voice.id}")

  asyncio.run(main())
  ```
</CodeGroup>
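
Because `voices` and `texts` are parallel lists, a missing or empty transcript can silently misalign them. The helper below is a minimal sketch of a pre-flight check; `split_samples` is a hypothetical name, and the assumption that the two lists are matched by position is inferred from the example above rather than stated by the API reference.

```python
from typing import Dict, List, Sequence, Tuple


def split_samples(samples: Sequence[Tuple[bytes, str]]) -> Dict[str, List]:
    """Turn (audio, transcript) pairs into parallel `voices` and `texts` lists."""
    if not samples:
        raise ValueError("At least one sample is required")
    voices, texts = [], []
    for index, (audio, transcript) in enumerate(samples):
        if not audio:
            raise ValueError(f"Sample {index} has no audio data")
        if not transcript.strip():
            raise ValueError(f"Sample {index} has an empty transcript")
        voices.append(audio)
        texts.append(transcript.strip())
    return {"voices": voices, "texts": texts}
```

The result can be unpacked straight into the call, for example `client.voices.create(title="High Quality Voice", **split_samples(pairs))`.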

### Audio Quality Enhancement

Enable automatic audio enhancement to clean up noisy reference audio:

```python theme={null}
# `client` and `voice_samples` are assumed to be defined as in the earlier examples
voice = client.voices.create(
    title="Enhanced Voice",
    voices=voice_samples,
    enhance_audio_quality=True  # Clean up background noise and normalize levels
)
```

<Note>
  Audio enhancement helps process noisy or lower-quality reference audio. If your audio is already clean and well-recorded, this may not provide additional benefit.
</Note>

## Managing Voice Models

### List Voices

List available voices with [`voices.list()`](/api-reference/sdk/python/resources#list). The response exposes the total count and the items on the current page:

<CodeGroup>
  ```python Synchronous focus={5-11} theme={null}
  from fishaudio import FishAudio

  client = FishAudio()

  # List all voices
  voices = client.voices.list(page_size=20)
  print(f"Total voices: {voices.total}")

  for voice in voices.items:
      print(f"{voice.title}: {voice.id}")
  ```

  ```python Asynchronous focus={7-13} theme={null}
  import asyncio
  from fishaudio import AsyncFishAudio

  async def main():
      client = AsyncFishAudio()

      # List all voices
      voices = await client.voices.list(page_size=20)
      print(f"Total voices: {voices.total}")

      for voice in voices.items:
          print(f"{voice.title}: {voice.id}")

  asyncio.run(main())
  ```
</CodeGroup>
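
Since a single call returns at most `page_size` items, collecting every voice means requesting page after page. The sketch below shows the loop against a fake backend. `Page`, `iter_all_items`, and `fake_fetch` are hypothetical names, and the idea that the next page is selected by a page number is an assumption, since this guide only shows `page_size`.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Page:
    """Stand-in for a list response exposing `total` and `items`."""
    total: int
    items: List[str]


def iter_all_items(fetch_page: Callable[[int], Page]) -> Iterator[str]:
    """Yield items page by page until `total` items have been seen."""
    seen = 0
    page_number = 1
    while True:
        page = fetch_page(page_number)
        if not page.items:
            return  # stop on an empty page to avoid looping forever
        for item in page.items:
            yield item
        seen += len(page.items)
        if seen >= page.total:
            return
        page_number += 1


def fake_fetch(page_number: int, page_size: int = 2) -> Page:
    """Fake backend with five voices, used only to exercise the loop."""
    data = ["v1", "v2", "v3", "v4", "v5"]
    start = (page_number - 1) * page_size
    return Page(total=len(data), items=data[start:start + page_size])
```

To use it with the real client, replace `fake_fetch` with a small function that calls `client.voices.list()` with whatever paging argument your SDK version documents.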

### Filter by Tags and Language

Narrow the results with the `tags`, `language`, and `self_only` parameters:

<CodeGroup>
  ```python Synchronous focus={5-21} theme={null}
  from fishaudio import FishAudio

  client = FishAudio()

  # Filter by tags
  male_voices = client.voices.list(
      tags=["male", "english"],
      page_size=10
  )

  # Filter by language
  chinese_voices = client.voices.list(
      language="zh",
      page_size=10
  )

  # Get only your own voices
  my_voices = client.voices.list(
      self_only=True,
      page_size=20
  )
  ```

  ```python Asynchronous focus={7-23} theme={null}
  import asyncio
  from fishaudio import AsyncFishAudio

  async def main():
      client = AsyncFishAudio()

      # Filter by tags
      male_voices = await client.voices.list(
          tags=["male", "english"],
          page_size=10
      )

      # Filter by language
      chinese_voices = await client.voices.list(
          language="zh",
          page_size=10
      )

      # Get only your own voices
      my_voices = await client.voices.list(
          self_only=True,
          page_size=20
      )

  asyncio.run(main())
  ```
</CodeGroup>

### Get Voice Details

Use [`voices.get()`](/api-reference/sdk/python/resources#get) to retrieve voice details:

<CodeGroup>
  ```python Synchronous focus={5-11} theme={null}
  from fishaudio import FishAudio

  client = FishAudio()

  # Get specific voice
  voice = client.voices.get("bf322df2096a46f18c579d0baa36f41d")  # Adrian

  print(f"Title: {voice.title}")
  print(f"Description: {voice.description}")
  print(f"Tags: {voice.tags}")
  print(f"Languages: {voice.languages}")
  ```

  ```python Asynchronous focus={7-13} theme={null}
  import asyncio
  from fishaudio import AsyncFishAudio

  async def main():
      client = AsyncFishAudio()

      # Get specific voice
      voice = await client.voices.get("bf322df2096a46f18c579d0baa36f41d")  # Adrian

      print(f"Title: {voice.title}")
      print(f"Description: {voice.description}")
      print(f"Tags: {voice.tags}")
      print(f"Languages: {voice.languages}")

  asyncio.run(main())
  ```
</CodeGroup>

### Update Voice Metadata

Update voice information using [`voices.update()`](/api-reference/sdk/python/resources#update):

<CodeGroup>
  ```python Synchronous focus={5-11} theme={null}
  from fishaudio import FishAudio

  client = FishAudio()

  # Update voice information
  client.voices.update(
      "bf322df2096a46f18c579d0baa36f41d",  # Adrian
      title="Updated Voice Name",
      description="Updated description",
      visibility="public",  # "public", "unlist", or "private"
      tags=["updated", "english", "male"]
  )
  ```

  ```python Asynchronous focus={7-13} theme={null}
  import asyncio
  from fishaudio import AsyncFishAudio

  async def main():
      client = AsyncFishAudio()

      # Update voice information
      await client.voices.update(
          "bf322df2096a46f18c579d0baa36f41d",  # Adrian
          title="Updated Voice Name",
          description="Updated description",
          visibility="public",  # "public", "unlist", or "private"
          tags=["updated", "english", "male"]
      )

  asyncio.run(main())
  ```
</CodeGroup>
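
The `visibility` field accepts a small set of string values, so a typo such as `"unlisted"` is easy to make. The helper below is a minimal sketch that checks the value locally before an update is sent. `normalize_visibility` is a hypothetical name, and the allowed values are taken from the comment in the example above.

```python
ALLOWED_VISIBILITY = ("public", "unlist", "private")  # values listed in the example above


def normalize_visibility(value: str) -> str:
    """Lowercase and validate a visibility value before sending an update."""
    cleaned = value.strip().lower()
    if cleaned not in ALLOWED_VISIBILITY:
        allowed = ", ".join(ALLOWED_VISIBILITY)
        raise ValueError(f"visibility must be one of: {allowed}")
    return cleaned
```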

### Delete Voice

Remove voice models using [`voices.delete()`](/api-reference/sdk/python/resources#delete):

<CodeGroup>
  ```python Synchronous focus={5-7} theme={null}
  from fishaudio import FishAudio

  client = FishAudio()

  # Delete a voice model
  client.voices.delete("bf322df2096a46f18c579d0baa36f41d")  # Adrian
  print("Voice deleted successfully")
  ```

  ```python Asynchronous focus={7-9} theme={null}
  import asyncio
  from fishaudio import AsyncFishAudio

  async def main():
      client = AsyncFishAudio()

      # Delete a voice model
      await client.voices.delete("bf322df2096a46f18c579d0baa36f41d")  # Adrian
      print("Voice deleted successfully")

  asyncio.run(main())
  ```
</CodeGroup>

<Warning>
  Deleting a voice is permanent and cannot be undone. Make sure you have backups of any important voice models.
</Warning>
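
Because deletion cannot be undone, scripts may benefit from an explicit confirmation step. The sketch below only calls the delete function when the caller has re-typed the exact voice ID. `delete_if_confirmed` is a hypothetical helper, and the delete function is injected so the guard can be exercised without touching a real account.

```python
from typing import Callable


def delete_if_confirmed(
    voice_id: str,
    typed_confirmation: str,
    delete_fn: Callable[[str], None],
) -> bool:
    """Call delete_fn only when the caller re-typed the exact voice ID."""
    if typed_confirmation != voice_id:
        return False
    delete_fn(voice_id)
    return True
```

With the synchronous client, `client.voices.delete` can be passed as `delete_fn`.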

## Next Steps

<CardGroup cols={2}>
  <Card title="Text-to-Speech" icon="microphone" href="/developer-guide/sdk-guide/python/text-to-speech">
    Use cloned voices for speech generation
  </Card>

  <Card title="WebSocket Streaming" icon="bolt" href="/developer-guide/sdk-guide/python/websocket">
    Stream audio with custom voices in real-time
  </Card>

  <Card title="Voices API Reference" icon="book" href="/api-reference/sdk/python/resources#voices">
    Complete voice management API documentation
  </Card>

  <Card title="Best Practices" icon="lightbulb" href="/developer-guide/best-practices/">
    Production tips and optimization strategies
  </Card>
</CardGroup>

## Related Resources

* [Voice Types Reference](/api-reference/sdk/python/types#voices) - Voice model data structures
* [Audio Formats Guide](/developer-guide/core-features/text-to-speech#audio-formats) - Supported audio formats
* [Fine-grained Control](/developer-guide/core-features/fine-grained-control) - Advanced voice customization
