Voicebox

Open-source voice cloning with local TTS & dictation tools

Voicebox Overview

Voicebox is an open-source AI voice studio for desktop users who want privacy and local control. A free alternative to cloud-based services such as ElevenLabs, it lets you clone voices, generate high-quality speech, and dictate text directly into any application. Because it runs entirely on your machine using local GPU inference, your data never leaves your hardware. It is aimed at creators, developers, and writers who need professional-grade audio tools.

Voicebox Key Features

  • Near-Perfect Voice Cloning: Create a digital twin of any voice using just three seconds of audio from a file upload, microphone recording, or system audio capture.
  • Multi-Engine TTS & Stories Editor: Choose from seven TTS engines and use a timeline-based editor to arrange tracks, trim clips, and mix multi-voice narratives.
  • Local Dictation & Refinement: Hold a global shortcut to dictate into any app. Whisper-powered transcription captures the text, and a local LLM then polishes it by removing filler words and errors.
  • MCP Agent Integration: Give a voice to AI agents like Claude Code or Cursor using the Model Context Protocol (MCP), allowing them to speak to you in cloned voices.

Whether you are producing a podcast, developing accessible software, or building custom AI agents, Voicebox provides a robust, no-subscription solution for modern vocal synthesis.
