I am looking for some easy to install text to speech software for Ubuntu that sounds natural. I've installed Festival, Gespeaker, etc., but nothing sounds very natural. All very synthetic and hard to understand.
Any recommendations out there?
sudo apt install libttspico-utils
A very minimalistic TTS that sounds better than espeak or mbrola (to my mind). Some information here.
I don't understand why pico2wave is, compared to espeak or mbrola, rarely discussed. It's small, but sounds really good (natural). Without modification you'll hear a natural sounding female voice.
AND ... compared to Mbrola, it recognises units and speaks them the right way!
For example:
After installation I use it in a script:
#!/bin/bash
pico2wave -w=/tmp/test.wav "$1"
aplay /tmp/test.wav
rm /tmp/test.wav
Then run it with the desired text:
<scriptname>.sh "hello world"
or read the contents of an entire file:
<scriptname>.sh "$(cat <filename>)"
That's all to have a lightweight, stable working TTS on Ubuntu.
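If you want the script to also accept piped input and not leave a fixed /tmp/test.wav behind, here is a minimal sketch. The function names collect_text and speak are my own, not part of pico2wave, and mktemp --suffix assumes GNU coreutils as shipped on Ubuntu.

```shell
# Sketch: print the text to speak, taken from the arguments or, if none, from stdin.
collect_text() {
    if [ "$#" -gt 0 ]; then
        printf '%s' "$*"
    else
        cat
    fi
}

# Hypothetical wrapper: synthesize to a private temp file, play it, then clean up.
speak() (
    text=$(collect_text "$@")
    wav=$(mktemp --suffix=.wav) || exit 1   # the output name must end in .wav
    pico2wave -w "$wav" "$text" && aplay -q "$wav"
    status=$?
    rm -f "$wav"
    exit "$status"
)
```

Source it from your shell and call speak "hello world", or feed it a file with speak < notes.txt.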
Pico and espeak are fun and easy to get to work, but they're not all that good. The default Festival voices are also not that good. However, Festival is a scheme-based speech framework, where a number of researchers have built much better plug-in voices. You can easily surpass the pico2wave quality on stock Ubuntu, because one of those voices is available as a ready-made package.
To make Festival sound natural, here's what to do:
sudo apt-get install festival
sudo apt-get install festvox-us-slt-hts
festival -i
festival> (voice_cmu_us_slt_arctic_hts)
festival> (SayText "Don't hate me, I'm just doing my job!")
You can do it from the command line by using -b (or --batch) and putting each command into single quotes:
festival -b '(voice_cmu_us_slt_arctic_hts)' \
'(SayText "The temperature is 22 degrees centigrade and there is a slight breeze from the west.")'
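The single-quote approach gets awkward once the text itself contains quotes or apostrophes. A small sketch that builds the SayText expression for arbitrary text is below. The helper names are hypothetical, and I am assuming Festival's Scheme reader accepts backslash-escaped double quotes inside strings.

```shell
# Escape backslashes and double quotes so the text is safe inside a Scheme string.
scheme_escape() {
    printf '%s' "$1" | sed 's/[\\"]/\\&/g'
}

# Build the (SayText "...") expression for arbitrary text.
saytext_expr() {
    printf '(SayText "%s")' "$(scheme_escape "$1")"
}

# Hypothetical wrapper around the batch invocation shown above.
festival_say() {
    festival -b '(voice_cmu_us_slt_arctic_hts)' "$(saytext_expr "$1")"
}
```

Usage would then be festival_say "Don't hate me, I'm just doing my job!" without worrying about how the quotes nest.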
You can get other quite good voices from the Nitech repository, but installing them is finicky, and the default paths changed so the file name references in the bundled scheme files may need to be manually edited to work on stock Ubuntu.
I believe I've found the best free TTS software: a Google Chrome extension called "SpeakIt". This only works in the Chrome browser for me on Ubuntu. It doesn't work with Chromium for some reason. SpeakIt comes with two female voices which both sound very realistic compared to everything else out there. There are at least four more male & female voices listed as Chrome extensions if you search the Chrome Web Store using "TTS" as your query.
Usage: on a website, highlight the text you want to be read and either right click and choose "SpeakIt" or click the SpeakIt icon docked on the Chrome top bar.
Firefox users also have two options. Within Firefox addons, do a search for TTS and you should find "Click Speak" and also "Text to Voice". The voices are not as good as the Chrome SpeakIt voices, but are definitely usable.
The SpeakIt extension uses iSpeech technology, and for a price of $20 a year the site can convert text to MP3 audio files. You can input text, URLs, RSS feeds, as well as documents such as TXT, DOC, and PDF and output to MP3. You can make podcasts, embed audio, etc. Here is a link, and a sample of their audio (don't know how long the link will last).
Piper: a fast, local neural text to speech system. Check the project site for installation, voice downloads and usage. For example:
echo 'Welcome to the world of speech synthesis!' | \
./piper --model blizzard_lessac-medium.onnx --output_file welcome.wav
gTTS, a Python library and CLI tool to interface with Google Translate's text-to-speech API. Writes spoken mp3 data to a file, a file-like object (bytestring) for further audio manipulation, or stdout.
Cons: CLI-only. You need to be online, because it sends requests to Google's public endpoint.
sudo -H pip install gTTS # Install
Usage
gtts-cli 'hello' --output hello.mp3
gtts-cli -l es 'Nadie es patria, todos lo somos' | play -t mp3 -
Documentation and more examples
Some were already mentioned
Coqui.ai TTS. Installation:
pip install TTS
Mimic. Installation:
sudo apt-get install gcc make pkg-config automake libtool libasound2-dev
git clone https://github.com/MycroftAI/mimic.git # takes a while
cd mimic
./dependencies.sh --prefix="/usr/local" # takes a while
./autogen.sh
./configure --prefix="/usr/local"
make # takes a while
make check
Mimic 3. Installation of the plugin:
sudo apt-get install libespeak-ng1 # Install system packages
mycroft-pip install --upgrade pip # Ensure that you're using the latest pip
mycroft-pip install mycroft-plugin-tts-mimic3[all] # Install plugin
mycroft-config set tts.module mimic3_tts_plug # Activate plugin
mycroft-start all # Start mycroft
eSpeak + Gespeaker (GUI) (Gespeaker source code)
Cons: Old and ugly
sudo apt install espeak gespeaker
Firefox
Chromium/Brave/Chrome
tacotron and mimic2, based on the Google paper
Update from project page (2016): This project is currently unmaintained and will remain so for the foreseeable future.
Because of the lack of a better alternative I wrote a bash script that interfaces with a perl script by Michal Fapso to provide TTS via Google Translate. From the project description:
The intention is to provide an easy to use interface to text-to-speech output via Google's speech synthesis system. A fallback option using pico2wave automatically provides TTS synthesis in case no Internet connection is found.
As it stands, the wrapper supports reading from standard input, plain text files and the X selection (highlighted text).
The main features are:
Installation and usage are documented on the project page.
I'd be glad if you gave it a try. Bug reports and any other feedback are welcome!
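To illustrate the fallback idea from the description, here is a rough sketch. It is not the project's actual code: I swapped in gtts-cli for the online path instead of the Perl script, the function names are made up, and using a single ping to translate.google.com as the connectivity probe is my own assumption.

```shell
# Run the probe command given as arguments and report which engine to use.
choose_engine() {
    if "$@" >/dev/null 2>&1; then
        echo online
    else
        echo offline
    fi
}

# Hypothetical wrapper: gtts-cli when the probe succeeds, pico2wave otherwise.
speak_with_fallback() (
    text=$1
    # Assumption: one ping with a 2 second timeout is a good enough probe.
    case $(choose_engine ping -c 1 -W 2 translate.google.com) in
        online)
            gtts-cli "$text" | play -q -t mp3 -
            ;;
        offline)
            wav=$(mktemp --suffix=.wav) || exit 1
            pico2wave -w "$wav" "$text" && aplay -q "$wav"
            status=$?
            rm -f "$wav"
            exit "$status"
            ;;
    esac
)
```

Because choose_engine takes the probe as its arguments, you can try both branches without touching the network, e.g. choose_engine true or choose_engine false.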
I have looked high and low for text to speech for Ubuntu that is high quality. There is none. My vocal cords are paralyzed so I needed TTS to add voice instructions to my Ubuntu videos. You can get commercial high quality Linux text to speech software here. It's just really expensive. I ended up buying Natural Reader for Windows (doesn't work in Ubuntu under Wine) for $40. Maybe later I will get the Linux one.
I have been researching the best sounding and most easily tuned text to speech voices. Below is a listing of what I thought were the top 5 products in order of sound quality. Most of the websites associated with these products have an interactive demo that will allow you to make your own determination.
Combine SVOX tools (pico) with LibreOffice:
SVOX (pico) tools are easy to install and bring good quality voices to Ubuntu. Install them:
sudo apt-get install libttspico0 libttspico-utils libttspico-data
You can use LibreOffice in combination with the SVOX (pico) tools by installing the "Read Text" extension, which gives you a "GUI" for this excellent TTS software:
Set up the Read Text extension's options with Tools - Add-ons - Read selection.... Use /usr/bin/python as the external program. Select a command line option that includes the token (PICO_READ_TEXT_PY); you may want to experiment with some of them.
Now you only have to select some text in LO Writer, Calc, Impress or Draw and click on the icon added to the tool bar (a happy face with a balloon).
I find Nitech HTS voices on festival very natural and comforting compared to any other voices I have heard. See this link on how to set up Nitech and other voices with festival. I have not found a good GUI which I can use to configure those voices, but setting them via festival.scm still works. That post is very old and you might want to find the actual installation directory using the "locate festival" command.
I think what we need at this point is the big summary table:
| Tool | Sounds remotely natural | Output to file | Multilingual | Tested on |
|---|---|---|---|---|
| pico2wave (libttspico-utils 1.0+git20130326-14) | y. Some weird distortions, but reasonable. | y | -l fr-FR | 24.04 |
| idiap/coqui-ai-TTS 0.24.1 + Tacotron2 | y. Output is randomly different each time. Most words are awesome. Punctuation timing is off. Sometimes it goes completely crazy and it is hilarious. | --out_path tmp.wav | | 24.04 |
| Speech Note 4.7.0 + Mimic3 Arctic Aew Low | y | from GUI only | y | 24.10 |
| Speech Note 4.7.0 + Piper Amy Low Female | y | from GUI only | y | 24.10 |
| festival 2.5.0 + festvox-us-slt-hts 2010.10.25 | y. Not amazing, but OK. Slight voice distortion and punctuation off. | n | --language english | 24.04 |
| spd-say (speech-dispatcher 0.12.0) | n | n | -l fr | 24.04 |
| say (gnustep-gui-runtime 0.30.0) | n | n | n | 24.04 |
| espeak 1.48.15 | n | --stdout > tmp.wav | -v fr | 24.04 |
| festival 2.5.0 | n | n | --language english | 24.04 |
| svox nanotts d8b91f3 | n | | | 24.04 |
| espeak-ng 1.51 | n | | | 24.04 |
| piper | | | | 24.04 |
| tortoise-tts 3.0.0 | | | | 24.04 |
Empty cell means "unknown, untested".
My quick test strings are:
en: "Hello, my name is John Smith. What is your name?"
fr: "Bonjour, je m'appelle Jean Jacques. Tu t'appelles comment?"
"Remotely natural" is of course extremely subjective, and will suffer from the continual moving of AI goalposts as things evolve and we get used to better systems. For now, maybe I'd consider it something along "good enough for an informal video voiceover".
Previously mentioned at: https://askubuntu.com/a/1466489/52975
On Ubuntu 24.04 in a clean virtualenv running:
pip install piper-tts
fails with:
ERROR: Cannot install piper-tts==1.1.0 and piper-tts==1.2.0 because these package versions have conflicting dependencies.
bug report: https://github.com/rhasspy/piper/issues/509
On Ubuntu 24.04:
sudo apt install libttspico-utils
pico2wave -w tmp.wav "Hello, my name is John Smith. What is your name?"
ffplay -autoexit tmp.wav
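For the multilingual column, pico2wave wants a full locale such as fr-FR after -l. A small sketch that maps short codes to those locales follows. The helper names are mine, and the list of six locales is what I believe pico2wave ships with, so check pico2wave --help on your system.

```shell
# Map a short language code to a pico2wave locale; fail for anything unknown.
pico_locale() {
    case $1 in
        en|en-US) echo en-US ;;
        en-GB)    echo en-GB ;;
        de|de-DE) echo de-DE ;;
        es|es-ES) echo es-ES ;;
        fr|fr-FR) echo fr-FR ;;
        it|it-IT) echo it-IT ;;
        *)        return 1 ;;
    esac
}

# Hypothetical wrapper: pico_say LANG TEXT writes tmp.wav in the current directory.
pico_say() (
    locale=$(pico_locale "$1") || { echo "unsupported language: $1" >&2; exit 1; }
    pico2wave -l "$locale" -w tmp.wav "$2"
)
```

For the French test string: pico_say fr "Bonjour, je m'appelle Jean Jacques. Tu t'appelles comment?" && ffplay -autoexit tmp.wav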
https://github.com/idiap/coqui-ai-TTS
pipx install coqui-tts
tts --text "Hello, my name is John Smith. What is your name?" --pipe_out | aplay
The first time you call it it installs the necessary model automatically.
Sound takes 5-10 s to start coming out on each invocation, which is unacceptable for frequent short sentences.
The default model seems to be Tacotron2: https://github.com/NVIDIA/tacotron2 but you can select other models from CLI.
Previously mentioned at: https://askubuntu.com/a/1447599/52975
Does not support Python 3.12 (Ubuntu 24.04); pip install TTS fails. Report: https://github.com/coqui-ai/TTS/issues/3257 A collaborator at https://github.com/coqui-ai/TTS/issues/3257#issuecomment-2096792618 says to use idiap/coqui-ai-TTS instead.
Based on the README similarity it seems to be a fork of https://github.com/mozilla/TTS
Mentioned at: https://askubuntu.com/a/908889/52975 tested on Ubuntu 24.04:
sudo apt install festvox-us-slt-hts
festival -b '(voice_cmu_us_slt_arctic_hts)' '(SayText "Hello, my name is John Smith. What is your name?")'
https://github.com/neonbjb/tortoise-tts
On Ubuntu 24.04:
virtualenv -p python3 .venv
. .venv/bin/activate
pip install tortoise-tts==3.0.0
fails with:
ERROR: Failed building wheel for tokenizers
Bug report: https://github.com/neonbjb/tortoise-tts/issues/728
Speech Note supports it and it worked there.
Previously mentioned at: https://askubuntu.com/a/1447599/52975
At https://github.com/MycroftAI/mimic3/issues/83#issuecomment-2740023510 a maintainer said it's not maintained anymore.
On Ubuntu 24.10 I tried:
sudo apt-get install libespeak-ng1
pipx install 'mycroft-mimic3-tts[all]'
but that failed with:
Fatal error from pip prevented installation. Full pip output in file:
/home/ciro/.local/pipx/logs/cmd_2025-03-20_07.57.51_pip_errors.log
pip failed to build package:
libwapiti
Some possibly relevant errors from pip install:
error: subprocess-exited-with-error
libwapiti/src/api.c:157:36: error: passing argument 4 of ‘tag_nbviterbi’ from incompatible pointer type [-Wincompatible-pointer-types]
libwapiti/src/api.c:157:46: error: passing argument 6 of ‘tag_nbviterbi’ from incompatible pointer type [-Wincompatible-pointer-types]
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (libwapiti)
Error installing mycroft-mimic3-tts from spec 'mycroft-mimic3-tts[all]'.
Bug report: https://github.com/MycroftAI/mimic3/issues/83
Speech Note supports it and it worked there.
https://github.com/mkiol/dsnote
This project is a front-end for a bunch of possible backend TTS and STT models in multiple languages. That is cool because with it you can quickly try several models on a given text to decide which one is better, without having to try to install a bunch of differently broken software systems. Trying:
flatpak install flathub net.mkiol.SpeechNote
flatpak run net.mkiol.SpeechNote
opens a GUI.
Then under:
I can download a model. Just note that some of them require "voice samples" presumably to clone from, which might or might not be what you want.
Then you can type your text in the GUI and click the "Read" button to hear it.
And to save to a file:
Now for CLI-only attempt:
flatpak run net.mkiol.SpeechNote --print-available-models tts
lists models I've downloaded:
en_coqui_fairseq_eng "English (Coqui MMS) / en"
en_piper_us_amy_low "English (Piper Amy Low Female) / en"
en_rhvoice_alan "English (RHVoice Alan Male) / en"
en_whisperspeech_q4_base_enpl "English (WhisperSpeech Base) / en"
but TODO also opens the GUI. And finally TODO CLI-only TTS? This is a starting point:
flatpak run net.mkiol.SpeechNote \
--id en_coqui_fairseq_eng \
--text 'Hello, my name is John Smith. What is your name?'
After changing a setting under:
I can also make it speak with:
flatpak run net.mkiol.SpeechNote \
--id en_coqui_fairseq_eng \
--text 'Hello, my name is John Smith. What is your name?' \
--action start-reading-text
but I don't know how to save to file from the CLI.
Related: https://github.com/mkiol/dsnote/issues/83
Tested on Speech Note 4.7.0, Ubuntu 24.10.
No easy CLI instructions:
Bibliography:
Here is what I did to get pure natural speech for PDF and other text files (other solutions are not natural or they're just paid services). This is actually a workaround using Chromium or Chrome, but it works fast and easy.
There are also ways to open other files like .doc and .txt in Chrome and do the same. There are other extensions for Chrome that view PDF files; check if one fits you better. Besides, you can upload all kinds of texts to Google Drive and use SpeakIt! to read them for you. Another extension called 'Speak text' works the same way and has natural speech.
When searching for a better tts engine to use with the new firefox 49 narrative mode I found pico tts (svox) - my favorite TTS engine.
sudo apt install espeak libttspico0 libttspico-data libttspico-utils
How to change the default speech synthesis engine system wide?
People at arch linux brought me to the right path:
Uncomment the module you like and make it default in speech-dispatcher settings:
# sudo vim /etc/speech-dispatcher/speechd.conf
[...]
# -----OUTPUT MODULES CONFIGURATION-----
# Each AddModule line loads an output module.
#AddModule "espeak" "sd_espeak" "espeak.conf"
AddModule "pico-generic" "sd_generic" "pico-generic.conf"
[...]
#DefaultModule espeak
DefaultModule pico-generic
Restart the daemon:
# sudo systemctl restart speech-dispatcher.service
BUT, when starting Firefox again, nothing happens. According to the link above (arch forum posts #10 and #16) it works with festival (did not try), but the speech-dispatcher module for pico does not list available voices. It won't run.
Any idea out there would be highly appreciated ;-)
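One way to at least rule out a typo in the config edit is to print what the file actually selects. This is only a diagnostic sketch with made-up helper names. It assumes the AddModule / DefaultModule layout quoted above and does not fix the missing voice list.

```shell
# Print the last uncommented DefaultModule value from a speechd.conf-style file.
default_module() {
    awk '$1 == "DefaultModule" { name = $2 } END { if (name != "") print name }' "$1"
}

# Print the names of all uncommented AddModule entries, without the quotes.
loaded_modules() {
    awk '$1 == "AddModule" { gsub(/"/, "", $2); print $2 }' "$1"
}
```

Run default_module /etc/speech-dispatcher/speechd.conf and loaded_modules /etc/speech-dispatcher/speechd.conf; both should print pico-generic with the edit above. As far as I know, spd-say -o pico-generic "test" then sends a test sentence to that module directly, which helps tell a speech-dispatcher problem from a Firefox one.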
Yes! I encountered the exact same problem you are describing. One year ago I created a custom TTS, which I have been using myself for almost two years now, and I open sourced it. It works offline and for free, using an AI-based high-quality voice. You can use it everywhere: Firefox, PDF readers, Chrome, LibreOffice, etc. It supports both Ubuntu and Windows.
Feel free to have a look, I just created a video tutorial with installation steps and DEMO: https://youtu.be/hb1ZVwUcPCU
Download link and Project page: https://github.com/MattePalte/Verbify-TTS
Feel free to leave a comment or open an issue to discuss new ideas, problems or constructive criticism.
Hoping it will help you.
My favorite text-to-speech program is called Magic English, but like Natural Reader mentioned by Joe Steiger, it is a Windows program and I'm not sure if it will run under Wine.
AT&T Natural Voices is available online as a demo, but that's more of a work-around than a solution...
For that I built Intelligent Speaker, an extension for Google Chrome. It can read pages even without a selection (when text detection is correct).
Update from project page (2016): This project is currently unmaintained and will remain so for the foreseeable future.
Pico, mbrola, cmu, festival, flite, all SUCK in 2017 (They were amazing in the 90s). AT&T natural speech (which is fantastic) isn't linux compat and it's not free, therefore we use Google
git clone https://github.com/Glutanimate/simple-google-tts.git
sudo apt install xsel libnotify-bin libttspico0 libttspico-utils libttspico-data libwww-perl libwww-mechanize-perl libhtml-tree-perl so$
cd simple-google-tts
sudo ln -s `pwd`/simple_google_tts /usr/local/bin
simple_google_tts en "Text to speech is now installed"
cd -
In Linux systems, you can dump X selection (the text you have selected on your screen with the mouse) to a text file, then read with some TTS (currently I use Google Translate Python script gTTS):
#!/bin/bash
TXT="/tmp/speak.txt"
# Save the X text selection to a file
xclip -out > "$TXT"
# Remove smileys
sed -i 's/ :[pP]/./' "$TXT"
sed -i 's| :/|.|' "$TXT"
sed -i 's/ :D/./' "$TXT"
sed -i 's/ ;D/./' "$TXT"
sed -i 's/ :(/./' "$TXT"
# Abbreviations (\b matches a word boundary in GNU sed, so neighbouring characters are kept)
sed -i 's/\bIPv6\b/I P version 6/gi' "$TXT"
sed -i 's/\bMR\b/merge request/gi' "$TXT"
sed -i 's/\bbtw\b/by the way/gi' "$TXT"
sed -i 's/\bWIP\b/work in progress/gi' "$TXT"
sed -i 's/\bCLI\b/command line/gi' "$TXT"
# Latin (dots escaped so they only match a literal ".")
sed -i 's/i\.e\./that is/gi' "$TXT"
sed -i 's/e\.g\./for example/gi' "$TXT"
gtts-cli -f "$TXT" | play -t mp3 -
Bind this script to some key, for example the right menu key, and every time you select some text in any program (Firefox, Thunderbird, LibreOffice Writer, a PDF reader, or even a terminal) you will hear the text.
PS. you can also add --slow option to gtts-cli.
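If you would rather keep the clean-up rules in one place, the same idea can be written as a filter that reads stdin and writes stdout. This is just a sketch: the name normalize_for_tts is made up, only a few of the rules are included, and it relies on GNU sed for \b and the I flag.

```shell
# Filter: read text on stdin, write a TTS-friendly version on stdout (GNU sed assumed).
normalize_for_tts() {
    sed -E \
        -e 's/ [;:][DpP(]/./g' \
        -e 's/\bbtw\b/by the way/gI' \
        -e 's/\bWIP\b/work in progress/gI' \
        -e 's/\bCLI\b/command line/gI' \
        -e 's/\bi\.e\./that is/gI' \
        -e 's/\be\.g\./for example/gI'
}
```

It plugs into the script like this: xclip -out | normalize_for_tts > /tmp/speak.txt && gtts-cli -f /tmp/speak.txt | play -t mp3 -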