128

I am looking for some easy to install text to speech software for Ubuntu that sounds natural. I've installed Festival, Gespeaker, etc., but nothing sounds very natural. All very synthetic and hard to understand.

Any recommendations out there?

Jorge Castro
  • 73,717

17 Answers

68

SVOX pico2wave

sudo apt install libttspico-utils

A very minimalistic TTS that sounds better than espeak or mbrola (to my mind). Some information here.

I don't understand why pico2wave is, compared to espeak or mbrola, rarely discussed. It's small, but sounds really good (natural). Without modification you'll hear a natural sounding female voice.

And, unlike Mbrola, it recognises units and speaks them correctly!
For example:

  • 2°C → two degrees
  • 2m → two meters
  • 2kg → two kilograms

After installation I use it in a script:

#!/bin/bash
pico2wave -w=/tmp/test.wav "$1"
aplay /tmp/test.wav
rm /tmp/test.wav

Then run it with the desired text:

<scriptname>.sh "hello world"

or read the contents of an entire file:

<scriptname>.sh "$(cat <filename>)"

That's all it takes to have a lightweight, stable, working TTS on Ubuntu.
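A slightly more defensive variant of the script above, as a sketch only. It assumes GNU `mktemp` and `aplay` are available, uses a unique temporary file instead of the fixed `/tmp/test.wav`, and removes it even when synthesis or playback fails. The function name `speak` is made up for illustration.

```shell
#!/bin/bash
# Sketch: speak text with pico2wave through a unique temporary file.
# "speak" is a hypothetical helper name, not part of pico2wave.
speak() {
    local wav status
    # Keep the .wav suffix; pico2wave is picky about the output file name.
    wav="$(mktemp --suffix=.wav)" || return 1
    # Synthesize, then play only if synthesis succeeded.
    pico2wave -w "$wav" "$*" && aplay -q "$wav"
    status=$?
    # Always clean up, whatever happened above.
    rm -f "$wav"
    return "$status"
}

# Speak the script arguments, if any were given.
if [ "$#" -gt 0 ]; then
    speak "$@"
fi
```

Saved as a script, it is called the same way as before, for example `<scriptname>.sh "hello world"`.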

jkoop
  • 69
user85321
  • 1,425
36

Pico and espeak are fun and easy to get to work, but they're not all that good. The default Festival voices are also not that good. However, Festival is a Scheme-based speech framework for which a number of researchers have built much better plug-in voices. You can easily surpass the pico2wave quality on stock Ubuntu, because one of those voices is available as a ready-made package.

To make Festival sound natural, here's what to do:

sudo apt-get install festival
sudo apt-get install festvox-us-slt-hts
festival -i
festival> (voice_cmu_us_slt_arctic_hts) 
festival> (SayText "Don't hate me, I'm just doing my job!")

You can do it from the command line by using -b (or --batch) and putting each command into single quotes:

festival -b '(voice_cmu_us_slt_arctic_hts)' \
    '(SayText "The temperature is 22 degrees centigrade and there is a slight breeze from the west.")'
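If the text comes from a variable or another program, double quotes inside it will break the `SayText` expression. Here is a minimal sketch of a wrapper that escapes the text first. The helper names `scheme_string` and `festival_say` are invented for this example, and it assumes Festival accepts backslash escapes inside string literals.

```shell
#!/bin/bash
# Sketch: speak arbitrary text with the SLT HTS voice in batch mode.
# "scheme_string" and "festival_say" are hypothetical helper names.

# Escape backslashes and double quotes so the text can sit inside "...".
scheme_string() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the SayText expression and hand it to festival in batch mode.
festival_say() {
    festival -b '(voice_cmu_us_slt_arctic_hts)' \
        "(SayText \"$(scheme_string "$*")\")"
}
```

Usage would look like `festival_say 'She said "hello" to me.'`.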

You can get other quite good voices from the Nitech repository, but installing them is finicky, and the default paths changed so the file name references in the bundled scheme files may need to be manually edited to work on stock Ubuntu.

Jon Watte
  • 526
21

SpeakIt!

I believe I've found the best free TTS software: a Google Chrome extension called "SpeakIt". This only works in the Chrome browser for me on Ubuntu. It doesn't work with Chromium for some reason. SpeakIt comes with two female voices which both sound very realistic compared to everything else out there. There are at least four more male & female voices listed as Chrome extensions if you search the Chrome Web Store using "TTS" as your query.

Usage: On a website, highlight the text you want read, then either right-click and choose "SpeakIt" or click the SpeakIt icon docked on the Chrome top bar.


Firefox users also have two options. Within Firefox addons, do a search for TTS and you should find "Click Speak" and also "Text to Voice". The voices are not as good as the Chrome SpeakIt voices, but are definitely usable.

The SpeakIt extension uses iSpeech technology, and for $20 a year the site can convert text to MP3 audio files. You can input text, URLs, RSS feeds, and documents such as TXT, DOC, and PDF, and output to MP3. You can make podcasts, embed audio, etc. Here is a link, and a sample of their audio (don't know how long the link will last).

Pablo Bianchi
  • 17,371
14

Piper

A fast, local neural text to speech system. See the project site for installation, voice downloads, and usage. For example:

echo 'Welcome to the world of speech synthesis!' | \
  ./piper --model blizzard_lessac-medium.onnx --output_file welcome.wav

gTTS, Google Text-to-Speech

gTTS, a Python library and CLI tool to interface with Google Translate's text-to-speech API. Writes spoken mp3 data to a file, a file-like object (bytestring) for further audio manipulation, or stdout.

Cons: CLI-only. Requires an Internet connection, since it sends requests to Google's public endpoint.

sudo -H pip install gTTS  # Install

Usage

gtts-cli 'hello' --output hello.mp3
gtts-cli -l es 'Nadie es patria, todos lo somos' | play -t mp3 -

Documentation and more examples

Others

Some were already mentioned

Pablo Bianchi
  • 17,371
14

Simple Google™ TTS

Update from project page (2016): This project is currently unmaintained and will remain so for the foreseeable future.


Because of the lack of a better alternative I wrote a bash script that interfaces with a perl script by Michal Fapso to provide TTS via Google Translate. From the project description:

The intention is to provide an easy to use interface to text-to-speech output via Google's speech synthesis system. A fallback option using pico2wave automatically provides TTS synthesis in case no Internet connection is found.

As it stands, the wrapper supports reading from standard input, plain text files and the X selection (highlighted text).

The main features are:

  • online TTS synthesis via Google translate
  • offline TTS synthesis via pico2wave
  • supports a variety of different languages
  • can read from CLI, text files and highlighted text
  • supports reading highlighted text with fixed formatting (e.g. PDF files)

Installation and usage are documented on the project page.
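The "online first, offline fallback" idea can be sketched in a few lines. This is not the project's code: it swaps in `gtts-cli` for the online part (the project itself uses a Perl script), and it assumes `gtts-cli`, `play` (from sox, with MP3 support), `pico2wave`, and `aplay` are installed. The function name `speak_with_fallback` is made up.

```shell
#!/bin/bash
# Sketch of online TTS with an offline fallback; NOT the project's code.
# "speak_with_fallback" is a hypothetical helper name.
speak_with_fallback() {
    local text="$*" tmp status
    tmp="$(mktemp -d)" || return 1
    if gtts-cli "$text" --output "$tmp/out.mp3" 2>/dev/null; then
        # Online synthesis worked: play the MP3.
        play -q "$tmp/out.mp3"
    else
        # No network (or gtts-cli failed): synthesize locally instead.
        pico2wave -w "$tmp/out.wav" "$text" && aplay -q "$tmp/out.wav"
    fi
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Using the command's exit status as the connectivity test keeps the sketch short; a real tool would probably check the network explicitly.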

I'd be glad if you gave it a try. Bug reports and any other feedback are welcome!

Pablo Bianchi
  • 17,371
Glutanimate
  • 21,763
13

I have looked high and low for text to speech for Ubuntu that is high quality. There is none. My vocal cords are paralyzed so I needed TTS to add voice instructions to my Ubuntu videos. You can get commercial high quality Linux text to speech software here. It's just really expensive. I ended up buying Natural Reader for Windows (doesn't work in Ubuntu under Wine) for $40. Maybe later I will get the Linux one.

Pablo Bianchi
  • 17,371
8

I have been researching the best-sounding and most easily tuned text to speech voices. Below is a list of what I thought were the top 5 products, in order of sound quality. Most of the websites associated with these products have an interactive demo that will let you make your own determination.

  1. NeoSpeech
  2. iVona
  3. Acapela
  4. AT&T Natural voices
  5. CereProc Voices
Jim
  • 81
6

Combine SVOX tools (pico) with LibreOffice:

SVOX (pico) tools are easy to install and bring good-quality voices to Ubuntu. Install them:

sudo apt-get install libttspico0 libttspico-utils libttspico-data

You can use LibreOffice in combination with the SVOX (pico) tools by installing the "Read Text" extension, which gives you a "GUI" for this excellent TTS software:

Set up the Read Text extension's options with Tools - Add-ons - Read selection.... Use /usr/bin/python as the external program. Select a command line option that includes the token (PICO_READ_TEXT_PY); you may want to experiment with some of them.

Now you only have to select some text in LO Writer, Calc, Impress or Draw and click on the icon added to the toolbar (a happy face with a balloon).

leoperbo
  • 763
5

I find the Nitech HTS voices on Festival more natural and comforting than any other voices I have heard. See this link on how to set up Nitech and other voices with Festival. I have not found a good GUI for configuring those voices, but setting them via festival.scm still works. That post is very old, so you may want to find the actual installation directory using the "locate festival" command.

razor
  • 398
4

Comparison table of offline free CLI software

I think what we need at this point is the big summary table:

Tool | Sounds remotely natural | Output to file | Multilingual | Tested on
--- | --- | --- | --- | ---
pico2wave (libttspico-utils 1.0+git20130326-14) | y. Some weird distortions, but reasonable. | y | -l fr-FR | 24.04
idiap/coqui-ai-TTS 0.24.1 + Tacotron2 | y. Output is randomly different each time. Most words are awesome. Punctuation timing is off. Sometimes it goes completely crazy and it is hilarious. | --out_path tmp.wav | | 24.04
Speech Note 4.7.0 + Mimic3 Arctic Aew Low | y | from GUI only | y | 24.10
Speech Note 4.7.0 + Piper Amy Low Female | y | from GUI only | y | 24.10
festival 2.5.0 + festvox-us-slt-hts 2010.10.25 | y. Not amazing, but OK. Slight voice distortion and punctuation off. | n | --language english | 24.04
spd-say (speech-dispatcher 0.12.0) | n | n | -l fr | 24.04
say (gnustep-gui-runtime 0.30.0) | n | n | n | 24.04
espeak 1.48.15 | n | --stdout > tmp.wav | -v fr | 24.04
festival 2.5.0 | n | n | --language english | 24.04
svox nanotts d8b91f3 | n | | | 24.04
espeak-ng 1.51 | n | | | 24.04
piper | | | | 24.04
tortoise-tts 3.0.0 | | | | 24.04

Empty cell means "unknown, untested".

My quick test strings are:

  • en: "Hello, my name is John Smith. What is your name?"
  • fr: "Bonjour, je m'appelle Jean Jacques. Tu t'appelles comment?"

"Remotely natural" is of course extremely subjective, and will suffer from the continual moving of AI goalposts as things evolve and we get used to better systems. For now, I'd consider it something along the lines of "good enough for an informal video voiceover".

Piper

Previously mentioned at: https://askubuntu.com/a/1466489/52975

On Ubuntu 24.04 in a clean virtualenv running:

pip install piper-tts

fails with:

ERROR: Cannot install piper-tts==1.1.0 and piper-tts==1.2.0 because these package versions have conflicting dependencies.

bug report: https://github.com/rhasspy/piper/issues/509

pico2wave

On Ubuntu 24.04:

sudo apt install libttspico-utils
pico2wave -w tmp.wav "Hello, my name is John Smith. What is your name?"
ffplay -autoexit tmp.wav
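For the multilingual column, a small wrapper around the `-l` option can look like this. It is a sketch: the function name `say_in` is made up, and the language list is the set of voices I believe libttspico-data ships.

```shell
#!/bin/bash
# Sketch: synthesize text in a chosen language with pico2wave.
# "say_in" is a hypothetical helper. Usage: say_in LANG OUTPUT.wav TEXT...
say_in() {
    local lang="$1" out="$2"
    shift 2
    # Reject language codes outside the assumed libttspico-data set.
    case "$lang" in
        en-US|en-GB|de-DE|es-ES|fr-FR|it-IT) ;;
        *) echo "unsupported language: $lang" >&2; return 2 ;;
    esac
    pico2wave -l "$lang" -w "$out" "$*"
}
```

For example: `say_in fr-FR fr.wav "Bonjour, je m'appelle Jean Jacques. Tu t'appelles comment?"` followed by `ffplay -autoexit fr.wav`.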

idiap/coqui-ai-TTS

https://github.com/idiap/coqui-ai-TTS

pipx install coqui-tts
tts --text "Hello, my name is John Smith. What is your name?" --pipe_out | aplay

The first time you call it it installs the necessary model automatically.

Sound takes 5-10 s to start coming out on each invocation, which is unacceptable for frequent short sentences.

The default model seems to be Tacotron2: https://github.com/NVIDIA/tacotron2 but you can select other models from CLI.

coqui-ai/TTS

Previously mentioned at: https://askubuntu.com/a/1447599/52975

Does not support Python 3.12 (Ubuntu 24.04): pip install TTS fails. Report: https://github.com/coqui-ai/TTS/issues/3257 A collaborator says at https://github.com/coqui-ai/TTS/issues/3257#issuecomment-2096792618 to use idiap/coqui-ai-TTS instead.

Based on the README similarity it seems to be a fork of https://github.com/mozilla/TTS

festival + festvox-us-slt-hts

Mentioned at: https://askubuntu.com/a/908889/52975 tested on Ubuntu 24.04:

sudo apt install festvox-us-slt-hts
festival -b '(voice_cmu_us_slt_arctic_hts)' '(SayText "Hello, my name is John Smith. What is your name?")'

tortoise-tts

https://github.com/neonbjb/tortoise-tts

On Ubuntu 24.04:

virtualenv -p python3 .venv
. .venv/bin/activate
pip install tortoise-tts==3.0.0

fails with:

ERROR: Failed building wheel for tokenizers

Bug report: https://github.com/neonbjb/tortoise-tts/issues/728

Speech Note supports it and it worked there.

Mimic3

Previously mentioned at: https://askubuntu.com/a/1447599/52975

At https://github.com/MycroftAI/mimic3/issues/83#issuecomment-2740023510 a maintainer said it's not maintained anymore.

On Ubuntu 24.10 I tried:

sudo apt-get install libespeak-ng1
pipx install 'mycroft-mimic3-tts[all]'

but that failed with:

Fatal error from pip prevented installation. Full pip output in file:
    /home/ciro/.local/pipx/logs/cmd_2025-03-20_07.57.51_pip_errors.log

pip failed to build package: libwapiti

Some possibly relevant errors from pip install:
    error: subprocess-exited-with-error
    libwapiti/src/api.c:157:36: error: passing argument 4 of ‘tag_nbviterbi’ from incompatible pointer type [-Wincompatible-pointer-types]
    libwapiti/src/api.c:157:46: error: passing argument 6 of ‘tag_nbviterbi’ from incompatible pointer type [-Wincompatible-pointer-types]
    error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
    ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (libwapiti)

Error installing mycroft-mimic3-tts from spec 'mycroft-mimic3-tts[all]'.

Bug report: https://github.com/MycroftAI/mimic3/issues/83

Speech Note supports it and it worked there.

Speech Note

https://github.com/mkiol/dsnote

This project is a front-end for a bunch of possible backend TTS and STT models in multiple languages. That is cool because with it you can quickly try several models on a given text to decide which one is better, without having to try to install a bunch of differently broken software systems. Trying:

flatpak install flathub net.mkiol.SpeechNote
flatpak run net.mkiol.SpeechNote

opens a GUI.


Then under:

  • Languages
  • English
  • Text to Speech

I can download a model. Just note that some of them require "voice samples" presumably to clone from, which might or might not be what you want.


Then you can type your text in the GUI and click the "Read" button to hear it.

And to save to a file:

  • File
  • Export to a file

Now for CLI-only attempt:

flatpak run net.mkiol.SpeechNote --print-available-models tts

lists models I've downloaded:

        en_coqui_fairseq_eng          "English (Coqui MMS) / en"
        en_piper_us_amy_low           "English (Piper Amy Low Female) / en"
        en_rhvoice_alan               "English (RHVoice Alan Male) / en"
        en_whisperspeech_q4_base_enpl "English (WhisperSpeech Base) / en"

but TODO also opens the GUI. And finally TODO CLI-only TTS? This is a starting point:

flatpak run net.mkiol.SpeechNote \
  --id en_coqui_fairseq_eng \
  --text 'Hello, my name is John Smith. What is your name?'

After changing a setting under:

  • Settings
  • Accessibility
  • Allow external applications to invoke actions

I can also make it speak with:

flatpak run net.mkiol.SpeechNote \
  --id en_coqui_fairseq_eng \
  --text 'Hello, my name is John Smith. What is your name?' \
  --action start-reading-text

but I don't know how to save to file from the CLI.

Related: https://github.com/mkiol/dsnote/issues/83

Tested on Speech Note 4.7.0, Ubuntu 24.10.

Others

No easy CLI instructions:

Bibliography:

4

Here is what I did to get natural-sounding speech for PDF and other text files (other solutions are not natural, or are paid services). This is a workaround using Chromium or Chrome, but it is fast and easy.

  1. Install the SpeakIt! extension in Chrome or Chromium.
  2. Install PDF Viewer if you're using Chromium (Chrome already has a free PDF viewer) and check the 'Allow in incognito' and 'Allow access to file URLs' options in Chromium's extension settings.
  3. Drag and drop your PDF into the browser.
  4. Now highlight some text, right-click and select SpeakIt! to listen to natural text-to-speech.

There are also ways to open other files like .doc and .txt in Chrome and do the same. There are other Chrome extensions that view PDF files; check whether one of them suits you better. You can also upload all kinds of texts to Google Drive and use SpeakIt! to read them for you. Another extension called 'Speak text' works the same way and has natural speech.

3

When searching for a better TTS engine to use with the new Firefox 49 Narrate mode, I found pico TTS (svox), my favorite TTS engine.

sudo apt install espeak libttspico0 libttspico-data libttspico-utils

How to change the default speech synthesis engine system wide?

People at arch linux brought me to the right path:

Uncomment the module you like and make it default in speech-dispatcher settings:

# sudo vim /etc/speech-dispatcher/speechd.conf

[...]
# -----OUTPUT MODULES CONFIGURATION-----
# Each AddModule line loads an output module.
#AddModule "espeak"       "sd_espeak"   "espeak.conf"
AddModule "pico-generic"  "sd_generic"   "pico-generic.conf"

[...]
#DefaultModule espeak
DefaultModule pico-generic

Restart the daemon:

# sudo systemctl restart speech-dispatcher.service

BUT, when starting Firefox again, nothing happens. According to the link above (Arch forum posts #10 and #16), this works with Festival (I did not try), but the speech-dispatcher module for pico does not list available voices. It won't run.
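One way to narrow down where this breaks is to test speech-dispatcher directly, without Firefox. The following is only a diagnostic sketch. The function name `check_spd_module` is made up, and it assumes `spd-say -O` prints one loaded output module per line.

```shell
#!/bin/bash
# Sketch: check that speech-dispatcher loaded a module, then try it.
# "check_spd_module" is a hypothetical helper name.
check_spd_module() {
    local module="${1:-pico-generic}"
    # -O lists the output modules known to the running daemon.
    if spd-say -O | grep -q -F -- "$module"; then
        # -o selects the output module for this one message.
        spd-say -o "$module" "Speech dispatcher test"
    else
        echo "module '$module' is not loaded by speech-dispatcher" >&2
        return 1
    fi
}
```

If `check_spd_module pico-generic` speaks but Firefox stays silent, the problem is more likely on the browser side than in speechd.conf.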

Any idea out there would be highly appreciated ;-)

Pablo Bianchi
  • 17,371
apos
  • 529
1

Verbify-TTS

Yes! I encountered the exact same problem you are describing. One year ago I created a custom TTS that I have been using myself for almost two years now, and I open sourced it. It works offline and for free, using an AI-based high-quality voice. You can use it everywhere: Firefox, PDF readers, Chrome, LibreOffice, etc. It supports both Ubuntu and Windows.

Feel free to have a look, I just created a video tutorial with installation steps and DEMO: https://youtu.be/hb1ZVwUcPCU

Download link and Project page: https://github.com/MattePalte/Verbify-TTS

Feel free to leave comment/open issue to discuss new ideas, problems or constructive criticism.

Hoping it will help you.

1

My favorite text-to-speech program is called Magic English, but like Natural Reader mentioned by Joe Steiger, it is a Windows program and I'm not sure if it will run under Wine.

AT&T Natural Voices is available online as a demo, but that's more of a work-around than a solution...

1

For that I built Intelligent Speaker, an extension for Google Chrome. It can read pages even without a selection (when text detection is correct).

1

Simple Google™ TTS

Update from project page (2016): This project is currently unmaintained and will remain so for the foreseeable future.


Pico, mbrola, cmu, festival, flite, all SUCK in 2017 (They were amazing in the 90s). AT&T natural speech (which is fantastic) isn't linux compat and it's not free, therefore we use Google

git clone https://github.com/Glutanimate/simple-google-tts.git
sudo apt install xsel libnotify-bin libttspico0 libttspico-utils libttspico-data libwww-perl libwww-mechanize-perl libhtml-tree-perl so$
cd simple-google-tts
sudo ln -s `pwd`/simple_google_tts /usr/local/bin
simple_google_tts en "Text to speech is now installed"
cd -
Pablo Bianchi
  • 17,371
Jonathan
  • 3,984
0

In Linux systems, you can dump X selection (the text you have selected on your screen with the mouse) to a text file, then read with some TTS (currently I use Google Translate Python script gTTS):

#!/bin/bash
TXT="/tmp/speak.txt"

# save X text selection to a file
xclip -out > "$TXT"

# remove smileys
sed -i 's/ :[pP]/./' "$TXT"
sed -i 's/ :\//./' "$TXT"
sed -i 's/ :D/./' "$TXT"
sed -i 's/ ;D/./' "$TXT"
sed -i 's/ :(/./' "$TXT"

# Abbreviations (\b keeps the surrounding characters intact; GNU sed)
sed -i 's/\bIPv6\b/I P version 6/gi' "$TXT"
sed -i 's/\bMR\b/merge request/gi' "$TXT"
sed -i 's/\bbtw\b/by the way/gi' "$TXT"
sed -i 's/\bWIP\b/work in progress/gi' "$TXT"
sed -i 's/\bCLI\b/command line/gi' "$TXT"

# Latin (dots escaped so that e.g. "idea" is not rewritten)
sed -i 's/\bi\.e\./that is/gi' "$TXT"
sed -i 's/\be\.g\./for example/gi' "$TXT"

gtts-cli -f "$TXT" | play -t mp3 -

Bind this script to some key, for example the right menu key, and every time you select some text in any program (Firefox, Thunderbird, LibreOffice Writer, a PDF reader, or even a terminal), you will hear the text.

PS. you can also add --slow option to gtts-cli.
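The text clean-up can also be written as a single stdin-to-stdout filter, which avoids editing the temporary file repeatedly. This is a sketch that assumes GNU sed (for `\b` and the `I` flag). The name `normalize_for_tts` is made up, and unlike the script above it matches the acronyms case-sensitively so that "Mr" is left alone.

```shell
#!/bin/bash
# Sketch: smiley and abbreviation clean-up as one filter (GNU sed assumed).
# "normalize_for_tts" is a hypothetical helper name.
normalize_for_tts() {
    sed -E \
        -e 's| [:;][pPD(/]|.|g' \
        -e 's/\bIPv6\b/I P version 6/g' \
        -e 's/\bMR\b/merge request/g' \
        -e 's/\bbtw\b/by the way/gI' \
        -e 's/\bWIP\b/work in progress/g' \
        -e 's/\bCLI\b/command line/g' \
        -e 's/\bi\.e\./that is/g' \
        -e 's/\be\.g\./for example/g'
}
```

It would slot into the script as `xclip -out | normalize_for_tts > /tmp/speak.txt`, followed by the same `gtts-cli -f` line.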

Pablo Bianchi
  • 17,371
asashnov
  • 99
  • 1
  • 3