How to get text-to-speech output using the command line?

Question

How can I get speech output from entered text using the command line?

I would also like to be able to change the speech rate, pitch, volume, etc. with a simple command.

score 183 · Accepted Answer · edited Sep 15 '16 at 11:49

In order of descending popularity:

say converts text to audible speech using the GNUstep speech engine.
```
sudo apt-get install gnustep-gui-runtime
say "hello"
```

festival General multi-lingual speech synthesis system.
```
sudo apt-get install festival
echo "hello" | festival --tts
```

spd-say sends text-to-speech output request to speech-dispatcher
```
sudo apt-get install speech-dispatcher
spd-say "hello"
```

espeak is a multi-lingual software speech synthesizer.
```
sudo apt-get install espeak
espeak "hello"
```
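
If a script has to run on machines where only some of these tools are installed, the four commands above can be tried in turn. This is a minimal sketch, assuming a POSIX shell; the function name `speak` is made up for illustration and is not part of any of these packages.

```shell
# Speak "$1" with the first of the four engines found on PATH.
# The order follows the list above; reorder to taste.
speak() {
    if command -v say >/dev/null 2>&1; then
        say "$1"
    elif command -v festival >/dev/null 2>&1; then
        printf '%s\n' "$1" | festival --tts
    elif command -v spd-say >/dev/null 2>&1; then
        spd-say "$1"
    elif command -v espeak >/dev/null 2>&1; then
        espeak "$1"
    else
        echo "no text-to-speech engine found" >&2
        return 1
    fi
}
```

After sourcing the function, `speak "hello"` uses whichever engine is available.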

score 21 · Answer 2 · answered Jan 16 '11 at 12:29

espeak is a nice little tool.

I just like playing around with it on the command line. You might find it conflicts with PulseAudio, so I'm using a long-winded version that avoids having to set it up properly.

sudo apt-get install espeak
espeak --stdout "this is a test" | paplay

espeak --help will show you the options to calibrate reading speed, pitch, voice, etc.
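
For the rate, pitch and volume part of the question, espeak takes `-s` (speed in words per minute), `-p` (pitch, 0 to 99) and `-a` (amplitude, 0 to 200). Here is a small sketch wrapping them; the function name `espeak_tuned` is invented for this example, and the defaults are the ones the espeak help text lists (175, 50, 100).

```shell
# Usage: espeak_tuned TEXT [SPEED_WPM] [PITCH_0_99] [AMPLITUDE_0_200]
# Omitted arguments fall back to espeak's documented defaults.
espeak_tuned() {
    espeak -s "${2:-175}" -p "${3:-50}" -a "${4:-100}" "$1"
}
```

For example, `espeak_tuned "this is a test" 120 70 150` should speak more slowly, at a higher pitch and louder than the default.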

When you're doing your notes, save them as a text file and then:

echo "these are my notes" > text.txt
espeak --stdout -f text.txt > text.wav
paplay text.wav # you should hear "these are my notes"

You can then play around with ffmpeg et al. to compress this down from PCM to something more manageable like MP3 or OGG. But that's a different story.
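
As a starting point for that conversion, ffmpeg picks the output format from the file extension, so a sketch can be quite short. It assumes an ffmpeg build that includes an MP3 or Ogg encoder; `wav_convert` is a name made up for this example.

```shell
# Usage: wav_convert INPUT.wav EXTENSION   (e.g. mp3 or ogg)
# Writes INPUT.EXTENSION next to the input file.
wav_convert() {
    out="${1%.wav}.$2"
    ffmpeg -i "$1" "$out"
}
```

For example, `wav_convert text.wav ogg` would write `text.ogg`.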

score 17 · Answer 3 · answered Jul 24 '14 at 07:05

From man spd-say:

NAME
       spd-say - send text-to-speech output request to speech-dispatcher

SYNOPSIS
       spd-say [options] "some text"

DESCRIPTION
       spd-say  sends text-to-speech output request to speech-dispatcher process which handles it and ideally outputs the result
       to the audio system.

OPTIONS
       -r, --rate
              Set the rate of the speech (between -100 and +100, default: 0)

       -p, --pitch
              Set the pitch of the speech (between -100 and +100, default: 0)

       -i, --volume
              Set the volume (intensity) of the speech (between -100 and +100, default: 0)

Hence you can get text-to-speech with the following command:

spd-say "<type text>"

Ex:

spd-say "Welcome to Ubuntu Linux"

You can also set the speech rate, pitch, volume, etc.; see the man page.
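
To put the three options from the man page excerpt together, here is a sketch that checks each value against the documented -100 to +100 range before calling spd-say. `spd_say_tuned` is a hypothetical name for this example, and the check assumes integer arguments.

```shell
# Usage: spd_say_tuned TEXT [RATE] [PITCH] [VOLUME]   (each -100..100, default 0)
spd_say_tuned() {
    for value in "${2:-0}" "${3:-0}" "${4:-0}"; do
        if [ "$value" -lt -100 ] || [ "$value" -gt 100 ]; then
            echo "value out of range (-100..100): $value" >&2
            return 1
        fi
    done
    spd-say -r "${2:-0}" -p "${3:-0}" -i "${4:-0}" "$1"
}
```

For example, `spd_say_tuned "Welcome to Ubuntu Linux" 20 -10 50` asks for faster, slightly lower and louder speech.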

score 11 · Answer 4 · edited Aug 14 '20 at 13:24

Python Google Speech :

pip install google_speech
google_speech "Test the hello world"

Svox From Android :

apt-get install svox-pico
pico2wave --wave=test.wav "Test the hello world"
play test.wav

Svox Nanotts :

git clone https://github.com/gmn/nanotts.git
cd nanotts
make
./nanotts -v en-US "Test the hello world"

Linked resource: Comparison of speech synthesizers
Post source: Linuxhacks.org
Disclosure: I am the owner of Linuxhacks.org

score 6 · Answer 5 · answered Dec 12 '13 at 19:53

Mbrola hasn't worked since Ubuntu 11.10.

The SVOX (pico) tools are easy to install, easy to use, and bring good-quality voices to Ubuntu. Install them:

sudo apt-get install libttspico0 libttspico-utils libttspico-data

Even easier, you can use LibreOffice in combination with the SVOX (pico) tools by installing the "Read Text" extension, which gives you a "GUI" for this excellent TTS software:

Set up Read Text Extension's options with Tools - Add-ons - Read selection.... Use /usr/bin/python as the external program. Select a command line option that includes the token (PICO_READ_TEXT_PY).

score 4 · Answer 6 · edited Oct 17 '20 at 03:01

SVOX pico2wave

That's what I use. It sounds natural, it's easy to understand, and it recognizes units (m, °C, kg, ...).

Here is my first post about pico2wave.

All you have to do is: Go to Ubuntu Software Center and search for "pico". You'll find 4 or 5 entries with "Small Footprint Ling...". Install them.

A possible use of pico2wave is described in my first posting (follow the link above).

Ciro Santilli OurBigBook.com · Answer 7 · 2025-03-20T12:17:44.797

Comparison table

I think what we need at this point is the big summary table, notably looking out for any tool that sounds remotely natural given our 2024-ongoing "deep learning revolution" (the problem now with this Cambrian Explosion is that the packages break every week and only work on certain systems).

| Tool | Sounds remotely natural | Output to file | Multilingual | Tested on (Ubuntu) |
| --- | --- | --- | --- | --- |
| pico2wave (libttspico-utils 1.0+git20130326-14) | y. Some weird distortions, but reasonable. | y | `-l fr-FR` | 24.04 |
| idiap/coqui-ai-TTS 0.24.1 + Tacotron2 | y. Output is randomly different each time. Most words are awesome. Punctuation timing is off. Sometimes it goes completely crazy and it is hilarious. | `--out_path tmp.wav` | | 24.04 |
| Speech Note 4.7.0 + Mimic3 Arctic Aew Low | y | from GUI only | y | 24.10 |
| Speech Note 4.7.0 + Piper Amy Low Female | y | from GUI only | y | 24.10 |
| festival 2.5.0 + festvox-us-slt-hts 2010.10.25 | y. Not amazing, but OK. Slight voice distortion and punctuation off. | n | `--language english` | 24.04 |
| spd-say (speech-dispatcher 0.12.0) | n | n | `-l fr` | 24.04 |
| say (gnustep-gui-runtime 0.30.0) | n | n | n | 24.04 |
| espeak 1.48.15 | n | `--stdout > tmp.wav` | `-v fr` | 24.04 |
| festival 2.5.0 | n | n | `--language english` | 24.04 |
| svox nanotts d8b91f3 | n | | | 24.04 |
| espeak-ng 1.51 | n | | | 24.04 |
| piper | | | | 24.04 |
| tortoise-tts 3.0.0 | | | | 24.04 |

Empty cell means "unknown, untested".

My quick test strings are:

en: "Hello, my name is John Smith. What is your name?"
fr: "Bonjour, je m'appelle Jean Jacques. Tu t'appelles comment?"

"Remotely natural" is of course extremely subjective, and will suffer from the continual moving of AI goalposts as things evolve and we get used to better systems. For now, I'd consider it something along the lines of "good enough for an informal video voiceover".

Piper

Previously mentioned at: https://askubuntu.com/a/1466489/52975

At https://github.com/MycroftAI/mimic3/issues/83#issuecomment-2740023510 a maintainer said it's not maintained anymore.

On Ubuntu 24.04 in a clean virtualenv running:

pip install piper-tts

fails with:

ERROR: Cannot install piper-tts==1.1.0 and piper-tts==1.2.0 because these package versions have conflicting dependencies.

bug report: https://github.com/rhasspy/piper/issues/509

pico2wave

On Ubuntu 24.04:

sudo apt install libttspico-utils
pico2wave -w tmp.wav "Hello, my name is John Smith. What is your name?"
ffplay -autoexit tmp.wav
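
To avoid leaving `tmp.wav` behind, the same two steps can be wrapped in a function that writes to a temporary directory and cleans up afterwards. This is only a sketch: `pico_speak` is a made-up name, `aplay` (from alsa-utils) is swapped in as the player, and the file keeps a `.wav` name because, as far as I know, pico2wave is picky about the extension.

```shell
# Usage: pico_speak TEXT [LANG]   (LANG e.g. en-US or fr-FR; default en-US)
pico_speak() {
    wav="$(mktemp -d)/speech.wav" || return 1   # keep the .wav extension
    pico2wave -l "${2:-en-US}" -w "$wav" "$1" && aplay "$wav"
    status=$?
    rm -rf "$(dirname "$wav")"                  # remove the temporary directory
    return "$status"
}
```

For example: `pico_speak "Bonjour, je m'appelle Jean Jacques." fr-FR`.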

idiap/coqui-ai-TTS

https://github.com/idiap/coqui-ai-TTS

pipx install coqui-tts
tts --text "Hello, my name is John Smith. What is your name?" --pipe_out | aplay

The first time you call it, it installs the necessary model automatically.

Sound takes 5-10 s to start coming out on each invocation, which is unacceptable for frequent short sentences.

The default model seems to be Tacotron2: https://github.com/NVIDIA/tacotron2 but you can select other models from CLI.
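
A sketch of selecting a model and writing to a file instead of piping to aplay, using the `--model_name` and `--out_path` options of the `tts` CLI. `coqui_to_wav` is a name invented here, and the model names your install offers should be taken from `tts --list_models` rather than from this example.

```shell
# Usage: coqui_to_wav TEXT OUT.wav [MODEL_NAME]
# Without MODEL_NAME the tool's default model is used.
coqui_to_wav() {
    if [ -n "$3" ]; then
        tts --text "$1" --model_name "$3" --out_path "$2"
    else
        tts --text "$1" --out_path "$2"
    fi
}
```

For example, `coqui_to_wav "Hello, my name is John Smith." tmp.wav` writes `tmp.wav` with the default model. Pass a name printed by `tts --list_models` as the third argument to try another one.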

coqui-ai/TTS

Previously mentioned at: https://askubuntu.com/a/1447599/52975

Does not support Python 3.12 (Ubuntu 24.04): pip install TTS fails. Report: https://github.com/coqui-ai/TTS/issues/3257 A collaborator at https://github.com/coqui-ai/TTS/issues/3257#issuecomment-2096792618 says to use idiap/coqui-ai-TTS instead.

Based on the README similarity it seems to be a fork of https://github.com/mozilla/TTS

festival + festvox-us-slt-hts

Mentioned at: https://askubuntu.com/a/908889/52975 tested on Ubuntu 24.04:

sudo apt install festvox-us-slt-hts
festival -b '(voice_cmu_us_slt_arctic_hts)' '(SayText "Hello, my name is John Smith. What is your name?")'

tortoise-tts

https://github.com/neonbjb/tortoise-tts

On Ubuntu 24.04:

virtualenv -p python3 .venv
. .venv/bin/activate
pip install tortoise-tts==3.0.0

fails with:

ERROR: Failed building wheel for tokenizers

Bug report: https://github.com/neonbjb/tortoise-tts/issues/728

Speech Note supports it and it worked there.

Mimic3

Previously mentioned at: https://askubuntu.com/a/1447599/52975

On Ubuntu 24.10 I tried:

sudo apt-get install libespeak-ng1
pipx install 'mycroft-mimic3-tts[all]'

but that failed with:

Fatal error from pip prevented installation. Full pip output in file:
    /home/ciro/.local/pipx/logs/cmd_2025-03-20_07.57.51_pip_errors.log
pip failed to build package:
    libwapiti
Some possibly relevant errors from pip install:
    error: subprocess-exited-with-error
    libwapiti/src/api.c:157:36: error: passing argument 4 of ‘tag_nbviterbi’ from incompatible pointer type [-Wincompatible-pointer-types]
    libwapiti/src/api.c:157:46: error: passing argument 6 of ‘tag_nbviterbi’ from incompatible pointer type [-Wincompatible-pointer-types]
    error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
    ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (libwapiti)
Error installing mycroft-mimic3-tts from spec 'mycroft-mimic3-tts[all]'.

Bug report: https://github.com/MycroftAI/mimic3/issues/83

Speech Note supports it and it worked there.

Speech Note

https://github.com/mkiol/dsnote

This project is a front-end for a bunch of possible backend TTS and STT models in multiple languages. That is cool because with it you can quickly try several models on a given text to decide which one is better, without having to try to install a bunch of differently broken software systems. Trying:

flatpak install flathub net.mkiol.SpeechNote
flatpak run net.mkiol.SpeechNote

opens a GUI.

Then under:

Languages
English
Text to Speech

I can download a model. Just note that some of them require "voice samples" presumably to clone from, which might or might not be what you want.

Then you can type your text in the GUI and click the "Read" button to hear it.

And to save to a file:

File
Export to a file

Now for CLI-only attempt:

flatpak run net.mkiol.SpeechNote --print-available-models tts

lists models I've downloaded:

        en_coqui_fairseq_eng          "English (Coqui MMS) / en"
        en_piper_us_amy_low           "English (Piper Amy Low Female) / en"
        en_rhvoice_alan               "English (RHVoice Alan Male) / en"
        en_whisperspeech_q4_base_enpl "English (WhisperSpeech Base) / en"

but TODO also opens the GUI. And finally TODO CLI-only TTS? This is a starting point:

flatpak run net.mkiol.SpeechNote \
  --id en_coqui_fairseq_eng \
  --text 'Hello, my name is John Smith. What is your name?'

After changing a setting under:

Settings
Accessibility
Allow external applications to invoke actions

I can also make it speak with:

flatpak run net.mkiol.SpeechNote \
  --id en_coqui_fairseq_eng \
  --text 'Hello, my name is John Smith. What is your name?' \
  --action start-reading-text

but I don't know how to save to file from the CLI.

Related: https://github.com/mkiol/dsnote/issues/83

Tested on Speech Note 4.7.0, Ubuntu 24.10.

Others

No easy CLI instructions:

Bibliography:

score 3 · Answer 8 · answered Jan 16 '11 at 15:03

And yet another espeak gui: gespeaker. It uses both espeak and mbrola engines. Also, it has more options than espeak-gui.

luri

Peter.O · Answer 9 · 2011-03-25T13:46:52.423

The following is not a FLOSS solution, but you may find it worthwhile (it is a Wine solution).

I'm personally very keen on TTS, I use it quite often... e.g. listening to a rambling discourse which I would never bother to stick with otherwise (because I need to get another cup of coffee... :)

A few things I've discovered along the way... or should I say, things I haven't discovered along the way... To put it bluntly: every piece of FOSS TTS voice software I've tried is under par and therefore unsuitable for any semi-protracted listening...

I currently use ATnT's NaturalVoices. It is only available for Windows (maybe the Mac), but it does run under Wine in Ubuntu. It has a minor glitch, where I sometimes need to click on the panel when I move away from the reader, but that is a minor issue when compared to the advantage gained from the quality of speech of NaturalVoices.

Some other things I've found to be virtually essential for a half-sensible listening experience are:

1. These TTS programs are not intelligent (well, maybe as intelligent as a young baboon), so they need every bit of help they can get, and there is one (and only one) reader program I've found which helps greatly in this. The app is called ReadPlease (2003 Pro). It allows you to specially modify words and groups of words to be pronounced as you want them. It is by no means perfect, but for me, it made the difference between the entire process being usable and not usable.
2. The speech in Natural Voices is "okay", but it is a bit boring. There are other good products too, but they are all for Windows, unfortunately.
It inflects surprisingly well sometimes, but OMG, initially it is a pain! So #2 is patience, and lots of updating of your "special words" list. By patience, I mean you (I) actually become accustomed to your particular baboon's speech patterns :) By the way, I currently have about 3000 words that now sound "human" enough that I no longer cringe when I hear them.

3. "Follow the Bouncing Ball"... Again, because the voice is never as good as a real speaker, things sometimes need to be clarified. The reader program I use has one feature for which I even put up with its clunky-looking interface: it has a "select the word currently being read" option. Many readers have this, but ReadPlease keeps the current line bang on center of the screen. This is invaluable for seeing ahead and behind to quickly re-read what you just missed (so auto-centering the current line is good)...

Well, that's my experience. I'm going to make a coffee now, and while I'm doing it, I'll be listening to this, to see how it "reads"... TTS is surprisingly good for picking up typos (I make lots of typos)...

If something as good as ATnT NaturalVoices turns up on the Ubuntu repository, I'll jump at it.

Here is a link to some samples of Natural Voices: I use "Mike"

score 3 · Answer 10 · edited Oct 17 '20 at 02:36

For festival (the voice seems more natural to me):

sudo apt-get install festival
echo "hello" | festival --tts

Pitch and speed configuration:

create ~/.festivalrc with the following content:

(Parameter.set 'Audio_Command "play -b 16 -c 1 -e signed-integer -r $SR -t raw $FILE tempo 1.5 pitch -100")
(Parameter.set 'Audio_Method 'Audio_Command)

See also http://www.solomonson.com/content/ubuntu-linux-text-speech

Update: tried on another Ubuntu computer. Had to install English speech engine package to work with festival properly:

sudo apt-get install festvox-kallpc16k

Also play is a cli command which comes with the sox package:

sudo apt-get install sox

score 2 · Answer 11 · edited Jun 12 '20 at 14:37

Meet espeak-ng - A multi-lingual software speech synthesizer:

espeak-ng "text to read"
espeak-ng -f ~/"file to read" # keep ~ outside the quotes so the shell expands it

It uses a default English voice, but there are numerous other voices for other languages and even dialects available and can be listed with espeak-ng --voices (for all) or e.g. espeak-ng --voices=en (for English). They can be set with -v together with either the language abbreviation or the file name, e.g. for Scottish or Swahili:

espeak-ng -v en-gb-scotland "text to read" # language name
espeak-ng -v bnt/sw "text to read" # file name: “bnt” for Bantu, “sw” for Swahili

There are many other options available, e.g. -s for the speed and -w to write the output to a wave file, see the manpage linked below.
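
For instance, combining a voice, a speed and file output might look like the following sketch. `espeak_ng_to_wav` is just a name picked for the example, and 175 words per minute is, to my knowledge, the tool's default speed.

```shell
# Usage: espeak_ng_to_wav TEXT OUT.wav [VOICE] [SPEED_WPM]
espeak_ng_to_wav() {
    espeak-ng -v "${3:-en}" -s "${4:-175}" -w "$2" "$1"
}
```

For example, `espeak_ng_to_wav "text to read" out.wav en-gb-scotland 140` writes a slower Scottish reading to `out.wav`.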

Piper

A fast, local neural text-to-speech system. Check the project site for installation, voice downloads, and usage. For example:

echo 'Welcome to the world of speech synthesis!' | \
  ./piper --model blizzard_lessac-medium.onnx --output_file welcome.wav

score 0 · Answer 16 · edited Jul 26 '21 at 22:57

Balabolka under Wine works fine (for me) with SAPI4 voices (SAPI5 voices are not detected on my Linux system). It can open files and start reading.

Here is a link to Wine's AppDB entry for Balabolka.

edited Jul 26 '21 at 22:57

Pablo Bianchi

answered Jan 04 '17 at 05:01

Hemantkumar Garach
