Convert speech (mp3 audio files) to text

Question

I am looking for a simple converter from mp3 to txt. I have tried, without success: Julius, CMU Sphinx, ... In the past 4 hours I have not found a way to use them (or to install them properly).

What I am looking for is something like:

    $ converterapp -infile myspeech.mp3 -outfile myspeech.txt

I am also fine with a GUI application, since I only have a few files to convert and can click around.

Edit: With the help of this answer to "Speech-recognition app to convert MP3 to text?", I managed to get it working, but it produces no output. Well, actually it produces a couple of blank lines (no words detected)...

score 14 · Answer 1 · answered Apr 30 '18 at 18:27

pocketsphinx can transcribe speech from an existing audio file. It does not read mp3 directly, so you need two separate commands.

First, convert your existing audio file to the required input format (16 kHz, mono WAV):

    ffmpeg -i file.mp3 -ar 16000 -ac 1 file.wav

Then run pocketsphinx:

    pocketsphinx_continuous -infile file.wav 2> pocketsphinx.log > myspeech.txt

The resulting file myspeech.txt will contain the recognized text.

If you are new to Ubuntu, you can install the above programs with this command:

    sudo apt install pocketsphinx pocketsphinx-en-us ffmpeg
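If you want something close to the single "converterapp" command from the question, the two steps above can be wrapped in a small Bash function. This is only a sketch, assuming ffmpeg and pocketsphinx_continuous are installed as shown; the function names mp3_to_txt and txt_name are hypothetical, and the temporary WAV file is created with GNU mktemp.

```shell
#!/usr/bin/env bash
# Hypothetical helper combining the ffmpeg and pocketsphinx_continuous
# commands from this answer into one "mp3 in, txt out" step.

# Derive a default output name: myspeech.mp3 -> myspeech.txt
txt_name() {
  printf '%s\n' "${1%.*}.txt"
}

# Usage: mp3_to_txt INPUT.mp3 [OUTPUT.txt]
mp3_to_txt() {
  local in="$1"
  local out="${2:-$(txt_name "$in")}"
  local wav status
  # Temporary 16 kHz mono WAV, the input format pocketsphinx expects
  wav="$(mktemp --suffix=.wav)"
  # -y: overwrite the empty file that mktemp just created
  if ! ffmpeg -loglevel error -y -i "$in" -ar 16000 -ac 1 "$wav"; then
    rm -f "$wav"
    return 1
  fi
  # Recognizer log goes to pocketsphinx.log, recognized text to $out
  pocketsphinx_continuous -infile "$wav" 2> pocketsphinx.log > "$out"
  status=$?
  rm -f "$wav"
  return "$status"
}
```

After sourcing the file that contains it (for example `source mp3_to_txt.sh`), `mp3_to_txt myspeech.mp3 myspeech.txt` runs both steps, and `mp3_to_txt myspeech.mp3` writes to myspeech.txt by default.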

MayeulC · Answer 2 · 2024-03-27T15:15:34.583

OpenAI's Whisper is a relatively new, free and open-source alternative with good accuracy in many languages.

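There are a few ways to install it; one is via pip, Python's package manager:

    pip install -U openai-whisper

Whisper also needs ffmpeg installed on the system (on Ubuntu: sudo apt install ffmpeg). Then run it on your file: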

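    $ whisper audio.mp3 --model medium

By default this writes the transcript to audio.txt (plus .srt, .vtt, .tsv and .json versions) in the current directory.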

A comment below points out that using a Python "virtual environment" is advisable. A virtual environment lets pip install software into a subdirectory, so the rest of your system is not affected:

    $ # Create a new environment called "newenv" (this also creates a subfolder with the same name)
    $ python3 -m venv newenv
    $ # Activate the environment by sourcing the bin/activate script in the new folder
    $ source ./newenv/bin/activate
    (newenv)$ # pip now installs modules into the venv, and python uses modules from there
    (newenv)$ pip install -U openai-whisper
    (newenv)$ whisper audio.mp3 --model medium
    (newenv)$ deactivate  # exit the venv once you are done
    $
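Since the question mentions only a few files, a loop over a directory saves typing the command once per file. This is a minimal sketch, assuming the whisper command is on PATH (for example inside the activated venv); transcribe_all is a hypothetical helper name, while --model, --output_format and --output_dir are Whisper's own command-line options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: transcribe every .mp3 in a directory with Whisper.

# Usage: transcribe_all [DIR] [MODEL]
transcribe_all() {
  local dir="${1:-.}"
  local model="${2:-medium}"
  local f
  for f in "$dir"/*.mp3; do
    # If DIR has no .mp3 files, the glob stays literal; skip it
    [ -e "$f" ] || continue
    # Write only the plain-text transcript (e.g. talk.txt) next to each file
    whisper "$f" --model "$model" --output_format txt --output_dir "$dir"
  done
}
```

For example, `transcribe_all ~/recordings small` would use the smaller, faster model for every mp3 in ~/recordings. Restricting --output_format to txt avoids the extra subtitle and JSON files when only the text is needed.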

score 2 · Answer 3 · answered Jan 05 '20 at 13:34

Mozilla DeepSpeech, an open-source speech-to-text tool, will do this, but you need to install it on your Linux desktop. Alternatively, you can try Transcribear, a browser-based speech-to-text tool that does not require installation; however, you need to be online to upload the recording to its server.
