24

I have a video that I want to create subtitles for. Is there a program that can perform rudimentary speech-to-text in order to

  1. Set the correct start/stop of each individual subtitle
  2. Create rudimentary text subtitles (using some sort of speech-to-text)

I know about gnome-subtitles. However, creating subtitles with it manually takes extensive effort: you have to select the start and stop of each sentence yourself.

YouTube has the above features (creates rudimentary text subtitles at the correct timings, using speech-to-text). However, I would rather not upload the videos to YouTube just to get my subtitles. Is it possible to do the subtitles efficiently on Ubuntu?

Update: I plan to use .srt subtitles only, and do not need to hard-code them into the videos. My biggest requirement is that the program automatically finds the start/stop of each sentence, so that I only have to write the text.

Update #2: There is Speech-to-Text software for Linux, with the CMU Sphinx package. It is possible to use CMU Sphinx with a subtitle program according to this post. In addition, one subtitle tool is aware of this CMU Sphinx feature (web based tool), however there is no reference in the latest source code that they added CMU Sphinx. The quest continues to find a program that uses CMU Sphinx for rudimentary speech to text (which would set the correct timings as well), as YouTube already does.

Pablo Bianchi
  • 17,371
user4124
  • 9,241

10 Answers

11

You have several alternatives:

YouTube

If you are willing to temporarily upload the video to YouTube (selecting the video language is mandatory) to get its subtitles (closed captions, lyrics), you can then extract/download them with youtube-dl or yt-dlp:

opts=(
  --write-auto-sub       # Write automatically generated subtitle file (YouTube only)
  --write-sub            # Write subtitle file
  --sub-lang en,de,es    # Languages of the subtitles to download (optional), separated by commas; use --list-subs for available language tags
  --convert-subs srt     # Convert the subtitles to another format (currently supported: srt|ass|vtt|lrc)
  -o "~/%(uploader)s/%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s"  # Output template
  --skip-download        # Do not download the video
  --ignore-errors        # Continue on download errors, for example to skip unavailable videos in a playlist
)
yt-dlp "${opts[@]}" vidURLorID

In one line and simplified:

yt-dlp --write-auto-sub --write-sub --sub-lang en --convert-subs srt --skip-download vidURLorID

If the conversion didn't work, convert it with FFmpeg:

ffmpeg -i myTitle.en.vtt output.srt

To convert from srt to txt:

sed -r -e 's/^\xef\xbb\xbf//' -e 's/\r//' -e 's/^[0-9]*$//' -e '/^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}$/d' -e 's/^\s*$//' -e '/^$/d;s/<[^>]*>//g' output.srt | uniq > output.txt

Related answer.
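
If the sed one-liner above is hard to read or adapt, the same cleanup can be sketched in Python. This is only a rough equivalent under my own assumptions (the function name `srt_to_text` is made up for illustration): it drops the BOM, cue numbers, timing lines, blank lines and markup tags, and collapses consecutive duplicate lines like `uniq` does. As with the sed version, a caption line consisting only of digits is dropped too.

```python
import re

# Matches an SRT timing line such as "00:00:05,080 --> 00:00:09,120".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")
# Matches inline markup such as <i>...</i> or <font ...>.
TAG = re.compile(r"<[^>]*>")


def srt_to_text(srt: str) -> str:
    """Return only the caption text of an SRT document, one line per caption line."""
    lines = []
    # Strip a leading byte order mark; splitlines() also handles \r\n endings.
    for raw in srt.lstrip("\ufeff").splitlines():
        line = TAG.sub("", raw).strip()
        # Skip blank lines, cue numbers and timing lines.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        # Like `uniq`: drop a line that merely repeats the previous one.
        if lines and lines[-1] == line:
            continue
        lines.append(line)
    return "\n".join(lines)
```

To use it on the file from above, read output.srt with encoding="utf-8", pass the contents to srt_to_text() and write the result to output.txt.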

Whisper (OpenAI)

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

You can get some tools based on Whisper from this awesome list.

Live Captions

Live Captions is an application that provides live captioning for the Linux desktop.

Only the English language is supported currently. Other languages may produce gibberish or a bad phonetic translation.

On Flathub.

Kdenlive

Kdenlive has an Automatic Subtitling/Speech to text feature (optionally using Whisper).

Pablo Bianchi
  • 17,371
4

UPDATE:

autosub is no longer maintained. Another fork with a GUI, called pyTranscriber, can be used instead.


You can use this command-line utility:

Autosub is a utility for automatic speech recognition and subtitle generation. It takes a video or an audio file as input, performs voice activity detection to find speech regions, makes parallel requests to Google Web Speech API to generate transcriptions for those regions, (optionally) translates them to a different language, and finally saves the resulting subtitles to disk.

https://github.com/agermanidis/autosub/

Python3 users, do this:

pip install git+https://github.com/BingLingGroup/autosub.git@alpha

Make sure you have ffmpeg installed.
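
To illustrate the "voice activity detection to find speech regions" step from the description above, here is a minimal sketch of the general idea. It is not Autosub's actual implementation: it is a plain energy threshold over short frames, and all names and default values (`speech_regions`, `threshold=500.0`, `min_gap_s=0.3`, the `[speech]` placeholder) are assumptions you would need to tune. It expects a 16-bit mono WAV file, which you could extract first with something like ffmpeg -i video.mp4 -ac 1 -ar 16000 -c:a pcm_s16le audio.wav.

```python
import math
import sys
import wave
from array import array


def read_wav_mono16(path):
    """Read a 16-bit mono WAV file; return (samples, sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = w.getframerate()
        samples = array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples, rate


def speech_regions(samples, rate, frame_ms=30, threshold=500.0, min_gap_s=0.3):
    """Return (start, end) times in seconds of regions whose RMS >= threshold."""
    frame_len = max(1, rate * frame_ms // 1000)
    regions = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms < threshold:
            continue  # treat this frame as silence
        start, end = i / rate, (i + len(frame)) / rate
        if regions and start - regions[-1][1] <= min_gap_s:
            regions[-1] = (regions[-1][0], end)  # short pause: extend previous region
        else:
            regions.append((start, end))
    return regions


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def regions_to_srt(regions, placeholder="[speech]"):
    """Build an SRT skeleton: correct timings, placeholder text to fill in by hand."""
    cues = []
    for n, (start, end) in enumerate(regions, 1):
        cues.append(f"{n}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{placeholder}\n")
    return "\n".join(cues)
```

Calling regions_to_srt(speech_regions(*read_wav_mono16("audio.wav"))) would give a timing-only .srt that you can then open in a subtitle editor to type in the text. Background music or noise will easily fool such a simple threshold, which is one reason real tools use more robust detectors.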

thyu
  • 477
3

I personally like Gnome Subtitles. It is available in the repositories:

sudo apt-get install gnome-subtitles
Marlinc
  • 801
3

I used Aegisub on Windows some years ago, and was really happy with it. Apparently it is available for Linux. It is pretty self-explanatory.

Aegisub only creates the subtitle file, e.g. an .srt file. To combine the video and the subtitles into a video with hard-coded subtitles, you still need a second program.
On Windows I used VirtualDub, but it is not available for Linux. You can use VLC to do this on Linux:

Create your subs in Aegisub, saving it as usual as a .ass file.

Use VLC to add that subtitle track to your video. Subtitle -> Add subtitle file...

Configure the subtitle display style and settings so they display to your liking. Tools -> Preferences -> Subtitles/OSD

You can now watch the video to make sure the subs are displaying as you intended. For example I can check certain subs that I've specified in Aegisub to be displayed at the top of the screen rather than the bottom.

The output will be identical to how it looks now, so make sure all is good.

  1. Go to Media -> Convert/Save... (Ctrl + R).

  2. Under File Selection, add your video file. Tick "Use a subtitle file" and browse to your .ass sub file.

  3. Click the down arrow on the Convert/Save button and click Convert... (Alt + O).

  4. Under Settings, ensure the Convert option is ticked. Tick the Display the output option. Subs aren't added for some reason unless you tick this.

  5. Edit the profile so the video and audio settings are what you want. Under the subtitle tab, tick the Subtitles box, and use DVB subtitle codec. Make sure you tick 'Overlay subtitles on the video'. Press save.

  6. Enter a destination folder and filename in the Destination box.

  7. Press start.

Wait for it to be done, and that's it. The caveat with this method is that the encoding will happen in real-time with the video, so if you have a 2 hour video, it will take 2 hours. This is due to ticking the 'Display the output' box. But for some reason it only works when you tick this.

There are also other subtitle-editors.

Update:
I don't remember Aegisub having a function to automatically set the beginning and end of a spoken sentence in the subtitle file, and I don't see such a function mentioned anywhere on the site. It is, however, pretty easy to set those times manually using key combinations.

Is there even any program which has such a function (in any OS)?

Pit
  • 1,234
3

I did not find a way to get the subtitle program to automatically add rudimentary subtitles, by analysing the voices in the video.

Therefore, the alternative that I use is

  1. Upload the video to YouTube (for example, privately) and use the built-in facility to automatically create rudimentary subtitles.

Then,

  1. Add the video to http://www.universalsubtitles.org/ and manually create the timeframes for each sentence, if the automated way in YouTube did not work or sentences are missing.
  2. Use GNOME Subtitles (found in the Software Center) in order to clean up the subtitles and fix any timings.
user4124
  • 9,241
1

In the Kdenlive video editor, go to Project > Subtitles > "Speech recognition" in the top bar. You must first download a language model from https://alphacephei.com/vosk/models and then set it up in Kdenlive under Settings > Configure Kdenlive > "Speech To Text".

0

Speech Note is worth mentioning here, as it is free and can transcribe a vast number of languages. It comes as a Flatpak, so it is trivial to install. At the moment you would have to use a tool like ffmpeg to extract the audio and feed it in, but you can be the first to contribute to the completely open source code on GitHub.... ;)

TL;DR: soon this will hopefully be an option, but we are not completely there yet.

ntg
  • 571
0

OpenAI Whisper (fully offline and MIT licensed)

This software was previously mentioned at: https://askubuntu.com/a/1378514/52975 but I wanted to provide a minimal runnable example.

Tested on Ubuntu 24.04:

sudo apt install ffmpeg
pipx install openai-whisper==20231117

Sample usage with this video: https://commons.wikimedia.org/wiki/File:Goldstone_Apple_Valley_Radio_Telescope_(GAVRT)_Solar_Patrol_(SVS14530).webm

wget -O gavrt.webm https://upload.wikimedia.org/wikipedia/commons/4/45/Goldstone_Apple_Valley_Radio_Telescope_%28GAVRT%29_Solar_Patrol_%28SVS14530%29.webm
time whisper gavrt.webm

The video is 1:16 long and features a lady speaking in perfect American English about a technical subject. There is light background music throughout.

The terminal now contains:

/home/ciro/.local/pipx/venvs/openai-whisper/lib/python3.12/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead                             
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")                                                                                                                           
Detecting language using up to the first 30 seconds. Use `--language` to specify the language                                                                                                 
Detected language: English

[... transcript ...]

real 0m32.569s
user 3m40.130s
sys 0m8.885s

and the current working directory now contains, among others, a file gavrt.srt:

1
00:00:00,000 --> 00:00:05,080
The Gavart Solar Patrol program is a heliophysics program aimed at citizen

2
00:00:05,080 --> 00:00:09,120
scientists and K through 12 students both locally, nationally and throughout the

3
00:00:09,120 --> 00:00:14,480
world. The goal of Gavart Solar Patrol is to monitor active regions on the Sun in

4
00:00:14,480 --> 00:00:18,120
order to understand how they're connected to explosive events that we

5
00:00:18,120 --> 00:00:23,100
categorize under space weather. Participants can remote in and actually

6
00:00:23,100 --> 00:00:26,480
control the telescope themselves. So a common observing mode with the Gavart

7
00:00:26,480 --> 00:00:30,400
Solar Patrol is that we'll have classrooms actually operate the

8
00:00:30,400 --> 00:00:34,760
telescope themselves, collect some data and then generate maps of what the Sun

9
00:00:34,760 --> 00:00:39,560
looks like at radio frequencies. They're gaining a really unique experience that I

10
00:00:39,560 --> 00:00:43,960
think is really special to the Gavart program and that's the ability to walk

11
00:00:43,960 --> 00:00:48,280
through the scientific process from the very beginning, from the steps of

12
00:00:48,280 --> 00:00:52,120
collecting the data themselves, all the way to reducing that data and

13
00:00:52,120 --> 00:00:56,880
interpreting scientific results from their studies. I get excited anytime I

14
00:00:56,880 --> 00:01:00,040
get to operate a radio telescope and so I really enjoy it when other people get

15
00:01:00,040 --> 00:01:04,800
to have that same opportunity and that same learning process.

Amazing! The transcription was perfect or almost perfect! And the installation/usage seamless!
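
If you would rather drive Whisper from Python than from the CLI, something along these lines should work. Treat it as a sketch under assumptions: it needs the whisper package importable from your interpreter (a pipx install only exposes the CLI, so you would want a pip install inside a virtualenv instead), the model name "small" is an arbitrary choice, and the helper names are mine. It relies on model.transcribe() returning a dict whose "segments" entries carry "start", "end" and "text".

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments (dicts with start/end/text) into SRT text."""
    cues = []
    for n, seg in enumerate(segments, 1):
        timing = f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}"
        cues.append(f"{n}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def transcribe_to_srt(media_path: str, srt_path: str, model_name: str = "small") -> None:
    """Transcribe a media file with Whisper and write the result as .srt."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(media_path)
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))
```

For example, transcribe_to_srt("gavrt.webm", "gavrt.srt") would be the rough equivalent of the CLI call above, with the advantage that you can post-process the segments (merge, split, shift) before writing them.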

Benchmarked on a .

vosk-transcriber

This is a convenient CLI for Vosk. Tested on Ubuntu 24.04, you can install it together with the English model as:

pipx install vosk==0.3.45
mkdir -p ~/var/lib/vosk
cd ~/var/lib/vosk
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip
unzip vosk-model-en-us-0.22.zip
cd -

and then use as:

time vosk-transcriber -m ~/var/lib/vosk/vosk-model-en-us-0.22 -i gavrt.webm -o gavrt.srt -t srt

it took:

real    0m26.538s                                                                              
user    0m22.677s                                                                              
sys     0m5.617s 

and gavrt.srt contains:

1
00:00:00,690 --> 00:00:03,030
the gabbert solar patrol program is a

2
00:00:03,030 --> 00:00:05,760
helium physics program aimed at citizen scientists

3
00:00:05,760 --> 00:00:07,620
and keep you told students both locally

4
00:00:07,620 --> 00:00:10,260
nationally and throughout the world the goal

5
00:00:10,260 --> 00:00:12,960
of gabbert solar control is to monitor

6
00:00:12,990 --> 00:00:14,850
active regions on the sun in order

7
00:00:14,850 --> 00:00:17,610
to understand how they're connected to explosive

8
00:00:17,610 --> 00:00:20,160
events that we categorize under space weather

9
00:00:20,610 --> 00:00:23,400
participants can remote in and actually control

10
00:00:23,400 --> 00:00:25,680
the telescope themselves so a common observing

11
00:00:25,680 --> 00:00:27,870
mode with the gabbert solo patrol is

12
00:00:27,870 --> 00:00:30,420
that we'll have classrooms actually operate the

13
00:00:30,420 --> 00:00:33,330
telescope themselves collect some data and then

14
00:00:33,360 --> 00:00:35,010
generate maps of what the sun looks

15
00:00:35,010 --> 00:00:37,920
like at radio frequencies they're gaining a

16
00:00:37,920 --> 00:00:39,900
really unique experience that i think is

17
00:00:39,990 --> 00:00:40,320
is real

18
00:00:40,320 --> 00:00:42,390
early special to the gabbert program and

19
00:00:42,390 --> 00:00:44,370
that's the ability to walk through the

20
00:00:44,370 --> 00:00:47,430
scientific process from the very beginning from

21
00:00:47,430 --> 00:00:49,830
the steps of collecting the data themselves

22
00:00:50,160 --> 00:00:51,990
all the way to producing that data

23
00:00:52,080 --> 00:00:55,380
and interpreting scientific results from their studies

24
00:00:55,770 --> 00:00:57,120
i get excited anytime i get to

25
00:00:57,120 --> 00:00:58,890
operate a radio telescope and i really

26
00:00:58,890 --> 00:01:00,240
enjoy it when other people get to

27
00:01:00,240 --> 00:01:00,450
have

28
00:01:00,570 --> 00:01:03,060
seeing opportunity and that same learning process

29
00:01:04,080 --> 00:01:08,010
and

So it is clearly worse than Whisper.

Benchmarked on a Lenovo ThinkPad P14s AMD laptop.
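
If you want a number rather than an eyeball comparison, the usual metric is word error rate (WER): the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. Below is a minimal sketch, assuming you have a hand-corrected reference transcript (none is included here). The tokenization (lowercase, letters/digits/apostrophes only) is my own simplification, chosen so that Vosk's missing punctuation and capitalization are not counted as errors.

```python
import re


def words(text):
    """Lowercase and split into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of an extra word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)
```

As a tiny example using phrases from the outputs above, and treating the Whisper wording as the reference purely for illustration: word_error_rate("is a heliophysics program aimed at citizen scientists", "is a helium physics program aimed at citizen scientists") counts one substitution plus one insertion over 8 reference words.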

0

OK, I found a tool that looks nice and is similar to Subtitle Workshop: Subtitle Editor (apt-get install subtitleeditor).

I compared it to Gnome Subtitles, and Subtitle Editor looks like the more advanced tool.

idgar
  • 3,010
0

For KDE, a good subtitle editor is subtitlecomposer. Install it with the command:

sudo apt-get install subtitlecomposer
karel
  • 122,292
  • 133
  • 301
  • 332
Anwar
  • 77,855