How can I instantaneously extract text from a screen area using OCR tools?

Question

In Ubuntu 12.10, if I type

gnome-screenshot -a | tesseract output

it returns:

** Message: Unable to use GNOME Shell's builtin screenshot interface, resorting to fallback X11.

How can I select an area of the screen and convert it to text (clipboard or document)?

Thank you!

score 64 · Accepted Answer · edited Jan 17 '20 at 09:30

There may already be a tool that does this, but you can also write a simple script around a screenshot tool and tesseract, which is what you are trying to do.

Take as an example this script (in my system I saved it as /usr/local/bin/screen_ts):

#!/bin/bash
# Dependencies: tesseract-ocr imagemagick scrot

select tesseract_lang in eng rus equ ;do break;done
# Quick language menu, add more if you need other languages.

SCR_IMG=$(mktemp)
trap 'rm -f "$SCR_IMG"*' EXIT
# single quotes: the path is expanded when the trap runs and stays quoted

scrot -s "$SCR_IMG.png" -q 100
# increase quality with option -q from default 75 to 100
# Typo "$SCR_IMG.png000" does not continue with same name.


mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png"
# should increase detection rate

# pass the language chosen in the menu (falls back to eng if none was chosen)
tesseract "$SCR_IMG.png" "$SCR_IMG" -l "${tesseract_lang:-eng}" &> /dev/null
cat "$SCR_IMG.txt"
exit

And with clipboard support:

#!/bin/bash
# Dependencies: tesseract-ocr imagemagick scrot xsel

select tesseract_lang in eng rus equ ;do break;done
# quick language menu, add more if you need other languages.

SCR_IMG=$(mktemp)
trap 'rm -f "$SCR_IMG"*' EXIT

scrot -s "$SCR_IMG.png" -q 100
# increase image quality with option -q from default 75 to 100

mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png"
# should increase detection rate

# pass the language chosen in the menu (falls back to eng if none was chosen)
tesseract "$SCR_IMG.png" "$SCR_IMG" -l "${tesseract_lang:-eng}" &> /dev/null
xsel -bi < "$SCR_IMG.txt"

exit

It uses scrot to capture the selected screen area, tesseract to recognize the text, and cat to display the result. The clipboard version additionally uses xsel to pipe the output into the clipboard.
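The scripts above keep their intermediate files next to a mktemp file and remove them with a trap. A minimal sketch of the same idea using a private temporary directory might look like this; the names setup_workdir and ocr_workdir are made up for illustration and are not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper: create a private working directory and make sure it
# is removed when the shell that called the helper exits.
ocr_workdir=""

setup_workdir() {
    ocr_workdir=$(mktemp -d) || return 1
    # Single quotes delay expansion until the trap fires, so the path
    # stays quoted even if it contains spaces.
    trap 'rm -rf -- "$ocr_workdir"' EXIT
}
```

A script would call setup_workdir once near the top and then write the screenshot and the OCR output to paths such as "$ocr_workdir/shot.png" and "$ocr_workdir/shot" (both names are placeholders).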

sample usage

NOTE: scrot, xsel, imagemagick and tesseract-ocr are not installed by default but are available from the default repositories.

You may be able to replace scrot with gnome-screenshot, but it may take a lot of work. As for the output, you can use anything that can read a text file (open it in a text editor, show the recognized text as a notification, etc.).
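As one possible take on the notification idea, the sketch below shortens the recognized text and hands it to notify-send. It assumes notify-send (from libnotify) is installed; the function names and the 200-character limit are arbitrary choices for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read text on stdin, collapse all whitespace runs
# (including tesseract's trailing form feed) into single spaces, and cut
# the result to at most $1 characters (default 200).
ocr_preview() {
    local max=${1:-200} text
    text=$(tr -s '[:space:]' ' ')
    text=${text# }
    text=${text% }
    if (( ${#text} > max )); then
        text="${text:0:max}..."
    fi
    printf '%s\n' "$text"
}

# Hypothetical wrapper: show a preview of the text file given as $1.
notify_ocr_result() {
    notify-send "OCR result" "$(ocr_preview 200 < "$1")"
}
```

In the first script, `cat $SCR_IMG.txt` could then be swapped for `notify_ocr_result "$SCR_IMG.txt"`.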

GUI version of the script

Here's a simple graphical version of the OCR script including a language selection dialog:

#!/bin/bash
# DEPENDENCIES: tesseract-ocr imagemagick scrot yad
# AUTHOR:       Glutanimate 2013 (http://askubuntu.com/users/81372/)
# NAME:         ScreenOCR
# LICENSE:      GNU GPLv3
#
# BASED ON:     OCR script by Salem (http://askubuntu.com/a/280713/81372)

TITLE=ScreenOCR # set yad variables
ICON=gnome-screenshot

# - tesseract won't work if LC_ALL is unset so we set it here
# - you might want to delete or modify this line if you 
#   have a different locale:

export LC_ALL=en_US.UTF-8

# language selection dialog
LANG=$(yad \
    --width 300 --entry --title "$TITLE" \
    --image=$ICON \
    --window-icon=$ICON \
    --button="ok:0" --button="cancel:1" \
    --text "Select language:" \
    --entry-text \
    "eng" "ita" "deu")

# - You can modify the list of available languages by editing the line above
# - Make sure to use the same ISO codes tesseract does (man tesseract for details)
# - Languages will of course only work if you have installed their respective
#   language packs (https://code.google.com/p/tesseract-ocr/downloads/list)

RET=$? # check return status

if [ "$RET" = 252 ] || [ "$RET" = 1 ]  # WM-Close or "cancel"
  then
      exit
fi

echo "Language set to $LANG"

SCR_IMG=$(mktemp) # create tempfile
trap "rm $SCR_IMG*" EXIT # make sure tempfiles get deleted afterwards

scrot -s "$SCR_IMG".png -q 100 #take screenshot of area
mogrify -modulate 100,0 -resize 400% "$SCR_IMG".png # postprocess to prepare for OCR
tesseract -l "$LANG" "$SCR_IMG".png "$SCR_IMG" # OCR in given language
xsel -bi < "$SCR_IMG".txt # pass to clipboard
exit

Aside from the dependencies listed above you will need to install the Zenity fork YAD from the webupd8 PPA to make the script work.
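The language list in the dialog above is hard-coded. A rough sketch for building it from the language packs that are actually installed could look like the following; it assumes that `tesseract --list-langs` prints a "List of available languages" header followed by one code per line, and the function names are invented for this example.

```shell
#!/bin/bash
# Hypothetical helper: read the output of `tesseract --list-langs` on stdin
# and print only the language codes, one per line.
parse_tesseract_langs() {
    grep -v -i '^List of available languages' | grep -v '^[[:space:]]*$'
}

# Hypothetical wrapper around the real command.
list_installed_langs() {
    # 2>&1 because some tesseract versions print the list to stderr
    tesseract --list-langs 2>&1 | parse_tesseract_langs
}
```

The result can be read into an array with `mapfile -t langs < <(list_installed_langs)` and passed to yad as `--entry-text "${langs[@]}"`.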

danpla · Answer 2 · 2022-12-24T04:51:01.367

21

I created a free and open source program for this purpose:

https://danpla.github.io/dpscreenocr/

answered Mar 17 '19 at 11:19

score 4 · Answer 3 · answered May 02 '18 at 11:58

I don't know whether anyone needs my solution, but here is one that runs under Wayland.

It shows the recognized text in a text editor, and if you pass the parameter "yes" it also shows a translation from the Google trans tool (an Internet connection is required). Before you can use it, install tesseract-ocr, imagemagick and google-trans. Start the script, e.g. in GNOME with Alt+F2, when the text you want to recognize is on screen, then drag the cursor around the text. That's it. This script was tested only on GNOME; for other window managers it must be adapted. To translate the text into another language, replace the language ID (de) in the trans command.

#!/bin/bash
# Dependencies: tesseract-ocr imagemagick google-trans

translate="${1:-no}" # pass "yes" as the first argument to translate

SCR_IMG=$(mktemp)
trap 'rm -f "$SCR_IMG"*' EXIT

gnome-screenshot -a -f "$SCR_IMG.png"
# select the screen area to capture

mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png"
# should increase detection rate

tesseract "$SCR_IMG.png" "$SCR_IMG" &> /dev/null

if [ "$translate" = "yes" ] ; then
        trans :de "file://$SCR_IMG.txt" -o "$SCR_IMG.translate.txt"
        gnome-text-editor "$SCR_IMG.translate.txt"
else
        gnome-text-editor "$SCR_IMG.txt"
fi

exit

Dominic108 · Answer 4 · 2020-04-29T09:19:19.737

This single-line script (based on https://askubuntu.com/a/1084151/456438) is meant to be bound to a keyboard shortcut, so that you can OCR anywhere on the screen without having to open a terminal.

#!/bin/bash
# Dependencies: convert imagemagick xsel tesseract-ocr-fra [tesseract-ocr-jpn ...]  

convert x: -modulate 100,0 -resize 400% -set density 300 png:- |
  tesseract stdin stdout -l fra+eng+jpn --psm 3 | 
  sed 's/'$(printf '%b' '\014')'//g;s/|/I/g' | 
  xsel -bi

Change the --psm and -l options as needed. Examples:

--psm 10 is for an image with a single character
--psm 3 is the default.
-l fra+eng+jpn will first consider French, then English, then Japanese.

Adapt the sed post processing to your need. sed 's/'$(printf '%b' '\014')'//g;s/|/I/g' removes the Form Feed character (octal 014) and replaces the vertical bar "|" with "I".
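The post-processing step can also be kept in a small function so that it is easy to adapt. This is only a sketch: clean_ocr_text is a made-up name, and replacing every "|" with "I" is a heuristic that will be wrong for text that really contains vertical bars.

```shell
#!/bin/bash
# Hypothetical helper: read OCR output on stdin, drop form feed characters
# (octal 014) and replace the vertical bar "|" with "I".
clean_ocr_text() {
    tr -d '\f' | sed 's/|/I/g'
}
```

In the pipeline above it would take the place of the sed stage, i.e. `tesseract stdin stdout -l fra+eng+jpn --psm 3 | clean_ocr_text | xsel -bi`.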

Use the standard instructions at https://help.ubuntu.com/stable/ubuntu-help/keyboard-shortcuts-set.html.en to create a keyboard shortcut that will execute the script. Put the script in ~/bin or anywhere else in your PATH environment variable to avoid using the full path to your script.

score 2 · Answer 5 · answered Apr 12 '16 at 09:52

I just wrote a blog post about how to use screenshots in the modern day. Although I target Chinese readers, the screencast and code are in English. OCR is merely one of the features.

Features of my OCR:

Open in konsole+vimx OR gedit for further editing.
For vimx+English, enable spell checking.
Support dynamic language selection without hard-coding.
Progress dialog during converting and tesseracting, which are slow.

Function code:

function ocr () {
    tmpj="$1"
    tmpocr="$2"
    tmpocr_p="$3"
    atom="$(tesseract --list-langs 2>&1)"; atom=(`echo "${atom#*:}"`); atom=(`echo "$(printf 'FALSE\n%s\n' "${atom[@]}")"`); atom[0]='True'
    ans=(`yad --center --height=200 --width=300 --separator='|' --on-top --list --title '' --text='Select Languages:' --radiolist --column '✓' --column 'Languages' "${atom[@]}" 2>/dev/null`) && ans="$(echo "${ans:5:-1}")" &&  convert "$tmpj[x2000]" -unsharp 15.6x7.8+2.69+0 "$tmpocr_p" | yad --on-top --title '' --text='Converting ...' --progress --pulsate --auto-close 2>/dev/null && tesseract "$tmpocr_p" "$tmpocr" -l "$ans" 2>>/tmp/tesseract.log | yad --percentage=50 --on-top --title '' --text='Tesseracting ...' --progress --pulsate --auto-close 2>/dev/null && if [[ "$ans" == 'eng' ]]; then konsole -e "vimx -c 'setlocal spell spelllang=en_us' -n $tmpocr.txt" 2>/dev/null; else gedit "$tmpocr.txt"; fi
    rm "$tmpocr_p"
}

Caller code:

for cmd in "mktemp" "convert" "tesseract" "gedit" "konsole" "vimx" "yad"; do 
    command -v $cmd >/dev/null 2>&1 || {  LANG=POSIX; xmessage "Require $cmd but it's not installed.  Aborting." >&2; exit 1; }; :;
done
tmpj="$(mktemp /tmp/`date +"%s_%Y-%m-%d"`_XXXXXXXXXX.png)"
tmpocr="$(mktemp -u /tmp/`date +"%s_%Y-%m-%d"`_ocr_XXXXX)"
tmpocr_p="$tmpocr.png"
gnome-screenshot -a -f "$tmpj" 2>&1 >/dev/null | ts >>/tmp/gnome_area_PrtSc_error.log
ocr "$tmpj" "$tmpocr" "$tmpocr_p" &

Combine these two pieces of code in a single shell script to run it.

Eduard Florinescu · Answer 6 · 2018-12-14T13:45:37.400

The idea is that any time a new screenshot file appears in the folder, tesseract OCR runs on it and the result opens in a text editor.

You can leave this script running in the output directory of your favorite screenshot tool:

#cat wait_for_it.sh
inotifywait -m . -e create -e moved_to |
    while read path action file; do
        echo "The file '$file' appeared in directory '$path' via '$action'"
        cd "$path"
        if [[ "${file: -4}" == ".png" ]]; then
                tesseract "$file" "$file"
                sleep 1
                gedit "$file".txt &
        fi

    done

You will need these to be installed:

sudo apt install tesseract-ocr
sudo apt install inotify-tools
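A variation on the watcher above, as a sketch only: it listens for close_write instead of create, on the assumption that the screenshot tool writes the file in place, so that tesseract does not start on a half-written image. The function names are invented for this example, and the extension check is case-insensitive.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file name ends in .png (any case).
is_png() {
    case "${1,,}" in
        *.png) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical watcher: OCR every PNG that is completely written to, or
# moved into, the directory given as $1 (default: current directory).
watch_and_ocr() {
    local dir=${1:-.} file
    inotifywait -m -q -e close_write -e moved_to --format '%f' "$dir" |
        while IFS= read -r file; do
            is_png "$file" || continue
            # tesseract appends .txt to the output base name itself
            tesseract "$dir/$file" "$dir/${file%.*}" &&
                gedit "$dir/${file%.*}.txt" &
        done
}
```

Nothing runs until watch_and_ocr is called, for example as `watch_and_ocr ~/Pictures` (the directory is a placeholder).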

score 1 · Answer 7 · edited Mar 24 '22 at 20:23

I wasn't able to install mogrify and scrot so here's another way using gnome-screenshot in Ubuntu. I added a keyboard shortcut to run the command to trigger a snapshot in Ubuntu and send the OCR output text to the clipboard.

#!/bin/bash
SCR_IMG=".screentemp.jpg"
TEMP_TXT=".screentext.txt"
gnome-screenshot -a --file=$SCR_IMG
tesseract $SCR_IMG $TEMP_TXT -l eng
cat $TEMP_TXT* | xsel -b
rm $SCR_IMG $TEMP_TXT*

Syed Sazid Hossain Rezvi · Answer 8 · 2023-03-03T12:37:37.157

Short Answer: https://github.com/SR-Hossain/image2textSR

Details: I use flameshot for this like below...

sudo apt-get install flameshot
pip install pytesseract pyperclip
sudo nano /image2text.py

Paste this code into image2text.py and save with Ctrl+X > Y > Enter:

from PIL import Image
import pytesseract
import pyperclip
extracted_text = pytesseract.image_to_string(Image.open('/tmp/a.png'))
pyperclip.copy(extracted_text)

Now run this command whenever you want to copy text from selected part of your screen...

rm -f /tmp/a.png && flameshot gui --path /tmp/a.png && python3 /image2text.py

I have created a custom shortcut (mapped to Shift+PrtSc) on my Linux system, so I just press Shift+PrtSc whenever I want to copy text from part of the screen.

score 1 · Answer 9 · edited May 27 '24 at 14:46

I found NormCap to be the tool with the least friction. Using Tesseract directly can lead to many wrong characters.

answered Jan 22 '24 at 09:41

koppor

score 1 · Answer 10 · answered Feb 26 '24 at 12:36

I ended up using frog. It is available as a snap and also works with Wayland.

moi

Ahmad Ismail · Answer 11 · 2022-08-09T06:25:40.893

I use the following command.

maim -s | convert - -units PixelsPerInch -resample 300 -sharpen 12x6.0 - | tesseract -l eng stdin stdout | sed '$d' | perl -0777 -pe 's/^(\s*\n)+|(\s*\n)+$//g' | xclip -in -selection clipboard

Here I am using sed '$d' | perl -0777 -pe 's/^(\s*\n)+|(\s*\n)+$//g' to remove blank lines from beginning and end of the output.
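The same trimming can be sketched in plain bash, which may be easier to read than the sed and perl combination. The function name trim_blank_edges is hypothetical; it treats a line as blank when it contains only whitespace, which includes the form feed that tesseract tends to append.

```shell
#!/bin/bash
# Hypothetical helper: read text on stdin and print it without the blank
# lines at the beginning and at the end. Blank lines in the middle are kept.
trim_blank_edges() {
    local line started=0 pending=""
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ [^[:space:]] ]]; then
            # A non-blank line: flush blank lines held back since the last one.
            printf '%s' "$pending"
            printf '%s\n' "$line"
            pending=""
            started=1
        elif (( started )); then
            # Hold blank lines back; they are dropped if no text follows.
            pending+="$line"$'\n'
        fi
    done
}
```

It would replace the `sed '$d' | perl ...` stages, i.e. `... | tesseract -l eng stdin stdout | trim_blank_edges | xclip -in -selection clipboard`.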

You can also use NormCap, a simple Python program for the specific task you are looking for.

score 0 · Answer 12 · answered Jun 12 '24 at 21:40

Just use Google Lens on www.google.com

First, click on the camera icon:

screenshot 1

Then upload your image, or paste it by pressing Ctrl+V if you have it on the clipboard, and you will be able to select the text in the image:

screenshot 2

Cebo
