Questions tagged [tesseract]

Tesseract is an open-source optical character recognition engine. Character data sets for various scripts and languages pre-exist and the engine allows training of additional (custom) data sets.

Tesseract's output quality is very poor unless the input images are preprocessed to suit it. Images (especially screenshots) must be scaled up so that the text x-height is at least 20 pixels. Any rotation or skew must be corrected, or no text will be recognized. Low-frequency changes in brightness must be high-pass filtered, or Tesseract's binarization stage will destroy much of the page. Dark borders must be removed manually, or they will be misinterpreted as characters.
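
Two of the preprocessing steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not Tesseract's own pipeline: the helper names (`upscale_factor`, `box_blur`, `high_pass`) are hypothetical, the image is assumed to be a row-major list of grayscale values in 0..255, and the high-pass step is approximated by subtracting a box-blurred local mean.

```python
from typing import List

Gray = List[List[float]]  # row-major grayscale image, values 0..255 (assumption)

TARGET_XHEIGHT_PX = 20  # minimum x-height quoted in the tag description


def upscale_factor(measured_xheight_px: float,
                   target: float = TARGET_XHEIGHT_PX) -> float:
    """Factor by which to enlarge an image so its x-height reaches `target`.

    Never returns less than 1.0: text that is already large enough is left alone.
    """
    if measured_xheight_px <= 0:
        raise ValueError("x-height must be positive")
    return max(1.0, target / measured_xheight_px)


def box_blur(img: Gray, radius: int) -> Gray:
    """Local mean over a (2*radius+1) square window, clipped at the image edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [img[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            out[y][x] = sum(window) / len(window)
    return out


def high_pass(img: Gray, radius: int) -> Gray:
    """Subtract the local mean so slow brightness gradients are flattened.

    The result is re-centred on mid-grey (128) and clamped to 0..255.
    """
    low = box_blur(img, radius)
    return [
        [min(255.0, max(0.0, p - l + 128.0)) for p, l in zip(row, low_row)]
        for row, low_row in zip(img, low)
    ]
```

For example, `upscale_factor(8)` suggests enlarging a screenshot with an 8-pixel x-height by a factor of 2.5. In practice an imaging library would do the resizing and filtering far faster; the nested loops here are only meant to make the arithmetic visible.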

20 questions

votes

2 answers

ocrfeeder doesn't detect anything

When I try to detect text in my JPEG, it correctly shows all areas where it suspects text and images, but when I export to ODT it only creates an ODT with empty text and image frames. Do I have to configure Tesseract somehow? (I use Ubuntu 14.10…

ocr tesseract

asked Jul 03 '15 at 23:29

rubo77

votes

2 answers

What program is suitable for making scanned PDF files searchable?

I would like to be able to scan paper documents to PDF files and make the text searchable. I believe the Tesseract program can assist with this, but I don't know how to begin or which program would be best to use. Is anybody making…

pdf scanner search ocr tesseract

asked Jul 13 '23 at 10:45

Hedley Finger

votes

1 answer

How to improve tesseract performance?

By all accounts, tesseract is superb. However, my results are dismal. I need to convert text (digital, as opposed to scanned from a book) that I only have as a PNG. For instance: 2 3 academics 1 1711 2 3 Achlmbobelmann 211 191—2 1 3 Aoqusmono|Food…

command-line image-processing ocr tesseract

asked Jan 27 '14 at 07:27

katriel

votes

0 answers

Tesseract giving errors

This morning I tried to use tesseract and I'm getting the following error messages: $ tesseract --list-langs Error in pixReadMemTiff: function not present Error in pixReadMem: tiff: no pix returned Error in pixaGenerateFontFromString: pix not…

19.04 tesseract

asked Oct 07 '19 at 10:06

To Do

votes

0 answers

OCR with two-page layout

I'm trying to do OCR on a PDF with a two-page layout: in each landscape-orientation page of the PDF, the left half is one (portrait-orientation) page and the right half is the next (portrait-orientation) page. Sometimes the layout messes up tesseract.…

pdf ocr tesseract

asked Feb 22 '21 at 18:39

Raffi

vote

1 answer

Tesseract ocr - problems finding languages

I was having problems with Ubuntu 22.04 on my Framework laptop, and did a complete re-installation, using Ubuntu 24.04.1. I have just reinstalled Tesseract using snap. It was working OK previously on Ubuntu 22.04, but now gives the…

snap ocr tesseract

asked Dec 27 '24 at 13:18

Andrew Corser

vote

0 answers

I'm having trouble installing OCRopy; I want to use it to create training data for an old manuscript in Latin. What prerequisites are needed, and which commands should I run?

I am new to using Ubuntu and I am trying to install OCRopy to make training data, with the end goal of creating a transcript of a 15th-century manuscript. So far I suspect that my problem may be a lack of prerequisites. I have installed python3…

opencv github ocr tesseract training

asked May 02 '21 at 03:48

mumbot

vote

1 answer

Cannot make .box files - Training Tesseract

I am trying to train Tesseract in Ubuntu 20.04.1 LTS. I have downloaded tesseract and the required training tools. For the training data I am using jTessBoxEditor. I have the .tiff files but I am unable to make the .box files. When I type the following…

tesseract training

asked Aug 16 '20 at 13:38

Hula

vote

2 answers

How to write bash script to run the same command for all files in a directory

I want to run this command for all files in a directory: tesseract /home/kong/Documents/input/248.jpg stdout --psm 1 --oem 1 --dpi 300 tsv >/home/kong/Documents/input/ocr_output/input/248.tsv The input and output should have the same number, like…

18.04 bash tesseract

asked Jul 31 '19 at 16:22

BloodThirst
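
The batch run asked about in the question above could be sketched as follows. This is an illustrative sketch only, written in Python rather than the bash the asker requested: the helper names (`tesseract_command`, `output_path`, `plan_batch`, `run_batch`) are hypothetical, the flags are copied from the command quoted in the question, and it is an assumption that each output file should reuse the input file's stem with a `.tsv` extension.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def tesseract_command(image: Path) -> List[str]:
    """Build the argv for one image, using the flags quoted in the question."""
    return ["tesseract", str(image), "stdout",
            "--psm", "1", "--oem", "1", "--dpi", "300", "tsv"]


def output_path(image: Path, out_dir: Path) -> Path:
    """Map e.g. 248.jpg to <out_dir>/248.tsv (assumed naming rule)."""
    return out_dir / (image.stem + ".tsv")


def plan_batch(in_dir: Path, out_dir: Path,
               pattern: str = "*.jpg") -> List[Tuple[List[str], Path]]:
    """List (command, output file) pairs without running anything."""
    return [(tesseract_command(img), output_path(img, out_dir))
            for img in sorted(in_dir.glob(pattern))]


def run_batch(in_dir: Path, out_dir: Path, pattern: str = "*.jpg") -> None:
    """Run tesseract once per image, redirecting stdout to the .tsv file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for argv, target in plan_batch(in_dir, out_dir, pattern):
        with target.open("wb") as fh:
            # check=True raises CalledProcessError if tesseract fails
            subprocess.run(argv, stdout=fh, check=True)
```

Calling `plan_batch` first makes it easy to inspect which files would be produced before `run_batch` actually invokes tesseract.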

vote

1 answer

Can Qt-box-editor be used for tesseract 4.0?

I am using tesseract 4.0 for character recognition. Many blogs say that Qt-box-editor can be used with tesseract 3.x. My question is: can Qt-box-editor be used with tesseract 4.0?

ocr tesseract

asked Jul 12 '19 at 05:19

Ashna Eldho

vote

3 answers

Tesseract --tessdata-dir option not working in Ubuntu 18.04

I am trying to use the best model from tesseract. However, I am getting the following error: tesseract sample.jpg stdout --tessdata-dir tessdata/ Error opening data file tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment…

18.04 tesseract

asked Jan 19 '19 at 11:59

Monster

vote

3 answers

Ubuntu 18.04: error installing tesseract

I've installed Ubuntu 18.04. I've installed tesseract using sudo apt-get install tesseract-ocr. When I type tesseract -v, I get an error: tesseract: symbol lookup error: /usr/lib/x86_64-linux-gnu/libtesseract.so.4: undefined symbol:…

php ocr tesseract

asked Jan 11 '19 at 14:55

mayur panchal

votes

2 answers

How can I get Tesseract OCR to recognise the large digits of an electricity meter?

I want to use an OCR program on an RPi to recognise the digits from a photo of my electricity meter. The digits are large and are very obvious to me, but Tesseract appears unable to recognise them at all - at best it detects a few random wrong…

ocr tesseract

asked Aug 07 '17 at 20:13

Shaka Zulu

votes

1 answer

KDE Wayland: Taking region screenshots faster?

I'm using this script from HN* to select regions on the screen and copy their text; I took out the line with mogrify. It uses Spectacle, but it takes a moment before opening the UI. Is it possible, and would it be faster, if Spectacle stayed open in…

kde wayland desktop-recording capture tesseract

asked Mar 28 '25 at 06:13

Oneechan69

votes

1 answer

How can I get tesseract-ocr v5 to find the eng.traineddata file?

Ubuntu 22.04.3 LTS, tesseract 5.3.2, XSane 0.999, YAGF 0.9.5, Epson Workforce WF-4835 printer/scanner. This setup works together up to a point. Clicking the Scan button in YAGF causes XSane to start up, scan the document in the scanner, and display the…

scanner ocr tesseract

asked Dec 24 '23 at 11:04

Hedley Finger
