1

I am trying to use the best model from Tesseract, but I get the following error:

tesseract sample.jpg stdout --tessdata-dir tessdata/
Error opening data file tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Here is the folder structure:

.
├── sample.jpg
└── tessdata
    └── eng.traineddata

Ubuntu Version:
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.1 LTS
Release:        18.04
Codename:       bionic

Tesseract version:
tesseract 4.0.0-beta.1
 leptonica-1.75.3
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Jos
Monster

3 Answers

3

I had the same problem. I spent a fair while looking for a solution, and the ones I found looked complicated and did not always work. Then I realised the problem was actually rather simple: the error message is explicit about where the files are expected to be, namely in the parent folder of tessdata.

Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory

It seems that a configuration file expects the files to be one level up, which in my case is /usr/share/tesseract-ocr/4.00/

I copied the language files and the training data (in my case eng.traineddata and osd.traineddata) from the tessdata folder /usr/share/tesseract-ocr/4.00/tessdata to the parent folder one level up.

After this, tesseract did not have any more problems.

These were the correct locations on an Ubuntu 19.10 installation.
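
The copy step described above could be scripted roughly as follows. This is a minimal sketch, not part of the original answer: the function name copy_traineddata_to_parent is hypothetical, and writing under /usr/share normally requires root, so the example call is left commented out.

```python
import shutil
from pathlib import Path


def copy_traineddata_to_parent(tessdata_dir):
    """Copy every *.traineddata file from tessdata_dir into its parent folder.

    Returns the sorted names of the files that were copied.
    """
    tessdata = Path(tessdata_dir)
    if not tessdata.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {tessdata}")

    copied = []
    for src in sorted(tessdata.glob("*.traineddata")):
        # copy2 keeps timestamps and permissions; existing files are overwritten.
        shutil.copy2(src, tessdata.parent / src.name)
        copied.append(src.name)
    return copied


# Example (path taken from the answer above; needs write access, e.g. sudo):
# copy_traineddata_to_parent("/usr/share/tesseract-ocr/4.00/tessdata")
```

The originals stay in tessdata, so the files are found whichever of the two folders tesseract ends up looking in.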

Nelly
0

An interesting finding: --tessdata-dir currently doesn't work inside my Python script, but it does work on the command line.

tesseract sample.png --tessdata-dir /home/nmpai/MyPC/tesseract/ocrb/tessdata -l ocrb_int
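
For comparison, the same kind of command-line call can be made from Python without going through a wrapper library. This is a minimal sketch, assuming the tesseract binary is on PATH; build_tesseract_command and run_tesseract are hypothetical helpers, and using stdout as the output base is borrowed from the command in the question.

```python
import os
import subprocess


def build_tesseract_command(image, tessdata_dir, lang):
    """Build the argument list for a tesseract CLI call that prints text to stdout."""
    return [
        "tesseract",
        image,
        "stdout",  # output base; "stdout" prints the recognised text
        "--tessdata-dir",
        os.path.abspath(tessdata_dir),  # absolute path, independent of the cwd
        "-l",
        lang,
    ]


def run_tesseract(image, tessdata_dir, lang):
    """Run tesseract and return the recognised text (raises on a non-zero exit)."""
    cmd = build_tesseract_command(image, tessdata_dir, lang)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Example: text = run_tesseract("sample.png", "tessdata", "ocrb_int")
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.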

As other answers suggest, setting the TESSDATA_PREFIX environment variable externally works, but that's a hassle.

We can avoid setting TESSDATA_PREFIX externally by setting it inside the Python script, which should make deployment smoother.

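import os
# Point Tesseract at the tessdata folder in the current working directory (set this before running OCR)
os.environ['TESSDATA_PREFIX'] = os.path.join(os.getcwd(), "tessdata")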

I have removed --tessdata-dir from the config, as the above works and is neater inside the Python script.
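
A slightly more defensive variant of the snippet above checks that the language file is actually present before setting the variable. This is a sketch under assumptions: use_local_tessdata is a hypothetical helper, and it treats TESSDATA_PREFIX as the tessdata folder itself, as the snippet above does.

```python
import os


def use_local_tessdata(tessdata_dir, lang="eng"):
    """Set TESSDATA_PREFIX to tessdata_dir after checking <lang>.traineddata exists."""
    tessdata_dir = os.path.abspath(tessdata_dir)
    model = os.path.join(tessdata_dir, lang + ".traineddata")
    if not os.path.isfile(model):
        raise FileNotFoundError("missing language file: " + model)
    # Set this before the OCR call so that any tesseract process
    # started afterwards inherits the variable.
    os.environ["TESSDATA_PREFIX"] = tessdata_dir
    return tessdata_dir


# Example: use_local_tessdata(os.path.join(os.getcwd(), "tessdata"), "ocrb_int")
```

Failing early with the missing path tends to be easier to debug than the generic "Failed loading language" message.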

-1

You do not seem to have set the TESSDATA_PREFIX variable. Edit ~/.bashrc with any text editor, e.g. nano ~/.bashrc, and add the line export TESSDATA_PREFIX='<absolute path to tessdata>', where I assume tessdata is the folder you mentioned.

Run source ~/.bashrc once you have edited and saved .bashrc. Hope that helps!
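
Since the answers here disagree on whether TESSDATA_PREFIX should name the tessdata folder or its parent, a quick check of both layouts can help. This is a minimal sketch; find_traineddata is a hypothetical helper that only inspects the filesystem and does not call tesseract.

```python
import os


def find_traineddata(prefix, lang="eng"):
    """Return the paths under prefix where <lang>.traineddata exists.

    Checks both layouts: prefix/<lang>.traineddata and
    prefix/tessdata/<lang>.traineddata.
    """
    name = lang + ".traineddata"
    candidates = [
        os.path.join(prefix, name),
        os.path.join(prefix, "tessdata", name),
    ]
    return [path for path in candidates if os.path.isfile(path)]


prefix = os.environ.get("TESSDATA_PREFIX")
if prefix is None:
    print("TESSDATA_PREFIX is not set")
else:
    print(find_traineddata(prefix) or "no eng.traineddata found under " + prefix)
```

Run it in a new shell after sourcing ~/.bashrc to confirm that the exported value is visible to Python.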