
I am trying to train Tesseract on Ubuntu 20.04.1 LTS. I have downloaded Tesseract and the required training tools.

For the training data I am using jTessBoxEditor. I have the .tiff files, but I am unable to make the .box files. When I type the following in my terminal:

tesseract --psm 6 --oem 3 Liberation_serif.font.exp0.tif Liberation_serif.font.exp0 makebox

I get the following error:

Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

I tried downloading eng.traineddata from GitHub and pasting it into tessdata, but I got the same error message. Then I changed TESSDATA_PREFIX several times to make it point to tessdata, but I got the same error again. How do I resolve this?

Edit: The tesseract executable and the Tesseract source code I downloaded are in different locations.

Hula
  • 11

1 Answer


I had downloaded Tesseract in two locations. The location that TESSDATA_PREFIX was pointing to did not have eng.traineddata. I downloaded the file from GitHub into that directory and used cat >> .pam_environment again to make TESSDATA_PREFIX point to that location.

I logged in again and I am able to make .box files now.
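
For anyone hitting the same error, a quick way to confirm the mismatch described above is to check whether the language file is actually present where TESSDATA_PREFIX points. This is only a rough sketch: check_tessdata is a hypothetical helper name, not part of Tesseract, and it assumes TESSDATA_PREFIX points directly at the tessdata directory, as the error message suggests.

```shell
# Hypothetical helper (not part of Tesseract): report whether
# <lang>.traineddata exists in the directory TESSDATA_PREFIX points to.
# Assumption: TESSDATA_PREFIX is the tessdata directory itself.
check_tessdata() {
    lang="${1:-eng}"    # language code, defaults to eng

    if [ -z "${TESSDATA_PREFIX:-}" ]; then
        echo "TESSDATA_PREFIX is not set"
        return 1
    fi

    # Strip one trailing slash so the printed path has no "//".
    file="${TESSDATA_PREFIX%/}/${lang}.traineddata"

    if [ -f "$file" ]; then
        echo "found $file"
        return 0
    fi

    echo "missing $file"
    return 1
}
```

Running check_tessdata eng in the shell where tesseract fails should print either the path that was found or the path that is missing. If the file is reported missing, copy eng.traineddata into that directory or point TESSDATA_PREFIX somewhere that has it. tesseract --list-langs can then be used to see which languages Tesseract itself is able to load.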

Hula
  • 11