5

I would like to scan paper documents to PDF files and make the text searchable. I believe Tesseract can help with this, but I don't know how to begin or which program would be best to use.

Is anybody making searchable PDF files successfully?

2 Answers

4

I can recommend ocrmypdf (https://github.com/ocrmypdf/OCRmyPDF), which is also packaged for Ubuntu. You can install it by running:

sudo apt install ocrmypdf

You can use it as follows:

ocrmypdf -l eng infile.pdf outfile.pdf

The call above is a simple one that sets the document language to English (-l eng). The man page lists many more options, which you can explore as you need them.
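If you end up with a whole folder of scans, a small wrapper around the same command may save some typing. The following is only a sketch: `out_name` and `ocr_all` are helper names I made up for illustration (they are not part of ocrmypdf), and the `_ocr.pdf` suffix is an arbitrary choice. `--deskew` and `--rotate-pages` are ocrmypdf options that tend to help with scanned paper, but you may not need them.

```shell
#!/bin/sh
# Sketch: run ocrmypdf over every PDF in a directory.
# out_name and ocr_all are hypothetical helpers, not part of ocrmypdf.

# Derive the output name: scan.pdf -> scan_ocr.pdf
out_name() {
    printf '%s_ocr.pdf\n' "${1%.pdf}"
}

# OCR each *.pdf in the given directory (default: current directory).
ocr_all() {
    dir=${1:-.}
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue                 # glob matched nothing
        case $f in *_ocr.pdf) continue ;; esac  # skip outputs of earlier runs
        # --deskew straightens crooked scans,
        # --rotate-pages corrects pages scanned sideways or upside down.
        ocrmypdf -l eng --deskew --rotate-pages "$f" "$(out_name "$f")"
    done
}
```

After sourcing the file, `ocr_all ~/scans` would write `name_ocr.pdf` next to each `name.pdf`. For documents in more than one language, the language codes can be joined with a plus sign, for example `-l eng+deu`, provided the matching Tesseract language data is installed.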

Sven
1

There is also OCRthyPDF-Essentials (a GUI for OCRmyPDF) for users unfamiliar with command-line tools.

It supports more than 100 languages "out-of-the-box" (all languages that are installed with tesseract), and can be installed directly from the Snap Store.

https://github.com/digidigital/OCRthyPDF-Essentials

Jomo