I have a lot of images that I want to run through OCR and save as an MS Word file that can be edited later. On Windows I have ABBYY FineReader, but I don't want to go back to Windows. Is there an application that can do the same for me without Windows?
2 Answers
You can use ABBYY OCR.
ABBYY FineReader Engine CLI for Linux is a ready-to-use CLI tool based on ABBYY’s advanced Optical Character Recognition (OCR) technologies. The tool automates OCR and document conversion on Linux systems.
For more information, and to download it, visit their website.
Source: Ocr4Linux
First of all, here are some more OCR tools besides ABBYY that have an SDK and that you can use on Linux. Note that not all of them support MS Word output:
- Tesseract - text output only
- Ocrad - text output only
- GOCR - text output only
- CuneiForm - RTF output
- OmniPage - Google Docs and PDF output
Here is an article (from 2007, but probably still relevant) benchmarking the first three engines on accuracy and speed: http://www.mathstat.dal.ca/~selinger/ocr-test/
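As a rough illustration of how one of the text-only engines can be batch-driven, here is a minimal Python sketch that calls the `tesseract` command-line tool (`tesseract <image> <output base> -l <language>`) on every PNG in a folder. It assumes Tesseract is installed and on the PATH; the function names, the PNG-only filter and the folder layout are my own assumptions for the example. The result is plain text, which you can open and edit in a word processor, but it is not a formatted Word document.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_folder(image_dir, out_dir, lang="eng"):
    """OCR every PNG in image_dir and return the paths of the text files.

    Tesseract appends ".txt" to the output base on its own, so only the
    base name (without extension) is passed on the command line.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(Path(image_dir).glob("*.png")):
        base = out_dir / image.stem
        # check=True raises CalledProcessError if tesseract fails on a file.
        subprocess.run(build_tesseract_command(image, base, lang), check=True)
        written.append(Path(str(base) + ".txt"))
    return written
```

Calling `ocr_folder("scans", "text")` (both folder names are placeholders) would leave one `.txt` file per image in `text/`.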
By the way, all of these engines, including ABBYY, work best on unstructured text, in other words, images that don't follow a regular layout. If the "images" you are processing have a standard layout, e.g. forms filled out by customers (where the fields are always in the same place) or various cards (such as business cards and ID cards), there are specialized solutions that can detect and OCR only the specific text fields, "clean" out image "noise", and output the text in a structured manner (e.g. Name = John Smith, ID Number = 123456).
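To make the idea of template-based output concrete, here is a small Python sketch of the general approach, not of how any particular product works. It assumes an OCR engine has already returned words with pixel coordinates; the `Word` class, the `TEMPLATE` regions and the sample data are all hypothetical. Words that fall inside a field's rectangle are joined into that field, and anything outside every rectangle is treated as noise and dropped.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: int  # left edge of the word, in pixels
    y: int  # top edge of the word, in pixels


# Hypothetical template: field name -> (left, top, right, bottom) in pixels.
TEMPLATE = {
    "Name": (100, 40, 400, 80),
    "ID Number": (100, 100, 400, 140),
}


def extract_fields(words, template):
    """Assign each OCR'd word to the template field whose box contains it."""
    fields = {}
    for name, (left, top, right, bottom) in template.items():
        inside = [w for w in words
                  if left <= w.x < right and top <= w.y < bottom]
        # Simplified reading order: top to bottom, then left to right.
        inside.sort(key=lambda w: (w.y, w.x))
        fields[name] = " ".join(w.text for w in inside)
    return fields


words = [
    Word("Smith", 180, 50),
    Word("John", 110, 50),
    Word("123456", 110, 110),
    Word("smudge", 500, 300),  # outside every field, so it is ignored
]
print(extract_fields(words, TEMPLATE))
# {'Name': 'John Smith', 'ID Number': '123456'}
```

A real solution would also have to align the scanned image to the template first, since a shifted or rotated scan moves every field away from its expected rectangle.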
If your images ARE "templates", and you need an OCR tool that can output structured text, there are actually very few Linux solutions (as far as I know). Here are two solutions I'm familiar with:
- CSSN OCR (http://www.card-reader.com). Specializes in card-type documents like ID cards, driver's licenses, medical cards, bank checks, credit cards, etc. Runs on Linux using WINE.
- ARH (http://www.arhungary.hu). Able to read travel documents, passports, visas and ID cards.
HTH, Dana