Questions tagged [optical-character-recognition]

For questions about the application of AI/ML algorithms in the field of optical character recognition (OCR), aka optical character reader (OCR), which is the mechanical or electronic conversion of images of typed, handwritten, or printed text into machine-encoded text.

For more info, take a look at the related Wikipedia article.

36 questions
24
votes
3 answers

Why can't OCR be perceived as a good example of AI?

On the Wikipedia page about AI, we can read: Optical character recognition is no longer perceived as an exemplar of "artificial intelligence" having become a routine technology. On the other hand, the MNIST database of handwritten digits is…
kenorb
  • 10,525
  • 6
  • 45
  • 95
10
votes
3 answers

Are there any textual CAPTCHA challenges which can fool AI, but not humans?

Are there any modern techniques for generating textual CAPTCHA challenges (where a person needs to type the right text) that can easily fool AI with some visual obfuscation methods, while humans can still solve them without any struggle? For…
kenorb
  • 10,525
  • 6
  • 45
  • 95
7
votes
2 answers

Effective algorithms for OCR

I am using Google's OCR to extract text from images, like receipts and invoices. What are examples of techniques used to make sense of the text? For example, I would like to extract the date, name of the business, address, total amount, etc. Before…
6
votes
1 answer

How should the racing agent take into account the velocity of the vehicle, given the images with a speedometer?

I'm developing a game AI, which tries to master racing simulations. I already trained a CNN (AlexNet) on in-game footage of me playing the game and the pressed keys as the target. As the CNN is only making predictions on a frame-to-frame basis, and…
5
votes
1 answer

Why are object detection algorithms poor at optical character recognition?

OCR is still a very hard problem. We don't have universal powerful solutions. We use the CTC loss function (see "An Intuitive Explanation of Connectionist Temporal Classification", Towards Data Science, and "Sequence Modeling With CTC", Distill), which is very…
5
votes
1 answer

In OCR, how should I deal with the warped text on the sides of oval objects?

Consider an image that contains one can (or bottle, or any similar oval object), which has text all over it. In the image below, I have many bottles, but you can assume that each image only contains one such object. As we can see, on each can, the…
5
votes
2 answers

How can we recognise musical notes in low-resolution or blurry images?

I was looking for an approach to recognise musical notes from photos. I found this repository https://github.com/mpralat/notesRecognizer. However, it doesn't seem good enough. If you look into the bad folder, you can see that just tiny variations of…
Toskan
  • 151
  • 1
  • 4
4
votes
1 answer

How should I define the loss function for a multi-object detection problem?

I'm trying to create a text recognition project using CNN. I need help regarding the text detection task. I have the training images and bounding box details for them. But I'm unable to figure out how to create the loss function. Can anyone help…
3
votes
0 answers

Zonal or template OCR for reading invoices

I'd like to explore the possibilities of applying artificial intelligence to OCR. Basic OCR invoice processing lets me convert only 30% of them. The main purpose is to define invoice areas by training an AI, then process those areas with…
3
votes
2 answers

How could I use machine learning to detect text and non-text regions in scanned documents?

I have a collection of scanned documents (which come from newspapers, books, and magazines) with complex alignments for the text, i.e. the text could be at any angle w.r.t. the page. I can do a lot of processing for different features extraction.…
3
votes
2 answers

How to improve the performance of EasyOCR

I am working on a project that requires me to identify a product on a grocery shelf. For that, I am trying to use text recognition and localization to spot a product. I tried EasyOCR and Tesseract OCR because they are giving me accurate results,…
3
votes
0 answers

Is there a deep learning-based architecture for digit localisation?

I'm new to object detectors and segmentation. I want to localize digits on a plate as fast as possible. All images of the dataset are normalized to $300 \times 60$. There are different approaches to solve the problem. For example, binarization +…
3
votes
1 answer

Attempting to solve an optical character recognition task using a feed-forward network

I am doing some experimentation on neural networks, and for that I am trying to program a plain OCR task. I have learned that CNNs are the best choice, but for the time being, and due to my inexperience, I want to go step by step and start with feedforward…
3
votes
0 answers

How does a neural network output text box location data?

I'm interested in creating a convolutional neural network or LSTM to locate text in an image. I don't want to OCR the text yet, just find the text regions. Yes, I know Tesseract and other systems can do this, but I want to learn how it works by…
2
votes
1 answer

How do I go about performing OCR text extraction from thousands of PDFs for training an AI model?

I have lots of data (PDFs) that I want to train an AI model to extract info from. All of them are a little different but have the same key data points. Is it possible to train an AI on the PDFs I have so that it would be able to recognize other PDFs…