5

I want to input a block of text and have a model guess a string that matches a predefined format (e.g. a string that starts with three letters and ends with five digits, like "XXX12345"). Ideally, the string it guesses will appear somewhere in the block of text, but sometimes it won't.

I have been struggling with where to begin, and I am not sure whether machine learning or deep learning is even the right direction for this.

Help!

TreHoffman
  • 59
  • 5

4 Answers

0

You should definitely look into recurrent neural networks trained on character-level language data, but make sure you have a relevant dataset.

0

I would also suggest character-level recurrent neural networks. A standard character-level RNN can only predict the next characters from the previous ones, so you should consider a bidirectional RNN. Say we have the text "xxx12345": if we feed it to the model, the model should be able to predict the first three characters from the last five, and using context from both directions is only possible with a bidirectional RNN.

koushik
  • 11
  • 1
0

I would suggest a sequence-to-sequence model with character-level features. It is an easy task, provided you have data.

Patel Sunil
  • 185
  • 1
  • 9
0

As Andreas has commented, this is a statistical language modeling problem (a statistical language model is a probability distribution over sequences of words). The important thing you need is a hash table that maps fixed-length chains of words to the words expected to follow them in your dictionary.
Things that can improve your predictions:

  • Add better and more words to your dictionary.
  • Use text expansion.
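
To make the hash-table idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the function names `build_table` and `predict_next`, the chain length of two words, and the toy corpus are all hypothetical and not taken from any particular library or paper.

```python
from collections import Counter, defaultdict


def build_table(corpus, order=2):
    """Map each chain of `order` consecutive words to a Counter of the words seen right after it."""
    table = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - order):
            chain = tuple(words[i:i + order])
            table[chain][words[i + order]] += 1
    return table


def predict_next(table, chain):
    """Return the most frequent follower of `chain`, or None if the chain was never seen."""
    followers = table.get(tuple(chain))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; a real one would be much larger.
corpus = [
    "your order code is ready",
    "your order code is ready",
    "your order code was lost",
]
table = build_table(corpus, order=2)

print(predict_next(table, ["order", "code"]))
print(predict_next(table, ["never", "seen"]))
```

A larger corpus gives the counters more evidence for each chain, which is one way to read the advice about adding more words to your dictionary.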

What you are looking for will also require a pinch of reinforcement learning. You need to figure out a way to penalize and reward the predictions, and then use the result in future predictions. Your case also requires you to build your own corpus, which is the hardest part; the better your corpus, the better the results.
This is the research paper that will help you a lot.

Ugnes
  • 2,003
  • 1
  • 15
  • 26