0

Suppose, I have the following data-set:

... ...
... ...
AABBB  7.027  5.338  5.335  8.122  5.537  6.408
ABBBA  5.338  5.335  5.659  5.537  5.241  7.043
BBBAA  5.335  5.659  6.954  5.241  8.470  8.474
BBAAA  5.659  6.954  5.954  8.470  9.266  9.334
BAACA  6.954  5.954  6.117  9.266  9.243 12.200
AABAA  5.954  6.117  6.180  9.243  8.688 11.842
ACAAA  6.117  6.180  5.393  8.688  5.073  7.722
ABAAC  6.180  5.393  6.795  5.073  8.719  7.854
BAACC  5.393  6.795  5.796  8.719  9.196  9.705
... ...
... ...

Apparently, the feature values represent a string pattern comprising of only three letters A, B, and C.

I have to design a neural network that would be able to detect these patterns and spit out a binary representation of these strings where the letters should be encoded in 3-bit binary(one-hot encoding).

My first question is, What kind of problem is it and why?

My next question is, How should I approach this problem to solve it?

user366312
  • 341
  • 1
  • 13

1 Answers1

3

If you're trying to predict the string pattern, given the numerical feature and assuming your string pattern is fixed sized, you can one-hot encode each letter then combine them (into an array that is no longer one-hot). So AABBC would look like:

[1,0,0,1,0,0,0,1,0,0,1,0,0,0,1] <- Use this for training
[A,B,C,A,B,C,A,B,C,A,B,C,A,B,C]
[A,_,_,A,_,_,_,B,_,_,B,_,_,_,C]
AABBC

where each group of triplets represent a single integer. Then you can train a network with cross-entropy. This is the problem formulation of multi-task learning where you predict multiple things simultaneously. Needless to say, it is classification.

Recessive
  • 1,446
  • 10
  • 21
meliksahturker
  • 346
  • 2
  • 8