I have just presented a project of mine on sound recognition using neural networks. During the presentation I said that I had decided to recognize only one sound (a musical note coming from a guitar) at any point in time, explaining that recognizing multiple simultaneous sounds is very hard or impossible.
I apply an FFT to the sound wave and feed the result into a neural network. My question is: if I were to record multiple sounds at once, couldn't the FFT data be exactly the same for different combinations of sounds?
I mean, if you combine $\sin(x)$ and $\sin(x+\pi)$ you get a wave that is a straight line (they cancel to zero). You also get a straight line for $\sin(2x)$ and $\sin(2x + \pi)$. So two different sets of waves give the same combined wave, which will have the same FFT data.
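To make the cancellation concrete, here is a minimal NumPy sketch of what I mean. The sample count and variable names are just assumptions for illustration, and both pairs use a phase shift of $\pi$, since that is the shift that cancels exactly:

```python
import numpy as np

# Assumed setup: 1024 samples covering one period of sin(x).
n = 1024
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)

# Two different pairs of components; within each pair the second
# wave is the first one shifted by pi, so the pair sums to zero.
mix_a = np.sin(x) + np.sin(x + np.pi)
mix_b = np.sin(2 * x) + np.sin(2 * x + np.pi)

# Magnitude spectra of the two combined waves.
spec_a = np.abs(np.fft.rfft(mix_a))
spec_b = np.abs(np.fft.rfft(mix_b))

# Both sums are zero up to rounding error, so their spectra match.
print(np.allclose(mix_a, 0.0), np.allclose(mix_b, 0.0))
print(np.allclose(spec_a, spec_b))

# For comparison, a single component on its own has a clear peak
# at its own frequency bin (2 cycles over the window).
peak_bin = int(np.argmax(np.abs(np.fft.rfft(np.sin(2 * x)))))
print(peak_bin)
```

Both combined waves come out as zero, so from the FFT alone I cannot tell which pair was played, even though each component by itself is easy to identify.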
But I got confused because the professor said that you can always decompose a sound into its original elements (e.g. take a chord made of 6 different notes and recover each note individually) because the data is in the time domain. Can you really always decompose a sound wave into its components? What about the case above? He further explained that in the "space-domain" you can have the problem I mentioned above, but not in the time domain?
PS: I don't know if this is the right Stack Exchange site to ask this question on.