Detect original sound waves

Question

I have just presented a project of mine on sound recognition using neural networks. During the presentation I said that I had decided to recognize only one sound (a musical note from a guitar) at any point in time, explaining that recognizing multiple simultaneous sounds is very hard or impossible.

I apply an FFT to the sound wave and feed the result into a neural network. My question is: if I were to record multiple sounds at once, couldn't the FFT data be exactly the same for different combinations of sounds?

I mean, if you combine $\sin(x)$ and $\sin(x+\pi)$ you get a wave that is a flat line at zero. You also get a flat line for $\sin(2x)$ and $\sin(2x + \pi)$. So two different sets of waves give the same combined wave, which will therefore have the same FFT data.
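The cancellation argument can be checked numerically. The sketch below uses plain Python with a direct DFT rather than a library FFT, and samples one period of $x$ at an arbitrarily chosen 64 points. It confirms that $\sin(x)+\sin(x+\pi)$ and $\sin(2x)+\sin(2x+\pi)$ produce the same samples (up to rounding) and therefore the same spectrum. Because the DFT is linear and invertible, two signals have identical spectra exactly when the signals themselves are identical. It also shows that a shift of $\pi/2$ does not cancel: $\sin(2x)+\sin(2x+\pi/2) = \sqrt{2}\,\sin(2x+\pi/4)$.

```python
import cmath
import math

N = 64  # samples taken over one period of sin(x): x = 2*pi*t/N (arbitrary choice)

def dft(x):
    """Direct discrete Fourier transform, O(N^2); fine for short signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def sampled(f):
    """Evaluate f(x) at x = 2*pi*t/N for t = 0..N-1."""
    return [f(2 * math.pi * t / N) for t in range(N)]

# Two different pairs of waves, each a sinusoid plus a copy shifted by pi
set_a = sampled(lambda x: math.sin(x) + math.sin(x + math.pi))
set_b = sampled(lambda x: math.sin(2 * x) + math.sin(2 * x + math.pi))
# A pi/2 shift does not cancel: the result is sqrt(2) * sin(2x + pi/4)
set_c = sampled(lambda x: math.sin(2 * x) + math.sin(2 * x + math.pi / 2))

spec_a, spec_b = dft(set_a), dft(set_b)
print(max(abs(a - b) for a, b in zip(set_a, set_b)))     # rounding error only
print(max(abs(p - q) for p, q in zip(spec_a, spec_b)))   # rounding error only
print(max(set_c))                                        # about 1.414, not 0
```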

But I got confused because the professor said that you can always decompose a sound into its original elements (e.g. take a chord made of 6 different notes and recover the individual notes) because the data is in the time domain. Can you really always decompose a sound wave into its components? What about the case above? He further explained that in the "space domain" you can have the problem I mentioned above, but not in the time domain?

PS: I don't know if this is the right Stack Exchange site to ask this question on.

score 1 · Answer 1 · edited Apr 13 '17 at 12:40

Most sounds (even the sound of a "single note") contain multiple frequencies. A clean pitched sound consists of the fundamental frequency plus its harmonics, but almost any "real" sound contains additional components: some due to the envelope of the sound (e.g. the fact that a string must be plucked and then decays), and some due to sampling (your sample is finite, so truncating the sound wave spreads each component into neighboring frequencies, an effect known as spectral leakage).
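To illustrate the truncation effect, the sketch below (plain Python, direct DFT; the 256-sample frame and the ±2-bin neighborhood are arbitrary choices) compares a sine that completes a whole number of cycles in the frame with one that completes 20.5 cycles. The first puts essentially all of its energy in a single bin; the second leaks energy into bins far from its true frequency. Tapering the frame with a Hann window, one standard mitigation, shrinks that far leakage considerably at the cost of a slightly wider peak.

```python
import cmath
import math

N = 256  # frame length in samples (arbitrary choice for the sketch)

def dft_mag(x):
    """Magnitudes |X[k]| for k = 0..N/2 via a direct O(N^2) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def tone(cycles):
    """Sine that completes `cycles` periods within the N-sample frame."""
    return [math.sin(2 * math.pi * cycles * t / N) for t in range(N)]

def hann(x):
    """Multiply a frame by a (periodic) Hann window to taper its edges."""
    n = len(x)
    return [s * (0.5 - 0.5 * math.cos(2 * math.pi * t / n)) for t, s in enumerate(x)]

def fraction_outside(mag, center, halfwidth=2):
    """Share of spectral energy lying more than `halfwidth` bins from `center`."""
    total = sum(m * m for m in mag)
    near = sum(mag[k] ** 2 for k in range(len(mag)) if abs(k - center) <= halfwidth)
    return (total - near) / total

on_bin = fraction_outside(dft_mag(tone(20.0)), 20)    # whole cycles: essentially no leakage
off_bin = fraction_outside(dft_mag(tone(20.5)), 20)   # cut off mid-cycle: visible leakage
off_bin_hann = fraction_outside(dft_mag(hann(tone(20.5))), 20)  # windowed: much less
print(f"{on_bin:.2e} {off_bin:.3f} {off_bin_hann:.4f}")
```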

When you do a Fourier transform, you will see all the frequency components. It is quite easy to detect the pitches of multiple strings struck at the same time in, say, a piano chord. Depending on your definition of "recognizing" different sounds, however, it may be tough to detect two notes that are sounded simultaneously but an octave apart, because every harmonic of the higher note coincides with an even harmonic of the lower one. The ear is remarkably good at picking up the difference: for example, we can tell that the higher note is not exactly in phase, and that the amplitude of the second harmonic (the fundamental of the higher note) is higher than it would be if it were just the second harmonic of the lower note.
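A deliberately naive way to turn a spectrum into a list of pitches is sketched below: synthesize a two-note chord, find the spectral peaks, then repeatedly take the lowest remaining peak as a fundamental and discard its integer multiples. This is a crude harmonic sieve for illustration, not a production pitch tracker. Everything in it is an assumption: the synthetic tones with three harmonics of halving amplitude, the 4000 Hz sample rate, the 0.5 s frame, the 10% peak threshold, and the use of 330 Hz rather than the equal-tempered E4 (about 329.63 Hz) so that every component falls exactly on a DFT bin.

```python
import cmath
import math

FS = 4000  # sample rate in Hz (assumed)
N = 2000   # frame length (0.5 s), so the bin spacing is FS / N = 2 Hz

def note(f0, n_harmonics=3):
    """Synthetic tone: fundamental plus harmonics, each half the previous amplitude."""
    return [sum((0.5 ** h) * math.sin(2 * math.pi * f0 * (h + 1) * t / FS)
                for h in range(n_harmonics))
            for t in range(N)]

def magnitude(x, max_hz):
    """|DFT| for bins from 0 Hz up to max_hz (direct sum; slow but simple)."""
    n = len(x)
    k_max = int(max_hz * n / FS)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(k_max + 1)]

def peaks(mag, rel_threshold=0.1):
    """Frequencies (Hz) of local maxima above rel_threshold * max(mag)."""
    floor = rel_threshold * max(mag)
    return [k * FS / N for k in range(1, len(mag) - 1)
            if mag[k] > floor and mag[k] >= mag[k - 1] and mag[k] >= mag[k + 1]]

def greedy_pitches(peak_hz, tol_hz=3.0):
    """Naive sieve: the lowest remaining peak is a fundamental; drop its multiples."""
    remaining = sorted(peak_hz)
    found = []
    while remaining:
        f0 = remaining[0]
        found.append(f0)
        remaining = [f for f in remaining
                     if abs(f / f0 - round(f / f0)) * f0 > tol_hz]
    return found

chord = [a + b for a, b in zip(note(220.0), note(330.0))]
spectrum_peaks = peaks(magnitude(chord, 1100))
print(spectrum_peaks)                  # [220.0, 330.0, 440.0, 660.0, 990.0]
print(greedy_pitches(spectrum_peaks))  # [220.0, 330.0]
```

The 660 Hz peak is shared by both notes (third harmonic of 220 Hz, second harmonic of 330 Hz). If the second note is replaced with `note(440.0)`, an octave above the first, the sieve returns only `[220.0]`: every component of the higher note is absorbed as a harmonic of the lower one, which is the octave difficulty described above.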

Of course, if you play two notes of the same frequency and phase, you will not be able to tell them apart (except, perhaps, by the fact that the amplitude will be larger than you would expect from a "single note"). But "real" sounds of musical instruments have sufficiently distinctive signatures that you can "see" them.
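The same-frequency case can be made precise with the sum-to-product identity. For two sinusoids of equal amplitude and angular frequency $\omega$ with a phase difference $\varphi$:

$$\sin(\omega t) + \sin(\omega t + \varphi) = 2\cos\!\left(\frac{\varphi}{2}\right)\sin\!\left(\omega t + \frac{\varphi}{2}\right)$$

The sum is always a single sinusoid at the same frequency, so the spectrum shows one component whose amplitude $2\left|\cos(\varphi/2)\right|$ depends only on the relative phase: $\varphi = 0$ gives twice the amplitude of one note, and $\varphi = \pi$ gives exact cancellation, as in the $\sin(x) + \sin(x+\pi)$ example in the question.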

You might be interested in an answer I wrote a while ago in which I used a simple iPhone app to analyze the spectrum of the sound of a coin being dropped. It shows that different sounds have different sets of harmonics, and that by looking at the combined spectrum of two such sounds you can most likely see both contributions and tell them apart. It would actually be a fun application of your work to detect "how much money did I drop?".
