Why is the short-time Fourier transform used for preprocessing audio samples?

Question

I've been told this is how I should be preprocessing audio samples, but what information does this method actually give me? What are the alternatives, and why shouldn't I use them?

score 2 · Accepted Answer · answered Oct 03 '18 at 05:39

The Fourier transform is used to convert audio data into a representation that exposes more information (features): namely, which frequencies are present in the signal.

For example, raw audio data is usually represented by a one-dimensional array, x[n], of length n (the number of samples). x[i] is the amplitude value of the i-th sample point.

Using the short-time Fourier transform, your audio data is represented as a two-dimensional array. Now x[i] is not a single amplitude value, but a list of the frequency components that make up the signal in the i-th frame (a frame consists of a few consecutive samples).
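To make the change of shape concrete, here is a minimal sketch using only NumPy. The frame length (256), hop size (128), Hann window, 8 kHz sample rate and 440 Hz test tone are all my own assumptions for illustration, and `stft_magnitude` is a hypothetical helper, not a library function. In practice you would probably reach for an existing STFT implementation instead.

```python
import numpy as np

def stft_magnitude(x, frame_len=256, hop=128):
    """Split x into overlapping frames and return the magnitude spectrum of each.

    The result has shape (num_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([
        x[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real-valued signal
    return np.abs(np.fft.rfft(frames, axis=1))

# Assumed test signal: one second of a 440 Hz sine sampled at 8 kHz
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
x = np.sin(2 * np.pi * 440 * t)      # one-dimensional: x[i] is an amplitude

spec = stft_magnitude(x)             # two-dimensional: spec[i] is a spectrum
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)
peak_hz = freqs[spec[0].argmax()]    # strongest frequency bin in the first frame

print(x.shape, spec.shape, peak_hz)
```

Here `x` has one axis (time), while `spec` has two (frame index and frequency bin). The strongest bin in each frame should land close to 440 Hz, within the bin spacing of `sample_rate / frame_len`.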

See the image below (from Wikipedia): the red graph shows the original n samples before the transform, and the blue graph shows the transformed values for one frame.
