Why does audio signal amplitude always fall off at higher frequencies?

Question

In the frequency spectrum of every real audio sample that I've ever seen, the amplitude of the frequency components is always higher at low frequencies, then rapidly falls off at higher frequencies.

For example, each of the following plots displays the median amplitude vs. frequency with a $\log_{10}$ amplitude axis (Y) and a $\log_{2}$ frequency axis (X). The values for each were computed with a series of FFTs over the entire sample in blocks of 8,192 samples (the amplitudes are calculated as the magnitudes of the complex results):
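For concreteness, the block-wise procedure described above can be sketched roughly as follows. This is a minimal illustration using NumPy, not the exact script behind the plots: the function name `median_spectrum`, the absence of a window function, the 48 kHz sample rate, and the synthetic 1 kHz test tone are all assumptions made for the example.

```python
import numpy as np

def median_spectrum(samples, sample_rate, block_size=8192):
    """Median FFT magnitude per frequency bin over consecutive blocks."""
    samples = np.asarray(samples, dtype=float)
    n_blocks = len(samples) // block_size
    if n_blocks == 0:
        raise ValueError("need at least one full block of samples")
    # Drop the trailing partial block and reshape to (n_blocks, block_size).
    blocks = samples[: n_blocks * block_size].reshape(n_blocks, block_size)
    # Magnitudes of the complex FFT results (no window applied: an assumption).
    magnitudes = np.abs(np.fft.rfft(blocks, axis=1))
    freqs = np.fft.rfftfreq(block_size, d=1.0 / sample_rate)
    return freqs, np.median(magnitudes, axis=0)

# Synthetic stand-in for a recording: a 1 kHz tone sampled at 48 kHz.
sample_rate = 48_000
t = np.arange(10 * 8192) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)

freqs, med = median_spectrum(tone, sample_rate)
peak_freq = freqs[np.argmax(med)]
print(f"peak near {peak_freq:.1f} Hz")
```

Plotting `np.log10(med[1:])` against `np.log2(freqs[1:])` (skipping the DC bin) would give axes like those in the plots.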

[Figure: four plots of median amplitude vs. frequency]

Recorded Audio (90 minutes of city traffic): recorded with a calibrated flat-response signal analysis mic.

Recorded Audio (30 minutes of city traffic): recorded with an uncalibrated flat-response signal analysis mic.

Television Audio: mostly vocals, presumably mastered for production, encoded as lossy AAC.

Classical Music: presumably mastered for production, encoded as lossless FLAC from source.

Note that in each plot, there is a steep fall-off of signal amplitude (remember these are logarithmic axes) as the frequency increases.

Why is this the case?

Does it have something to do with properties of sound in air? Or is it somehow related to a connection between power, amplitude, and frequency? Or is it just some consequence of DFTs that I don't understand? I see it consistently, all the time.

Also, is there predictable math behind the falloff that I can use to "normalize" the results, i.e., flatten the curves for analysis purposes?

I know it's not just a result of production mastering because it appears in unmodified signals. I know it's not just a characteristic of the city noise I recorded because I see it regardless of the sound source (I've recorded ambient sounds in nature that also show the same profile). I know it's not just wind noise (except perhaps the very bottom end) because I see it in studio recordings as well.

Interestingly, in the two charts on top (signals I recorded myself with calibrated flat-response signal analysis microphones, in a gain range with minimal distortion and with no further filtering applied), the falloff seems linear in the $\log_{10}$ amplitude and $\log_2$ frequency space, but I don't know whether this is a hint to what's going on.
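A straight line in this log-log space would correspond to a power law, $A \propto f^{k}$, so one way to test the observation is to fit a line and read off the exponent. The sketch below is only an illustration under that assumption: the helper names `fit_power_law` and `flatten` are hypothetical, and the data is a synthetic $5/f$ spectrum rather than any of the recordings.

```python
import math

def fit_power_law(freqs, amps):
    """Least-squares line through (log2 f, log10 A); returns (slope, intercept)."""
    xs = [math.log2(f) for f in freqs]
    ys = [math.log10(a) for a in amps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flatten(freqs, amps, slope, intercept):
    """Divide each amplitude by the fitted trend so the curve sits near 1."""
    return [a / 10 ** (slope * math.log2(f) + intercept)
            for f, a in zip(freqs, amps)]

# Synthetic spectrum (an assumption for illustration): A = 5 / f.
freqs = [50.0 * 2 ** k for k in range(8)]
amps = [5.0 / f for f in freqs]

slope, intercept = fit_power_law(freqs, amps)
# Convert the slope (log10 A per octave) into the exponent k in A ~ f**k.
exponent = slope / math.log10(2)
flat = flatten(freqs, amps, slope, intercept)
print(f"exponent = {exponent:.3f}")
```

On real data the fit would only be meaningful over the frequency range where the curve actually looks straight.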

Note: The mics used for the recorded audio are electret mics with a fairly flat response up to about 24 kHz, with blips in the response compensated for by a post-recording filter, available from the manufacturer, specific to each mic's serial number. They're designed for signal analysis rather than general recording. I've got the response graphs lying around somewhere; I'll scan them if I find them, but it seems like this doesn't affect the answers. In my experience they pick up highs (bird calls, mechanical squeaks, electronic coil ring) with good accuracy.

A. P. · Accepted Answer · 2023-02-19T09:55:58.270

In addition to the answer by Bulbasaur, it is important to highlight that you are looking at the amplitude $A$ of the frequency components, not their power $P$. The relation between them is^[1] $$P(\omega) = \frac{1}{2} \mu v \omega^2 A^2,$$ where $\mu$ is the mass density of the medium (e.g. air), $v$ is the speed of sound in that medium and $\omega$ is the angular frequency ($2\pi$ times the frequency) of the wave. If a source emitted sound at all frequencies with the same power $P(\omega) = P$, the amplitude would drop as $A \sim \frac{1}{\omega}$. One can see this by solving the above equation for $A$: $$A = \frac{1}{\omega} \sqrt{\frac{2 P}{\mu v}}$$

Note that in this 1D example the mass density is a linear density with units of $[\mu] = \frac{\text{kg}}{\text{m}}$. Together with the units of the other quantities, $[v] = \frac{\text{m}}{\text{s}}$, $[\omega] = \frac{1}{\text{s}}$ and $[A] = \text{m}$, the power has units of $[P] = \frac{\text{kg}}{\text{m}} \frac{\text{m}}{\text{s}} \frac{\text{m}^2}{\text{s}^2} = \left( \text{kg} \frac{\text{m}^2}{\text{s}^2} \right) / \text{s} = \frac{\text{J}}{\text{s}}$.

In 3D, the mass density would have units of $[\rho] = \frac{\text{kg}}{\text{m}^3}$ and the above equation calculates the sound intensity $I(\omega)$, i.e. power per area $[I] = \frac{\text{J}}{\text{s} \cdot \text{m}^2}$.
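As a quick numerical illustration of the $A \sim \frac{1}{\omega}$ behaviour, the sketch below evaluates the solved equation at a few frequencies with the power held fixed. The function name `amplitude_for_power` and the values of `mu`, `v` and `power` are placeholders chosen for the example, not measured quantities.

```python
import math

def amplitude_for_power(power, omega, mu, v):
    """Solve P = 0.5 * mu * v * omega**2 * A**2 for the amplitude A."""
    return math.sqrt(2.0 * power / (mu * v)) / omega

# Placeholder values (assumptions for illustration only).
mu = 1.0       # linear mass density, kg/m
v = 343.0      # wave speed, m/s
power = 1.0    # same power at every frequency, J/s

freqs_hz = [100.0, 200.0, 400.0, 800.0]
amps = [amplitude_for_power(power, 2 * math.pi * f, mu, v) for f in freqs_hz]

# Doubling the frequency halves the amplitude: about -6 dB per octave.
db_per_octave = 20 * math.log10(amps[1] / amps[0])
print(f"{db_per_octave:.2f} dB per octave")
```

On the question's axes ($\log_{10}$ amplitude against $\log_2$ frequency) this constant-power case would appear as a straight line with slope $-\log_{10} 2 \approx -0.30$ per octave.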

score 4 · Answer 2 · answered Feb 17 '23 at 10:05

I think the answer to the question is made up of several points:

High frequency sound waves experience more attenuation during their propagation (generally speaking).
Typical consumer microphones often pick up higher frequency sound waves with a lower amplitude compared to low frequency sound waves due to their frequency response function.
"Naturally" occurring sound sources such as human speech, dog barking, etc. often have a spectrum whose amplitude decreases with increasing frequency.

However, there are also many sound sources in nature that you probably have not yet looked at, such as the noises produced by insects, which have very little amplitude in the low-frequency range.

score 3 · Answer 3 · answered Feb 17 '23 at 16:17

As Bulbasaur points out, there are many different levels at which one could think about this question. For example, audio samples that were deliberately produced for humans to listen to will naturally fall off quickly at frequencies above the range of human hearing, since there's no need to produce any sound at those frequencies. Moreover, the microphones that are recording the data are probably calibrated to have the highest sensitivity within the human auditory range. And atmospheric properties limit the propagation through air of sound with very high frequency.

But I think that the most fundamental answer to your questions comes from Plancherel's theorem. The instantaneous power of an audio signal $A(t)$ is proportional to the square of its amplitude. On physical grounds, it's reasonable that the total energy contained in the signal $$E \propto \int_{-\infty}^\infty |A(t)|^2\, dt$$ must be finite. But Plancherel's theorem gives that $E$ is also proportional to the frequency-space expression $\int_{-\infty}^\infty |\hat{A}(f)|^2\, df$, where $\hat{A}(f)$ is the Fourier transform of $A(t)$. In order for this integral to converge, $\hat{A}(f)$ must fall off to zero as $f \to \infty$. In fact, if the falloff follows a power law, it must be faster than $f^{-\frac{1}{2}}$, because $\int^\infty f^{-2p}\, df$ converges only for $p > \frac{1}{2}$.
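The discrete analogue of Plancherel's theorem (Parseval's relation for the DFT) is easy to check numerically: $\sum_n |x_n|^2 = \frac{1}{N} \sum_k |X_k|^2$. The sketch below uses a deliberately naive DFT and an arbitrary test signal, both chosen only for illustration.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (unnormalized)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Arbitrary test signal (an assumption): a Gaussian-windowed cosine burst.
n = 64
signal = [math.exp(-((t - 32) / 6.0) ** 2) * math.cos(0.8 * t) for t in range(n)]
spectrum = dft(signal)

# Energy computed in the time domain and in the frequency domain.
energy_time = sum(s * s for s in signal)
energy_freq = sum(abs(c) ** 2 for c in spectrum) / n
print(energy_time, energy_freq)
```

The two energies agree to floating-point precision, which is the finite-length counterpart of the statement that a finite-energy signal must have a square-integrable spectrum.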
