Can we quantify the pitch of a sound that is a mixture of many frequencies?

Question

How is the pitch of a sound defined quantitatively when it is a mixture of many frequencies? For example, the sound emitted by a plucked guitar string, or say, the pitch of somebody's (normal) voice. I know that female voices are generally of higher pitch. But can we quantify the notion of pitch? My guess is that the pitch of a complex sound depends on the relative amplitudes of the different frequency components.

Nuclear Hoagie · Answer 1 · 2022-06-29T15:38:00.997

Pitch can be described as a subjective perception of an auditory stimulus which cannot be objectively, unambiguously quantified. It is strongly related to the objective physical property of frequency such that higher frequencies typically correspond to higher perceived pitch, but even notes with identical frequency can be perceived as having different pitches, depending on how loud they are, what other frequencies are played at the same time or in close proximity, and other factors. A pure sine wave is usually fairly easily mapped to its perceived pitch, but more complex sounds may not be.

A mixture of many frequencies may have a dominant frequency which is perceived as the overall pitch of the sound, or the many frequencies may mix in a way that is not perceived as any particular pitch at all. The sound of a snare drum, for example, is not usually perceived as having any particular pitch, but almost everyone would agree that a snare drum has a higher pitch than a bass drum. The bass drum resonates more strongly with lower frequencies than the snare drum does, but neither sound is very well described as having a particular pitch.

You may be interested in auditory illusions like the Shepard–Risset glissando, which is perceived as having ever-ascending pitch, despite the fact that the frequencies remain within a fixed window. We can objectively quantify the underlying frequencies, but it is not always simple to map a waveform to the subjective experience of pitch. As pointed out by @march, it's even possible to perceive a pitch from a complex tone when the corresponding frequency is damped or even completely absent from the waveform. The pitch produced by a timpano, for example, is implied by the harmonics it produces, with the fundamental frequency resonating much more weakly than higher frequencies: the perceived pitch of the drum is not simply the frequency with the highest amplitude. Practical applications of this effect can be seen in the design of some sound systems that have small speakers. By using a particular combination of higher frequencies, such a system lets the listener experience low pitches that the speaker is not even physically capable of producing!

Thomas Blankenhorn · Accepted Answer · 2022-07-03T17:54:13.363

Short Answer:

Use the lowest of your frequencies, also known as the fundamental, to represent the pitch.

Full Answer:

First off, it seems worth noting that many sounds that consist of many frequencies don't have a pitch at all. For example, consider the sound of a piano chord or the hiss of a radio with bad reception. In cases like these, you do have many frequencies, but you do not have a pitch. There is too much going on with the sound to represent it with a single number.

Now let's move on to sounds where you can identify a pitch, as in the examples you gave. In a plucked guitar string or a human voice, the sound would consist of a fundamental and its harmonics. The frequencies of the harmonics would be integer multiples of the fundamental's. And the amplitudes of the harmonics, relative to the fundamental's, collectively make up the timbre. They are why the same middle-C sounds different when played by a guitar than when sung by a human.

For sounds like these, the one number describing their pitch is the frequency of the fundamental. That's the number you're looking for.

A Postscript On "The Missing Fundamental"

In a comment, Michael Seifert pointed out the curious case of the missing fundamental. It is possible to synthesize a sound that contains only the overtones but omits the fundamental. (The Wikipedia page on the phenomenon also mentions some naturally-occurring sounds where the fundamental is greatly attenuated.)

When that happens, an auditory illusion in the human brain can make a human hear a fundamental even though it is not in the spectrum. Here is a YouTube video showcasing the phenomenon.

When the fundamental is missing, the perceived pitch is often, but not always, the greatest common divisor of the overtone frequencies. Informally, this is what the fundamental would have been had it not been missing.
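As a small illustration of the common-divisor rule above, here is a minimal Python sketch. The function name `implied_fundamental` and the example frequencies are made up for illustration, and the sketch assumes exact integer frequencies in Hz. Measured overtones are never exact, so a real estimator would need some tolerance, and as noted above the rule only often matches what listeners report.

```python
from functools import reduce
from math import gcd

def implied_fundamental(overtones_hz):
    """Greatest common divisor of a list of integer frequencies in Hz."""
    return reduce(gcd, overtones_hz)

# Hypothetical overtone sets with the fundamental left out.
print(implied_fundamental([400, 600, 800]))   # 200
print(implied_fundamental([600, 900, 1200]))  # 300
```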

DKNguyen · Answer 3 · 2022-06-28T20:05:32.130

The technical, quantitative definition of pitch only applies to "single sounds" such as music notes. Here, the pitch is the fundamental frequency.

However, when applied collectively to "multiple sounds", such as a voice or all the notes an instrument is capable of, the meaning is only qualitative. In this context, the more correct term for a quantified "pitch" would be the "range", which is self-explanatory, whereas "pitch" refers to a vague qualitative sense of what is dominant in the range.

If you want quantitative metrics more specific than the range then you'll probably need to make one up and just make the metric known.

For example, you could narrow the aforementioned range down a bit to the band where more than 50% of the time-averaged power is contained within some standard deviation during "normal sound production" (i.e. no falsetto).

If you're after a single number, then perhaps use the frequency component that carries the largest time-averaged power during normal sound production. You may also need to weight it by the sensitivity of the human ear.
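A minimal sketch of the "component with the largest power" metric, assuming a short, steady signal so that one analysis window stands in for the time average. The helper names, the 4000 Hz sample rate, and the test tone are all assumptions chosen for illustration; the DFT is written out naively rather than with an FFT library, and no weighting for ear sensitivity is applied.

```python
import cmath
import math

def power_spectrum(samples, sample_rate):
    """Naive O(N^2) DFT; returns (frequency_hz, power) pairs below Nyquist."""
    n = len(samples)
    pairs = []
    for k in range(n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        pairs.append((k * sample_rate / n, abs(coeff) ** 2))
    return pairs

def dominant_frequency(samples, sample_rate):
    """Frequency of the bin carrying the most power."""
    return max(power_spectrum(samples, sample_rate), key=lambda pair: pair[1])[0]

SAMPLE_RATE = 4000  # Hz (assumed)
# Hypothetical tone: a 200 Hz fundamental that is weaker than its 2nd harmonic.
partials = {200: 0.3, 400: 1.0, 600: 0.5}  # frequency in Hz -> amplitude
signal = [
    sum(a * math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f, a in partials.items())
    for i in range(400)
]
print(dominant_frequency(signal, SAMPLE_RATE))  # 400.0
```

For this tone the metric reports 400 Hz even though the waveform repeats 200 times per second, so a made-up metric like this one should be stated explicitly whenever it is used.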

score 4 · Answer 4 · answered Jun 28 '22 at 19:39

In physics, high pitch translates to high frequencies and low pitch to low frequencies in the sound's frequency spectrum. For example, in the spectrum chart below:

The signal has most of its power concentrated around 300 kHz. A spectrum like this can be obtained for any signal, voice or otherwise, by applying a Fourier transform to it. As for your question: the range of human hearing is roughly 20 Hz to 20 kHz, although the speaking range is much lower, about 165–255 Hz for female voices and 85–155 Hz for male voices. This will hopefully give a hint toward what you are looking for. For the frequencies of musical notes, see: https://en.wikipedia.org/wiki/Scientific_pitch_notation
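To connect a frequency in Hz to a note name in scientific pitch notation, here is a minimal sketch. It assumes twelve-tone equal temperament with A4 tuned to 440 Hz, and it borrows the MIDI note-numbering convention (A4 = 69, C4 = 60) purely as a bookkeeping device; the function name `nearest_note` is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Name of the equal-tempered note closest to freq_hz (assumes A4 = a4_hz)."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / a4_hz))
    midi = 69 + semitones_from_a4          # MIDI convention: A4 is note 69
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

print(nearest_note(440.0))   # A4
print(nearest_note(261.63))  # C4
print(nearest_note(110.0))   # A2
```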

leftaroundabout · Answer 5 · 2022-06-29T15:20:07.420

Forget about Fourier series for a while. Pitch isn't really about frequencies of sinusoidal components. It's just how often the signal's oscillation repeats per unit time. We have a signal $u(t)$, and we're interested in its periodicity, i.e. the constant $\tau$ such that $$u(t+\tau) = u(t)\quad \forall t.$$ The frequency, or pitch, of $u$ is then simply the reciprocal: $\nu = \tfrac1\tau$.

There are at least three problems with that:

In the real world, no signal is truly periodic. There will always be at least small perturbations: literally noise, but also effects like amplitude decay over time or similar. So what we should rather look for is $u(t+\tau) = u(t)+\varepsilon\ \forall t$, for some suitably small $\varepsilon$.
$\tau$ is not unique. In particular, a signal that repeats after time $\tau$ also repeats after time $2\times\tau$ etc. The $\varepsilon$ allowance makes it even worse, since a continuous signal will always change only by a small amount in sufficiently short time.
In reality we also can't have infinitely long signals. Right at the start of a guitar note it isn't periodic at all, rather you have a transient.

But still: for signals like voice or flute or whatever, we actually do have periodicity over a substantial time (on the order of a second) with hundreds of complete oscillations that are to a good approximation the same. So pitch as repeat-frequency is a sensible notion. In practice, to determine $\tau$ one uses the autocorrelation of the signal.

Again, none of this relies on Fourier decomposition, although typical implementations of autocorrelation do use a fast Fourier transform under the hood because that's computationally more efficient than directly carrying out the integration in the time domain.

For most signals, e.g. those of musical instruments, the periodicity-frequency happens to equal the lowest strong Fourier partial, aka the fundamental, which in many cases is also the one with the strongest amplitude. But this is by no means universal: in fact it is possible to completely remove the fundamental while changing neither the autocorrelation-periodicity nor the human-perceived pitch. It only changes the timbre of the sound.
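The claim that removing the fundamental leaves the autocorrelation periodicity unchanged can be checked with a minimal sketch. Everything here is an assumption made for illustration: the function names, the 8000 Hz sample rate, the equal-amplitude partials, and the 80–1000 Hz search range. The autocorrelation is summed directly in the time domain rather than via an FFT, which is slower but keeps the example self-contained.

```python
import math

def synthesize(partials_hz, sample_rate=8000, n_samples=800):
    """Sum of equal-amplitude sine waves at the given frequencies."""
    return [
        sum(math.sin(2 * math.pi * f * i / sample_rate) for f in partials_hz)
        for i in range(n_samples)
    ]

def autocorr_pitch(samples, sample_rate, min_hz=80, max_hz=1000):
    """Estimate the repeat frequency from the strongest autocorrelation lag."""
    min_lag = int(sample_rate / max_hz)  # skip tiny lags, which always correlate
    max_lag = int(sample_rate / min_hz)

    def autocorr(lag):
        # Unnormalized sum: longer lags have fewer terms, which favours
        # the shortest period tau over its multiples 2*tau, 3*tau, ...
        return sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag

with_fundamental = synthesize([200, 400, 600, 800])
without_fundamental = synthesize([400, 600, 800])
print(autocorr_pitch(with_fundamental, 8000))     # 200.0
print(autocorr_pitch(without_fundamental, 8000))  # 200.0
```

The lower bound on the lag and the unnormalized sum are crude ways of handling the non-uniqueness of $\tau$ mentioned above; a real recording, with noise and transients, would need a more careful peak-picking rule.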

score 1 · Answer 6 · answered Jun 29 '22 at 05:29

When a pitched sound consists of multiple frequencies that are harmonics of a fundamental frequency, the quality this mixture gives the sound is called timbre. It is the timbre of a note that gives the difference in tone between a guitar playing an A at 440 Hz and a trumpet playing an A at 440 Hz. Since harmonics are integer multiples of the fundamental frequency, typically at lower amplitudes, adding them to the fundamental changes the shape of the waveform without altering the frequency of the fundamental, so the pitch is retained. It is possible to quantify harmonic content (number and relative intensity).

In addition to harmonic content, timbre also includes the attack-decay envelope of the note, and vibrato/tremolo. So a full quantification of timbre is difficult.
