Music Theory
Introduction
Humans perceive vibrations against their eardrums as sound. Most commonly, air is the vibrating medium that comes into contact with the eardrums. The rate of the vibration, also known as its frequency, determines the pitch of the sound; faster vibrations have higher pitch. Frequency is measured in hertz (Hz), meaning oscillations per second. The human range of hearing is from around 20 Hz (very low pitch, e.g., distant thunder) to 20,000 Hz (very high pitch, e.g., mosquito buzz). Telephone systems traditionally carry only about 300 to 3,400 Hz, a band that is enough to keep human speech intelligible.
Vibrating Strings
Many instruments (e.g., piano, violin, guitar) produce sound by vibrating strings. Let us calculate the sound frequencies that a vibrating string produces.
Imagine a string drawn taut horizontally between two fixed points. Let \(L\) be the length of the string. Let \(T\) be the tension in the string, i.e., the force pulling on the string in both directions. Let \(\mu\) be the linear mass density of the string, i.e., its mass per unit length. Let \(y(x, t)\) be the vertical displacement of the string at horizontal position \(x\) and time \(t\).
Consider a small section of the string at \(x\) with length \(dx\). The vertical force applied to the section from the left is equal to \(-T\sin(\theta)\), where \(\theta\) is the angle the string makes with the horizontal at \(x\). Assuming the vibrations are small, so that the string stays nearly horizontal, \(\sin(\theta) \approx \tan(\theta) = \frac{\partial y(x, t)}{\partial x}\), the slope of the string at \(x\). A similar argument applies to the force from the right, so the total vertical force on the section is \(T(\frac{\partial y(x + dx, t)}{\partial x} - \frac{\partial y(x, t)}{\partial x})\).
We apply Newton's second law \(F = ma\), substituting \(m = \mu dx\) and \(a = \frac{\partial^2 y}{\partial t^2}\), to arrive at the equation:
\[ \mu dx \frac{\partial^2 y}{\partial t^2} = T(\frac{\partial y(x + dx, t)}{\partial x} - \frac{\partial y(x, t)}{\partial x}) \]
Observe that for small \(dx\), \(\frac{\partial y(x + dx, t)}{\partial x} - \frac{\partial y(x,t)}{\partial x} \approx \frac{\partial^2 y(x, t)}{\partial x^2} dx\). Dividing both sides by \(\mu dx\), the above equation simplifies to:
\[ \frac{\partial^2 y}{\partial t^2} = \frac{T}{\mu}\frac{\partial^2 y}{\partial x^2} \]
This is a second-order PDE (the wave equation), with boundary conditions \(y(0, t) = y(L, t) = 0\) because the ends of the string are fixed. Solutions that satisfy these boundary conditions have the form \( y = \sin(\frac{n\pi}{L}x)\cos({\omega_n t})\), where \(\omega_n = \frac{n\pi}{L}\sqrt{\frac{T}{\mu}}\) and \(n\) is a positive integer. The sine term describes the shape of the vibration along the string, and the cosine term describes how that shape oscillates in time, so the frequency of the oscillations is:
\[f_n = \frac{\omega_n}{2\pi} = \frac{n}{2L}\sqrt{\frac{T}{\mu}}\]
The above equation gives the "natural" frequencies of the string, i.e., the frequencies at which it vibrates when you apply some initial displacement (e.g., by plucking it) and then apply no external force afterwards. The equation implies that the frequency increases with tension and decreases with length and density. This makes sense intuitively: if you pull harder on a string, it will make a higher-pitched sound. If you use a longer or heavier string, it will make a deeper sound.
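The scaling behaviour of the frequency formula can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function name `string_frequency` and the string parameters are made-up illustrative values, not measurements of a real instrument.

```python
import math

def string_frequency(n, length_m, tension_n, density_kg_per_m):
    """Frequency in Hz of mode n: f_n = n / (2 L) * sqrt(T / mu)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    wave_speed = math.sqrt(tension_n / density_kg_per_m)  # sqrt(T / mu), in m/s
    return n * wave_speed / (2.0 * length_m)

# Assumed demo values: 0.65 m string, 70 N tension, 1 g per metre.
base = string_frequency(1, 0.65, 70.0, 0.001)
longer = string_frequency(1, 1.30, 70.0, 0.001)    # double the length
tighter = string_frequency(1, 0.65, 280.0, 0.001)  # four times the tension
heavier = string_frequency(1, 0.65, 70.0, 0.004)   # four times the density

print(longer / base)   # doubling L halves the frequency
print(tighter / base)  # quadrupling T doubles the frequency
print(heavier / base)  # quadrupling mu halves the frequency
```

Because tension and density sit under a square root, they have to change by a factor of four to shift the pitch by the same amount as a factor-of-two change in length.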
Note that there is a term \(n\) in the frequency equation. A vibrating string actually vibrates at many frequencies at once, namely the integer multiples of \(\frac{1}{2L}\sqrt{\frac{T}{\mu}}\). That lowest frequency (\(n = 1\)) is called the fundamental frequency and determines the perceived pitch. The frequencies corresponding to \(n \ge 2\) are called the higher harmonics, or overtones. In practice, only the lower harmonic frequencies are audible.
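To make the role of \(n\) concrete, here is a small sketch that evaluates the standing-wave solution \(y = \sin(\frac{n\pi}{L}x)\cos(\omega_n t)\) directly. The helper names (`mode_displacement`, `interior_nodes`) and the demo values for \(L\) and \(\sqrt{T/\mu}\) are assumptions chosen for illustration.

```python
import math

def mode_displacement(x, t, n, length, wave_speed):
    """Unit-amplitude standing wave: sin(n*pi*x/L) * cos(omega_n * t)."""
    omega_n = n * math.pi * wave_speed / length
    return math.sin(n * math.pi * x / length) * math.cos(omega_n * t)

def interior_nodes(n, length):
    """Points strictly inside the string that stay at rest in mode n."""
    return [k * length / n for k in range(1, n)]

# Assumed demo values: a 1 m string with sqrt(T / mu) = 200 m/s.
L_DEMO, C_DEMO = 1.0, 200.0

for n in (1, 2, 3):
    period = 2.0 * L_DEMO / (n * C_DEMO)  # 1 / f_n
    print(n, "nodes at", interior_nodes(n, L_DEMO), "period", period, "s")

# Both ends stay fixed at every time, matching the boundary conditions.
print(mode_displacement(0.0, 0.3, 3, L_DEMO, C_DEMO))
print(mode_displacement(L_DEMO, 0.3, 3, L_DEMO, C_DEMO))
```

Mode \(n\) has \(n - 1\) stationary points between the two ends and repeats \(n\) times as fast as the fundamental.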
Here is a widget to show what the vibrations look like as functions of \(n\) and \(\sqrt{\frac{T}{\mu}}\) (the amplitudes are very exaggerated):
As a side note, a vibrating string does not move much air on its own, so instruments have sound boards that vibrate along with the string to move more air and produce a louder sound, e.g., the top wooden plate of a violin.
Octaves
Two strings with fundamental frequencies in a ratio of 2:1, e.g., \(f\) and \(2f\), will sound very good together because there is a lot of overlap in their harmonic frequencies, i.e., every multiple of \(2f\) is also a multiple of \(f\). Humans perceive frequencies in a ratio of 2:1 as the same note class, and this is called an octave. Here is a widget that plays two frequencies an octave apart:
Perfect Fifths
Similarly, two strings of fundamental frequencies in a ratio of 3:2 will also sound very good together because there is a lot of overlap in their harmonic frequencies (though not as much as in an octave). This interval is called a perfect fifth.
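The overlap claim for octaves and fifths can be quantified with exact rational arithmetic. This sketch measures one possible notion of "overlap": the share of the upper note's harmonics that coincide with a harmonic of the lower note. The function name and the cutoff of 60 harmonics are arbitrary choices made here, not something the text prescribes.

```python
from fractions import Fraction

def shared_harmonic_fraction(ratio, count=60):
    """Share of the upper note's first `count` harmonics that are also
    harmonics of the lower note, for a given frequency ratio upper:lower."""
    ratio = Fraction(ratio)
    # Harmonic m of the upper note sits at m * ratio, in units of the lower
    # fundamental; it matches a lower harmonic when that value is an integer.
    shared = sum(1 for m in range(1, count + 1) if (m * ratio).denominator == 1)
    return Fraction(shared, count)

print(shared_harmonic_fraction(Fraction(2, 1)))  # octave
print(shared_harmonic_fraction(Fraction(3, 2)))  # perfect fifth
print(shared_harmonic_fraction(Fraction(5, 4)))  # a 5:4 ratio, for comparison
```

Under this measure every harmonic of the upper note is shared in an octave, every second one in a perfect fifth, and every fourth one for 5:4.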
Chromatic Scale
What happens if you start with some arbitrary frequency \(f\) and generate notes that sound good using the 3:2 ratio? You'll get \(f \rightarrow \frac{3}{2}f \rightarrow (\frac{3}{2})^2 f \rightarrow \dots\); eventually, you'll get to \((\frac{3}{2})^{12}f\), which is approximately equal to \(2^7 f\), since \((\frac{3}{2})^{12} \approx 129.7\) and \(2^7 = 128\). Based on our discussion about octaves, two frequencies whose ratio is a power of two will be perceived as the same note class. So, we've generated twelve distinct notes before arriving back at the same note class that we started with. Most Western music uses this twelve-note system, called the chromatic scale. One common naming of the notes is A, A#, B, C, C#, D, D#, E, F, F#, G, G#. On a piano, the natural notes A, B, C, D, E, F, G are the white keys, while the black keys are typically named with sharps or flats depending on context. By convention, A4, the A above middle C, is tuned to 440 Hz.
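The stacking of fifths described above can be reproduced exactly with fractions. This is a sketch under one assumption that the text only implies: after each step up by 3:2, the result is halved as needed so that it lands back in the same octave, i.e., a ratio in \([1, 2)\) relative to \(f\). The function name `stack_fifths` is invented for illustration.

```python
from fractions import Fraction

def stack_fifths(steps=12):
    """Go up by 3:2 `steps` times, folding each result into [1, 2) by halving.

    Returns the ratios visited (relative to the starting frequency) and the
    ratio reached after the final step.
    """
    ratios = []
    r = Fraction(1)
    for _ in range(steps):
        ratios.append(r)
        r *= Fraction(3, 2)
        while r >= 2:  # same note class, one octave lower
            r /= 2
    return ratios, r

ratios, after_twelve = stack_fifths()
print(sorted(ratios))       # twelve distinct ratios within one octave
print(after_twelve)         # (3/2)^12 / 2^7 as an exact fraction
print(float(after_twelve))  # slightly above 1, so the circle does not quite close
```

The final ratio is \((\frac{3}{2})^{12} / 2^7 = \frac{531441}{524288} \approx 1.0136\), which is the small mismatch between twelve fifths and seven octaves.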
Because \((\frac{3}{2})^{12}\) does not perfectly equal \(2^7\), it is not possible to perfectly tune an instrument so that going up twelve notes always multiplies the frequency by 2 and going up by seven notes always multiplies the frequency by 1.5. Most of the time, instruments like pianos are tuned using equal temperament. This means that the ratio of frequencies of consecutive notes is constant, and specifically equal to \(2^\frac{1}{12}\) to guarantee that all octaves have a perfect ratio of 2:1. Therefore, perfect fifths will have a ratio of \(2^\frac{7}{12} \approx 1.4983\), which is close enough to 1.5 that it's difficult to tell the difference. In contrast, just intonation means allowing the ratio of frequencies of consecutive notes to be unequal, so that you can play pure ratios like 3:2 and 5:4.
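As a rough illustration of equal temperament, the sketch below computes note frequencies from the number of notes above or below A4 = 440 Hz and compares the equal-tempered fifth with the pure 3:2 ratio. The function name is hypothetical.

```python
A4_HZ = 440.0  # tuning convention mentioned earlier in the text

def equal_tempered_frequency(notes_from_a4):
    """Frequency of the note `notes_from_a4` steps above (or below) A4,
    with a constant ratio of 2**(1/12) between consecutive notes."""
    return A4_HZ * 2.0 ** (notes_from_a4 / 12.0)

fifth_et = equal_tempered_frequency(7) / A4_HZ
print(equal_tempered_frequency(12))  # one octave above A4
print(equal_tempered_frequency(-9))  # middle C, nine notes below A4
print(fifth_et)                      # equal-tempered fifth, close to 1.5
print((1.5 - fifth_et) / 1.5)        # relative shortfall versus a pure fifth
```

The equal-tempered fifth falls short of 3:2 by roughly 0.1%, while octaves come out as exact factors of 2.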
Note that instruments like violins do not have discrete pitches like a piano. Instead, they have continuous pitch, since you can move your finger continuously along the string to change the length (and thus frequency).
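Since \(f_1 = \frac{1}{2L}\sqrt{\frac{T}{\mu}}\), shortening the vibrating length at fixed tension and density raises the frequency in inverse proportion. A minimal sketch of that relationship follows; the function name and the 0.32 m open-string length are assumed values for illustration, and the model is an ideal string.

```python
def vibrating_length(open_length_m, open_freq_hz, target_freq_hz):
    """Length that must be left vibrating to sound `target_freq_hz`, given
    the open string's length and frequency (tension and density unchanged)."""
    if target_freq_hz < open_freq_hz:
        raise ValueError("stopping a string can only raise its pitch")
    return open_length_m * open_freq_hz / target_freq_hz

open_length = 0.32  # assumed open-string length in metres
print(vibrating_length(open_length, 440.0, 880.0))  # an octave up: half the length
print(vibrating_length(open_length, 440.0, 660.0))  # a pure fifth up: two thirds
```

Any target frequency between these values corresponds to some finger position in between, which is why the pitch is continuous.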
Pitch Shifting
In the above, the specific starting frequency \(f\) is arbitrary. In fact, the way music sounds to us mostly depends on the relative pitch between notes, not the absolute pitch. So if you multiply the frequencies of all the notes in a song by the same factor, it will still sound like the same song. Here is a widget that plays the "Happy Birthday" song relative to the specified frequency of the first note.
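The idea that only relative pitch matters can be sketched in code. The opening of "Happy Birthday" is written below as offsets in notes from the first note; this transcription, the equal-tempered spacing of \(2^{\frac{1}{12}}\) per note, and the function names are all assumptions made for this example.

```python
import math

# Opening phrase as note offsets from the first note (assumed transcription).
HAPPY_BIRTHDAY_OPENING = [0, 0, 2, 0, 5, 4]

def melody_frequencies(first_note_hz, offsets):
    """Frequencies of a melody given its first note, using equal temperament."""
    return [first_note_hz * 2.0 ** (k / 12.0) for k in offsets]

def ratios_to_first(freqs):
    """Each frequency divided by the first one, i.e., the relative pitches."""
    return [f / freqs[0] for f in freqs]

low = melody_frequencies(220.0, HAPPY_BIRTHDAY_OPENING)
high = melody_frequencies(392.0, HAPPY_BIRTHDAY_OPENING)

print(low)
print(high)
# The absolute frequencies differ, but the relative pitches are identical.
print(all(math.isclose(a, b)
          for a, b in zip(ratios_to_first(low), ratios_to_first(high))))
```

Changing the first note multiplies every frequency by the same factor, so the list of ratios, and therefore the tune, stays the same.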
So, why does it matter what specific notes are in a song, if only the relative pitch matters? It is mostly for practical reasons. The song might be easier or harder to play on an instrument depending on the specific notes. If the song has a vocal component, the singer's vocal range also influences what specific notes the song should use.