Skeptical Audio

The Most Damaging Image In Audio

One image has caused more misunderstandings about audio than any other in history.

This is the image:

[Image: a sine wave plotted as discrete samples, quantized to 4 bits (16 levels)]

You've seen it, or one like it, many, many times. Even people who know nothing about audio will find this kind of image familiar, and even engineers who should really know better still use it as an illustration.

It's a representation of a sine wave as it changes over time, and it is generally used to illustrate how audio is stored as pulse code modulation (PCM). PCM is the most basic format used every day in audio: it is what software pumps into your soundcard, which turns it into electrical signals that go through an amp and then on to speakers or headphones.

The left-right axis represents time (the sampling rate is how dense the points are on this axis). Each point in time is called a sample, and each sample has a level. This level is represented by the up-down axis in the graph. It's limited by the bit depth, which determines how many different levels up/down it can go. In this case, 4 bits = 2 ^ 4 = 16 levels. This is called quantization.
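To make the two axes concrete, here is a minimal sketch of sampling and quantizing a sine wave. The function names, the 1 kHz test tone, and the simple mapping of [-1, 1] onto integer levels 0 to 15 are assumptions for illustration; real PCM formats typically store signed integers instead.

```python
import math

def quantize(value, bits):
    """Map a value in [-1.0, 1.0] to one of 2**bits integer levels."""
    levels = 2 ** bits
    # Scale [-1, 1] onto [0, levels - 1] and round to the nearest level.
    index = round((value + 1.0) / 2.0 * (levels - 1))
    # Clamp in case the input strays slightly outside [-1, 1].
    return max(0, min(levels - 1, index))

def sample_sine(freq_hz, sample_rate_hz, count, bits):
    """Take `count` samples of a sine wave and quantize each one."""
    return [
        quantize(math.sin(2 * math.pi * freq_hz * n / sample_rate_hz), bits)
        for n in range(count)
    ]

# Roughly one cycle of a 1 kHz tone at 44100 samples per second, 4 bits deep.
samples = sample_sine(freq_hz=1000, sample_rate_hz=44100, count=45, bits=4)
print(samples)
```

Each number in the list is one sample: its position in the list is the left-right axis, and its value is the up-down axis.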

Now, by the time this audio gets to your speakers, the up-down axis is turned into in-and-out movements of the speaker cone. As the level in the graph goes up, the speaker cone goes out, and vice versa as it goes down. If a layperson looked at this graph, they would think that the level generated by the soundcard moves up and down in discrete steps. You would start at 0.1, and after a certain point, it would jump up to 0.2. So you would think that increasing the bit depth would always make the resolution better which makes it sound better. That makes sense.

It's also completely wrong.

Say it with me now: each sample is an infinitely short point in time.

Of course, your soundcard output is in the analog domain, where there is no quantization and no steps between levels. So it could do anything it wants in between those points, right?

This is where the Nyquist-Shannon sampling theorem comes in. Put simply, it means that if a signal contains only frequencies below a certain limit, there is only one combination of sine waves that passes through a given set of sample points.

This limit is always half the sampling rate. So for a 44100 Hz sample rate, there is only one wave that can be generated for a given set of points, provided it only contains frequencies below 22.05 kHz (a limit chosen to sit just above the upper bound of human hearing, commonly put at around 20 kHz).
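The "only one wave" claim can be checked numerically. The sketch below uses Whittaker-Shannon (sinc) interpolation to estimate the value of a 1 kHz tone halfway between two samples, using nothing but the samples themselves. The names, the test tone, and the window size are assumptions; the exact formula sums over infinitely many samples, so truncating it to a finite window leaves a small error. The samples here are not quantized, so this isolates the sampling-rate part of the argument.

```python
import math

SAMPLE_RATE = 44100  # Hz
FREQ = 1000          # Hz, well below the 22050 Hz limit

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def original(t_seconds):
    """The continuous signal we pretend not to know between samples."""
    return math.sin(2 * math.pi * FREQ * t_seconds)

def reconstruct(t_samples, half_window=2000):
    """Estimate the signal at a fractional sample position.

    Only the values at integer sample positions are used. The window
    is a truncation of the ideal (infinite) interpolation sum.
    """
    centre = int(round(t_samples))
    total = 0.0
    for n in range(centre - half_window, centre + half_window + 1):
        total += original(n / SAMPLE_RATE) * sinc(t_samples - n)
    return total

t = 10.5  # halfway between sample 10 and sample 11
estimate = reconstruct(t)
actual = original(t / SAMPLE_RATE)
error = abs(estimate - actual)
print(f"estimate={estimate:.6f} actual={actual:.6f} error={error:.2e}")
```

The estimate lands on the original curve to within the truncation error, even though no sample was ever taken at that instant. There is no staircase between the points.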

So what I want to get across is that increasing the sample rate past 44.1 kHz, or the bit depth past 16 bits, does not simply increase resolution. You can debate all day whether a higher sample rate makes a difference that humans can hear, just as long as you're aware that what you're arguing is whether you can hear frequencies above 22.05 kHz and nothing else. Similarly, 16 bits can be arranged into 65536 different combinations. One step in 65536 is about 0.0015%. If you can reliably hear a 0.0015% difference in volume for one instrument in a song, I suggest you consult an audiologist, because you have a Guinness World Record with your name on it.
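The arithmetic is easy to reproduce. This is a small sketch with a hypothetical helper name, treating one step as a fraction of the full range of levels:

```python
def step_percent(bits):
    """Return (number of levels, size of one level as a % of full scale)."""
    levels = 2 ** bits
    return levels, 100.0 / levels

for bits in (4, 16, 24):
    levels, pct = step_percent(bits)
    print(f"{bits:2d} bits: {levels} levels, one step = {pct:.7f}% of full scale")
```

For 16 bits this gives 65536 levels and a step of roughly 0.0015%, the figure used above.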