As I’ve stated a couple of times through this series, my reason for writing this stuff was not to prove that high res audio is better or worse than normal res audio (whatever that is…). My reason was to highlight some of the advantages and disadvantages associated with LPCM audio at different bit depths and sampling rates. Just as a bullet-point summary of things-to-remember/consider (with some loose grouping):
“High resolution audio” could mean
“more than 16 bits per sample” or
“a sampling rate higher than 44.1 kHz” or
These two dimensions of the specifications have different implications on the signal
Just because you have more bits per sample doesn’t mean that you are actually getting more resolution. There are examples out there where a “24-bit recording” is just a 16-bit recording with 8 zeros stuck on the end.
Just because you have a higher sampling rate doesn’t mean that you are actually getting a recording that was done at that sampling rate. There are examples out there where, if you do a spectral analysis of a “high-res” recording, you’ll see the cutoff filter of the original 44.1 kHz recording.
Just because you have a recording done at a higher sampling rate doesn’t mean that the extra information you get is actually useful.
If you are a lazy DSP engineer who thinks that filters give you the expected magnitude response, no matter what the centre frequency, you’d better have a higher sampling rate. (Or you could just stop being lazy and compensate.)
If you have a volume control after the conversion to analogue, then 93 dB of dynamic range (16 bits, TPDF dithered) might be enough – especially if you listen to music with a limited dynamic range. However, if your volume control is in the digital domain, and you have a speaker that can play loudly, then you’ll probably want more dynamic range, and therefore more bits per sample hitting the DAC.
Like I said, I’m not here to tell you that one thing is better or worse than another thing.
As I said, my intention in writing all of this is to help you to never fall into the trap of assuming that “high resolution audio” is better than “normal resolution audio” in all respects.
More is not necessarily better, sometimes, it’s not even more. Don’t fall victim to misleading advertising.
In the previous posting, we left off with this drawing of a biquad filter:
This is not the normal way to draw the signal flow inside a biquad, since it has a little too much information. Normally you see something like this:
In the versions I show above, the feed-forward half of the biquad comes first, and its output feeds the start of the feedback portion. It is also possible to reverse these, putting the feedback portion first, like this:
In theory, these different implementations will all result in the same output if you match the gain values. However, in practice, they are not the same, and this difference is where we need to look for this part of the discussion on high res audio.
Let’s say I want to make a simple filter that reduces bass in a fairly narrow frequency band. I can use a biquad to do this. For example, if I want a peaking filter that reduces 20 Hz by 12 dB, with a Q of 1, then I get a magnitude response that looks like this:
If I wanted to build this filter using a biquad in a system with a sampling rate of 48 kHz, it would have the following gain coefficients:
We’ll also say that my biquad is implemented like the one shown in Figure 1, above… let’s take a look at that signal flow again:
I’ve highlighted a point inside the biquad using a red arrow. Let’s talk about the signal right there, in the middle of the processing…
In the last post, we talked about how, when the signal frequency is very low, a single sample delay has almost the same value at its output as its input, because the phase difference is so small for such a small time. So, let’s start with the (incorrect) assumption that, for those two feed-forward delays at the beginning, their outputs ARE equal to their inputs (because we’re starting with a low frequency). What happens when the input has a value of 1? Then the value at the red arrow is just the sum of the feed forward gains (because I multiplied each of them by 1 and added them together…)
In the case of the filter I described above, this value will be 0.000006836, which is a very small number. Also, if the value coming into the input of the biquad is less than 1, the value at the red arrow will be even smaller! This means that, if you come into the biquad with a low-frequency tone with a level of 0 dB FS, the level at that red arrow will be about -103 dB FS, which is very quiet. The feed-back portion of the biquad, after the red arrow, then has a lot of gain in it to bring the signal level back up towards 0 dB FS again.
So, the issue that we have here is that the FF (Feed Forward) portion of the biquad drops the level A LOT. And the FB portion increases the level A LOT, just to do something like a little 12 dB dip at 20 Hz.
The magnitude of the gains downwards and upwards in those two portions of the biquad are dependent on the parameters of the filter that we’re trying to make, however, we can generalise a little and say that:
the lower the frequency OR
the higher the Q,
then the bigger the gain down and up.
In other words, if you have a really low frequency dip, with a really high Q, then the level of the signal at that red arrow will be really low. REALLY low.
How low can you go?
How low is “REALLY low”? let’s see:
Take a look at Figure 7, which shows some values for one example filter (peaking, Gain = -12 dB, variable Q and Fc, and the test frequency = Fc). Notice that when the Fc is 10 kHz, even at earn Q=32, the signal level at the middle of the biquad is about -38 dB FS or so. However, when the Fc is 20 Hz, it’s -140 dB FS… This is very low.
Now let’s try again at a higher sampling rate: 192 kHz.
Notice that when we do exactly the same thing running at 192 kHz, the signal levels inside the biquad get much lower. Now for a 20 Hz signal and a Q of 32, the level is around -163 dB FS – a drop of more than 20 dB for 4x the sampling rate.
Why does this happen? It’s because the filter doesn’t “know” that the signal is at 20 Hz. It only knows the relationship between the frequency and the sampling rate. So, in its little world, 20 Hz doesn’t exist. In a system running at 48 kHz, what exists is 20 / 48000 = 0.0004167. This is called the “normalised frequency” where the sampling rate is 1, DC is 0, and everything else is in between. (Note that some textbooks and software say that Nyquist = 1 instead of the sampling rate – but you just need to know what the convention is for the thing you’re reading…) This means that if the sampling rate goes up to 192 kHz, then the normalised frequency for 20 Hz is 20 / 192000 = 0.0001042 (1/4 of the value because the sampling rate was multiplied by 4).
This is important. If you want to make a low-frequency, high-Q peaking filter in a digital system with a cut of 12 dB, you are forcing the signal to a very low level inside your filter, and then bringing it back up to a normal level again on the way out. If your processing is running with a limited resolution, (e.g. 16-bits, for example) then the signal level can approach or even go below the resolution of your system inside the biquad. This means that, when the signal’s level is raised again on the way out, it’s full of quantisation distortion, and you can’t get rid of it… This is bad.
There are different ways to solve this problem.
Increase the resolution of your processing internally. For example, even though your input and output might only be running at 16-bits or 24-bits, maybe you need more resolution inside to make the results of the math better – or at least below the limitations of the input and output.
Change the way the biquad is implemented. For example, if you use the implementation shown in Figure 4 (with the feedback before the feed-forward) instead of the one we used, then you don’t drop the signal level and raise it again, you do the opposite. This avoids your quantisation error problem. However, depending on the system, it might overload and clip the signal inside the biquad instead, so then you just end up with a different kind of distortion instead.
Reduce your sampling rate to make it closer to your filter’s frequency. The problem I showed above is that the centre frequency of the filter is too far away from the sampling rate. If the sampling rate were lower, then this automatically makes the filter’s centre frequency “higher” in a normalised frequency scale, thus reducing the problem.
Other, even more clever solutions that I won’t talk about because they’re not as simple.
This means (for example) that if you’re building a subwoofer with digital filtering, and you know for sure that NOTHING will come out of it above, say 1 kHz (just to pick a random number that’s far enough away from the typical 120 Hz that people normally use…) then it would be dumb to do the filtering at 192 kHz. It’s smarter to run its internal sampling rate at 2 kHz (because we only need to go up to 1 kHz; and we’re not considering anything other issues or artefacts in this posting.)
For this discussion, I used the specific example of a peaking filter with a gain of -12 dB, and I was varying the Q and the Fc. I was also measuring the level of the signal using a sine wave with a frequency that was the same as Fc in each case. However, the general lesson here about low frequency and high-Q filtering holds for other filter types and implementations as well.
I spent some time this week helping to track down the source of an error in a digital audio signal flow chain, and we wound up having a discussion that I thought might be worth repeating here.
Let’s start at the very beginning.
Let’s take an analogue audio signal and convert it to a Linear Pulse Code Modulation (LPCM) representation in the dumbest possible way.
In order to save this signal as a string of numerical values, we have to first accept the fact that we don’t have an infinite number of numbers to use. So, we have to round off the signal to the nearest usable value or “quantisation value”. This process of rounding the value is called “quantisation”.
Let’s say for now that our available quantisation values are the ones shown on the grid. If we then take our original sine wave and round it to those values, we get the result shown below.
Of course, I’m leaving out a lot of important details here like anti-aliasing filtering and dither (I said that we were going to be dumb…) but those things don’t matter for this discussion.
So far so good. However, we have to be a bit more specific: an LPCM system encodes the values using binary representations of the values. So, a quantisation value of “0.25”, as shown above isn’t helpful. So, let’s make a “baby” LPCM system with only 3 bits (meaning that we have three Binary digITs available to represent our values).
To start, let’s count using a 3-bit system:
0 x 4 +
0 x 2 +
0 x 1
0 x 4 +
0 x 2 +
1 x 1
0 x 4 +
1 x 2 +
0 x 1
0 x 4 +
1 x 2 +
1 x 1
1 x 4 +
0 x 2 +
0 x 1
1 x 4 +
0 x 2 +
1 x 1
1 x 4 +
1 x 2 +
0 x 1
1 x 4 +
1 x 2 +
1 x 1
Table 1: The 8 numbers that can be represented using a 3-bit binary representation
and that’s as far as we can go before needing 4 bits. However, for now, that’s enough.
Take a look at our signal. It ranges from -1 to 1 and 0 is in the middle. So, if we say that the “0” in our original signal is encoded as “000” in our 3-bit system, then we just count upwards from there as follows:
Now what? Well, let’s look at this a little differently. If we were to divide a circle into the same number of quantisation values, make the “12:00” position = 000, and count clockwise, it would look like this:
The question now is “how do we number the negative values?” but the answer is already in the circle shown above… If I make it a little more obvious, then the answer is shown below.
If we use the convention shown above, and represent that on the graph of our audio signal, then it looks like this:
One nice thing about this way of doing things is that you just need to look at the first digit in the binary word to know whether the value is positive or negative. A 0 means it’s positive, and a 1 means it’s negative.
However, there are two issues here that we need to sort out… The first is that, since we have an even number of values, but an odd number of quantisation steps (4 above zero, 4 below zero, and zero = 9 steps) then we had to do something asymmetrical. As you can see in the plot above, there are no numbers assigned to the top quantisation value, which actually means that it doesn’t exist.
So, if we’re still being dumb, then the result of our quantisation will either look like this:
But what happens when you make two mistakes simultaneously? Let’s go back and look at an earlier plot.
Let’s say that you’re writing some DSP code, and you forget about the asymmetry problem, so you scale things so they’ll TRY to look like the plot above.
However, as we already know, that top quantisation value doesn’t exist – but the code will try to put something there. If you’ve forgotten about this, then the system will THINK that you want this:
As you can see there, your code (because you’ve forgotten to write an IF-THEN statement) will think that the top-most positive quantisation value is just the number after 011, which is 100. However, that value means something totally different… So, the result coming out will ACTUALLY look like this:
As you can see there, the signal is very different from what we think it should be.
This error is called a “wrapping” error, because the signal is “wrapped” too far around the circle shown in Figure 5, shown above. It sounds very bad – much worse than “normal” clipping (as shown in Figure 7) because of that huge nearly-instantaneous transition from maximum positive to maximum negative and back.
Of course, the wrapping can also happen in the opposite direction; a negatively-clipped signal can wrap around and show up at the top of the positive values. The reason is the same because the values are trying to go around the same circle.
As I said: this is actually the result of two problems that both have to occur in the same system:
The signal has to be trying to get to a level that is beyond the limits of the quantisation values
Someone forgot to write a line of code that makes sure that, when that happens, the signal is “just” clipped and not wrapped.
So, if the second of these issues is sitting there, unresolved, but the signal never exceeds the limits, then you’ll never have a problem. However, I will never need the airbags in my car, unless I have an accident. So, it’s best to remember to look after that second issue… just in case.
This method of encoding the quantisation values is called the “Two’s Complement” method. If you want to know more about it, read this.
There was an article in the BBC News webpage this week, telling the story of how Leon Theremin (inventor of one of the first electronic musical instruments) invented the technology underneath what we now call RFID…
Of course, this means that every time I swipe my card to buy something at the store, I’m going to start humming the hook from”Good Vibrations”… Maybe knowledge is not always a good thing… Or maybe I should get out my Clara Rockmore album and have another listen.
This “series” of postings was intended to describe some of the errors that I commonly see when I measure and evaluate digital audio systems. All of the examples I’ve shown are taken from measurements of commercially-available hardware and software – they’re not “beta” versions that are in development.
There are some reasons why I wrote this series that I’d like to make reasonably explicit.
Many of the errors that I’ve described here are significant – but will, in some cases, not be detected by “typical” audio measurements such as frequency response or SNR measurements.
For example, the small clicks caused by skip/insert artefacts will not show up in a SNR or a THD+N measurement due to the fact that the artefacts are so small with respect to the signal. This does not mean that they are not audible. Play a midrange sine tone (say, in the 2 -3 kHz region… nothing too annoying) and listen for clicks.
As another example, the drifting time clock problems described here are not evident as jitter or sampling rate errors at the digital output of the device. These are caused by a clocking problems inside the signal path. So, a simple measurement of the digital output carrier will not, in any way, reveal the significance of the problem inside the system.
Aliasing artefacts (described here) may not show up in a THD measurement (since aliasing artefacts are not Harmonic). They will show up as part of the Noise in a THD+N measurement, but they certainly do not sound like noise, since they are weirdly correlated with the signal. Therefore you cannot sweep them under the rug as “noise”…
Some of the problems with some systems only exist with some combinations of file format / sampling rate / bit depth, as I showed here. So, for example, if you read a test of a streaming system that says “I checked the device/system using a 44.1 kHz, 16-bit WAV file, and found that its output is bit-perfect” Then this is probably true. However, there is no guarantee whatsoever that this “bit-perfect-ness” will hold for all other sampling rates, bit depths, and file formats.
Sometimes, if you test a system, it will behave for a while, and then not behave. As we saw in Figure 10 of this posting, the first skip-insert error happened exactly 10 seconds after the file started playing. So, if you do a quick sweep that only lasts for 9.5 seconds you’ll think that this system is “bit-perfect” – which is true most of the time – but not all of the time…
Unfortunately, the only thing that I have concluded after having done lots of measurements of lots of systems is that, unless you do a full set of measurements on a given system, you don’t really know how it behaves. And, it might not behave the same tomorrow because something in the chain might have had a software update overnight.
However, there are two more thing that I’d like to point out (which I’ve already mentioned in one of the postings).
Firstly, just because a system has a digital input (or source, say, a file) and a digital output does not guarantee that it’s perfect. These days the weakest links in a digital audio signal path are typically in the signal processing software or the clocking of the devices in the audio chain.
Secondly, if you do have a digital audio system or device, and something sounds weird, there’s probably no need to look for the most complicated solution to the problem. Typically, the problem is in a poor implementation of an algorithm somewhere in the system. In other words, there’s no point in arguing over whether your DAC has a 120 dB or a 123 dB SNR if you have a sampling rate converter upstream that is generating aliasing at -60 dB… Don’t spend money “upgrading” your mains cables if your real problem is that audio samples are being left out every half second because your source and your receiver can’t agree on how fast their clocks should run.
So, the bad news is that trying to keep track of all of this is complicated at best. More likely impossible.
On the other hand, if you do have a system that you’re happy with, it’s best to not read anything I wrote and just keep listening to your music…