When working on the last series of posts, I stumbled on a signal that caused an FFT analysis to look a little strange to me. This post is about that strangeness.

If I make a sine wave that sits perfectly on an FFT bin, and I do an FFT of it, the noise floor that I see is the result of a lack of precision in the calculations that were used to make the signal. An example of this is shown in Figure 1.

As can be seen there, the noise floor in the FFT is typically at least 300 dB down from the signal level. This means that if the signal has a peak amplitude of 1, then each bin in my FFT has a peak amplitude of less than 0.000 000 000 000 001, which is very, very low.
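To see the difference between a tone that sits on a bin and one that doesn’t, here is a small Python sketch (not the Matlab used for the figures). It uses a naive DFT with N = 256 and bin 17, both arbitrary choices to keep it fast, and reports the loudest bin at least 10 bins away from the tone, in dB relative to the tone.

```python
import cmath
import math

N = 256      # FFT length, small so a naive DFT runs quickly
BIN = 17     # arbitrary bin for the test tone

def dft_db(x):
    """Naive DFT of a real signal; returns bins 0..N/2 in dB relative to a
    sine with a peak amplitude of 1 (which gives a magnitude of N/2)."""
    n_points = len(x)
    result = []
    for k in range(n_points // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * n / n_points)
                  for n, v in enumerate(x))
        mag = abs(acc) / (n_points / 2)
        result.append(20 * math.log10(mag) if mag > 0 else -math.inf)
    return result

# Whole number of cycles in the window: the tone sits on a bin centre.
on_bin = dft_db([math.sin(2 * math.pi * BIN * n / N) for n in range(N)])
# 0.3 bins higher: the last sample no longer lines up with the first.
off_bin = dft_db([math.sin(2 * math.pi * (BIN + 0.3) * n / N) for n in range(N)])

# Loudest bin at least 10 bins away from the tone, in each case.
far = [k for k in range(N // 2 + 1) if abs(k - BIN) >= 10]
on_floor = max(on_bin[k] for k in far)
off_floor = max(off_bin[k] for k in far)
print(f"bin-centred tone: {on_floor:.0f} dB")
print(f"off-bin tone:     {off_floor:.0f} dB")
```

The bin-centred tone’s floor is only floating-point residue, hundreds of dB down, while the tone that is 0.3 bins off-centre leaks energy into bins far away from it.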

If I dither and quantise the sine wave with, say, a 24-bit LPCM precision, then the result would be different, as shown in Figure 2.

Now the noise floor seen in the FFT analysis is the noise that is intentionally generated as dither to randomise the quantisation error when converting to LPCM.

However, what happens if the signal is quantised but not dithered? Then the result looks like Figure 3.

This is interesting because, starting with the first bin, every second bin has nothing in it, so on a decibel scale, the value is -∞ dB. Why does this happen?

The short answer is symmetry. By quantising the sine wave, I made it perfectly symmetrical.

This removes the DC content, since the positive-going portion of the waveform is identical to the negative-going portion. Therefore there’s nothing at 0 Hz (which is DC) or any of its “harmonics” (at least in the world of FFT bins…).

There is content of some kind in the other bins because our sine wave is not perfectly sinusoidal. All those steps that I put in it are an error that generates information at frequency centres other than the sine tone’s itself.

So, if you do an FFT on a sinusoidal signal and you see a result where half of the bins have nothing in them, one possible reason is that you’re dealing with a perfectly symmetrical signal.
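Here is a minimal Python sketch of that symmetry argument (a naive DFT, N = 64, bin 5 and a 3-bit quantiser are arbitrary choices, not the settings used for the figures). The tone has to sit on an odd-numbered bin so that shifting it by half the window flips its sign; the 996.8 Hz tone at 48 kHz with a 65,536-point FFT sits on bin 1361, which is odd. Counting bins from 0 (so bin 0 is DC), the empty ones are the even-numbered bins; with Matlab’s 1-based indexing, those are the odd-numbered ones.

```python
import cmath
import math

N = 64        # FFT length, small so a naive DFT runs quickly
K = 5         # the tone sits on an odd-numbered bin (counting DC as bin 0)
BITS = 3      # hypothetical word length
SCALE = 2 ** (BITS - 1) - 1   # peak of +3, so nothing needs clipping

# Undithered quantisation. Python's round() satisfies round(-x) == -round(x),
# so the quantiser itself treats positive and negative values identically.
q = [round(SCALE * math.sin(2 * math.pi * K * n / N)) for n in range(N)]

# Half-wave symmetry: the second half of the window is the negated first half.
half_wave_symmetric = all(q[n + N // 2] == -q[n] for n in range(N // 2))

def dft_mag(x, k):
    """Magnitude of bin k of a naive DFT."""
    return abs(sum(v * cmath.exp(-2j * math.pi * k * n / len(x))
                   for n, v in enumerate(x)))

mags = [dft_mag(q, k) for k in range(N // 2 + 1)]
even_max = max(mags[k] for k in range(0, N // 2 + 1, 2))
odd_max = max(mags[k] for k in range(1, N // 2 + 1, 2) if k != K)

print("half-wave symmetric:", half_wave_symmetric)
print("largest even-numbered bin:", even_max)   # only floating-point residue
print("largest odd-numbered bin apart from the tone:", odd_max)
```

The quantisation error shows up in the odd-numbered bins (the 3rd harmonic lands on bin 15, for example), but the even-numbered bins contain nothing except rounding residue from the DFT arithmetic.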

I mentioned in this posting that lately I’ve been doing some measurements of a DUT that:

required a frequency analysis with a very big dynamic range

… which meant that I was testing it using a sine tone with a frequency that was exactly the same as an FFT bin’s frequency centre

… and the sine tone had to be sent through the device by playing a standard audio file (wav and/or FLAC)

So, I did this, but I saw some weirdness that I didn’t expect down in the noise floor of the FFT output. Whenever I’m testing something and I see something weird, I start working my way back through the audio chain to verify that the weirdness is coming from the thing that I’m testing, and not from my test system itself.

So, the first step was to do an FFT of both the .wav and the .flac files that I was sending through the DUT. The results of this test looked something like Figure 1.

Before I go further, let’s clarify exactly what I did to generate those three plots.

Using Matlab, I made a sine wave with a frequency that matched the centre of the FFT bin closest to 997 Hz for a 65,536-point FFT at 48 kHz. (See this posting for more information about this.)

I exported the signal using Matlab’s “audiowrite” function, both as a 24-bit wav and a 24-bit FLAC.

I imported the two files back into Matlab

I ran an FFT on the original, and the two imported files

I would not expect the bottom two plots to be as “good” as the top plot, since they’ve been reduced to a 24-bit fixed point version of the original floating-point signal. However, there are two things to notice in Figure 1.

The most important thing is that the FLAC and WAVE imports produce different results. This is weird.

The less-important (but more interesting, later…) thing is that, for the FLAC import, every odd-numbered FFT bin is -∞ dB, which means that there is absolutely NO energy at those frequencies.

First things first

Let’s address that first issue first. The FFTs show us that the signals coming back from the .wav and .flac import are different. But I’m interested in (1) how they’re different and (2) why they’re different.

Let’s try to answer the first question first. I made a linear ramp that had the same number of samples as the number of quantisation values and had a range of -1 to 1 (just like my sine wave…). So, to test a 16-bit export, I made a ramp that was 2^{16} = 65,536 samples long (shown in the top plot in Figure 2). To test a 24-bit export, the ramp was 2^{24} samples long.

In theory, if I export this ramp to a file type with the matching number of bits, then each sample should quantise to the next quantisation level from the bottom to the top. I then exported this ramp out to .wav and .flac, imported it again, and looked at the result, which is shown in Figure 2.

If I subtract the results of the imported files from the original, I get the result shown in the middle plot in Figure 2. I would NOT expect either the .wav or the .flac to be identical to the original, since information is lost in the export to a 16- or 24-bit fixed point LPCM format. However, I WOULD expect the .wav and .flac to be the same, which they obviously aren’t.

As can be seen in the bottom plot in Figure 2, there is a 1-quantisation level difference between the .wav and .flac files for signal values higher than 0.

Now the question is whether this difference is inherent in the file format, or if something else is going on. To test this, I did the same test on the 997-ish Hz sine wave (again) without dither, but with my own quantisation (using the code shown in this post). The result of this test is shown in Figure 3.

As you can see there, the imported .wav and .flac files behave identically. But, if you look carefully and compare to the .flac version in Figure 1, you’ll see that they’re different from THAT version.

The fact that the red and blue plots in Figure 3 are identical tells me that .wav and .flac store the same values.

The fact that my quantisation produces identical .wav and .flac files, which differ from Matlab’s “audiowrite” results (whose .wav and .flac files also differ from each other), tells me that Matlab quantises differently for .wav and .flac – and differently from the way I do.

So, I go back to the ramp shown in Figure 2 and dig into the details again, zooming in on the samples near a value of -1, 0, and 1. These are shown below in Figure 4.

It’s a bit cryptic to see the results in Figure 4, but let’s walk through it.

The top plot shows the ramp signal that I encoded as a 16-bit audio file in 4 different ways: as a .wav and a .flac, using audiowrite’s quantiser and mine.

The second plot shows the differences in the imported files relative to the original for the first 20 samples, which correspond to the bottom 20 quantisation levels. As can be seen there, the audiowrite quantiser’s result appears to be identical to the original (they’re not, as we saw in the middle plot of Figure 2, but they’re close…). My quantiser is one level higher. This is because I’m scaling my original signal so that it can’t reach the bottom, as I talked about in Part 2.

The third plot shows the behaviour of the three quantisers (2 audiowrites and mine) around the 0 value ±10 quantisation levels. Note that there’s no sample with a value of 0 (because two’s complement is not symmetrical around the 0 value). It’s not immediately obvious there, but all three quantisers have an “error” of half a quantisation step around 0.

Below 0, both of audiowrite’s quantisers have a negative offset, and mine has a positive offset.

Above 0, audiowrite’s .flac quantiser has a positive offset, whereas both audiowrite’s .wav quantiser and mine have a negative offset.
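The ramp test is easy to reproduce in miniature. This Python sketch uses 3 bits instead of 16 or 24, and two hypothetical quantisers: A scales by 2^{(BITS-1)}, rounds and clips the top, and B scales by 2^{(BITS-1)}-1, like the approach from Part 2. Neither is claimed to be what audiowrite does internally; the point is only to show how a ramp with one sample per quantisation level exposes a one-level difference between two quantisers.

```python
BITS = 3
N = 2 ** BITS                   # one ramp sample per quantisation level
TOP = 2 ** (BITS - 1) - 1       # highest two's complement level (+3)
BOTTOM = -(2 ** (BITS - 1))     # lowest two's complement level (-4)

# Linear ramp from -1 to +1 (inclusive), like Matlab's linspace(-1, 1, N).
ramp = [-1 + 2 * n / (N - 1) for n in range(N)]

def quantise_full_scale(x):
    """Hypothetical quantiser A: scale by 2^(BITS-1), round, clip the top."""
    return max(BOTTOM, min(TOP, round(x * 2 ** (BITS - 1))))

def quantise_reduced_scale(x):
    """Hypothetical quantiser B: scale by 2^(BITS-1) - 1, so +1 lands on TOP
    and -1 never reaches BOTTOM."""
    return round(x * TOP)

a = [quantise_full_scale(x) for x in ramp]
b = [quantise_reduced_scale(x) for x in ramp]
diff = [qb - qa for qa, qb in zip(a, b)]

print("A:    ", a)
print("B:    ", b)
print("B - A:", diff)
```

With 3 bits, quantiser A skips the 0 level entirely and clips the top sample, while B never reaches -4 and puts two samples on 0, so B sits one level above A below 0 and one level below it above 0.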

If the signal were a sine wave, then we’d see the same thing, it would just be harder to interpret, as shown in Figure 5. (There’s nothing useful shown in the third plot there because when you zoom in so closely, the slope of the sine wave as it passes 0 is really steep…)

I titled this series of posts “Excruciating minutiae” for a reason. The “error” (let’s call it a “difference” instead) is VERY small. It’s a difference of 1 quantisation level on a portion of the signal, which raises the very pragmatic question: “So what?”

Unless you’re REALLY digging into the bottom of the noise floor of a device, you probably never need to care about this. (In fact, even if you ARE digging into the bottom of the noise floor, you might not need to care.)

You CERTAINLY don’t have to worry about it if you’re just writing audio files to listen to, since you should be dithering those with TPDF dither, which will create a noise floor that is FAR above the “errors” caused by the differences I described above. This can be seen in Figures 6 and 7 below.

In other words, I’ve been using Matlab to export test files both in .wav and .flac for at least 20 years, and it’s only now that I’ve noticed this issue, which is another way of saying “don’t worry about it…”

Nota Bene

If you’re still awake, you might notice that there is one loose end… At the top of this posting I said

The less-important (but more interesting, later…) thing is that, for the FLAC import, every odd-numbered FFT bin is -∞ dB, which means that there is absolutely NO energy at those frequencies.

That will be the topic of another posting, since it’s more or less unrelated to this one – it was just an artefact of the test I described above.

In Part 2 of this series, I wrote the following sentence:

The easiest (and possibly best) way to do this is to create white noise with a triangular probability distribution function and a peak-to-peak amplitude of ± 1 quantisation level.

That’s a very busy sentence, so let’s unpack it a little.

Rolling the dice

If you roll one die, you have an equal probability of rolling any number between 1 and 6 (inclusive). Let’s roll one die 100 times, counting the number of times we get a 1, or a 2, or a 3, and so on up to 6.

Number rolled | Number of times the number was rolled | Percentage of times the number was rolled
1 | 17 | 17
2 | 14 | 14
3 | 15 | 15
4 | 15 | 15
5 | 21 | 21
6 | 18 | 18

(Note that the percentage of times each number was rolled is the same as the number of times each number was rolled only because I rolled the die 100 times.)

If I plot those results, it looks like Figure 1.

It may seem weird, but I’ve also plotted the number of times I rolled -5 or 13 (for example). Those counts are 0, because it’s impossible to get those numbers by rolling one die. The reason I put those results in there will make more sense later.

Let’s keep rolling the die. If I do it 1,000,000 times instead of 100, I get these results:

Number rolled | Number of times the number was rolled | Percentage of times the number was rolled
1 | 166225 | 16.6225
2 | 166400 | 16.6400
3 | 166930 | 16.6930
4 | 167055 | 16.7055
5 | 166501 | 16.6501
6 | 166889 | 16.6889

Now, since I rolled many, many more times, it’s more obvious that the six results have an equal probability. The more I roll the die, the closer those numbers get to each other.

Take a look at the shape of the plot above. The area under the line from 1 to 6 (inclusive) is almost a rectangle because the six numbers are all almost the same.

The shape of that plot shows us the probability of rolling the six numbers on the die, so we call it a probability density function or PDF. In this case, we see a rectangular PDF.

But what happens if we roll two dice instead? Now things get a little more complicated, since there is more than one way to get a total result, as shown in the table below.

Total | Combinations
2 | 1+1
3 | 1+2, 2+1
4 | 1+3, 2+2, 3+1
5 | 1+4, 2+3, 3+2, 4+1
6 | 1+5, 2+4, 3+3, 4+2, 5+1
7 | 1+6, 2+5, 3+4, 4+3, 5+2, 6+1
8 | 2+6, 3+5, 4+4, 5+3, 6+2
9 | 3+6, 4+5, 5+4, 6+3
10 | 4+6, 5+5, 6+4
11 | 5+6, 6+5
12 | 6+6

As can be (hopefully) seen in the table, there is only one way to roll a 2, and there’s only one way to roll a 12. But there are 6 different ways to roll a 7. Therefore, if you’re rolling two dice, it’s 6 times more likely that you’ll roll a 7 than a 12, for example.

If I were to roll two dice 1,000,000 times, I would get a PDF like the one shown in Figure 3.

I won’t explain why this would be considered to be a triangular PDF.

Whether you roll one die or two dice, the number you get is random. In other words, you can’t use the past results to predict what the next number will be. However, if you are rolling one die, and you bet that you’ll roll a 6 every time, you’ll be right about 16.7% of the time. If you’re rolling two dice and you bet that you’ll roll a 12 every time, you’ll only be right about 2.8% of the time.

Let’s take two dice of different colours, say, one red die and one blue die. We’ll roll both dice again, but instead of adding the two values, we’ll subtract the blue value from the red one. If we do this 1,000,000 times, we’ll get something like the results shown below in Figure 4.

Notice that the probability density function keeps the same shape; it’s just moved down to a range of ±5 instead of 2 to 12.
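Because the dice are fair, the triangular shape can also be computed exactly rather than by rolling 1,000,000 times. This Python sketch enumerates all 36 equally likely outcomes for a red and a blue die, once for the sum and once for the difference:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Every equally likely (red, blue) pair: 36 outcomes.
pairs = list(product(faces, repeat=2))
sums = Counter(red + blue for red, blue in pairs)
diffs = Counter(red - blue for red, blue in pairs)

def probability(counter, value):
    """Exact probability of `value`, as a fraction of all 36 outcomes."""
    return Fraction(counter[value], len(pairs))

for total in range(2, 13):
    print(total, sums[total], float(probability(sums, total)))

print("P(7) / P(12) =", probability(sums, 7) / probability(sums, 12))
print("red - blue ranges from", min(diffs), "to", max(diffs))
```

It confirms the numbers above: 6 ways to roll a 7 against 1 way to roll a 12 (about 2.8 %), and the red-minus-blue distribution has exactly the same set of counts, shifted to the range -5 to +5.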

Generating noise

In audio, noise is a sound that is completely random. In other words, just like the example with the dice, in a digital audio signal, you can’t predict what the next sample value will be based on the past sample values. However, there are many different ways of generating that random number and manipulating its characteristics.

Let’s start with a computer algorithm that can generate a random number between 0 and 1 (inclusive) with a rectangular PDF. We’ll then ask the algorithm to spit out 1,000,000 values. If the numbers really are random, and the computer has infinite precision, then we’ll probably get 1,000,000 different numbers. However, we’re not really interested in the numbers themselves – we’re interested in how they’re distributed between 0.00 and 1.00. Let’s say we divide that range into 100 steps (or “buckets”) that are 0.01 wide and count how many of our random numbers fall into each bucket. So, we’ll count how many are between 0.00 and 0.01, between 0.01 and 0.02, and so on, up to between 0.99 and 1.00. We’ll get something like Figure 5.

I’ve only plotted the probabilities of the possible values: 0 to 1, which winds up showing only the top of the rectangle in the rectangular PDF.

If I generate 1,000,000 random numbers with that algorithm, and then subtract 1,000,000 other random numbers, one by one, and find the probabilities of the result, the answer will be familiar.
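Here is a sketch of that experiment in Python. It uses Python’s `random` module as the rectangular-PDF generator (note that `random()` returns values from 0 up to, but not including, 1), a fixed seed, and 200,000 pairs instead of 1,000,000 to keep it quick. The text histogram it prints has the same triangular shape as the two-dice example, centred on 0 and spanning ±1:

```python
import random

rng = random.Random(1)       # fixed seed so the run is repeatable
N = 200_000                  # fewer than 1,000,000 to keep it quick
BUCKETS = 20                 # 0.1-wide buckets covering -1 to +1

counts = [0] * BUCKETS
for _ in range(N):
    # Two independent rectangular-PDF numbers in [0, 1), one subtracted
    # from the other: the result lies between -1 and +1.
    d = rng.random() - rng.random()
    index = min(int((d + 1.0) * BUCKETS / 2.0), BUCKETS - 1)
    counts[index] += 1

for i, c in enumerate(counts):
    low = -1.0 + 2.0 * i / BUCKETS
    print(f"{low:+.1f} to {low + 0.1:+.1f}: {'#' * (c // 1000)}")
```

The buckets next to 0 each collect roughly 9.5 % of the values, while the buckets at ±1 collect only about 0.5 %, which is the triangle’s slope.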

So, this is how we make the noise that’s added to the signal. If, for each sample, you generate two random numbers (making sure that your algorithm has a rectangular PDF) and subtract one from the other, you have the dither signal that will have a maximum level of ±1 quantisation level.

The signal (with a maximum range of ±1) is scaled up by multiplying it by 2^{(NumberOfBits-1)}-2

then you add the result of the dither generator

then the total is rounded to the nearest integer value

and then the result is scaled back down by a factor of 2^{(NumberOfBits-1)} to bring it back to a range of ±1, ready for exporting to a standard audio file format like .wav or .flac.

In other words, assuming that you have an audio signal called “Signal” that has a range of ±1 and consists of floating point values:
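Here is a minimal sketch of those steps in Python. The function name `tpdf_quantise` and its arguments are hypothetical, `random.Random` stands in for whatever rectangular-PDF generator you prefer, and Python’s `round()` rounds exact halves to the nearest even integer, which makes no practical difference here.

```python
import math
import random

def tpdf_quantise(signal, bits, rng):
    """Sketch of the four steps above (function name is hypothetical).

    signal: floating-point samples in the range -1 to +1
    bits:   target word length, e.g. 16 or 24
    rng:    a random.Random instance used as the rectangular-PDF generator
    """
    scale_up = 2 ** (bits - 1) - 2    # leaves one level of headroom for the dither
    scale_down = 2 ** (bits - 1)
    out = []
    for x in signal:
        dither = rng.random() - rng.random()     # TPDF, between -1 and +1 levels
        level = round(x * scale_up + dither)     # round to the nearest integer level
        out.append(level / scale_down)           # back to the ±1 range
    return out

# Usage: 3 bits, so the quantisation is easy to see (as in Part 2).
rng = random.Random(0)
fs, fc = 100, 1
signal = [math.sin(2 * math.pi * fc / fs * n) for n in range(fs + 1)]
quantised = tpdf_quantise(signal, 3, rng)
levels = [q * 2 ** (3 - 1) for q in quantised]
print(levels)
```

Because the signal is scaled by 2^{(NumberOfBits-1)}-2, the signal plus dither always stays inside the range that two’s complement can represent, so the dither can never push a sample into clipping.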

In Part 1, I talked about how an audio signal is quantised, and how the world that the quantised signal lives in is slightly asymmetrical.

Let’s stay in a 3-bit world (to keep things comprehensible on a human scale) and do some recreational quantisation. We’ll start by making a sine wave with a peak amplitude of 1. This means that the total range will be ±1.

Notice that I put two scales on the plot in Figure 1. On the left, we have the “floating point” amplitude scale. On the right, we have the 8 quantisation levels.

If we are a bit dumb, and we just quantise that sine wave directly, making sure that I’ve aligned the scaling to use ALL possible quantisation values, we get the result in Figure 2.

Notice that, because the original signal is symmetrical (with respect to positive and negative amplitudes) but the quantisation steps are not, we wind up getting a different result for the positive values than the negative values. In other words, after quantisation, I’ve clipped the positive peaks of the original signal.

Okay, so this is a dumb way to do this. A slightly less dumb way is to adjust the scaling so that the original wave does not use all possible quantisation values, as shown in Figure 3.

Notice that I’ve set the sine wave to a slightly lower level, so that it rounds to the top-most positive quantisation level, but this means that it doesn’t use the lowest negative quantisation level. If we’re being really picky, I could have made the sine wave just a little higher in amplitude: by 1/2 of a quantisation step, and the quantised result would still not have clipped asymmetrically.
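For anyone who would rather experiment in Python than Matlab, here is a sketch of Figures 2 and 3, using the same 3-bit, 100 Hz sampling rate, 1 Hz sine setup as the Matlab in the “NB: The math” section below. The variable names are my own, and Python’s `round()` rounds exact halves to even, which doesn’t matter for these values.

```python
import math

BITS = 3
TOP = 2 ** (BITS - 1) - 1        # +3: highest level in 3-bit two's complement
BOTTOM = -(2 ** (BITS - 1))      # -4: lowest level
FS, FC = 100, 1
signal = [math.sin(2 * math.pi * FC / FS * n) for n in range(FS + 1)]

# "Dumb" version (Figure 2): scale by 2^(BITS-1) to use every level,
# then clip, because +4 doesn't exist.
naive = [max(BOTTOM, min(TOP, round(x * 2 ** (BITS - 1)))) for x in signal]

# Less dumb version (Figure 3): scale by 2^(BITS-1) - 1 so the positive
# peak rounds to TOP; the bottom level is then never used.
reduced = [round(x * TOP) for x in signal]

def samples_at(levels, value):
    """How many samples landed on a given quantisation level."""
    return sum(1 for v in levels if v == value)

print("naive:   samples at top =", samples_at(naive, TOP),
      ", samples at bottom =", samples_at(naive, BOTTOM))
print("reduced: samples at +3 =", samples_at(reduced, TOP),
      ", samples at -3 =", samples_at(reduced, -TOP))
```

Counting the samples on the extreme levels shows the asymmetry: in the naive version, more samples pile up on the clipped positive level than on the bottom level, whereas the reduced-scale version is symmetrical and never touches -4.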

Dither

As you can see in Figures 2 and 3 above, just taking a signal and quantising it generates an error. The more bits you have in the word length, the more quantisation levels you have, and the smaller the error. However, that error will always be correlated with the signal somehow, and as a result, it’s distortion, which is easy to learn to hear.

If, however, we add a little noise to the signal before we quantise it, then we can randomise the error, which changes the error from producing distortion to a constant signal-independent noise floor. Since the noise makes the quantiser appear to be indecisive, we call it dither.

The easiest (and possibly best) way to do this is to create white noise with a triangular probability distribution function and a peak-to-peak amplitude of ± 1 quantisation level. I’ll explain what that last sentence means in Part 3 of this series.

If we do this, then we

take the signal

add a little noise to it

quantise it

and the result might look like Figure 4.

It should be easy to see that we still have quantisation, and also that I’ve added some random element to the signal.

However, let’s look at the mistake I made in Figure 4. The noise that was added to the signal has an amplitude of ±1 quantisation level, so we should see cases where the signal looks like it should round to the closest level, but lands either 1 level above or 1 level below it. (Take a look at Time = 70, 71, and 72, for example.)

However, take a look around Time = 20 to 30. Notice that the original signal is close to the top quantisation level. This means that, although a negative value in the dither in those samples can bring the quantisation level down, a positive value cannot bring it up because we don’t have any room for it. This will, again, result in a small amount of asymmetrical clipping. This is a VERY small amount. (Remember that, in the real world, we’re probably using 2^{16} (= 65,536) or 2^{24} (= 16,777,216) quantisation values, not 2^{3} (= 8).)

So, if we’re going to avoid this clipping, we need to adjust the scaling of the signal once more, as shown in Figure 5.

This shows a signal that is scaled so that, without dither, it would round to one level away from the top-most quantisation level. When you add the dither, it can go up to that top quantisation level. (In fact, I happened to use the same dither signal for Figures 4 and 5. The only difference is the scaling of the signal.)

Now, I know that if you’re not used to looking at 3-bit signals, and/or if dither is a new concept, the red signal in Figure 5 might make you a little upset. However (and you have to believe me on this…) this is the correct way to encode digital audio. Just because it looks crazy doesn’t mean that it is.

NB: The math

If you want to make the plots above, here’s a simplified version of the math to try it out. Note: I live in a world where a % symbol precedes a comment.

Some Constants

Bitdepth = 3
Fs = 100 % sampling rate in Hz
Fc = 1 % frequency of the sine wave in Hz
TimeInSamples = [0:Fs] % This will make the TimeInSamples all of the integer values from 0 to Fs (Fs + 1 samples: 1 second of audio plus one extra sample)

Figure 1

Signal = sin(2 * pi * Fc/Fs * TimeInSamples)

Figure 2

ScaleUp = 2^(Bitdepth-1)
ScaleDown = 2^(Bitdepth-1)
QuantisedSignal = round(Signal * ScaleUp) / ScaleDown;
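% Then clip any value that rounded up past the top-most quantisation level,
% because two's complement has no level at +2^(Bitdepth-1).
QuantisedSignal = min(QuantisedSignal, (2^(Bitdepth-1) - 1) / ScaleDown);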

This past week I found a very small oddity in the behaviour of one of the functions in Matlab. This led me down a rabbit hole that I’m still following, but the stuff I’ve learned along the way has proven to be interesting.

The summary

The short version of the story is that I made a test tone which consisted of a sine wave that had a frequency that matched an FFT bin centre so that I could test a thing. In order to get the sine wave through the thing, I had to export the audio signal as something the thing could play. So, I exported it as both a .wav and a .flac file, both with 24-bit word lengths and matching sampling rates.

Once the two signals came back from the thing, they looked different on an FFT analysis. Not very different, but different enough to raise questions. So, I ran the FFT on the .wav and .flac files that I created to do the test and found out that THEY were different, which I didn’t expect, because I know that FLAC is lossless.

The question that came up first was “why are they different?”, and that was just the entrance to the rabbit hole.

The long version

In order to explain what happened, we have to follow some advice given by Carl Sagan, who said

‘If you wish to make an apple pie from scratch, you must first invent the universe.’

We won’t invent the universe, but we’re going to dig down into the basics of LPCM digital audio in order to come back up to talk about where I wound up last Thursday.

Quantisation

Linear Pulse Code Modulation (LPCM) is a way of encoding signals (like an audio signal) by saving the waveform as a series of measurements of the instantaneous amplitude. However, when you do this, you can’t have a measurement with an infinite resolution, so you have to round off the value to the nearest one you can encode. This is just like measuring something with a ruler that has millimetres marked on it. You can’t really measure something more precisely than the nearest millimetre, so you round off the value to something you know. Whether or not that’s good enough depends on what the measurement is for.

In LPCM digital audio, we call the steps that you can round the values to ‘quantisation levels’ because you’re dividing up the amplitude into discrete quanta. Since the values of those quantisation levels are stored or transmitted using a binary number (containing only 0s and 1s), the number of quantisation levels is a power of 2. For example, if you have a 16-bit (bit = Binary digIT) value, then you can count from 0000000000000000 (= 0) to 1111111111111111 (= 65,535), which gives you 2^{16} = 65,536 quantisation levels.

However, since audio signals go above and below 0 (we need to represent positive and negative values) we need a way to split up those 65,536 values (0 to 65,535) between the positive and negative quantisation levels.

Let’s take a simple example with a 3-bit word. Since there are 3 bits, we have 2^{3} = 8 quantisation levels. It would be nice if 000 in the binary representation referred to a signal value of 0, like this:

All we need to do now is to figure out what binary values to put on the other quantisation levels. To do this, we use a system like the one shown in Figure 2.

If you start at the top, and follow the blue circular arrow going clockwise, you count from 000 (= 0) all the way to 111 (= 7). However, if you look at the red arrows, you can see that we can assign the binary values to the positive and negative quantisation levels by looking at the circle clockwise for positive values and counter-clockwise for negative ones. This means that we wind up with the assignments shown in Figure 3.

This way of ‘wrapping’ the values around the circle into number assignments on a one-dimensional (in this case, vertical) scale is called a ‘two’s complement’ method.

There are two nice things about this system:

the middle value of 0 is assigned an actual value of 0, which makes sense to us humans

the first bit (digit) in the binary value tells you whether the level is positive (if it’s a 0) or negative (if it’s a 1).

There is at least one slightly annoying thing about this system: it’s asymmetrical. Notice in Figure 3 that there are 3 available positive quantisation levels, but 4 negative ones. This is because we have an even number of values to use (because it’s a power of 2) but one of the values is 0, leaving an odd, and therefore asymmetrical number of remaining values for the non-0 quantisation levels.

This will come back to be a pain in the arse later…
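The circle in Figure 2 boils down to one rule: if the leading bit is 1, subtract 2^{3} = 8 from the unsigned count. A short Python sketch of that rule for a 3-bit word should reproduce the assignments in Figure 3, including the 3-versus-4 asymmetry:

```python
BITS = 3

def twos_complement_value(pattern, bits=BITS):
    """Interpret an unsigned bit pattern (0 .. 2^bits - 1) as two's complement."""
    if pattern >= 2 ** (bits - 1):     # leading bit is 1: negative value
        return pattern - 2 ** bits
    return pattern                     # leading bit is 0: zero or positive

table = {format(p, f"0{BITS}b"): twos_complement_value(p) for p in range(2 ** BITS)}
for bits_str, value in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(bits_str, value)

positives = [v for v in table.values() if v > 0]
negatives = [v for v in table.values() if v < 0]
print(len(positives), "positive levels,", len(negatives), "negative levels")
```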

This week, I was testing a device that required that I look WAY down into the floor caused by the noise+distortion artefacts in the presence of a signal.

One trick to do this is to play a sinusoidal wave through the system and do an FFT of the output. However, as I described in this posting a long time ago, there is an interaction between the frequency you choose and the behaviour of an FFT on a digital signal (yes… I know it’s really a DFT – but let’s not be pedantic…)

For example, if I do a 65536-point FFT on a 997 Hz sine tone in a 48 kHz sampling rate (with all the floating point precision I have available…) I get a magnitude response that looks like this:

Obviously, this is NOT the magnitude response of a sinusoidal wave. The “skirts” on either side of 997 Hz are artefacts caused by the fact that I’m using a rectangular window, and the sine wave’s last sample does not line up perfectly with its first when the FFT “wraps” it around to meet itself (read this leading up to Figure 10 for an explanation). That sharp discontinuity causes the extra energy in the other frequency bins as shown above.

If, however, I find out the frequency of the closest FFT bin, and make my sine wave THAT frequency instead, THEN I do an FFT and look at the magnitude response, it looks like Figure 2.

Notice that this is not a 997 Hz tone, but a 996.8261718750000 Hz tone instead.
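The arithmetic behind that number is short enough to check by hand, but here it is as a Python sketch, using `fractions.Fraction` so that nothing is lost to floating-point rounding:

```python
from fractions import Fraction

FS = 48_000          # sampling rate in Hz
N = 65_536           # FFT length
TARGET = 997         # desired test frequency in Hz

bin_width = Fraction(FS, N)                 # 375/512 Hz = 0.732421875 Hz
bin_index = round(TARGET / bin_width)       # closest bin number
f_bin = bin_index * bin_width               # its centre frequency

print("bin width :", float(bin_width), "Hz")
print("bin index :", bin_index)
print("frequency :", float(f_bin), "Hz")
# A bin-centred tone fits a whole number of cycles into the N-sample window,
# so its last sample lines up with its first when the FFT "wraps" it around.
print("cycles in the window:", f_bin * N / FS)
```

The closest bin is number 1361, and 1361 whole cycles fit into the 65,536-sample window, which is why there is no discontinuity (and no skirt) at the wrap-around point.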

Now the “noise floor” that you see there is the error in my sine wave caused by the precision of my calculator (Matlab). -300 dB is VERY low, and gives me plenty of room to see the errors in the thing that I might be testing (assuming that I can actually get that signal out to my Device Under Test or “DUT” and back in again from it).

Let’s say I were to represent the same sine wave using a 24-bit LPCM signal that has been correctly dithered with TPDF dither, and THEN I do the FFT and calculate the magnitude response. That would look like Figure 3.

Now, the energy at all the frequencies other than 996.8-ish Hz is the energy in the noise floor generated by the dither. (If you’re wondering why it’s almost 200 dB down, and not 141 dB down (6*24-3), it’s because the total energy in all those FFT bins adds up to a noise floor that’s 141 dB below the sine tone.)
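The gap between those two numbers is the FFT’s processing gain: the tone’s energy lands in one bin, while the noise is shared among the N/2 bins. Here is a rough Python sketch of the arithmetic, under stated assumptions: full scale is ±1, the sine has a peak amplitude of 1, the window is rectangular, and the TPDF dither adds to the quantisation error for a total noise power of a quarter of a squared quantisation step.

```python
import math

BITS = 24
N = 65_536                     # FFT length

# Assumed noise model: TPDF dither (variance 1/6 of a squared step) plus
# quantisation error (1/12 of a squared step) = 1/4 of a squared step.
step = 2 / 2 ** BITS
noise_power = step ** 2 / 4
signal_power = 1 ** 2 / 2      # mean power of a sine with peak amplitude 1

total_snr_db = 10 * math.log10(signal_power / noise_power)
# The noise is spread across N/2 bins, but the tone lands in just one of them.
per_bin_db = total_snr_db + 10 * math.log10(N / 2)

print(f"total signal-to-noise ratio: {total_snr_db:.1f} dB")
print(f"tone-to-noise ratio per bin: {per_bin_db:.1f} dB")
```

This puts the expected per-bin noise level at roughly 187 dB below the tone; individual bins scatter randomly around that expectation, which is why the floor in a plot like Figure 3 is a fuzzy band rather than a line.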

Okay. All of those plots show things that I’ve seen before – and are things that I would expect to see when measuring a device.

But then, this week, I did a measurement that produced the magnitude response shown in Figure 4.

This is NOT something I’ve seen before, so it raised one of my two eyebrows. In retrospect, I should have known what would cause this, but at the time, I was very confused. It’s not a noise floor because it’s too flat. It’s not distortion because it doesn’t have harmonics. So what is it?

The answer is actually really simple.

The sine tone is visible as the spike in the magnitude plot, just like in all the others.

The flat horizontal line is the result of a single-sample click that happened sometime in the 65536 samples that I used to do the FFT.

The sum (or mix) of the sine + click results in the magnitude response plot you see above. If you’re looking at the signal itself, it just means that one of the 65536 samples has an error, and isn’t sitting on the sine curve. I’ve shown an example of this in Figure 5.

The greater the error of that one sample value, the higher the floor in Figure 4.
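That behaviour is easy to reproduce, because the DFT of a single non-zero sample has the same magnitude in every bin. In this Python sketch, N = 64, the click position and the 0.01 error size are arbitrary, hypothetical choices; every bin other than the tone’s comes out at the size of the error.

```python
import cmath
import math

N = 64
K = 5                   # bin-centred tone: K whole cycles in N samples
CLICK_AT = 20           # hypothetical sample index with an error
CLICK_SIZE = 0.01       # hypothetical size of the error

x = [math.sin(2 * math.pi * K * n / N) for n in range(N)]
x[CLICK_AT] += CLICK_SIZE          # one sample knocked off the sine curve

def dft_mag(signal, k):
    """Magnitude of bin k of a naive DFT."""
    return abs(sum(v * cmath.exp(-2j * math.pi * k * n / len(signal))
                   for n, v in enumerate(signal)))

mags = [dft_mag(x, k) for k in range(N // 2 + 1)]
others = [m for k, m in enumerate(mags) if k != K]

# A single-sample error e has a DFT magnitude of |e| in every bin, so every
# bin except the tone's sits at the same level: a flat "floor".
print("min / max outside the tone bin:", min(others), max(others))
```

Doubling `CLICK_SIZE` raises every one of those bins by 6 dB, which matches the observation that a bigger error gives a higher floor.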

Of course, for these plots, I simulated everything in Matlab. However, the actual result was even more interesting / confusing, since the DUT didn’t have a flat magnitude response. So, instead of a nice, horizontal line like the one I’ve shown in Figure 4, I could see something like the response of the system as well, but I’ll stay away from the details of that to keep things simple here.

Every once-in-a-while, I have to measure a system to find out whether its clock is behaving, or at the very least, whether its latency is stable over time. There are a number of different ways to do this, but I was trying to find a way that would be quick to implement and simple to analyse, if only as an initial “smoke test” to determine whether the system is working perfectly (which never happens) or which measurements we have to do next in order to figure out what exactly is going on.

Anyone who works in an engineering-type of area knows that the job doesn’t stop when you go home for the day. It percolates in the back of your head until, while you’re distracted by something else, the answer you’ve been looking for bubbles up to the frontal lobe. So, one evening I’m walking the dog in the forest, in the rain, and, like most people do, I was thinking about why we use 997 Hz sine tones to measure digital audio systems (if you don’t know the answer to this, check this posting). And that’s where it hit me. If we use a weird number to try and hit as many quantisation values as possible, what happens if we do the opposite?

Here’s a plot of a 4800 Hz sine tone, sampled at 48 kHz.

This is the way we normally plot a digital audio signal, but it’s not really fair. What I’m doing there is connecting the sample values with straight lines. However, when this signal is sent out of a DAC, it will be smoothed by a reconstruction filter, so those sharp corners will disappear on their way out to the real world. For the purposes of this posting, this doesn’t matter, since what I’m really interested in are the sample values themselves, as shown in Figure 2.

You may notice something curious about this plot. Since I’ve chosen to plot a sine wave whose frequency is exactly 1/10th of the sampling rate, then each period of the waveform is 10 samples long, and the next period is identical to the previous one. This can be shown by connecting every 10th sample as shown in Figure 3.

Again a reminder: this is the reason we use the “weird” frequency of 997 Hz to test a digital audio system running at 44.1 kHz or 48 kHz.

In this case, testing a 48 kHz system with a 4.8 kHz tone can measure 10 sample values at most. (If I had chosen to start with a different phase, it might have been fewer sample values, since I would have gotten repetitions within a period.)
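The count in that last paragraph comes from the greatest common divisor of the sampling rate and the tone frequency: the sample pattern repeats every fs / gcd(fs, f) samples. A small Python sketch (assuming whole-number frequencies in Hz) compares the 4.8 kHz tone with 997 Hz:

```python
from math import gcd

def distinct_sample_phases(freq_hz, fs_hz):
    """Number of different points on the sine's cycle that the samples ever
    land on (whole-number frequencies assumed): the pattern repeats after
    fs / gcd(fs, f) samples."""
    return fs_hz // gcd(fs_hz, freq_hz)

for f in (4800, 997):
    print(f, "Hz at 48 kHz:", distinct_sample_phases(f, 48_000), "phases")
```

Since 997 is prime and doesn’t divide 44,100 or 48,000, a 997 Hz tone only repeats its sample pattern after a full second of audio, whereas the 4.8 kHz tone repeats every 10 samples.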

If I “connect the dots” for all 10 sample values, it will look like Figure 4.

If I then do that for a much longer time window, it will look basically the same; I just won’t be able to see when the lines start and stop because we’ve zoomed out.

What will happen to this plot if the clock is drifting? For example, if you’re playing a 4.8 kHz tone through a system that is NOT running at 48 kHz (even though it should), then the samples won’t appear at the right time, and so they will have a different instantaneous amplitude. In other words, a change in time will result in a change in phase, which will show up in a plot like the one in Figure 5 as a change in amplitude.

Let’s pretend that we set up a system like the one shown above, and let’s say that the signal that we record over there on the right hand side produces a plot like the one shown below in Figure 7.

What does Figure 7 show us? Since the recording that we made with the sound card is at exactly 48 kHz, and since these are not horizontal lines, then this means that the recorded signal is not exactly 4.8 kHz.

However, this does not necessarily mean that the source (on the left side of Figure 6) is not transmitting a 4.8 kHz sine tone. It could mean that the clock that is determining the sampling rate in the loudspeaker is incorrect. So, the source “thinks” it’s playing a 4.8 kHz tone, but the loudspeaker is deciding otherwise for some reason. (This is a very normal behaviour. Nothing is perfect, and a Bluetooth speaker is a likely suspect for a number of errors…)

The curves in Figure 7 are sinusoidal. This means that the clock error is constant. In other words, the sampling rate is wrong, but not varying. The result is a sine wave played at the wrong frequency, but at least that frequency is not modulating. We can also see that each of the 10 sinusoidal waves makes about 1 cycle in the 1000 ms of the plot. This means that the clock is drifting by 1 period of the audio sine wave (4.8 kHz) every 1000 ms. Since 1 period of a 4.8 kHz tone is 10 samples long at 48 kHz, this is a system that is actually running at either 47990 Hz or 48010 Hz instead of 48000 Hz (because we’re either gaining or losing 10 samples every second). Unfortunately, without a little more attention, we don’t even know whether we’re running too slowly or too fast…
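The arithmetic at the end of that paragraph can be written as a small helper. This is a sketch, and the function name is hypothetical. It returns both candidates because the direction of the error can't be determined from the plot alone.

```python
def candidate_playback_rates(cycles_per_second, tone_hz=4800, nominal_fs=48000):
    """Return the two sampling rates consistent with the observed trace rate.

    Each trace completing one cycle per second means the played tone is off by
    cycles_per_second Hz; the sign of the error cannot be recovered from the
    magnitude of the drift alone.
    """
    samples_per_period = nominal_fs / tone_hz                # 10 samples per cycle of 4.8 kHz
    rate_error = cycles_per_second * samples_per_period      # samples gained or lost per second
    return nominal_fs - rate_error, nominal_fs + rate_error

print(candidate_playback_rates(1.0))   # (47990.0, 48010.0)
```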

If the playback system’s clock (which controls its sampling rate) is not just incorrect but unstable, then you might see something like Figure 8, where I’ve only connected one of the 10 sample values.

If I were to plot the same slice of time, showing all 10 samples in the sine wave, they would look like Figure 9. Admittedly, this is probably less useful than Figure 8.

This doesn’t tell us exactly what’s going on, other than that this system is obviously NOT behaving. However, we can get a little useful information. For example, we can see that the clock drift is modulating more from 0 ms to 200 ms, and then settles down to a more stable (and more correct) value from 200 ms to about 600 ms.

It would take more analysis to learn enough about this system to know what’s happening. However, this plot is useful as a smoke test. It can tell you whether the system is behaving well enough that you don’t need to worry too much, or show you where you need to “zoom in” to find out more.

In the last posting, I showed a scale drawing of a 15 µm radius needle on a 1 kHz sine tone with a modulation velocity of 50 mm/s (peak) on the inside groove of a record. Looking at this, we could see that the maximum angular rotation of the contact point was about 13.4º away from vertical, so the total range of angular rotation of that point would be about 27º.

I also mentioned that vinyl is mastered so that the signal on the groove wall has a constant velocity from about 1 kHz upwards, so that range will not change in that frequency band. Below 1 kHz, the mastering typically ensures a constant amplitude on the groove wall, so the range decreases as the frequency decreases.

We can do the math to find out exactly what the angular rotation of the contact point is for a given modulation velocity and groove speed.

Looking at Figure 1, the rotation is ±13.4º away from vertical on the maximum; so the total range is 26.8º. We convert this to a time modulation by converting that angular range to a distance, and dividing by the groove speed at the location of the needle on the record.

If we repeat that procedure for a range of needle radii from 0 µm to 75 µm for the best-case (the outside groove) and the worst-case (the inside groove), we get the results shown in Figure 2.
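Here is a Python sketch of that procedure, swept over needle radii. It makes two assumptions of my own that are not read from the figure. First, the modulation velocity is the same 50 mm/s (peak) as in the scale drawing. Second, the maximum tilt of the contact point equals the steepest slope of the groove wall, atan(modulation velocity / groove speed). For the inside groove that gives about ±13.4º, which agrees with the scale drawing. The groove speeds (509.8 and 210.6 mm/s) are the outside and inside values for a 12″ LP at 33 1/3 RPM.

```python
import math

def contact_point_jitter_s(radius_um, mod_velocity_mm_s, groove_speed_mm_s):
    """Peak-to-peak time modulation of the contact point, in seconds.

    Assumption: the largest tilt of the contact point equals the steepest
    groove-wall slope, i.e. atan(modulation velocity / groove speed).
    """
    max_tilt = math.atan(mod_velocity_mm_s / groove_speed_mm_s)   # radians, one side
    total_range = 2 * max_tilt          # about 26.7 degrees on the inside groove at 50 mm/s
    distance_um = 2 * math.sin(total_range / 2) * radius_um       # chord between the extremes
    groove_speed_um_s = groove_speed_mm_s * 1000
    return distance_um / groove_speed_um_s

GROOVES = {"outside": 509.8, "inside": 210.6}   # groove speed in mm/s
for radius in range(0, 76, 15):
    row = ", ".join(
        f"{name} {contact_point_jitter_s(radius, 50, speed) * 1e6:5.1f} us"
        for name, speed in GROOVES.items())
    print(f"{radius:2d} um needle: {row}")
```

The jitter scales linearly with the needle radius. It is lower on the outside groove for two reasons: the tilt is smaller, and the same distance is covered more quickly.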

Back in Part II of what is turning out to be a series of postings on this topic, I wrote:

If this were a digital system instead of an analogue one, we would be describing this as ‘signal-dependent jitter’, since it is a time modulation that is dependent on the slope of the signal. So, when someone complains about jitter as being one of the problems with digital audio, you can remind them that vinyl also suffers from the same basic problem…

As I was walking the dog on another night, I got to thinking whether it would be possible to compare this time distortion to the jitter specifications of a digital audio device. In other words, is it possible to use the same numbers to express both time distortions? That question led me here…

Remember that the effect we’re talking about is caused by the fact that the point of contact between the playback needle and the surface of the vinyl is moving, depending on the radius of the needle’s curvature and the slope of the groove wall modulation. Unless you have a contact line needle, you’ll see that the radius of its curvature is specified in µm, typically something between about 5 µm and 15 µm, depending on the pickup.

Now let’s do some math. The information and equations for these calculations can be found here.

We’ll start with a record that is spinning at 33 1/3 RPM. This means that it makes 0.556 revolutions per second.

The Groove Speed relative to the needle is dependent on the rotation speed and the radius – the distance from the centre of the record to the position of the needle. On a 12″ LP, the groove speed at the outside groove where the record starts is 509.8 mm/sec. At the inside groove at the end of the record, it’s 210.6 mm/sec.
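The groove speed is the circumference at the needle's position multiplied by the number of revolutions per second. In this Python sketch, the two diameters (292.1 mm and 120.65 mm, or 11.5″ and 4.75″) are my assumptions, chosen because they reproduce the speeds quoted above. They are not given in this post.

```python
import math

RPM = 100 / 3                      # 33 1/3 revolutions per minute
REV_PER_S = RPM / 60               # about 0.556 revolutions per second

def groove_speed_mm_s(radius_mm, rev_per_s=REV_PER_S):
    """Linear speed of the groove under the needle: circumference times revolutions per second."""
    return 2 * math.pi * radius_mm * rev_per_s

# Assumed diameters (11.5 in and 4.75 in), chosen to match the quoted speeds.
for label, diameter_mm in (("outside", 292.1), ("inside", 120.65)):
    print(f"{label}: {groove_speed_mm_s(diameter_mm / 2):.1f} mm/s")
```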

Let’s assume that the angular rotation of the contact point (shown in Figure 1) is 90º. This is not based on any sense of scale – I just picked a nice number.

We can convert that angular shift into a shift in distance on the surface of the vinyl by finding the distance between the two points on the surface, as shown below in Figure 2. Since you might want to choose an angular rotation that is not 90º, you can do this with the following equation:

2 * sin(AngularRotation / 2) * radius

So, for example, for a needle with a radius of 10 µm and a total angular rotation of 90º, the distance will be:

2 * sin(90/2) * 10 = 14.1 µm
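Here is the same equation in Python. Note that `math.sin` expects radians, so the angle has to be converted from degrees first. The function name is my own.

```python
import math

def contact_point_shift_um(angular_rotation_deg, radius_um):
    """Chord between two points on a circle of the given radius, separated by the given angle."""
    return 2 * math.sin(math.radians(angular_rotation_deg) / 2) * radius_um

print(f"{contact_point_shift_um(90, 10):.1f} um")   # 14.1 um
```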

We can then convert the “jitter” as a distance to a jitter in time by dividing it by the distance travelled by the needle each second: the groove speed in µm per second. Since that groove speed is dependent on where the needle is on the record, we’ll calculate best-case and worst-case values: at the outside and the inside of the record.

Jitter Distance / Groove Speed = Jitter in time

For example, at the inside of the record where the jitter is worst (because the wavelength is shortest and therefore the maximum slope is highest), the groove speed is about 210.6 mm/sec or 210600 µm/sec.

Then the question is “what kind of jitter distance should we really expect?”

Looking at Figure 3 which shows a scale drawing of a 15 µm radius needle on a 1 kHz tone with a modulation velocity of 50 mm/s (peak) on the inside groove of a record, we can see that the angular rotation at the highest (negative) slope is about 13.4º. This makes the total range about 27º, and therefore the jitter distance is about 7.0 µm.

If we have a 27º angular rotation on a 15 µm radius needle, then the jitter will be

7.0 µm / 210600 µm/sec = 0.0000332 sec, or 33.2 µsec peak-to-peak
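Here are the two steps together for the 15 µm needle on the inside groove, as a Python sketch with a function name of my own. It does not round the distance to 7.0 µm along the way, so its result agrees with the 33.2 µsec above only to within rounding.

```python
import math

def jitter_time_s(angular_range_deg, radius_um, groove_speed_um_s):
    """Convert the contact point's angular range into a peak-to-peak time shift.

    Step 1: angle -> distance along the vinyl surface (chord length).
    Step 2: distance / groove speed -> time.
    """
    distance_um = 2 * math.sin(math.radians(angular_range_deg) / 2) * radius_um
    return distance_um / groove_speed_um_s

INSIDE_GROOVE_UM_S = 210_600      # 210.6 mm/s at the inside groove
jitter = jitter_time_s(27, 15, INSIDE_GROOVE_UM_S)
print(f"{jitter * 1e6:.1f} us peak-to-peak")
```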

Of course, as the radius of the needle decreases, the distance that the contact point moves for a given angular rotation also decreases, and therefore the amount of “jitter” drops. When the radius = 0, then the jitter = 0.

It’s also important to note that the jitter will be less at the outside groove of the record, since the wavelength is longer, and therefore the slope of the groove is lower, which also reduces the angular rotation of the contact point.

Since the grooves on records are typically equalised to ensure that you have a (roughly) constant velocity above 1 kHz and a constant amplitude below, the maximum slope of the signal, and therefore the range of angular rotation of the contact point, will be (roughly) the same from 1 kHz to 20 kHz. As the frequency of the signal decreases below 1 kHz, the amplitude remains (roughly) the same, so the velocity decreases. Therefore the range of the angular rotation of the contact point decreases as well.

In other words, the amount of jitter is 0 at 0 Hz, and increases with frequency until about 1 kHz, then it remains the same up to 20 kHz.
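The shape of that curve can be sketched in Python with an idealised equalisation: constant amplitude below 1 kHz, and constant velocity above it (50 mm/s peak, as in the drawings). The sharp corner at 1 kHz is my simplification. Real record equalisation has gentler transitions, so treat the numbers as illustrative only. The sketch uses the same assumption as before for the tilt of the contact point: atan(velocity / groove speed).

```python
import math

TURNOVER_HZ = 1000.0   # assumed idealised corner: constant amplitude below, constant velocity above

def modulation_velocity_mm_s(freq_hz, velocity_above_turnover_mm_s=50.0):
    """Idealised peak groove velocity for a signal at a fixed level."""
    return velocity_above_turnover_mm_s * min(freq_hz / TURNOVER_HZ, 1.0)

def jitter_us(freq_hz, radius_um=15.0, groove_speed_mm_s=210.6):
    """Peak-to-peak contact-point jitter in microseconds on the inside groove."""
    v = modulation_velocity_mm_s(freq_hz)
    tilt = math.atan(v / groove_speed_mm_s)          # steepest wall angle, one side
    distance_um = 2 * math.sin(tilt) * radius_um     # chord across the full +/- tilt range
    return distance_um / (groove_speed_mm_s * 1000) * 1e6

for f in (0, 100, 500, 1000, 5000, 20000):
    print(f"{f:>6} Hz: {jitter_us(f):5.1f} us")
```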

As one final thing: as I was drawing Figure 3, I also did a scale drawing of a 20 kHz signal with the same 50 mm/s modulation velocity and the same 15 µm radius needle. It’s shown in Figure 4.

As you can see there, the needle’s 15 µm radius means that it can’t drop into the trough of the signal. So, that needle is far too big to play a CD-4 quad record (which can go all the way up to 45 kHz).
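The same conclusion can be reached without a drawing by comparing the needle's radius to the tightest bend in the groove wall. For a sinusoidal wall y(x) = A sin(kx), the smallest radius of curvature is 1/(A k²). With A = v/(2πf) and k = 2πf/u, where u is the groove speed, that becomes u²/(2πfv). This Python sketch assumes the groove wall is a simple 2-D sine wave, which ignores the real 3-D shape of the groove and the stylus.

```python
import math

def min_groove_curvature_radius_um(freq_hz, mod_velocity_mm_s, groove_speed_mm_s):
    """Smallest radius of curvature of a sinusoidal groove wall, in micrometres.

    For y(x) = A sin(kx) the tightest curvature is at the peaks, radius 1/(A k^2).
    With A = v / (2 pi f) and k = 2 pi f / u this becomes u^2 / (2 pi f v).
    """
    u = groove_speed_mm_s
    radius_mm = u ** 2 / (2 * math.pi * freq_hz * mod_velocity_mm_s)
    return radius_mm * 1000

NEEDLE_RADIUS_UM = 15
for f in (1000, 20000):
    r = min_groove_curvature_radius_um(f, 50, 210.6)
    verdict = "fits" if NEEDLE_RADIUS_UM <= r else "too big"
    print(f"{f:>5} Hz: tightest bend radius {r:6.1f} um -> 15 um needle {verdict}")
```

For 20 kHz at 50 mm/s on the inside groove, the tightest bend has a radius of about 7 µm, less than half of the 15 µm needle radius. This is consistent with the scale drawing.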

A question came to my desk this week from a customer who would like to connect a third-party streaming device to his Beolab 50s. He plans to use a USB-Audio connection, and his question was “Should I control the volume of the audio signal in the streamer or in the Beolab 50s?” There are three different ways to configure the system:

1. Control the volume in the streamer using its interface, and send a volume-regulated signal to the Beolab 50s. The Beolab 50s should then be set to a start-up default volume at which the streamer’s maximum volume is as loud as the customer will ever want. In order to do this, the Beolab 50s need to be set to ignore the volume information that is received on the USB-Audio connection.

2. Set the streamer to output an unregulated signal, and set the Beolab 50s to obey the volume information that is received on the USB-Audio connection, then use the streamer’s interface for the volume control (which would actually be happening inside the Beolab 50s).

3. Set the streamer to output an unregulated signal, and set the Beolab 50s to ignore the volume information that is received on the USB-Audio connection, then use the Beolab 50’s interface for the volume control (which would actually be happening inside the Beolab 50s).

Of course, one way to answer the question is “where do you want to control the volume?” For example, if it’s with a remote control for the Beolab 50s, then the answer is “use option #3”. If you’d prefer to use the streamer’s app, for example, then the answer is “use option #1 or #2”.

However, the question came to my desk because it was specifically about the technical performance of the audio signal. Which of these three options results in the highest audio “quality”? (I put the word “quality” in quotation marks because it is a loaded term, and might mean different things to different persons…)

The simplest answer without getting into any details is “it probably doesn’t matter”. However, that answer is based on a couple of assumptions that may or may not be wrong.

Hypothetically, the Beolab 50 can output an audio signal that peaks at about 122 dB SPL measured at 1 m in a free field, albeit not at all frequencies present at its output. (This is because there are some physical limitations of how far the woofers can move, which means that you can’t get 122 dB SPL at 20 Hz, for example.) The noise floor of the Beolab 50s is about 0 dB SPL measured in the same place (again, this is frequency-dependent). So, it has a total dynamic range at its output of about 122 dB.

The maximum output level is a result of a combination of the loudspeaker drivers, the amplifiers, and the power supply. However, these have all been chosen to reach their maximum outputs approximately simultaneously, so changing one of the three won’t make a big difference.

The noise floor is a result of the combination of the loudspeaker drivers’ sensitivities, the amplifiers’ noise floors, and the signal that feeds the amplifiers: the DAC outputs’ noise floors. For the purposes of this discussion, I’m sticking with a digital input, so we don’t need to worry about the noise floor of the ADC at the loudspeaker’s input.

Suppose you have an audio signal at one of the digital inputs of the Beolab 50, and that signal is at its loudest possible level (for a sine wave, that’s 0 dB FS, or 0 dB relative to Full Scale). At the Beolab 50’s maximum volume setting, this will produce a peak output level of 122 dB SPL (depending on the frequency, as I mentioned above).

All digital inputs of the Beolab 50 accept at least a 24-bit word length. This means that the dynamic range of the digital input signal itself is about 6 * 24 – 3 = 141 dB. This in turn means that the hypothetical noise floor of a correctly-dithered 24-bit signal is 19 dB below the noise floor of the loudspeakers, even at their maximum volume setting (because 122 – 141 = -19).

In other words, if we assume that the streamer has a correctly-implemented gain function for its volume control, using TPDF dither implemented at the 24-bit level, then its noise floor will be 19 dB below the “natural” noise floor of the Beolab 50. Therefore, if the volume is controlled in the streamer, any artefacts will be masked by the 50s themselves.

On the other hand, the Beolab 50s’ volume control uses a gain function that is calculated in 32-bit floating point, which means that it has a dynamic range of 144 to 150 dB. (See this posting for an explanation and comparison of fixed point and floating point systems.) So the noise generated by the internal volume control will be somewhere between 22 and 28 dB below the “natural” noise floor of the Beolab 50.
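The margins in these paragraphs are simple subtractions, shown here as a Python sketch. The 122 dB loudspeaker dynamic range, the 6 * bits – 3 rule of thumb and the 144 to 150 dB range for 32-bit floating point are all taken from the text above. Nothing here is measured, and the function name is my own.

```python
SPEAKER_DYNAMIC_RANGE_DB = 122     # about 122 dB SPL max output over a ~0 dB SPL noise floor

def fixed_point_range_db(bits):
    """The rule of thumb used in this post: 6 dB per bit, minus 3 dB."""
    return 6 * bits - 3

margins = {
    "24-bit streamer volume": fixed_point_range_db(24) - SPEAKER_DYNAMIC_RANGE_DB,
    "32-bit float (low estimate)": 144 - SPEAKER_DYNAMIC_RANGE_DB,
    "32-bit float (high estimate)": 150 - SPEAKER_DYNAMIC_RANGE_DB,
}
for name, margin in margins.items():
    print(f"{name}: noise {margin} dB below the loudspeaker's own noise floor")
```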

So, (assuming my assumptions are correct) the noise floor that is produced by controlling the volume in either the streamer or the Beolab 50s is FAR below the constant noise floor of the DAC / amplifiers.

In addition, the noise floors have roughly the same spectra (in other words, you don’t have pink noise in one case but white noise in the other; they’re all producing white noise). And since both are so far below the loudspeakers’ own noise floor, it really doesn’t matter. Arguing about whether the noise is 19 dB lower or 22 dB lower is a waste of good argument time, unless you paid for the four-and-a-half-hour argument instead of the five-minute one…

Important Notes

If the customer had been asking about using the analogue input, then the answer MIGHT have been different.

Also, if my assumptions that the streamer sends a 24-bit signal, or that it has a correctly-implemented gain function for its volume control, are incorrect, then this answer MIGHT be incorrect as well.