# Typical Errors in Digital Audio: Part 6 – Aliasing

In a previous posting, I tried to explain the concept of aliasing. The easiest way to illustrate this is to try to sample an audio signal that has a frequency that is higher than the Nyquist frequency – one half of the sampling rate. If you do this, then the signal that will come out of your digital audio system will have a different frequency than the original signal. In fact, it will be the Nyquist frequency minus the difference between the original signal and the Nyquist frequency.

For example, if we have an LPCM audio system that has a sampling rate of 48 kHz, then its Nyquist frequency is 24 kHz. If you allow any audio signal to be sampled by that system, and you record a sine wave with a frequency of 30 kHz, then the signal that will be played back by the system will be

Nyquist – (signal freq – Nyquist)
24 kHz – (30 kHz – 24 kHz)
24 kHz – 6 kHz
18 kHz

## Digging a little deeper

The example I gave above is only part of the story. It’s the part of the story that’s told because it’s easy to tell, and relatively easy to grasp. However, let’s look into this a little more…

If I ask you “what is the square root of 4?” you’ll probably say that the answer is “2”. However, this is also only part of the story. The square root of 4 is also -2, since -2 * -2 = 4. So, there are two correct answers to the question – in other words, both answers exist and are equally valid.

Aliasing is somewhat similar. If we manage to get a 30 kHz sine wave into an LPCM recording system with a sampling rate of 48 kHz, we will appear to have recorded an 18 kHz sine wave. However, the samples that we have captured are also equally valid for the original 30 kHz sine wave. In fact, both the 18 kHz and the 30 kHz tones can be thought of as being equally valid answers to the set of samples we recorded.

This means that, if I record an 18 kHz sine tone in the 48 kHz system, we can consider the 30 kHz sine tone to also exist simultaneously, inside the digital domain.

Oddly, this is also true at other frequencies. So, you do not only get a mirror effect around the Nyquist, but you also get it at the 1.5 times the sampling rate (or the sampling rate + Nyquist).

I won’t go into this any deeper for now – but if you want to continue, the section on “Folding” at the Wikipedia page on Aliasing is a good place to start.

Normally, we try to prevent audio signals higher with frequency content higher than the Nyquist frequency from getting into an LPCM system. This is done by low-pass filtering the audio signal to eliminate any content that might cause aliasing. That’s why the low-pass filter at the input of an analogue-to-digital converter is called an anti-aliasing filter. (At least, that’s the theory. In reality, the anti-aliasing filter of many ADC’s allow a little signal to get through above Nyquist…)

However, what happens if you create signals with a frequency above the Nyquist within the digital domain? Is this possible? Can it happen accidentally?

The short answer to this question is “yes”.

For example, let’s take a sine wave with a frequency of 2212 Hz (this is an arbitrary number… it could have been something else…), record it with an LPCM system with a sampling rate of 48 kHz. Then, after the signal is in the digital domain, I clip it at 85% of the peak value, so it looks like the waveform shown in Figure 1.

By clipping the sine wave symmetrically (meaning that we have made the same change in the wave’s shape on the top and the bottom), we create odd-order harmonics. This means that, when we look at the spectrum of the signal’s frequency content, we will see energy at the fundamental frequency (the original sine wave’s frequency) and also peaks at 3x, 5x, 7x, 9x, that frequency – and so on.

(If I had clipped only on the top or the bottom, and therefore made asymmetrical distortion, we would see energy in the even-order harmonics at 2x, 4x,  6x, 8x, the fundamental frequency – and so on.)

So, let’s look at the frequency content of the clipped signal shown in Figure 1. This is shown in Figure 2, below.

As you can see in Figure 2, we are expecting to see harmonics that extend (at least in this plot) up to 37604 Hz (or 17 x 2212 Hz). Of course, there are harmonics that go higher than this – but they aren’t visible in this plot because I’m only plotting signals with a level down to 60 dB FS.

You may notice that the width of the plot at 2212 Hz increases at the bottom. This is just an artefact of the math being done to find the frequency components in the signal. That spread in the frequency domain isn’t actually in the signal itself, so it can be ignored.

As I said above, the signal was clipped in the digital domain, in an LPCM system running at 48 kHz. So, just for reference, I’ve put in blue lines in Figure 2 that show the sampling rate and the Nyquist frequency – one half the sampling rate.

So: we can see that some of the artefacts created by clipping the signal are sitting at frequencies above the Nyquist frequency in this system. This means that this content will be “mirrored” or “folded down” or – more correctly – aliased to other frequencies below the Nyquist frequency. For example, the harmonic at 24332 Hz will be mirrored to 23668 Hz, according to the following math:

Nyquist – (signal freq – Nyquist)
24000 – (24332 – 24000)
24000 – 332
23668 Hz

So, looking at the top 60 dB of the signal content (shown in Figure 3): the resulting actual output of the LPCM signal will contain:

1. the original fundamental frequency at 2212 Hz
2. four harmonics of that frequency (shown as the other red numbers in Figure 3), and
3. four more frequencies that are not harmonically related to the fundamental (the blue numbers)

As you may already know, an LPCM system has a low-pass filter at its output stage – part of the system that is used to convert the signal back to an analogue output. However, that low pass filter typically has a cutoff frequency around the Nyquist frequency of the system. However, the artefacts that we have created here have aliased down to frequencies below the Nyquist within the digital domain – so, by the time the signal reaches the low pass filter at the output (known as a “reconstruction filter”) they’re already in the audio band, and therefore they’re not filtered out.

So, as we can see in this rather simple example: it is easily possible that a digital audio system that has some processing (specifically “non-linear” processing) can create harmonics that are higher than the Nyquist frequency – and therefore aliasing.

Since the aliased artefacts are not harmonically related to the fundamental frequency, they are more easily audible than “normal” distortion artefacts that generate harmonically-related artefacts. There are a couple of reasons for this, but the most obvious one can be demonstrated by sweeping the frequency of the fundamental. If the artefacts are harmonically related, then as the fundamental frequency of the signal goes up, so do the artefacts. However, if the artefacts are the result of aliasing, then as the fundamental frequency of the signal goes up, some of the artefacts go down in frequency, which sounds quite strange…

The example I gave above (of clipping) is just one way to create distortion that generates harmonically-related artefacts that alias in the system. Lots of different processes can create those artefacts. One of the usual suspects is a poorly-made sampling rate converter.

Many systems use sampling rate converters for different reasons. For example, if you have a loudspeaker or processor that has a lot of filtering in its processing chain, the best architecture is to run the digital signal processing (the  DSP) at a constant (or “fixed”) sampling rate, regardless of the sampling rate of the incoming signal. This is because, if you were to change sampling rates in the DSP to match the incoming signal, you would have to load an entirely new set of coefficients (a fancy word that basically means “multiplications values inside the digital filters”) into the processor. This takes some time, and you don’t want to miss the first part of the song every time the sampling rate changes… So, instead, the smart thing to do is to keep the DSP running at a constant rate, and sample rate convert all incoming signals to the internal sampling rate. This way, there’s no dropout at the start of the song.

However, you have to be careful if you do this, since a poorly-made sampling rate converter will certainly create aliasing artefacts.

In part 5 of this series of postings, I described one kind of test that can be made on an audio system. This test consists of sending a sine wave with a swept frequency into the system and recording its output. You then do a spectrogram of the output, looking for signals at frequencies other than the one you sent in.

To get an idea of what aliasing will look like in this plot, I made a DSP algorithm that creates the same kinds of artefacts. The resulting plot is shown in Figure 4, below. (Remember that this is a measurement of a system that I made to intentionally generate similar artefacts to aliasing – this isn’t actually the output of a system that is aliasing).

Now that you know what to look for in the plot, let’s look at the measurements of some commercially-available systems. Figure 5, below is a measurement of a system that has two problems. One can be seen as the vertical lines – these are “skip/insert” artefacts that I described in an earlier posting. The aliasing artefacts can also be seen in this plot. Note that, in this case, the input and output of the system are both digital connections to my measurement equipment.

If I send a signal at a different sampling rate into the same system, I get a different behaviour. This is not unusual in systems with sampling rate converters. In this plot, you can see the skip/insert artefacts (the vertical stripes) the aliasing artefacts, and the obvious band-limiting of the system. Notice that nothing above about 24 kHz comes out of the system, which would mean that, internally, it is probably running at a sampling rate of 48 kHz. (The input signal in this measurement was at 192 kHz and my analysis system was running at 96 kHz.)

Let’s look at another system. In this case, I put a 48 kHz, 16-bit .flac file on a hard drive, and played it through another digital audio system, again capturing its digital output. The result of this is shown in Figure 7.

As you can see in Figure 6, this system is behaving very well in this particular test. I see the nice, clean signal with only one frequency at only one time. No artefacts down to 100 dB below the signal level. This is good.

Now let’s test exactly the same system, at exactly the same sampling rate, again with a .flac file – but this time with a 24-bit word length in the file. The result of this is shown in Figure 7.

So, by going from a 16-bit file to a 24-bit file, this system obviously behaves very, very differently. It now has harmonic distortion (the straight diagonal lines running parallel to the fundamental frequency), aliasing of those harmonics when they go beyond 24 kHz, and strange noises as well (the large area of blue blobs in the lower left corner, and surrounding the fundamental frequency all the way up.

Those “strange noises” –  the blobs – are probably artefacts caused by a lossy codec similar to MP3. Typically, systems like this are built to reduce the data rate of the audio signal by trying to predict what you can’t hear in the signal – and leaving that out. In doing so, they create errors that produce noise, so the encoder tries to shape that noise so that it “hides” under the signal that it keeps. The end result looks something like the blobs shown in Figure 7…

So, based only on the information from this test, we can guess that the system might be decoding the 24-bit file, “transcoding” it to a lossy format, and transmitting that through the system. However, this is just a guess based on one test… So it could easily be wrong.

One thing we can conclude, however, is that the 48 kHz / 16-bit file behaves MUCH better than a 48 kHz / 24-bit file in this system… So, in this particular case, a higher resolution is not necessarily better…

I should also point out that the digital output of that system was capable of outputting 24 bits. The reason I’m pointing this out is that many persons think that if a system or device has a digital output, then it is good. This is too simple a conclusion to make, because, as I’m trying to illustrate with this series of postings, the “weak link” in the chain is very likely NOT the physical output of the system. It’s more likely some part of the processing in the DSP chain (for example, a poorly-made sampling rate converter that aliases) or a poorly-implemented clocking system (for example, a skip/insert strategy).

## For more aliasing fun…

If you’re intrigued by this, and you’d like to compare the aliasing caused by other sampling rate converters, I’d recommend checking out the page at http://src.infinitewave.ca. They plot the signals with a linear frequency scale instead of a logarithmic one. Consequently, the sweep of the fundamental looks like a curve (instead of the straight lines in my plots) but the harmonic distortion and aliasing artefacts are easier to see as being related to the fundamental.

# B&O Tech: Airtight excuses

#79 in a series of articles about the technology behind Bang & Olufsen loudspeakers

“Love at first sight? Let me just put on my glasses.”
Ljupka Cvetanova, The New Land

When I’m working on the sound design for a new pair of (over-ear, closed) headphones, I have to take off my glasses (which makes it difficult for me to see my computer screen…) I’ll explain.

Let’s over-simplify and consider a block diagram of a closed (and therefore “over-ear”) headphone, sitting on one side of your head. This is represented by Figure 1.

One of the important things to note there is that the air in the chamber between the headphone diaphragm and the ear canal is sealed from the outside world.

So, if I put such a headphone on an artificial ear (which is a microphone in a small hole in the middle of a plate – it is remarkably well-represented by the red lines in Figure 1….) I can  measure its magnitude response. I’ll call this the “reference”. It doesn’t matter to me what the measurement looks like, since this is just a magnitude response which is the combination of the headphone’s response and the artificial ear’s response – with some incorrect positioning thrown into the mix.

If I then remove the headphones from the plate, and put them back on, in what I think is the same position, and then do the measurement again, I’ll get another curve.

Then, I’ll subtract the “reference measurement” (the first one) from the second measurement to see what the difference is. An example of this is plotted in Figure 2.

Now, let’s consider what happens when the seal is broken. I’ll stick a small piece of metal (actually an Allan key, or a hex wrench, depending on where you live) in between the headphones and the plate, causing a leak in the air between the internal cavity and the outside world, as shown in Figure 4.

We then repeat the measurement, and subtract the original Reference measurement to see what happened. This is shown in Figure 6.

As you can see, the leak in the system causes us to lose bass, primarily. In the very low end, the loss is significant – more than 10 dB down at 20 Hz! Basically, what we’ve done here is to create an acoustical high-pass filter. (I’m not going to go into the physics of why this happens… That’s too much information for this posting.) You can also see that there’s a bump around 200 Hz which is also a result of the leak. The sharp peak up at 8 kHz is not caused by the leak – it’s just an artefact of the headphones having moved a little on the plate when I put in the Allen Key.

Now let’s make the leak bigger. I’ll stick the arm of my glasses in between the plate and the leather pad.

The result of this measurement (again with the Reference subtracted) is shown in Figure 8.

Now you can see that the high pass filter’s cutoff frequency has risen, and the resonance in the system has not only increased in frequency (to 400 Hz or so) but also in magnitude (to almost +10 dB! Again, the sharp wiggles at the top are mostly just artefacts caused by changes in position…

Just to check and see that I haven’t done something stupid, I’ll remove the glasses, and run the measurement again…

The result of this measurement is shown in Figure 10.

So, there are a couple of things to be learned here…

Firstly, if you and a friend both listen to the same pair of closed, sealed headphones, and you disagree about the relative level of bass, check that you’re both not wearing glasses or large earrings…

Secondly, those big, sharp spikes up the high end aren’t insignificant… They’re the result of small movements in the headphones on the measuring system. A similar thing happens when you move headphones on your head – but it can be even more significant due to effects caused by your pinna. This is why, many people, when doing headphone measurements, will do many measurements (say, 5 to 10) and average the results. Those errors in placement are not just the result of shifts on the plate – they may also be caused by differences in “clamping pressure” – so, if I angled the headphones a little on that table, then they might be pressing harder on the artificial ear, possibly only on one side of the ear cup, and this will also change the measured response in the high frequency bands.

Of course, it’s possible to reduce this problem by making the foam more compliant (a fancy word for “squishy”) – which may, in turn, mean that the response will be more different for different users due to different head widths. Or the problem could be reduced by increasing the clamping force, which will in turn make the headphones uncomfortable because they’re squeezing your head. Or, you could embrace the leak, and make a pair of open headphones – but those will not give you much passive noise isolation from the outside world. In fact, you won’t have any at all…

So as you can see, as a manufacturer, this issue has to be balanced with other issues when designing the headphones in the first place…

Or you can just take off your glasses, close your eyes, and listen…

Please don’t jump too far in your conclusions as a result of seeing these measurements. You should NOT interpret them to mean that, if you wear glasses, you will get a 10 dB bump at 400 Hz. The actual response that you will get from your headphones depends on the size of the leak, the volume of the chamber in the ear cup (which is partly dependent on the size of your pinna, since that occupies a significant portion of the volume inside the chamber) and other factors.

The take-home message here is: when you’re evaluating a pair of closed, over-ear headphones: small leaks have an effect on the low frequency response, and small changes in position have an effect on the high-frequency response. The details of those effects are almost impossible to predict accurately.

# Typical Errors in Digital Audio: Part 5 – What time is it there?

I’m originally from Newfoundland – one of the few places in the world with a 1/2-hour time zone. So, when it’s 10:00 a.m. in Montreal, it’s 11:30 a.m. in St. John’s – my home town. This meant that, when I was a kid 40 years ago, and we would call our relatives in Toronto or Germany to wish them a Merry Christmas, there were two questions that you could always rely on being asked: (1) what’s the weather like there? and (2) what time is it there?

These days, I have a similar problem that is well-described by “Segal’s Law“. My iPhone and my wristwatch (an old analogue one with hands that go around pointing at the floor and the fridge…) are never synchronised… This is because of two things: (1) I probably did a bad job of setting my watch and (more importantly) (2) my watch runs just a little bit slowly…

So, let’s say, for example, that I set my watch to be EXACTLY in sync with my phone on a Monday morning at 9:00 a.m. As the week goes by, my iPhone and my watch drift apart, and, just for the sake of argument let’s say that, one week later, when my iPhone turns over to 9:00 a.m. on Monday morning, my wristwatch turns over to 8:59 a.m. So, I lose 1 minute per week on my watch.

(It’s pretty safe to assume that my iPhone is also not perfect – but it’s different because, every once in a while, it compares its internal clock with another, more accurate clock somewhere else via a connection across the Internet (which, we will assume, for the purposes of this discussion, works).)

Let’s consider this from a strange point of view. Let’s assume that

• I’m checking the time on my watch every minute, on the minute
• someone else is “fixing” my watch every week so that it’s correct at 9:00 a.m. on Mondays. They do this by adjusting the watch to the correct time 30 seconds before the iPhone says it’s 9:00 a.m.
• I don’t know that they’re doing this for me…

If we think about this from my perspective, I’ll live in a strange world where 8:59 on Mondays never exists. This is because at 8:58 and 30 seconds (on my watch), my friend re-sets the time to 8:59 and 30 seconds (while I’m not looking) to synchronise with the iPhone…

IF my watch was running fast – say, gaining one minute each week, then I would live in a different strange universe where 9:00 happens twice every Monday morning…

The basic problem here is that we have two clocks that do not run at the same rate – but they are expected to do so. So, we synchronise them regularly (in the above example, on Monday mornings at 9:00) – but between those synchronisation events, they drift apart in time.

## So what?

The example above is very, very similar to the way a digital audio streaming system works – especially if you’re using a wireless connection between the transmitting device and a receiver.

Lets say that you’re playing a sound file that was recorded at 44.1 kHz and streaming it wirelessly to a receiver. I’m trying to be as generic as possible here, but I could be talking about a Bluetooth connection to a pair of headphones or a WiFi connection via DLNA to a device connected to a pair of loudspeakers, for example…

It is not unusual with such a connection for the transmitter to collect up a block of audio samples – say, 64 of them – and send them to the receiver’s input buffer. The receiver then pulls those samples out, one by one, and (eventually) sends them to a digital-to-analogue converter that produces a signal that (eventually) comes out as an audio signal. Then, 64/44100’ths of a second later (64 samples later) the transmitter sends another block, and so on and so on until the song ends.

This system works well if the clock inside the transmitter and the clock inside the receiver are perfectly synchronised. We can even be a little generous and say that they can drift apart a little – but not so much that we either run out of samples to play (because the receiver is playing them out faster than they’re coming in from the transmitter) or that we have samples left over to play when the next block comes in (because the receiver is playing them out slower than they’re coming in from the transmitter).

## Dealing with this problem the right way

The right way to deal with this issue is for the receiver to always be checking what time it thinks it is when the block arrives from the transmitter. If the block arrives a little early, then the receiver should think “hmmmm, my clock is going too slowly – I’ll speed it up a bit”. If the block arrives a little late, then the receiver should adjust its clock to go a little slower.

So, in this case, the receiver has a basic, nominal speed for its internal clock – but it’s constantly adjusting it to be faster and slower to try and match the clock of the transmitter – but it can only do this adjustment at the block rate – the frequency at which the blocks of samples arrive, which is dependent on the block length (how many samples are in each block) and the sampling rate (how many samples per second). (Of course, this can result in “jitter and wander” problems if you’re not careful (I won’t talk about this here…) – so you have to pay a little attention to how quickly you’re adjusting your clock rate… but that’s “just” a matter of correct implementation.)

## Dealing with this problem the wrong way

There is another way to deal with this problem, which, unfortunately, has measurable and possibly audible consequences. This implementation is basically the same as my original example, where I had a friend “fixing” my wristwatch once a week. You have a transmitter that sends blocks of samples to the receiver – and although these two devices should have exactly the same clock rate, they don’t.

Let’s say, for example, that the receiver is playing the samples faster than they’re being sent by the transmitter. This means that the two will slowly drift farther and farther apart until, eventually, the receiver will have to play a sample, but nothing has come in from the transmitter yet, so there’s no sample there to play. In this case, the receiver says “no problem, I’ll just play the last sample again, and the next block will come in while I’m doing that” – so it inserts an extra sample that is just a duplicate of the previous one.

If the receiver’s clock is going slower than the transmitter’s, then, as the two drift farther apart, we will get to a moment where the receiver will receive a new block of samples but it’s not done playing all of the samples in the previous block yet. In this event, it says “no problem, I’ll just leave that last sample out and move on to the next block to catch up” – so it skips a sample.

This is called a “Skip / Insert”  strategy for dealing with clock synchronisation. It’s done by software and hardware engineers because it’s simple to implement, and, in many cases, a manufacturer can get away with this, since it is rarely audible for a couple of reasons.

## Can this be measured?

The simple answer to this is “yes” – and it can be measured in a number of different ways. I’ll show one way below…

## Can I hear it?

The honest answer to this question is “sometimes” – but it’s not as easy to detect as one might think. Of course, a skip/insert event (a duplicated sample or a dropped one) creates an artefact. However, the magnitude of this artefact relative to the “correct” signal is dependent on when it happens.

Let’s take a look at a couple of simple cases. We’ll “transmit” one period of a sine wave that should come out on the other side of the system looking like Figure 1.

But what happens if we don’t get a block in time to keep outputting a signal? We insert a duplicate sample and hope that the block comes in before I have to send out another one. Examples of this are shown in Figures 2 and 3, below.

You’ll probably notice that it’s much easier to see which sample I duplicated in Figure 3 than in Figure 2. In Figure 3 it was sample number 26 that was duplicated. In Figure 2 it’s sample number 13.

The reason it’s easier to see the error in Figure 3 is that duplicating the sample causes an obvious change in the slope of the signal, whereas in Figure 2 it does not – the slope of the signal is 0, and by duplicating a sample, I am also making it 0 – but for a slightly longer time.

This does not mean that we did not generate an error. It just means that we’ll probably “get away with it” in the case of Figure 2, and we probably won’t in the case of Figure 3.

However, since the drifting of the two clocks (in the receiver and transmitter) are not dependent on the signal, there’s no way to know when this is going to happen.

And, of course, if this happens in the middle of a snare drum hit or a ssssinger sssstarting a word in a ssssong with the letter “s” – then we also won’t hear it because there’s so much going on (frequency-wise) that the artefact will be buried in the mess.

Also, since this clock drifting is usually not completely regular, the errors do not usually come in at a regular rate (although I’ve seen exceptions…). So, it’s not like you can listen for “a click every second” or “one per minute”. They happen when they happen – hopefully when you’re not listening and/or when the tune is busy enough to hide it.

A skip event is similar to an insert, as you can see in the two examples in Figures 4 and 5.

Again, I’ve intentionally put in these two skips in places where they are least obvious (Figure 4) and most obvious (Figure 5).

## The real world

One of the tests that can be done on an audio system is to send a sinusoidal signal with a swept frequency through a system, capture the output, and then do a spectrogram of the result. In theory, if you see anything other than a single frequency at any one time at the output, then you know that something has happened to the signal. You would probably then need to go back and look at the output signal itself to start evaluating exactly what happened… This is a test that is used to evaluate one aspect of the performance of different sampling rate converters, for example, at this site.

Let’s take a sine sweep and run it through a system. The sweep goes up logarithmically in frequency from 20 Hz to about 90% of Nyquist (which would correspond to 20,000 Hz in a system running at 44.1 kHz) over 60 seconds and has a level of -1 dB FS. We’ll then capture the output in a system that is behaving perfectly and do a spectrogram of this, looking for artefacts down to some level below the signal level. (If you’re really geeky, you’ll know that this signal-to-error ratio is dependent on the window length of the FFT I’m using to create the spectrogram – but this is beyond our discussion today…).

An example of the output of a system that is behaving well is shown in Figure 6.

You may notice that the plot looks a little “wide” in the beginning. This is because the window length of the FFT I’m using to analyse the signal isn’t long enough to get a precise analysis of a low-frequency signal. So, this is an artefact of the analysis – not an error in the playback system.

What happens if we have random skip/insert events in the system? This is shown in Figure 7.

The signal in Figure 7 was one that I created – I intentionally made skip/insert events at random times and applied them to my test signal.

There are two things to notice here. The first is that each event is visible as a vertical “spike” in the plot. This is because a skip/insert event will cause a short, wide-band “burst” that sounds like a click. However, the bandwidth of the click is dependent on when it happens relative to the signal. For example, the skip/insert events in Figure 2 and 4 would not create as much high-frequency energy as the ones in Figure 3 and 5. So, the bigger the effect on the slope of the signal, the more high frequency energy we’ll get in our “click” sound. Since the slope of a signal increases with frequency, then this also means that low-frequency signals will likely produce lower-bandwidth artefacts.

Now let’s look at the results from some real-world devices and systems that are commercially available.

As you can see in Figure 8, there was one skip/insert event that happened during the 60 seconds I was running this test. Remember that the time that that event happened had nothing to do with the frequency it was playing. It just happens when it happens due to the relationship between the transmitter’s and the receiver’s clock speeds.

Figure 9 shows the results from a different system/device that obviously uses a skip/insert strategy to deal with clock synchronisation problems. It also obviously has some serious clock issues, since it has to correct on the order of approximately once a second…

Figure 10 shows the results from a different system/device that uses a skip/insert strategy – but appears to do so at scheduled intervals. In this case, there is a high probability of getting a skip/insert event every 10 seconds with the counter starting at the instant I starting hearing the music.

Inquisitive readers may be asking why it is that, although I’m doing an analysis down to -101 dB FS (100 dB below the signal level of -1 dB FS), you can’t see the effects of the dither noise floor in my original 16-bit file (which is normally assumed to be at -93 dB FS). This is because the -93 dB FS estimate of a dither signal assumes that you are looking at the total energy from the entire frequency band. The spectrograms above are based on FFT’s that split up the total frequency band into “slices” (called frequency bins) – and the total energy in each of these bins is less than the total energy in all of them (one person clapping is not as loud as 1000 people clapping at the same time…). If we wanted to see the dither noise, I would have had to set my analysis to go down approximately 30 dB lower – but the actual value for this is dependent on the relationship between the sampling rate, the window length of the FFT’s, and the windowing function that I’m using.

Do not bother contacting me to ask which “commercially-available system/device” I measured and in which I found these errors. I’m not doing this to get anyone in trouble. I’m just doing this to try to illustrate common errors that I see often when I evaluate and test audio devices.

An besides, it would not be fair for me to rat on specific companies, systems, or devices, since, in some cases, these errors may have already been fixed with a firmware update, meaning that “naming names” would be irrelevant and unnecessarily detrimental.

But, I will say that I see this problem often. A rough estimate is that I would see errors like this on roughly half of the commercially-available devices and systems I test. It can also be sneaky, as we saw in Figures 8 and 10. Sometimes you get one of these clicks only once in a minute. So, if you do a 10-second measurement to test if your wireless audio receiver is “bit accurate” – the answer can be “yes” – but if you keep measuring for 1 or 2 minutes, you find out the answer is “no”…

If it helps, I could have used the example of a leap year instead of two clocks at the beginning. The reason we have a February 29 every 4 years is that our calendar “runs” a little faster than the time it takes us to get around the sun (because a “year” is actually 365.25 days long…). So, every 4 years we have to “insert” a day to put the two clocks back in sync.

Also, since a “year” is not exactly 365.25 days long, we also have the occasional “leap second” as well. But most people don’t notice this, since it’s rarely useful as an excuse when you’ve missed a meeting…

# Typical Errors in Digital Audio: Part 4 – Ode to Billy Joe

It was the third of June, another sleepy, dusty Delta day
I was out choppin’ cotton, and my brother was balin’ hay

I’ve always liked the song “Ode to Billy Joe”. It starts on a 7-chord, so you know it’s going to go somewhere… I love how Papa, when he hears that Billy Joe jumped off the Tallahatchie Bridge just says that he “never had a lick of sense”, and asks for more biscuits. And who, exactly, did Brother Taylor see with Billy Joe? What did they throw off the bridge?

I like the fact that there are many questions and few answers – and life just goes on anyway…

But we’re not here to talk about songwriting, we’re here to talk about typical errors in digital audio – specifically today – streaming services.

This error is an easy one to discuss – but an important one nonetheless…

When I’m sitting at work, typing on my computer, I listen to music a lot. Usually, I use the “Audirvana” software on my Mac, with an external Teac UD-501 USB-Audio headphone DAC (which does the digital-to-analogue conversion and the amplification for the headphones, all in one box). The reasons I choose to use Audirvana are (1) that it can play all of my files (I have some DSD stuff on my hard drive), it can stream directly to my external DAC without routing the audio through Mac’s OS, and it can also see my Tidal account.

Now, just to be clear, this posting is not an advertisement for Apple, Audirvana, Teac, or Tidal. I mention all of that just as background information… I also drive an 11-year old base-model Honda Civic (that will come up later in this posting) and I wear Ecco shoes (which is completely irrelevant…).

If you use Audirvana to search Tidal for tracks called “Ode to Billy Joe” You will get 300 hits. I don’t know if this is because there are 300 covers of that song on Tidal (I doubt it) or if 300 is a limit on the number of tracks either Tidal or Audirvana will report in a Search function (I suspect that this is the case…)

As you can see in the screenshot in Figure 1, all of them are 16 bit, 44.1 kHz files. So far so good…

I have two favourite versions of this song. One of them is by Paula Cole (the other is by Patty Smyth). If I press “play” on the Paul Cole version, and I look at the top of the screen, I see something like the screenshot in Figure 2.

One of the nice things about Audirvana is that it tells you a little technical information about the track to which you’re listening. Notice there on the right-hand side of the screenshot above, that we’re listening to a 16-bit, 44.1 kHz FLAC file.

This makes sense. In fact, it’s what I expect, since my Tidal subscription promises “lossless high fidelity sound quality” – that’s why I pay extra for a Tidal HiFi subscription…

So far so good.

One of my less-favourite renditions of “Ode to Billy Joe” is performed by The Stadium Saxophone Players on their album “Timeless Sax Instrumentals – Volume 2”. IF I press play on this version, and look at the top of my Audirvana window, I see the information in Figure 3.

Interesting…. Notice that I am now listening to a 96 kbps AAC file with a 16-bit word length, and a sampling rate of “22.1 kHz” (actually 22.05 kHz – half of 44.1). So much for “lossless high fidelity sound quality”.

This calls for more investigation.

So, I pressed “Play” on the top hits in my search, one by one, and checked the file format displayed on the screen. The results of this “test” was that, in the first 66 “Ode to Billy Joe’s” listed, 6 of them were 96 kbps AAC files, 60 of them were FLAC.

So, for this sampling, roughly 9% of the available tracks were not in a lossless format, and were not even full bandwidth. Admittedly, the tracks that were in the lower-quality format were versions that I would not listen to anyway – so, to be honest, I don’t really care too much.

Now, before you mis-interpret me, I want to be very explicit and state that this is NOT Tidal’s fault. Of course they did not ask for an AAC version of the file they put on their hard drives. This was the file format supplied to them by the record label (to use an increasingly old-fashioned term…). So, we can’t blame Tidal for this – and I’m quite certain that they’re not the only streaming service that “suffers” from this issue.

However, what my little test shows is that what Tidal is actually selling me is the capability of streaming “lossless high fidelity sound quality” – and not a guarantee that what is in the “pipe” really is lossless.

Of course, this is not just true for streaming services. Other people have shown that some higher-priced “high resolution” audio files that you can purchase online are actually just a bit-for-bit copy of the “normal resolution” version of the same track. I have at least one CD that contains at least one track that has MP3 artefacts obvious enough that I can hear them on my unbranded audio system in my 11-year old Honda Civic while I’m driving… (It’s a compilation disc, so I guess the label was supplied with an MP3 version that they decoded to PCM and put on the CD.)

So, just like Ode to Billy Joe – there are some questions here… and you don’t need to know much about digital audio to answer them… But the basic moral of this part of the story is that the format that is used to deliver your music is not a guarantee of higher quality…

# Typical Errors in Digital Audio: Part 3 – Aliasing

Reminder: This is still just the lead-up to the real topic of this series. However, we have to get some basics out of the way first…

In the first posting in this series, I talked about digital audio (more accurately, Linear Pulse Code Modulation or LPCM digital audio) is basically just a string of stored measurements of the electrical voltage that is analogous to the audio signal, which is a change in pressure over time… In the second posting in the series, we looked at a “trick” for dealing with the issue of quantisation (the fact that we have a limited resolution for measuring the amplitude of the audio signal). This trick is to add dither (a fancy word for “noise”) to the signal before we quantise it in order to randomise the error and turn it into noise instead of distortion.

In this posting, we’ll look at some of the problems incurred by the way we carve up time into discrete moments when we grab those samples.

Let’s make a wheel that has one spoke. We’ll rotate it at some speed, and make a film of it turning. We can define the rotational speed in RPM – rotations per minute, but this is not very useful. In this case, what’s more useful is to measure the wheel rotation speed in degrees per frame of the film.

Take a look at the left-most column in Figure 1. This shows the wheel rotating 45º each frame. If we play back these frames, the wheel will look like it’s rotating 45º per frame. So, the playback of the wheel rotating looks the same as it does in real life.

This is more or less the same for the next two columns, showing rotational speeds of 90º and 135º per frame.

However, things change dramatically when we look at the next column – the wheel rotating at 180º per frame. Think about what this would look like if we played this movie (assuming that the frame rate is pretty fast – fast enough that we don’t see things blinking…) Instead of seeing a rotating wheel with only one spoke, we would see a wheel that’s not rotating – and with two spokes.

This is important, so let’s think about this some more. This means that, because we are cutting time into discrete moments (each frame is a “slice” of time) and at a regular rate (I’m assuming here that the frame rate of the film does not vary), then the movement of the wheel is recorded (since our 1 spoke turns into 2) but the direction of movement does not. (We don’t know whether the wheel is rotating clockwise or counter-clockwise. Both directions of rotation would result in the same film…)

Now, let’s move over one more column – where the wheel is rotating at 225º per frame. In this case, if we look at the film, it appears that the wheel is back to having only one spoke again – but it will appear to be rotating backwards at a rate of 135º per frame. So, although the wheel is rotating clockwise, the film shows it rotating counter-clockwise at a different (slower) speed. This is an effect that you’ve probably seen many times in films and on TV. What may come as a surprise is that this never happens in “real life” unless you’re in a place where the lights are flickering at a constant rate (as in the case of fluorescent or some LED lights, for example).

Again, we have to consider the fact that if the wheel actually were rotating counter-clockwise at 135º per frame, we would get exactly the same thing on the frames of the film as when the wheel if rotating clockwise at 225º per frame. These two events in real life will result in identical photos in the film. This is important – so if it didn’t make sense, read it again.

This means that, if all you know is what’s on the film, you cannot determine whether the wheel was going clockwise at 225º per frame, or counter-clockwise at 135º per frame. Both of these conclusions are valid interpretations of the “data” (the film). (Of course, there are more – the wheel could have rotated clockwise by 360º+225º = 585º or counter-clockwise by 360º+135º = 495º, for example…)

Since these two interpretations of reality are equally valid, we call the one we know is wrong an alias of the correct answer. If I say “The Big Apple”, most people will know that this is the same as saying “New York City” – it’s an alias that can be interpreted to mean the same thing.

We people in audio commit many sins. One of them is that, every time we draw a plot of anything called “audio” we start out by drawing a sine wave. (A similar sin is committed by musicians who, at the first opportunity to play a grand piano, will play a middle-C, as if there were other notes in the world.) The question is: what, exactly, is a sine wave?

Get a Slinky – or if you don’t want to spend money on a brand name, get a spring. Look at it from one end, and you’ll see that it’s a circle, as can be (sort of) seen in Figure 2.

Since this is a circle, we can put marks on the Slinky at various amounts of rotation, as in Figure 3.

Of course, I could have put the 0º marl anywhere. I could have also rotated counter-clockwise instead of clockwise. But since both of these are arbitrary choices, I’m not going to debate either one.

Now, let’s rotate the Slinky so that we’re looking at from the side. We’ll stretch it out a little too…

Let’s do that some more…

When you do this, and you look at the Slinky directly from one side, you are able to see the vertical change of the spring from the centre as a result of the change in rotation. For example, we can see in Figure 6 that, if you mark the 45º rotation point in this view, the distance from the centre of the spring is 71% of the maximum height of the spring (at 90º).

So what? Well, basically, the “punch line” here is that a sine wave is actually a “side view” of a rotation. So, Figure 7, shows a measurement – a capture – of the amplitude of the signal every 45º.

Since we can now think of a sine wave as a rotation of a circle viewed from the side, it should be just a small leap to see that Figure 7 and the left-most column of Figure 1 are basically identical.

Let’s make audio equivalents of the different columns in Figure 1.

Figure 10 is an important one. Notice that we have a case here where there are exactly 2 samples per period of the cosine wave. This means that our sampling frequency (the number of samples we make per second) is exactly one-half of the frequency of the signal. If the signal gets any higher in frequency than this, then we will be making fewer than 2 samples per period. And, as we saw in Figure 1, this is where things start to go haywire.

Figure 11 shows the equivalent audio case to the “225º per frame” column in Figure 1. When we were talking about rotating wheels, we saw that this resulted in a film that looked like the wheel was rotating backwards at the wrong speed. The audio equivalent of this “wrong speed” is “a different frequency” – the alias of the actual frequency. However, we have to remember that both the correct frequency and the alias are valid answers – so, in fact, both frequencies (or, more accurately, all of the frequencies) exist in the signal.

So, we could take Fig 11, look at the samples (the black lollipops) and figure out what other frequency fits these. That’s shown in Figure 12.

Moving up in frequency one more step, we get to the right-hand column in Figure 1, whose equivalent, including the aliased signal, are shown in Figure 13.

## Do I need to worry yet?

Hopefully, now, you can see that an LPCM system has a limit with respect to the maximum frequency that it can deal with appropriately. Specifically, the signal that you are trying to capture CANNOT exceed one-half of the sampling rate. So, if you are recording a CD, which has a sampling rate of 44,100 samples per second (or 44.1 kHz) then you CANNOT have any audio signals in that system that are higher than 22,050 Hz.

That limit is commonly known as the “Nyquist frequency“, named after Harry Nyquist – one of the persons who figured out that this limit exists.

In theory, this is always true. So, when someone did the recording destined for the CD, they made sure that the signal went through a low-pass filter that eliminated all signals above the Nyquist frequency.

In practice, however, there are many cases where aliasing occurs in digital audio systems because someone wasn’t paying enough attention to what was happening “under the hood” in the signal processing of an audio device. This will come up later.

## Two more details to remember…

There’s an easy way to predict the output of a system that’s suffering from aliasing if your input is sinusoidal (and therefore contains only one frequency). The frequency of the output signal will be the same distance from the Nyquist frequency as the frequency if the input signal. In other words, the Nyquist frequency is like a “mirror” that “reflects” the frequency of the input signal to another frequency below Nyquist.

This can be easily seen in the upper plot of Figure 14. The distance from the Input signal and the Nyquist is the same as the distance between the output signal and the Nyquist.

Also, since that Nyquist frequency acts as a mirror, then the Input and output signal’s frequencies will move in opposite directions (this point will help later).

Usually, frequency-domain plots are done on a logarithmic scale, because this is more intuitive for we humans who hear logarithmically. (For example, we hear two consecutive octaves on a piano as having the same “interval” or “width”. We don’t hear the width of the upper octave as being twice as wide, like a measurement system does. that’s why music notation does not get wider on the top, with a really tall treble clef.) This means that it’s not as obvious that the Nyquist frequency is in the centre of the frequencies of the input signal and its alias below Nyquist.

# Typical Errors in Digital Audio: Part 2 – Dither

Reminder: This is still just the lead-up to the real topic of this series. However, we have to get some basics out of the way first…

In the last posting, I talked about digital audio (more accurately, Linear Pulse Code Modulation or LPCM digital audio) is basically just a string of stored measurements of the electrical voltage that is analogous to the audio signal, which is a change in pressure over time…

For now, we’ll say that each measurement is rounded off to the nearest possible “tick” on the ruler that we’re using to measure the voltage. That rounding results in an error. However, (assuming that everything is working correctly) that error can never be bigger than 1/2 of a “step”. Therefore, in order to reduce the amount of error, we need to increase the number of ticks on the ruler.

Now we have to introduce a new word. If we really had a ruler, we could talk about whether the ticks are 1 mm apart – or 1/16″ – or whatever. We talk about the resolution of the ruler in terms of distance between ticks. However, if we are going to be more general, we can talk about the distance between two ticks being one “quantum” – a fancy word for the smallest step size on the ruler.

So, when you’re “rounding off to the nearest value” you are “quantising” the measurement (or “quantizing” it, if you live in Noah Webster’s country and therefore you harbor the belief that wordz should be spelled like they sound – and therefore the world needz more zees). This also means that the amount of error that you get as a result of that “rounding off” is called “quantisation error“.

In some explanations of this problem, you may read that this error is called “quantisation noise”. However, this isn’t always correct. This is because if something is “noise” then is is random, and therefore impossible to predict. However, that’s not strictly the case for quantisation error. If you know the signal, and you know the quantisation values, then you’ll be able to predict exactly what the error will be. So, although that error might sound like noise, technically speaking, it’s not. This can easily be seen in Figures 1 through 3 which demonstrate that the quantisation error causes a periodic, predictable error (and therefore harmonic distortion), not a random error (and therefore noise).

Sidebar: The reason people call it quantisation noise is that, if the signal is complicated (unlike a sine wave) and high in level relative to the quantisation levels – say a recording of Britney Spears, for example – then the distortion that is generated sounds “random-ish”, which causes people to just to the conclusion that it’s noise.

Now, let’s talk about perception for a while… We humans are really good at detecting patterns – signals – in an otherwise noisy world. This is just as true with hearing as it is with vision. So, if you have a sound that exists in a truly random background noise, then you can focus on listening to the sound and ignore the noise. For example, if you (like me) are old enough to have used cassette tapes, then you can remember listening to songs with a high background noise (the “tape hiss”) – but it wasn’t too annoying because the hiss was independent of the music, and constant. However, if you, like me, have listened to Bob Marley’s live version of “No Woman No Cry” from the “Legend” album, then you, like me, would miss the the feedback in the PA system at that point in the song when the FoH engineer wasn’t paying enough attention… That noise (the howl of the feedback) is not noise – it’s a signal… Which makes it just as important as the song itself. (I could get into a long boring talk about John Cage at this point, but I’ll try to not get too distracted…)

The problem with the signal in Figure 2 is that the error (shown in Figure 3) is periodic – it’s a signal that demands attention. If the signal that I was sending into the quantisation system (in Figure 1) was a little more complicated than a sine wave – say a sine wave with an amplitude modulation – then the error would be easily “trackable” by anyone who was listening.

So, what we want to do is to quantise the signal (because we’re assuming that we can’t make a better “ruler”) but to make the error random – so it is changed from distortion to noise. We do this by adding noise to the signal before we quantise it. The result of this is that the error will be randomised, and will become independent of the original signal… So, instead of a modulating signal with modulated distortion, we get a modulated signal with constant noise – which is easier for us to ignore. (It has the added benefit of spreading the frequency content of the error over a wide frequency band, rather than being stuck on the harmonics of the original signal… but let’s not talk about that…)

## For example…

Let’s take a look at an example of this from an equivalent world – digital photography.

The photo in Figure 4 is a black and white photo – which actually means that it’s comprised of shades of gray ranging from black all the way to white. The photo has 272,640 individual pixels (because it’s 640 pixels wide and 426 pixels high). Each of those pixels is some shade of gray, but that shading does not have an infinite resolution. There are “only” 256 possible shades of gray available for each pixel.

So, each pixel has a number that can range from 0 (black) up to 255 (white).

If we were to zoom in to the top left corner of the photo and look at the values of the 64 pixels there (an 8×8 pixel square), you’d see that they are:

86 86 90 88 87 87 90 91
86 88 90 90 89 87 90 91
88 89 91 90 89 89 90 94
88 90 91 93 90 90 93 94
89 93 94 94 91 93 94 96
90 93 94 95 94 91 95 96
93 94 97 95 94 95 96 97
93 94 97 97 96 94 97 97

What if we were to reduce the available resolution so that there were fewer shades of gray between white and black? We can take the photo in Figure 1 and round the value in each pixel to the new value. For example, Figure 5 shows an example of the same photo reduced to only 4 levels of gray.

Now, if we look at those same pixels in the upper left corner, we’d see that their values are

102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102

They’ve all been quantised to the nearest available level, which is 102. (Our possible values are restricted to 0, 51, 102, 154, 205, and 255).

So, we can see that, by quantising the gray levels from 256 possible values down to only 6, we lose details in the photo. This should not be a surprise… That loss of detail means that, for example, the gentle transition from lighter to darker gray in the sky in the original is “flattened” to a light spot in a darker background, with a jagged edge at the transition between the two. Also, the details of the wall pillars between the windows are lost.

If we take our original photo and add noise to it – so were adding a random value to the value of each pixel in the original photo (I won’t talk about the range of those random values…) it will look like Figure 6. This photo has all 256 possible values of gray – the same as in Figure 1.

If we then quantise Figure 6 using our 6 possible values of gray, we get Figure 7. Notice that, although we do not have more grays than in Figure 5, we can see things like the gradual shading in the sky and some details in the walls between the tall windows.

That noise that we add to the original signal is called dither – because it is forcing the quantiser to be indecisive about which level to quantise to choose.

I should be clear here and say that dither does not eliminate quantisation error. The purpose of dither is to randomise the error, turning the quantisation error into noise instead of distortion. This makes it (among other things) independent of the signal that you’re listening to, so it’s easier for your brain to separate it from the music, and ignore it.

## Addendum: Binary basics and SNR

We normally write down our numbers using a “base 10” notation. So, when I write down 9374 – I mean
9 x 1000 + 3 x 100 + 7 x 10 + 4 x 1
or
9 x 103 + 3 x 102 + 7 x 101 + 4 x 100

We use base 10 notation – a system based on 10 digits (0 through 9) because we have 10 fingers.

If we only had 2 fingers, we would do things differently… We would only have 2 digits (0 and 1) and we would write down numbers like this:
11101

which would be the same as saying
1 x 16 + 1 x 8 + 1 x 4 + 0 x 2 + 1 x 1
or
1 x 24 + 1 x 23 + 1 x 22 + 0 x 21 + 1 x 20

The details of this are not important – but one small point is. If we’re using a base-10 system and we increase the number by one more digit – say, going from a 3-digit number to a 4-digit number, then we increase the possible number of values we can represent by a factor of 10. (in other words, there are 10 times as many possible values in the number XXXX than in XXX.)

If we’re using a base-2 system and we increase by one extra digit, we increase the number of possible values by a factor of 2. So XXXX has 2 times as many possible values as XXX.

Now, remember that the error that we generate when we quantise is no bigger than 1/2 of a quantisation step, regardless of the number of steps. So, if we double the number of steps (by adding an extra binary digit or bit to the value that we’re storing), then the signal can be twice as “far away” from the quantisation error.

This means that, by adding an extra bit to the stored value, we increase the potential signal-to-error ratio of our LPCM system by a factor of 2 – or 6.02 dB.

So, if we have a 16-bit LPCM signal, then a sine wave at the maximum level that it can be without clipping is about 6 dB/bit * 16 bits – 3 dB = 93 dB louder than the error. The reason we subtract the 3 dB from the value is that the error is +/- 0.5 of a quantisation step (normally called an “LSB” or “Least Significant Bit”).

Note as well that this calculation is just a rule of thumb. It is neither precise nor accurate, since the details of exactly what kind of error we have will have a minor effect on the actual number. However, it will be close enough.

# Typical Errors in Digital Audio: Part 1

## Introduction

Once upon a time, when I was a young whipper snapper, studying how to be a recording engineer (which is half of being a tonmeister) I had a textbook on sound recording. There were chapters in there on musical instruments, acoustics, microphones, mixing consoles, magnetic tape, and so on.. There was also a section on something called “digital audio” – but it was a portion of the chapter titled “Noise Reduction”.

Fast-forward a couple of years to 1983 and a new technology hit the market called “Compact Disc” (Here’s a fun fact for impressing people at your next dinner party: The “c” at the end of “disc” means it’s an optical medium. If it were magnetic, it would be a “disk”. So: Compact Disc, but Hard Disk.) Back then, the magazine advertisement read “Perfect Sound. Forever.” Then it hit the real world and the complaints started rolling in from people who believed that they knew things about audio. Some of these complaints were valid, and some were less so… Many of the ones that were valid no longer are, but it’s difficult to un-do a first impression.

Nowadays, it is very likely that almost-all-to-all of the music you listen to has been digital at some point in its life. Even if you’re listening to vinyl, it should not surprise you to know that the master version of the recording you’re hearing was probably stored on a hard disk or passed through a digital mixing console – or at least some of the tracks included some kind of digital processing (say, a guitar pedal or a reverb unit, for example). (I know, I know… There are exceptions. However, if you want to send me anti-digital hate mail you may not do it using a digital communication format such as e-mail. Use an analogue pen to write out your words on a piece of paper and send it to me by post. I look forward to receiving your analogue letters.)

Nowadays, a big part of my “day job” is to test (digital) audio systems to find out what’s wrong with them. So, I thought it would be interesting to do a series of postings that describe the typical kinds of errors that I look for (and find) when I’m digging down into the details.

In order to do this, I’m going to start by being a little redundant and describe the basics of how audio is converted from an analogue signal to a digital one – and hopefully address some of the misconceptions that are associated with this conversion process.

## A quick introduction to sound

At the simplest level, sound can be described as a small change in air pressure (or barometric pressure) over short periods of time. If you’d like to have a better and more edu-tain-y version of this statement with animations and pretty colours, you could take 10 minutes to watch this video, for example.

That change in pressure can be “captured” by using a microphone, that is (at the simplest level) a device that has a change in air pressure at its input and a change in electrical voltage at its output. Ignoring a lot of details, we could say that if you were to plot a measurement of the air pressure (at the input of the microphone) over time, and you were to compare it to a plot of the measurement of the voltage (at the output of the microphone) over time, you would see the same curve on the two graphs. This means that the change in voltage is analogous to the change in air pressure.

At this point in the conversation, I’ll make a point to say that, in theory, we could “zoom in” on either of those two curves shown in Figure 1 and see more and more details. This is like looking at a map of Canada – it has lots of crinkly, jagged lines. If you zoom in and look at  the map of Newfoundland and Labrador, you’ll see that it has finer, crinkly, jagged lines. If you zoom in further, and stand where the water meets the shore in Trepassey and take a photo of your feet, you could copy it to draw a map of the line of where the water comes in around the rocks – and your toes – and you would wind up with even finer, crinkly, jagged lines… You could take this even further and get down to a microscopic or molecular level – but you get the idea… The point is that, in theory, both of the plots in Figure 1 have infinite resolution, both in time and in air pressure or voltage.

Now, let’s say that you wanted to take that microphone’s output and transmit it through a bunch of devices and wires that, in theory, all do nothing to the signal. Let’s say, for example, that you take the mic’s output, send it through a wire to a box that makes the signal twice as loud. Then take the output of that box and send it through a wire to another box that makes it half as loud. You take the output of that box and send it through a wire to a measuring device. What will you see? Unfortunately, none of the wires or boxes in the chain can be perfect, so you’ll probably see the signal plus something else which we’ll call the “error” in the system’s output. We can call it the error because, if we measure the input voltage and the output voltage at any one instant, we’ll probably see that they’re not identical. Since they should be identical, then the system must be making a mistake in transmitting the signal – so it makes errors…

Pedantic Sidebar: Some people will call that error that the system adds to the signal “noise” – but I’m not going to call it that. This is because “noise” is a specific thing – noise is random – so if it’s not random, it’s not noise. Also, although the signal has been distorted (in that the output of the system is not identical to the input) I won’t call it “distortion” either, since distortion is a name that’s given to something that happens to the signal because the signal is there. (We would probably get at least some of the error out of our system even if we didn’t send any audio into it.) So, we could be slightly geeky and adequately vague and call the extra stuff “Distortion plus noise” but not “THD+N” – which stands for “Total Harmonic Distortion Plus Noise” – because not all kinds of distortion will produce a harmonic of the signal… but I’m getting ahead of myself…

So, we want to transmit (or store) the audio signal – but we want to reduce the noise caused by the transmission (or storage) system. One way to do this is to spend more money on your system. Use wires with better shielding, amplifiers with lower noise floors, bigger power supplies so that you don’t come close to their limits, run your magnetic tape twice as fast, and so on and so on. Or, you could convert the analogue signal (remember that it’s analogous to the change in air pressure over time) to one that is represented (and therefore transmitted or stored) digitally instead.

What does this mean?

## Conversion from analogue to digital and back (but skipping important details)

In the example above, we made a varying voltage that was analogous to the varying air pressure. If we wanted to store this, we could do it by varying the amount of magnetism on a wire or a coating on a tape, for example. Or we could cut a wiggly groove in a bit of vinyl that has a similar shape to the curve in the plots in Figure 1. Or, we could do something else: we could get a metronome (or a clock) and make a measurement of the voltage every time the metronome clicks, and write down the measurements.

For example, let’s zoom in on the first little bit of the signal in the plots in Figure 1

We’ll then put on a metronome and make a measurement of the voltage every time we hear the metronome click…

We can then keep the measurements (remembering how often we made them…) and write them down like this:

0.3000
0.4950
0.5089
0.3351
0.1116
0.0043
0.0678
0.2081
0.2754
0.2042
0.0730
0.0345
0.1775

We can store this series of numbers on a computer’s hard disk, for example. We can then come back tomorrow, and convert the measurements to voltages. First we read the measurements, and create the appropriate voltage…

We then make a “staircase” waveform by “holding” those voltages until the next value comes in.

All we need to do then is to use a low-pass filter to smooth out the hard edges of the staircase.

So, in this example, we’ve gone from an analogue signal (the red curve in Figure 3) to a digital signal (the series of numbers), and back to an analogue signal (the red curve in Figure 7).

In some ways, this is a bit like the way a movie works. When you watch a movie, you see a series of still photographs, probably taken at a rate of 24 pictures (or frames) per second. If you play those photos back at the same rate (24 fps or frames per second), you think you see movement. However, this is because your eyes and brain aren’t fast enough to see 24 individual photos per second – so you are fooled into thinking that things on the screen are moving.

However, digital audio is slightly different from film in two ways:

• The sound (equivalent to the movement in the film) is actually happening. It’s not a trick that relies on your ears and brain being too slow.
• If, when you were filming the movie, something were to happen between frames (say, the flash of a gunshot, for example) then it would never be caught on film. This is because the photos are discrete moments in time – and what happens between them is lost. However, if something were to make a very, very short sound between two samples (two measurements) in the digital audio signal – it would not be lost. This is because of something that happens at the beginning of the chain that I haven’t described… yet…

However, there are some “artefacts” (a fancy term for “weird errors”) that are present both in film and in digital audio that we should talk about.

The first is an error that happens when you mess around with the rate at which you take the measurements (called the “sampling rate”) or the photos (called the “frame rate”) – and, more importantly, when you need to worry about this. Let’s say that you make a film at 24 fps. If you play this back at a higher frame rate, then things will move very quickly (like old-fashioned baseball movies…). If you play them back at a lower frame rate, then things move in slow motion. So, for things to look “normal” you have to play the movie at the same rate that it was filmed. However, as longs no one is looking, you can transfer the movie as fast as you like. For example, if you wanted to copy the film, you could set up a movie camera so it was pointing at a movie screen and film the film. As long as the movie on the screen is running in sync with the camera, you can do this at any frame rate you like. But you’ll have to watch the copy at the same frame rate as the original film…

The second is an easy artefact to recognise. If you see a car accelerating from 0 to something fast on film, you’ll see the wheels of the car start to get faster and faster, then, as the car gets faster, the wheels slow down, stop, and then start going backwards… This does not happen in real life (unless you’re in a place lit with flashing lights like fluorescent bulbs or LED’s). I’ll do a posting explaining why this happens – but the thing to remember here is that the speed of the wheel rotation that you see on the film (the one that’s actually captured by the filming…) is not the real rotational speed of the wheel. However, those two rotational speeds are related to each other (and to the frame rate of the film). If you change the real rotational rate or the frame rate, you’ll change the rotational rate in the film. So, we call this effect “aliasing” because it’s a false version (an alias) of the real thing – but it’s always the same alias (assuming you repeat the conditions…) Digital audio can also suffer from aliasing, but in this case, you put in one frequency (which is actually the same as a rotational speed) and you get out another one. This is not the same as harmonic distortion, since the frequency that you get out is due to a relationship between the original frequency and the sampling rate, so the result is almost never a multiple of the input frequency.

## Some details that I left out…

One of the things I said above was something like “we measure the voltage and store the results” and the example I gave was a nice series of numbers that only had 4 digits after the decimal point. This statement has some implications that we need to discuss.

Let’s say that I have a thing that I need to measure. For example, Figure 8 shows a piece of metal, and I want to measure its width.

Using my ruler, I can see that this piece of metal is about 57 mm wide. However, if I were geeky (and I am) I would say that this is not precise enough – and therefore it’s not accurate. The problem is that my ruler is only graduated in millimetres. So, if I try to measure anything that is not exactly an integer number of mm long, I’ll either have to guess (and be wrong) or round the measurement to the nearest millimetre (and be wrong).

So, if I wanted you to make a piece of metal the same width as my piece of metal, and I used the ruler in Figure 8, we would probably wind up with metal pieces of two different widths. In order to make this better, we need a better ruler – like the one in Figure 9.

Figure 9 shows a vernier caliper (a fancy type of ruler) being used to measure the same piece of metal. The caliper has a resolution of 0.05 mm instead of the 1 mm available on the ruler in Figure 8. So, we can make a much more accurate measurement of the metal because we have a measuring device with a higher precision.

The conversion of a digital audio signal is the same. As I said above, we measure the voltage of the electrical signal, and transmit (or store) the measurement. The question is: how accurate and precise is your measurement? As we saw above, this is (partly) determined by how many digits are in the number that you use when you “write down” the measurement.

Since the voltage measurements in digital audio are recorded in binary rather than decimal (we use 0 and 1 to write down the number instead of 0 up to 9) then we use Binary digITS – or “bits” instead of decimal digits (which are not called “dits”). The number of bits we have in the number that we write down (partly) determines the precision of the measurement of the voltage – and therefore (possibly), our accuracy…

Just like the example of the ruler in Figure 8, above, we have a limited resolution in our measurement. For example, if we had only 4 bits to work with then the waveform in 4 – the one we have to measure – would be measured with the “ruler” shown on the left side of Figure 10, below.

When we do this, we have to round off the value to the nearest “tick” on our ruler, as shown in Figure 11.

Using this “ruler” which gives a write-down-able “quantity” to the measurement, we get the following values for the red staircase:

0010
0100
0100
0011
0001
0000
0001
0010
0010
0010
0001
0000
0001

When we “play these back” we get the staircase again, shown in Figure 12.

Of course, this means that, by rounding off the values, we have introduced an error in the system (just like the measurement in Figure 8 has a bigger error than the one in Figure 9). We can calculate this error if we just subtract the original signal from the output signal (in other words, Figure 12 minus Figure 10) to get Figure 13.

In order to improve our accuracy of the measurement, we have to increase the precision of the values. We can do this by adding an extra digit (or bit) to the number that we use to record the value.

If we were using decimal numbers (0-9) then adding an extra digit to the number would give us 10 times as many possibilities. (For example, if we were using 4 digits after the decimal in the example at the start of this posting, we have a total of 10,000 possible values – 0.0000 to 0.9999. If we add one more digit, we increase the resolution to 100,000 possible values – 0.00000 to 0.99999 ).

In binary, adding one extra digit gives us twice as many “ticks” on the ruler. So, using 4 bits gives us 16 possible values. Increasing to 5 bits gives us 32 possible values.

If you’re listening to a CD, then the individual measurements of each voltage – the “sample values” – are stored with 16 bits, which means that we have 65,536 possible values to pick from.

Remember that this means that we have more “ticks” on our ruler – but we don’t necessarily increase its range. So, for example, we’re still measuring a voltage from -1 V to 1 V – we just have more and more resolution to do that measurement with.

## Error #1

Finally! We get to the beginning of the point of the posting in the first place. My whole reason for starting this series of postings was to talk about errors in digital audio.

So, the first one to talk about is whether we have “bit matching” in a system where we expect to do so. For example, if you look at the S/P-DIF output of a good-old-fashioned CD player, do the sample values that are transmitted on that wire identical to the ones on the disc?

This is a fairly easy test to make (in theory). All you have to do is to record the digital signal on the S/P-DIF output of your CD player, subtract the original signal that’s on the disc (making sure that you have done your time alignment correctly). If you have anything other than nothing left over, then something went wrong somewhere.

If the result of this test is that you do NOT get nothing remaining, you cannot jump in head first and say that your S/P-DIF output is not working properly. For example, some sound cards have a sampling rate converter at their digital input. So, if you are capturing the CD player’s output using such a sound card on your computer, then perhaps the errors that you see are being produced by your sound card – and not your player.

### A little associated story

This was a method that I used to do the final testing of Wireless Power Link for B&O. I created a little software application that made a signal and sent it out digitally to a Wireless Power Link transmitter (which was running with a resolution of 24 bits – giving us 16,777,216 possible values). I then connected a Wireless Power Link receiver’s output to the same computer. The computer knew how much time it took the signal to get from its output, through the wireless transmission system, back to its input (about 5 ms). So, I took the “output” signal, delayed it by that amount, and then subtracted it from the “input” signal. I then made a detector that counted every bit (instead of every sample) that was incorrect.

The reason I was counting bit errors instead of sample errors was that we wanted to be able to diagnose problems if we found them. If you find out that “this sample is wrong” – you don’t necessarily know whether it was one or more bit errors that caused the problem. By counting bit errors, you have a little more information that can help you diagnose the source of problems when you find them.

Sidebar: since this test was running at 48 kHz and 24 bits with a 2-channel system, that means that there were 2,304,000 bits per second being checked every second

This test ran 24-hours a day continuously for over 11 days. In that time, we found 0 bit errors. That means that we got 0 errors in more than 2,189,721,600,000 bits, which was good.

Now, just before anyone gets excited: that test was run to find out whether the WPL system was able to deliver a bit-perfect output in the absence of any external disturbances. So, the transmitter and the receiver were not moved at any time during the test, and nothing was moved between them – and the result was that the system behaved perfectly.

# B&O Tech: A very brief introduction to Parametric Equalisation

#78 in a series of articles about the technology behind Bang & Olufsen loudspeakers

Almost all sound systems offer bass and treble adjustments for the sound – these are basically coarse versions of a more general tool called an equaliser that is often used in recording studios, and are increasingly found in high-end home audio equipment.

Once upon a time, if you made a long-distance phone call, there was an actual physical connection made between the wire running out of your telephone and the telephone at the other end of the line. This caused a big problem in signal quality because a lot of high-frequency components of the signal would get attenuated along the way due to losses in the wiring. Consequently, booster circuits were made to help make the relative levels of the various frequencies more equal. As a result, these circuits became known as equalisers. Nowadays, of course, we don’t need to use equalisers to fix the quality of long-distance phone calls (mostly because the communication paths use digital encoding instead of analogue transmission), but we do use them to customise the relative balance of various frequencies in an audio signal. This happens most often in a recording studio, but equalisers can be a great personalisation tool in a playback system in the home.

The two main reasons for using equalisation in a playback system are (1) personal preference and (2) compensation for the effects of the listening room’s acoustical behaviour.

Equalisers are typically comprised of a collection of filters, each of which has up to 4 “handles” or “parameters” that can be manipulated by the user. These parameters are

• Filter Type
• Gain
• Centre Frequency
• Q

## Filter Type

The filter type will let you decide the relative levels of signals at frequencies within the band that you’re affecting.

There are up to 7 different types of filters that can be found in professional parametric equalisers. These are (in no particular order…)

• Low Pass
• High Pass
• Low shelving
• High shelving
• Band-pass
• Band-reject
• Peaking (also known as Peak/Dip or Peak/Notch)

However, for this posting, we’ll just focus on the three most-used of these:

• Low shelving
• High shelving
• Peaking

### Low Shelving Filter

In theory, a low shelving filter affects gain of all frequencies below the stated frequency by the same amount. In reality, there is a band around the stated frequency where the filter transitions between a gain of 0 dB (no change in the signal) and the gain of the affected frequency band.

Note that the low shelving filters used in the parametric equalisers in Bang & Olufsen loudspeakers define the centre frequency as being the frequency where the gain is one half the maximum (or minimum) gain of the filter. For example, in Figure 1, the gain of the filter is 6 dB. The centre frequency is the frequency where the gain is one-half this value or 3 dB, which can be found at 80 Hz.

Some care should be taken when using low shelving filters since their affected frequency bands extend to 0 Hz or DC. This can cause a system to be pushed beyond its limits in extremely low frequency bands that are of little-to-no consequence to the audio signal. Note, however, that this is less of a concern for the B&O loudspeakers, since they are protected against such abuse.

### High Shelving Filter

In theory, a high shelving filter affects gain of all frequencies above the stated frequency by the same amount. In reality, there is a band around the stated frequency where the filter transitions between a gain of 0 dB (where there is no change in the signal) and the gain of the affected frequency band.

Note again that the high shelving filters used in B&O loudspeakers define the centre frequency as being the frequency where the gain is one half the maximum (or minimum) gain of the filter. For example, in Figure 4, the gain of the filter is -6 dB. The centre frequency is the frequency where the gain is one-half this value or -3 dB, which can be found at 8 kHz.

Some care should be taken when using high shelving filters since their affected frequency bands can extend beyond the audible frequency range. This can cause a system to be pushed beyond its limits in extremely high frequency bands that are of little-to-no consequence to the audio signal.

### Peaking Filter

A peaking filter is used for a more local adjustment of a frequency band. In this case, the centre frequency of the filter is affected most (it will have the Gain of the filter applied to it) and adjacent frequencies on either side are affected less and less as you move further away. For example, Figure 5 shows the response of a peaking filter with a centre frequency of 1 kHz and gains of 6 dB (the black curve) and -6 dB (the red curve). As can be seen there, the maximum effect happens at 1 kHz and frequency bands to either side are affected less.

You may notice in Figure 5 that the black and red curves are symmetrical – in other words, they are identical except in polarity (in dB) of the gain. This is a particular type of peaking filter called a reciprocal peak/dip filter – so-called because these two filters, placed in series, can be used to cancel each other’s effects on the signal.

There are other types of peaking filters that are not reciprocal. This is true in cases where the Q is defined differently. However, we won’t get into that here. If you’d like to read about this “issue”, see this link.

## Gain

If you need to make all frequencies in your audio signal louder, then you just need to increase the volume. However, if you want to be a little more selective and make some frequency bands louder (or quieter) and leave other bands unchanged, then you’ll need an equaliser. So, one of the important questions to ask is “how much louder?” or “how much quieter?” The answer to this question is the gain of the filter — this is the amount by which is signal is increased or decreased in level.

The gain of an equaliser filter is almost always given in decibels or dB. (The “B” is a capital because it’s named after Alexander Graham Bell.) This is a scale based on logarithmic changes in level. Luckily, it’s not necessary to understand logarithms in order to have an intuitive feel for decibels. There are really just three things to remember:

• a gain of 0 dB is the same as saying “no change”
• positive decibel values are louder, negative decibel values are quieter
• adding approximately 6 dB to the gain is the same as saying “two times the level”. (Therefore, subtracting 6 dB is half the level.)

## Centre Frequency

So, the next question to answer is “which frequency bands do you want to affect?” This is partially defined by the centre frequency or Fc of the filter. This is a value that is measured in the number of cycles per second (This is literally the number of times a loudspeaker driver will move in and out of the loudspeaker cabinet per second.), labelled Hertz or Hz.

Generally, if you want to increase (or reduce) the level of the bass, then you should set the centre frequency to a low value (roughly speaking, below 125 Hz). If you want to change the level of the high frequencies, then you should set the centre frequency to a high value (say, above 8 kHz).

## Q

In all of the above filter types, there are transition bands — frequency areas where the filter’s gain is changing from 0 dB to the desired gain. Changing the filter’s Q allows you to alter the shape of this transition. The lower the Q, the smoother the transition. In both the case of the shelving filters and the peaking filter, this means that a wider band of frequencies will be affected. This can be seen in the examples in Figures 6 and 7.

It should be explained that the Q parameter can cause a shelving filter to behave a little strangely. When the Q of a shelving filter exceeds a value of 0.707 (or 1/sqrt(2)), the gain of the filter will “overshoot” its limits. For example, as can be seen in Figure 8, a filter with a gain of 6 dB and a Q of 4 will actually have a gain of almost 13 dB and will attenuate by almost 7 dB.

## Some extra information

Some people and books will say that “Q” stands for the “Quality” of the filter. This is a very old myth, but it is not true. There is a great paper worth reading called “The Story of Q” by Estill I. Green in which it is clearly stated “His [K.S. Johnson – an employee in the Engineering Dept. of the Western Electric Company, which later became Bell Telephone Laboratories.] reason for choosing Q was quite simple. He says that it did not stand for “quality factor” or anything else, but since the other letters of the alphabet had already been pre-empted for other purposes, Q was all he had left.”

For peaking filters, the Q of the filter is equal to the centre frequency divided by the filter’s bandwidth. So, if the Q of the filter is 2 and the centre frequency is 1 kHz, then the bandwidth will be 500 Hz. Another way to look at this is that, very roughly speaking, 1/Q will be the filter’s bandwidth in octaves. So, for example, a filter with a Q of 2 will have a bandwidth of about 1/2 an octave. A filter with a Q of 0.5 will have a bandwidth of about 2 octaves.

This is just a basic introduction to parametric equalisers. For more information, check out the explanation here.

# No more FTP on a Mac!????

I am not impressed! Apparently, Apple has decided that FTP shall no longer be available as a Unix command in the Terminal window as of 10.13.x (High Sierra)…

All I get is a message that says “ftp: command not found”. You have GOT to be kidding me…

Grrrrr….

According to this link I’m not alone…