Typical Errors in Digital Audio: Part 2 – Dither

Reminder: This is still just the lead-up to the real topic of this series. However, we have to get some basics out of the way first…

In the last posting, I talked about digital audio (more accurately, Linear Pulse Code Modulation or LPCM digital audio) is basically just a string of stored measurements of the electrical voltage that is analogous to the audio signal, which is a change in pressure over time…

For now, we’ll say that each measurement is rounded off to the nearest possible “tick” on the ruler that we’re using to measure the voltage. That rounding results in an error. However, (assuming that everything is working correctly) that error can never be bigger than 1/2 of a “step”. Therefore, in order to reduce the amount of error, we need to increase the number of ticks on the ruler.

Now we have to introduce a new word. If we really had a ruler, we could talk about whether the ticks are 1 mm apart – or 1/16″ – or whatever. We talk about the resolution of the ruler in terms of distance between ticks. However, if we are going to be more general, we can talk about the distance between two ticks being one “quantum” – a fancy word for the smallest step size on the ruler.

So, when you’re “rounding off to the nearest value” you are “quantising” the measurement (or “quantizing” it, if you live in Noah Webster’s country and therefore you harbor the belief that wordz should be spelled like they sound – and therefore the world needz more zees). This also means that the amount of error that you get as a result of that “rounding off” is called “quantisation error“.

In some explanations of this problem, you may read that this error is called “quantisation noise”. However, this isn’t always correct. This is because if something is “noise” then is is random, and therefore impossible to predict. However, that’s not strictly the case for quantisation error. If you know the signal, and you know the quantisation values, then you’ll be able to predict exactly what the error will be. So, although that error might sound like noise, technically speaking, it’s not. This can easily be seen in Figures 1 through 3 which demonstrate that the quantisation error causes a periodic, predictable error (and therefore harmonic distortion), not a random error (and therefore noise).

Sidebar: The reason people call it quantisation noise is that, if the signal is complicated (unlike a sine wave) and high in level relative to the quantisation levels – say a recording of Britney Spears, for example – then the distortion that is generated sounds “random-ish”, which causes people to just to the conclusion that it’s noise.

Fig 1: The first cycle of a periodic signal (in this case, a sinusoidal waveform) that we are going to quantise using a 4-bit system (notice the 4 bits in the scale on the left).

Fig 2: The same waveform shown in Figure 1 after quantisation (rounding off) in a 4-bit world.

Fig 3: The difference between Figure 2 and Figure 1. I made this by subtracting the original signal from the quantised version. This is the error in the quantised waveform – the quantisation error. Notice that it is not noise… it’s completely predictable and it will repeat with repetitions of the signal. Therefore the result of this is distortion, not noise…

Now, let’s talk about perception for a while… We humans are really good at detecting patterns – signals – in an otherwise noisy world. This is just as true with hearing as it is with vision. So, if you have a sound that exists in a truly random background noise, then you can focus on listening to the sound and ignore the noise. For example, if you (like me) are old enough to have used cassette tapes, then you can remember listening to songs with a high background noise (the “tape hiss”) – but it wasn’t too annoying because the hiss was independent of the music, and constant. However, if you, like me, have listened to Bob Marley’s live version of “No Woman No Cry” from the “Legend” album, then you, like me, would miss the the feedback in the PA system at that point in the song when the FoH engineer wasn’t paying enough attention… That noise (the howl of the feedback) is not noise – it’s a signal… Which makes it just as important as the song itself. (I could get into a long boring talk about John Cage at this point, but I’ll try to not get too distracted…)

The problem with the signal in Figure 2 is that the error (shown in Figure 3) is periodic – it’s a signal that demands attention. If the signal that I was sending into the quantisation system (in Figure 1) was a little more complicated than a sine wave – say a sine wave with an amplitude modulation – then the error would be easily “trackable” by anyone who was listening.

So, what we want to do is to quantise the signal (because we’re assuming that we can’t make a better “ruler”) but to make the error random – so it is changed from distortion to noise. We do this by adding noise to the signal before we quantise it. The result of this is that the error will be randomised, and will become independent of the original signal… So, instead of a modulating signal with modulated distortion, we get a modulated signal with constant noise – which is easier for us to ignore. (It has the added benefit of spreading the frequency content of the error over a wide frequency band, rather than being stuck on the harmonics of the original signal… but let’s not talk about that…)

For example…

Let’s take a look at an example of this from an equivalent world – digital photography.

The photo in Figure 4 is a black and white photo – which actually means that it’s comprised of shades of gray ranging from black all the way to white. The photo has 272,640 individual pixels (because it’s 640 pixels wide and 426 pixels high). Each of those pixels is some shade of gray, but that shading does not have an infinite resolution. There are “only” 256 possible shades of gray available for each pixel.

So, each pixel has a number that can range from 0 (black) up to 255 (white).

Fig 4: A photo of a building in Paris. Each pixel in this photo has one of 256 possible levels of gray – from white (255) down to black (0).

If we were to zoom in to the top left corner of the photo and look at the values of the 64 pixels there (an 8×8 pixel square), you’d see that they are:

86 86 90 88 87 87 90 91
86 88 90 90 89 87 90 91
88 89 91 90 89 89 90 94
88 90 91 93 90 90 93 94
89 93 94 94 91 93 94 96
90 93 94 95 94 91 95 96
93 94 97 95 94 95 96 97
93 94 97 97 96 94 97 97

What if we were to reduce the available resolution so that there were fewer shades of gray between white and black? We can take the photo in Figure 1 and round the value in each pixel to the new value. For example, Figure 5 shows an example of the same photo reduced to only 4 levels of gray.

Fig 5: The same photo of the same building. Each pixel in this photo has one of 4 possible levels of gray – 255 (white), 170, 85 and 0 (black). Notice that some details are lost – like the smooth transitions in the clouds, or the stripes in the marble in the pillars.

Now, if we look at those same pixels in the upper left corner, we’d see that their values are

102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102
102 102 102 102 102 102 102 102

They’ve all been quantised to the nearest available level, which is 102. (Our possible values are restricted to 0, 51, 102, 154, 205, and 255).

So, we can see that, by quantising the gray levels from 256 possible values down to only 6, we lose details in the photo. This should not be a surprise… That loss of detail means that, for example, the gentle transition from lighter to darker gray in the sky in the original is “flattened” to a light spot in a darker background, with a jagged edge at the transition between the two. Also, the details of the wall pillars between the windows are lost.

If we take our original photo and add noise to it – so were adding a random value to the value of each pixel in the original photo (I won’t talk about the range of those random values…) it will look like Figure 6. This photo has all 256 possible values of gray – the same as in Figure 1.

Fig 6: Noise. This “photo” has the same number of pixels (640 x 480) as the photo in Figure 4. We add this to the photo before asking the computer to reduce the number of “colours”.

If we then quantise Figure 6 using our 6 possible values of gray, we get Figure 7. Notice that, although we do not have more grays than in Figure 5, we can see things like the gradual shading in the sky and some details in the walls between the tall windows.

Fig 7: The same photo of the same building in Figure 4. Each pixel in this photo ALSO only has one of 6 possible levels of gray – just like in Figure 5. However, this version is the result of quantising the original photo with the noise added before quantisation. The result is admittedly noisy – but we are able to see pattens in the noise that preserve some of the details that we lost in Figure 5.

That noise that we add to the original signal is called dither – because it is forcing the quantiser to be indecisive about which level to quantise to choose.

I should be clear here and say that dither does not eliminate quantisation error. The purpose of dither is to randomise the error, turning the quantisation error into noise instead of distortion. This makes it (among other things) independent of the signal that you’re listening to, so it’s easier for your brain to separate it from the music, and ignore it.

Addendum: Binary basics and SNR

We normally write down our numbers using a “base 10” notation. So, when I write down 9374 – I mean
9 x 1000 + 3 x 100 + 7 x 10 + 4 x 1
or
9 x 10³ + 3 x 10² + 7 x 10¹ + 4 x 10⁰

We use base 10 notation – a system based on 10 digits (0 through 9) because we have 10 fingers.

If we only had 2 fingers, we would do things differently… We would only have 2 digits (0 and 1) and we would write down numbers like this:
11101

which would be the same as saying
1 x 16 + 1 x 8 + 1 x 4 + 0 x 2 + 1 x 1
or
1 x 2⁴ + 1 x 2³ + 1 x 2² + 0 x 2¹ + 1 x 2⁰

The details of this are not important – but one small point is. If we’re using a base-10 system and we increase the number by one more digit – say, going from a 3-digit number to a 4-digit number, then we increase the possible number of values we can represent by a factor of 10. (in other words, there are 10 times as many possible values in the number XXXX than in XXX.)

If we’re using a base-2 system and we increase by one extra digit, we increase the number of possible values by a factor of 2. So XXXX has 2 times as many possible values as XXX.

Now, remember that the error that we generate when we quantise is no bigger than 1/2 of a quantisation step, regardless of the number of steps. So, if we double the number of steps (by adding an extra binary digit or bit to the value that we’re storing), then the signal can be twice as “far away” from the quantisation error.

This means that, by adding an extra bit to the stored value, we increase the potential signal-to-error ratio of our LPCM system by a factor of 2 – or 6.02 dB.

So, if we have a 16-bit LPCM signal, then a sine wave at the maximum level that it can be without clipping is about 6 dB/bit * 16 bits – 3 dB = 93 dB louder than the error. The reason we subtract the 3 dB from the value is that the error is +/- 0.5 of a quantisation step (normally called an “LSB” or “Least Significant Bit”).

Note as well that this calculation is just a rule of thumb. It is neither precise nor accurate, since the details of exactly what kind of error we have will have a minor effect on the actual number. However, it will be close enough.

Typical Errors in Digital Audio: Part 1

Introduction

Once upon a time, when I was a young whipper snapper, studying how to be a recording engineer (which is half of being a tonmeister) I had a textbook on sound recording. There were chapters in there on musical instruments, acoustics, microphones, mixing consoles, magnetic tape, and so on.. There was also a section on something called “digital audio” – but it was a portion of the chapter titled “Noise Reduction”.

Fast-forward a couple of years to 1983 and a new technology hit the market called “Compact Disc” (Here’s a fun fact for impressing people at your next dinner party: The “c” at the end of “disc” means it’s an optical medium. If it were magnetic, it would be a “disk”. So: Compact Disc, but Hard Disk.) Back then, the magazine advertisement read “Perfect Sound. Forever.” Then it hit the real world and the complaints started rolling in from people who believed that they knew things about audio. Some of these complaints were valid, and some were less so… Many of the ones that were valid no longer are, but it’s difficult to un-do a first impression.

Nowadays, it is very likely that almost-all-to-all of the music you listen to has been digital at some point in its life. Even if you’re listening to vinyl, it should not surprise you to know that the master version of the recording you’re hearing was probably stored on a hard disk or passed through a digital mixing console – or at least some of the tracks included some kind of digital processing (say, a guitar pedal or a reverb unit, for example). (I know, I know… There are exceptions. However, if you want to send me anti-digital hate mail you may not do it using a digital communication format such as e-mail. Use an analogue pen to write out your words on a piece of paper and send it to me by post. I look forward to receiving your analogue letters.)

Nowadays, a big part of my “day job” is to test (digital) audio systems to find out what’s wrong with them. So, I thought it would be interesting to do a series of postings that describe the typical kinds of errors that I look for (and find) when I’m digging down into the details.

In order to do this, I’m going to start by being a little redundant and describe the basics of how audio is converted from an analogue signal to a digital one – and hopefully address some of the misconceptions that are associated with this conversion process.

A quick introduction to sound

At the simplest level, sound can be described as a small change in air pressure (or barometric pressure) over short periods of time. If you’d like to have a better and more edu-tain-y version of this statement with animations and pretty colours, you could take 10 minutes to watch this video, for example.

That change in pressure can be “captured” by using a microphone, that is (at the simplest level) a device that has a change in air pressure at its input and a change in electrical voltage at its output. Ignoring a lot of details, we could say that if you were to plot a measurement of the air pressure (at the input of the microphone) over time, and you were to compare it to a plot of the measurement of the voltage (at the output of the microphone) over time, you would see the same curve on the two graphs. This means that the change in voltage is analogous to the change in air pressure.

Fig 1. Notice that (in theory, and ignoring a lot of things…) the change in air pressure over time at the input of the microphone is identical to the change in voltage over time at its output. Of course, this is not true in real life – microphones lie like a cheap rug…

At this point in the conversation, I’ll make a point to say that, in theory, we could “zoom in” on either of those two curves shown in Figure 1 and see more and more details. This is like looking at a map of Canada – it has lots of crinkly, jagged lines. If you zoom in and look at the map of Newfoundland and Labrador, you’ll see that it has finer, crinkly, jagged lines. If you zoom in further, and stand where the water meets the shore in Trepassey and take a photo of your feet, you could copy it to draw a map of the line of where the water comes in around the rocks – and your toes – and you would wind up with even finer, crinkly, jagged lines… You could take this even further and get down to a microscopic or molecular level – but you get the idea… The point is that, in theory, both of the plots in Figure 1 have infinite resolution, both in time and in air pressure or voltage.

Now, let’s say that you wanted to take that microphone’s output and transmit it through a bunch of devices and wires that, in theory, all do nothing to the signal. Let’s say, for example, that you take the mic’s output, send it through a wire to a box that makes the signal twice as loud. Then take the output of that box and send it through a wire to another box that makes it half as loud. You take the output of that box and send it through a wire to a measuring device. What will you see? Unfortunately, none of the wires or boxes in the chain can be perfect, so you’ll probably see the signal plus something else which we’ll call the “error” in the system’s output. We can call it the error because, if we measure the input voltage and the output voltage at any one instant, we’ll probably see that they’re not identical. Since they should be identical, then the system must be making a mistake in transmitting the signal – so it makes errors…

Fig 2. If you send an audio signal through some wires and devices that (in theory) do nothing to the signal, you’ll find out that they add some extra stuff that you don’t want.

Pedantic Sidebar: Some people will call that error that the system adds to the signal “noise” – but I’m not going to call it that. This is because “noise” is a specific thing – noise is random – so if it’s not random, it’s not noise. Also, although the signal has been distorted (in that the output of the system is not identical to the input) I won’t call it “distortion” either, since distortion is a name that’s given to something that happens to the signal because the signal is there. (We would probably get at least some of the error out of our system even if we didn’t send any audio into it.) So, we could be slightly geeky and adequately vague and call the extra stuff “Distortion plus noise” but not “THD+N” – which stands for “Total Harmonic Distortion Plus Noise” – because not all kinds of distortion will produce a harmonic of the signal… but I’m getting ahead of myself…

So, we want to transmit (or store) the audio signal – but we want to reduce the noise caused by the transmission (or storage) system. One way to do this is to spend more money on your system. Use wires with better shielding, amplifiers with lower noise floors, bigger power supplies so that you don’t come close to their limits, run your magnetic tape twice as fast, and so on and so on. Or, you could convert the analogue signal (remember that it’s analogous to the change in air pressure over time) to one that is represented (and therefore transmitted or stored) digitally instead.

What does this mean?

Conversion from analogue to digital and back
(but skipping important details)

IMPORTANT: If you read this section, then please read the following postings as well. This is because, in order to keep things simple to start, I’m about to leave out some important details that I’ll add afterwards. However, if you don’t add the details, you could (understandably) jump to some incorrect conclusions (that many others before you have concluded…) So, if you don’t have time to read both sections, please don’t read either of them.

In the example above, we made a varying voltage that was analogous to the varying air pressure. If we wanted to store this, we could do it by varying the amount of magnetism on a wire or a coating on a tape, for example. Or we could cut a wiggly groove in a bit of vinyl that has a similar shape to the curve in the plots in Figure 1. Or, we could do something else: we could get a metronome (or a clock) and make a measurement of the voltage every time the metronome clicks, and write down the measurements.

For example, let’s zoom in on the first little bit of the signal in the plots in Figure 1

Fig. 3 The same curve as was shown in Figure 1 – but zoomed in to the very beginning.

We’ll then put on a metronome and make a measurement of the voltage every time we hear the metronome click…

Fig 4. The same curve (in red) measured at regular intervals (in black)

We can then keep the measurements (remembering how often we made them…) and write them down like this:

0.3000
0.4950
0.5089
0.3351
0.1116
0.0043
0.0678
0.2081
0.2754
0.2042
0.0730
0.0345
0.1775

We can store this series of numbers on a computer’s hard disk, for example. We can then come back tomorrow, and convert the measurements to voltages. First we read the measurements, and create the appropriate voltage…

Fig. 5. The voltages that we stored as measurements

We then make a “staircase” waveform by “holding” those voltages until the next value comes in.

Fig 6. We make a “staircase” curve using the voltages.

All we need to do then is to use a low-pass filter to smooth out the hard edges of the staircase.

Fig 7. When we smooth out the staircase, we get back the original signal (in red).

So, in this example, we’ve gone from an analogue signal (the red curve in Figure 3) to a digital signal (the series of numbers), and back to an analogue signal (the red curve in Figure 7).

In some ways, this is a bit like the way a movie works. When you watch a movie, you see a series of still photographs, probably taken at a rate of 24 pictures (or frames) per second. If you play those photos back at the same rate (24 fps or frames per second), you think you see movement. However, this is because your eyes and brain aren’t fast enough to see 24 individual photos per second – so you are fooled into thinking that things on the screen are moving.

However, digital audio is slightly different from film in two ways:

The sound (equivalent to the movement in the film) is actually happening. It’s not a trick that relies on your ears and brain being too slow.
If, when you were filming the movie, something were to happen between frames (say, the flash of a gunshot, for example) then it would never be caught on film. This is because the photos are discrete moments in time – and what happens between them is lost. However, if something were to make a very, very short sound between two samples (two measurements) in the digital audio signal – it would not be lost. This is because of something that happens at the beginning of the chain that I haven’t described… yet…

However, there are some “artefacts” (a fancy term for “weird errors”) that are present both in film and in digital audio that we should talk about.

The first is an error that happens when you mess around with the rate at which you take the measurements (called the “sampling rate”) or the photos (called the “frame rate”) – and, more importantly, when you need to worry about this. Let’s say that you make a film at 24 fps. If you play this back at a higher frame rate, then things will move very quickly (like old-fashioned baseball movies…). If you play them back at a lower frame rate, then things move in slow motion. So, for things to look “normal” you have to play the movie at the same rate that it was filmed. However, as longs no one is looking, you can transfer the movie as fast as you like. For example, if you wanted to copy the film, you could set up a movie camera so it was pointing at a movie screen and film the film. As long as the movie on the screen is running in sync with the camera, you can do this at any frame rate you like. But you’ll have to watch the copy at the same frame rate as the original film…

The second is an easy artefact to recognise. If you see a car accelerating from 0 to something fast on film, you’ll see the wheels of the car start to get faster and faster, then, as the car gets faster, the wheels slow down, stop, and then start going backwards… This does not happen in real life (unless you’re in a place lit with flashing lights like fluorescent bulbs or LED’s). I’ll do a posting explaining why this happens – but the thing to remember here is that the speed of the wheel rotation that you see on the film (the one that’s actually captured by the filming…) is not the real rotational speed of the wheel. However, those two rotational speeds are related to each other (and to the frame rate of the film). If you change the real rotational rate or the frame rate, you’ll change the rotational rate in the film. So, we call this effect “aliasing” because it’s a false version (an alias) of the real thing – but it’s always the same alias (assuming you repeat the conditions…) Digital audio can also suffer from aliasing, but in this case, you put in one frequency (which is actually the same as a rotational speed) and you get out another one. This is not the same as harmonic distortion, since the frequency that you get out is due to a relationship between the original frequency and the sampling rate, so the result is almost never a multiple of the input frequency.

Some details that I left out…

One of the things I said above was something like “we measure the voltage and store the results” and the example I gave was a nice series of numbers that only had 4 digits after the decimal point. This statement has some implications that we need to discuss.

Let’s say that I have a thing that I need to measure. For example, Figure 8 shows a piece of metal, and I want to measure its width.

Fig 8. A piece of metal with a width of “approximately 57 mm”.

Using my ruler, I can see that this piece of metal is about 57 mm wide. However, if I were geeky (and I am) I would say that this is not precise enough – and therefore it’s not accurate. The problem is that my ruler is only graduated in millimetres. So, if I try to measure anything that is not exactly an integer number of mm long, I’ll either have to guess (and be wrong) or round the measurement to the nearest millimetre (and be wrong).

So, if I wanted you to make a piece of metal the same width as my piece of metal, and I used the ruler in Figure 8, we would probably wind up with metal pieces of two different widths. In order to make this better, we need a better ruler – like the one in Figure 9.

Fig 9. The same piece of metal being measured with a vernier caliper. This gives us additional precision (down to 0.05 mm) so we can make a more accurate measurement.

Figure 9 shows a vernier caliper (a fancy type of ruler) being used to measure the same piece of metal. The caliper has a resolution of 0.05 mm instead of the 1 mm available on the ruler in Figure 8. So, we can make a much more accurate measurement of the metal because we have a measuring device with a higher precision.

The conversion of a digital audio signal is the same. As I said above, we measure the voltage of the electrical signal, and transmit (or store) the measurement. The question is: how accurate and precise is your measurement? As we saw above, this is (partly) determined by how many digits are in the number that you use when you “write down” the measurement.

Since the voltage measurements in digital audio are recorded in binary rather than decimal (we use 0 and 1 to write down the number instead of 0 up to 9) then we use Binary digITS – or “bits” instead of decimal digits (which are not called “dits”). The number of bits we have in the number that we write down (partly) determines the precision of the measurement of the voltage – and therefore (possibly), our accuracy…

Just like the example of the ruler in Figure 8, above, we have a limited resolution in our measurement. For example, if we had only 4 bits to work with then the waveform in 4 – the one we have to measure – would be measured with the “ruler” shown on the left side of Figure 10, below.

Fig 10: The waveform from Figure 4 as a voltage (notice the Y-axis on the right). We have to measure these values using the ruler with the resolution shown on the Y-axis on the left.

When we do this, we have to round off the value to the nearest “tick” on our ruler, as shown in Figure 11.

Fig 11: The values from figure 10 (shown as the circles) rounded off to the nearest value on our 4-bit ruler (the red staircase).

Using this “ruler” which gives a write-down-able “quantity” to the measurement, we get the following values for the red staircase:

0010
0100
0100
0011
0001
0000
0001
0010
0010
0010
0001
0000
0001

When we “play these back” we get the staircase again, shown in Figure 12.

Fig 12: The output of the measurements. Notice that all values sit exactly on one of the values for the “ruler” on the left Y-axis of the plot.

Of course, this means that, by rounding off the values, we have introduced an error in the system (just like the measurement in Figure 8 has a bigger error than the one in Figure 9). We can calculate this error if we just subtract the original signal from the output signal (in other words, Figure 12 minus Figure 10) to get Figure 13.

Fig 13: The error that we produced due to the rounding off of the signal when we did the measurements. Notice that the error is always less than 0.5 of a “tick” of the ruler on the left Y-axis.

In order to improve our accuracy of the measurement, we have to increase the precision of the values. We can do this by adding an extra digit (or bit) to the number that we use to record the value.

If we were using decimal numbers (0-9) then adding an extra digit to the number would give us 10 times as many possibilities. (For example, if we were using 4 digits after the decimal in the example at the start of this posting, we have a total of 10,000 possible values – 0.0000 to 0.9999. If we add one more digit, we increase the resolution to 100,000 possible values – 0.00000 to 0.99999 ).

In binary, adding one extra digit gives us twice as many “ticks” on the ruler. So, using 4 bits gives us 16 possible values. Increasing to 5 bits gives us 32 possible values.

If you’re listening to a CD, then the individual measurements of each voltage – the “sample values” – are stored with 16 bits, which means that we have 65,536 possible values to pick from.

Remember that this means that we have more “ticks” on our ruler – but we don’t necessarily increase its range. So, for example, we’re still measuring a voltage from -1 V to 1 V – we just have more and more resolution to do that measurement with.

Error #1

Finally! We get to the beginning of the point of the posting in the first place. My whole reason for starting this series of postings was to talk about errors in digital audio.

So, the first one to talk about is whether we have “bit matching” in a system where we expect to do so. For example, if you look at the S/P-DIF output of a good-old-fashioned CD player, do the sample values that are transmitted on that wire identical to the ones on the disc?

This is a fairly easy test to make (in theory). All you have to do is to record the digital signal on the S/P-DIF output of your CD player, subtract the original signal that’s on the disc (making sure that you have done your time alignment correctly). If you have anything other than nothing left over, then something went wrong somewhere.

If the result of this test is that you do NOT get nothing remaining, you cannot jump in head first and say that your S/P-DIF output is not working properly. For example, some sound cards have a sampling rate converter at their digital input. So, if you are capturing the CD player’s output using such a sound card on your computer, then perhaps the errors that you see are being produced by your sound card – and not your player.

A little associated story

This was a method that I used to do the final testing of Wireless Power Link for B&O. I created a little software application that made a signal and sent it out digitally to a Wireless Power Link transmitter (which was running with a resolution of 24 bits – giving us 16,777,216 possible values). I then connected a Wireless Power Link receiver’s output to the same computer. The computer knew how much time it took the signal to get from its output, through the wireless transmission system, back to its input (about 5 ms). So, I took the “output” signal, delayed it by that amount, and then subtracted it from the “input” signal. I then made a detector that counted every bit (instead of every sample) that was incorrect.

The reason I was counting bit errors instead of sample errors was that we wanted to be able to diagnose problems if we found them. If you find out that “this sample is wrong” – you don’t necessarily know whether it was one or more bit errors that caused the problem. By counting bit errors, you have a little more information that can help you diagnose the source of problems when you find them.

Sidebar: since this test was running at 48 kHz and 24 bits with a 2-channel system, that means that there were 2,304,000 bits per second being checked every second

This test ran 24-hours a day continuously for over 11 days. In that time, we found 0 bit errors. That means that we got 0 errors in more than 2,189,721,600,000 bits, which was good.

Now, just before anyone gets excited: that test was run to find out whether the WPL system was able to deliver a bit-perfect output in the absence of any external disturbances. So, the transmitter and the receiver were not moved at any time during the test, and nothing was moved between them – and the result was that the system behaved perfectly.

B&O Tech: A very brief introduction to Parametric Equalisation

#78 in a series of articles about the technology behind Bang & Olufsen loudspeakers

Almost all sound systems offer bass and treble adjustments for the sound – these are basically coarse versions of a more general tool called an equaliser that is often used in recording studios, and are increasingly found in high-end home audio equipment.

Once upon a time, if you made a long-distance phone call, there was an actual physical connection made between the wire running out of your telephone and the telephone at the other end of the line. This caused a big problem in signal quality because a lot of high-frequency components of the signal would get attenuated along the way due to losses in the wiring. Consequently, booster circuits were made to help make the relative levels of the various frequencies more equal. As a result, these circuits became known as equalisers. Nowadays, of course, we don’t need to use equalisers to fix the quality of long-distance phone calls (mostly because the communication paths use digital encoding instead of analogue transmission), but we do use them to customise the relative balance of various frequencies in an audio signal. This happens most often in a recording studio, but equalisers can be a great personalisation tool in a playback system in the home.

The two main reasons for using equalisation in a playback system are (1) personal preference and (2) compensation for the effects of the listening room’s acoustical behaviour.

Equalisers are typically comprised of a collection of filters, each of which has up to 4 “handles” or “parameters” that can be manipulated by the user. These parameters are

Filter Type
Gain
Centre Frequency
Q

Filter Type

The filter type will let you decide the relative levels of signals at frequencies within the band that you’re affecting.

There are up to 7 different types of filters that can be found in professional parametric equalisers. These are (in no particular order…)

Low Pass
High Pass
Low shelving
High shelving
Band-pass
Band-reject
Peaking (also known as Peak/Dip or Peak/Notch)

However, for this posting, we’ll just focus on the three most-used of these:

Low shelving
High shelving
Peaking

Low Shelving Filter

In theory, a low shelving filter affects gain of all frequencies below the stated frequency by the same amount. In reality, there is a band around the stated frequency where the filter transitions between a gain of 0 dB (no change in the signal) and the gain of the affected frequency band.

Figure 2: Example of a low shelving filter with a negative gain. Frequencies below approximately 80 Hz have been affected.

Note that the low shelving filters used in the parametric equalisers in Bang & Olufsen loudspeakers define the centre frequency as being the frequency where the gain is one half the maximum (or minimum) gain of the filter. For example, in Figure 1, the gain of the filter is 6 dB. The centre frequency is the frequency where the gain is one-half this value or 3 dB, which can be found at 80 Hz.

Some care should be taken when using low shelving filters since their affected frequency bands extend to 0 Hz or DC. This can cause a system to be pushed beyond its limits in extremely low frequency bands that are of little-to-no consequence to the audio signal. Note, however, that this is less of a concern for the B&O loudspeakers, since they are protected against such abuse.

High Shelving Filter

In theory, a high shelving filter affects gain of all frequencies above the stated frequency by the same amount. In reality, there is a band around the stated frequency where the filter transitions between a gain of 0 dB (where there is no change in the signal) and the gain of the affected frequency band.

Figure 4: Example of a high shelving filter with a negative gain. Frequencies above approximately 8 kHz have been affected.

Note again that the high shelving filters used in B&O loudspeakers define the centre frequency as being the frequency where the gain is one half the maximum (or minimum) gain of the filter. For example, in Figure 4, the gain of the filter is -6 dB. The centre frequency is the frequency where the gain is one-half this value or -3 dB, which can be found at 8 kHz.

Some care should be taken when using high shelving filters since their affected frequency bands can extend beyond the audible frequency range. This can cause a system to be pushed beyond its limits in extremely high frequency bands that are of little-to-no consequence to the audio signal.

Peaking Filter

A peaking filter is used for a more local adjustment of a frequency band. In this case, the centre frequency of the filter is affected most (it will have the Gain of the filter applied to it) and adjacent frequencies on either side are affected less and less as you move further away. For example, Figure 5 shows the response of a peaking filter with a centre frequency of 1 kHz and gains of 6 dB (the black curve) and -6 dB (the red curve). As can be seen there, the maximum effect happens at 1 kHz and frequency bands to either side are affected less.

You may notice in Figure 5 that the black and red curves are symmetrical – in other words, they are identical except in polarity (in dB) of the gain. This is a particular type of peaking filter called a reciprocal peak/dip filter – so-called because these two filters, placed in series, can be used to cancel each other’s effects on the signal.

There are other types of peaking filters that are not reciprocal. This is true in cases where the Q is defined differently. However, we won’t get into that here. If you’d like to read about this “issue”, see this link.

Gain

If you need to make all frequencies in your audio signal louder, then you just need to increase the volume. However, if you want to be a little more selective and make some frequency bands louder (or quieter) and leave other bands unchanged, then you’ll need an equaliser. So, one of the important questions to ask is “how much louder?” or “how much quieter?” The answer to this question is the gain of the filter — this is the amount by which is signal is increased or decreased in level.

The gain of an equaliser filter is almost always given in decibels or dB. (The “B” is a capital because it’s named after Alexander Graham Bell.) This is a scale based on logarithmic changes in level. Luckily, it’s not necessary to understand logarithms in order to have an intuitive feel for decibels. There are really just three things to remember:

a gain of 0 dB is the same as saying “no change”
positive decibel values are louder, negative decibel values are quieter
adding approximately 6 dB to the gain is the same as saying “two times the level”. (Therefore, subtracting 6 dB is half the level.)

Centre Frequency

So, the next question to answer is “which frequency bands do you want to affect?” This is partially defined by the centre frequency or Fc of the filter. This is a value that is measured in the number of cycles per second (This is literally the number of times a loudspeaker driver will move in and out of the loudspeaker cabinet per second.), labelled Hertz or Hz.

Generally, if you want to increase (or reduce) the level of the bass, then you should set the centre frequency to a low value (roughly speaking, below 125 Hz). If you want to change the level of the high frequencies, then you should set the centre frequency to a high value (say, above 8 kHz).

Q

In all of the above filter types, there are transition bands — frequency areas where the filter’s gain is changing from 0 dB to the desired gain. Changing the filter’s Q allows you to alter the shape of this transition. The lower the Q, the smoother the transition. In both the case of the shelving filters and the peaking filter, this means that a wider band of frequencies will be affected. This can be seen in the examples in Figures 6 and 7.

Figure 6: Example of two low shelving filters. The black curve shows a filter with a Q of 0.4, the red curve shows the a filter with a Q of 1. For both filters, the centre frequency is 100 Hz and the gain is +6 dB.

Figure 7: Example of two peaking filters. The black curve shows a filter with a Q of 0.5, the red curve shows the a filter with a Q of 8. For both filters, the centre frequency is 1 kHz and the gain is +6 dB.

It should be explained that the Q parameter can cause a shelving filter to behave a little strangely. When the Q of a shelving filter exceeds a value of 0.707 (or 1/sqrt(2)), the gain of the filter will “overshoot” its limits. For example, as can be seen in Figure 8, a filter with a gain of 6 dB and a Q of 4 will actually have a gain of almost 13 dB and will attenuate by almost 7 dB.

Figure 8: Example of low-shelving filters with a Q of more than 0.707. The black curve shows a filter with a Q of 0.7 for reference, the red curves shows the a filter with Q’s of 1, 2, and 4.

Some extra information

Some people and books will say that “Q” stands for the “Quality” of the filter. This is a very old myth, but it is not true. There is a great paper worth reading called “The Story of Q” by Estill I. Green in which it is clearly stated “His [K.S. Johnson – an employee in the Engineering Dept. of the Western Electric Company, which later became Bell Telephone Laboratories.] reason for choosing Q was quite simple. He says that it did not stand for “quality factor” or anything else, but since the other letters of the alphabet had already been pre-empted for other purposes, Q was all he had left.”

For peaking filters, the Q of the filter is equal to the centre frequency divided by the filter’s bandwidth. So, if the Q of the filter is 2 and the centre frequency is 1 kHz, then the bandwidth will be 500 Hz. Another way to look at this is that, very roughly speaking, 1/Q will be the filter’s bandwidth in octaves. So, for example, a filter with a Q of 2 will have a bandwidth of about 1/2 an octave. A filter with a Q of 0.5 will have a bandwidth of about 2 octaves.

This is just a basic introduction to parametric equalisers. For more information, check out the explanation here.

B&O Tech: Beolab loudspeakers and Third-party systems

#77 in a series of articles about the technology behind Bang & Olufsen loudspeakers

I’m occasionally asked about the technical details of connecting Bang & Olufsen loudspeakers to third-party (non-B&O) sources. In the “old days”, this was slightly difficult due to connectors, adapters, and outputs. However, that was a long time ago – although beliefs often persist longer than facts…

All Bang & Olufsen “Beolab” loudspeakers are “active”. At the simplest level, this means that the amplifiers are built-in. In addition, almost all of the BeoLab loudspeakers in the current portfolio use digital signal processing. This means that the filtering and crossovers are implemented using a built-in computer instead of using resistors, capacitors, and inductors. This will be a little important later in this posting.

In order to talk about the compatibility issues surrounding the loudspeakers in the BeoLab portfolio – both with themselves and with other loudspeakers, we really need to break the discussion into two areas. The first is that of connectors and signals. The second, more problematic issue is that of “latency” (which is explained below…)

Connectors and signals

Since Beolab loudspeakers have the amplifiers built-in, you need to connect them to an analogue “line level” signal instead of the output of an amplifier.

This means that, if you have a stereo preamplifier, then you just connect the “volume-regulated” Line Output of the preamp to the RCA line inputs of the BeoLab loudspeakers. (Note that the BeoLab 3 does not have a built-in RCA connector, so you need an adapter for this). Since the BeoLab loudspeakers (except for BeoLab 5, 50, and 90) are fixed at “full volume”, then you need to ensure that your Line Output of the source is, indeed, volume-regulated. If not, things will be surprisingly loud…

In addition to the RCA Line inputs, most Beolab loudspeakers also have at least one digital audio input. The BeoLab 5 has an S/P-DIF “coaxial” input. The Beolab 17, 18, and 20 have optical digital inputs. The Beolab 50 and 90 have many options to choose from. Again, apart from the Beolab 5, 50, and 90, the loudspeakers are fixed at “full volume”, so if you are going to use the digital input for the Beolab 17, 18, or 20, you will need to enable the volume regulation of the digital output of your source, if that’s possible.

Latency

Any audio device has some inherent “latency” or “delay from the time the signal comes in until it goes out”. For some devices, this latency can be so low that we can think of it as being 0 seconds. In other words, for some devices (say, a wire, for example) the signal comes out at the same time as it comes in (as far as we’re concerned… I’m not going to get into an argument about the speed of electricity or light, since these go very fast…)

Any audio device that uses digital signal processing has some measurable (and possibly audible) latency. This is primarily due to 5 things, seen in the flowchart below.

Fig 1. The basic steps that cause latency in a digital audio system that has an analogue input and output.

Each of these 5 steps each have different amounts of latency – some of them very, very small. Some are bigger. One thing to know about digital signal processing is that, typically, in order to make the math more efficient (and therefore squeeze as much as possible out of the computing power), the samples are processed in “blocks” – not one-by-one. So, the signal comes into the input, it gets converted to individual samples, and those samples are collected into a block of 64 samples (for example) before being sent to the processing.

So, let’s say that you have a sampling rate of 44100 samples per second, and a block size of 64 samples. This then means that you send a block to the processor every 64 * 1/44100 = 1.45 ms. That block gets processed (which takes some time), and then sent as another block of 64 samples to the DAC (digital to analogue converter).

So, ignoring the latency of the conversion from- and to-analogue, in the example above, it will take 1.45 ms to get the signal into the processor, you have a 1.45 time window to do the processing, and it will take another 1.45 ms to get the signal out to the DAC. This is a total of 4.35 ms from the instant a signal gets comes into the analogue input to the moment it comes out the analogue output.

Sidebar: Of course, 4.3 ms is not a long time. If you had a loudspeaker outdoors, then adding 4.35 ms to its latency would be same delay you would incur by moving 1.5 m (or about 4.9 feet) further away. However, in terms of a stereo or multichannel audio system, 4.35 ms is an eternity. For example, if you have a correctly-configured stereo loudspeakers (with each loudspeaker 30º from centre-front, and you’re sitting in the “sweet spot”, if you delay the left loudspeaker by just 0.2 ms, then lead vocals in your pop tunes will move 10º to the right instead of being in the centre. It only takes 1.12 ms of delay in one loudspeaker to move things all the way to the opposite side. In a multichannel loudspeaker configuration (or in headphones), some of the loudspeaker pairs (e.g. Left Surround – Right Surround) result in you being even more sensitive to these so-called “inter-channel delay differences”.

Also, the amount of time required by the processing depends on what kind of processing you’re doing. In the case of Beolab 50 and 90, for example, we are using FIR filters as part of the directivity (Beam Width and Beam Direction) processing. Since this filtering extends quite low in frequency, the FIR filters are quite long – and therefore they require extra latency. To add a small amount of confusion to this discussion (as we’ll see below) this latency is switchable to be either 25 ms or 100 ms. If you want Beam Width control to extend as low in frequency as possible, you need to use the 100 ms “Long Latency” mode. However, if you need lip-synch with a non-B&O source, you should use the 25 ms “Low Latency” mode (with the consequent loss of directivity control at very low frequencies).

Latency in Beolab loudspeakers

In order to use Beolab loudspeakers with a non-B&O source (or an older B&O source) , you may need to know (and compensate for) the latency of the loudspeakers in your system. This is particularly true if you are “mixing and matching” loudspeakers: for example, using different loudspeaker models (or other brands – *gasp*) in a single multichannel configuration.

Model	A / D	Latency (ms)	Equivalent in m	Volume Regulation?
Unknown analogue	A	0	0	No
Beolab 1	A	0	0	No
Beolab 2	A	0	0	No
Beolab 3	A	0	0	No
Beolab 4	A	0	0	No
Beolab 5	D	3.97	1.37	Yes
Beolab 7 series	A	0	0	No
Beolab 9	A	0	0	No
Beolab 12 series	D	5.14	1.77	No
Beolab 14 Satellite	D	4.38	1.51	No
Beolab 17	D	4.55	1.58	No
Beolab 18 (Power Link)	D	4.82	1.70	No
Beolab 19	D	4.92	1.71	No
Beolab 20 (Power Link)	D	5.45	1.88	No
Beolab 20 (Line)	D	9.30	3.21	No
Beolab 28 (Minijack)	D	31.20	10.81	Yes
Beolab 50	D	24.38 / 99.38	8.54 / 34.88	Yes
Beolab 90	D	28.71 / 100.23	10.0 / 34.93	Yes

Table 1. The latencies and equivalent distances for various Beolab loudspeakers Notice that the analogue loudspeakers all have a latency of 0 ms.

How to Do It

I’m going to make two assumptions for the rest of this posting:

you have a stereo preamp or a surround processor / AVR that has a “Speaker Distance” or “Speaker Delay” adjustment parameter (measured from the loudspeaker location to the listening position)
it does not have a “loudspeaker latency” adjustment parameter

The simple version (that probably won’t work):

Since the latency of the various loudspeakers can be “translated” into a distance, and since AVR’s typically have a “Speaker Distance” parameter, you simply have to add the equivalent distance of the loudspeaker’s latency to the actual distance to the loudspeaker when you enter it in the menus.

For example, let’s say that you have a 5.0 channel loudspeaker configuration with the following actual speaker distances, measured in the room.

Channel	Model	Distance
Left Front	Beolab 5	3.7
Right Front	Beolab 5	3.9
Centre Front	Beolab 3	3.9
Left Surround	Beolab 17	1.6
Right Surround	Beolab 17	3.2

Table 2. An example of a simple 5.0-channel loudspeaker configuration

You then look up the equivalent distances in the first table and add the appropriate number to each loudspeaker.

Channel	Model	Distance	+	Latency Equivalent	=	Total
Left Front	Beolab 5	3.7	+	1.37	=	5.07
Right Front	Beolab 5	3.9	+	1.37	=	5.27
Centre Front	Beolab 3	3.9	+	0	=	3.9
Left Surround	Beolab 17	1.6	+	1.58	=	3.18
Right Surround	Beolab 17	3.2	+	1.58	=	4.78

Table 3. Calculating the required speaker distances to compensate for the loudspeakers’ latencies using the example in Table 2.

This technique will work fine unless the total distance that you have to enter in the AVR’s menus is greater than its maximum possible value (which is typically 10.0 m on most brands and models that I’ve seen – although there are exceptions).

So, what do you do if your AVR can’t handle a value that’s high enough? Then you need to fiddle with the numbers a bit…

The slightly-more complicated version (which might work most of the time)

When you enter the Speaker Distances in the menus of your AVR, you’re doing two things:

calibrating the delay compensation for the differences in the distances from the listening position to the individual loudspeakers
(maybe) calibrating the system to ensure that the sound arrives at the listening position at the same time as the video is displayed on the screen (therefore sending the sound out early, since it takes longer for the sound to travel to the sofa than it takes the light to get from your screen…)

That second one has a “maybe” in front of it for a couple of reasons:

this is a very small effect, and might have been decided by the manufacturer to be not worth the effort
the manufacturer of an AVR has no way of knowing the latency of the screen to which it’s attached. So, it’s possible that, by outputting the sound earlier (to compensate for the propagation delay of the sound) it’s actually making things worse (because the screen is delayed, but the AVR doesn’t know it…)

So, let’s forget about that lip-synch issue and stick with the “delay compensation for the differences in the distances” issue. Notice that I have now highlighted the word “differences” in italics twice… this is important.

The big reason for entering Speaker Distances is that you want the a sound that comes out of all loudspeakers simultaneously to reach the listening position simultaneously. This means that the closer loudspeakers have to wait for the further loudspeakers (by adding an appropriate delay to their signal path). However, if we ignore the synchronisation to another signal (specifically, the lips on the screen), then we don’t need to know the actual (or “absolute”) distance to the loudspeakers – we only need to know their differences (or “relative distances”). This means that you can consider the closest loudspeaker to have a distance of 0 m from the listening position, and you can subtract that distance from the other distances.

For example, using the table above, we could subtract the distance to the closest loudspeaker (the Left Surround loudspeaker, with a distance of 1.6 m) from all of the loudspeakers in the table, resulting in the table below.

Channel	Model	Distance	–	Closest	=	Result
Left Front	Beolab 5	3.7	–	1.6	=	2.1
Right Front	Beolab 5	3.9	–	1.6	=	2.3
Centre Front	Beolab 3	3.9	–	1.6	=	2.3
Left Surround	Beolab 17	1.6	–	1.6	=	0
Right Surround	Beolab 17	3.2	–	1.6	=	1.6

Table 4. Another version of Table 3, showing how to reduce values to fit the constraints of the AVR if necessary.

Again, you look up the equivalent distances in the first table and add the appropriate number to each loudspeaker.

Channel	Model	Distance	+	Latency equivalent	=	Total
Left Front	Beolab 5	2.1	+	1.37	=	3.47
Right Front	Beolab 5	2.3	+	1.37	=	3.67
Centre Front	Beolab 3	2.3	+	0	=	2.3
Left Surround	Beolab 17	0	+	1.58	=	1.58
Right Surround	Beolab 17	1.6	+	1.58	=	3.18

Table 5. Calculating the required speaker distances to compensate for the loudspeakers’ latencies using the example in Table 4.

As you can see in Table 5, the end results are smaller than those in Table 3 – which will help if your AVR can’t get to a high enough value for the Speaker Distance.

The only-slightly-even-more complicated version (which has a better chance of working most of the time)

Of course, the version I just described above only subtracted the smallest distance from the other distances, however, we could do this slightly differently and subtract the smallest total (actual + equivalent distance) from the totals to “force” one of the values to 0 m. This can be done as follows:

Starting with a copy of Table 3, we get a preliminary Total, and then subtract the smallest of these from all value to get our Final Speaker Distance.

Channel	Model	Distance (m)	+	Latency (m)	=	Total	–	Smallest	=	Final
Lf	BL 5	3.7	+	1.37	=	5.07	–	3.18	=	1.89
Rf	BL 5	3.9	+	1.37	=	5.27	–	3.18	=	2.09
Cf	BL 3	3.9	+	0	=	3.9	–	3.18	=	0.72
Ls	BL17	1.6	+	1.58	=	3.18	–	3.18	=	0
Rs	BL 17	3.2	+	1.58	=	4.78	–	3.18	=	1.6

Table 6. Another version of Table 3, showing how to minimise values to fit the constraints of the AVR if necessary.

Of course, if you do it the first way (as shown in Table 3) and the values are within the limits of your AVR, then you don’t need to get complicated and start subtracting. And, in many cases, if you don’t own BeoLab 50 or 90, and you don’t live in a mansion, then this will probably be okay. However… if you DO own BeoLab 50 or 90, and/or you do live in a mansion, then you should probably get used to subtracting…

Some additional information about Beolab 50 & 90

As I mentioned above, the Beolab 50 and Beolab 90 have two latency options. The “High Latency” option (100 ms) allows us to implement FIR filters that control the directivity (the Beam Width and Beam Directivity) to as low a frequency as possible. However, in this mode, the latency is so high that you will notice that the sound is behind the picture if you have a non-B&O television.* In other words, you will not have “lip-synch”.

For customers with a non-B&O television*, we have included a “Low Latency” option (25 ms) which is within the tolerable limits of lip-synch. In this mode, we are still controlling the directivity of the loudspeaker with an FIR, but it cannot go as low in frequency as the “High Latency” option.

As I mentioned above, a 100 ms latency in a loudspeaker is equivalent to placing it 34.4 m further away (ignoring the obvious implications on the speaker level). If you have a third-part source such as an AVR, it is highly unlikely that you can set a Speaker Distance in the menus to be the actual distance + 34.4 m…

So, in the case of Beolab 50 or 90, you should manually set the Latency Mode to “Low Latency” (using the setup options in the speaker’s app). This then means that you should add “only” 8.6 m to the actual distance to the loudspeaker.

Of course, if you are using the Beolab 50 or 90 alone (meaning that there is no video signal, and no other loudspeakers that need time-alignment) then this is irrelevant, and you can just set the Speaker Distance to 0 m. You can also change the loudspeakers to another preset (that you or your installer set up) that uses the High Latency mode for best performance.

Instructions on how to do this are found in the Technical Sound Guide for the Beolab 50 or the Beolab 90 via the Bang & Olufsen website at www.bang-olufsen.com.

* Here a “B&O Television” means a BeoPlay V1, Beovision 11, 14, Avant, Avant NG, Horizon, or Eclipse. Older B&O televisions are different… This will be discussed in the next blog posting.

One way to compare CODEC quality

I’m often asked about my opinion regarding sound quality vs. compression formats or sampling rates or bit depths or psychoacoustic CODEC’s or other things like that…

Of course, there are lots of ways to decide on such an opinion, depending on what parameters you use to define “sound quality” and therefore what it is you’re asking specifically…

One way to think of this is to consider that the original sound file is the “reference” (regardless of how “good” or “bad” it is…), and when you encode it somehow (say, by changing sampling rates, or making it an MP3 file, for example), AND that encoding makes it different, then the resulting difference from the original can be considered an error.

So, I took a compilation of tracks that I often use for listening to loudspeakers. This is about 13 minutes long and is made of excerpts of many different recordings and recording styles, ranging from anechoic female speech, through a cappella choral, orchestral music, jazz, hard rock, heavy metal, and hip hop. The original tracks were all taken from 44.1 kHz / 16-bit CD’s, and the compilation is a 44.1 kHz / 16 bit result. This is what we’ll call the “reference”.

I then used LAME to encode the compilation in different bitrates of MP3. I re-encoded as 320, 256, and 128 CBR (Constant Bit Rate). I also used the “–preset” option to make encodings in the “insane”, “extreme”, “standard”, and “medium” settings (I’ve included the details of this at the bottom in the “Appendix”). Three of these four presets are VBR – the “Insane” setting is a CBR 320 kbps with some tweaked parameters.

I decoded those MP3 files back to PCM, and compared them to the original, of course making sure that everything was time- and gain-aligned. (There are some small differences in the overall level of the original file and the MP3 output – which is different for different bitrates. If I did not do this, then I would be exaggerating the differences between the original and the encoded versions – so this gain difference was calculated and compensated for, before subtracting the original from the MP3.)

Let’s take a look at a plot of the sample values in the left channel of the beginning of the track.

Figure 1. The original (in black) and the decoded 128 kbps MP3 file.

The plot above shows the first 44100 samples in the track (the first second of sound). The red plot is the decoded 128 kbps MP3. The black plot (which is difficult to see because it is overlapped by the red plot – except in the signal peaks) is the original file. For example, if I zoom into the area around the beginning of the sound (say, starting around sample number 15800) then we see this

Figure 2. A close-up of a portion of Figure 1.

So, as you can see in the two plots above, the decoded 128 kbps MP3 and the original 44.1/16 file are different. But, the difference is small relative to the levels of the signals themselves. The question is, how small is the difference, exactly?

We can find this out by subtracting the original signal from the decoded MP3 output, sample by sample. The result of this is shown in the plot below.

Figure 3. The difference between the two plots in Figure 2.

Notice that the vertical scale of the plot in Figure 3 is small. This is because it shows the difference between the two lines in Figure 2, which is also quite small.

Let’s think for a minute about how I arrived at the signal in Figure 3. I subtracted the Original signal from the MP3 output. In other words:

MP3 output – Original = Difference

If we consider that the difference between the MP3 output and the Original can be thought of as an “error”, and if I move the terms in the equation above, I get the following:

MP3 output – Original = Error

Original + Error = MP3 output

So, the question is: how loud is that error relative to the signal we’re listening to? The idea here is that, the louder the error, the easier it will be to detect.

Figure 4, below, shows this level difference over time. The black curve is a running RMS level of the decoded 128 kbps MP3 file. As you can see there, it ranges from about -30 dB FS to about +10 dB FS. You may think that it’s strange that it “only” goes to -10 dB FS – but this is because the time window I’m using to calculate the RMS value of the signal is 500 ms long. The peaks of the track reach full scale, but since my time window is long, this tends to pull down the apparent level (because the peaks are short). (NB: If you want to argue about the choice of a 500 ms time window, please wait until I’ve followed up this posting with another one that divides things up by frequency band…)

The res curve in Figure 4 is a running RMS value of the Error signal – the difference between the MP3 file and the original. As you can see there, that error signal ranges from about -50 dB FS to about -30 dB FS, give or take…

Figure 4. Running measures of the level of the decoded 128 kpbs MP3 file (in black) and the error signal (in red).

We can find the running value of the difference between the level of the MP3 file and the level of the Error it contains by subtracting the black curve from the red curve. The result of this is shown in Figure 5, below.

Figure 5: The difference in level between the error signal and the decoded 128 kbps MP3 file.

So, Figure 5, therefore, shows the measure of how loud the signal is relative to the error that makes it different from the original. If this error signal were just harmonic distortion, then we could call this a measure of THD in dB. If it were just good-old-fashioned noise, like on a magnetic tape, then we could call it a signal-to-noise ratio. However, this is neither distortion or noise in the traditional sense – or, maybe more accurately, it’s both…

So, let’s call the plot in Figure 5 a “signal-to-error ratio”. What we can see there is that, for this particular track, for the settings that I used to make the 128 kbps MP3 file, the error – the MP3 artefacts – are only 20 to 25 dB below the signal most of the time. Now, don’t jump to conclusions here. This does not mean that they would be as audible as white noise that is only 25 dB below the signal. This is because part of the “magic” of the MP3 encoder is that it tries to ensure that the error can “hide” under the signal by placing the error signal in the same frequency band(s) as the signal. Typically, white noise is in a different band than the signal, so it’s easier to hear because it’s not masked. So, be very careful about interpreting this plot. This is a measurable signal-to-error ratio, but it cannot be directly compared to a signal-to-noise ratio.

Let’s now increase the bitrate of the MP3 encoding, allowing the encoder to increase the quality.

Figure 6. A running RMS of a decoded 256 kbps MP3 file (black) and the difference between that signal and the original (red).

Figure 7: The Signal-to-Error ratio of a 256 kbps MP3 file.

Figure 6 and 7 show the same information as before, but for a 256 kpbs encoding of the same track. As you can see there, by doubling the bitrate of the MP3, we have increased our signal-to-error ratio by about 10 to 15 dB or so – to about 35 or 40 dB.

Figure 8: A running RMS of a decoded 320 kbps MP3 file (black) and the difference between that signal and the original (red).

Figure 9: The Signal-to-Error ratio of a 320 kbps MP3 file.

As you can see in Figures 8 and 9 above, increasing the MP3 bitrate to 320 kbps can improve the Signal-to-Error ratio from about 25 dB (for 128 kbps) to about 40 dB or so.

Now, if you’re looking carefully, you might notice that, some times in the track that I used for testing, the signal-to-error ratio is actually worse for the 320 kbps file than it is for the 256 kbps file – all other things being equal in the LAME converter parameters. This is a bit misleading, since what you cannot see there is the frequency spectrum of the error signal. I’ll deal with that in a future posting – with some more analysis and explanation to go with it.

For now, let’s play with the VBR presets in LAME. I’ll just show the signal-to-error plots for the 4 settings.

Figure 10: The Signal-to-Error ratio of an MP3 file converted using LAME’s “medium” quality preset.

Figure 11: The Signal-to-Error ratio of an MP3 file converted using LAME’s “standard” quality preset.

Figure 12: The Signal-to-Error ratio of an MP3 file converted using LAME’s “extreme” quality preset.

Figure 13: The Signal-to-Error ratio of an MP3 file converted using LAME’s “insane” quality preset.

So, as you can see in Figures 10 through 13, the signal-to-error ratio can be improved with the VBR presets, reaching a peak of over 60 dB for the “Insane” setting, for this track…

As I said a couple of times above:

You have to be careful about interpreting these graphs from a background of “knowing” what a SNR is… This error is not normal “distortion” or “noise” – at least from a perceptual point of view…
I’ll go further with this, including some frequency-dependent information in a future posting.

Appendix – LAME parameters and verbose output

For the geeks…

MAC60090:mp3_demos ggm$ lame -b 320 -q 0 –verbose compilation_original.wav lame_320.mp3

LAME 3.99.5 64bits (http://lame.sf.net)

Using polyphase lowpass filter, transition band: 20094 Hz – 20627 Hz

Encoding compilation_original.wav to lame_320.mp3

Encoding as 44.1 kHz j-stereo MPEG-1 Layer III (4.4x) 320 kbps qval=0

misc:

scaling: 1

ch0 (left) scaling: 1

ch1 (right) scaling: 1

huffman search: best (outside loop)

experimental Y=0

…

stream format:

MPEG-1 Layer 3

2 channel – joint stereo

padding: off

constant bitrate – CBR

using LAME Tag

…

psychoacoustic:

using short blocks: channel coupled

subblock gain: 1

adjust masking: -10 dB

adjust masking short: -11 dB

quantization comparison: 9

^ comparison short blocks: 9

noise shaping: 1

^ amplification: 2

^ stopping: 1

ATH: using

^ type: 4

^ shape: 0 (only for type 4)

^ level adjustement: -12 dB

^ adjust type: 3

^ adjust sensitivity power: 1.000000

experimental psy tunings by Naoki Shibata

adjust masking bass=-0.5 dB, alto=-0.25 dB, treble=-0.025 dB, sfb21=0.5 dB

using temporal masking effect: yes

interchannel masking ratio: 0

…

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

37028/37028 (100%)| 2:07/ 2:07| 2:08/ 2:08| 7.5929x| 0:00

————————————————————————————————–

kbps LR MS % long switch short %

320.0 73.7 26.3 93.4 3.4 3.1

Writing LAME Tag…done

ReplayGain: -2.6dB

MAC60090:mp3_demos ggm$ lame -b 256 -q 0 –verbose compilation_original.wav lame_256.mp3

LAME 3.99.5 64bits (http://lame.sf.net)

Using polyphase lowpass filter, transition band: 19383 Hz – 19916 Hz

Encoding compilation_original.wav to lame_256.mp3

Encoding as 44.1 kHz j-stereo MPEG-1 Layer III (5.5x) 256 kbps qval=0

misc:

scaling: 1

ch0 (left) scaling: 1

ch1 (right) scaling: 1

huffman search: best (outside loop)

experimental Y=0

…

stream format:

MPEG-1 Layer 3

2 channel – joint stereo

padding: off

constant bitrate – CBR

using LAME Tag

…

psychoacoustic:

using short blocks: channel coupled

subblock gain: 1

adjust masking: -8 dB

adjust masking short: -8.8 dB

quantization comparison: 9

^ comparison short blocks: 9

noise shaping: 1

^ amplification: 2

^ stopping: 1

ATH: using

^ type: 4

^ shape: 1 (only for type 4)

^ level adjustement: -10 dB

^ adjust type: 3

^ adjust sensitivity power: 1.000000

experimental psy tunings by Naoki Shibata

adjust masking bass=-0.5 dB, alto=-0.25 dB, treble=-0.025 dB, sfb21=0.5 dB

using temporal masking effect: yes

interchannel masking ratio: 0

…

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

37028/37028 (100%)| 1:50/ 1:50| 1:51/ 1:51| 8.7235x| 0:00

————————————————————————————————–

kbps LR MS % long switch short %

256.0 71.6 28.4 93.4 3.4 3.1

Writing LAME Tag…done

ReplayGain: -2.6dB

MAC60090:mp3_demos ggm$ lame -b 128 -q 0 –verbose compilation_original.wav lame_128.mp3

LAME 3.99.5 64bits (http://lame.sf.net)

Using polyphase lowpass filter, transition band: 16538 Hz – 17071 Hz

Encoding compilation_original.wav to lame_128.mp3

Encoding as 44.1 kHz j-stereo MPEG-1 Layer III (11x) 128 kbps qval=0

misc:

scaling: 0.95

ch0 (left) scaling: 1

ch1 (right) scaling: 1

huffman search: best (outside loop)

experimental Y=0

…

stream format:

MPEG-1 Layer 3

2 channel – joint stereo

padding: off

constant bitrate – CBR

using LAME Tag

…

psychoacoustic:

using short blocks: channel coupled

subblock gain: 1

adjust masking: 0 dB

adjust masking short: 0 dB

quantization comparison: 9

^ comparison short blocks: 9

noise shaping: 2

^ amplification: 2

^ stopping: 1

ATH: using

^ type: 4

^ shape: 4 (only for type 4)

^ level adjustement: -3 dB

^ adjust type: 3

^ adjust sensitivity power: 1.000000

experimental psy tunings by Naoki Shibata

adjust masking bass=-0.5 dB, alto=-0.25 dB, treble=-0.025 dB, sfb21=0.5 dB

using temporal masking effect: yes

interchannel masking ratio: 0.0002

…

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

37028/37028 (100%)| 1:33/ 1:33| 1:34/ 1:34| 10.305x| 0:00

————————————————————————————————–

kbps LR MS % long switch short %

128.0 25.2 74.8 95.2 2.6 2.2

Writing LAME Tag…done

ReplayGain: -2.2dB

MAC60090:mp3_demos ggm$ lame –preset medium –verbose compilation_original.wav lame_medium.mp3

LAME 3.99.5 64bits (http://lame.sf.net)

Using polyphase lowpass filter, transition band: 17249 Hz – 17782 Hz

Encoding compilation_original.wav to lame_medium.mp3

Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=4)

misc:

scaling: 1

ch0 (left) scaling: 1

ch1 (right) scaling: 1

huffman search: best (outside loop)

experimental Y=1

…

stream format:

MPEG-1 Layer 3

2 channel – joint stereo

padding: all

variable bitrate – VBR mtrh (default)

using LAME Tag

…

psychoacoustic:

using short blocks: channel coupled

subblock gain: 1

adjust masking: 0 dB

adjust masking short: 0 dB

quantization comparison: 9

^ comparison short blocks: 9

noise shaping: 1

^ amplification: 2

^ stopping: 1

ATH: using

^ type: 5

^ shape: 2 (only for type 4)

^ level adjustement: -0 dB

^ adjust type: 3

^ adjust sensitivity power: 6.309574

experimental psy tunings by Naoki Shibata

adjust masking bass=-0.5 dB, alto=-0.25 dB, treble=-0.025 dB, sfb21=3.5 dB

using temporal masking effect: no

interchannel masking ratio: 0

…

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

37028/37028 (100%)| 0:18/ 0:18| 0:19/ 0:19| 53.116x| 0:00

32 [ 37] %

40 [ 4] *

48 [ 14] %

56 [ 8] %

64 [ 105] %

80 [ 423] %*

96 [ 831] %***

112 [ 2596] %%%********

128 [17134] %%%%%%%%%%%%%%%%%%%%***********************************************

160 [12811] %%%%%%%%%%%%%%%%%%%%%%%%***************************

192 [ 1330] %%****

224 [ 836] %%**

256 [ 683] %**

320 [ 216] %

——————————————————————————-

kbps LR MS % long switch short %

144.3 35.5 64.5 90.7 4.6 4.7

Writing LAME Tag…done

ReplayGain: -2.6dB

MAC60090:mp3_demos ggm$ lame –preset standard –verbose compilation_original.wav lame_standard.mp3

LAME 3.99.5 64bits (http://lame.sf.net)

Using polyphase lowpass filter, transition band: 18671 Hz – 19205 Hz

Encoding compilation_original.wav to lame_standard.mp3

Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=2)

misc:

scaling: 1

ch0 (left) scaling: 1

ch1 (right) scaling: 1

huffman search: best (outside loop)

experimental Y=0

…

stream format:

MPEG-1 Layer 3

2 channel – joint stereo

padding: all

variable bitrate – VBR mtrh (default)

using LAME Tag

…

psychoacoustic:

using short blocks: channel coupled

subblock gain: 1

adjust masking: -2.6 dB

adjust masking short: -2.6 dB

quantization comparison: 9

^ comparison short blocks: 9

noise shaping: 1

^ amplification: 2

^ stopping: 1

ATH: using

^ type: 5

^ shape: 2 (only for type 4)

^ level adjustement: -3.7 dB

^ adjust type: 3

^ adjust sensitivity power: 1.995262

experimental psy tunings by Naoki Shibata

adjust masking bass=-0.5 dB, alto=-0.25 dB, treble=-0.025 dB, sfb21=6.25 dB

using temporal masking effect: no

interchannel masking ratio: 0

…

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

37028/37028 (100%)| 0:19/ 0:19| 0:20/ 0:20| 48.732x| 0:00

32 [ 0]

40 [ 0]

48 [ 1] %

56 [ 0]

64 [ 15] %

80 [ 26] %

96 [ 17] %

112 [ 135] %

128 [ 1673] %*******

160 [15048] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*****************************

192 [15688] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*****************

224 [ 1986] %%%%%****

256 [ 1602] %%%%***

320 [ 837] %%**

——————————————————————————-

kbps LR MS % long switch short %

183.0 60.0 40.0 90.7 4.6 4.7

Writing LAME Tag…done

ReplayGain: -2.6dB

MAC60090:mp3_demos ggm$ lame –preset extreme –verbose compilation_original.wav lame_extreme.mp3

LAME 3.99.5 64bits (http://lame.sf.net)

polyphase lowpass filter disabled

Encoding compilation_original.wav to lame_extreme.mp3

Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=0)

misc:

scaling: 1

ch0 (left) scaling: 1

ch1 (right) scaling: 1

huffman search: best (outside loop)

experimental Y=0

…

stream format:

MPEG-1 Layer 3

2 channel – joint stereo

padding: all

variable bitrate – VBR mtrh (default)

using LAME Tag

…

psychoacoustic:

using short blocks: channel coupled

subblock gain: 1

adjust masking: -6.8 dB

adjust masking short: -6.8 dB

quantization comparison: 9

^ comparison short blocks: 9

noise shaping: 1

^ amplification: 2

^ stopping: 1

ATH: using

^ type: 5

^ shape: 1 (only for type 4)

^ level adjustement: -7.1 dB

^ adjust type: 3

^ adjust sensitivity power: 1.000000

experimental psy tunings by Naoki Shibata

adjust masking bass=-0.5 dB, alto=-0.25 dB, treble=-0.025 dB, sfb21=8.25 dB

using temporal masking effect: no

interchannel masking ratio: 0

…

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

37028/37028 (100%)| 0:21/ 0:21| 0:22/ 0:22| 44.584x| 0:00

32 [ 0]

40 [ 0]

48 [ 0]

56 [ 0]

64 [ 0]

80 [ 0]

96 [ 0]

112 [ 1] %

128 [ 0]

160 [ 408] %*

192 [ 1961] %%******

224 [16481] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%***************

256 [13387] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*************

320 [ 4790] %%%%%%%%%%%%%*******

——————————————————————————-

kbps LR MS % long switch short %

245.6 70.9 29.1 90.7 4.6 4.7

Writing LAME Tag…done

ReplayGain: -2.6dB

MAC60090:mp3_demos ggm$ lame –preset insane –verbose compilation_original.wav lame_insane.mp3

LAME 3.99.5 64bits (http://lame.sf.net)

Using polyphase lowpass filter, transition band: 20094 Hz – 20627 Hz

Encoding compilation_original.wav to lame_insane.mp3

Encoding as 44.1 kHz j-stereo MPEG-1 Layer III (4.4x) 320 kbps qval=3

misc:

scaling: 1

ch0 (left) scaling: 1

ch1 (right) scaling: 1

huffman search: best (outside loop)

experimental Y=0

…

stream format:

MPEG-1 Layer 3

2 channel – joint stereo

padding: off

constant bitrate – CBR

using LAME Tag

…

psychoacoustic:

using short blocks: channel coupled

subblock gain: 1

adjust masking: -10 dB

adjust masking short: -11 dB

quantization comparison: 9

^ comparison short blocks: 9

noise shaping: 1

^ amplification: 1

^ stopping: 1

ATH: using

^ type: 4

^ shape: 0 (only for type 4)

^ level adjustement: -12 dB

^ adjust type: 3

^ adjust sensitivity power: 1.000000

experimental psy tunings by Naoki Shibata

adjust masking bass=-0.5 dB, alto=-0.25 dB, treble=-0.025 dB, sfb21=0.5 dB

using temporal masking effect: yes

interchannel masking ratio: 0

…

Frame | CPU time/estim | REAL time/estim | play/CPU | ETA

37028/37028 (100%)| 0:28/ 0:28| 0:28/ 0:28| 33.937x| 0:00

——————————————————————————-

kbps LR MS % long switch short %

320.0 73.7 26.3 93.4 3.4 3.1

Writing LAME Tag…done

ReplayGain: -2.6dB

B&O Tech: BeoLab 50 – Set-up procedure

#69 in a series of articles about the technology behind Bang & Olufsen loudspeakers

B&O Tech: BeoLab 50 – Introduction to Features

#68 in a series of articles about the technology behind Bang & Olufsen loudspeakers

B&O Tech: BeoLab 50’s Beam Width Control

#65 in a series of articles about the technology behind Bang & Olufsen loudspeakers

Although active Beam Width Control is a feature that was first released with the BeoLab 90 in November of 2015, the question of loudspeaker directivity has been a primary concern in Bang & Olufsen’s acoustics research and development for decades.

As a primer, for a history of loudspeaker directivity at B&O, please read the article in the book downloadable at this site. You can read about the directivity in the BeoLab 5 here, or about the development Beam Width Control in BeoLab 90 here and here.

Bang & Olufsen has just released its second loudspeaker with Beam Width Control – the BeoLab 50. This loudspeaker borrows some techniques from the BeoLab 90, and introduces a new method of controlling horizontal directivity: a moveable Acoustic Lens.

Fig 1: BeoLab 50 PT1. The “PT” stands for ProtoType. This was the very first full-sized working model of the BeoLab 50, assembled from parts made using a 3D printer.

The three woofers and three midrange drivers of the BeoLab 50 (seen above in Figure 1) are each driven by its own amplifier, DAC and signal processing chain. This allows us to create a custom digital filter for each driver that allows us to control not only its magnitude response, but its behaviour both in time and phase (vs. frequency). This means that, just as in the BeoLab 90, the drivers can either cancel each other’s signals, or work together, in different directions radiating outwards from the loudspeaker. This means that, by manipulating the filters in the DSP (the Digital Signal Processing) chain, the loudspeaker can either produce a narrow or a wide beam of sound in the horizontal plane, according to the preferences of the listener.

Fig 2: Horizontal directivity of the BeoLab 50 in Narrow mode. Contour lines are in steps of 3 dB and are normalised to the on-axis response.

Fig 3: Horizontal directivity of the BeoLab 50 in Wide mode. Contour lines are in steps of 3 dB and are normalised to the on-axis response.

You’ll see that there is only one tweeter, and it is placed in an Acoustic Lens that is somewhat similar to the one that was first used in the BeoLab 5 in 2002. However, BeoLab 50’s Acoustic Lens is considerably different in a couple of respects.

Firstly, the geometry of the Lens has been completely re-engineered, resulting in a significant improvement in its behaviour over the frequency range of the loudspeaker driver. One of the obvious results of this change is its diameter – it’s much larger than the lens on the tweeter of the BeoLab 5. In addition, if you were to slice the BeoLab 50 Lens vertically, you will see that the shape of the curve has changed as well.

However, the Acoustic Lens was originally designed to ensure that the horizontal width of sound radiating from a tweeter was not only more like itself over a wider frequency range – but that it was also quite wide when compared to a conventional tweeter. So what’s an Acoustic Lens doing on a loudspeaker that can also be used in a Narrow mode? Well, another update to the Acoustic Lens is the movable “cheeks” on either side of the tweeter. These can be angled to a more narrow position that focuses the beam width of the tweeter to match the width of the midrange drivers.

Fig 4: Acoustic Lens in “narrow” mode on a later prototype of the BeoLab 50. You can see that this is a prototype, since the disc under the lens does not align very well with the top of the loudspeaker.

In Wide Mode, the sides of the lens open up to produce a wider radiation pattern, just as in the original Acoustic Lens.

Fig 5: Acoustic Lens in “wide” mode on a later prototype of the BeoLab 50. You can see that this is a prototype, since the disc under the lens does not align very well with the top of the loudspeaker.

So, the BeoLab 50 provides a selectable Beam Width, but does so not only “merely” by changing filters in the DSP, but also with moving mechanical components.

Of course, changing the geometry of the Lens not only alters the directivity, but it changes the magnitude response of the tweeter as well – even in a free field (a theoretical, infinitely large room that is free of reflections). As a result, it was necessary to have a different tuning of the signal sent to the tweeter in order to compensate for that difference and ensure that the overall “sound” of the BeoLab 50 does not change when switching between the two beam widths. This is similar to what is done in the Active Room Compensation, where a different filter is required to compensate for the room’s acoustical behaviour for each beam width. This is because, at least as far as the room is concerned, changing the beam width changes how the loudspeaker couples to the room at different frequencies.

Probability Density Functions, Part 3

In the last posting, I talked about the effects of a bandpass filter on the probability density function (PDF) of an audio signal. This left the open issue of other filter types. So, below is the continuation of the discussion…

I made noise signals (length 2^16 samples, fs=2^16) with different PDFs, and filtered them as if I were building a three-way loudspeaker with a 4th order Linkwitz-Riley crossover (without including the compensation for the natural responses of the drivers). The crossover frequencies were 200 Hz and 2 kHz (which are just representative, arbitrary values).

So, the filter magnitude responses looked like Figure 1.

Fig 1: Magnitude responses of the three filter banks used to process the noise signals.

The resulting effects on the probability distribution functions are shown below. (Check the last posting for plots of the PDFs of the full-band signals – however note that I made new noise signals, so the magnitude responses won’t match directly.)

The magnitude responses shown in the plots below have been 1/3-octave smoothed – otherwise they look really noisy.

Fig 2: PDFs of a noise signal with a rectangular distribution that has been split into the three bands shown in Figure 1. Note the DC offset of the signal, visible in the low-pass output’s PDF.

Fig 3: PDFs of a noise signal with a linear distribution that has been split into the three bands shown in Figure 1. Note the DC offset of the signal, visible in the low-pass output’s PDF.

Fig 4: PDFs of a noise signal with a triangular distribution that has been split into the three bands shown in Figure 1.

Fig 5: PDFs of a noise signal with an exponential distribution that has been split into the three bands shown in Figure 1. Note the DC offset of the signal, visible in the low-pass output’s PDF.

Fig 6: PDFs of a noise signal with a Laplacian distribution that has been split into the three bands shown in Figure 1.

Fig 7: PDFs of a noise signal with a Gaussian distribution that has been split into the three bands shown in Figure 1.

Post-script

This posting has a Part 1 that you’ll find here and a Part 2 that you’ll find here.

Probability Density Functions, Part 2

In a previous posting, I showed some plots that displayed the probability density functions (or PDF) of a number of commercial audio recordings. (If you are new to the concept of a probability density function, then you might want to at least have a look at that posting before reading further…)

I’ve been doing a little more work on this subject, with some possible implications on how to interpret those plots. Or, perhaps more specifically, with some possible implications on possible conclusions to be drawn from those plots.

Full-band examples

To start, let’s create some noise with a desired PDF, without imposing any frequency limitations on the signal.

To do this, I’ve ported equations from “Computer Music: Synthesis, Composition, and Performance” by Charles Dodge and Thomas A. Jerse, Schirmer Books, New York (1985) to Matlab. That code is shown below in italics, in case you might want to use it. (No promises are made regarding the code quality… However, I will say that I’ve written the code to be easily understandable, rather than efficient – so don’t make fun of me.) I’ve made the length of the noise samples 2^16 because I like that number. (Actually, it’s for other reasons involving plotting the results of an FFT, and my own laziness regarding frequency scaling – but that’s my business.)

Uniform (aka Rectangular) Distribution

uniform = rand(2^16, 1);

Fig 1: The PDF and the spectrum (1-octave smoothed) of a noise signal with a rectangular distribution. Note that there is a DC component, since there are no negative values in the signal.

Of course, as you can see in the plots in Figure 1, the signal is not “perfectly” rectangular, nor is it “perfectly” flat. This is because it’s noise. If I ran exactly the same code again, the result would be different, but also neither perfectly rectangular nor flat. Of course, if I ran the code repeatedly, and averaged the results, the average would become “better” and “better”.

Linear Distribution

linear_temp_1 = rand(2^16, 1);

linear_temp_2 = rand(2^16, 1);

temp_indices = find(linear_temp_1 < linear_temp_2);

linear = linear_temp_2;

linear(temp_indices) = linear_temp_1(temp_indices);

Fig 2: The PDF and the spectrum (1-octave smoothed) of a noise signal with a linear distribution. Note that there is a DC component, since there are no negative values in the signal.

Triangular Distribution

triangular = rand(2^16, 1) – rand(2^16, 1);

Exponential Distribution

lambda = 1; % lambda must be greater than 0

exponential_temp = rand(2^16, 1) / lambda;

if any(exponential_temp == 0) % ensure that no values of exponential_temp are 0

error(‘Please try again…’)

end

exponential = -log(exponential_temp);

Fig 4: The PDF and the spectrum (1-octave smoothed) of a rise signal with an exponential distribution. Note that there is a DC component, since there are no negative values in the signal. Note as well that the values can be significantly higher than 1, so you might incur clipping if you use this without thinking…

Bilateral Exponential Distribution (aka Laplacian)

lambda = 1; % must be greater than 0

bilex_temp = 2 * rand(2^16, 1);

% check that no values of bilex_temp are 0 or 2

if any(bilex_temp == 0)

error(‘Please try again…’)

end

bilex_lessthan1 = find(bilex_temp <= 1);

bilex(bilex_lessthan1, 1) = log(bilex_temp(bilex_lessthan1)) / lambda;

bilex_greaterthan1 = find(bilex_temp > 1);

bilex_temp(bilex_greaterthan1) = 2 – bilex_temp(bilex_greaterthan1);

bilex(bilex_greaterthan1, 1) = -log(bilex_temp(bilex_greaterthan1)) / lambda;

Fig 5: The PDF and the spectrum (1-octave smoothed) of a rise signal with a bilateral exponential distribution. Note that there is no DC component, since the PDF is symmetrical across the 0 line. Note as well that the values can be significantly higher than 1 (or less than -1), so you might incur clipping if you use this without thinking…

Gaussian

sigma = 1;

xmu = 0; % offset

n = 100; % number of random number vectors used to create final vector (more is better)

xnover = n/2;

sc = 1/sqrt(n/12);

total = sum(rand(2^16, n), 2);

gaussian = sigma * sc * (total – xnover) + xmu;

Fig 6: The PDF and the spectrum (1-octave smoothed) of a rise signal with a Gaussian distribution. Note that there is no DC component, since the PDF is symmetrical across the 0 line. Note as well that the values can be significantly higher than 1 (or less than -1), so you might incur clipping if you use this without thinking…

Of course, if you are using Matlab, there is an easier way to get a noise signal with a Gaussian PDF, and that is to use the randn() function.

The effects of band-passing the signals

What happens to the probability distribution of the signals if we band-limit them? For example, let’s take the signals that were plotted above, and put them through two sets of two second-order Butterworth filters in series, one set producing a high-pass filter at 200 Hz and the other resulting in a low-pass filter at 2 kHz .(This is the same as if we were making a mid-range signal in a 4th-order Linkwitz-Riley crossover, assuming that our midrange drivers had flat magnitude responses far beyond our crossover frequencies, and therefore required no correction in the crossover…)

What happens to our PDF’s as a result of the band limiting? Let’s see…

Fig 7: The PDF of noise with a rectangular distribution that has been band-limited from 200 Hz to 2 kHz.

Fig 8: The PDF of noise with a linear distribution that has been band-limited from 200 Hz to 2 kHz.

Fig 9: The PDF of noise with a triangular distribution that has been band-limited from 200 Hz to 2 kHz.

Fig 10: The PDF of noise with an exponential distribution that has been band-limited from 200 Hz to 2 kHz.

Fig 11: The PDF of noise with a Laplacian distribution that has been band-limited from 200 Hz to 2 kHz.

Fig 12: The PDF of noise with a Gaussian distribution that has been band-limited from 200 Hz to 2 kHz.

So, what we can see in Figures 7 through 12 (inclusive) is that, regardless of the original PDF of the signal, if you band-limit it, the result has a Gaussian distribution.

And yes, I tried other bandwidths and filter slopes. The result, generally speaking, is the same.

One part of this effect is a little obvious. The high-pass filter (in this case, at 200 Hz) removes the DC component, which makes all of the PDF’s symmetrical around the 0 line.

However, the “punch line” is that, regardless of the distribution of the signal coming into your system (and that can be quite different from song to song as I showed in this posting) the PDF of the signal after band-limiting (say, being sent to your loudspeaker drivers) will be Gaussian-ish.

And, before you ask, “what if you had only put in a high-pass or a low-pass filter?” – that answer is coming in a later posting…

Post-script

This posting has a Part 1 that you’ll find here, and a Part 3 that you’ll find here.

For example…

Addendum: Binary basics and SNR

Introduction

A quick introduction to sound

Conversion from analogue to digital and back (but skipping important details)

Some details that I left out…

Error #1

A little associated story

Filter Type

Low Shelving Filter

High Shelving Filter

Peaking Filter

Gain

Centre Frequency

Q

Some extra information

Connectors and signals

Latency

Latency in Beolab loudspeakers

How to Do It

The simple version (that probably won’t work):

The slightly-more complicated version (which might work most of the time)

The only-slightly-even-more complicated version (which has a better chance of working most of the time)

Some additional information about Beolab 50 & 90

Appendix – LAME parameters and verbose output

Post-script

Full-band examples

Uniform (aka Rectangular) Distribution

Linear Distribution

Triangular Distribution

Exponential Distribution

Bilateral Exponential Distribution (aka Laplacian)

Gaussian

The effects of band-passing the signals

Post-script

Conversion from analogue to digital and back
(but skipping important details)