B&O Tech: Naked Truth IV

#29 in a series of articles about the technology behind Bang & Olufsen loudspeakers

Sorry – I’ve been busy lately, so I haven’t been too active on the blog.

Here are some internal shots of the BeoLab 17 and BeoLab 20 loudspeakers. As you can see in the shot of the back of the BeoLab 17, the entire case is the enclosure for the woofer. The tweeter has its own enclosure which seals it from the woofer cabinet.

What’s not obvious in the photos of the BeoLab 20 is that the midrange and woofer cabinets are separate sealed boxes. There is a bulkhead that separates the two enclosures cutting across the loudspeaker just below the midrange driver.

Cymatics v2

Cymatics v1

B&O Tech: How B&O Makes a Loudspeaker – Part 2/2

#31 in a series of articles about the technology behind Bang & Olufsen loudspeakers

from www.recordere.dk when they visited Struer for the BeoLab 20 launch.

B&O Tech: Near… Far…

#27 in a series of articles about the technology behind Bang & Olufsen loudspeakers

Introduction

To begin with, please watch the following video.

One thing to notice is how they made Grover sound near and far. Two things change in his voice (yes, yes, I know. It’s not ACTUALLY Grover’s voice. It’s really Yoda’s). The first change is the level – but if you focus only on that you’ll notice that it doesn’t really change so much. Grover is a little louder when he’s near than when he’s far. However, there’s another change that’s more important – the level of the reverberation relative to the level of the “dry” voice (what recording engineers sometimes call the “wet/dry mix”). When Grover is near, the sound is quite “dry” – there’s very little reverberation. When Grover is far, you hear much more of the room (more likely actually a spring or a plate reverb unit, given that this was made in the 1970’s).

This is a trick that has been used by recording engineers for decades. You can simulate distance in a mix by adding reverb to the sound. For example, listen to the drums and horns in the studio version of Penguins by Lyle Lovett. Then listen to the live version of the same people playing the same tune. Of course, there are lots of things (other than reverb) that are different between these two recordings – but it’s a good start for a comparison. As another example, compare this recording to this recording. Of course, these are different recordings of different people singing different songs – but the thing to listen for is the wet/dry mix and the perception of distance in the mix. Another example is this recording compared to this recording.

So, why does this trick work? The answer lies inside your brain – so we’ll have to look there first.

Distance Perception in the Mix

If you’re in a room with your eyes closed, and someone in the room starts talking to you, you’ll be pretty good at estimating where they are in the room – both in terms of angular location (you can point at them) and distance. This is true, even if you’ve never been in the room before. Very generally speaking, what’s going on here is that your brain is automatically comparing:

the two sounds coming into your two ears – the difference between these two signals tells you a lot about which direction the sound is coming from, AND
the direct sound from the source to the reflected sound coming from the room. This comparison gives you lots of information about a sound source’s distance and the size and acoustical characteristics of the room itself.

If we do the same thing in an anechoic chamber (a room where there are no echoes, because the walls absorb all sound) you will still be good at estimating the angle to the sound source (because you still have two ears), but you will fail miserably at the distance estimation (because there are no reflections to help you figure this out).

If you want to try this in real life, go outside (away from any big walls), close your eyes, and try to focus on how far away the sound sources appear to be. You have to work a little to force yourself to ignore the fact that you know where they really are – but when you do, you’ll find that things sound much closer than they are. This is because outdoors is relatively anechoic. If you go to the middle of a frozen lake that’s covered in fluffy snow, you’ll come as close as you’ll probably get to an anechoic environment in real life. (unless you do this as a hobby)

So, the moral of the story here is that, if you’re doing a recording and you want to make things sound far away, add reflections and reverberation – or at least make them louder and the direct sound quieter.

Distance Perception in the Listening Room

Let’s go back to that example of the studio recording of Penguins by Lyle Lovett. If you sit in your listening room and play that recording out of a pair of loudspeakers, how far away do the drums and horns sound relative to you? Now we’re not talking about whether one sounds further away than the other within the mix. I’m asking, “If you close your eyes and try to guess how far away the snare drum is from your listening position – what would you guess?”

For many people, the answer will be approximately as far away as the loudspeakers. So, if your loudspeakers are 3 m from the listening position, the horns (in that recording) will sound about 3 m away as well. However, this is not necessarily the case. Remember that the perception of distance is dependent on the relative levels of the direct and reflected sounds at your ears. So, if you listen to that recording in an anechoic chamber, the horns will sound closer than the loudspeakers (because there are no reflections to tell you how far away things are). The more reflective the room’s surfaces, the more the horns will sound further away (but probably no further than the loudspeakers, since the recording is quite dry).

This effect can also be the result of the width of the loudspeaker’s directivity. For example, a loudspeaker that emits a very narrow beam (like a laser, assuming that were possible) would not send any sound towards the walls – only towards the listening position. So, this would have the same effect as having no reflection (because there is no sound going towards the sidewalls to reflect). In other words, the wider the dispersion of the sound from the loudspeaker (in a reflective room) the greater the apparent distance to the sound (but no greater than the distance to the loudspeakers, assuming that the recording is “dry”).

Loudspeaker directivity

So, we’ve established that the apparent distance to a phantom image in a recording is, in part, and in some (perhaps most) cases, dependent on the loudspeaker’s directivity. So, let’s concentrate on that for a bit.

Let’s build a very simple loudspeaker. It’s a model that has been used to simulate the behaviour of a real loudspeaker, so I don’t feel too bad about over-simplifying too much here. We’ll build an infinite wall with a piston in it that moves in and out. For example:

Here, you can see the piston (in red) moving in and out of the wall (in grey) with the resulting sound waves (the expanding curves) moving outwards in the air (in white).

The problem with this video is that it’s a little too simple. We also have to consider how the sound radiation off the front of the piston will be different at different frequencies. Without getting into the physics of “why” (if you’re interested in that, you can look here or here or here for an explanation) a piston has a general behaviour with respect to the radiation pattern of the sound wave it generates. Generally, the higher the frequency, the narrower the “beam” of sound. At low frequencies, there is basically no beam – the sound is emitted in all directions equally. At high frequencies, the beam tends to be very narrow.

The question then is “how high a frequency is ‘high’?” The answer to that lies in the diameter of the piston (or the diameter of the loudspeaker driver, if we’re interested in real life). For example, take a look at Figure 1, below.

Fig 1: Radiation of 100 Hz (blue) and 1.5 kHz (green) from a 10″ diameter piston (i.e. a woofer).

Figure 1 shows how loud a signal will be if you measure it at different directions relative to the face of a piston that is 10″ (25.4 cm) in diameter. Two frequencies are shown – 100 Hz (the blue curve) and 1.5 kHz (the green curve). Both curves have been normalised to be the same level (100 dB SPL – although the actual value really doesn’t matter) on axis (at 0°). As you can see in the plot, as you move off to the side (either to 90° or 270°) the blue curve stays at 100 dB SPL. So, no matter what your angle relative to on-axis to the woofer, 100 Hz will be the same level (assuming that you maintain your distance). However, look at the green curve in comparison. As you move off to the side, the 1.5 kHz tone drops by more than 20 dB. Remember that this also means that (if the loudspeaker is pointing at you and the sidewall is to the side of the loudspeaker) then 100 Hz and 1.5 kHz will both get to you at the same level. However, the reflection off the wall will have 20 dB more level at 100 Hz than at 1.5 kHz. This also means, generally, that there is more energy in the room at 100 Hz than there is at 1.5 kHz because, if you consider the entire radiation of the loudspeaker averaged over all directions at the same time the lower frequency is louder in more places.
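If you want to check numbers like these yourself, here is a minimal sketch of the usual textbook model for a rigid circular piston in an infinite baffle, where the off-axis pressure relative to on-axis is 2·J1(ka·sin θ)/(ka·sin θ). I am assuming that this is close to what was used to make the plots, but that is only an assumption. The function names, the speed of sound (343 m/s) and the numerical integration of the Bessel function are my own choices for illustration.

```python
import math

def bessel_j1(x, steps=2000):
    """Bessel function J1(x), from Bessel's integral using the midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h
        total += math.cos(tau - x * math.sin(tau))
    return total * h / math.pi

def piston_level_db(freq_hz, diameter_m, angle_deg, c=343.0):
    """Level (dB re. on-axis) of a baffled circular piston at a given angle.

    Assumption: ideal rigid piston in an infinite baffle, far field,
    speed of sound c = 343 m/s.
    """
    ka = 2 * math.pi * freq_hz / c * (diameter_m / 2)
    x = ka * math.sin(math.radians(angle_deg))
    if abs(x) < 1e-9:
        return 0.0  # on axis (or very low frequency): no attenuation
    d = 2 * bessel_j1(x) / x
    return 20 * math.log10(max(abs(d), 1e-12))  # guard against exact nulls

woofer_diameter = 0.254  # 10 inches, in metres
for f in (100, 1500):
    print(f, "Hz at 90 degrees:", round(piston_level_db(f, woofer_diameter, 90), 1), "dB")
```

With these assumptions, 100 Hz stays within a small fraction of a dB at 90°, and 1.5 kHz comes out more than 20 dB down, which agrees with the two curves described above.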

This, in turn, means that, if all you have is a 10″ woofer and you play music, you’ll notice that the high frequency content sounds closer to you in the room than the low frequency content.

If the loudspeaker driver is smaller, the effect is the same, the only difference is that the effect happens at a higher frequency. For example, Figure 2, below shows the off-axis response for two frequencies emitted by a 1″ (2.54 cm) diameter piston (i.e. a tweeter).

Fig 2: Radiation of 1.5 kHz (blue) and 15 kHz (green) from a 1″ diameter piston (i.e. a tweeter).

Notice that the effect is identical, however, now, 1.5 kHz is in the “low frequency” region for the small piston, so it radiates in all directions equally (seen as the blue curve). The high frequency (now 15 kHz) becomes lower and lower in level as you move off to the side of the driver, going as low as -20 dB at 90°.

So, again, if you’re listening to music through that tweeter, you’ll notice that the frequency content at 1.5 kHz sounds further away from the listening position than the content at 15 kHz. Again, the higher the frequency, the closer the image.

Same information, shown differently

If you trust me, figures 1 and 2, above, show you that the sound radiating off the front of a loudspeaker driver gets narrower with increasing frequency. If you don’t trust me (and you shouldn’t – I’m very untrustworthy…) then you’ll be saying “but you only showed me the behaviour at two frequencies… what about the others?” Well, let’s plot the same basic info differently, so that we can see more data.

Figure 3, below, shows the same 10″ woofer, although now showing all frequencies from 20 Hz to 20 kHz, and all angles from -90° to +90°. However, now, instead of showing all levels (in dB) we’re only showing 3 values, at -1 dB, -3 dB, and -10 dB. (These plots are a little tougher to read until you get used to them. However, if you’re used to looking at topographical maps, these are the same.)

Fig 3: A contour plot showing the directivity of a 10″ piston (i.e. a woofer). The red area has a magnitude between 0 and -1 dB. The orange area has a magnitude of -1 down to -3 dB. The blue area has a magnitude of -3 down to -10 dB. The white area is lower than -10 dB.

Now you can see that, as you get higher in frequency, the range of angles where you are within 1 dB of the on-axis response gets narrower, starting at about 400 Hz. This means that a 10″ diameter piston (which we are pretending to be a woofer) is “omnidirectional” up to 400 Hz, and then gets increasingly more directional as you go up.

Figure 4 shows the same information for a 1″ diameter piston. Now you can see that the driver is omnidirectional up to about 4 kHz. (This is not a coincidence – the frequency is 10 times that of the woofer because the diameter is one tenth.)
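The “ten times the frequency for one tenth the diameter” relationship falls out of the fact that, in the ideal piston model, the directivity depends only on the product of frequency and diameter. Here is a rough sketch that searches for the frequency where the response at 90° has dropped by 1 dB. It assumes the textbook baffled-piston formula 2·J1(ka)/(ka) and a speed of sound of 343 m/s; the function names are hypothetical and this is not necessarily how the contour plots were calculated.

```python
import math

def bessel_j1(x, steps=2000):
    """Bessel function J1(x), from Bessel's integral using the midpoint rule."""
    h = math.pi / steps
    return sum(math.cos((i + 0.5) * h - x * math.sin((i + 0.5) * h))
               for i in range(steps)) * h / math.pi

def level_at_90_db(ka):
    """Level at 90 degrees off axis (dB re. on-axis) for a given ka."""
    return 20 * math.log10(abs(2 * bessel_j1(ka) / ka))

def minus_1db_frequency(diameter_m, c=343.0):
    """Frequency where the 90-degree response is 1 dB below on-axis.

    Bisection on ka between 0.01 (about 0 dB) and 2.0 (well below -1 dB);
    the response falls monotonically over that range.
    """
    lo, hi = 0.01, 2.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if level_at_90_db(mid) > -1.0:
            lo = mid
        else:
            hi = mid
    ka = (lo + hi) / 2
    return ka * c / (math.pi * diameter_m)  # from ka = 2*pi*f/c * (d/2)

f_woofer = minus_1db_frequency(0.254)    # 10 inch piston
f_tweeter = minus_1db_frequency(0.0254)  # 1 inch piston
print(round(f_woofer), "Hz and", round(f_tweeter), "Hz")
```

Under these assumptions the woofer lands in the neighbourhood of 400 Hz and the tweeter at ten times that, which is consistent with the edges of the red areas in Figures 3 and 4.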

Fig 4: A contour plot showing the directivity of a 1″ piston (i.e. a tweeter). The red area has a magnitude between 0 and -1 dB. The orange area has a magnitude of -1 down to -3 dB. The blue area has a magnitude of -3 down to -10 dB. The white area is lower than -10 dB.

Normally, however, you do not make a loudspeaker out of either a woofer or a tweeter – you put them together to cover the entire frequency range. So, let’s look at a plot of that behaviour. I’ve put together our two pistons using a 4th-order Linkwitz-Riley crossover at 1.5 kHz. I have also not included any weirdness caused by the separation of the drivers in space. This is a theoretical world where the tweeter and the woofer are in the same place – an impossible coaxial loudspeaker.

Fig 5: A contour plot showing the directivity of a two-way loudspeaker made of a 1″ and a 10″ piston. The red area has a magnitude between 0 and -1 dB. The orange area has a magnitude of -1 down to -3 dB. The blue area has a magnitude of -3 down to -10 dB. The white area is lower than -10 dB.

In Figure 5 you can see the effects of the woofer’s directivity starting to beam below the crossover, and then the tweeter takes over and spreads the radiation wide again before it also narrows.
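The narrow-wide-narrow pattern can be reproduced with a small sketch. It combines two ideal baffled pistons at the same point in space through a 4th-order Linkwitz-Riley crossover, using the standard magnitudes 1/(1+(f/fc)^4) for the low-pass and (f/fc)^4/(1+(f/fc)^4) for the high-pass, which sum in phase. How the two drivers were actually combined for the plot is not stated, so treat the summation here, the function names and the 343 m/s speed of sound as assumptions.

```python
import math

def bessel_j1(x, steps=2000):
    """Bessel function J1(x), from Bessel's integral using the midpoint rule."""
    h = math.pi / steps
    return sum(math.cos((i + 0.5) * h - x * math.sin((i + 0.5) * h))
               for i in range(steps)) * h / math.pi

def piston_directivity(freq_hz, diameter_m, angle_deg, c=343.0):
    """Signed pressure ratio (off-axis / on-axis) for an ideal baffled piston."""
    x = 2 * math.pi * freq_hz / c * (diameter_m / 2) * math.sin(math.radians(angle_deg))
    return 1.0 if abs(x) < 1e-9 else 2 * bessel_j1(x) / x

def lr4_gains(freq_hz, fc_hz):
    """(low-pass, high-pass) gains of a 4th-order Linkwitz-Riley crossover.

    The two outputs are in phase with each other, so they add to exactly 1.
    """
    r4 = (freq_hz / fc_hz) ** 4
    return 1 / (1 + r4), r4 / (1 + r4)

def two_way_level_db(freq_hz, angle_deg, fc_hz=1500.0,
                     woofer_d=0.254, tweeter_d=0.0254):
    """Off-axis level (dB re. on-axis) of the idealised coincident two-way."""
    lp, hp = lr4_gains(freq_hz, fc_hz)
    total = (lp * piston_directivity(freq_hz, woofer_d, angle_deg)
             + hp * piston_directivity(freq_hz, tweeter_d, angle_deg))
    return 20 * math.log10(max(abs(total), 1e-12))

for f in (100, 1000, 3000, 15000):
    print(f, "Hz at 60 degrees:", round(two_way_level_db(f, 60), 1), "dB")
```

At 60° off axis, this simplified model is quieter at 1 kHz than at 3 kHz (where the tweeter has taken over), and then quieter again at 15 kHz once the tweeter itself starts to beam.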

So what?

Why should you care about understanding the plot in Figure 5? Well, remember that the narrower the radiation of a loudspeaker, the closer the sound will appear to be to you. This means that, for the imaginary loudspeaker shown in Figure 5, if you’re playing a recording without additional reverberation, the low frequency stuff will sound far away (the same distance as the loudspeakers). So will a narrow band between 3 kHz and 4 kHz (where the tweeter pulls the radiation wider). However, the material in the band around 700 Hz – 2 kHz and in the band above 7 kHz will sound much closer to you.

Another way to express this is to show a graph of the resulting level of the reverberant energy in the listening room relative to the direct sound, an example of which is shown in Figure 6. (This is a plot copied from “Acoustics and Psychoacoustics” by David Howard and Jamie Angus).

Fig 6: Reverberant energy from the room relative to the direct sound from a two-way loudspeaker. (from Howard and Angus, 2000)

This shows a slightly different loudspeaker with a crossover just under 3 kHz. This is easy to see in the plot, since it’s where the tweeter starts putting more sound into the room, thus increasing the amount of reverberant energy.

What does all of this mean? Well, if we simplify a little, it means that things like voices will pull apart in terms of apparent distance. Consonant sounds like “s” and “t” will appear to be closer than vowels like “ooh”.

So, whaddya gonna do about it?

All of this is why one of the really important concerns of the acoustical engineers at Bang & Olufsen is the directivity of the loudspeakers. In a previous posting, I mentioned this a little – but then it was with regards to identifying issues related to diffraction. In that case, directivity is more of a method of identifying a basic problem. In this posting, however, I’m talking about a fundamental goal in the acoustical design of the loudspeaker.

For example, take a look at Figures 7 and 8 and compare them to Figure 9. It’s important to note here that these three plots show the directivities of three different loudspeakers with respect to their on-axis response. The way this is done is to measure the on-axis magnitude response, and call that the reference. Then you measure the magnitude response at a different angle, and then calculate the difference between that and the reference. In essence, you’re pretending that the on-axis response is flat. This is not to be interpreted that the three loudspeakers shown here have the same on-axis response. They don’t. Each is normalised to its own on-axis response. So we’re only considering how the loudspeaker compares to itself.

Fig 7: A contour plot showing the directivity of a commercially available 3-way loudspeaker. The red area has a magnitude between 0 and -1 dB. The orange area has a magnitude of -1 down to -3 dB. The blue area has a magnitude of -3 down to -10 dB. The white area is lower than -10 dB.

Figure 7, above, shows the directivity behaviour of a commercially-available 3-way loudspeaker (not from Bang & Olufsen). You can see that the woofer is increasingly beaming (the directivity gets narrow) up to the 3 – 5 kHz area. The midrange is beaming up above 10 kHz or so. So, a full band signal will sound distant in the low end, in the 6-7 kHz range and around 15 kHz. By comparison, signals at 2-4 kHz and 10-15 kHz will sound quite close.

Fig 8: A contour plot showing the directivity of a traditionally designed 3-way loudspeaker. The red area has a magnitude between 0 and -1 dB. The orange area has a magnitude of -1 down to -3 dB. The blue area has a magnitude of -3 down to -10 dB. The white area is lower than -10 dB.

Figure 8, above, shows the directivity behaviour of a 3-way loudspeaker we made as a rough prototype. This is just a woofer, midrange and tweeter, each in its own MDF box – nothing fancy – except that the tweeter box is not as wide as the midrange box which is narrower than the woofer box. You can see that the woofer is beaming (the directivity gets narrow) just above 1 kHz – although it has a very weird wide directivity at around 650 Hz for some reason. The midrange is beaming up from 5 kHz to 10 kHz, and then the tweeter gets wide. So, this loudspeaker will have the same problem as the commercial loudspeaker.

Fig 9: A contour plot showing the directivity of a BeoLab 5. The red area has a magnitude between 0 and -1 dB. The orange area has a magnitude of -1 down to -3 dB. The blue area has a magnitude of -3 down to -10 dB. The white area is lower than -10 dB.

As you can see, the loudspeaker with the directivity shown in Figure 9 (the BeoLab 5) is much more constant as you change frequency (in other words, the lines are more parallel). It’s not perfect, but it’s a lot better than the other two – assuming that constant directivity is your goal. You can also see that the range of angles where the signal is within 1 dB of the on-axis response is quite wide compared with the loudspeakers in Figures 7 and 8. The loudspeaker in Figure 7 not only beams in the high frequencies, but also has some strange “lobes” where things are louder off-axis than they are on-axis (the red lines).

When you read B&O’s marketing materials about the reason why we use Acoustic Lenses in our loudspeakers, the main message is that they’re designed to spread the sound – especially the high frequencies – wider than a normal tweeter, so that everyone on the sofa can hear the hi-hat. This is true. However, if you ask one of the acoustical engineers who worked on the project, they’ll tell you that the real reason is to maintain constant directivity as well as possible in order to ensure that the direct-to-reverberant ratio in your listening room does not vary with frequency. That’s a difficult concept to explain in 1 or 2 sentences, so you won’t hear it mentioned often. If you read this paper (which was published just after the release of the BeoLab 5), though, you’ll see that it was part of the original thinking of the engineers on the project.

Addendum 1.

I’ve been thinking more about this since I wrote it. One thing that I realised that I should add was to draw a comparison to timbre. When you listen to music on your loudspeakers in your living room, in a best-case scenario, you hear the same timbral balance that the recording engineer and the mastering engineer heard when they worked on the recording. In theory, you should not hear more bass or less midrange or more treble than they heard. The directivity of the loudspeaker has a similar influence – but on the spatial performance of the loudspeakers instead of the timbral performance. You want a loudspeaker that doesn’t alter the relative apparent distances to sources in the mix – just like you don’t want the loudspeakers to alter the timbre by delivering too much high frequency content.

Addendum 2.

One more thing… I made the plot below to help simplify the connection between directivity and Grover. Hope this helps.

A contour plot showing the directivity of a commercially available 3-way loudspeaker. The wider the plot (vertically), the farther the image.

Audio Mythinformation: 16 vs 24 bit recordings

Preface: Lincoln was right

There is a thing called “argument from authority” which is what happens when you trust someone to be right about something because (s)he knows a lot about the general topic. This is used frequently by pop-documentaries on TV when “experts” are interviewed about something. Example: “we asked an expert in underwater archeology how this piece of metal could wind up on the bottom of the ocean, covered in mud and he said ‘I don’t know’ so it must have been put there by aliens millions of years ago.” Okay, I’m exaggerating a little here, but my point is that, just because someone knows something about something, doesn’t mean that (s)he knows everything about it, and will always give the correct answers for every question on the topic.

In other words, as Abraham Lincoln once said: “Don’t believe everything you read on the Internet.”

Of course, that also applies to everything that follows in the posting below (arrogantly assuming that I can be considered to be an authority on anything), so you might as well stop reading and go do something useful.

My Inspiration

There has been some discussion circulating around the Interweb lately about the question of whether the “new” trend to buy “high-resolution” audio files with word lengths of 24 bits actually provides an improvement in quality over an audio file with “only” 16 bits.

One side of this “religious” war comes from the people who are selling the high-res audio files and players. The assumed claim is that 24 bits makes a noticeable improvement in audio quality (over a “mere” 16 bits) that justifies asking you to buy the track again – and probably at a higher price.

The other side of the war are bloggers and YouTube enthusiasts who write things like the (now-removed) article called “24/192 Music Downloads… and why they make no sense” (which, if you looked at the URL, is really an anti-Pono rant) and “Bit Depth & The 24 Bit Audio Myth”.

Personally, I’m not a fan of religious wars, so I’d like to have a go at wading into the waters in a probably-vain attempt to clear up some of the confusion and animosity that may be caused by following religious leaders.

Some background

If you don’t know anything about how an audio signal is converted from analogue to digital, you should probably stop reading here and go check out this page or another page that explains the same thing in a different way.

Now to recap what you already know:

An analogue to digital converter makes a measurement of the instantaneous voltage of the audio signal and outputs that measurement as a binary number on each “sample”
The resolution of that converter is dependent on the length of the binary number it outputs. The longer the number, the higher the resolution.
The length of a binary number is expressed in Binary digITs or BITS.
The higher the resolution, the lower the noise floor of the digital signal.
In order to convert the artefacts caused by quantisation error from program-dependent distortion to program-independent noise, dither is used. (Note that quantisation error is incorrectly called “quantisation noise” by some people.)
In a system that uses TPDF (Triangular Probability Distribution Function) dither, the noise has a white spectrum, meaning that it has equal energy per Hz.

A good rule of thumb in a PCM system with TPDF dithering is that the dynamic range of the system is approximately 6 * the number of bits – 3 dB. For example, the dynamic range of a 16-bit system is 6*16-3 = 93 dB. Some people will say that this is the signal-to-noise ratio of the system, however, this is only correct if your signal is always as loud as it can be.
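To see where the rule of thumb comes from, here is a small sketch that works it out from first principles. It assumes a full-scale sine wave as the “loudest” signal and TPDF dither with a peak-to-peak width of 2 LSBs, so the total noise power is the quantisation error (LSB²/12) plus the dither (LSB²/6). The function name is made up for this example.

```python
import math

def tpdf_dynamic_range_db(bits):
    """Dynamic range (dB) of linear PCM with TPDF dither.

    Assumptions: full-scale sine as the reference signal, and TPDF dither
    that is 2 LSB wide peak-to-peak (variance LSB^2 / 6).
    """
    lsb = 1.0
    full_scale_sine_rms = (2 ** (bits - 1)) * lsb / math.sqrt(2)
    noise_rms = math.sqrt(lsb ** 2 / 12 + lsb ** 2 / 6)  # quantisation + dither
    return 20 * math.log10(full_scale_sine_rms / noise_rms)

for bits in (16, 24):
    exact = tpdf_dynamic_range_db(bits)
    rule_of_thumb = 6 * bits - 3
    print(bits, "bits:", round(exact, 1), "dB (rule of thumb:", rule_of_thumb, "dB)")
```

Each extra bit doubles the peak level relative to one LSB, which is where the (roughly) 6 dB per bit comes from. The 3 dB that is subtracted is the crest factor of the sine wave, since the noise works out to half an LSB RMS.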

Let’s think about what, exactly, we’re saying here. When we measure the dynamic range of a system, we’re trying to find out what the difference is (in dB) between (1) the loudest sound you can put through the system without clipping and (2) the noise floor of the system.

The goal of an engineer when making a piece of audio gear (or of a recording engineer when making a recording) is to make the signal (the music) so loud that you can’t hear the noise – but not so loud that the signal clips and therefore distorts. There are three ways to improve this: you can (1) make your gear capable of making the signal louder, (2) design your gear so that it has less noise, or (3) do both of those things. In each case, what you are trying to maximise is the ratio of the signal to the noise. In other words, relative to the noise level, you want the signal as high as possible.

However, this is a rather simplistic view of the world that has two fatal flaws:

The first problem is that (unless you like most of the music my kids like) the signal itself has a dynamic range – it gets loud and it also gets quiet. This can happen over long stretches of time (say, if you’re listening to a choral piece written by Arvo Pärt) or over relatively short periods of time (say, the difference between the sharp peak of a rim shot on a snare and the decay of a piano note in the middle of the piece of music I’ve plotted below.)

You should note that this isn’t a piece that I use to demonstrate wide dynamic range or anything – I just started looking through my classical music collection for a piece that can demonstrate that music has loud AND quiet sections – and this was the second piece I opened (it’s by the Ahn Trio – I was going alphabetically…) So don’t make a comment about how I searched for an exceptional example of the one recording in the history of all recordings that has dynamic range. That would be silly. If I wanted to do that, I would have dug out an Arvo Pärt piece – but Arvo comes after Ahn in the alphabet, so I didn’t get that far.

Figure 1: Screenshot of the waveform representation of the Concerto for Piano Trio and Percussion, performed by the Ahn Trio. Note how big the difference is between the peaks and the quiet sections.

The portion of this piece that I’ve highlighted in Figure 1 (the gray section in the middle) has a peak at about 1 dB below full scale, and, at the end gets down to about -46 dB FS. (You might note that there is a higher peak earlier in the piece – but we don’t need to worry about that.) So, that little portion of the music has a dynamic range of about 45 dB or so – if we’re just dumbly looking at the plot.

So, this means that we want to have a recording system and a playback system for this piece of music that can handle a signal as loud as that peak without distorting it – but has a constant noise floor that is quiet enough that I won’t hear it at the end of that piano note decaying at the end of that little section I’ve highlighted.

What we’re really talking about here is more accurately called the dynamic range of the system (and the recording). We’re only temporarily interested in the Signal to Noise ratio, since the actual signal (the music) has a constantly varying level. What’s more useful is to talk about the dynamic range – the difference (in dB) between the constant noise of the system (or the recording) and the maximum peak it can produce. However, we’ll come back to that later.

The second problem is that the noise floor caused by TPDF dither is white noise, which means that you have equal energy per Hertz as we’ve seen before. We can also reasonably safely assume that the signal is music which usually consists of a subset of all frequencies at any moment in time (if it had all frequencies present, it would sound like noise of some colour instead of Beethoven or Bieber), that are probably weighted like pink noise – with less and less energy in the high frequencies.

In a worst-case situation, you have one note being played by one instrument and you’re hoping that that one note is going to mask (or “drown out”) the noise of the system that is spread across a very wide frequency range.

For example, let’s look again at the decay of that piano note in the example in Figure 1. That’s one note on a piano, dropping down to about -40-something dB FS, with a small collection of frequencies (the fundamental frequency of the pitch and its multiples), and you’re hoping that this “signal” is going to be able to mask white noise that stretches in frequency band from something below 20 Hz all the way up past 20 kHz. This is worrisome, at best.

In other words, it would be easy for a signal to mask a noise if the signal and the noise had the same bandwidth. However, if the signal has a very small bandwidth and the noise has a very wide bandwidth, then it is almost impossible for the signal to mask the noise.

In other words, the end of the decay of one note on a piano is not going to be able to cover up hiss at 5 kHz because there is no content at 5 kHz from the piano note to do the covering up.

So, what this means is that you want a system (either a recording or a piece of audio gear) where, if you set the volume such that the peak level is as loud as you want it to be, the noise floor of the recording and the playback system is inaudible at the listening position. (We’ll come back to this point at the end.) This is because the hope that the signal will mask the noise is just that – hope. Unless you listen to “music” that has no dynamic range and constantly has an extremely wide bandwidth, then I’m afraid that you may be disappointed.

One more thing…

There is another assumption that gets us into trouble here – and that is the one I implied earlier which says that all of my audio gear has a flat magnitude response. (I implied it by saying that we can assume that the noise that we get is white.)

Let’s look at the magnitude response of a pair of earbud headphones that millions and millions of people own. I borrowed this plot from this site – but I’m not telling you which earbuds they are – but I will say that they’re white. It’s the top plot in Figure 2.

Fig 2: Top plot: The magnitude response of a pair of earbud headphones. Bottom plot: The magnitude response of a filter I made to mimic the response. It’s not perfect – but it’s close enough for the arguments I’m making here.

This magnitude response is a “weighting” that is applied to everything that gets into the listener’s ears (assuming that you trust the measurement itself). As you can see, if you put in a signal that consists of a 20 Hz tone and a 200 Hz tone that are equal in level, then you’ll hear the 200 Hz tone about 40 dB louder than the 20 Hz tone. Remember that this is what happens not only to the signal you’re listening to, but also to the noise of the system and the recording – and it has an effect.

For example, if we measure a 16-bit linear PCM digital system with TPDF dithering, we’ll see that it has a 93.3 dB dynamic range. This means that the RMS level of a sine wave (or another signal) that is just below clipping the system (so it’s as loud as you can get before you start distorting) is 93.3 dB louder than the white noise noise floor (yes, the repetition is intentional – read it again). However, that is the dynamic range if the system has a magnitude response that is +/- 0 dB from 0 Hz to half the sampling rate.
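As a rough check on where the 93.3 dB figure comes from, here is a minimal sketch in Python. It rests on assumptions that are not spelled out in the text above: that the loudest signal is a sine wave with a peak of 2^(bits-1) quantisation steps, and that the noise floor is the quantisation error (1/12 of a step squared, in power) plus the TPDF dither (1/6 of a step squared). The function name is made up for this illustration.

```python
import math

def tpdf_dynamic_range_db(bits):
    """Dynamic range of a linear PCM system with TPDF dither, in dB.

    Assumptions (illustrative): full scale is a sine wave with a peak of
    2**(bits - 1) LSB, and the noise power is quantisation error (1/12 LSB^2)
    plus TPDF dither (1/6 LSB^2), which sums to 1/4 LSB^2.
    """
    sine_rms = 2 ** (bits - 1) / math.sqrt(2)   # RMS of a full-scale sine, in LSB
    noise_rms = math.sqrt(1 / 12 + 1 / 6)       # 0.5 LSB
    return 20 * math.log10(sine_rms / noise_rms)

for bits in (16, 24):
    print(f"{bits}-bit: {tpdf_dynamic_range_db(bits):.1f} dB")
```

Under those assumptions the 16-bit result lands within a few hundredths of a dB of the 93.3 dB quoted above, and each extra bit adds about 6 dB.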

If, however, you measured the dynamic range through those headphones I’m talking about in Figure 2, then things change. This is because the magnitude response of the headphones has an effect on both the signal and the noise. For example, if the signal you used to measure the maximum capabilities of the system were a 3 kHz sine tone, then the dynamic range of the system would improve to about 99 dB. (I measured this using the filter I made to “fake” the magnitude response – it’s shown in the bottom of Figure 2.)

Remember that, with a flat magnitude response, the dynamic range of the 16-bit system is about 93 dB. By filtering everything with a weird filter, however, that dynamic range changes to 99 dB IF THE SIGNAL WE USE TO MEASURE THE SYSTEM IS a 3 kHz SINE TONE.

The problem now is that the dynamic range of the system is dependent on the spectrum of the signal we use to measure the peak level with – which will also be true when we consider the signal to noise ratio of the same system. Since the spectrum of the music AND the dither noise are both filtered by something that isn’t flat, the SNR of the system is dependent on the frequency content of the music and how that relates to the magnitude response of the system.

For example, if we measured the dynamic range of the system shown above using sine tones at different frequencies as our measurement signal, we would get the values shown in Figure 3.

Fig 3: The dynamic range of a 16-bit TPDF system that includes the filter shown in the bottom of Figure 2, if the measurement is done relative to a sine wave with a frequency shown in the x-axis.

If you’re looking not-very-carefully-at-all at the curve in Figure 3, you’ll probably notice that it’s basically the curve on the bottom of Figure 2, upside down. This makes sense, since, generally, the filter will attenuate the total power of the noise floor, and the signal used to make the dynamic range measurement is a sine wave whose level is dependent on the magnitude response. What this means is that, if your system is “weak” at one frequency band, then the signal to noise ratio of the system when the signal consists of energy in the “weak” band will be worse than in other bands.

Another way to state this is: if you own a pair of those white earbuds, and you listen to music that only has bass in it (say, the opening of this tune) you might have to turn up the level so much to hear the bass that you’ll hear the noise floor in the high end.
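To make the dependence on the measurement frequency concrete, here is a minimal sketch. It does not use the earbud response from Figure 2 (that data is not available here). Instead it assumes a hypothetical first-order high-pass response with a 200 Hz corner, standing in for any system with a bass roll-off. The function names, the corner frequency, and the 44.1 kHz sampling rate are all assumptions made for the illustration.

```python
import math

def highpass_mag(f, fc=200.0):
    """Linear magnitude of a hypothetical first-order high-pass (illustrative only)."""
    x = f / fc
    return x / math.sqrt(1.0 + x * x)

def filtered_dynamic_range_db(f0, mag, flat_dr_db=93.3, fs=44100.0, n=20000):
    """Dynamic range measured with a sine at f0 after everything passes through `mag`."""
    # White noise has equal power per Hz, so the filter changes the total noise
    # power by the average of |H(f)|^2 between 0 Hz and fs/2 (midpoint rule).
    step = (fs / 2) / n
    noise_gain = sum(mag((k + 0.5) * step) ** 2 for k in range(n)) / n
    signal_gain_db = 20 * math.log10(mag(f0))   # the sine is scaled by |H(f0)|
    noise_gain_db = 10 * math.log10(noise_gain)
    return flat_dr_db + signal_gain_db - noise_gain_db

for f0 in (50, 200, 3000):
    print(f"{f0:>5} Hz sine: {filtered_dynamic_range_db(f0, highpass_mag):.1f} dB")
```

The point of the sketch is only the mechanism: the noise floor is changed by one number (the filter's average power gain), while the signal is changed by the filter's gain at whatever frequency you happened to pick.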

Wrapping up

As I said at the beginning, some people say “more bits are better, so you should buy all your music again with 24-bit versions of your 16-bit collection”. Some other people say “24-bits is a silly waste of money for everyone”.

What’s the truth? Probably neither of these. Let’s take a couple of examples to show that everyone’s wrong.

Case 1: You listen to music with dynamic range and you have a good pair of loudspeakers that can deliver a reasonably high peak SPL level. You turn up the volume so that the peak reaches, say, 110 dB SPL (this is loud for a peak, but if it only happens now and again, it’s not that scary). If your recording is a 16-bit recording, then the noise floor is 93 dB below that, so you have a wide-band noise floor of 17 dB SPL which is easily audible in a quiet room. This is true even when the acoustic noise floor of the room is something like 30 dB SPL or so, since the dither noise from the loudspeaker has a white noise characteristic, whereas acoustic background noise in “real life” is usually pink in spectrum. So, you might indeed hear the high-frequency hiss. (Note that this is even more true if you have a playback system with active loudspeakers that protect themselves from high peaks – they’ll reduce the levels of the peaks, potentially causing you to push up the volume knob even more, which brings the noise floor up with it.)

Fig 4: The FFT’s of a white noise sample (the blue curve) and a pink noise sample (the red curve), both of which have the same total RMS level.

Case 2: You have a system with a less-than-flat magnitude response (i.e. a bass roll-off) and you are listening to music that only has content in that frequency range (i.e. the bass), so you turn up the volume to hear it. You could easily hear the high-frequency noise content in the dither if that high frequency is emphasised by the playback system.

Case 3: You’re listening to your tunes that have no dynamic range (because you like that kind of music) over leaky headphones while you’re at the grocery store shopping for eggs. In this case, the noise floor of the system will very likely be completely inaudible due to the masking by the “music” and the background noise of announcements of this week’s specials.

The Answer

So, hopefully I’ve shown that there is no answer to this question. At least, there is no one-size-fits-all answer. For some people, in some situations, 16 bits are not enough. There are other situations where 16 bits is plenty. The weird thing that I hope that I’ve demonstrated is that the people who MIGHT benefit from higher resolution are not necessarily those with the best gear. In fact, in some cases, it’s people with worse gear that benefit the most…

… but Abraham Lincoln was definitely right. Stick with that piece of advice and you’ll be fine.

Appendix 1: Noise shaping

One of the arguments against 24-bit recordings is that a noise-shaped 16-bit recording is just as good in the midrange. This is true, but there are times when noise shaping causes playback equipment some headaches, since it tends to push a lot of energy up into the high frequency band where we might not be able to hear it (at least that’s the theory). The problem is that the audio gear is still trying to play that “signal”, so if you have a system that has issues, for example, with Intermodulation Distortion (IMD) with high-frequency content (like a cheap tweeter, as only one example) then that high-frequency noise may cause signals to “fold down” into audible bands within the playback gear. So noise shaping isn’t everything it’s cracked up to be in some cases.

Bang & Olufsen BeoVision Avant Reviews

I was responsible for the final sound design (aka tonal balance) of the loudspeakers built into the BeoVision Avant. So, I’m happy to share some of the blame for some of the comments (at least on the sound quality) from the reviews.

from Home Cinema Choice magazine

“Where the Avant really gets space-age, though, is with … its ability to drive 21 audio channels”

“This effort by the integrated speakers, together with its unprecedented audio flexibility, makes the Avant the finest sounding TV I’ve ever heard.”

from TrustedReviews.com

“Even a high-end sound bar would struggle to match the gorgeous finesse the Avant combines with its raw power. The speakers reproduce soundtrack subtleties more precisely and elegantly than any other TV we’ve heard. And they do so no matter how dense the soundstage becomes, and without so much as a hint of treble harshness.”

“Then there’s that rear-mounted subwoofer. We had worried that the way this angled subwoofer fires up and out through an actually quite narrow vent could cause boominess or distortion, but not a bit of it. Instead very impressive and well-rounded amounts of bass meld immaculately into the bottom end of the wide mid-range delivered by those terrific left, right and centre speakers.”

from flatpanelshd.com

“Compared to all other TVs on the market (non-B&O) there is no competition. Sound is so much better. However, we also have to point out that the TV did not receive the best conditions for a proper audio demonstration.”

Bang & Olufsen BeoLab 18 Reviews

I was part of the development team, and one of the two persons who decided on the final sound design (aka tonal balance) of the B&O BeoLab 18 loudspeakers. So, I’m happy to share some of the blame for some of the comments (at least on the sound quality) from the reviews.

Bernard Dickinson at Live Magazines said:

“The sound reproduction is flawless”

Lyd & Billede’s August 2014 review said

“Lydkvaliteten er rigtig god med en åben, distinkt og fyldig gengivelse, som ikke gør højopløste lydformater til skamme.” (The sound quality is very good with an open, clear and detailed reproduction, which does not put high-resolution audio formats to shame.)

and ”Stemmerne er lige klare og tydelige, hvad enten vi sidder lige i smørhullet eller befinder os langt ude i siden. Det er faktisk ret usædvanligt og gør, at BeoLab 18 egner sig lige godt til både baggrundsmusik og aktiv lytning” (The voices are crisp and clear, whether we are sitting right in the sweet spot or far off to the side. It’s actually quite unusual and makes the BeoLab 18 equally suited for both background music and active listening)

B&O Tech: How loud are my headphones?

#24 in a series of articles about the technology behind Bang & Olufsen loudspeakers

As you may already be aware, Bang & Olufsen makes headphones: the BeoPlay H6 over-ear headphones and the BeoPlay H3 earbuds.

If you read some reviews of the H6 you’ll find some reviewers like them very much and say things like “…excellent clarity and weight, well-defined bass and a sense of openness and space unusual in closed-back headphones. The sound is rich, attractive and ever-so-easy to enjoy.” and “… by no means are these headphones designed only for those wanting a pounding bass-line and an exciting overall balance: as already mentioned the bass extension is impressive, but it’s matched with low-end definition and control that’s just as striking, while a smooth midband and airy, but sweet, treble complete the sonic picture.” (both quotes are from Gramophone Magazine’s April 2014 issue). However, some other reviewers say things like “My only objection to the H6s is their volume level is not quite as loud as I normally would expect.” (A review from an otherwise-satisfied customer on Amazon.com). And, of course, there are the people whose tastes have been influenced by the unfortunate trend of companies selling headphones with a significantly boosted low-frequency range, and who now believe that all headphones should behave like that. (I sometimes wonder if the same people believe that, if it doesn’t taste like a Big Mac, it’s not a good burger… I also wonder why they don’t know that it’s possible to turn up the bass on most playback devices… But I digress…)

For this week’s posting, I’ll just deal with the first “complaint” – how loud should a pair of headphones be able to play?

Part 1: Sensitivity

One of the characteristics of a pair of headphones, like a passive loudspeaker, is its sensitivity. This is basically a measurement of how efficient the headphones are at converting electrical energy into acoustical output (although you should be careful to not confuse “Sensitivity” with “Efficiency” – sensitivity is a measure of the sound pressure level or SPL output for the voltage at the input whereas efficiency is a measure of the SPL output for the power in milliwatts). The higher the sensitivity of the headphones, the louder they will play for the same input voltage.

So, if you have a pair of headphones that are not very sensitive, and you plug them into your smartphone playing a tune at full volume, it might be relatively quiet. By comparison, a pair of very sensitive headphones plugged into the same smartphone playing the same tune at the same volume might be painfully loud. For example, let’s look at the measured data for three not-very-randomly selected headphones at https://www.stereophile.com/content/innerfidelity-headphone-measurements.

Brand	Model	Vrms to produce 90 dB SPL	dBV to produce 90 dB SPL
Sennheiser	HD600	0.230	-12.77
Beoplay	H6	0.044	-27.13
Etymotic	ER4PT	0.03	-30.46

If we do a little math, this means that, for the same input voltage, the Etymotic’s will be 3.3 dB louder than the H6’s and the Sennheiser’s will be 14.4 dB quieter. This is a very big difference. (The Etymotic’s are 7.7 times louder than the Sennheisers!)
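The “little math” can be sketched in a few lines of Python. The voltages are copied from the table above. The helper names are made up for this example, and the only assumption is that a headphone needing less voltage to reach 90 dB SPL is louder by exactly that difference in dB for the same input voltage.

```python
import math

# Vrms needed to produce 90 dB SPL, copied from the table above.
VRMS_FOR_90_DB_SPL = {
    "Sennheiser HD600": 0.230,
    "Beoplay H6": 0.044,
    "Etymotic ER4PT": 0.03,
}

def dbv(vrms):
    """Convert an RMS voltage to dBV (dB relative to 1 Vrms)."""
    return 20 * math.log10(vrms)

def louder_by_db(model, reference):
    """How much louder `model` plays than `reference` for the same input voltage."""
    # Needing less voltage for the same SPL means the headphone is more sensitive.
    return dbv(VRMS_FOR_90_DB_SPL[reference]) - dbv(VRMS_FOR_90_DB_SPL[model])

for name, vrms in VRMS_FOR_90_DB_SPL.items():
    print(f"{name}: {dbv(vrms):.2f} dBV for 90 dB SPL")

print(f"Etymotic vs H6: {louder_by_db('Etymotic ER4PT', 'Beoplay H6'):+.1f} dB")
print(f"Sennheiser vs H6: {louder_by_db('Sennheiser HD600', 'Beoplay H6'):+.1f} dB")

# The "times louder" figure is the plain voltage ratio between two models.
ratio = VRMS_FOR_90_DB_SPL["Sennheiser HD600"] / VRMS_FOR_90_DB_SPL["Etymotic ER4PT"]
print(f"Etymotic vs Sennheiser: {ratio:.1f} times")
```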

So, in other words, different headphones have different sensitivities. Some will be quieter than others – some will be louder.

Side note: If you want to compare different pairs of headphones for output level, you could either look them up at the stereophile.com site I mentioned above, or you could compare their data sheet specifications using the Sensitivity to Efficiency converter on this page.

The moral of this first part of the story is that, when someone says “these headphones are not very loud” – the question is “compared to what?”

Part 2: The Source

I guess it goes without saying, but if you want more out of your headphones, the easiest solution is to turn up the volume of your source. The question then is: how much output can your source deliver? This answer also varies greatly from product to product. For example, if I take four not-very-randomly selected measurements that I did myself, I can see the following maximum output levels for a 500 Hz, 0 dB FS sine tone at maximum volume sent to a 31 ohm load (a resistor pretending to be a pair of headphones):

Brand	Model	Vrms	dBV
Lenovo	ThinkPad T420	0.317	-9.98
Apple	iPhone 3Ds	0.89	-1.01
Apple	MacBook Pro	2.085	+6.38
Sony	CDP-D500	6.69	+16.51

In other words, the Sony is more than 26 dB (or 21 times) louder than the ThinkPad, if we’re just measuring voltage. This is a very big difference.

So, as you can see, turning the volume all the way up to 11 on different products results in very different output levels. This is even true if you compare iPod Nano’s of different generations, for example – no two products are the same.

The moral of the story here is: if your headphones aren’t loud enough, it might not be the headphones’ fault.

Part 3: The Details, French Law, and How to Cheat

So much for the obvious things – now we are going to get a little ugly.

Let’s begin the ugliness with a little re-hashing of a previous posting. As I talked about in this posting, your ears behave differently at different listening levels. More specifically, you don’t hear bass and treble as well when the signal is quiet. The louder it gets, the more flat your “frequency response”. This means that, when acoustical consultants are making measurements of quiet things, they usually have to make the microphone signal as “bad” as your hearing at low levels. For example, when you’re measuring air conditioning noise in an office space, you want to make your microphone less sensitive to low frequencies, otherwise you’ll get a reading of a high noise level when you can’t actually hear anything. In order to do this, we use something called a “weighting filter” which is an attempt to simulate your frequency response. There are many different weighting curves – but the one we’ll talk about in this posting is an “A-weighting” curve. This is a filter that attenuates the low and high frequencies and has a small boost in the mid-band – just like you do at quiet listening levels. The magnitude response of that curve is shown below in Figure 1. At higher levels (like measuring the noise level at the end of a runway while a plane is taking off over your head), you might want to use a different weighting curve like a “C-weighting” filter – or none at all.

Fig 1. The magnitude response of an A-weighting filter.
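If you would like to calculate the curve in Figure 1 rather than read it off a plot, here is a minimal sketch using the commonly published analogue formula for A-weighting (the pole frequencies of 20.6 Hz, 107.7 Hz, 737.9 Hz and 12194 Hz, with a +2.00 dB offset so that the gain is 0 dB at 1 kHz). The function name is made up for this example.

```python
import math

def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), normalised to 0 dB at 1 kHz."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.0

for f in (20, 100, 1000, 2500, 10000):
    print(f"{f:>6} Hz: {a_weighting_db(f):+.1f} dB")
```

Running it shows the shape described above: a large attenuation in the bass, a gentler roll-off at the top, and a small boost of a little over 1 dB around 2.5 kHz.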

So, let’s say that you get enough money on Kickstarter to create the Fly-by-Night Headphone Company and you’re going to make a flagship pair of headphones that will sweep the world by storm. You do a little research and you start coming across something called “BS EN 50332-1” and “BS EN 50332-2”. Hmmmm… what are these? They’re international standards that define how to measure how loudly a pair of headphones plays. The procedure goes something like this:

1. get some pink noise
2. filter it to reduce the bass and the treble so that it has a spectrum that is more like music (the actual filter used for this is quite specific)
3. reduce its crest factor so your measurement doesn’t jump around so much (this basically just gets rid of the peaks in the signal)
4. do a quick check to make sure that, by limiting the crest factor, you haven’t changed the spectrum beyond the acceptable limits of the test procedure
5. play the signal through the headphones and measure the sound pressure level using a dummy head
6. apply an A-weighting to the measurement
7. calculate how loud it is (averaged over time, just to be sure)
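The crest factor step in that list is easy to illustrate. Here is a minimal sketch that measures the crest factor (peak level relative to RMS level) of a noise signal and then reduces it with a simple hard clipper. This is not the procedure from the standard: the Gaussian white noise, the clipping threshold of 1.5 and the function names are all assumptions made for the illustration.

```python
import math
import random

def crest_factor_db(samples):
    """Peak level relative to RMS level, in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

def hard_clip(samples, limit):
    """Chop off everything beyond +/- limit (a crude way to remove the peaks)."""
    return [max(-limit, min(limit, s)) for s in samples]

random.seed(0)  # fixed seed so the run is repeatable
noise = [random.gauss(0.0, 1.0) for _ in range(50000)]  # white, not pink: illustrative only
clipped = hard_clip(noise, 1.5)

print(f"before clipping: {crest_factor_db(noise):.1f} dB")
print(f"after clipping:  {crest_factor_db(clipped):.1f} dB")
```

Chopping off the peaks like this also changes the spectrum of the signal a little, which is the reason for the check that comes right after it in the procedure.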

So, now you know how loud your headphones can play using a standard measurement procedure. Then you find out that, according to another international standard called EN 60065 or EN 60950-1 there are maximum limits to what you’re permitted to legally sell… in France… for now… (Okay, okay, these are European standards, but Europe has more than one country in it, so I think that I can safely call them international…)

So, you make your headphones, you make them sound like you want them to sound (I’ll talk about the details of this in a future posting), and then you test them (or have them tested) to see if they’re legal in France. If not (in other words, if they’re too sensitive), then you’ll have to tweak the sensitivity accordingly.

Okay – that’s what you have to do – but let’s look at that procedure a little more carefully.

Step 1 was to get some pink noise. This is nothing special – you can get or make pink noise pretty easily.

Step 2 was to filter the noise so that its spectrum ostensibly better matches the average spectrum of all recorded and transmitted music and speech in the world. The details of this filter are in another international standard called IEC 60268-1. The people who wrote this step mean well – there’s no point in testing your headphones with frequencies that are outside the range of anything you’ll ever hear in them. However, this means that there is probably some track somewhere that includes something that is not represented by the spectral characteristics of the test signal we’re using here. For example: Figure 2, below shows the spectral curve of the test signal that you are supposed to send to the headphones for the test.

Fig 2. The magnitude response of the filter applied to the pink noise before sending it to the headphones.

Compare that to Figure 3, which shows an analysis of a popular Lady Gaga tune that I use as part of my collection of tunes to make a woofer unhappy. This is a commercially-available track that has not been modified in any way.

Fig 3. The spectrum of a Lady Gaga tune. Compare this with the noise filter plot from Figure 2 (plotted in red for your convenience).

As you can see, there is more energy in the music example than there is in the test signal around the 30 – 60 Hz octave – particularly noticeable due to the relative “hole” in the response that ranges between about 70 and 700 Hz.

Of course, if we took LOTS of tunes and analysed them, and averaged their analyses, we’d find out that the IEC test signal shown in Figure 2 is actually not too bad. However, every tune is different from the average in some way.

So, the test signal used in the EN 50332 test is not going to push headphones as hard as some kinds of music (specifically, music that has a lot of bass content).

We’ll skip Step 3, Step 4, and Step 5.

Step 6 is a curiosity. We’re supposed to take the signal that we recorded coming out of the headphones and apply an A-weighting filter to it. Now, remember from above that an A-weighting filter reduces the low and high frequencies in an effort to simulate your bad hearing characteristics at quiet listening levels. However, what we’re measuring here is how loud the headphones can go. So, there is a bit of a contradiction between the detail of the procedure and what it’s being used for. However, to be fair, many people mis-use A-weighting filters when they’re making noise measurements. In fact, you see A-weighted measurements all the time – regardless of the overall level of the noise that’s being measured. One possible reason for this is that people want to be able to compare the results from the loud measurements to the results from their quiet ones – so they apply the same weighting to both – but that’s just a guess.

Let’s, just for a second, consider the impact of combining Steps 2 and 6. Each of the filters in these two steps reduces the sensitivity of the test to the low and high frequency behaviour of the headphones. If we combine their effects into a single curve, it looks like the one in Figure 4, below.

Fig 4. The magnitude response of the combination of the A-weighting filter and the filter applied to the pink noise signal.

At this point, you may be asking “so what?” Here’s what.

Let’s take two different pairs of headphones and pretend that we measured them using the procedure I described above. The first pair of headphones (we’ll call it “Headphone A”) has a completely flat frequency response +/- < 0.000001 dB from 20 Hz to 20 kHz. The second pair of headphones has a bass boost such that anything below about 120 Hz has a 20 dB gain applied to it (we’ll call that “Headphone B”). The two unweighted measurements of these two simulated headphones are shown in Figure 5.

Fig 5. The magnitude responses of the two simulated headphones. The blue curve is “Headphone A”. The red curve is “Headphone B”.

After filtering these measurements with the weighting curves from Steps 2 and 6 (above), the way our measurement system “hears” these headphone responses is slightly different – as you can see in Figure 6, below.

Fig 6. The magnitude responses of the two simulated headphones as “seen” by a EN 50332 measurement. The blue curve is “Headphone A”. The red curve is “Headphone B”.

So, what happens when we measure the sound pressure level of the pink noise through these headphones?

Well, if we did the measurements without applying the two weighting curves, but just using good ol’ pink noise and no messin’ around, we’d see that Headphone B plays 13.1 dB louder than Headphone A (because of the 20 dB bass boost). However, if we apply the filters from Steps 2 and 6, the measured difference drops to only 0.46 dB.

This is interesting, since the standard measurement “thinks” that a 20 dB boost in the entire low frequency region corresponds to only a 0.46 dB increase in overall level.
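The mechanism behind this result can be sketched numerically. The example below is deliberately simplified and will not reproduce the 13.1 dB and 0.46 dB figures: it assumes an idealised bass boost (a hard step of +20 dB below 120 Hz instead of a real shelving filter), it applies only the A-weighting curve (the Step 2 noise filter is left out because its exact shape is not given here), and it integrates pink noise from 20 Hz to 20 kHz. The A-weighting formula is the commonly published one, repeated here so the block runs on its own, and all function names are made up.

```python
import math

def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), 0 dB at 1 kHz."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.0

def flat_db(f):
    """0 dB everywhere: a flat headphone, or no weighting at all."""
    return 0.0

def ideal_bass_boost_db(f, gain_db=20.0, corner_hz=120.0):
    """Idealised step shelf (an assumption): full boost below the corner, none above."""
    return gain_db if f < corner_hz else 0.0

def pink_noise_level_db(response_db, weighting_db, f_lo=20.0, f_hi=20000.0, n=4000):
    """Relative level of pink noise after a headphone response and a weighting curve."""
    # Pink noise has equal power per octave, so integrate on a log-frequency axis.
    log_lo, log_hi = math.log(f_lo), math.log(f_hi)
    step = (log_hi - log_lo) / n
    power = 0.0
    for k in range(n):
        f = math.exp(log_lo + (k + 0.5) * step)
        power += 10 ** ((response_db(f) + weighting_db(f)) / 10) * step
    return 10 * math.log10(power)

def level_increase_db(weighting_db):
    """How much louder the boosted headphone measures than the flat one."""
    boosted = pink_noise_level_db(ideal_bass_boost_db, weighting_db)
    flat = pink_noise_level_db(flat_db, weighting_db)
    return boosted - flat

unweighted = level_increase_db(flat_db)
a_weighted = level_increase_db(a_weighting_db)
print(f"unweighted increase: {unweighted:.1f} dB")
print(f"A-weighted increase: {a_weighted:.1f} dB")
```

Even with only one of the two weighting curves applied, most of the bass boost disappears from the measured level, because the weighting has already turned the boosted band down by more than the boost turns it up.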

Figure 7 shows the relationship between the bass boost applied below 120 Hz and the increase in overall level as measured using the EN 50332 standard.

Fig 7. The relationship between the increase in SPL as measured using the EN 50332 standard vs. the gain of a bass boost applied to the headphones. (filter characteristics are Low shelving, fc=120 Hz, Q=0.707)

So, let’s go back to you, the CEO of the Fly-by-Night Headphone Company. You want to make your headphones louder, but you also need to obey the law in France. What’s a sneaky way to do this? Boost the bass! As you saw above, you can crank up the bass by 20 dB and the regulators will only see a 0.46 dB change in output level. You can totally get away with that one! Some people might complain that you have too much bass in your headphones, but hey – kids love bass. And plus, your competitors will get complaints about how quiet their headphones are compared to yours. All because people listening to children’s records at high listening levels hear much more bass than the EN 50332 measurement can!

Of course, one other way is to just ignore the law and make the headphones louder by increasing their sensitivity… but no one would do that because it’s illegal. In France.

Appendix 1: Listen to your Mother!

My mother always told me “Turn down that Walkman! You’re going to go deaf!” The question is “Was my mother right?” Of course, the answer is “yes” – if you listen to a loud enough sound for a long enough time, you will damage your hearing – and hearing damage, generally speaking, is permanent. The louder the sound, the less time it takes to cause the damage. The question then is “how loud and how long?” The answer is different for everyone, however you can find some recommendations for what’s probably safe for you at sites that deal with occupational health and safety. For example, this site lists the Canadian recommendations for maximum exposure time limits to noise in the workplace. This site shows a graph for the US recommendations for the same thing – I’ve used the formula on that site to make the graph in Figure 8, below.

Fig 8. Recommendations for the maximum time of exposure to noise sources according to The National Institute for Occupational Safety and Health (NIOSH) in the USA.
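The curve in Figure 8 follows a simple rule, sketched below. It assumes the commonly published NIOSH recommendation of 8 hours at 85 dBA with a 3 dB exchange rate (every 3 dB increase halves the permitted time). The function name and parameter names are made up for this example, and it makes no attempt at the weighting corrections discussed further down.

```python
def niosh_max_exposure_minutes(level_dba, limit_dba=85.0, exchange_db=3.0,
                               reference_minutes=480.0):
    """Recommended maximum daily exposure time in minutes for a given level in dBA."""
    # Every `exchange_db` above the limit halves the time; every step below doubles it.
    return reference_minutes / 2 ** ((level_dba - limit_dba) / exchange_db)

for level in (85, 88, 94, 100):
    print(f"{level} dBA: {niosh_max_exposure_minutes(level):.1f} minutes")
```

The halving rule is why the recommended times collapse so quickly: 15 dB above the limit is five halvings, which takes 8 hours down to 15 minutes.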

How do these noise levels compare with what comes out of my headphones? Well, let’s go back to the numbers I gave in Part 1 and Part 2. If we take the measured maximum output levels of the 4 devices listed in Part 2, and calculate what the output level in dB SPL would be through the measured sensitivities of the headphones listed in Part 1 (assuming that everything else was linear and nothing distorted or clipped or became unhappy – and ignoring the fact that the headphones do not have the same impedance as the one I used to do the measurements of the 4 devices… and assuming that the measurements of the headphones are unweighted on that website), then the maximum output levels you can get from those devices are shown in Figure 9.

Fig 9. Calculated maximum output levels in dB SPL for the four devices and three headphones listed in Parts 1 and 2, above. Blue is the Etymotic ER4PT, red is the BeoPlay H6, and black is the Sennheiser HD600.
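The calculation behind a plot like Figure 9 can be sketched as follows. The voltages are copied from the tables in Parts 1 and 2. The sketch carries the same caveats as the text (everything is assumed to be linear, and the impedance differences are ignored), and the function and variable names are made up for this example.

```python
import math

# Vrms needed to produce 90 dB SPL (table in Part 1).
HEADPHONE_VRMS_FOR_90_DB_SPL = {
    "Sennheiser HD600": 0.230,
    "Beoplay H6": 0.044,
    "Etymotic ER4PT": 0.03,
}

# Maximum output in Vrms into a 31 ohm load (table in Part 2).
SOURCE_MAX_VRMS = {
    "Lenovo ThinkPad T420": 0.317,
    "Apple iPhone 3Ds": 0.89,
    "Apple MacBook Pro": 2.085,
    "Sony CDP-D500": 6.69,
}

def max_spl_db(source_vrms, headphone_vrms_for_90):
    """Estimated maximum level in dB SPL, assuming a perfectly linear system."""
    # Every dB of voltage above the 90 dB SPL reference adds one dB SPL.
    return 90.0 + 20 * math.log10(source_vrms / headphone_vrms_for_90)

for source, v_out in SOURCE_MAX_VRMS.items():
    for headphone, v_ref in HEADPHONE_VRMS_FOR_90_DB_SPL.items():
        print(f"{source} + {headphone}: {max_spl_db(v_out, v_ref):.1f} dB SPL")
```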

So, if you take the calculations shown in Figure 9 and compare them to the recommendations shown in Figure 8, then you might reach the conclusion that, if you set your volume to maximum (and your tune is full-band pink noise mastered to a constant level of 0 dB FS, and we do a small correction for the A-weighting based on the assumption that the 90 dB SPL headphone measurements listed above are unweighted), then the maximum recommended time that you should listen to your music, according to the federal government in the USA, is as shown in Figure 10.

Fig 10. Recommended maximum exposure time for 3 different headphones connected to 4 different sources playing at maximum volume (based on a large number of assumptions that may or may not be correct).

So, if I only listen to full-bandwidth pink noise at 0 dB FS at maximum volume on my MacBook Pro over my BeoPlay H6’s, then the American government thinks that after 0.1486 of a second, I am damaging my hearing.

It seems that my mother was right.

Appendix 2: Why does France care about how loud my headphones are?

This is an interesting question, the answer to which makes sense to some people, and doesn’t make any sense at all to other people – with a likely correlation with your political beliefs and your allegiance to the Destruction of the Tea in Boston. The best answer I’ve read was discussed in this forum where one poster very astutely pointed out that, in France, the state pays for your medical care. So, France has a right to prevent the French from making themselves go deaf by listening to this week’s top hit on Spotify at (literally) deafening levels. If you live in a place where you have to pay for your own medical care, then you have the right to self-harm and induce your own hearing impairment in order to appreciate the subtle details buried within the latest hip-hop remix of Mr. Achy-Breaky Heart’s daughter “singing” Wrecking Ball while you’re riding on a bus. In France, you don’t.

Appendix 3: Additional Reading

ISVR Consulting’s page

Rohde & Schwarz’s pdf manual for running the EN 50332 test on their equipment

B&O Tech: Visual Analogies to Problems in Audio

#23 in a series of articles about the technology behind Bang & Olufsen loudspeakers

Audio people throw words around like “frequency” and “distortion” and “resolution” and “” without wondering whether anyone else in the room (a) understands or (b) cares. One of the best ways to explain things to people who do not understand but do care is to use analogies and metaphors. So, this week, I’d like to give some visual analogies of common problems in audio.

Let’s start with a photograph. Assuming that your computer monitor is identical to mine, and the background light in your room is exactly the same as it is in mine, then you’re seeing what I’m seeing when you look at this photo.

Let’s say that you, sitting there, looking at this photo is analogous to you, sitting there, listening to a recording on a pair of loudspeakers or over headphones. So what happens when something in the signal path messes up the signal?

Perhaps, for example, you have a limited range in your system. That could mean that you can’t play the very low and/or high frequencies because you are listening through a smaller set of loudspeakers instead of a full-range model. Limiting the range of brightness levels in the photo is similar to this problem – so nothing is really deep black or bright white. (We could have an argument about whether this is an analogy to a limited dynamic range in an audio system, but I would argue that it isn’t – since audio dynamic range is limited by a noise floor and a clipping level, which we’ll do later…) So, the photo below “sounds” like an audio system with a limited range:

Of course, almost everything is there – sort of – but it doesn’t have the same depth or sparkle as the original photo.

Or what if you have a noisy device in your signal chain? For example, maybe you’re listening to a copy of the recording on a cassette tape – or the air conditioning is on in your listening room. Then the result will “sound” like this:

As you can see, you still have the original recording – but there is an added layer of noise with it. This is not only distracting, but it can obscure some of the more subtle details that are on the same order of magnitude as the noise itself.

In audio, the quietest music is buried in the noise of the system (either the playback system or the recording system). On the other extreme is the loud music, which can only go so loud before it “clips” – meaning that the peaks get chopped off because the system just can’t go up enough. In other words, the poor little woofer wants to move out of the loudspeaker by 10 mm, but it can only move 4 mm because the rubber holding on to it just can’t stretch any further. In a photo, this is the same as turning up the brightness too much, resulting in too many things just turning white because they can’t get brighter (in the old days of film, this was called “blowing out” the photo), as is shown below.

This “clipping” of the signal is what many people mean when they say “distorted” – however, distortion is a much broader range of problems than just clipping. To be really pedantic, any time the output of a system is not identical to its input, then the signal is distorted.
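As a small Python sketch of the woofer example: the signal "wants" to swing 10 mm in each direction, but the system is assumed to stop dead at 4 mm (a simple hard clip, which is much cruder than how a real suspension behaves). The function and variable names are made up for illustration.

```python
import math

def hard_clip(samples, limit):
    """Chop off everything beyond +/- limit."""
    return [max(-limit, min(limit, s)) for s in samples]

n = 1000
# One cycle of the movement the woofer "wants" to make: +/- 10 mm
wanted_mm = [10.0 * math.sin(2 * math.pi * i / n) for i in range(n)]
# What it can actually do if the excursion stops at +/- 4 mm
actual_mm = hard_clip(wanted_mm, 4.0)

# Pedantic definition of distortion: the output is not identical to the input
is_distorted = any(w != a for w, a in zip(wanted_mm, actual_mm))
clipped_fraction = sum(1 for w, a in zip(wanted_mm, actual_mm) if w != a) / n

print(max(actual_mm), min(actual_mm), is_distorted, clipped_fraction)
```

With these numbers, well over half of the samples in the cycle get flattened, so the output looks more like a square wave with rounded edges than the sine wave that went in.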

A more common problem that many people face is a modification of the frequency response. In audio, the frequency is (very generally speaking) the musical pitch of the notes you’re hearing. Low notes are low frequencies, high notes are high frequencies. Large engines emit low frequencies, tiny bells emit high frequencies. With light, the frequency of the light wavicle hitting your eyeball determines the colour that you see. Red is a low frequency and violet is a high frequency (see the table on this page for details). So, if you have a pair of headphones that, say, emphasises bass (the low frequencies) more than the other areas, then it’s the same as making the photo more red, as shown below.
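On the audio side of that analogy, a bass emphasis can be sketched in a few lines of Python. This is only an illustration, assuming a 48 kHz sample rate and a very simple one-pole low-pass filter whose output is added back to the original signal. The 200 Hz cutoff and the amount of boost are arbitrary choices, not a description of any real pair of headphones.

```python
import math

FS = 48000  # sample rate in Hz (assumed)

def sine(freq_hz, n=FS):
    """One second of a sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / FS) for i in range(n)]

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def bass_boost(samples, cutoff_hz=200.0, amount=1.0):
    """Add a low-passed copy of the signal to itself (a crude bass emphasis)."""
    a = 1.0 - math.exp(-2 * math.pi * cutoff_hz / FS)  # one-pole coefficient
    low = 0.0
    out = []
    for x in samples:
        low += a * (x - low)          # one-pole low-pass filter
        out.append(x + amount * low)  # original plus extra lows
    return out

def gain_db(freq_hz):
    """Level change (in dB) that the boost applies to a sine at freq_hz."""
    x = sine(freq_hz)
    return 20 * math.log10(rms(bass_boost(x)) / rms(x))

print(round(gain_db(50), 1), round(gain_db(5000), 1))
```

A 50 Hz tone comes out several decibels louder than it went in, while a 5 kHz tone is left almost untouched, which is the audio equivalent of pushing the photo towards red.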

Of course, not all impairments to the audio signal are accidental. Some are the fault of the user who makes a conscious decision to be more concerned with convenience (i.e. how many songs you can fit on your portable player) than audio quality. When you choose to convert your CDs to a “lossy” format (like MP3, for example), then (as suggested by the description) you’re losing something. In theory, you are losing things that aren’t important (in other words, your computer thinks that you can’t hear what’s thrown away, so you won’t miss it). However, in practice, that debate is up to you and your computer (and your bitrate, and the codec you’ve chosen, and the quality of the rest of your system, and how you listen to music, and what kind of music you’re listening to, and whether or not there are other things to listen to at the same time, and a bunch of other things…). However, if we’re going to make an analogy, then we have to throw away the details in our photo, keeping enough information to be moderately recognisable.

As you can see, all the colours are still there. And, if you stand far enough away (or if you take off your glasses) it might just look the same. But, if you look carefully enough, then you might notice that something is missing… Keep looking… you’ll see it…

So, as you can see, any impairment of the “signal” is a disruption of its quality – but we should be careful not to confuse this with reality. There are lots of people out there who have a kind of weird religious belief that, when you sit and listen to a recording of an orchestra, you should be magically transported to a concert hall as if you were there (or as if the orchestra were sitting in your listening room). This is silly. That’s like saying that when you sit and watch a re-run of Friends on your television, you should feel like you’re actually in the apartment in New York with a bunch of beautiful people. Or, when you watch a movie, you should feel like you’re actually in a car chase or a laser battle in space. Music recordings are no more of a “virtual reality” experience than a television show or a film. In all of these cases (the music recording, the TV episode and the film), what you’re hearing and seeing should not be life-like – it should be better than life. You never have to wait for the people in a film to look for a parking space or go out to pee. Similarly, you never hear a mistake in the trumpet solo in a recording of the Berlin Philharmonic and you always hear Justin Bieber singing in tune. Even the spatial aspects of an “audiophile” classical recording are better-than-reality. If you sit in a concert hall, you can either be close (and hear the musicians much louder than the reverberation) or far (and hear much more of the reverberation). In a recording, you are sitting both near and far – so you have the presence of the musicians and the spaciousness of the reverb at the same time. Better than real life!

So, what you’re listening to is a story. A recording engineer attended a music performance, and that person is now recounting the story of what happened in his or her own style. If it’s a good recording engineer, then the storytelling is better than being there – it’s more than just a “police report” of a series of events.

To illustrate my point, below is a photo of what that sinking WWII bunker actually looked like when I took the photo that I’ve been messing with.

Of course, you can argue that this is a “better” photo than the one at the top – that’s a matter of your taste versus mine. Maybe you prefer the sound of an orchestra recorded with only two microphones played through two loudspeakers. Maybe you prefer the sound of the same orchestra recorded with lots of microphones played through a surround system. Maybe you like listening to singers who can sing. Maybe you like listening to singers who need auto tuners to clean up the mess. This is just personal taste. But at least you should be choosing to hear (or see) what the artist intended – not a modified version of it.

This means that the goal of a sound system is to deliver, in your listening room, the same sound as the recording engineer heard in the studio when he or she did the recording. In the same way, the photos you are looking at at the top of this page should look exactly the same as what I see when I look at the same photos.
