Audio people throw words around like “frequency” and “distortion” and “resolution” without wondering whether anyone else in the room (a) understands or (b) cares. One of the best ways to explain things to people who do not understand but do care is to use analogies and metaphors. So, this week, I’d like to give some visual analogies of common problems in audio.
Let’s start with a photograph. Assuming that your computer monitor is identical to mine, and the background light in your room is exactly the same as it is in mine, then you’re seeing what I’m seeing when you look at this photo.
Let’s say that you, sitting there, looking at this photo is analogous to you, sitting there, listening to a recording on a pair of loudspeakers or over headphones. So what happens when something in the signal path messes up the signal?
Perhaps, for example, you have a limited range in your system. That could mean that you can’t play the very low and/or high frequencies because you are listening through a smaller set of loudspeakers instead of a full-range model. Limiting the range of brightness levels in the photo is similar to this problem – so nothing is really deep black or bright white. (We could have an argument about whether this is an analogy to a limited dynamic range in an audio system, but I would argue that it isn’t – since audio dynamic range is limited by a noise floor and a clipping level, which we’ll do later…) So, the photo below “sounds” like an audio system with a limited range:
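If you like to think in code, the brightness-limiting trick above can be sketched in a few lines of Python. All of the pixel values and range limits here are invented, just for illustration:

```python
import numpy as np

# Hypothetical 8-bit grayscale pixel values (0 = pure black, 255 = pure white)
pixels = np.array([0, 64, 128, 192, 255], dtype=float)

def limit_range(p, lo=50, hi=205):
    # Squeeze the full 0-255 brightness range into [lo, hi]:
    # nothing in the result is truly black or truly white.
    return lo + (p / 255.0) * (hi - lo)

limited = limit_range(pixels)
# The extremes are pulled inward; the mid-greys barely change.
```

The same squeeze applied to an audio signal's frequency extremes is, loosely, what a small loudspeaker does to the deep bass and sparkling treble.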
Of course, almost everything is there – sort of – but it doesn’t have the same depth or sparkle as the original photo.
Or what if you have a noisy device in your signal chain? For example, maybe you’re listening to a copy of the recording on a cassette tape – or the air conditioning is on in your listening room. Then the result will “sound” like this:
As you can see, you still have the original recording – but there is an added layer of noise with it. This is not only distracting, but it can obscure some of the more subtle details that are on the same order of magnitude as the noise itself.
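The “added layer of noise” can be put in numbers with a quick sketch. Everything here (tone frequency, noise level, sample rate) is arbitrary, chosen only to show how a signal-to-noise ratio is computed:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48000                                  # sample rate in Hz (assumed)
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 1000 * t)       # a 1 kHz tone at full level
noise = 0.01 * rng.standard_normal(fs)      # low-level hiss (tape, air conditioning...)
noisy = signal + noise                      # what you actually hear

def rms(x):
    # Root-mean-square level of a signal
    return np.sqrt(np.mean(x ** 2))

# Signal-to-noise ratio in dB: how far the music sits above the noise floor
snr_db = 20 * np.log10(rms(signal) / rms(noise))
```

Details in the music that sit near or below that noise floor are the ones that get obscured, exactly like the fine textures in the grainy photo.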
In audio, the quietest music is buried in the noise of the system (either the playback system or the recording system). On the other extreme is the loud music, which can only go so loud before it “clips” – meaning that the peaks get chopped off because the system just can’t go up enough. In other words, the poor little woofer wants to move out of the loudspeaker by 10 mm, but it can only move 4 mm because the rubber holding on to it just can’t stretch any further. In a photo, this is the same as turning up the brightness too much, resulting in too many things just turning white because they can’t get brighter (in the old days of film, this was called “blowing out” the photo), as is shown below.
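The “peaks get chopped off” idea maps directly onto a one-liner. A minimal sketch, with an arbitrary frequency and made-up limits:

```python
import numpy as np

t = np.arange(48000) / 48000.0
x = 1.5 * np.sin(2 * np.pi * 100 * t)   # the woofer "wants" to move to +/-1.5...

# ...but the system can only reach +/-1.0, so the peaks are flattened:
clipped = np.clip(x, -1.0, 1.0)
```

Everything below the limit passes through untouched; only the peaks are deformed, which is why clipping is often inaudible at low levels and suddenly very audible on loud passages.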
This “clipping” of the signal is what many people mean when they say “distorted” – however, distortion is a much broader range of problems than just clipping. To be really pedantic, any time the output of a system is not identical to its input, then the signal is distorted.
A more common problem that many people face is a modification of the frequency response. In audio, the frequency is (very generally speaking) the musical pitch of the notes you’re hearing. Low notes are low frequencies, high notes are high frequencies. Large engines emit low frequencies, tiny bells emit high frequencies. With light, the frequency of the light wavicle hitting your eyeball determines the colour that you see. Red is a low frequency and violet is a high frequency (see the table on this page for details). So, if you have a pair of headphones that, say, emphasises bass (the low frequencies) more than the other areas, then it’s the same as making the photo more red, as shown below.
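A bass-heavy pair of headphones can be crudely imitated in code. This is only a sketch, not a proper shelving filter: a one-pole low-pass isolates the lows, which are then scaled and added back, and all parameter values are invented:

```python
import numpy as np

def bass_boost(x, fs=48000, cutoff=200.0, gain=2.0):
    # Isolate the low frequencies with a one-pole low-pass filter,
    # then add the scaled lows back onto the original signal.
    a = np.exp(-2 * np.pi * cutoff / fs)    # filter coefficient
    low = np.zeros_like(x)
    for n in range(1, len(x)):
        low[n] = (1 - a) * x[n] + a * low[n - 1]
    return x + (gain - 1.0) * low

# A very low frequency (approximated here by a constant signal) gets boosted,
# while a rapidly alternating signal passes through almost unchanged.
lows = bass_boost(np.ones(5000))
highs = bass_boost(np.array([1.0, -1.0] * 2500))
```

The “more red” photo is the visual equivalent: one end of the spectrum is emphasised relative to the rest.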
Of course, not all impairments to the audio signal are accidental. Some are the fault of the user who makes a conscious decision to be more concerned with convenience (i.e. how many songs you can fit on your portable player) than audio quality. When you choose to convert your CD’s to a “lossy” format (like MP3, for example), then (as suggested by the description) you’re losing something. In theory, you are losing things that aren’t important (in other words, your computer thinks that you can’t hear what’s thrown away, so you won’t miss it). However, in practice, that debate is up to you and your computer (and your bitrate, and the codec you’ve chosen, and the quality of the rest of your system, and how you listen to music, and what kind of music you’re listening to, and whether or not there are other things to listen to at the same time, and a bunch of other things…) However, if we’re going to make an analogy, then we have to throw away the details in our photo, keeping enough information to be moderately recognisable.
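As a (very) loose analogue in code: real lossy codecs throw away perceptually-masked information, not raw amplitude detail, but simply reducing the number of amplitude levels gives the same flavour of “keeping enough to be recognisable”. Bit depth and test signal here are arbitrary:

```python
import numpy as np

def quantize(x, bits=4):
    # Keep only a small number of amplitude levels;
    # everything in between those levels is thrown away.
    steps = 2 ** (bits - 1) - 1
    return np.round(x * steps) / steps

t = np.arange(1000) / 48000.0
x = np.sin(2 * np.pi * 440 * t)
coarse = quantize(x, bits=4)     # still clearly a 440 Hz tone...
detail = x - coarse              # ...but this much detail is gone
```

As with the pixelated photo: stand far enough away and it looks the same, but look closely and the fine detail is simply not there any more.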
As you can see, all the colours are still there. And, if you stand far enough away (or if you take off your glasses) it might just look the same. But, if you look carefully enough, then you might notice that something is missing… Keep looking… you’ll see it…
So, as you can see, any impairment of the “signal” is a disruption of its quality – but we should be careful not to confuse this with reality. There are lots of people out there who have a kind of weird religious belief that, when you sit and listen to a recording of an orchestra, you should be magically transported to a concert hall as if you were there (or as if the orchestra were sitting in your listening room). This is silly. That’s like saying that when you sit and watch a re-run of Friends on your television, you should feel like you’re actually in the apartment in New York with a bunch of beautiful people. Or, when you watch a movie, you feel like you’re actually in a car chase or a laser battle in space. Music recordings are no more of a “virtual reality” experience than a television show or a film. In all of these cases (the music recording, the TV episode and the film), what you’re hearing and seeing should not be life-like – they should be better than life. You never have to wait for the people in a film to look for a parking space or go out to pee. Similarly, you never hear a mistake in the trumpet solo in a recording of the Berlin Philharmonic, and you always hear Justin Bieber singing in tune. Even the spatial aspects of an “audiophile” classical recording are better-than-reality. If you sit in a concert hall, you can either be close (and hear the musicians much louder than the reverberation) or far (and hear much more of the reverberation). In a recording, you are sitting both near and far – so you have the presence of the musicians and the spaciousness of the reverb at the same time. Better than real life!
So, what you’re listening to is a story. A recording engineer attended a music performance, and that person is now recounting the story of what happened in his or her own style. If it’s a good recording engineer, then the storytelling is better than being there – it’s more than just a “police report” of a series of events.
To illustrate my point, below is a photo of what that sinking WWII bunker actually looked like when I took the photo that I’ve been messing with.
Of course, you can argue that this is a “better” photo than the one at the top – that’s a matter of your taste versus mine. Maybe you prefer the sound of an orchestra recorded with only two microphones played through two loudspeakers. Maybe you prefer the sound of the same orchestra recorded with lots of microphones played through a surround system. Maybe you like listening to singers who can sing. Maybe you like listening to singers who need auto tuners to clean up the mess. This is just personal taste. But at least you should be choosing to hear (or see) what the artist intended – not a modified version of it.
This means that the goal of a sound system is to deliver, in your listening room, the same sound as the recording engineer heard in the studio when he or she did the recording. Just like the photos you are looking at on the top of this page should look exactly the same as what I see when I see the same photo.
Let’s say that you go to the store and you listen to a pair of loudspeakers with some demo music they have on a shelf there, and you decide that you like the loudspeakers, so you buy them.
Then, you take them home, you set them up, you put on one of your recordings, and you change your mind – you don’t like the loudspeakers.
What happened? Well, there could be a lot of reasons behind this.
Tip #1: Loudness
In the last article, I discussed why a “loudness” function is necessary when you change the volume setting while listening to your system. That article described the issue of Equal Loudness Contours, shown again as a refresher in Figure 1, below.
Let’s say that, when you heard the loudspeakers at the store, the volume knob was set so that, if you had put in a -20 dB FS, 1 kHz sine wave, it would have produced a level of 70 dB SPL at the listening position in the store. Then, let’s say that you go home and set the volume such that it’s about 10 dB quieter than it was when you heard it at the store. This means that, even if you listen to exactly the same recording, and even if your listening room at home were exactly the same as the room at the store, and even if the placement of the loudspeakers and the listening position in your house were exactly the same as at the store, the loudspeakers would sound different at home than at the store.
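The arithmetic in this example is just a constant offset between the dB FS and dB SPL scales. A minimal sketch, using the calibration numbers from the paragraph above:

```python
# Calibration taken from the example above: a -20 dB FS, 1 kHz sine
# produces 70 dB SPL at the listening position in the store.
REF_DBFS = -20.0
REF_DBSPL = 70.0
OFFSET = REF_DBSPL - REF_DBFS    # 90 dB, so: dB SPL = dB FS + 90

def dbfs_to_spl(level_dbfs, volume_trim_db=0.0):
    # volume_trim_db models turning the volume knob away from the
    # store's setting (e.g. -10.0 for "10 dB quieter at home").
    return level_dbfs + OFFSET + volume_trim_db

store_level = dbfs_to_spl(-20.0)          # the store's playback level
home_level = dbfs_to_spl(-20.0, -10.0)    # the same tone at home, 10 dB quieter
```

That 10 dB drop in SPL is what pushes you from one equal loudness contour down to another.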
Figure 2 below shows the difference between the 70 phon curve from Figure 1 (sort of what you heard at the store) and the 60 phon curve (sort of what you hear at home, because you turned down the volume). (To find out which curve is which in Fig 1, the phon value of the curve is its value at 1 kHz.)
As you can see in Figure 2, by turning down the volume by 10 dB, you’ve changed your natural sensitivity to sound – you’re as much as 5 or 6 dB less sensitive to low frequencies and also less sensitive to the high end by a couple of dB. In other words, by turning down the volume, even though you have changed nothing else, you’ve lost bass and treble.
In fact, even if you only turned down the volume by 1 dB, you would get the same effect, just by a different amount, as is shown in Figure 3.
So, as you can see here, even by changing the volume knob by 1 dB, you change the perceived frequency response of the system by about 0.5 dB in a worst case. The quieter you listen, the less bass and treble you have (relative to the perceived level at 1 kHz).
So, this means that, if you’re comparing two systems (like the loudspeakers at the store and the loudspeakers at home, or two different DAC’s or your system before and after you connect the fancy new speaker wire that you were talked into buying), if you are not listening at exactly the same level, your hearing is not behaving the same way – so any differences you hear may not be a result of the system.
Looking at this a different way, if you were to compare two systems (let’s say a back-to-back comparison of two DAC’s) that had frequency response differences like the ones shown in Figure 3, you might expect that you could hear the difference between them. However, this is YOUR frequency response difference, just by virtue of the fact that you are not comparing them at the same listening level. The kicker here is that, if the difference in level is only 1 dB, you might not immediately hear one as being louder than the other – but you might hear the timbral differences between them… So, unless you’ve used a reliable SPL meter to ensure that they’re the same level, then they’re probably not the same level – unless you’re being REALLY careful – and even then, I’d recommend being more careful than that.
This is why, when researchers are doing real listening tests, they have to pay very careful attention to the listening level. And, if the purpose of the listening test is to compare two things, then they absolutely must be at the same level. If they aren’t, then the results of the entire listening test can be thrown out the window – they’re worthless.
It’s also why professionals who work in the audio industry like recording engineers, mastering engineers, and re-recording engineers always work at the same listening level. This, in part, ensures that they have consistency in their mixes – in other words, they have the same bass-midrange-treble balance in all their recordings, because they were all monitored at the same listening level.
Tip #2: Recordings
If you were selling your house, and you got a call from your real estate agent that there were some potential buyers coming tomorrow to see your place, you would probably clean up. If you were really keen, not only would you clean up, but you would put out some fresh flowers in a vase, and, half an hour before your “guests” arrived, you’d be pulling a freshly-baked loaf of bread out of the oven (because there’s nothing more welcoming than walking into a house that smells like freshly-baked bread…) You would NOT leave the bathroom in a mess, your bed unmade, dirty dishes in the sink, and yesterday’s dirty socks on the floor. In short, you want your house to look its best – otherwise you won’t get people through the front door (unless the price is REALLY good…)
Similarly, if you worked in a shop selling loudspeakers, part of your job is to sell loudspeakers. This means that you spend a good amount of time listening to a lot of different types of music on the loudspeakers in your shop. Over time, you’ll find that some recordings sound better than others for some reason that has something to do with the interactions between the recordings, the loudspeakers, the room’s acoustics, and your preferences. If you were a smart salesperson, you would make a note of the recordings that sound “bad” (for whatever reason) and you would not play them for potential customers that come into your store. Doing so would be the aural equivalent of leaving your dirty socks on the floor.
So, this means that, if you are the customer in the shop, listening to a pair of loudspeakers that you may or may not buy, you should remember that you’re probably going to be presented with a best-case scenario. At the very least, you should not expect the salesperson to play a recording that makes the loudspeakers sound terrible. Of course, this might mean many things. For example, it might mean that the loudspeakers are GREAT – but if they’re being used to play a really bad recording that you’ve never heard before, then you might think that the reason it sounds bad is the loudspeakers, and not the recording. So, you’ll walk out of the shop hating the loudspeakers instead of the recording.
So, the moral of the story here is simple: if you’re going to a shop to listen to a pair of loudspeakers, bring your own recordings. That way, you know what to expect – and you’ll test the loudspeakers on music that you like. Even if you bring just one CD and listen to just one song – as long as the song is one that you’ve heard A LOT, then you’re going to get a much better idea of how the loudspeakers are behaving than if you let the salesperson choose the demo music. In a perfect world, you’ll put on your song, and your jaw will drop while you think “I’ve NEVER heard it sound this good!”.
Tip #3: Room Acoustics
It goes without saying that the acoustical behaviour of a room has a huge effect on how a loudspeaker sounds (I talked about this a lot in this posting). So does the specific placement of the loudspeakers and the listening position within a room. (I talked about this a lot in this posting). So, this also means that a pair of loudspeakers in a shop’s showroom will NOT sound the same as the same loudspeakers in your house – not even if you’ve aligned the listening levels and you’re playing the same recording. Maybe you have a strong sidewall reflection in your living room that they didn’t have in the showroom. Maybe the showroom is smaller than your living room, so the room modes are at higher frequencies and “helping out” the upper bass instead of the lower bass. Maybe, in the showroom, the loudspeakers were quite far from the wall behind them, but in your house, you’re going to push the loudspeakers up against the wall. Any of these differences will have massive effects on the sound at the listening position.
Of course, there is only one way around this problem. If you’re buying a pair of loudspeakers, then you should talk to the salesperson about taking a demo pair home for a week or so – so that you can hear how they sound in your room. If you’re buying some other component in the audio chain that might have an impact on the sound, you should ask to take it home and try it out with your loudspeakers.
If you were buying a car, you would take it for a test drive – and you would probably get out of the parking lot of the car dealer when you did so. You have to take it out on the road to see how it feels. The same is true for audio equipment – if you can’t take it home to try it out, make sure that the shop has a good return policy. Just because it sounds good in the shop doesn’t mean that it’s going to sound good in your living room.
Tip #4: Personal Taste
I like single-malt scotch. Personally, I really like peaty, smoky scotch – other people like other kinds of scotch. There’s a good book by Michael Jackson (no, not that Michael Jackson – another Michael Jackson) that rates scotches. Personally, this is a good book for me, because, not only does he give a little background for each of the distilleries, and a description of the taste of each of the scotches in there – but he scores them according to his own personal ranking system. Luckily for me, Michael Jackson and I share a lot of the same preferences – so if he likes a scotch, chances are that I will too. So, his ranking scores are a pretty good indicator for me. However, if he and I had different preferences, then his ranking system would be pretty useless.
One of my favourite quotations is from Duke Ellington who said “If it sounds good, it is good.” I firmly believe that this is true. If you like the sound of a pair of loudspeakers, then they’re good for you. Any measurement or review in a magazine that disagrees is wrong. Of course, a measurement or a reviewer might be able to point you to something that you haven’t noticed about your loudspeakers (which may make you like them a little more or a little less…) but if you like the way they sound, then there’s no need to apologise for that.
However, remember that, when you read a review, you are reading the words of someone who also has personal taste. Yes, he/she may have heard many more loudspeakers than you have in many more listening rooms – but they still have personal preference. And, part of that personal preference is a ranking of the categories in which a loudspeaker should perform. Personally, I divide an audio system’s (or a recording’s) qualities into 5 broad categories: 1. Timbral (tone colour), 2. Spatial (i.e. imaging and spaciousness), 3. Temporal (i.e. punch, transient response), 4. Dynamics (not just total dynamic range, but also things like short-term “dynamic contrast”) and 5. Distortion & Noise. Each of these has sub-headings in my head – but the relative importance of these 5 qualities is really an issue of my personal preference (and my expectations for a given product – nobody expects an iThing dock to deliver good imaging, for example…). If your personal weighting of these 5 categories (assuming that you agree with my 5 categories) is different from mine, then we’re going to like different audio systems. And that’s okay. No problem – apart from the minor issue that, if I were a reviewer working for an audio magazine, you shouldn’t buy anything I recommend. I like sushi – you like steak – so if I recommend a good restaurant, you should probably eat somewhere else.
Of course, the fact that I will listen to different music played at a different level in a different listening room than you will might also have an effect on the difference between our opinions.
Tip #5: Close your eyes
This one is a no-brainer for people who do “real” listening tests for scientific research – but it still seems to be a mystery to people who review audio gear for a living. If you want to make a fair comparison between two pieces of audio gear, you cannot, under any circumstances, know what it is that you’re comparing. There was a perfect proof of this done by Kristina Busenitz at an Audio Engineering Society convention one year. Throughout the convention, participants were invited to do a listening test on two comparable automotive audio systems. Both were installed in identical cars, parked side-by-side. The two cars were aligned to have identical reproduction levels, and you listened to exactly the same materials to make exactly the same judgements about the systems. You had to sit in the same seat (i.e. front passenger side) for both tests, and you had to do the two evaluations back to back. One car had a branded system in it, the other was unbranded – made obvious by the posters hanging on the wall next to one of the cars. The cars were evaluated by lots of people over the 3 or 4 days of the convention. At the end, the results were processed and it was easily proven that the branded system was judged by a vast majority of the participants to be better than the unbranded system.
There was just one catch – every couple of hours, the staff running the test would swap the posters to the opposite wall. The two cars were actually identical. The only difference was the posters that hung outside them.
So, the vast majority of professional audio engineers agreed, in a completely “fair” test, that the car with the posters (which was the opposite car every couple of hours) sounded better than the one that didn’t.
Of course, what Kristina proved was that your eyes have a bigger effect on your opinion than your ears. If you see more expensive loudspeakers, they’ll probably sound better. This is why, when we’re running listening tests internally at Bang & Olufsen, we hide the loudspeakers behind an acoustically transparent, but visually opaque curtain. We can’t help but be influenced by our pre-formed opinions of products. We’ve even seen that a packing box for a (competitor’s) loudspeaker sitting outside the listening room will influence the results of a blind listening test on a loudspeaker that has nothing to do with the label on the box. (The box was a plant – just to see what would happen.)
Tip #6: Are you sure?
One last thing that really should go without saying: If you’re doing a back-to-back comparison of two different aspects of an audio system, be absolutely sure that you’re only comparing what you think you’re comparing. For example, I’ve read of people who do comparisons of things like the following:
sending a “bitstream” vs. “PCM” from a Blu-ray or DVD player to an AVR/Surround Processor
PCM vs. DSD
“normal” resolution vs. “high-resolution” recordings
If you’re making such a comparison, and you plan on drawing some conclusions, be absolutely sure that the only thing that is changing in your comparison is what you think you’re comparing. In the three examples I gave above, there are potentially lots of other things changing in your signal path in addition to the thing you’re changing. If you’re not absolutely sure that you’re only changing one thing, then you can’t be sure that any difference you hear is actually due to the thing you think you’re comparing. (did that make sense?) For example, given the three examples above:
Some AVR’s apply different processing to bitstream vs. PCM signals. Some use the metadata in a bitstream, and some players don’t when they convert to PCM. So, the REASON the bitstream and the PCM signals might sound different is not necessarily the signals themselves, but how the gear treats them. (see this posting for more information on this)
Some DAC’s (meaning the chip on the circuit board inside the box that you have in your gear) apply different filters to a DSD signal than a PCM signal. Some have a different gain on the DSD signal (some “high resolution” software-based players also apply different gains to DSD and PCM signals). So, don’t just switch from DSD to PCM and think that, because you can hear a difference, that difference is the difference in DSD and PCM. It might just be your Equal Loudness Contours playing tricks on you.
Some DAC’s (see previous point for my current definition of “DAC”) apply different filters to signals at different sampling rates. Don’t judge two recordings you bought at different sampling rates and think that the only difference is the sampling rate. The gear that you’re using to play the files might behave differently at different rates.
And so on.
A good analogy to this is to go to a coffee shop and buy two cups of coffee – one medium roast and one dark roast. While you’re not looking, I’ll add a little sugar to the dark roast cup – and I’ll bribe the person that made your coffee to make the medium one a couple of degrees colder than the dark one. You taste both, and you decide that dark roast is better than medium roast. But is your decision valid? No. Maybe you like sugar in your coffee. Maybe you prefer hotter coffee. Be careful how you make up your mind…
So, to wrap up, there are (at least) six things to remember when you’re shopping for audio gear:
If you’re comparing systems, make sure that you’re listening at the same level.
Always listen to a system using a recording with which you’re familiar – even if it’s a bad recording. Better something you know than something you don’t.
Evaluate a system that you’re planning on buying in your own listening room.
Don’t let anyone tell you what sounds good or bad. Ask them what they are listening to and for in a recording or a sound system – but decide for yourself what you like.
If the listening test isn’t blind, it’s not worth much. Don’t even trust your own ears if you know what you’re listening to – your ears are easily fooled by your eyes and your pre-conceived notions. And you’re not alone.
Be very sure that if you’re comparing two things, then the things you think you’re comparing are the only things that you’re comparing.
Let’s start with some depressing news: You can’t trust your ears. Sorry, but none of us can.
There are lots of reasons for this, and the statement is actually far more wide-reaching than any of us would like to admit. However, in this article, we’re going to look at one small aspect of the statement, and what we might be able to do to get around the problem.
We’ll begin with a thought experiment (although, for some of you, this may be an experiment that you have actually done). Imagine that you go into the quietest room that you’ve ever been in, and you are given a button to press and a pair of headphones to put on. Then you sit and wait for a while until you calm down and your ears settle in to the silence… While that’s happening you read the instructions of the task with which you are presented:
Whenever you hear a tone in the headphones in either one of your ears, please press the button.
Simple! Hear a beep, press the button. What could be more difficult to do than that?
Then, the test begins: you hear a beep in your left ear and you press the button. You hear another, quieter beep and you press the button again. You hear an even quieter beep and you press the button. You hear nothing, and you don’t press the button. You hear a beep and you press the button. Then you hear a beep at a lower frequency and so on and so on. This goes on and on at different levels, at different frequencies, in your two ears, until someone comes in the room and says “thank you, that will be all”.
While this test seems like it would be pretty easy to do, it’s a little unnerving. This is because the room that you’re sitting in is so quiet and the beeps are also so quiet that, sometimes you think you hear a beep – but you’re not sure, because things like the sound of your heartbeat, and your breathing, and the “swooshing” of blood in your body, and that faint ringing in your ears, and the noise you made by shifting in your chair are all, relatively speaking VERY loud compared to the beeps that you’re trying to detect.
Anyways, when you’re done, you might be presented with a graph that shows something called your “threshold of hearing”. This is a map of how loud a particular frequency has to be in order for you to hear it. The first thing that you’ll notice is that you are less sensitive to some frequencies than others. Specifically, a very low frequency or a very high frequency has to be much louder for you to hear it than if you’re listening to a mid-range frequency. (There are evolutionary reasons for this that we’ll discuss at the end.) Take a look at the bottom curve on Figure 1, below:
The bottom curve on this plot shows a typical result for a threshold of hearing test for a person with average hearing and no serious impairments or temporary issues (like wax build-up in the ear canal). What you can see there is that, for a 1 kHz tone, your threshold of hearing is 0 dB SPL (in fact, this is how 0 dB SPL is defined…) As you go lower in frequency from there, you will have to turn up the actual signal level just in order for you to hear it. So, for example, you would need to have approximately 60 dB SPL at 30 Hz in order to be able to detect that something is coming out of your headphones or loudspeakers. Similarly, you would need something like 10 dB SPL at 10 kHz in order to hear it. However, at 3.5 kHz, you can hear tones that are quieter than 0 dB SPL! It stands to reason, then, that a 30 Hz tone at 60 dB SPL and a 1 kHz tone at 0 dB SPL and a 3.5 kHz tone at about -10 dB SPL and a 10 kHz tone at about 10 dB SPL would all appear to have the same loudness level (since they are all just audible).
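The handful of threshold values quoted above can be strung together into a rough lookup table. This is only a sketch that interpolates the four quoted points on a log-frequency axis; real measured contours (e.g. ISO 226) are far denser and smoother:

```python
import numpy as np

# (frequency in Hz, just-audible level in dB SPL), as quoted in the text
freqs_hz  = [30.0, 1000.0, 3500.0, 10000.0]
thresh_db = [60.0,    0.0,  -10.0,    10.0]

def threshold_at(f_hz):
    # Linear interpolation between the quoted points on a log-frequency axis
    return float(np.interp(np.log10(f_hz), np.log10(freqs_hz), thresh_db))
```

Any tone that sits on this curve is, by construction, "just audible" – which is why all four of the quoted tones would appear to have the same loudness level.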
Let’s now re-do the test, but we’ll change the instructions slightly. I’ll give you a volume knob instead of a button and I’ll play two tones at different frequencies. The volume knob only changes the level of one of the two tones, and your task is to make the two tones the same apparent level. If you do this over and over for different frequencies, and you plot the results, you might wind up with something like the red or the top curves in Fig 1. These are called “Equal Loudness Contours” (some people call them “Fletcher-Munson Curves” because the first two researchers to talk about them were Fletcher and Munson) because they show how loud different frequencies have to be in order for you to think that they have the same loudness. So, (looking at the red curve) a 40 Hz tone at 100 dB SPL sounds like it’s the same loudness as a 1 kHz tone at 70 dB SPL or a 7.5 kHz tone at 80 dB SPL. The loudness level that you think you’re hearing is measured in “phons” – and the phon value of the curve is its value in dB SPL at 1 kHz. For example, the red curve crosses the 1 kHz line at 70 dB SPL, so it’s the “70 phon” curve. Any tone that has an actual level in dB SPL that corresponds to a point on that red line will have an apparent loudness of 70 phons. The top curve is the 90 phon curve.
Figure 2 shows the Equal Loudness Contours from 0 phons (the Threshold of Hearing) to 90 phons in steps of 10 phons.
There are two important things to notice about these curves. The first is that they are not “flat”. In other words, your ears do not have a flat frequency response. In fact, if your ears were measured the same way we measure microphones or loudspeakers, you’d have a frequency response specification that looked something like “20 Hz – 15 kHz ±30 dB” or so… This isn’t something to worry about, because we all have the same problem. So, this means that the orchestra conductor asked the bass section to play louder because he’s bad at hearing low frequencies, and the recording engineer balancing the recording adjusted the bass-to-midrange-to-treble relative levels using his bad hearing, and, assuming that the recording system and your playback system are reasonably flat-ish, then hopefully, your hearing is identically bad to the conductor’s and the recording engineer’s, so you hear what they want you to.
However, I said that there are two things to notice – that was just the first thing. The second thing is that the curves are different at different levels. For example, if you look at the 0 phon curve (the bottom one) you’ll see that it rises a lot more in the low frequency region than, say, the 90 phon curve (the top one) relative to their mid-range values. This means that, the quieter the signal, the worse your ability to hear bass (and treble). For example, let’s take the curves and assume that the 70 phon line is our reference – so we’ll make that one flat, and adjust all of the others accordingly and plot them so we can see their differences. That’s shown in Figure 3.
What does Figure 3 show us, exactly? Well, one way to think of it is to go back to our “recording engineer vs. you” example. Let’s say that the recording engineer who did the recording set the volume knob in the recording studio so that (s)he was hearing the orchestra with a loudness at the 70 phon line. In other words, if the orchestra was playing a 1 kHz sine tone, then the level of the signal was 70 dB SPL at the listening position – and all other frequencies were balanced by the conductor and the engineer to appear to sound the same level as that. Then you take the recording home and set the volume so that you’re hearing things at the 30 phon level (because you’re having a dinner party and you want to hear the conversation more than you want to hear Beethoven or Justin Bieber, depending on your taste or lack thereof). Look at the curve that intersects the -40 dB line at 1 kHz (the 4th one from the bottom) in Figure 3. This shows you your sensitivity difference relative to the recording engineer’s in this example. The curve slopes downwards – meaning that you can’t hear bass as well – so, your recording playing in the background will appear to have a lot less bass and a little less treble than what the recording engineer heard – just because you turned down the volume. (Of course, this may be a good thing, since you’re having dinner and you probably don’t want to be distracted from the conversation by thumpy bass and sparkly high frequencies.)
Part 2: Compensation
In order to counter-act this “misbehaviour” in your hearing, we have to change the balance of the frequency bands in the opposite direction to what your ears are doing. So if we just take the curves in Figure 3 and flip each of them upside down, you have a “perfect” correction curve showing that, when you turn down the volume by, say 40 dB (hint: look at the value at 1 kHz) then you’ll need to turn up the low end by lots to compensate and make the overall balance sound the same.
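To make the normalise-and-flip procedure concrete, here’s a small sketch in Python. The numbers in it are illustrative placeholders, not real equal-loudness data (the real curves come from measurements like ISO 226) – only the procedure matters: normalise each curve to its 1 kHz value (as in Figure 3), then take the difference and flip it to get the correction (as in Figure 4).

```python
import numpy as np

freqs = np.array([40, 100, 1000, 7500])  # Hz

# dB SPL needed for equal apparent loudness at each frequency.
# These values are made-up placeholders, not measured contour data.
curve_70_phon = np.array([100.0, 85.0, 70.0, 80.0])  # engineer's monitoring level
curve_30_phon = np.array([75.0, 55.0, 30.0, 42.0])   # quiet dinner-party level

# Normalise each curve to its value at 1 kHz (the construction of Figure 3)
i1k = int(np.where(freqs == 1000)[0][0])
ref_norm = curve_70_phon - curve_70_phon[i1k]
you_norm = curve_30_phon - curve_30_phon[i1k]

# Flip the difference upside down: how much extra level each band needs
# at the quiet setting to sound balanced the way the engineer heard it
correction_db = you_norm - ref_norm
# correction_db is 0 dB at 1 kHz and positive (a boost) in the bass
```

With these placeholder curves, the correction is largest at 40 Hz and zero at 1 kHz – the same shape as the curves in Figure 4.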
Of course, these curves shown in Figure 4 are normalised to one specific curve – in this case, the 70 phon curve. So, if your recording engineer was monitoring at another level (say, 80 phons) then your “perfect” correction curves will be wrong.
And, since there’s no telling (at least with music recordings) what level the recording and mastering engineers used to make the recording that you’re listening to right now (or the one you’ll hear after this one), then there’s no way of predicting what curve you should use to do the correction for your volume setting.
All we can really say is that, generally, if you turn down the volume, you’ll have to turn up the bass and treble to compensate. The more you turn down the volume, the more you’ll have to compensate. However, the EXACT amount by which you should compensate is unknown, since you don’t know anything about the playback (or monitoring) levels when the recording was done. (This isn’t the same for movies, since re-recording engineers are supposed to work at a fixed monitoring level which should be the same as all the cinemas in the world… in theory…)
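Since the exact amount of compensation is unknowable, any practical implementation is a rule of thumb. Here’s a toy sketch of the idea – the boost rates are invented numbers for illustration, not any product’s actual tuning:

```python
def loudness_boost_db(volume_below_ref_db, bass_rate=0.4, treble_rate=0.15):
    """Toy auto-loudness rule of thumb: the further below reference
    volume you listen, the more bass (and a little treble) boost you
    get.  The rates here are made-up illustrative values."""
    drop = max(0.0, volume_below_ref_db)  # no cut when at/above reference
    return bass_rate * drop, treble_rate * drop

# Listening 40 dB below reference: roughly 16 dB of bass boost
# and 6 dB of treble boost, with this toy tuning
bass, treble = loudness_boost_db(40)
```

The important property is only the trend: more attenuation means more compensation, and the bass gets more of it than the treble.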
This compensation is called “loudness” – although in some cases it would be better termed “auto-loudness”. In the old days, a “loudness” switch was one that, when engaged, increased the bass and treble levels for quiet listening. (Of course, what most people did was hit the “loudness” switch and leave it on forever.) Nowadays, however, this is usually automatically applied and has different amounts of boost for different volume settings (hence the “auto-” in “auto-loudness”). For example, if you look at Figure 5 you’ll see the various amounts of boost applied to the signal at different volume settings of the BeoPlay V1 / BeoVision 11 / BeoSystem 4 / BeoVision Avant when the default settings have not been changed. The lower the volume setting, the higher the boost.
Of course, in a perfect world, the system would know exactly what the monitoring level was when they did the recording, and the auto-loudness equalisation would change dynamically from recording to recording. However, until there is meta-data included in the recording itself that can tell the system information like that, there will be no way of knowing how much to add (or subtract).
I mentioned above that the extra sensitivity we have in the 3 kHz region is there due to evolution. In fact, it’s a natural boost applied to the signal hitting your eardrum as a result of the resonance of the ear canal. We have this boost (I guess, more accurately, we have this ear canal) because, if you snap a twig or step on some dry leaves, the noise that you hear is roughly in that frequency region. So, once-upon-a-time, when our ancestors were something else’s lunch, the ones with the ear canals and the resulting mid-frequency boost were more sensitive to the noise of a sabre-toothed tiger stepping on a leaf while trying to sneak up behind them, and had a little extra head start when they were running away. (It’s like the T-shirt you can buy when visiting Banff, Alberta: “I don’t need to run faster than the bear. I just need to run faster than you.”)
As an interesting side note to this: the end result is that our language has evolved to use this sensitive area. The consonants in our speech – the “s” and “t” sounds, for example – sit right in that sensitive region, making us easier to understand.
You might come across some YouTube video or a downloadable file that lets you “check your hearing” using a swept sine wave. Don’t bother wasting your time with this. Unless the headphones that you’re using (and everything else in the playback chain) are VERY carefully calibrated, you can’t trust anything about such a demonstration. So don’t bother.
Warning note #2 – Post script…
I just saw on another website here that someone named John Duncan made the following comment about what I wrote in this article. “Having read it a couple of times now, tbh it feels like it is saying something important, I’m just not quite sure what. Is it that a reference volume is the most important thing in assessing hifi?” The answer to this is “Exactly!” The moral of the stuff I talk about above is that, if you compare two sound systems (say, two different loudspeakers, two different DACs, two different amplifiers and so on), not only do you have to make sure that you only change one thing in the system (for example, don’t compare two DACs using a different pair of loudspeakers connected to each one), you absolutely must ensure that the two things you’re comparing are at EXACTLY the same listening level. A difference of 1 dB will have an effect on your “frequency response” and make the two things sound like they have different timbral balances – even when they don’t.
For example, when I’m tuning a new loudspeaker at work, I always work at the same fixed listening level. (For me, this is two channels of -20 dB FS full-band uncorrelated pink noise producing 70 dB SPL, C-weighted, at the listening position.) Before I start tuning, I set the level to match this so that I don’t get deceived by my own ears. If I tuned loudspeakers quieter than this, I would push up the bass to compensate. If I tuned louder, then I would reduce the bass. This gives me some consistency in my work. Of course, I check to see how the loudspeakers sound at other listening levels, but, when I’m tuning, it’s always at the same level.
This was a presentation I did at a technical press event where we showed the BeoVision Avant for the first time. The video was done by one of the journalists at the event. The associated article (in Danish) is here at http://www.recordere.dk.
If you take a careful look around the connection panel of almost any Bang & Olufsen loudspeaker, you’ll find a three-position switch that is labelled something like “Free / Wall / Corner” or “F / W / C” or “Pos 1 / Pos 2 / Pos 3”. What does this do and how should you use it?
Part 1: Unreal acoustics
Let’s pretend that you have a loudspeaker that is perfectly omnidirectional, and it is in a free field (meaning that the sound that radiates from it is free to propagate forever without hitting anything – in other words, it’s floating in infinite space). Let’s then say that we measure the magnitude response of that loudspeaker and we find out that it has a perfectly flat response from 0 Hz to infinity Hz. Remember that the source is perfectly omnidirectional, so the response will be the same regardless of which direction you measure it from. This also means that if you do a lot of magnitude response measurements around the source and average them, it will also be flat, since the average of a whole bunch of the same thing is the same as any one of the things (i.e. the average of 5 & 5 & 5 & 5 & 5 & 5 & 5 is 5).
Now let’s divide the infinite space in half with a very large, perfectly flat wall that extends infinitely – and we’ll put it fairly close to the loudspeaker. Now, if we do a magnitude response measurement at one position, we’ll see a response that is comprised of alternating boosts and cuts as we go up in frequency. This is caused by the interference between the direct sound of the loudspeaker and its reflection off the wall. These two sounds arrive at the measurement microphone at two different times – which means that different frequencies arrive with different phase relationships. The higher the frequency, the greater the phase difference between the direct and reflected sounds. And, depending on the phase at any one frequency, the result may either be constructive interference (where the two signals add) or destructive interference (where they cancel each other).

If it helps, an easier way to think of this is that the wall is a mirror that results in a reflection of the loudspeaker on the other side of it. The sound that arrives at the microphone is the combination of the two loudspeakers (the real one and the one on the other side of the mirror).

If we do an averaged pressure response measurement, the averaging means that we lose the phase information in each of our individual measurements. However, each individual response that we measure has peaks and dips that affect how it adds to the other responses. In the very low frequencies, the “two” loudspeakers are very close together relative to the very long wavelengths of low frequencies in air – so they add together almost perfectly. This means that the total output will be doubled at the very low end – 200% of the output (or 6 dB more) than without the wall. At very high frequencies, the outputs of the two loudspeakers add randomly – sometimes increasing, sometimes cancelling the total.
The end result of this average is a messy response, but it is roughly 141% of (or 3 dB more than) the level without the wall. (Note that the low end is 2 times louder because there are two “sources” – the real one and the reflected one – whereas the high end is 1.41 times louder. 1.41 is the square root of 2; the reason for this involves power being proportional to the square of the pressure, so doubling the power results in multiplying the pressure by sqrt(2).) Take a look at Figures 2 and 3. You’ll see that the result of placing the theoretical wall near the theoretical loudspeaker is that the low end and the high end are boosted – but the low end is boosted about 3 dB more than the high end. If you compare Figures 2 and 3, you’ll see that the closer the loudspeaker is to the wall, the higher the top frequency of the “low end”.
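The mirror-image model above can be sketched in a few lines of Python. The wall and microphone distances are assumed round numbers (not taken from the figures), and the small level loss of the reflection over its longer path is ignored to keep the sketch simple:

```python
import numpy as np

c = 343.0        # speed of sound in air, m/s
d_wall = 0.5     # loudspeaker-to-wall distance, m (assumed)

# Mirror-image model: the wall behaves like a second, identical
# loudspeaker behind it, whose sound travels an extra 2 * d_wall
extra_path = 2 * d_wall
delay = extra_path / c   # seconds of delay between direct and reflected sound

freqs = np.logspace(1, np.log10(20000), 500)  # 10 Hz to 20 kHz

# Complex sum of the direct and reflected sound at the microphone
# (the reflection's extra level loss with distance is ignored here)
total = np.abs(1 + np.exp(-2j * np.pi * freqs * delay))
gain_db = 20 * np.log10(total)

# gain_db[0] is very close to +6 dB: at 10 Hz the extra path is tiny
# compared to the wavelength, so the "two" loudspeakers add in phase.
# Higher up, gain_db alternates between boosts and deep cancellations.
```

Plotting `gain_db` against `freqs` gives the alternating boosts and cuts described above, with the solid +6 dB shelf at the bottom of the frequency range.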
If you divide space once more, using a second wall that is perpendicular to the first (so now your speaker is on the floor, next to a wall, for example), you are doubling the number of “loudspeakers” again. Now we have one “real” loudspeaker and 3 reflections. Let’s forget about the magnitude response at one location for now and just deal with the power response, since that’s a little less complicated. Now we have 4 times the output (or 12 dB more) in the low frequencies and, 2 times the output (or 6 dB more) in the high frequency ranges. (Notice again that the multiplier for the output in the low end is the number of loudspeakers – either real or reflected – and that the multiplier for the output in the high frequencies is the square root of that number.)
Finally, let’s add one last wall, perpendicular to the other two (i.e. two walls and the floor). This results in a total of 8 sources (one real and 7 reflected), which means that the output will be 8 times louder (or 18 dB more) in the low end (than if the walls weren’t there) and 2.8 (sqrt(8)) times louder (or 9 dB more) in the high end.
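The pattern across the three cases above is easy to verify with a quick calculation – each added wall mirrors all the existing sources, doubling their number:

```python
import math

# Low-frequency boost: pressure multiplied by the number of sources
# (real + mirror images); high-frequency boost: multiplied by sqrt(N)
for walls in range(4):
    n = 2 ** walls                            # 1, 2, 4, 8 sources
    low_db = 20 * math.log10(n)               # 0, 6, 12, 18 dB
    high_db = 20 * math.log10(math.sqrt(n))   # 0, 3, 6, 9 dB
    print(f"{walls} wall(s): +{low_db:.0f} dB low, +{high_db:.0f} dB high")
```

This reproduces the numbers quoted above: 6/3 dB for one wall, 12/6 dB for two, and 18/9 dB for three.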
So, the first lesson to be learned here, for now, is that, in a theoretical world, where loudspeakers are perfectly omnidirectional and walls go forever, the more walls you have the bigger the bass boost. (Of course, you’ll also get a boost in the high end, but it will be smaller than the low-end boost, and you’ll probably compensate for that with the volume knob when the vocals and snare drum come in…) There is a second, nearly-as-important lesson. Look carefully, for example, at Figures 7 and 8. Starting in the low end, you can see the bass boost resulting from the collective reflections off the two walls. As you go up in frequency, you can see that the boost drops. However, before it levels out (albeit messily) at the high end, you can see that there is a deep drop in the level (i.e. in Figure 8, it’s at 100 Hz). This is because, for the particular wall distances we’re looking at, there is more cancellation of signals going on than there is constructive interference. So, the average is lower than if the walls weren’t there. This will be important later…
Part 2: Increasingly realistic acoustical behaviour
We can then take it one step further and consider that the very pretty graph shown in Figure 1 is extremely theoretical. A free field is an imaginary space – the reality is that a “free standing” loudspeaker is not really in a free field. For starters, it has to stand on something (unless you’re hanging it from the ceiling) – so the floor is not very far away – probably 1 m or so. Secondly, unless you live in a VERY large house, even when the loudspeaker is placed far from a wall, it’s probably not going to be tens of metres away from any wall. We can set a limit of something like 1 m on this – meaning, if you’re more than 1 m from any wall, we’ll call that “free”. This means that, in a real space, where the loudspeaker is at least 1 m from any surface, the response you get as a result of those three adjacent surfaces is roughly like the graph shown in Figure 8.

The implications of this previous paragraph, in the real world, are important. What this means is that, when we do the sound design for a loudspeaker, we have to choose its position in a room rather carefully. Typically, it’s in a “free” position, which means, in the real world, about 1 m from each of the two adjacent walls. (This isn’t measured exactly – everything in this article should be considered to be approximate.) (Of course, loudspeakers that are, in all likelihood, destined for a wall bracket are tuned on a wall instead.) So, the “free” position isn’t the same as the theoretical free field in Figure 1. It’s more like the not-very-close-to-a-surface case shown in Figure 8.

The behaviour of the loudspeaker in this location is then the “reference” – the goal is to ensure that, if a customer places the same loudspeaker against one wall or in a corner of two walls, it will sound the same as it does in the reference position. We do this by looking at the difference between the averaged response of the loudspeaker in the “wall” or “corner” location and the reference “free” position.
For example, if we were making perfectly omnidirectional loudspeakers, and we say that 30 cm from a wall is close enough to call the loudspeaker in a “wall” position, then we would subtract the response curve shown in Figure 8 (the reference “Free” response) from the response shown in Figure 4. This difference, shown in Figure 9, below, is the “eq curve” applied to a loudspeaker that is placed closer to a wall. So, you can see that we get a large bump in the low end (in this case, with these dimensions) around 100 Hz, and dips below and above this peak (at 20 Hz and around 400 Hz).
Similarly, if we compare a “corner” position 30 cm from three perpendicular walls, then we would subtract the response curve shown in Figure 8 (the reference “Free” response) from the response shown in Figure 7. This difference, shown in Figure 10, is the “eq curve” applied to a loudspeaker that is placed in a corner. So, you can see that we get a larger bump in the low end (in this case, with these dimensions), still around 100 Hz, and a dip above this peak (at around 300 Hz).
So, this means that, for these perfectly omnidirectional loudspeakers, considering only these dimensions, the equalisation filters we would have to apply to the loudspeaker to compensate for a “wall” or “corner” position would have to be the inverse of the curves in Figures 9 and 10. In other words, we would just flip them upside down to undo the change in the loudspeaker’s timbre as a result of its placement.

However, in real life, loudspeakers are not omnidirectional at all frequencies. In real life, they don’t even have the same directional characteristics (omnidirectional or not…) as themselves at all frequencies. Due to their physical shape, the size of the loudspeaker drivers and the choice of crossover frequencies (amongst other things…) a typical loudspeaker will radiate different frequencies at different levels in different directions – even if it has been equalised to be perfectly flat on-axis in a free field. In addition, other (perhaps unwanted) moving “parts” such as air flow in and out of a port, a slave driver or even a moving panel in the loudspeaker cabinet (see this article for a discussion about this) will not only affect the magnitude of a signal in a given direction, but also its phase relative to the on-axis response.

So, what impact does reality have on the rule-of-thumb lessons learned above? Let’s take an only-slightly-more-realistic example of a loudspeaker that is omnidirectional in the low frequency bands and very directional in the mid and upper frequency bands. Now, the energy in the low end will radiate forwards and backwards, reflecting off the wall (or walls) and still resulting in a boost. However, since the high frequency bands are not omnidirectional, you won’t get a boost from the reflections in the power response of the loudspeaker in the room. Consequently, the bass boost caused by the presence of the walls will be exaggerated due to the difference in directivity of the loudspeaker in different frequency bands.
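In code, the “flip it upside down” step is a single negation. Here’s a minimal sketch, using hypothetical placeholder responses in place of the real averaged measurements behind Figures 4 and 8:

```python
import numpy as np

# Hypothetical averaged magnitude responses in dB, on a shared
# frequency grid (placeholders for real measurements)
free_db = np.array([3.0, 5.0, 2.0, 0.0, 0.0])   # reference "Free" position
wall_db = np.array([6.0, 10.0, 1.0, 2.0, 1.0])  # same loudspeaker near a wall

placement_effect = wall_db - free_db    # the "eq curve" (as in Figure 9)
wall_correction = -placement_effect     # the filter for the "Wall" setting

# Applying the correction on top of the placement effect restores
# the reference ("Free") tonal balance:
assert np.allclose(wall_db + wall_correction, free_db)
```

The real correction is of course a smooth filter designed from dense measurement data, not a point-by-point subtraction – but the arithmetic of the idea is exactly this.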
Of course, the actual directivity of a loudspeaker is considerably more complicated and messy than a simple description like “omni in the low end and beaming in the high end” – but we won’t delve very far into the details of that in this article. Let’s just stop at “real life is complicated”. The end result of this is that, if we do the same math as I used to do the plots shown above, but we include the actual directivity measurements of the actual loudspeaker, then we can calculate the final equalisation curves that we need to make the wall and corner positions sound more like the free position. An example of these curves is shown below in Figure 11. Note that these curves are applied relative to the “free” setting which, in the case of this loudspeaker, is the reference position in which it was tuned during the sound design process. The two things to note here are the dip at around 100 Hz, which counteracts the boost that we see in the theoretical curves in Figures 9 and 10, and the slight boost around 200 Hz, which compensates for the dip that can be seen in Figures 9 and 10. The very low end is untouched, since there is very little difference in the extreme low end of the loudspeaker. This is because, in a normal room, you can’t get far enough away for the walls to “not exist” at 20 Hz – the wavelength of the very low end is just too big.
So, as you can see, the “Free / Wall / Corner” position switch, supplied on almost all Bang & Olufsen loudspeakers, is not merely a simple shelving filter with a 3 dB or 6 dB difference on the low end. It’s a rather complicated filter that is customised for each loudspeaker that we make, since it is dependent on the specific directional characteristics of that loudspeaker.
This week, instead of talking about what is inside the loudspeakers, let’s talk about what I listen for when sound is coming out of them. Specifically, let’s talk about one spatial aspect of the mix – where instruments and voices are located in two-dimensional space. (This will be a short posting this week, because it includes homework…)
Step 1: Go out and buy a copy of Jennifer Warnes’s album called “Famous Blue Raincoat: The Songs of Leonard Cohen” and play track 2 – “Bird on a Wire”.
Step 2: Close your eyes and really concentrate on where the various voices and instruments are located in space relative to your loudspeakers. If you hear what I hear, you’ll hear something like what I’ve tried to represent on the map shown in the figure below.
I’ve used some colour coding, just to help keep things straight:
Voices are in red
Drums are in blue
Metallic instruments (including cymbals) are in green
Bass is in gray
Synth and saxophone are in purple
Note that Jennifer sings her own backup vocals, so the “voice”, and the two “bk” (for backup – not Burger King) positions are all her. It also sounds like she’s singing in the “choir” on the left – but it’s hard for me to hear exactly where she is.
Whenever I’m listening to a pair of loudspeakers (or a car audio system, or the behaviour of an upmix algorithm) to determine the spatial properties, I use this map (which I normally keep in my head – not on paper…) to determine how things are behaving. The two big questions I’m trying to answer when considering a map like this revolve around the loudspeakers’ ability to deliver the (1) accuracy and (2) the precision I’m looking for. (Although many marketing claims will use these words interchangeably, they do not mean the same thing.)
The question of accuracy is one of whether the instruments are located in the correct places, both in terms of left and right, but also in terms of distance. For example, the tune starts with a hit on the centre tom-tom, followed immediately by the bigger tom-tom on the left of the mix. If I have to point at that second, deeper-pitched tom-tom – which direction am I pointing in? Is it far enough left-of-centre, but not hard over in the left loudspeaker? (This will be determined by how well the loudspeakers’ signals are matched at the listening position, as well as the location of the listening position.) Secondly, how far away does it sound, relative to other sound sources in the mix? (This will be influenced primarily by the mix itself.) Finally, how far away does it sound from the listening position in the room? (This will be influenced not only by the mix, but by the directivity of the loudspeakers and the strength of sidewall reflections in the listening room. I talked about that in another blog posting once-upon-a-time.)
The question of precision can be thought of as a question of the size of the image. Is it a pin-point in space (both left/right and in distance)? Or is a cloud – a fuzzy location with indistinct edges? Typically, this characteristic is determined by the mix (for example, whether the panning was done using amplitude or delay differences between the two audio channels), but also by the loudspeaker matching across the frequency range and their directivity. For example, one of the experiments that we did here at B&O some years ago showed that a difference as small as 3 degrees in the phase response matching of a pair of loudspeakers could cause a centrally-located phantom image to lose precision and start to become fuzzy.
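For readers unfamiliar with amplitude panning: the most common scheme is the constant-power (sine/cosine) pan law, sketched below. This is a generic mixing technique, not a claim about how this particular recording was mixed:

```python
import math

def constant_power_pan(pos):
    """pos: -1.0 = hard left, 0.0 = centre, +1.0 = hard right.
    Returns (left_gain, right_gain) with left^2 + right^2 == 1,
    so the total acoustic power stays constant across the image."""
    angle = (pos + 1.0) * math.pi / 4.0   # maps pos to 0 .. pi/2
    return math.cos(angle), math.sin(angle)

# Centre: both channels at about 0.707 (-3 dB each), which is what
# creates a phantom image between the two loudspeakers
left, right = constant_power_pan(0.0)
```

When the two loudspeakers are well matched and you sit in the sweet spot, those two equal-level signals fuse into a single centred phantom image; mismatches in level or phase between the loudspeakers smear it.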
Some things I’ve left out of this map:
The locations of the individual voices in the “choir”
Extra cowbells at around 2:20
L/R panned cabasa (or shaker?) at about 2:59
Some additional notes:
The triangles on the right side happen around 2:12 in the tune. The ones on the left come in much earlier in the track.
The “synth-y fx around 2:20” might be a guitar with a weird modulation on it. I don’t want to get into an argument about exactly what instrument this is.
I’ve only identified the location of the bass in the choir. There are other singers, of course…
You might note that I used the term “two-dimensional space” in the beginning of this posting. In my head, the two dimensions are (1) the angle to the source and (2) the distance to the source. I don’t think in X-Y cartesian terms, but in polar terms.
An important thing to mention before I wrap up is that this aspect of a loudspeaker’s performance (accuracy and precision of phantom imaging) is only one quality of many. Of course, if you’re not sitting in the sweet spot, none of this can be heard, so it doesn’t matter. Also, if your loudspeakers are not positioned “correctly” (±30 degrees of centre and equidistant from the listening position) then none of this can be heard, so it doesn’t matter. And so on and so on. The point I’m trying to make here is that phantom image representation is only one of the many things to listen for, not only in a recording but also when evaluating loudspeakers.
Take a balloon and blow it up. It will look something like the drawing in the centre of Figure 1.
If you put your hands on the top and bottom of the balloon and compress them, you’ll make the balloon shorter, but you’ll also make it wider (as is shown on the left side of Figure 1). This is because the air inside the balloon is under a higher pressure when you squeeze your hands together, and that pressure pushes harder on the parts of the balloon that aren’t being held in by your hands.
If, instead, you grab the top and bottom of the balloon and stretch them further apart, you’ll make the balloon narrower (as is shown on the right side of Figure 1). This is because you’ve created more space inside the balloon, thus lowering the internal pressure pushing out on its walls. The lower the pressure, the less the air pushes outwards, so the balloon collapses.
What we’ve described here is basically the same as what happens to a sealed loudspeaker cabinet (if you’re not careful when you do its mechanical design). However, instead of your hands squeezing a balloon (a sealed “cabinet” of air), we use the signal from a power amplifier to pull the woofer into the cabinet. This increases the pressure inside the cabinet, since it’s sealed and there’s nowhere for the air to go. As a result, the trapped air tries to push outwards on the sides of the cabinet. If the walls of the cabinet are thin, then the cabinet itself will act like the balloon and expand.
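To get a feel for the forces involved, here’s a rough back-of-the-envelope estimate of the pressure change when a woofer moves into a sealed cabinet. All of the dimensions are assumed round numbers for illustration, not the specs of any particular loudspeaker:

```python
# Adiabatic pressure change when a cone compresses sealed cabinet air
# (small-displacement approximation): dP ≈ gamma * P0 * dV / V
# All dimensions below are assumed, round numbers for illustration.
gamma = 1.4        # adiabatic index of air
P0 = 101325.0      # ambient pressure, Pa
V = 0.020          # cabinet volume: 20 litres
Sd = 0.022         # cone area: 220 cm^2 (roughly an 8-inch woofer)
x = 0.005          # inward cone excursion: 5 mm

dV = Sd * x                 # air volume swept by the cone
dP = gamma * P0 * dV / V    # extra pressure pushing the walls outward
print(f"about {dP:.0f} Pa pushing outward on the cabinet walls")
```

With these assumed numbers the result is on the order of several hundred pascals, spread over every square metre of the inside of the cabinet, many times per second – which is why thin walls start to “breathe” like the balloon.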
Similarly, if you put a positive signal on the power amplifier, you’ll move the woofer out of the cabinet, reducing the air pressure inside it and “sucking” the sidewalls of the loudspeaker in (if they are so thin that they can move).
Generally speaking, you want a loudspeaker to behave as a piston in a baffle (like this). This means (in this case) that you want the woofer to move in and out of a loudspeaker cabinet and you don’t want the cabinet itself to move at all. (We’ll talk later about why this is a bad thing.)
There are a number of different ways to do this. One way is to make your loudspeaker cabinet out of something very, very, very stiff (and probably, as a result, very, very, very heavy). For example, we make our prototypes out of 22 mm or 25 mm thick MDF (and sometimes we use two sheets of it, glued back to back, to make sure that things are stiff enough). Most of our loudspeakers like the BeoLab 9, for this week’s example, have enclosures that are made of plastic, so we use some different methods to achieve the stiffness and rigidity we need to ensure that our enclosures are not singing along with the drivers.
The first step in ensuring that the walls of the loudspeaker cabinet are stiff enough is to make them thick enough. How thick is “enough” depends on the loudspeaker itself. A subwoofer with a small enclosure volume will have to withstand more internal air pressure than a midrange in a larger enclosure. Of course, we can’t simply make the walls of the cabinet out of overly thick plastic, since that would not only be an unnecessary waste of materials, it would increase the weight of the loudspeaker (a consideration for shipping) and the cooling time of the plastic when it’s in production (a consideration for the production line timing and costs).
A second way to make the sidewalls stiffer is to use ribs – usually on the inside of the cabinet. These are moulded as part of the sidewall itself – they aren’t just glued on to the inside surface of the cabinet. If you take a look at Figure 3, you can see the ribs on the inside of the sidewalls of the BeoLab 9. These run diagonally and vertically in the case of this loudspeaker – but they’re not just randomly placed inside the loudspeaker. They have been strategically placed using simulations in the early development stages, and measurements of the early prototypes. These measurements are done using very small accelerometers glued to the sides of the loudspeakers and monitoring their outputs while playing signals through the loudspeaker drivers. (In case you’re wondering, the depth of the ribs is about 22 – 25 mm, depending on where you measure.)
A third tactic is to use a plastic that is stiffened by adding things to it. The usual method is to use fibre-reinforced plastic. The fibres in the plastic help give it a structural strength that you can’t get from just plastic alone.
A fourth possibility is to create a laminate material where you build up the enclosure using layers of different materials (or layers of the same material with different structural composition). This increases stiffness in the same way that a sheet of plywood is stiffer than a sheet of wood.
So, in the case of the BeoLab 9 (as with many of our other loudspeakers), three of these tactics were used. The plastic is thick enough, it has strengthening ribs on the inside, and it is a laminate (if you slice it open, you’ll see that it is a layer of foamed plastic, sandwiched between “skin” layers of solid plastic).
However, when they (I wasn’t part of the BeoLab 9 development team – I was still working in the Automotive Department at the time) got to the last stages of the development, it became obvious that there was a problem. There was an audible resonance caused by the woofer. Some digging around revealed that the sides of the cabinet were moving too much. In essence, the problem was almost exactly as I described with the balloon in Figure 1.
As I tried to show in Figure 4, when the woofer moved inwards, it pushed the sidewalls of the loudspeaker outwards (shown with the blue lines). When the woofer moved outwards, it sucked the sidewalls inwards (the red lines). (I’m over-simplifying here, but not enough to start a fight.)
However, as I said, this was discovered rather late in the development process. The problem had to be fixed, but the question was how to do it without starting from scratch and creating new moulding tools for making the plastic enclosure. The solution was to use the sidewalls to reinforce each other. Since the movement of the opposite sides was in opposite directions (i.e. the left and right sides of the loudspeaker either wanted to move apart or together at any given moment) they could be braced by connecting them with a bridge. Take a look again at Figure 3. You’ll see a metal rod that goes straight across the middle of the woofer enclosure. You can see it just above the back of the woofer in Figure 5 as well.
That piece became known in the acoustics department as the “dog bone” because its final version had the basic shape of a cartoon dog bone. The end result is that the rod is included in the BeoLab 9 construction to prevent the sidewalls of the woofer cabinet from moving when you’re playing loudly.
Here’s another photo showing the internal PCB with the amplifier and filters – not because it’s relevant to this discussion, but just because it might be interesting…
How is a loudspeaker cabinet like a nude ballet? (Or: so what?)
So, we’ve seen how we get rid of parts of the loudspeaker moving when they shouldn’t. The question is “why is it a big deal?” Well, it’s a little like Sir Robert Helpmann’s comment about the difficulties in choreographing a nude ballet – the problem is that some parts of the body keep moving when the other parts have stopped.
In theory, a loudspeaker should behave the same at all frequencies (that statement can mean a lot of different things – and I am happy to argue that it is both true and false, depending. So don’t try to start a fight with me on that one, and please don’t mis-quote me and say that I’m contradicting Siegfried Linkwitz or anyone else by taking that statement out of context on a hi-fi forum somewhere else…) As I mentioned in a previous posting, we like to pretend that a loudspeaker is just a moving piston in an infinite baffle, since that behaves pretty well. Of course, no one actually believes that this model is true – but it’s a comforting ideal.
Take a look at the shape of the blue and red versions of the loudspeaker cross-section in Figure 4, above. You might notice that, when the woofer goes outwards, the cabinet goes inwards. When the woofer goes inwards, the cabinet goes outwards. (This is an oversimplification of the truth, but let’s go with it for now.) This means that, from the point of view of the air pressure radiating from the entire loudspeaker, the woofer goes positive and the cabinet goes negative at the same time. So, the sound pressure radiating off the cabinet cancels the sound pressure radiating off the woofer, as can be seen in the example of two opposite-polarity sources causing destructive interference in the animation below. (Note the intersection of the red and green curves, where you will hear no sound at all…)
Not only that, but the cabinet as a sound source has a very different directivity than the woofer by itself. So, the result is… complicated. The truth is even more complicated, since the cabinet will not behave as nicely as this – it will resonate better at some frequencies than others, making some notes “bloom” – they’ll appear to be louder (or quieter, depending on where you are and how the room interacts with the loudspeaker) and they might even appear to come from a different direction.
So, the moral of the story here is that you want the parts of the loudspeaker that aren’t supposed to move to stay still while the parts that are supposed to be moving are doing so.
And, before you go and listen to your loudspeakers and look for “blooming” and blaming it on panel resonances or vibrating cabinets, don’t forget that your room modes are more likely a primary source of mis-behaviour when it comes to some bass notes sounding different from the others. However, if you have problems with your room acoustics, we can’t fix that with ribs and dogbones – unless you also bring in a large dog. (I know, I know – the linked paper is about humans – but it does make a mention of animals in the abstract…)
Before we start talking about curves and corners, let’s have a quick review on the concept of interference. At its most fundamental level, sound is just a relatively small, relatively fast change in barometric pressure. If the instantaneous pressure is higher than average (which happens to be the same as the pressure inside your head), then your eardrum is pushed into your head. If the instantaneous pressure is lower than average, then your eardrum is pulled out of your head. When your eardrum moves in and out, you hear sound.
One way to create a high pressure is to take a loudspeaker driver and push it outwards. In order to create a low pressure, we pull it inwards, as is shown in the animation below. The red thing is a piston which is basically the way we like to pretend a loudspeaker driver (like a tweeter) behaves. The grey thing is a very, very wide loudspeaker cabinet. The red semicircles show the high pressure zones that expand outwards from the front of the loudspeaker. The green semicircles show the low pressure zones.
If you have two sound sources, their pressure differences (relative to the average pressure) add. So, if you have two high pressures arriving at your eardrum, it will be pushed farther into your head than if only one of them arrived at your ear. Similarly, if you get two low pressures arriving at your eardrum, it will be pulled further out of your head than if only one of them was present. HOWEVER, if you have a high pressure and a low pressure arriving at the same time in the same place (for example, at your eardrum) then they cancel each other and, if they have the same magnitude, your eardrum won’t move at all and you won’t hear anything. (This is how noise-cancelling headphones work. The sound from the headphones is in theory identical to the sound coming to your ears from outside the headphones, except that it’s opposite in polarity, so the sum of the two sounds is nothing.)
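The pressure summation described above is nothing more than addition of signals. As a minimal sketch (the signal values here are invented purely for illustration, not taken from any measurement):

```python
import math

# Two sinusoidal pressure signals of equal magnitude.
# If they have the same polarity, the sum is twice as big;
# if one is inverted (opposite polarity), the sum cancels to zero.
def pressure(amplitude, phase, t, freq=100.0):
    return amplitude * math.sin(2 * math.pi * freq * t + phase)

t = 0.001  # some instant in time, in seconds

same = pressure(1.0, 0.0, t) + pressure(1.0, 0.0, t)
opposite = pressure(1.0, 0.0, t) + pressure(1.0, math.pi, t)

print(same)      # twice the single-source pressure: your eardrum moves farther
print(opposite)  # effectively zero: the two sources cancel and you hear nothing
```

The second sum is how noise-cancelling headphones work: the same signal, opposite in polarity, added to the original.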
Keep all that in mind as you read on…
It should not come as a surprise that a sound wave will bounce off a hard surface like a flat concrete wall. The question is “why?” The answer to this question can be complicated – but the simple version is that the molecules in a concrete wall are harder to move than the molecules in air – so we have a change in acoustic impedance. This is essentially a measure of how easily the molecules in the substance are moved by a sound wave… sort of… (Let’s leave it at that, since we really don’t need this article to be a thorough discussion of acoustic impedance.)
The interesting thing is that an acoustic wave will be reflected off any change in impedance. So, you don’t have to be going from a low impedance to a high impedance (as in the case of a sound wave trying to move from air into concrete). It will also reflect on a boundary where you change from a high to a low impedance (for example, a sound wave in concrete trying to get out into air – in this case, the sound wave will bounce off the surface of the concrete and move back into it rather than “leak” out into the surrounding air.).
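How much of the wave reflects at such a boundary (and with which polarity) can be estimated with the standard pressure reflection coefficient, R = (Z2 − Z1) / (Z2 + Z1). A small sketch – the impedance values are only illustrative orders of magnitude, not precise material data:

```python
def reflection_coefficient(z1, z2):
    """Pressure reflection coefficient for a wave travelling
    from impedance z1 into impedance z2."""
    return (z2 - z1) / (z2 + z1)

z_air = 415.0        # approx. characteristic impedance of air (rayls)
z_concrete = 8.0e6   # illustrative order of magnitude for concrete (rayls)

# Air -> concrete: R close to +1, a nearly total reflection
print(reflection_coefficient(z_air, z_concrete))

# Concrete -> air: R close to -1, a nearly total (inverted) reflection,
# so the wave bounces back into the concrete rather than "leaking" out.
print(reflection_coefficient(z_concrete, z_air))
```

Notice that both directions give an |R| close to 1 – which is the point made above: any change in impedance, in either direction, causes a reflection.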
Imagine yourself standing in a long tunnel. You clap your hands, and the sound wave travels down the length of the tunnel until it reaches the end – what happens then? Well, the answer is “two things”. Some of the sound leaks out of the tunnel. However, since the air inside the tunnel has a higher acoustic impedance than the air outside the tunnel (because the sound is freer to go where it wants on the outside), the end of the tunnel is a boundary where there is a change in acoustic impedance. And, as we saw in the last paragraph, this means that we will get a reflection. So, even though the end of the tunnel is open, it reflects your hand clap back into the tunnel. So, some sound leaks out and some reflects. (One way to really experience this is to notice your ears pop when you enter a long tunnel on a fast-moving train. When you first enter the tunnel, your ears pop because of the sudden change in pressure. Some time later, you might notice that your ears pop again. This is because the high-pressure wave front that the train made when it entered the tunnel travelled to the opposite end of the tunnel, bounced back and hit you again.)
If you didn’t know that the second sound was a reflection off the end of the tunnel (for example, you didn’t hear the first hand clap because you were wearing earplugs) you might think that it was a direct sound from someone down at the far end of the tunnel. So, if you’re not the person doing the clapping, but you’re in the tunnel with that person, you get two sounds – the direct sound and the reflection.
There is another, less obvious case where you have a change in acoustic impedance. This is when you have a sound wave travelling along next to the surface of something, and the surface ends. For example, if, in the animation at the top of this page, the surface of the loudspeaker cabinet were not as wide, there would be a corner where the face of the loudspeaker meets its side. At that corner, the acoustic wave front “sees” a change in acoustic impedance. Consequently, there is something like a reflection that starts at the corner. In essence, the corner of the loudspeaker is a boundary that radiates like a second sound source (just like the end of the tunnel in the example above).
So, if we modify the animation at the top of the page to include a narrower cabinet, the result would be something like the animation below.
As you can see there, the corner of the loudspeaker becomes a second source that radiates its own sound waves after the original, direct sound hits it.
This effect is called acoustic diffraction and it has some significant implications on the sound of a loudspeaker. This is directly because of the interference (see above…) between the direct sound from the loudspeaker driver and the secondary sound source caused by the corner.
Remember we saw above in Fig.1 that, when you have two high pressure zones that meet each other, you get more pressure than either one of them alone. Now take a look at the animation above and look for the places where the black curves from the two “sources” intersect. This is where you’ll get an increase in pressure, and therefore more energy than just the direct sound by itself. As you can see in Fig. 3, below, this means that you have an angle off-axis to the front of the loudspeaker where the signal is louder than it is directly on-axis. Of course, this also means that there will be some angles where you hear less (because the secondary wavefront from the corner cancels the direct sound) – these are where the red and green curves (the high and low pressure zones) intersect.
As you can see in Fig 3a, there are different angles where the high pressures add to give you an even higher pressure (notice that the low pressures also add to give you an even lower pressure). The result is that, along those lines, you get constructive interference and therefore the sound is louder than it is elsewhere. I’ve only shown three such angles in this diagram – there are more. You might note as well that the origin of the high pressure lobes is not the centre of the loudspeaker driver (the piston shown in red). It’s somewhere between the primary and secondary sources (in this case, the loudspeaker driver and the corner).
Fig 3b shows two angles where the high and the low pressures overlap, causing destructive interference and therefore cancellation. Therefore, the sound is quieter along those lines than it is elsewhere.
The real world
What does all of this mean in the real world?
Well, as you’ve probably already guessed, the first conclusion is that building a loudspeaker that has sharp corners is probably a bad idea. For example, if you wanted to build a loudspeaker, and you just put a tweeter on the front and made sharp right angles where the sides meet the front, you will have a problem with diffraction off those corners. As you can see in Figure 3, you will get a boost in the signal at some angles off-axis to the front of the loudspeaker, and some cancellation at other angles. The amount by which the signal will be boosted, the angles where you’ll have the effects, and the frequencies where you’ll have the problems are all dependent on the specific dimensions of the device. For example, the farther the loudspeaker’s corner is from the driver, the lower the frequency that will be affected.
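To get a rough feel for the numbers: if the path travelled via the corner is Δd metres longer than the direct path, the interference pattern repeats roughly every c/Δd Hz (with c ≈ 343 m/s). This sketch is only a back-of-the-envelope illustration of that scaling, not a diffraction model:

```python
SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def ripple_spacing_hz(path_difference_m):
    """Approximate frequency spacing of the interference ripples
    for a given extra path length via the cabinet corner."""
    return SPEED_OF_SOUND / path_difference_m

# Compare a corner whose extra path is 10 cm with one of 40 cm:
print(ripple_spacing_hz(0.10))  # ripples spaced roughly 3430 Hz apart
print(ripple_spacing_hz(0.40))  # ripples spaced roughly 858 Hz apart
```

The second case shows the rule of thumb from the text: a corner farther from the driver pushes the effects down to lower frequencies.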
Let’s take a real-world example. The very first prototype of the BeoLab 5 was really just a “normal” three-way loudspeaker that was used to test the ABC algorithm. So, there was a woofer in a cabinet with a microphone for the ABC development, but on top was just a small cabinet with a midrange driver and a tweeter, as you can see in Figures 4 and 5.
Figure 6 shows the “conventional” tweeter cabinet version of one of the BeoLab 5 prototypes, which was placed on top of the white woofer cabinet shown in Figure 4 when the Acoustic Lens assembly was removed. As you can see, this is an example of how-not-to-make-a-loudspeaker (if you’re worried about diffraction). We have a tweeter in a flat surface and some sharp-ish corners at the sides of the loudspeaker face. The result of this is that we have exactly the same problem shown in Figures 3a and 3b, above. We can see this in the measurement of the horizontal directivity of the loudspeaker, shown in Figure 7, below.
It may be a little difficult to read this plot, so I’ll explain a little. The entire plot has been normalised to the on-axis magnitude response of the tweeter. In other words, the measurement doesn’t show the response of the tweeter – it shows how the response changes as you move around the loudspeaker in the horizontal plane. The X-axis is the frequency of the signal in Hz, ranging from 1.8 kHz to 20 kHz. The Y-axis is the horizontal angle of radiation of the loudspeaker, where 0° is directly on-axis, in front of the tweeter. The lines in the plot can be thought of as a kind of topographical map with a difference of 0.5 dB per contour. So, if you think of a straight “ridge” in the plot along the 0° line in the middle, the plot generally falls off (in other words, the signal is quieter) as you move around to the side and back of the loudspeaker. You can see that, at the high frequencies, the lines are closer together, which means that you lose more level at high frequencies than at low frequencies as you come around to the side of the loudspeaker. This is traditionally called loudspeaker “beaming”. The interesting things to look at are the four red oval areas. The larger ones are centred around 3.2 kHz and ±40°. The smaller ones are up at about 7.5 kHz and about ±15°. Because they’re in red, this means that they are louder than the on-axis response, so they are peaks in the topographical map. These peaks are the direct result of diffraction off the edge of the loudspeaker cabinet. I count 4 red contour lines at the lower frequency peak, which means that we have a beam that is about 2 dB (remember, 0.5 dB per line * 4 lines) louder around 3 kHz at 40° off-axis to the loudspeaker.
This cabinet was built to compare the directivity of a normal box-shaped loudspeaker to one with an Acoustic Lens. A close-up of the lens used for this comparison is shown below in Figure 8.
You’ll note in Figure 8 that the Acoustic Lens is slightly different from the final version (hence the “first-generation” qualifier). This version also suffered from diffraction artefacts caused by the sharp edges where the face of the lens structure meets its side. This was corrected in the second-generation version shown below in Figure 9.
Notice that the second-generation lens has curved transitions from face to side to reduce the diffraction problem. This curvature was eventually extended to wrap around the entire structure as can be seen in the photo of the final BeoLab 5 tweeter lens in Figure 10, below.
You may notice that the difference in these two designs was that the original one had sharp corners on the sides. The diffraction effects of these corners were easily visible in the first directivity measurements of the Lens, so the second prototype with the curved transition from front to side was made to eliminate this problem. The directivity measurement of the prototype shown in Figure 9 is seen below in Figure 11.
You’ll see in Figure 11 that there are two significant differences between the directivity of a tweeter in the prototype Acoustic Lens and a conventional cabinet (shown in Figure 7). The first difference is that the beaming effect (seen as a convergence of the contour lines at the high frequencies in Figure 7) does not happen with the lens. The contour lines are much more parallel, resulting in a behaviour known as “constant directivity”. This is a way of saying that the loudspeaker has a directivity that is roughly the same throughout its entire frequency range (rather than beaming in the high end).
The second difference is that the peaks in the 3 kHz and 8 kHz areas, seen in Figure 7, are gone. This is because there are no corners at the edge of the loudspeaker cabinet to cause diffraction. You may note a peak in the magnitude responses off-axis above 15 kHz. We actually don’t know what causes this; however, since it is so high in frequency, only +1 to +1.5 dB, and this was still only a prototype, it wasn’t really considered to be a significant issue.
So, I’ve killed two birds with one stone in this article (or “two flies with one smack” as they say in Denmark). On the one hand, we’ve seen that, if you’re worried about the directivity and/or the off-axis response of your loudspeaker (I know, the latter is a sub-set of the former…), sticking a tweeter (or a midrange, or a woofer, depending on dimensions and frequency ranges) on the front of a rectangular box is probably a really bad idea. (On the other hand, it’s a pretty easy, and therefore cheap, way to build a speaker, which is why such a design is so popular, I guess…) And, on the other hand, we’ve seen one of the characteristics of Acoustic Lenses – a more constant directivity than a tweeter-on-a-box. The fact that the tweeter mounted in an Acoustic Lens had less diffraction is not because of the Lens geometry in particular, but because of the shaping of its surroundings as part of the development process of BeoLab 5.
There are more stories like this one. For example, if you look carefully at the “plates” of the BeoLab 5 and the prototype in Figure 4 (the part the tweeter and midrange drivers are mounted in), you might notice that the prototype plates are flat, whereas the BeoLab 5 plates curve downwards. This is not because someone thought the curve would look pretty. This was because the circular edge of the prototype plates also caused diffraction, resulting in a higher-level lobe in the vertical plane. Sloping the plates downwards puts their sharp edges in the “shadow” of the plates themselves, reducing the diffraction effects. So, you can see that diffraction and its effects on directivity is one of the other issues that we worry about when we’re building a loudspeaker.
Rather than talk about technologies inside B&O equipment, this week I’ll try to go through a couple of strategies on how to properly calibrate the main channels in a surround system – and how to do it improperly, but make it sound better for your friends. I’ll use the example of a BeoPlay V1, a BeoVision 11 or a BeoSystem 4 as the heart of the system – but the basic concepts are the same for any other surround processor.
Location, location, location
The first step in setting up any surround sound system is the correct placement of your loudspeakers. There are two standard configuration recommendations. The first is from the International Telecommunications Union, in a document called Recommendation ITU-R BS.775-2 – Multichannel stereophonic sound system with and without accompanying picture (available as a PDF file from the ITU here). The second is called Recommendations for Surround Sound Production from the Producers and Engineers Wing of the Recording Academy of the National Academy of Recording Arts and Sciences (or NARAS – better known as the people that bring you the Grammys). That recommendation can be downloaded as a PDF file from here.
The short versions of these two recommendations are as follows:
The ITU standard configuration is the one people who do research into multichannel audio use for their experiments. It’s also the one we use at Bang & Olufsen when we’re testing our loudspeakers in the Acoustics Department or tuning the parameters in the TrueImage upmixing algorithm. The nice thing about this configuration is that it matches a surround sound system for someone who sits on a sofa placed against a wall, and has their surround loudspeakers adjacent to the same wall.
In a perfect loudspeaker configuration, all of your loudspeakers are the same distance from the listening position. They have all been calibrated to have the same loudness at the listening position. Also, they are all large, full-range loudspeakers (and therefore, you do not need a subwoofer).
The Centre Front loudspeaker should be in the centre, at the front (we’ll call that 0°). The Left Front and Right Front loudspeakers should be at ±30° relative to that angle. The Left and Right Surround loudspeakers should be located symmetrically at an angle of between ±100° and ±120°.
The ITU-775 document doesn’t specifically state the standard configuration for a 7-channel system, but it does provide a recommendation for a 5-channel system that uses 7 loudspeakers (in cases where you have a larger system and you use two loudspeakers per surround channel). However, it is still a pretty good recommendation for a 7-channel setup. (This also makes sense, since, if you have 7 loudspeakers, you may occasionally like to use them as a 5-channel system without having to place extra loudspeakers in your room.) If you dig around, you’ll see that this also fits the typical setups used in re-recording studios for doing 7-channel mixes and mastering for Blu-ray releases of films. A good example of this is Tron Legacy, which was produced using a system very much like the one shown below – with matching loudspeakers at 0°, ±30°, ±90° and ±150°. (This also makes sense from a radial-symmetry perspective since, ignoring the centre channel, you have equal loudspeaker spacings of 60°.)
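That equal-spacing claim is easy to check with a little arithmetic – ignoring the centre channel, the six remaining loudspeakers sit 60° apart all the way around the circle:

```python
# Loudspeaker angles for the 7-channel layout, centre channel excluded,
# going clockwise around the listening position.
# (-150° = 210°, -90° = 270°, -30° = 330°)
angles = [30, 90, 150, 210, 270, 330]

# Gap between each loudspeaker and the next, wrapping around the circle.
spacings = [(angles[(i + 1) % len(angles)] - angles[i]) % 360
            for i in range(len(angles))]
print(spacings)  # every gap is 60 degrees
```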
The NARAS recommendation is a little different, although the people that wrote it were aware of the ITU recommendation (which came first…), so they made sure that their version didn’t contradict the existing standard. Their version uses the same layout for the front three loudspeakers, but suggests that the surround loudspeakers be a little further back – within the ±110° to ±150° angle, with an “optimal range” of ±135° to ±150°.
Like the ITU standard, the NARAS document also recommends that all loudspeakers be the same type of full-range loudspeakers, all the same distance from the listening position, all level-adjusted to be the same at the listening position.
The Real World
Both the ITU and the NARAS standards are really designed by and for people who work and live in recording studios or run perceptual experiments involving multichannel audio. This means that they have one chair and no friends – at least when they watch movies and listen to music… However, if you have a sofa and friends, then you will start having some questions – or at least some doubts.
For example, if you are “normal” (whatever that might mean) but a little careful about your surround sound setup, you probably have something that looks like the drawing below.
What happens if we were to calibrate this system “perfectly” using the centre of the sofa as our reference “sweet spot” as shown in Figure 4? We’d apply a delay to the Centre Front loudspeaker to make the time of arrival of its signals match the Left Front and Right Front loudspeakers (usually done by setting the Speaker Distance). We’d also apply a delay to the surround loudspeakers to do the same. We’d also probably drop the levels of the centre and surround loudspeakers to match the Left Front and Right Front signals (because they’re closer, and therefore louder).
However, let’s think about what happens if you sit on the left side of that sofa. Now, the Left Surround loudspeaker is very close to your left ear – and that has some serious implications for your experience. Firstly, since sound pressure doubles with every halving of distance, (assuming that this diagram is to scale) sitting on the left side of the sofa means that you’ll get roughly a 6 dB boost (possibly more, if you’re leaning…) in the signal from that one loudspeaker. In addition, since that loudspeaker is so close and its sound arrives at your listening position early, your brain will be able to figure out that the loudspeaker is close, because you’re pretty good at localising sources when they’re near your head. The same problem, albeit on a much smaller scale, happens with the centre loudspeaker. If its time-alignment delay is calibrated using the centre position, then, if you’re sitting on the left side of the sofa, the Left Front loudspeaker’s signal will arrive before the Centre Front. The end result of this is that, if you’re sitting on the side of the sofa, you’ll have too much from one of the surround loudspeakers and the intelligibility of the dialogue will be reduced a little.
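The “roughly 6 dB” figure follows directly from the inverse distance law for a point source: the level change is 20·log10(d_ref/d) dB. Assuming, purely for illustration, that the surround loudspeaker is 3 m from the sweet spot but only 1.5 m from the left seat:

```python
import math

def level_change_db(reference_distance_m, actual_distance_m):
    """Level change of a point source relative to the reference
    distance, using the inverse distance (1/r) law."""
    return 20.0 * math.log10(reference_distance_m / actual_distance_m)

# Halving the distance (3 m -> 1.5 m) gives roughly a +6 dB boost.
print(level_change_db(3.0, 1.5))  # ~ +6.02 dB
```

In a real room, reflections and the loudspeaker’s directivity will change this number somewhat, but the 1/r law is the right first approximation.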
So, how should we calibrate the system to make things a little better for your friends? Take a look at Figure 5, below.
What I’m trying to show with this diagram is that both the distance and the level for each loudspeaker should be measured to the closest person in your listening area. So, in this case, the Left Front and Left Surround loudspeakers are calibrated to the left position on the sofa. However, the Centre Front loudspeaker is calibrated at the centre of the sofa. The result of this is that the centre speaker will be delayed – but less than it would have been if you had calibrated it as in Figure 4, because the Left Front loudspeaker is closer to the person on the left side of the sofa than to the person in the centre of the sofa. Also, the Surround loudspeakers will be delayed much more than they would have been using the scheme in Figure 4. However, they’ll still be symmetrical (so the person in the “sweet spot” won’t feel like the surround channels are lopsided, and the friends on the sides of the sofa won’t notice that they’re sitting on top of a loudspeaker…). Also, this will result in the centre channel being a bit louder and the surround channels being a little lower in level – both of which are technically incorrect for the person in the sweet spot, but at least it’s a mistake in the right direction – so you’re improving the intelligibility of the dialogue.
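The “calibrate each loudspeaker to its closest listener” strategy can be sketched numerically. The distances below are invented for illustration, not taken from the figures; the delay makes each loudspeaker’s sound arrive as if it were as far away as the most distant one, and the gain compensates for the level difference:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

# Distance (m) from each loudspeaker to its *closest* listening
# position -- these numbers are purely illustrative.
distances = {
    "Left Front": 2.8,
    "Centre Front": 3.2,
    "Right Front": 2.8,
    "Left Surround": 1.2,
    "Right Surround": 1.2,
}

farthest = max(distances.values())

for name, d in distances.items():
    # Delay so that the arrival matches the farthest loudspeaker.
    delay_ms = (farthest - d) / SPEED_OF_SOUND * 1000.0
    # Closer loudspeakers are louder, so they get turned down.
    gain_db = 20.0 * math.log10(d / farthest)
    print(f"{name}: delay {delay_ms:.2f} ms, gain {gain_db:.1f} dB")
```

Note that the nearby surrounds end up with the largest delays and the biggest level cuts – exactly the behaviour described in the paragraph above.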
If you do calibrate the system this way, you’ll technically be incorrectly calibrated at the sweet spot, but your friends on the sides of the sofa will be much happier – and you won’t notice too much. Of course, if you have a BeoPlay V1, a BeoVision 11 or a BeoSystem 4, you can make this configuration just one of your nine available Speaker Groups – you can always use another one for a “perfect” calibration for the sweet spot when you’re home alone.
If, after aligning your system using this method, you still find that the dialogue is a little hard to understand, and the surrounds are a little hot (this is often the case when your sofa and the surround loudspeakers are all situated against the same wall), you should not be afraid to do the following:
make the centre channel one or two milliseconds early. You can do this by telling your surround processor that it’s about 30 to 60 cm farther away than it really is.
raise the level of the centre channel 1 or 2 dB
drop the level of the surrounds as much as necessary – in my experience, it’s not unusual to have to drop them by as much as 6 dB if you’re against the same wall with them. (Note that, if you have a BeoPlay V1, a BeoVision 11 or a BeoSystem 4, you can do this using the “Fader” adjustment in the Sound menus. This will merely control the relative levels of the Front and Surround / Back loudspeakers – so it’s a one-fader solution to doing it manually for each loudspeaker output.)
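Since most surround processors take a speaker distance rather than a delay time, the conversion from “milliseconds early” to “extra distance” is just the speed of sound:

```python
SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def ms_to_cm(milliseconds):
    """Extra 'speaker distance' to enter in order to advance a
    channel by the given number of milliseconds."""
    # 343 m/s * 1 ms = 343 mm; divide by 10 to get centimetres.
    return milliseconds * SPEED_OF_SOUND / 10.0

print(ms_to_cm(1))  # ~34 cm
print(ms_to_cm(2))  # ~69 cm
```

This is where the “about 30 to 60 cm” figure for one to two milliseconds comes from.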
If your listening area is larger, the technique is the same – you calibrate any given loudspeaker in the system to the closest listening position, and then tweak to taste.
I guess that the big message here is “just because your system is configured ‘correctly’ doesn’t mean that it can’t sound better”. Don’t be more afraid to tweak the adjustments on your calibration than you would be to add cream and sugar to your coffee, or salt and pepper to your meal in a restaurant. As Duke Ellington once said: “If it sounds good, it is good.”