Internal vs. External Volume Control

#93 in a series of articles about the technology behind Bang & Olufsen

A question came to my desk this week from a customer who would like to connect a third-party streaming device to his Beolab 50s. He plans to use a USB-Audio connection and his question was “Should I control the volume of the audio signal in the streamer or in the Beolab 50s?” There are three different ways to configure these two options:

  1. Control the volume in the streamer using its interface, and send a signal that has been volume-regulated to the Beolab 50s, which should then be set to have a start up default volume such that the maximum volume on the streamer results in a level that is as loud as the customer will ever want it to be. In order to do this, the Beolab 50s need to be set to ignore the volume information that is received on the USB-Audio connection.
  2. Set the streamer to output an unregulated signal, and set the Beolab 50s to obey the volume information that is received on the USB-Audio connection, then use the streamer’s interface for the volume control (which would actually be happening inside the Beolab 50s).
  3. Set the streamer to output an unregulated signal, and set the Beolab 50s to disobey the volume information that is received on the USB-Audio connection, then use the Beolab 50’s interface for the volume control (which would actually be happening inside the Beolab 50s).

Of course, one way to answer the question is “where do you want to control the volume?” For example, if it’s with a remote control for the Beolab 50s, then the answer is “use option #3”. If you’d prefer to use the streamer’s app, for example, then the answer is “use option #1 or #2”.

However, the question came to my desk because it was specifically about the technical performance of the audio signal. Which of these three options results in the highest audio “quality”? (I put the word “quality” in quotation marks because it is a loaded term, and might mean different things to different persons…)

The simplest answer without getting into any details is “it probably doesn’t matter“. However, that answer is based on a couple of assumptions that may or may not be wrong.

Hypothetically, the Beolab 50 can output an audio signal that peaks at about 122 dB SPL measured at 1 m in a free field, albeit not at all frequencies present at its output. (This is because there are some physical limitations of how far the woofers can move, which means that you can’t get 122 dB SPL at 20 Hz, for example.) The noise floor of the Beolab 50s is about 0 dB SPL measured in the same place (again, this is frequency-dependent). So, it has a total dynamic range at its output of about 122 dB.

The maximum output level is a result of a combination of the loudspeaker drivers, the amplifiers, and the power supply, however, these have all been chosen to reach their maximum outputs approximately simultaneously, so changing one of the three won’t make a big difference.

The noise floor is a result of the combination of the loudspeaker drivers’ sensitivities, the amplifiers’ noise floors, and the signal that feeds the amplifiers: the DAC outputs’ noise floors. For the purposes of this discussion, I’m sticking with a digital input, so we don’t need to worry about the noise floor of the ADC at the loudspeaker’s input.

If you have an audio signal at one of the digital inputs of the Beolab 50, and that signal is at its loudest possible level (for a sine wave, that’s 0 dB FS; or 0 dB relative to Full Scale). At Beolab 50’s maximum volume setting, this will produce a peak output level of 122 dB SPL (depending on the frequency as I mentioned above).

All digital inputs of the Beolab 50 accept at least a 24 bit word length. This means that the dynamic range of the digital input signal itself is about 6 * 24 – 3 = 141 dB. This in turn means that the hypothetical noise floor of a correctly-dithered 24-bit signal is 19 dB below the noise floor of the loudspeakers even at their maximum volume setting. (because 122 – 141 = -19)

In other words, if we assume that the streamer has a correctly-implemented gain function for its volume control, using TPDF dither implemented at the 24-bit level, then its noise floor will be 19 dB below the “natural” noise floor of the Beolab 50. Therefore, if the volume is controlled in the streamer, any artefacts will be masked by the 50s themselves.

On the other hand, the Beolab 50s volume control is done using a gain function that is performed in a 32-bit floating point calculation, which means that it has a dynamic range of 144 to 150 dB. (See this posting for an explanation and comparison of fixed point and floating point systems.) So the noise generated by the internal volume control will be somewhere between 22 and 26 dB below the “natural” noise floor of the Beolab 50.

So, (assuming my assumptions are correct) the noise floor that is produced by controlling the volume control in either the streamer or the Beolab 50s is FAR below the constant noise floor of the DAC / amplifiers.

In addition, the noise floors have roughly the same spectra (in other words, you don’t have pink noise in one case but white noise in the other; they’re all producing white noise). And since both are so far below, it really doesn’t matter. Arguing about whether the noise is 19 dB lower or 22 dB lower is a waste of good argument time, unless you paid for the four-and-a-half-hour argument instead of the five-minute one…

Important Notes

If the customer was asking about using the analogue input, then the answer MIGHT have been different.

Also, if my assumption about a 24-bit signal coming from the streamer, or that it has a correctly-implemented gain function for its volume control are incorrect, the this answer MIGHT be incorrect as well.

Turntable speed adjustment disc

One of the things on my to-do list today was to get a Bang & Olufsen Stereopladespiller Type 42 up and running. Unfortunately, I didn’t have a stroboscopic disc for testing the speed. Since a quick search on the Internet didn’t turn up anything I liked, I decided to make my own.If you’d like to download it, it’s available here as a PDF file for A4 paper, and contains the lines for 50 Hz and 60 Hz mains. You can change the magnification to make it fit on different paper sizes, or to increase or decrease the size of the disc. If your magnification is the same in the X and Y axes, then it won’t change anything.

This meant that I had to do a little math, which goes as follows:

mains_frequency = 50 Hz (this is the rate at which the lights blink)

rpm = 33+1/3

revolutions_per_second = rpm / 60 = 0.555…

revolutions_per_light_blink = revolutions_per_second / mains_frequency = 0.0111…

degrees_rotation_per_blink = 360 * revolutions_per_light_blink = 4º

So, here in Denmark where we have 50 Hz mains, I needed to make a disc with a line every 4º. Since I use a Mac, I used graphic.app to do this, but any decent drawing program will do the trick.

If you want to make your own disc, and you don’t want to do the math, here are the results of the possible mains frequencies and revolution speeds

RPM50 Hz60 Hz
161.921.60
33 1/34.003.3333…
455.3999…4.50
789.367.80

For anyone who knows a thing or two about the Type 42… then I’m already ahead of you. I know that the lines are built into the turntable mat itself. However, I was working in pretty bright daylight, and so I needed more contrast on the lines to be able to see the interference from the lighting. And besides, it was fun as a little light recreational math.

Tonearm alignment and tracking error

The June 1980 issue of Audio Magazine contains an article written by Subir K. Pramanik called “Understanding Tonearms”. This is a must-read tutorial for anyone who is interested in the design and behaviour of radial tonearms.

One of the things Pram talked about in that article concerned the already well-known relationship between tonearm geometry, its mounting position on the turntable, and the tracking error (the angular difference between the tangent to the groove and the cantilever axis – or the rotation of the stylus with respect to the groove). Since the tracking error is partly responsible for distortion of the audio signal, the goal is to minimise it as much as possible. However, without a linear-tracking system (or an infinitely long tonearm), it’s impossible to have a tracking error of 0º across the entire surface of a vinyl record.

One thing that is mentioned in the article is that “Small errors in the mounting distance from the centre of the platter … can make comparatively large differences in angular error” So I thought that I’d do a little math to find out this relationship.

The article contains the diagram shown below, showing the information required to do the calculations we’re interested in. In a high-end turntable, the Mounting Distance (d) can be varied, since the location of the tonearm’s bearing (the location of the pivot point) is adjustable, as can be seen in the photo above of an SME tonearm on a Micro Seiki turntable.

The tonearm’s Effective Length (l) and Offset Angle (y) are decided by the manufacturer (assuming that the pickup cartridge is mounted correctly). The Minimum and Maximum groove radius are set by international standards (I’ve rounded these to 60 mm and 149 mm respectively). The Radius (r) is the distance from the centre of the LP (the spindle) to the stylus at any given moment when playing the record.

In a perfect world, the tracking error would be 0º at all locations on the record (for all values of r from the Maximum to the Minimum groove radii) which would make the cantilever align with the tangent to the groove. However, since the tonearm rotates around the bearing, the tracking error is actually the angle x (in the diagram above) subtracted from the offset angle. “X” can be calculated using the equation:

x = asin ((l2 + r2 – d2) / (2 l r))

So the tracking error is

Tracking Error = y – asin ((l2 + r2 – d2) / (2 l r))

Just as one example, I used the dimensions of a well-known tonearm as follows:

  • Effective Length (l) : 233.20 mm
  • Mounting Distance (d) : 215.50 mm
  • Offset angle (y) : 23.63º

Then the question is, if I make an error in the Mounting Distance, what is the effect on the Tracking Error? The result is below.

If we take the manufacturer’s recommendation of d = 215.4 mm as the reference, and then look at the change in that Tracking Error by mounting the bearing at the incorrect distance in increments of 0.2 mm, then we get the plot below.

So, as you can see there, a 0.2 mm error in the location of the tonearm bearing (which, in my opinion, is a very small error…) results in a tracking error difference of about 0.2º at the minimum groove radius.

If I increase the error to increments of 1 mm (± 5mm) then we get similar plots, but with correspondingly increased tracking error.

If you go back and take a look at the equation above, you can see that the change in the tracking error is constant with the Offset Angle (unlike its relationship with an error in the location of the tonearm bearing, which results in a tracking error that is NOT constant). This means that if you mount your pickup on the tonearm head shell with a slight error in its angle, then this angular error is added to the tracking error as a constant value, regardless of the location of the stylus on the surface of the vinyl, as shown below.

Volume controls vs. Output levels

#92 in a series of articles about the technology behind Bang & Olufsen

One question people often ask about B&O loudspeakers is something like ”Why doesn’t the volume control work above 50%?”.

This is usually asked by someone using a small loudspeaker listening to pop music.

There are two reasons for this, related to the facts that there is such a wide range of capabilities in different Bang & Olufsen loudspeakers AND you can use them together in a surround or multiroom system. In other words for example, a Beolab 90 is capable of playing much, much more loudly than a Beolab 12; but they still have to play together.

Let’s use the example of a Beolab 90 and a Beolab 12, both playing in a surround configuration or a multiroom setup. In both cases, if the volume control is set to a low enough level, then these two types of loudspeakers should play at the same output level. This is true for quiet recordings (shown on the left in the figure below) and louder recordings (shown on the right).

However, if you turn up the volume control, you will reach an output level that exceeds the capability of the Beolab 12 for the loud song (but not for the quiet song), shown in the figure below. At this point, for the loud song, the Beolab 12 has already begun to protect itself.

Once a B&O loudspeaker starts protecting itself, no matter how much more you turn it up, it will turn itself down by the same amount; so it won’t get louder. If it did get louder, it would either distort the sound or stop working – or distort the sound and then stop working.

If you ONLY own Beolab 12s and you ONLY listen to loud songs (e.g. pop and rock) then you might ask “why should I be able to turn up the volume higher than this?”.

The first answer is “because you might also own Beolab 90s” which can go louder, as you can see in the right hand side of the figure above.

The second answer is that you might want to listen to quieter recording (like a violin solo or a podcast). In this case, you haven’t reached the maximum output of even the Beolab 12 yet, as you can see in the left hand side of the figure above. So, you should be able to increase the volume setting to make even the quiet recording reach the limits of the less-capable loudspeaker, as shown below.

Notice, however, that at this high volume setting, both the quiet recording and the loud recording have the same output level on the Beolab 12.

So, the volume allows you to push the output higher; either because you might also own more capable loudspeakers (maybe not today – but some day) OR because you’re playing a quiet recording and you want to hear it over the sound of the exhaust fan above your stove or the noise from your shower.

It’s also good to remember that the volume control isn’t an indicator of how loud the output should be. It’s an indicator of how much quieter or louder you’re making the input signal.

The volume control is more like how far down you’re pushing the accelerator in your car – not the indication of the speedometer. If you push down the accelerator 50% of the way, your actual speed is dependent on many things like what gear you’re in, whether you’re going uphill or downhill, and whether you’re towing a heavy trailer. Similarly Metallica at volume step 70 will be much louder than a solo violin recording at the same volume step, unless you are playing it through a loudspeaker that reached its maximum possible output at volume step 50, in which case the Metallica and the violin might be the same level.

Note 1: For all of the above, I’ve said “quiet song” and “loud song” or “quiet recording” and “loud recording” – but I could just have easily as said “quiet part of the song” and “loud part of the song”. The issue is not just related to mastering levels (the overall level of the recording) but the dynamic range (the “distance” between the quietest and the loudest moment of a recording).

Note 2: I’ve written a longer, more detailed explanation of this in Posting #81: Turn it down half-way.

B&O Pickup stylus comparison

Below are four photos taken with the same magnification.

The top two photos are a Bang & Olufsen SP2 pickup, compatible with the 25º tonearm on a Type 42 “Stereopladespiller”.

The bottom two are a rather dirty Bang & Olufsen MMC 1/2 pickup, compatible with a range of turntables including the Beogram 4500, for example.

The yellow grid lines have a 0.50 mm spacing.

What is a “virtual” loudspeaker? Part 3

#91.3 in a series of articles about the technology behind Bang & Olufsen

In Part 1 of this series, I talked about how a binaural audio signal can (hypothetically, with HRTFs that match your personal ones) be used to simulate the sound of a source (like a loudspeaker, for example) in space. However, to work, you have to make sure that the left and right ears get completely isolated signals (using earphones, for example).

In Part 2, I showed how, with enough processing power, a large amount of luck (using HRTFs that match your personal ones PLUS the promise that you’re in exactly the correct location), and a room that has no walls, floor or ceiling, you can get a pair of loudspeakers to behave like a pair of headphones using crosstalk cancellation.

There’s not much left to do to create a virtual loudspeaker. All we need to do is to:

  • Take the signal that should be sent to a right surround loudspeaker (for example) and filter it using the HRTFs that correspond to a sound source in the location that this loudspeaker would be in. REMEMBER that this signal has to get to your two ears since you would have used your two ears to hear an actual loudspeaker in that location.
  • Send those two signals through a crosstalk cancellation processing system that causes your two loudspeakers to behave more like a pair of headphones.
Figure 1: A block diagram of the system described above.

One nice thing about this system is that the crosstalk cancellation is only there to ensure that the actual loudspeakers behave more like headphones. So, if you want to create more virtual channels, you don’t need to duplicate the crosstalk cancellation processor. You only need to create the binaurally-processed versions of each input signal and mix those together before sending the total result to the crosstalk cancellation processor, as shown below.

Figure 2: You only need one crosstalk cancellation system for any number of virtual channels.

This is good because it saves on processing power.

So, there are some important things to realise after having read this series:

  • All “virtual” loudspeakers’ signals are actually produced by the left and right loudspeakers in the system. In the case of the Beosound Theatre, these are the Left and Right Front-firing outputs.
  • Any single virtual loudspeaker (for example, the Left Surround) requires BOTH output channels to produce sound.
  • If the delays (aka Speaker Distance) and gains (aka Speaker Levels) of the REAL outputs are incorrect at the listening position, then the crosstalk cancellation will not work and the virtual loudspeaker simulation system won’t work. How badly is doesn’t work depends on how wrong the delays and gains are.
  • The virtual loudspeaker effect will be experienced differently by different persons because it’s depending on how closely your actual personal HRTFs match those predicted in the processor. So, don’t get into fights with your friends on the sofa about where you hear the helicopter…
  • The listening room’s acoustical behaviour will also have an effect on the crosstalk cancellation. For example, strong early reflections will “infect” the signals at the listening position and may/will cause the cancellation to not work as well. So, the results will vary not only with changes in rooms but also speaker locations.

Finally, it’s worth nothing that, in the specific case of the Beosound Theatre, by setting the Speaker Distances and Speaker Levels for the Left and Right Front-firing outputs for your listening position, then you have automatically calibrated the virtual outputs. This is because the Speaker Distances and Speaker Levels are compensations for the ACTUAL outputs of the system, which are the ones producing the signal that simulate the virtual loudspeakers. This is the reason why the four virtual loudspeakers do not have individual Speaker Distances and Speaker Levels. If they did, they would have to be identical to the Left and Right Front-firing outputs’ values.

What is a “virtual” loudspeaker? Part 2

#91.2 in a series of articles about the technology behind Bang & Olufsen

In Part 1, I talked at how a binaural recording is made, and I also mentioned that the spatial effects may or may not work well for you for a number of different reasons.

Let’s go back to the free field with a single “perfect” microphone to measure what’s happening, but this time, we’ll send sound out of two identical “perfect” loudspeakers. The distances from the loudspeakers to the microphone are identical. The only difference in this hypothetical world is that the two loudspeakers are in different positions (measuring as a rotational angle) as shown in Figure 1.

Figure 1: Two identical, “perfect” loudspeakers in a free field with a single “perfect” microphone.

In this example, because everything is perfect, and the space is a free field, then output of the microphone will be the sum of the outputs of the two loudspeakers. (In the same way that if your dog and your cat are both asking for dinner simultaneously, you’ll hear dog+cat and have to decide which is more annoying and therefore gets fed first…)

Figure 2: The output from the microphone is the sum of the outputs from the two loudspeakers. At any moment in time, the value of the top plot + the value of the middle plot = the value of the bottom plot.

IF the system is perfect as I described above, then we can play some tricks that could be useful. For example, since the output of the microphone is the sum of the outputs of the two loudspeakers, what happens if the output of one loudspeaker is identical to the other loudspeaker, but reversed in polarity?

Figure 3: If the output of Loudspeaker 1 is exactly the same as the output of Loudspeaker 2 except for polarity, then the sum (the output of the microphone) is always 0.

In this example, we’re manipulating the signals so that, when they add together, you nothing at the output. This is because, at any moment in time, the value of Loudspeaker 2’s output is the value of Loudspeaker 1’s output * -1. So, in other words, we’re just subtracting the signal from itself at the microphone and we get something called “perfect cancellation” because the two signals cancel each other at all times.

Of course, if anything changes, then this perfect cancellation won’t work. For example, if one of the loudspeakers moves a little farther away than the other, then the system is broken, as shown below.

Figure 4: A small shift in time in the output of Loudspeaker 2 cases the cancellation to stop working so well.

Again, everything that I’ve said above only works when everything is perfect, and the loudspeakers and the microphone are in a free field; so there are no reflections coming in and ruining everything.

We can now combine these two concepts:

  1. using binaural signals to simulate a sound source in a location (although this would normally be done using playback over earphones to keep it simple) and
  2. using signals from loudspeakers to cancel each other at some location in space as a

to create a system for making virtual loudspeakers.

Let’s suspend our adherence to reality and continue with this hypothetical world where everything works as we want… We’ll replace the microphone with a person and consider what happens. To start, let’s just think about the output of the left loudspeaker.

Figure 5: The output of the left loudspeaker reaches both ears with different time/frequency characteristics caused by the HRTF associated with that sound source location.

If we plot the impulse responses at the two ears (the “click” sound from the loudspeaker after it’s been modified by the HRTFs for that loudspeaker location), they’ll look like this:

Figure 6: The impulse responses of the HRTFs for a sound source at 30º left of centre.

What if were were able to send a signal out of the right loudspeaker so that it cancels the signal from the left loudspeaker at the location of the right eardrum?

Figure 7: What if we could cancel the signal from the left loudspeaker at the right ear using the right loudspeaker?

Unfortunately, this is not quite as easy as it sounds, since the HRTF of the right loudspeaker at the right ear is also in the picture, so we have to be a bit clever about this.

So, in order for this to work we:

  • Send a signal out of the left loudspeaker.
    We know that this will get to the right eardrum after it’s been messed up by the HRTF. This is what we want to cancel…
  • …so we take that same signal, and
    • filter it with the inverse of the HRTF of the right loudspeaker
      (to undo the effects of the HRTF of the right loudspeaker’s signal at the right ear)
    • filter that with the HRTF of the left loudspeaker at the right ear
      (to match the filtering that’s done by your head and pinna)
    • multiply by -1
      (so that it will cancel when everything comes together at your right eardrum)
    • and send it out the right loudspeaker.

Hypothetically, that signal (from the right loudspeaker) will reach your right eardrum at the same time as the unprocessed signal from the left loudspeaker and the two will cancel each other, just like the simple example shown in Figure 3. This effect is called crosstalk cancellation, because we use the signal from one loudspeaker to cancel the sound from the other loudspeaker that crosses to the wrong side of your head.

This then means that we have started to build a system where the output of the left loudspeaker is heard ONLY in your left ear. Of course, it’s not perfect because that cancellation signal that I sent out of the right loudspeaker gets to the left ear a little later, so we have to cancel the cancellation signal using the left loudspeaker, and back and forth forever.

If, at the same time, we’re doing the same thing for the other channel, then we’ve built a system where you have the left loudspeaker’s signal in the left ear and the right loudspeaker’s signal in the right ear; just like a pair of headphones!

However, if you get any of these elements wrong, the system will start to under-perform. For example, if the HRTFs that I use to predict your HRTFs are incorrect, then it won’t work as well. Or, if things aren’t time-aligned correctly (because you moved) then the cancellation won’t work.

on to Part 3

What is a “virtual” loudspeaker? Part 1

#91.1 in a series of articles about the technology behind Bang & Olufsen

Without connecting external loudspeakers, Bang & Olufsen’s Beosound Theatre has a total of 11 independent outputs, each of which can be assigned any Speaker Role (or input channel). Four of these are called “virtual” loudspeakers – but what does this mean? There’s a brief explanation of this concept in the Technical Sound Guide for the Theatre (you’ll find the link at the bottom of this page), which I’ve duplicated in a previous posting. However, let’s dig into this concept a little more deeply.

To begin, let’s put a “perfect” loudspeaker in a free field. This means that it’s in a space that has no surfaces to reflect the sound – so it’s an acoustic field where the sound wave is free to travel outwards forever without hitting anything (or at least appear as this is the case). We’ll also put a “perfect” microphone in the same space.

Figure 1: A loudspeaker and a microphone (the circle) in a free field: an infinite space completely free of reflective surfaces.

We then send an impulse; a very short, very loud “click” to the loudspeaker. (Actually a perfect impulse is infinitely short and infinitely loud, but this is not only inadvisable but impossible, and probably illegal.)

Figure 2: The “click” signal that’s sent to the input of the loudspeaker.

That sound radiates outwards through the free field and reaches the microphone which converts the acoustic signal back to an electrical one so we can look at it.

Figure 3: The “click” signal that is received at the microphone’s location and sent out as an electrical signal.

There are three things to notice when you compare Figure 3 to Figure 2:

  • The signal’s level is lower. This is because the microphone is some distance from the loudspeaker.
  • The signal is later. This is because the microphone is some distance from the loudspeaker and sound waves travel pretty slowly.
  • The general shape of the signals are identical. This is because I said that the loudspeaker and the microphone were both “perfect” and we’re in a space that is completely free of reflections.

What happens if we take away the microphone and put you in the same place instead?

Figure 4: The microphone has been replaced by something more familiar.

If we now send the same click to the loudspeaker and look at the “outputs” of your two eardrums (the signals that are sent to your brain), these will look something like this:

Figure 5: The outputs of your two eardrums with the same “click” signal from the loudspeaker.

These two signals are obviously very different from the one that the microphone “hears” which should not be a surprise: ears aren’t microphones. However, there are some specific things of which we should take note:

  • The output of the left eardrum is lower than that of the right eardrum. This is largely because of an effect called “head shadowing” which is exactly what it sounds like. The sound is quieter in your left ear because your head is in the way.
  • The signal at the right eardrum is earlier than at the left eardrum. This is because the left eardrum is not only farther away, but the sound has to go around your head to get there.
  • The signal at the right eardrum is earlier than the output of the microphone output (in Figure 3) because it’s closer to the loudspeaker. (I put the microphone at the location of the centre of the simulated head.) Similarly the left ear output is later because it’s farther away.
  • The signal at the right eardrum is full of spikes. This is mostly caused by reflections off the pinna (the flappy thing on the side of your head that you call your “ear”) that arrive at slightly different times, and all add together to make a mess.
  • The signal at the left eardrum is “smoother”. This is because the head itself acts as a filter reducing the levels of the high frequency content, which tends to make things less “spiky”.
  • Both signals last longer in time. This is the effect of the ear canal (the “hole” in the side of your head that you should NOT stick a pencil in) resonating like a little organ pipe.

The difference between the signals in Figures 2 and 4 is a measurement of the effect that your head (including your shoulders, ears/pinnae) has on the transfer of the sound from the loudspeaker to your eardrums. Consequently, we geeks call it a “head-related transfer function” or HRTF. I’ve plotted this HRTF as a measurement of an impulse in time – but I could have converted it to a frequency response instead (which would include the changes in magnitude and phase for different frequencies).

Here’s the cool thing: If I put a pair of headphones on you and played those two signals in Figure 5 to your two ears, you might be able to convince yourself that you hear the click coming from the same place as where that loudspeaker is located.

Although this sounds magical, don’t get too excited right away. Unfortunately, as with most things in life, reality tends to get in the way for a number of reasons:

  • Your head and ears aren’t the same shape as anyone else’s. Your brain has lived with your head and your ears for a long time, and it’s learned to correlate your HRTFs with the locations of sound sources. If I suddenly feed you a signal that uses my HRTFs, then this trick may or may not work, depending on how similar we are. This is just like borrowing someone else’s glasses. If you have roughly the same prescription, then you can see. However, if the prescriptions are very different, you’ll get a headache very quickly.
  • In reality, you’re always moving. So, even if the sound source is not moving, the specific details of the HRTFs are always changing (because the relative positions and angles to your ears are changing) but my system doesn’t know about this – so I’m simulating a system where the loudspeaker moves around you as you rotate your head. Since this never happens in real life, it tends to break the simulation.
  • The stuff I showed above doesn’t include reflections, which is how you determine distance to sources. If I wanted to include reflections, each reflection would have to have its own HRTF processing, depending on its angle relative to your head.

However, hypothetically, this can work, and lots of people have tried. The easiest way to do this is to not bother measuring anything. You just take a “dummy head” -a thing that is the same size as an average human head (maybe with an average torso) and average pinnae* – but with microphones where the eardrums are – and you plunk it down in a seat in a concert hall and record the outputs of the two “ears”. You then listen to this over earphones (we don’t use headphones because we want to remove your pinnae from the equation) and you get a “you are there” experience (assuming that the dummy head’s dimensions and shape are about the same as yours). This is what’s known as a binaural recording because it’s a recording that’s done with two ears (instead of two or more “simple” microphones).

If you want to experience this for yourself, plug a pair of headphones into your computer and do a search for the “Virtual Barber Shop” video. However, if you find that it doesn’t work for you, don’t be upset. It just means that you’re different: just like everyone else.* Typically, recordings like this have a strange effect of things sounding very close in the front, and farther away as sources go to the sides. (Personally, I typically don’t hear anything in the front. All of the sources sound like they’re sitting on the back of my neck and shoulders. This might be because I have a fat head (yes, yes… I know…) and small pinnae (yes, yes…. I know…) – or it might indicate some inherent paranoia of which I am not conscious.)

* Of course, depressingly typically, it goes without saying that the sizes and shapes of commercially-available dummy heads are based on averages of measurements of men only. Neither women nor children are interested in binaural recordings or have any relevance to such things, apparently…

on to Part 2