



Theory

Go to a room (you may already be in one...) and put a perfect omnidirectional microphone in it. As we discussed in Section 6.7, an omnidirectional microphone is also known as a pressure transducer which means that it responds to the changes in air pressure at the diaphragm of the microphone. If you make a perfect recording of the output of the perfect omnidirectional microphone when stuff is happening in the room, you have captured a record (here, I'm using the word ``record'' as in a historical record, not as in a record that you buy at a record shop from a record lady[Lovett, 1994]) of the change in pressure over time at that location in that room on that day. If, at a later date, you play back that perfect recording over a perfect loudspeaker in a perfectly anechoic space, then you will hear a perfect representation (think ``re-presentation'') of that historical record. Interestingly, if you have a perfect loudspeaker and you're in a perfectly anechoic space, then what you hear from the playback is exactly what the microphone ``heard'' when you did the recording.

This is a good idea; however, let's take it a step further. Since a pressure transducer has an omnidirectional polar pattern, we don't have any information regarding the direction of travel of the sound wavefront. This information is contained in the velocity of the pressure wave (which is why a single directional microphone of any sort must have a velocity component). So, let's put up a perfect velocity microphone in the same place as our perfect pressure microphone. As we saw in Section 6.7, a velocity microphone (if we're talking about directional characteristics and not transducer design) is a bidirectional microphone. Great, so we put a bidirectional mic facing forward so we can tell if the wave is coming from the front or the rear. If the outputs of the omni and the bidirectional have the same polarity, then the sound source is in the front. If they have opposite polarity, then the sound source is in the rear. Also, we can see from the relative levels of the two mic outputs what the angle to the sound source is, because we know the relative sensitivities of the two microphones. For example, if the level is 3 dB lower in the bidirectional than the omni and both have the same polarity, then the sound source must be 45$^{\circ }$ away from directly forward (the sensitivity of a bidirectional follows the cosine of the angle of incidence, and $\cos 45^{\circ} \approx 0.707$, which is about 3 dB down). The problem is that we don't know if it's to the left or the right. This problem is easily solved by putting in another bidirectional microphone facing to the side. Now we can tell, using the relative polarities and outputs of the three microphones, where the sound source is... but we can't tell what the sound source elevation is. Again, no problem, we'll just put in a bidirectional facing upwards.
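
To make the 3 dB example concrete, here is a minimal sketch in Python of how the angle could be read off the level difference between the two microphones. The function name is made up for illustration, and it assumes that the omni and the bidirectional have the same on-axis sensitivity and that the source is on the horizontal plane.

```python
import math

def angle_from_levels(level_difference_db, same_polarity=True):
    """Estimate the angle (in degrees) between the source and straight ahead.

    level_difference_db is how far the bidirectional output sits below the
    omni output, as a positive number of dB.
    """
    # Convert the level difference to an amplitude ratio (bidirectional / omni).
    ratio = 10 ** (-level_difference_db / 20)
    # The bidirectional's sensitivity is the cosine of the angle of incidence.
    angle = math.degrees(math.acos(min(ratio, 1.0)))
    # Opposite polarity means the source is behind the microphones.
    return angle if same_polarity else 180.0 - angle

front_angle = angle_from_levels(3.0)                      # very close to 45
rear_angle = angle_from_levels(3.0, same_polarity=False)  # very close to 135
```

Notice that this only gives the angle away from the forward axis. It can't say whether the source is to the left or the right, which is exactly the ambiguity described above.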

So, with a single omni and three bidirectionals facing forward, to the right and upwards, we can derive all sorts of information about the location of the sound source. For example, if all four microphones have the same polarity, the output of the upwards-facing bidirectional is 3 dB below the output of the omni, and the outputs of the forward- and right-facing bidirectionals are each 6 dB below the omni (they lose 3 dB to the elevation and another 3 dB to the horizontal angle), then the sound source must be 45$^{\circ }$ to the right and 45$^{\circ }$ up from the microphone array.

Figure 10.148: Top views of a two-dimensional version of the system described in the text. These are three examples showing the relationship between the outputs of the omnidirectional and two of the bidirectional microphones for sound sources in various locations producing a positive impulse.
\includegraphics[width=5in]{10recording/graphics/ambisonics_01}

Take a look at the top example in Figure 10.148. We can see here that if the sound source is directly in front of the microphone array, then we get equal positive outputs from the omni (we'll call this the W channel) and the forward-facing bidirectional (we'll call that one the Y channel) and nothing from the side-facing bidirectional (the X channel). Also, if I had been keen and drawn a 3D diagram that included the upwards-facing bidirectional (the Z channel), we'd see that there was no signal from that one either if the sound source is on the same horizontal plane as the microphones.

Let's record the pressure wave (using the omni mic) and the velocity, and therefore the directional information (using the three bidirectional mics), on a perfect four-channel recorder. Can we play these channels back to reproduce all of that information in our anechoic listening room? Take a look at Figure 10.149.

Figure 10.149: A simple configuration for playing back the information captured by the three microphones in Figure 10.148.
\includegraphics[width=2.75in]{10recording/graphics/ambisonics_playback_01}

Let's think about what happens to the sound source in the top example in Figure 10.148 if we play back the W, X, and Y channels through the system in Figure 10.149. In this system, we have four identical loudspeakers placed at 0$^{\circ }$, 180$^{\circ }$, and $\pm $90$^{\circ }$. These loudspeakers are all identical distances from the sweet spot.

The top example in Figure 10.148 results in a positive spike in both the W and Y channels, and nothing in the X channel. As a result, in the playback system:

  • The front loudspeaker produces a positive pressure.
  • The two side speakers produce equal positive pressures that are one-third the output of the front (because there's nothing in the X channel and they don't play the Y channel).
  • Finally, the rear speaker produces a negative pressure at one-third the output of the front loudspeaker because the information in the W and the negative Y channels cancel each other a little when they're mixed together at the speaker, but the negative signal is louder.

The loudspeakers produce a signal at exactly the same time, and the different waves will propagate towards the sweet spot at the same speed. At the sweet spot, the waves all add together (think of adding vectors together) to produce a resulting pressure wave that has a velocity that is moving towards the rear loudspeaker (because the two side speakers push equally against each other, so there's no sideways velocity, and because the front speaker is pushing towards the rear one which is pulling the wave towards itself).
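
If you'd like to check that vector addition, here is a rough sketch in Python. It takes the relative loudspeaker outputs from the list above (front as the reference, sides at one-third, rear at minus one-third) and adds them up at the sweet spot. It assumes that each loudspeaker's wave arrives as a plane wave whose particle velocity is proportional to its pressure, and all of the names and units are arbitrary choices for illustration.

```python
import math

# (angle in degrees from straight ahead, relative output) for each loudspeaker.
# Which side counts as positive is an assumption and doesn't change the result.
speakers = {
    "front": (0.0, 1.0),
    "side_a": (90.0, 1.0 / 3.0),
    "rear": (180.0, -1.0 / 3.0),
    "side_b": (-90.0, 1.0 / 3.0),
}

def sum_at_sweet_spot(speakers):
    """Return (pressure, forward velocity, sideways velocity) at the centre."""
    pressure = 0.0
    v_forward = 0.0
    v_side = 0.0
    for angle_deg, output in speakers.values():
        a = math.radians(angle_deg)
        # Pressure is a scalar, so the contributions simply add.
        pressure += output
        # Each wave travels from its loudspeaker towards the centre, which is
        # the direction opposite to the loudspeaker's position.
        v_forward += -output * math.cos(a)
        v_side += -output * math.sin(a)
    return pressure, v_forward, v_side

pressure, v_forward, v_side = sum_at_sweet_spot(speakers)
```

The pressure comes out positive, the sideways velocity is zero, and the forward velocity is negative, meaning that the wave is moving towards the rear loudspeaker, just as it would if it had come from a source in front of you.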

If we used perfect microphones and a perfect recording system and perfect loudspeakers, the result, at the sweet spot in the listening room, is that the sound wave has exactly the same pressure and velocity components as the original wave that existed at the microphones' position at the time of the recording.

Consequently, we say that we have re-created the soundfield in the recording space. If we pretend that the sound wave has only two components, the pressure and the velocity, then our perfect system perfectly duplicates reality.

As an exercise, before you keep reading, you might want to consider what will come out of the loudspeakers for the other two examples in Figure 10.148.

So far, what we've got here is a simple first-order Ambisonics system. The collective outputs of the four microphones are what is known as an Ambisonics B-Format signal. Notice that the B-Format signal contains all four channels. If we want to restrict ourselves to just the horizontal plane, we can legally leave out the Z-channel (the upwards-facing bidirectional). This is legal because most people don't have loudspeakers in their ceiling and floors... not good ones anyways... The Ambisonics people have fancy names for their two versions of the system. If we include the height information with the Z-channel, then we call it a periphonic system (think periscope and you'll remember that there's stuff above you...). If we leave out the Z-channel and just capture and play back the horizontal plane directional information, then we call it a panphonic system (think stereo panning or panoramic).

Let's take this a step further. We begin by mathematically describing the relationship between the angle to the sound source and the sensitivity patterns (and therefore the relative outputs) of the B-format channels. Then we define the mix of each of these channels for each loudspeaker. That mix is determined by the angle of the loudspeaker in your setup. It's generally assumed that you have a circle of loudspeakers with equal apertures (meaning that they are all equally spaced around the circle). Also, notice that there is an equation to define the minimum number of loudspeakers required to accurately reproduce the Ambisonics signal. These equations are slightly different for panphonic and periphonic systems.

One important thing to notice in the following equations is that, at its most complicated (meaning a periphonic system), a first-order Ambisonics system has only 4 channels of recorded information within the B-format signal. However, you can play that signal back over any number of loudspeakers. This is one of the attractive aspects of Ambisonics - unlike traditional two-channel stereo, or discrete 5.1, the number of recording channels is not defined by the number of output channels. You always have the same number of recording channels whose mix is changed according to the number of playback channels (loudspeakers).

First-order panphonic


\begin{displaymath}
W = P_{\Psi}
\end{displaymath} (11.30)

\begin{displaymath}
X = P_{\Psi} \cos \Psi
\end{displaymath} (11.31)

\begin{displaymath}
Y = P_{\Psi} \sin \Psi
\end{displaymath} (11.32)

Where $W$, $X$ and $Y$ are the amplitudes of the three Ambisonics B-format channels, $P_{\Psi}$ is the pressure of the incident sound wave and $\Psi$ is the angle to the sound source (where 0$^\circ $ is directly forward) in the horizontal plane. Notice that these are just descriptions of an omnidirectional microphone and two bidirectionals. The bidirectionals have an included angle of 90$^{\circ }$ - hence the cosine and sine (these are the same function, just 90$^{\circ }$ apart - $Y = P_{\Psi} \sin \Psi$ is shorter than writing $Y = P_{\Psi} \cos (\Psi - 90^{\circ})$). Notice also that, with 0$^{\circ }$ directly forward, these equations make $X$ the forward-facing bidirectional and $Y$ the side-facing one, which is the reverse of the labels used in the walk-through of Figure 10.148 above.
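
As a quick illustration, the three encoding equations can be sketched in a few lines of Python. The function name and the choice of degrees for the angle are just assumptions made for this example.

```python
import math

def encode_panphonic(pressure, source_angle_deg):
    """Return the (W, X, Y) amplitudes for a source at the given angle.

    0 degrees is directly forward, as in the equations above.
    """
    psi = math.radians(source_angle_deg)
    w = pressure                  # omnidirectional: no dependence on angle
    x = pressure * math.cos(psi)  # bidirectional following the cosine
    y = pressure * math.sin(psi)  # bidirectional following the sine
    return w, x, y

front = encode_panphonic(1.0, 0.0)      # X equals W, nothing in Y
diagonal = encode_panphonic(1.0, 45.0)  # X and Y both about 0.707 of W
```

For the source at 45$^{\circ }$, both bidirectional channels come out at about 0.707 of the W channel, which is the 3 dB drop mentioned earlier.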


\begin{displaymath}
P_{n} = \frac{W + 2 X \cos \varphi_{n} + 2 Y \sin \varphi_{n} }{N}
\end{displaymath} (11.33)

Where

$P_{n}$ is the amplitude of the $n^{th}$ loudspeaker, $\varphi_{n}$ is the angle of the $n^{th}$ loudspeaker in the listening room, and $N$ is the number of loudspeakers.
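
Here is a small sketch of that decoding equation in Python, applied to a source directly in front of the microphones. The function names and the two loudspeaker layouts are assumptions chosen for illustration; the only thing taken from the text is the equation itself.

```python
import math

def encode_panphonic(pressure, source_angle_deg):
    """First-order panphonic encoding: returns (W, X, Y)."""
    psi = math.radians(source_angle_deg)
    return pressure, pressure * math.cos(psi), pressure * math.sin(psi)

def decode_panphonic(w, x, y, speaker_angles_deg):
    """Return one amplitude per loudspeaker for the given loudspeaker angles."""
    n = len(speaker_angles_deg)
    feeds = []
    for angle_deg in speaker_angles_deg:
        phi = math.radians(angle_deg)
        feeds.append((w + 2 * x * math.cos(phi) + 2 * y * math.sin(phi)) / n)
    return feeds

# A source directly in front, with a pressure of 1.
w, x, y = encode_panphonic(1.0, 0.0)

# Four loudspeakers: front, one side, rear, other side.
square = decode_panphonic(w, x, y, [0, 90, 180, -90])

# The same three channels played over six equally spaced loudspeakers.
hexagon = decode_panphonic(w, x, y, [0, 60, 120, 180, 240, 300])
```

For the four-loudspeaker layout, the front loudspeaker gets 0.75, the sides get 0.25 each and the rear gets -0.25, so the sides are one-third of the front and the rear is minus one-third. In both layouts the loudspeaker amplitudes add up to the value of the $W$ channel, even though the number of loudspeakers has changed.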

The decoding algorithm used here is one suggested by Vanderkooy and Lipshitz which differs from Gerzon's original equations in that it uses a gain of 2 on the $X$ and $Y$ channels rather than the standard $\sqrt{2}$. This is due to the fact that this method omits the $\frac{1}{\sqrt{2}}$ gain from the $W$ channel in the encoding process for simpler analysis [Bamford and Vanderkooy, 1995].


\begin{displaymath}
B = 2 m + 1
\end{displaymath} (11.34)

Where $B$ is the minimum number of loudspeakers required to accurately reproduce the panphonic Ambisonics signal and $m$ is the order of the system. (So far, we have only discussed first-order Ambisonics in this book.)


Geoff Martin 2006-10-15
