I had an interesting email from an old recording-engineer friend of mine this week regarding a debate he had with a student concerning the issue of “depth” in recordings (in his specific case, 2-channel stereo recordings done with an ORTF mic configuration). This got me thinking back to a bunch of thoughts I had once-upon-a-time about distance perception, and a newer bunch of thoughts about loudspeaker directivity. Now, those two bunches of thoughts are congealing into a single idea regarding how to achieve (and experience) a reasonable perceived sensation of distance and depth in 2-channel stereo.
To start, some definitions:
Go to an anechoic chamber with a loudspeaker and a friend. Sit there and close your eyes and get your friend to place the loudspeaker some distance from you. Keep your eyes closed, play some sounds out of the loudspeaker and try to estimate how far away it is. You will be wrong (unless you’re VERY lucky). Why? It’s because, in real life with real sources in real spaces, distance information (in other words, the information that tells you how far away a sound source is) comes mainly from the relationship between the direct sound and the early reflections. If you get the direct sound only, then you get no distance information. Add the early reflections and you can very easily tell how far away it is. This has been proven in lots of “official” listening tests. (For example, go check out this report as a basic starting point).
Anecdote #1: Back in the old days when I was working on my Ph.D. we had an 8-loudspeaker system in the lab – one speaker every 45° in a circle around the listening position. We were trying to build a multichannel room simulator where we were building a sound field, piece by piece – the direct sound and (up to 3rd-order) early reflections had the “correct” panning, delay and gain, and we added a diffuse field to tail in behind it. One of the interesting things that I found with that system was that the simulated distance to the source was easy to achieve with just the 1st-order reflections, but that the precision of that perceived distance increased as we added 2nd- and 3rd-order reflections. (We didn’t have enough computing power to simulate higher-order reflections at the time. It would be interesting to go back and try again to see what would happen with higher-order stuff now that my Mac has gotten a little faster…) Another interesting thing (although, in retrospect, it shouldn’t surprise anyone) was that the location and the distance of the simulated sound source were also easy to determine without the direct sound being part of the sound field at all. Just the 1st- to 3rd-order reflections by themselves were enough to tell you where things were.
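For the curious: the usual way to get the “correct” delay and gain for each of those early reflections is the image-source model for a rectangular (“shoebox”) room – each reflection behaves like a direct path from a mirror-image copy of the source. The sketch below is a toy illustration of that idea, not the code from our lab system; the room dimensions, 1/distance gain law and lossless walls are all simplifying assumptions.

```python
import itertools
import math

def image_sources(src, lis, room, max_order):
    """Shoebox image-source model.

    src, lis : (x, y, z) positions of source and listener, in metres
    room     : (Lx, Ly, Lz) room dimensions, in metres
    max_order: highest reflection order to include (0 = direct sound)

    Returns a list of (order, delay_s, gain) tuples, sorted by arrival
    time. Gain is a simple 1/distance law; wall absorption is ignored.
    """
    c = 343.0  # speed of sound in air, m/s (assumed)
    results = []
    rng = range(-max_order, max_order + 1)
    for nx, ny, nz in itertools.product(rng, rng, rng):
        order = abs(nx) + abs(ny) + abs(nz)
        if order > max_order:
            continue
        # Fold the source position: even index -> translated copy,
        # odd index -> mirrored copy (one wall bounce on that axis).
        img = []
        for n, s, L in zip((nx, ny, nz), src, room):
            img.append(n * L + (s if n % 2 == 0 else L - s))
        d = math.dist(img, lis)
        results.append((order, d / c, 1.0 / max(d, 1e-9)))
    return sorted(results, key=lambda t: t[1])
```

Feeding each (delay, gain) pair – plus a panning angle derived from the image position – into one delay line per reflection is essentially the “piece by piece” construction described above, with a diffuse tail added behind the last discrete reflection.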
Anecdote #2: I did a recording for Atma once-upon-a-time in a large church in Montreal with a very long reverb time. During the sessions, I sat in the church (no control room), about 20 m from the mic pair. So, when the organist and I discussed what take to do next, we were talking live in the same room – no talkback speakers. During the editing for this disc, I happened to be shuttling around, looking for the beginning of a take – so I’d drop the cursor somewhere on the screen and hit “play” quickly to see where I was. One of the takes ended with the organist asking “did we get it?” and I responded “yup” quickly and loudly. It just so happened that, when I was shuttling around, looking for the right take, I hit “play” at the beginning of the “yup” and then quickly hit “stop”. The interesting thing is that it sounded, for that split second, like I was right next to the microphones – not 20 m away like I knew I was. So, I hit “play” again, and this time didn’t hit stop. This time, I sounded far away. What’s going on? Well, because the church was so big, it was possible to hit the stop button before any of the first reflections came in (save maybe the one off the floor), so it was possible (with a fast enough thumb on the transport buttons of the editing machine) to make the recording of my voice anechoic. The result was that I sounded 0 m away instead of 20 m.
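It’s easy to sanity-check why a fast thumb was enough: the first reflections in a big room arrive tens of milliseconds after the direct sound, because the bounced path is so much longer. A quick sketch, using made-up church geometry (a talker 20 m from the mics with a wall 15 m off to the side – the real dimensions aren’t in the story):

```python
import math

C = 343.0  # speed of sound in air, m/s (assumed)

def reflection_gap_ms(src, lis, wall_pt):
    """Extra arrival time, in milliseconds, of a single specular
    reflection (modelled as source -> wall point -> listener)
    relative to the direct sound. 2-D top view is good enough here."""
    direct = math.dist(src, lis)
    reflected = math.dist(src, wall_pt) + math.dist(wall_pt, lis)
    return (reflected - direct) / C * 1000.0

# Hypothetical numbers: talker at the origin, mics 20 m away,
# nearest side wall 15 m off to the side, bounce point halfway along.
gap = reflection_gap_ms((0.0, 0.0), (20.0, 0.0), (10.0, 15.0))
print(f"direct-to-reflection gap: {gap:.1f} ms")
```

With those (invented) distances the gap comes out around 45 ms – a comfortable window in which to hit “stop” and leave only the anechoic direct sound on the monitors.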
The moral of the stories thus far? In order to deliver a perception of precise distance and depth (even if it’s not accurate…) you need early reflections in the recording, and they have to be panned and delayed appropriately.
Step 3: The delivery
Think back to Step 1. We agreed (or at least I said…) that early reflections tell your brain how far away the sound source is. Now think of a loudspeaker in a listening room.
Case #1: If you have an anechoic room, there are no early reflections, and, regardless of how far away the loudspeakers are, a sound source in the recording without early reflections (i.e. a close-mic’ed vocal) will sound much closer to you than the loudspeakers.
Case #2: If you have a listening room with early reflections, but the loudspeakers are directional such that there is no energy being delivered to the side walls (for example, a dipole with the angles carefully chosen to point the null of the loudspeaker at the point of specular reflection from the side wall), then the result is the same as in Case 1. This time there are no early reflections because of loudspeaker directivity instead of wall absorption, but the effect at the listening position is the same.
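Finding where that specular reflection point actually sits on the side wall is a one-line mirror trick: reflect the listener through the wall and draw a straight line from the speaker to the image. The sketch below works in a 2-D top view with invented coordinates; it isn’t a setup procedure, just the geometry behind “point the null at the reflection point.”

```python
import math

def specular_point_on_wall(spk, lis, wall_x):
    """Top view: point on a side wall (the plane x = wall_x) where the
    first reflection from speaker to listener bounces. Uses the mirror
    trick: reflect the listener across the wall, then intersect the
    straight speaker->image line with the wall."""
    img = (2.0 * wall_x - lis[0], lis[1])       # listener mirrored in wall
    t = (wall_x - spk[0]) / (img[0] - spk[0])   # where the line hits the wall
    return (wall_x, spk[1] + t * (img[1] - spk[1]))

def null_aim_offset_deg(spk, lis, wall_x):
    """Angle between the speaker->listener line and the speaker->
    specular-point line. A dipole's null sits 90 degrees off its axis,
    so this tells you how far off the listener axis the bounce point
    is - and therefore how to orient the speaker."""
    p = specular_point_on_wall(spk, lis, wall_x)
    a_lis = math.atan2(lis[1] - spk[1], lis[0] - spk[0])
    a_ref = math.atan2(p[1] - spk[1], p[0] - spk[0])
    return abs(math.degrees(a_lis - a_ref))
```

If the offset between the listener direction and the bounce-point direction isn’t 90°, a pure dipole can’t null both side walls at once from a symmetric setup – which is why the angles have to be “carefully chosen,” as noted above.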
Case #3: If you have a listening room with early reflections, and the loudspeakers are omni-directional, then the early reflections from the side walls tell you how far away the loudspeakers are. Therefore, the close-mic’ed vocal track from Case #1 cannot sound any closer than the loudspeakers – your brain is too smart to be told otherwise.
So, if you want to achieve precision in the distance and depth of your stereo recordings (whether you’re on the recording end or the playback end) you’re going to need to make sure that you have a reasonable mix of the following: