I once read a discussion about microphone placement on a Usenet forum (it was a long time ago). Someone asked “where is the best place to position the microphones to record a French horn?” Lots of people had opinions, but the answer that I liked most was “that’s like asking ‘where is the best place to stand to take a photo of a mountain?’” Of course, that answer might have been too facetious for the person asking the question but, in my opinion, it was a good analogy. The correct answer, as always, is “it depends” – in this case, on perspective.
Take a look at the image below.
As you can read in the caption, this is a 640 x 480 black-and-white photo of a fishing boat off the coast of Newfoundland (near where I grew up) on a foggy day. Of course, if I didn’t tell you that, it would be impossible to know it – but that’s because you’re looking at the “data” (the information in the pixels in the photo) from the wrong place… I’ll rotate the image a little and we’ll try again.
Figure 2 is just a rotation of Figure 1 – we’re still looking at the same photo, but from another direction. It still doesn’t look like a boat… Let’s rotate some more…
Figure 3 is looking more like something – but there’s still no boat in sight… If you come back to Figure 3 after you look at Figure 4, you’ll recognise the trees on the land, the sky, and the water – you’ll also be able to see where the boat is. But this view of the photo is just off-position enough to scramble the data into being almost unrecognisable. So, let’s rotate the view of the data one last time…
So, what was the point of this somewhat obscure analogy? It was to show that, by looking at the data from only one viewpoint, or one dimension (say, Figure 1), you might arrive at an incorrect interpretation of the data.
Watch this video.
In this video, Penn and Teller do the same trick twice. Both times, the trick is impressive, but for two different reasons. This is because your perspective changes. The first time, it’s just a good magic trick – or at least an old one. The second time, you’re impressed because of their skill in executing it. Two different perceptions resulting from two different perspectives.
Once upon a time, I taught a course in electroacoustic measurements at McGill University. I remember one class, early in the year, that I started by asking “What is a ‘frequency response’?” One of the students replied, with a smile on his face, “The only thing that matters…”
I went through some old data and found a measurement of a loudspeaker. Figure 5, below, shows the magnitude response of a three-way loudspeaker, measured in free-field (therefore, no reflections or influence of the room) at a distance of three meters from the loudspeaker, on-axis to the tweeter.
This is just the kind of measurement that you’d see in a magazine… It’s also the kind of measurement that you’d use to make a “frequency response” for a data sheet. This one would read something like “<40 Hz – >20 kHz ±1 dB”, give or take.
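To make concrete how little goes into a line like that, here is a minimal Matlab sketch of one way such a figure could be read off a single magnitude response. The curve is synthetic (it is not the data in Figure 5), and the 1 kHz reference and the rule of walking outwards until the response leaves the ±1 dB window are assumptions; not every manufacturer necessarily uses the same convention.

```matlab
% Hypothetical magnitude response (dB) on a log-spaced frequency axis:
% flat in the middle, rolling off at both ends. Not measured data.
f = logspace(log10(10), log10(40000), 2000);            % Hz
mag_db = -10*log10(1 + (30 ./ f).^8) ...                % low-frequency roll-off
         -10*log10(1 + (f / 25000).^8);                 % high-frequency roll-off

f_ref = 1000;             % reference frequency (an assumption)
tolerance_db = 1;         % the "+/-1 dB" window
[~, i_ref] = min(abs(f - f_ref));
ref_db = mag_db(i_ref);

% Walk outwards from the reference until the response leaves the window
inside = abs(mag_db - ref_db) <= tolerance_db;
i_lo = i_ref;
while i_lo > 1 && inside(i_lo - 1)
    i_lo = i_lo - 1;
end
i_hi = i_ref;
while i_hi < numel(f) && inside(i_hi + 1)
    i_hi = i_hi + 1;
end
fprintf('%.0f Hz - %.0f Hz +/-%g dB\n', f(i_lo), f(i_hi), tolerance_db);
```

With this made-up curve, the result is roughly 36 Hz to 21 kHz within ±1 dB. Note that nothing about off-axis behaviour, listening level, stimulus, or the room enters into that number.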
However, let’s think about what this really is, and whether it actually tells you anything at all… It’s a measurement of the relationship between input voltage and output pressure, at one place in space, at one listening level, with one type of signal (maybe a swept sine wave or an MLS signal, or something else…), at one temperature of the drivers’ voice coils, at one relative humidity level of the air (okay, okay… now I’m getting into excruciating minutiæ…)
So, does this tell us anything about how the loudspeaker will sound? Well, yes: if you use it outdoors in a large field, stand 3 m in front of it, and listen to the same signal that was used to do the measurement. If, however, you stand closer, or not directly in front of it, or if you listen to music over time, or if you bring it indoors, this is just one piece of information – perhaps useful, but certainly inadequate…
Let’s look at another measurement of the same loudspeaker.
As you can see, this loudspeaker’s magnitude response looks “pretty bad” – or at least “not very flat” off-axis (which implies that I just equated “flat” with “good” – which might not necessarily be correct…).
This is the magnitude response of the signal that this loudspeaker will send out the side while you’re listening to that “nice” flat direct sound. Something like this will hit the side wall and reflect back, different frequencies reflecting with different intensities according to the absorptive properties of the wall, the total distance travelled by the reflection, and the relative humidity (okay, okay …I’ll stop with the humidity references…)
As is obvious in Figure 6, this “sound” is almost completely unlike the “sound” in Figure 5 (assuming that a free-field magnitude response can be translated to “sound” – which is a stretch…)
So, just like in Example 1 and Example 2, by “looking” at the data from another direction, we get some more information that should be used to influence our opinion. The more data from the more perspectives, the better…
So, we have one measurement that shows that this loudspeaker is “flat” and therefore “good”, in some people’s opinions. However, we have a bunch of other measurements that prove that this is not enough information. And, if we measure the same loudspeaker at a different listening level, or at a different temperature, or with a different stimulus, we’d probably get a different answer. How much the measurement changes depends on how much the measurement conditions change.
The “punch line” is that you cannot make any assumptions about how that loudspeaker will sound based on that one measurement in Figure 5 or the “frequency response” information in its datasheet. In fact, it could just be that having that graph in your hand will be worse than having no graphs in your hand, because your eyes might tell you that this speaker should sound good, and they get into a debate with your ears, who might disagree…
So, without more information, that one plot in Figure 5 is just a plot of one parameter – or one dimension – of many. And you can’t draw any conclusions from that alone.
Or put another way:
An astronomer, a physicist and a mathematician are on a train in Scotland. The astronomer looks out of the window, sees a black sheep standing in a field, and remarks, “How odd. All the sheep in Scotland are black!” “No, no, no!” says the physicist. “Only some Scottish sheep are black.” The mathematician rolls his eyes at his companions’ muddled thinking and says, “In Scotland, there is at least one sheep, at least one side of which appears to be black from here some of the time.”
In the last posting, I talked about the effects of a bandpass filter on the probability density function (PDF) of an audio signal. This left open the question of other filter types. So, below is the continuation of the discussion…
I made noise signals (length 2^16 samples, fs = 2^16 Hz) with different PDFs, and filtered them as if I were building a three-way loudspeaker with a 4th-order Linkwitz-Riley crossover (without including any compensation for the natural responses of the drivers). The crossover frequencies were 200 Hz and 2 kHz (which are just representative, arbitrary values).
So, the filter magnitude responses looked like Figure 1.
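For anyone who wants to try this, here is a minimal Matlab sketch of that kind of three-way filter bank. It assumes that each 4th-order Linkwitz-Riley section is two identical 2nd-order Butterworth biquads in series, with coefficients from the standard bilinear-transform (“cookbook”) formulas; it uses only `filter()` and `polyval()`, so no toolbox is needed. It is not necessarily the code that produced the plots.

```matlab
fs = 2^16;                 % sampling rate, Hz
f_lo = 200;                % low/mid crossover, Hz
f_hi = 2000;               % mid/high crossover, Hz
Q = 1/sqrt(2);             % Butterworth Q for one 2nd-order section

% Biquad numerator (b) and denominator (a) for normalised frequency w0
lp_b = @(w0) [(1 - cos(w0))/2, 1 - cos(w0), (1 - cos(w0))/2];
hp_b = @(w0) [(1 + cos(w0))/2, -(1 + cos(w0)), (1 + cos(w0))/2];
bq_a = @(w0) [1 + sin(w0)/(2*Q), -2*cos(w0), 1 - sin(w0)/(2*Q)];

% One LR4 section = the same biquad applied twice
lr4 = @(b, a, x) filter(b, a, filter(b, a, x));

w_lo = 2*pi*f_lo/fs;
w_hi = 2*pi*f_hi/fs;

noise = 2*rand(2^16, 1) - 1;      % any test signal will do
low  = lr4(lp_b(w_lo), bq_a(w_lo), noise);
mid  = lr4(lp_b(w_hi), bq_a(w_hi), lr4(hp_b(w_lo), bq_a(w_lo), noise));
high = lr4(hp_b(w_hi), bq_a(w_hi), noise);

% Frequency response of one LR4 section (biquad response, squared)
H = @(b, a, w) (polyval(fliplr(b), exp(-1j*w)) ./ polyval(fliplr(a), exp(-1j*w))).^2;
H_lp_fc = H(lp_b(w_lo), bq_a(w_lo), w_lo);
w_test = 2*pi*[50 200 1000 5000]/fs;
H_sum = H(lp_b(w_lo), bq_a(w_lo), w_test) + H(hp_b(w_lo), bq_a(w_lo), w_test);
fprintf('LR4 low-pass at %d Hz: %.2f dB\n', f_lo, 20*log10(abs(H_lp_fc)));
fprintf('|LP + HP| at test frequencies: %s\n', mat2str(abs(H_sum), 4));
```

The two printed checks reflect properties of the Linkwitz-Riley alignment: each LR4 output is 6 dB down at its crossover frequency, and the LR4 low-pass and high-pass at the same frequency sum to a flat magnitude response.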
The resulting effects on the probability density functions are shown below. (Check the last posting for plots of the PDFs of the full-band signals – however, note that I made new noise signals, so the magnitude responses won’t match directly.)
The magnitude responses shown in the plots below have been 1/3-octave smoothed – otherwise they look really noisy.
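One common way to do 1/3-octave smoothing is to replace each frequency bin by the mean power of all bins within ±1/6 octave of it. Here is a minimal Matlab sketch of that approach; it is an assumption, and the smoothing used for the plots may differ in details such as the window shape. With 2^16 samples at fs = 2^16 Hz, the FFT bins are exactly 1 Hz apart.

```matlab
fs = 2^16;
N = 2^16;
x = randn(N, 1);                          % hypothetical test signal (white noise)
X = fft(x);
df = fs / N;                              % bin spacing: exactly 1 Hz here
f = (0:N/2)' * df;                        % bin frequencies, 0 Hz to fs/2
pwr = abs(X(1:N/2 + 1)).^2;               % power in each bin

% First and last bin index within +/- 1/6 octave of each bin's frequency
idx_lo = max(1, ceil(f * 2^(-1/6) / df) + 1);
idx_hi = min(numel(f), floor(f * 2^(1/6) / df) + 1);

% Mean power over each band, using a running sum instead of a loop
cumulative = [0; cumsum(pwr)];
smoothed = (cumulative(idx_hi + 1) - cumulative(idx_lo)) ./ (idx_hi - idx_lo + 1);

raw_db = 10*log10(pwr);
smoothed_db = 10*log10(smoothed);
```

The running sum avoids recomputing the average for each of the 32,769 bands from scratch, which would be slow. At high frequencies each band spans thousands of bins, which is why the smoothed curve is so much less noisy than the raw one.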
In a previous posting, I showed some plots that displayed the probability density functions (or PDF) of a number of commercial audio recordings. (If you are new to the concept of a probability density function, then you might want to at least have a look at that posting before reading further…)
I’ve been doing a little more work on this subject, with some possible implications for how to interpret those plots – or, more specifically, for the conclusions that can be drawn from them.
To start, let’s create some noise with a desired PDF, without imposing any frequency limitations on the signal.
To do this, I’ve ported equations from “Computer Music: Synthesis, Composition, and Performance” by Charles Dodge and Thomas A. Jerse, Schirmer Books, New York (1985) to Matlab. That code is shown below, in case you want to use it. (No promises are made regarding the code quality… However, I will say that I’ve written the code to be easily understandable, rather than efficient – so don’t make fun of me.) I’ve made the length of the noise signals 2^16 samples because I like that number. (Actually, it’s for other reasons involving plotting the results of an FFT, and my own laziness regarding frequency scaling – but that’s my business.)
Uniform (aka Rectangular) Distribution
```matlab
uniform = rand(2^16, 1);
```
Of course, as you can see in the plots in Figure 1, the signal is not “perfectly” rectangular, nor is it “perfectly” flat. This is because it’s noise. If I ran exactly the same code again, the result would be different, but also neither perfectly rectangular nor flat. Of course, if I ran the code repeatedly, and averaged the results, the average would become “better” and “better”.
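To see the averaging effect numerically, here is a minimal Matlab sketch that estimates the PDF of uniform noise with a normalised histogram (counts divided by the number of samples times the bin width, so that an ideal uniform PDF on (0, 1) has a height of 1 in every bin), once for a single run and once averaged over 20 runs. The 32 bins and the 20 runs are arbitrary choices.

```matlab
n_samples = 2^16;
n_bins = 32;
bin_width = 1 / n_bins;                  % rand() values lie in (0, 1)

% Normalised histogram: counts / (number of samples * bin width) = density
estimate_pdf = @(x) accumarray(min(floor(x / bin_width) + 1, n_bins), 1, [n_bins 1]) ...
                    / (numel(x) * bin_width);

single_run = estimate_pdf(rand(n_samples, 1));

n_runs = 20;
averaged = zeros(n_bins, 1);
for r = 1:n_runs
    averaged = averaged + estimate_pdf(rand(n_samples, 1)) / n_runs;
end

fprintf('Mean deviation from ideal PDF, one run: %.4f\n', mean(abs(single_run - 1)));
fprintf('Mean deviation from ideal PDF, %d runs: %.4f\n', n_runs, mean(abs(averaged - 1)));
```

The averaged estimate’s deviation from the ideal height of 1 should come out several times smaller than the single run’s (roughly by a factor of √20).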
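Linear Distribution
```matlab
% Keep the smaller of two uniform random values: the resulting PDF
% falls linearly from 2 at 0 to 0 at 1
linear_temp_1 = rand(2^16, 1);
linear_temp_2 = rand(2^16, 1);
temp_indices = find(linear_temp_1 < linear_temp_2);
linear = linear_temp_2;
linear(temp_indices) = linear_temp_1(temp_indices);
```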
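Triangular Distribution
```matlab
% The difference of two uniform random values has a triangular PDF on (-1, 1)
triangular = rand(2^16, 1) - rand(2^16, 1);
```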
Exponential Distribution
```matlab
lambda = 1; % lambda must be greater than 0
exponential_temp = rand(2^16, 1);
if any(exponential_temp == 0) % ensure that no values of exponential_temp are 0
    error('Please try again.')
end
exponential = -log(exponential_temp) / lambda; % mean value is 1/lambda
```
Bilateral Exponential Distribution (aka Laplacian)
```matlab
lambda = 1; % must be greater than 0
bilex_temp = 2 * rand(2^16, 1);
% check that no values of bilex_temp are 0 or 2
if any(bilex_temp == 0 | bilex_temp == 2)
    error('Please try again.')
end
bilex = zeros(2^16, 1);
bilex_lessthan1 = find(bilex_temp <= 1);
bilex(bilex_lessthan1, 1) = log(bilex_temp(bilex_lessthan1)) / lambda;
bilex_greaterthan1 = find(bilex_temp > 1);
bilex_temp(bilex_greaterthan1) = 2 - bilex_temp(bilex_greaterthan1);
bilex(bilex_greaterthan1, 1) = -log(bilex_temp(bilex_greaterthan1)) / lambda;
```
Gaussian Distribution
```matlab
sigma = 1;
xmu = 0; % offset
n = 100; % number of random number vectors used to create final vector (more is better)
xnover = n/2;
sc = 1/sqrt(n/12);
total = sum(rand(2^16, n), 2);
gaussian = sigma * sc * (total - xnover) + xmu;
```
Of course, if you are using Matlab, there is an easier way to get a noise signal with a Gaussian PDF, and that is to use the randn() function.
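The “(more is better)” comment on n can be checked with a minimal Matlab sketch that repeats the sum-of-uniforms construction for a few values of n and measures excess kurtosis (0 for a true Gaussian, −1.2 for a uniform distribution, and −1.2/n for a sum of n independent uniform values). The kurtosis helper is written out by hand so that no toolbox is needed; the choice of excess kurtosis as the yardstick is an assumption made for this sketch.

```matlab
excess_kurtosis = @(x) mean((x - mean(x)).^4) / mean((x - mean(x)).^2)^2 - 3;

n_samples = 2^16;
for n = [1 2 12 100]
    total = sum(rand(n_samples, n), 2);
    g = (total - n/2) / sqrt(n/12);      % zero mean, unit variance
    fprintf('n = %3d: std = %.3f, excess kurtosis = %+.3f (theory %+.3f)\n', ...
            n, std(g), excess_kurtosis(g), -1.2/n);
end
fprintf('randn:   std = %.3f, excess kurtosis = %+.3f\n', ...
        std(randn(n_samples, 1)), excess_kurtosis(randn(n_samples, 1)));
```

The randn() line serves as a reference. By n = 100, the theoretical difference from a true Gaussian is typically smaller than the random scatter of the estimate itself.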
The effects of band-passing the signals
What happens to the probability distribution of the signals if we band-limit them? For example, let’s take the signals that were plotted above, and put them through two sets of two second-order Butterworth filters in series, one set producing a high-pass filter at 200 Hz and the other resulting in a low-pass filter at 2 kHz. (This is the same as if we were making a mid-range signal in a 4th-order Linkwitz-Riley crossover, assuming that our midrange drivers had flat magnitude responses far beyond our crossover frequencies, and therefore required no correction in the crossover…)
What happens to our PDFs as a result of the band-limiting? Let’s see…
So, what we can see in Figures 7 through 12 (inclusive) is that, regardless of the original PDF of the signal, if you band-limit it, the result has an approximately Gaussian distribution.
And yes, I tried other bandwidths and filter slopes. The result, generally speaking, is the same.
One part of this effect is a little obvious. The high-pass filter (in this case, at 200 Hz) removes the DC component, which centres all of the PDFs on the 0 line.
However, the “punch line” is that, regardless of the distribution of the signal coming into your system (and that can be quite different from song to song as I showed in this posting) the PDF of the signal after band-limiting (say, being sent to your loudspeaker drivers) will be Gaussian-ish.
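Here is a minimal Matlab sketch of that test, using excess kurtosis as a single-number stand-in for “how Gaussian” a PDF is (0 for a Gaussian, −1.2 for a uniform distribution). That proxy is a choice made for this sketch, not how the figures above were made. The band-pass is built as described earlier: two 2nd-order Butterworth high-passes at 200 Hz and two 2nd-order Butterworth low-passes at 2 kHz, all in series, with coefficients from the standard bilinear-transform (“cookbook”) formulas.

```matlab
fs = 2^16;
Q = 1/sqrt(2);
lp_b = @(w0) [(1 - cos(w0))/2, 1 - cos(w0), (1 - cos(w0))/2];
hp_b = @(w0) [(1 + cos(w0))/2, -(1 + cos(w0)), (1 + cos(w0))/2];
bq_a = @(w0) [1 + sin(w0)/(2*Q), -2*cos(w0), 1 - sin(w0)/(2*Q)];
excess_kurtosis = @(x) mean((x - mean(x)).^4) / mean((x - mean(x)).^2)^2 - 3;

w_hp = 2*pi*200/fs;
w_lp = 2*pi*2000/fs;

x = 2*rand(2^16, 1) - 1;                 % uniform noise: excess kurtosis about -1.2
y = x;
for stage = 1:2                          % two high-passes and two low-passes in series
    y = filter(hp_b(w_hp), bq_a(w_hp), y);
    y = filter(lp_b(w_lp), bq_a(w_lp), y);
end
y = y(4097:end);                         % discard the filters' start-up transient

fprintf('Excess kurtosis before band-limiting: %+.3f\n', excess_kurtosis(x));
fprintf('Excess kurtosis after band-limiting:  %+.3f\n', excess_kurtosis(y));
```

Expect the first number to be close to −1.2 and the second to be close to 0: each output sample is a weighted sum of many input samples, which pushes the result towards a Gaussian shape.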
And, before you ask, “what if you had only put in a high-pass or a low-pass filter?” – that answer is coming in a later posting…
If you have a bunch of audio devices in a chain (say, a CD player connected to a preamplifier connected to a power amplifier connected to a loudspeaker) then one of the simplest things that you can do to improve or optimise your audio quality is to look after the gain of the signal through the system. It’s also free – and getting a lot for free is always a good thing…
Let’s start by taking a simple view of one device – a piece of audio gear. It doesn’t matter what the gear is – it could be an MP3 player, it could be a giant mixing console. What we’ll do is just look at the output of this device as it tries to play an audio signal with a varying level.
Let’s use a very simple example of a sine wave as our audio signal; we’ll look at the output of the Audio Device as we increase the level of our sine wave from very quiet to very loud.
The screenshot in Figure 2, by itself, is not that interesting. Let’s zoom in on the three points on the plot to see what’s going on.
Figure 3 shows a zoomed-in view of point “A” in Figure 2. Notice that you cannot see a sine wave in that signal – it’s just noise. This is the noise that is naturally generated by the device for some reason. This may be natural noise in the analogue chain – caused by thermal movement of electrons in resistors, amplified by the device itself. It may be intentional noise like dither which is added to the signal to randomise errors in a digital audio chain. Or, it may be something else entirely…
But be careful not to jump to conclusions… Just because you can’t see a sine wave there doesn’t mean that you won’t be able to hear it. As the level of the sine wave is increased, we’ll be able to hear it along with the noise before we’ll be able to see it on the screen.
In this case, we have a very low “signal-to-noise ratio”. In other words, the level of the signal (the sine wave) divided by (because it’s a ratio) the level of the noise gives us a low number. Or, in normal English – the sine wave is “drowned out” by the noise.
Figure 4 shows a nice, clean-looking sine wave coming out of our audio device. It’s what’s going on at point “B” in Figure 2. We’ve zoomed in so much that you can’t see the increase in level over time – but trust me, it’s happening there.
The noise is still there, “riding the wave” of the sine tone. In fact, if we were to zoom in on the sine wave in that figure, we’d see the same kind of noise that we saw in Figure 3 – like little ripples on big ocean waves. Now, however, the sine wave is much louder than the noise – so we have a reasonably high “signal-to-noise ratio”. In other words, the level of the signal (the sine wave) divided by (because it’s a ratio) the level of the noise gives us a high number. Or, in normal English – the sine wave “drowns out” the noise.
Figure 5 shows what’s happening at point “C” in Figure 2. Notice that this doesn’t really look like a sine wave any more – the top and bottom have been chopped off, or “clipped”. This has happened because we are trying to make our audio device produce an output level that is beyond its abilities. As the sine wave increases, the audio device follows along until its output can go no higher, so it stops and holds that output level until the sine wave comes back down.
At this point, the noise is still very much lower in level than the signal – but we have caused a problem: the input is a sine wave, but the output is not. In other words, we have distorted the shape of the audio signal.
Note that distortion of an audio signal can take an infinite number of forms. The example here is symmetrical clipping of the signal – which is what many people mean when they say “distorted” – but don’t be fooled… “Distortion” means a whole lot more than this.
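The three regions of Figure 2 can be imitated with a minimal Matlab sketch of a hypothetical device that adds its own noise to the signal and clips its output symmetrically. The clip level, noise level, tone frequency and the three test amplitudes are all assumptions, chosen only to land roughly at points “A”, “B” and “C”. The signal-to-noise ratio is computed as the text describes it (the level of the signal divided by the level of the noise, expressed in dB), before clipping.

```matlab
fs = 48000;
clip_level = 1.0;                 % V, maximum output of the hypothetical device
noise_rms = 1e-3;                 % V, the device's own noise
t = (0:fs-1)' / fs;               % one second per test level
tone = sin(2*pi*1000*t);

amplitudes = [1e-4, 0.5, 3];      % roughly points "A", "B" and "C"
labels = {'A', 'B', 'C'};
for k = 1:numel(amplitudes)
    signal = amplitudes(k) * tone;
    noise = noise_rms * randn(size(t));
    output = min(max(signal + noise, -clip_level), clip_level);   % symmetrical clipping

    snr_db = 20*log10(sqrt(mean(signal.^2)) / sqrt(mean(noise.^2)));
    clipped_fraction = mean(abs(output) >= clip_level);
    fprintf('Point %s: SNR = %6.1f dB, clipped samples = %4.1f %%\n', ...
            labels{k}, snr_db, 100 * clipped_fraction);
end
```

With these values, point A comes out at roughly −23 dB (the tone is buried in the noise) and point B at roughly 51 dB. Point C has an even higher signal-to-noise ratio, but most of its samples are clipped: a good signal-to-noise ratio and a distorted output at the same time.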
So, there’s a moral to the story-thus-far: every audio device has an upper and lower limit for audio level. (Yes, even a wire has a lower limit set by thermal noise in the electrons it contains and an upper limit set by the amount of current it will pass through before melting.) That range of dynamics or dynamic range is (hopefully) big – in other words, the noise floor (the quietest sound) should be MUCH MUCH quieter than a just-clipped signal (the loudest sound). Because this difference is so big, we’ll measure it in decibels (for kind of the same reason it doesn’t make sense to measure the speed of a car in millimetres per year, or the area of Canada in square micrometres.)
We can also represent these two numbers (the level of the noise floor and the level of a just-clipped signal) as two values relative to each other. Let’s say, for the purposes of keeping the numbers pretty, that we have an audio device that just so happens to have a noise floor that is 100 dB below the level of a signal that just starts to clip at its output.
Figure 6 shows one way to represent this. The red vertical rectangle on the left shows the range of audio levels that is possible to achieve with “Device #1”. It has a noise floor of 10 µV and will clip at 1 V – therefore it has a total dynamic range of 100 dB. Since, in this example, Device #1 is the only device in our audio system, the dynamic range of the entire system is also 100 dB (shown as the red rectangle on the right).
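The 100 dB figure follows directly from the two voltages, since a ratio of voltages expressed in decibels is 20 times its base-10 logarithm:

$$
\text{dynamic range} = 20\log_{10}\left(\frac{1\ \text{V}}{10\ \mu\text{V}}\right) = 20\log_{10}\left(10^{5}\right) = 100\ \text{dB}
$$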
What happens if we add another device in our chain? Let’s say, for example, that we put a second device in the system after Device #1. Let’s also say that Device #2 can play louder signals than Device #1 – and it has a lower noise floor, as is shown in Figure 7.
There are four things to notice in Figure 7:
- Device #2 can play louder than Device #1
- Device #2 has a lower noise floor than Device #1
- Therefore Device #2 has a wider dynamic range than Device #1
- The dynamic range of the total system is set by Device #1, since it is not limited by Device #2.
However, we should be careful here. The fact that Device #2 has a wider dynamic range than Device #1 does not automatically mean that the total system has a dynamic range that is defined by the “weakest link” (Device #1). Look at Figure 8, for example.
In Figure 8, we have not changed the devices – Device #2 still has a 120 dB dynamic range – but the Total System has a dynamic range that is reduced to 90 dB because of the alignment of levels in the system. Now, the noise floor of the system comes from Device #2 because we have not been careful about setting up the alignment of the levels of the devices.
Another way to think of this is that Device #2 is set up with the expectation that it will go much louder – but it doesn’t because of the limitations of Device #1. Because of that incorrect setup, the noise that you hear at the output of the system comes from Device #2.
An example of a system like the one shown in Figure 8 is when you connect a low-end audio device’s output (say, the headphone jack of your computer or phone) to a better device that is built to handle a much higher input level. The possible result is that the “headroom” (the amount by which the better device can handle higher level signals) is wasted (since the lower-quality device doesn’t deliver those high levels) and the total system has a degraded dynamic range.
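Here is a minimal Matlab sketch of that effect under a deliberately simple model, assumed here rather than taken from the figures: each device adds uncorrelated noise at its own output, Device #2 applies a gain g2 to whatever Device #1 sends it (its “input sensitivity”), and Device #2 can never output more than its own clip level. Device #1 uses the 10 µV and 1 V values from Figure 6; Device #2’s values are hypothetical numbers that give it a 120 dB range. They are not the numbers behind Figure 8.

```matlab
noise1 = 10e-6;  clip1 = 1;      % Device #1: 10 uV noise, clips at 1 V (100 dB)
noise2 = 2e-6;   clip2 = 2;      % Device #2 (hypothetical): 2 uV noise, clips at 2 V (120 dB)

% System dynamic range for a given gain g2 into Device #2:
% largest clean output divided by the total output noise (noise powers add)
chain_range_db = @(g2) 20*log10( ...
    min(clip1 * g2, clip2) / sqrt((noise1 * g2)^2 + noise2^2));

g_aligned = clip2 / clip1;       % Device #1's maximum just reaches Device #2's maximum
g_misaligned = 0.01 * g_aligned; % Device #2 "expects" 100 times (40 dB) more than it gets

fprintf('Aligned:    %.1f dB\n', chain_range_db(g_aligned));
fprintf('Misaligned: %.1f dB\n', chain_range_db(g_misaligned));
```

With these numbers, the aligned chain comes out at about 100 dB (Device #1 is the limit), while the misaligned one drops to about 80 dB because Device #2’s own noise now dominates the output.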
So, the moral of the story here so far is that you should always try to ensure that your system’s dynamic range is not limited by the way it’s connected.
For example, if you have a system that has an adjustable input sensitivity, you should set it so that the input is not expecting more level than the device that’s feeding it can deliver. If your output device can only deliver 2 V RMS maximum, it may not be helpful for the device it’s connected to to be “expecting” to see 4 V coming from it. If this is the way things are set up, then you might be “throwing away” 6 dB of dynamic range (because 4 V is 6 dB louder than 2 V).
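The 6 dB comes from the same voltage-ratio rule used for the dynamic range above:

$$
20\log_{10}\left(\frac{4\ \text{V}}{2\ \text{V}}\right) = 20\log_{10}(2) \approx 6.02\ \text{dB}
$$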
Generally, there are two good “rules of thumb” that can help you here.
The first one is to try to align all your maximum levels as much as possible. So, as in the last example above, if your source device has a maximum output of 2.0 V RMS, set the input sensitivity of your next device to “expect” 2.0 V RMS maximum. This will make the tops of the red rectangles all align, and the dynamic range will be defined by the worst link in the chain instead of the way the devices are connected.
The second rule of thumb is to put as much gain as possible at the beginning of the chain. This is particularly true if you’re working in a recording studio. This is because every piece of gear contributes noise to the audio signal. If you put all the gain at the end of the chain, then you are making the signal louder, but you’re also making all of the noise from all of the gear “upstream” louder as well. If you put all the gain at the beginning of the chain, then you might wind up in a situation where you have to turn DOWN the signal through the chain, thus reducing your signal to the correct level, and bringing the noise floor down with it. (Two obvious examples of this are using lots of gain at your mic preamp in a recording studio, or getting a RIAA preamp with a healthy output level for your turntable…) Another good example of this is the case where you have a headphone output from your phone connected to the aux input of a small stereo system. You want to turn up the phone as much as possible, and turn down the stereo volume. If you do the opposite, you’ll be using the stereo system to turn up the noise output of your phone.
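To put rough numbers on this rule, assume (as a simplification) two devices in series, A then B, each adding uncorrelated noise $n_A$ and $n_B$ at its own output, and a total gain $G > 1$ that can be placed in either device. The output signal is $G$ times the input either way, but the output noise is not, because gain placed in B also amplifies A’s noise:

$$
N_{\text{gain at start}} = \sqrt{n_A^2 + n_B^2}, \qquad N_{\text{gain at end}} = \sqrt{G^2 n_A^2 + n_B^2}
$$

For example, with $G = 10$ (20 dB) and $n_A = n_B$, putting the gain at the end gives $\sqrt{101/2} \approx 7.1$ times (about 17 dB) more noise at the output.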
One last thing: connecting devices digitally will probably help with your dynamic range; however, this is not always true. You certainly cannot automatically conclude that a digital connection is better in all respects than an analogue one – or vice versa. For example, in some cases, the errors in a sampling rate converter at a digital input stage may result in a higher “noise” floor than the analogue noise caused by an analogue-to-digital converter on the same device. Or, it might be that these two inputs have the same measurable noise floor, but the two noises have very different characteristics. Typically, analogue noise is program-independent – meaning it’s unrelated to the signal – whereas poorly-implemented digital transmission and processing typically result in program-dependent errors. These can be interpreted by the listener as being part of the signal (more like distortion artefacts than noise), and therefore will be different for different signals. To make things even more confusing, different digital inputs on the same device (e.g. optical, S/PDIF, and HDMI) may (or may not) behave differently – so any decisions you make about one of them may (or may not) apply to the others…