Phase vs Polarity
I know that language evolves. I know that a dictionary is a record of how we use words, not an arbiter of how words should be used. However, I also believe very firmly that if you don’t use words correctly, then you won’t be saying what you mean, and therefore you can be misconstrued.
One of the more common phrases that you’ll hear audio people use is “out of phase” when they mean “180º out of phase” or possibly even “opposite polarity”. I recently heard someone I work with say “out of phase” and I corrected them and said “you mean ‘opposite polarity’” and so a discussion began around the question of whether “180º out of phase” and “opposite polarity” can possibly result in two different things, or whether they’re interchangeable.
Let’s start by talking about what “phase” is. When you look at a sine wave, you’re essentially looking at a two-dimensional view of a three-dimensional shape. I’ve talked about this a lot in two other postings: this one and this one. However, the short form goes something like “Look at a coil spring from the side and it will look like a sine wave.” A coil is a two-dimensional circle that has been stretched in the third dimension so that when you rotate 360º, you wind up back where you started in the first two dimensions, but not the third. When you look at that coil from the side, the circular rotation (say, in degrees) looks like a change in height.
Notice in the two photos above how the rotation of the circle, when viewed from the side, looks only like a change in height related to the rotation in degrees.
The figure above is a classic representation of a sine wave with a peak amplitude of 1, and as you can see there, it’s essentially the same as the photo of the Slinky. In fact, you get used to seeing sine waves as springs-viewed-from-the-side if you force yourself to think of it that way.
Now let’s look at the same sine wave, but we’ll start at a different place in the rotation.
The figure above shows a sine wave whose rotation has been delayed by some number of degrees (22.5º, to be precise).
If I delay the start of the sine wave by 180 degrees instead, it looks like Figure 5.
However, if I take the sine wave and multiply each value by -1 (inverting the polarity) then it looks like this:
As you can probably see, the plots in Figures 5 and 6 are identical. Therefore, in the case of a sine wave, shifting the phase of the signal by 180 degrees has the same result as inverting the polarity.
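This is easy to confirm numerically. Here’s a quick sanity check using NumPy (my own sketch, just to verify the claim):

```python
# Check that, for a sine wave, a 180° phase shift and a
# polarity inversion produce identical results.
import numpy as np

theta = np.radians(np.arange(360))      # one full cycle, 1° resolution
original = np.sin(theta)
phase_shifted = np.sin(theta + np.pi)   # phase shifted by 180°
polarity_inverted = -1 * original       # each value multiplied by -1

# Identical to within floating-point rounding error:
print(np.allclose(phase_shifted, polarity_inverted))  # prints True
```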
What happens when you have a signal that is the sum of multiple sine waves? Let’s look at a simple example below.
The top plot above shows two sine waves, one with a frequency of three times the other, and with 1/3 the amplitude. If I add these two together, the result is the red curve in the lower plot. There are two ways to think of this addition: You can add each amplitude, degree by degree to get the red curve. You can also think of the slopes adding. At the 180º mark, the two downward-going slopes of the two sine waves cause the steeper slope in the red curve.
If we shift the phase of each of the two sine wave components, then the result looks like the plots below.
As you can see in the plots above, shifting the phases of the sine waves is the same as inverting their polarities, and so the resulting total sum (the red curve) is the same as if we had inverted the polarity of the previous total sum.
So, so far, we can conclude that shifting the phase by 180º gives the same result as inverting the polarity.
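The same numerical check works for the two-component signal in the plots above (a fundamental plus a component at 3 times the frequency and 1/3 the amplitude). Shifting the phase of each component by 180º before summing gives exactly the polarity-inverted total:

```python
import numpy as np

theta = np.radians(np.arange(360))
# Two components: a fundamental, plus 3x the frequency at 1/3 the amplitude
total = np.sin(theta) + np.sin(3 * theta) / 3

# Shift the phase of EACH component by 180° and re-sum:
shifted_total = np.sin(theta + np.pi) + np.sin(3 * theta + np.pi) / 3

# Compare against inverting the polarity of the original sum:
print(np.allclose(shifted_total, -total))  # prints True
```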
In the April 1946 edition of Wireless World magazine, C.E. Cooper wrote an article called “Phase Relationships: ‘180 Degrees Out of Phase’ or ‘Reversed Polarity’?” (I’m not the first one to have this debate…) In this article, it states that there is a difference between “phase” and “polarity”, with the example shown below.
There is a problem with the illustration in Figure 9, which is the fact that you cannot say that the middle plot has been shifted in phase by 180 degrees, because that waveform doesn’t have a “phase”. If you decomposed it into its constituent sines/cosines and shifted each of those by 180º, then the result would look like (c) instead of (b). Instead, this signal has had a delay of 1/2 of a period applied to it – which is a different thing, since it’s delaying in time instead of shifting in phase.
However, there is a hint here of a correct answer… If we think of the black and blue sine waves in the 2-part plots above as sine waves with frequencies 1 Hz and 3 Hz, we can add another “sine wave” with a frequency of 0 Hz, or DC, as shown in Figure 10, below.
In the plot above, the top plot has a DC component (the blue line) that is added to the sine component (the black curve) resulting in a sine wave with a DC offset (the red curve).
If we invert the polarity of this signal, then the result is as shown in Figure 11.
However, if we delay the components by 180º, the result is different, as shown in Figure 12:
The hint from the 1946 article was the addition of a DC offset to the signal. If we think of that as a sine wave with a frequency of 0 Hz, then it can be “phase-shifted” by 180º which results in the same value instead of inverting polarity.
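To see the difference numerically, here’s a small NumPy sketch (my own example). The 0 Hz component has an infinite period, so no amount of delay changes it – but a polarity inversion flips it along with everything else:

```python
import numpy as np

theta = np.radians(np.arange(360))
dc = 0.5                               # the 0 Hz component
signal = np.sin(theta) + dc            # a sine wave with a DC offset

# Inverting the polarity flips the whole thing, DC offset included:
inverted = -signal                     # a sine riding on -0.5

# Delaying each component by half of its own period flips the sine,
# but leaves the 0 Hz component untouched (its period is infinite):
delayed = np.sin(theta + np.pi) + dc   # a flipped sine riding on +0.5

print(np.allclose(inverted, delayed))  # prints False: they differ by 2 * dc
```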
To be fair, most of the time shifting the phase by 180º gives the same result as inverting the polarity. However, I still don’t like it when people say “flip the phase”…
Volume controls vs. Output levels
#92 in a series of articles about the technology behind Bang & Olufsen
One question people often ask about B&O loudspeakers is something like ”Why doesn’t the volume control work above 50%?”.
This is usually asked by someone using a small loudspeaker listening to pop music.
There are two reasons for this, related to the fact that there is such a wide range of capabilities across different Bang & Olufsen loudspeakers AND you can use them together in a surround or multiroom system. For example, a Beolab 90 is capable of playing much, much more loudly than a Beolab 12; but they still have to play together.
Let’s use the example of a Beolab 90 and a Beolab 12, both playing in a surround configuration or a multiroom setup. In both cases, if the volume control is set to a low enough level, then these two types of loudspeakers should play at the same output level. This is true for quiet recordings (shown on the left in the figure below) and louder recordings (shown on the right).
However, if you turn up the volume control, you will reach an output level that exceeds the capability of the Beolab 12 for the loud song (but not for the quiet song), shown in the figure below. At this point, for the loud song, the Beolab 12 has already begun to protect itself.
Once a B&O loudspeaker starts protecting itself, no matter how much more you turn it up, it will turn itself down by the same amount; so it won’t get louder. If it did get louder, it would either distort the sound or stop working – or distort the sound and then stop working.
If you ONLY own Beolab 12s and you ONLY listen to loud songs (e.g. pop and rock) then you might ask “why should I be able to turn up the volume higher than this?”.
The first answer is “because you might also own Beolab 90s” which can go louder, as you can see in the right hand side of the figure above.
The second answer is that you might want to listen to a quieter recording (like a violin solo or a podcast). In this case, you haven’t reached the maximum output of even the Beolab 12 yet, as you can see in the left hand side of the figure above. So, you should be able to increase the volume setting to make even the quiet recording reach the limits of the less-capable loudspeaker, as shown below.
Notice, however, that at this high volume setting, both the quiet recording and the loud recording have the same output level on the Beolab 12.
So, the volume control allows you to push the output higher; either because you might also own more capable loudspeakers (maybe not today – but some day) OR because you’re playing a quiet recording and you want to hear it over the sound of the exhaust fan above your stove or the noise from your shower.
It’s also good to remember that the volume control isn’t an indicator of how loud the output should be. It’s an indicator of how much quieter or louder you’re making the input signal.
The volume control is more like how far down you’re pushing the accelerator in your car – not the indication of the speedometer. If you push down the accelerator 50% of the way, your actual speed is dependent on many things like what gear you’re in, whether you’re going uphill or downhill, and whether you’re towing a heavy trailer. Similarly Metallica at volume step 70 will be much louder than a solo violin recording at the same volume step, unless you are playing it through a loudspeaker that reached its maximum possible output at volume step 50, in which case the Metallica and the violin might be the same level.
Note 1: For all of the above, I’ve said “quiet song” and “loud song” or “quiet recording” and “loud recording” – but I could just as easily have said “quiet part of the song” and “loud part of the song”. The issue is not just related to mastering levels (the overall level of the recording) but also to the dynamic range (the “distance” between the quietest and the loudest moments of a recording).
Note 2: I’ve written a longer, more detailed explanation of this in Posting #81: Turn it down half-way.
I found this at a flea market yesterday and I couldn’t resist buying it. It’s a Sharp EL-805M “pocket” calculator that was released for sale in 1973 and discontinued in 1974.
This would have been a time when a Liquid Crystal display was a feature worth advertising on the front panel of the calculator (since this was the first calculator with an LCD).
Sharp was one of the pioneers of calculators using the DSM (Dynamic Scattering Mode) LCD (Liquid Crystal Display). These DSM LCDs have the now-unusual feature of silver-like reflective digits on a dark background, rather than the now-common black digits on a light background. (Source: http://www.vintagecalculators.com/html/facit_1106-sharp_el-805s.html)
It was also from a time when instructions were included on how to use it. Notice the instructions for calculating 25 x 36, for example…
Undoubtedly, the best 20 DKK I spent all weekend, given that the original price in 1973 was 110 USD.
For a peek inside, this site has some good shots, but it seems that it proves to be a challenge for automatic translators. There’s also a good history here.
Variations on the Goldberg Variations
As part of a listening session today, I put together a playlist to compare piano recordings. I decided that an interesting way to do this was to use the same piece of music, recorded by different artists on different instruments in different rooms by different engineers using different microphones and techniques. The only constant was the notes on the page in front of the performer.
A link to the playlist is here: LINK TO TIDAL
Playing through this, it’s interesting to pay attention to things like:
- Overall level of the recording
- Notice how much (typically) quieter the Dolby Atmos-encoded recording is than the 2.0 PCM encoded ones. However, there’s a large variation amongst the 2.0 recordings.
- Monophonic vs. stereo recordings
- Perceived width of the piano
- Perceived width of the room
- How enveloping the room is (this might be different from the perceived width, but these two attributes can be co-related, possibly even correlated)
- Perceived distance to the piano.
- On some of the recordings, the piano appears to be close. The attack of each note is quite fast, and there is not much reverberation.
- On some of the recordings, the piano appears to be distant – more reverberant, with a soft, slow attack on each note.
- On other recordings, it may appear that the piano is both near (because of the fast attack on each hammer-to-string strike) and far (because of the reverberation). (Probably achieved by using a combination of microphones at different distances – or using digital reverb…)
- The length of the reverberation time
- Whether the piano is presented as one instrument or a collection of strings (e.g. can you hear different directions to (or locations of) individual notes?)
- If the piano is presented as a wide source with separation between bass and treble, is the presentation from the pianist’s perspective (bass on the left, treble on the right) or the audience’s perspective (bass on the right, treble on the left… sort of…)
32 is a lot of bits…
Once upon a time, I did a blog posting about why, when we test digital audio systems, we typically use a 997 Hz sine wave instead of a 1000 Hz tone.
The short version of this is the following:
Let’s say that I digitally create a (not-dithered) 1000 Hz sine wave at 0 dB FS in a 16-bit system running at 48 kHz. This means that every second, there are exactly 1000 cycles of the wave, and since there are 48,000 samples per second, this, in turn means that there is one cycle every 48 samples, so sample #49 is identical to sample #1.
So, we are only testing 48 of the possible 2^16 ( = 65,536) quantisation values, right?
Wrong. It’s worse than you think.
If we zoom in a little more, we can see that Sample #1 = 0 (because it’s a sine wave). Sample #25 is also equal to 0 (because 48,000 / 1,000 is a nice number that is divisible by 2).
Unfortunately, 48,000 / 1,000 is a nice number that is also divisible by 4. So what? This means that when the sine wave goes up from 0 to maximum, it hits exactly the same quantisation values as it does on the way from maximum back down to 0. For example, in the figure below, the values of the two samples shown in red are identical. This is true for all symmetrical points in the positive side and the negative side of the wave.
Jumping ahead, this means that, if we make a “perfect” 1 kHz sine wave at 48 kHz (regardless of how many bits in the system) we only test a total of 25 quantisation steps: 0, 12 positive steps, and 12 negative ones.
Not much of a test – we only hit 25 out of a possible 65,536 values in a 16-bit system (or 25 out of 16,777,216 possible values in a 24-bit system).
What if I wanted to make a signal that tested ALL possible quantisation values in an LPCM system? One way to do this is to simply make a linear ramp that goes from the lowest possible value up to the highest possible value, step by step, sample by sample. (of course, there are other ways, but it doesn’t matter… we’re just trying to hit every possible quantisation value…)
How long would it take to play that test signal?
First we convert the number of bits to the number of quantisation steps. This is done using the equation 2^bits. So, you get the following results:

| Number of Bits | Number of Quantisation Steps |
|---|---|
| 16 | 65,536 |
| 24 | 16,777,216 |
| 32 | 4,294,967,296 |
If the value of each sample has a different quantisation value, and we play the file at the sampling rate then we can calculate the time it will take by dividing the number of quantisation steps by the sampling rate. This results in the following:
| Sampling Rate (kHz) | 16 Bits | 24 Bits | 32 Bits |
|---|---|---|---|
| 44.1 | 1.5 seconds | 6.4 minutes | 27.1 hours |
| 48 | 1.4 seconds | 5.8 minutes | 24.9 hours |
| 88.2 | 0.7 seconds | 3.2 minutes | 13.5 hours |
| 96 | 0.7 seconds | 2.9 minutes | 12.4 hours |
| 176.4 | 0.4 seconds | 1.6 minutes | 6.8 hours |
| 192 | 0.3 seconds | 1.5 minutes | 6.2 hours |
| 352.8 | 0.2 seconds | 47.6 seconds | 3.4 hours |
| 384 | 0.2 seconds | 43.7 seconds | 3.1 hours |
| 705.6 | 0.1 seconds | 23.8 seconds | 1.7 hours |
| 768 | 0.1 seconds | 21.8 seconds | 1.6 hours |
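The arithmetic behind the table is just 2^bits divided by the sampling rate. A throwaway script to reproduce (for example) the 32-bit column:

```python
# Time needed to play a ramp that hits every quantisation value,
# one value per sample.
def ramp_duration_seconds(bits, fs):
    return 2 ** bits / fs

# The 32-bit column of the table above, in hours:
for fs in [44100, 48000, 88200, 96000, 176400, 192000,
           352800, 384000, 705600, 768000]:
    hours = ramp_duration_seconds(32, fs) / 3600
    print(f"{fs / 1000:g} kHz: {hours:.1f} hours")

# Change the first argument to 16 or 24 to reproduce the other columns.
```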
So, the moral of the story is: if you’re testing the validity of a quantiser in a 32-bit fixed-point system, and you’re not able to do it off-line (meaning that you’re locked to a clock running at the correct sampling rate), you’d better either (1) hope that it’s also running at a crazy-high sampling rate or (2) hope that you’re getting paid by the hour.
Why am I thinking about this?
I often get asked for my opinion about audio players; these days, network streamers especially, since they’re in style.
Let’s say, for example, that someone asked me to recommend a network streamer for use with their system. In order to recommend this, I need to measure it to make sure it behaves.
One of the tests I’m going to run is to ensure that every sample value on a file is accurately output from the device. Let’s also make it simple and say that the device has a digital output, and I only need to test 3 LPCM audio file formats (WAV, AIFF and FLAC – since those can be relied upon to give a bit-for-bit match from file to output). (We’ll also pretend that the digital output can support a 32-bit audio word…)
So, to run this test, I’m going to
- create test files that I described above (checking every quantisation value at all three bit depths and all 10 sampling rates)
- play them
- record them
- and then compare whether I have a bit-for-bit match from input (the original file) to the output
If you add up all the values in the table above for the 10 sampling rates and the three bit depths, then you get to a total of 4.2 DAYS of play time (playing audio constantly 24 hours a day) per file format.
So, say I wanted to test three file formats for all of the sampling rates and bit depths, then I’m looking at playing & recording 12.6 days of audio – and then I can start the analysis.
Of course this is silly… I’m not going to test a 32-bit, 44.1 kHz file… In fact, if I don’t bother with the 32-bit values at all, then my time per file format drops from 4.2 days down to 23.7 minutes of play time, which is a lot more feasible, but less interesting if I’m getting paid by the hour.
However, it was fun to calculate – and it just goes to show how big a number 2^32 is…
B&O Pickup stylus comparison
Below are four photos taken with the same magnification.
The top two photos are a Bang & Olufsen SP2 pickup, compatible with the 25º tonearm on a Type 42 “Stereopladespiller”.
The bottom two are a rather dirty Bang & Olufsen MMC 1/2 pickup, compatible with a range of turntables including the Beogram 4500, for example.
The yellow grid lines have a 0.50 mm spacing.
Microprocessors in B&O consumer products
A blog for anyone interested in some geeky history about Bang & Olufsen products.
What is a “virtual” loudspeaker? Part 3
#91.3 in a series of articles about the technology behind Bang & Olufsen
In Part 1 of this series, I talked about how a binaural audio signal can (hypothetically, with HRTFs that match your personal ones) be used to simulate the sound of a source (like a loudspeaker, for example) in space. However, for this to work, you have to make sure that the left and right ears get completely isolated signals (using earphones, for example).
In Part 2, I showed how, with enough processing power, a large amount of luck (using HRTFs that match your personal ones PLUS the promise that you’re in exactly the correct location), and a room that has no walls, floor or ceiling, you can get a pair of loudspeakers to behave like a pair of headphones using crosstalk cancellation.
There’s not much left to do to create a virtual loudspeaker. All we need to do is to:
- Take the signal that should be sent to a right surround loudspeaker (for example) and filter it using the HRTFs that correspond to a sound source in the location that this loudspeaker would be in. REMEMBER that this signal has to get to your two ears since you would have used your two ears to hear an actual loudspeaker in that location.
- Send those two signals through a crosstalk cancellation processing system that causes your two loudspeakers to behave more like a pair of headphones.
One nice thing about this system is that the crosstalk cancellation is only there to ensure that the actual loudspeakers behave more like headphones. So, if you want to create more virtual channels, you don’t need to duplicate the crosstalk cancellation processor. You only need to create the binaurally-processed versions of each input signal and mix those together before sending the total result to the crosstalk cancellation processor, as shown below.
This is good because it saves on processing power.
So, there are some important things to realise after having read this series:
- All “virtual” loudspeakers’ signals are actually produced by the left and right loudspeakers in the system. In the case of the Beosound Theatre, these are the Left and Right Front-firing outputs.
- Any single virtual loudspeaker (for example, the Left Surround) requires BOTH output channels to produce sound.
- If the delays (aka Speaker Distance) and gains (aka Speaker Levels) of the REAL outputs are incorrect at the listening position, then the crosstalk cancellation will not work and the virtual loudspeaker simulation system won’t work. How badly it doesn’t work depends on how wrong the delays and gains are.
- The virtual loudspeaker effect will be experienced differently by different persons because it depends on how closely your actual personal HRTFs match those predicted in the processor. So, don’t get into fights with your friends on the sofa about where you hear the helicopter…
- The listening room’s acoustical behaviour will also have an effect on the crosstalk cancellation. For example, strong early reflections will “infect” the signals at the listening position and may/will cause the cancellation to not work as well. So, the results will vary not only with changes in rooms but also speaker locations.
Finally, it’s worth noting that, in the specific case of the Beosound Theatre, by setting the Speaker Distances and Speaker Levels for the Left and Right Front-firing outputs for your listening position, you have automatically calibrated the virtual outputs. This is because the Speaker Distances and Speaker Levels are compensations for the ACTUAL outputs of the system, which are the ones producing the signals that simulate the virtual loudspeakers. This is the reason why the four virtual loudspeakers do not have individual Speaker Distances and Speaker Levels. If they did, they would have to be identical to the Left and Right Front-firing outputs’ values.
What is a “virtual” loudspeaker? Part 2
#91.2 in a series of articles about the technology behind Bang & Olufsen
In Part 1, I talked about how a binaural recording is made, and I also mentioned that the spatial effects may or may not work well for you, for a number of different reasons.
Let’s go back to the free field with a single “perfect” microphone to measure what’s happening, but this time, we’ll send sound out of two identical “perfect” loudspeakers. The distances from the loudspeakers to the microphone are identical. The only difference in this hypothetical world is that the two loudspeakers are in different positions (measured as a rotational angle) as shown in Figure 1.
In this example, because everything is perfect, and the space is a free field, the output of the microphone will be the sum of the outputs of the two loudspeakers. (In the same way that if your dog and your cat are both asking for dinner simultaneously, you’ll hear dog+cat and have to decide which is more annoying and therefore gets fed first…)
IF the system is perfect as I described above, then we can play some tricks that could be useful. For example, since the output of the microphone is the sum of the outputs of the two loudspeakers, what happens if the output of one loudspeaker is identical to the other loudspeaker, but reversed in polarity?
In this example, we’re manipulating the signals so that, when they add together, you get nothing at the output. This is because, at any moment in time, the value of Loudspeaker 2’s output is the value of Loudspeaker 1’s output * -1. So, in other words, we’re just subtracting the signal from itself at the microphone and we get something called “perfect cancellation” because the two signals cancel each other at all times.
Of course, if anything changes, then this perfect cancellation won’t work. For example, if one of the loudspeakers moves a little farther away than the other, then the system is broken, as shown below.
Again, everything that I’ve said above only works when everything is perfect, and the loudspeakers and the microphone are in a free field; so there are no reflections coming in and ruining everything.
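To put a rough number on how fragile that cancellation is, here’s a little NumPy sketch (my own example, not a measurement): one loudspeaker’s signal is polarity-inverted, but it arrives 0.1 ms late – equivalent to roughly 3.4 cm of extra distance at the speed of sound:

```python
import numpy as np

fs = 48000
f = 1000                      # test frequency in Hz
t = np.arange(fs) / fs        # one second of time stamps

speaker_1 = np.sin(2 * np.pi * f * t)
# Loudspeaker 2: polarity-inverted, but arriving 0.1 ms late
delay = 0.0001
speaker_2 = -np.sin(2 * np.pi * f * (t - delay))

# What the microphone picks up: no longer "perfect cancellation"
residual = speaker_1 + speaker_2
print(f"{np.max(np.abs(residual)):.2f}")  # prints 0.62
```

Even a tenth of a millisecond of mismatch leaves a residual at 1 kHz that is only about 4 dB below the level of either loudspeaker on its own – and the error grows with frequency.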
We can now combine these two concepts:
- using binaural signals to simulate a sound source in a location (although this would normally be done using playback over earphones to keep it simple) and
- using signals from loudspeakers to cancel each other at some location in space
to create a system for making virtual loudspeakers.
Let’s suspend our adherence to reality and continue with this hypothetical world where everything works as we want… We’ll replace the microphone with a person and consider what happens. To start, let’s just think about the output of the left loudspeaker.
If we plot the impulse responses at the two ears (the “click” sound from the loudspeaker after it’s been modified by the HRTFs for that loudspeaker location), they’ll look like this:
What if we were able to send a signal out of the right loudspeaker so that it cancels the signal from the left loudspeaker at the location of the right eardrum?
Unfortunately, this is not quite as easy as it sounds, since the HRTF of the right loudspeaker at the right ear is also in the picture, so we have to be a bit clever about this.
So, in order for this to work we:
- Send a signal out of the left loudspeaker.
We know that this will get to the right eardrum after it’s been messed up by the HRTF. This is what we want to cancel…
- …so we take that same signal, and
- filter it with the inverse of the HRTF of the right loudspeaker
(to undo the effects of the HRTF of the right loudspeaker’s signal at the right ear)
- filter that with the HRTF of the left loudspeaker at the right ear
(to match the filtering that’s done by your head and pinna)
- multiply by -1
(so that it will cancel when everything comes together at your right eardrum)
- and send it out the right loudspeaker.
Hypothetically, that signal (from the right loudspeaker) will reach your right eardrum at the same time as the unprocessed signal from the left loudspeaker and the two will cancel each other, just like the simple example shown in Figure 3. This effect is called crosstalk cancellation, because we use the signal from one loudspeaker to cancel the sound from the other loudspeaker that crosses to the wrong side of your head.
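The chain of filters in that list can be sketched as simple arithmetic. In the frequency domain, each HRTF at a single frequency is just a complex gain, so the whole recipe becomes a product of terms. The numbers below are made-up placeholders – real HRTFs are measured filters, with one value like this per frequency – but the cancellation logic is the same:

```python
# Hypothetical single-frequency HRTFs, as complex gains.
# Naming: H_XY = path from loudspeaker X to ear Y.
H_LR = 0.4 - 0.2j   # left speaker  -> right ear (the crosstalk to cancel)
H_RR = 1.0 + 0.1j   # right speaker -> right ear

x = 0.7 + 0.3j      # the signal sent out of the left loudspeaker

# The recipe: filter with the inverse of H_RR, filter with H_LR,
# multiply by -1, and send the result out of the right loudspeaker:
cancellation = x * (1 / H_RR) * H_LR * -1

# What arrives at the right eardrum: the crosstalk from the left
# loudspeaker, plus the cancellation signal from the right one.
at_right_ear = x * H_LR + cancellation * H_RR

print(abs(at_right_ear))  # essentially zero: the crosstalk is cancelled
```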
This then means that we have started to build a system where the output of the left loudspeaker is heard ONLY in your left ear. Of course, it’s not perfect because that cancellation signal that I sent out of the right loudspeaker gets to the left ear a little later, so we have to cancel the cancellation signal using the left loudspeaker, and back and forth forever.
If, at the same time, we’re doing the same thing for the other channel, then we’ve built a system where you have the left loudspeaker’s signal in the left ear and the right loudspeaker’s signal in the right ear; just like a pair of headphones!
However, if you get any of these elements wrong, the system will start to under-perform. For example, if the HRTFs that I use to predict your HRTFs are incorrect, then it won’t work as well. Or, if things aren’t time-aligned correctly (because you moved) then the cancellation won’t work.