I spent some time this week helping to track down the source of an error in a digital audio signal flow chain, and we wound up having a discussion that I thought might be worth repeating here.
Let’s start at the very beginning.
Let’s take an analogue audio signal and convert it to a Linear Pulse Code Modulation (LPCM) representation in the dumbest possible way.
In order to save this signal as a string of numerical values, we have to first accept the fact that we don’t have an infinite number of numbers to use. So, we have to round off the signal to the nearest usable value or “quantisation value”. This process of rounding the value is called “quantisation”.
Let’s say for now that our available quantisation values are the ones shown on the grid. If we then take our original sine wave and round it to those values, we get the result shown below.
Of course, I’m leaving out a lot of important details here like anti-aliasing filtering and dither (I said that we were going to be dumb…) but those things don’t matter for this discussion.
So far so good. However, we have to be a bit more specific: an LPCM system encodes the values using binary representations. So, a quantisation value of “0.25”, as shown above, isn’t helpful. Let’s make a “baby” LPCM system with only 3 bits (meaning that we have three Binary digITs available to represent our values).
To start, let’s count using a 3-bit system:
000 = 0 x 4 + 0 x 2 + 0 x 1 = 0
001 = 0 x 4 + 0 x 2 + 1 x 1 = 1
010 = 0 x 4 + 1 x 2 + 0 x 1 = 2
011 = 0 x 4 + 1 x 2 + 1 x 1 = 3
100 = 1 x 4 + 0 x 2 + 0 x 1 = 4
101 = 1 x 4 + 0 x 2 + 1 x 1 = 5
110 = 1 x 4 + 1 x 2 + 0 x 1 = 6
111 = 1 x 4 + 1 x 2 + 1 x 1 = 7
Table 1: The 8 numbers that can be represented using a 3-bit binary representation
and that’s as far as we can go before needing 4 bits. However, for now, that’s enough.
Take a look at our signal. It ranges from -1 to 1 and 0 is in the middle. So, if we say that the “0” in our original signal is encoded as “000” in our 3-bit system, then we just count upwards from there as follows:
Now what? Well, let’s look at this a little differently. If we were to divide a circle into the same number of quantisation values, make the “12:00” position = 000, and count clockwise, it would look like this:
The question now is “how do we number the negative values?” but the answer is already in the circle shown above… If I make it a little more obvious, then the answer is shown below.
If we use the convention shown above, and represent that on the graph of our audio signal, then it looks like this:
One nice thing about this way of doing things is that you just need to look at the first digit in the binary word to know whether the value is positive or negative. A 0 means it’s positive, and a 1 means it’s negative.
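If you'd like to play with this convention, here is a small Python sketch (mine, not from the article) that decodes a binary word this way. The key is that the first bit carries a negative weight (-4 in a 3-bit word), which is why a leading 1 always means a negative value:

```python
def twos_complement_to_int(bits: str) -> int:
    """Interpret a binary string as a two's complement value.

    The first bit carries negative weight (-4 for a 3-bit word),
    so a leading 1 always means a negative number.
    """
    n = len(bits)
    value = int(bits, 2)
    if bits[0] == "1":      # leading 1 -> negative value
        value -= 1 << n     # subtract 2^n (8 for 3 bits)
    return value

# The eight 3-bit words, counted "around the circle":
for word in ["000", "001", "010", "011", "100", "101", "110", "111"]:
    print(word, "->", twos_complement_to_int(word))
```

Running this prints 0, 1, 2, 3, then -4, -3, -2, -1: exactly the clockwise trip around the circle described above.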
However, there are two issues here that we need to sort out… The first is that we have an even number of values (8), but a grid that is symmetrical around zero would need an odd number of quantisation steps (4 above zero, 4 below zero, and zero = 9 steps), so we had to do something asymmetrical. As you can see in the plot above, there is no number assigned to the top quantisation value, which actually means that it doesn’t exist.
So, if we’re still being dumb, then the result of our quantisation will either look like this:
But what happens when you make two mistakes simultaneously? Let’s go back and look at an earlier plot.
Let’s say that you’re writing some DSP code, and you forget about the asymmetry problem, so you scale things so they’ll TRY to look like the plot above.
However, as we already know, that top quantisation value doesn’t exist – but the code will try to put something there. If you’ve forgotten about this, then the system will THINK that you want this:
As you can see there, your code (because you’ve forgotten to write an IF-THEN statement) will think that the top-most positive quantisation value is just the number after 011, which is 100. However, that value means something totally different… So, the result coming out will ACTUALLY look like this:
As you can see there, the signal is very different from what we think it should be.
This error is called a “wrapping” error, because the signal is “wrapped” too far around the circle shown in Figure 5, shown above. It sounds very bad – much worse than “normal” clipping (as shown in Figure 7) because of that huge nearly-instantaneous transition from maximum positive to maximum negative and back.
Of course, the wrapping can also happen in the opposite direction; a negatively-clipped signal can wrap around and show up at the top of the positive values. The reason is the same because the values are trying to go around the same circle.
As I said: this is actually the result of two problems that both have to occur in the same system:
The signal has to be trying to get to a level that is beyond the limits of the quantisation values
Someone forgot to write a line of code that makes sure that, when that happens, the signal is “just” clipped and not wrapped.
So, if the second of these issues is sitting there, unresolved, but the signal never exceeds the limits, then you’ll never have a problem. However, I will never need the airbags in my car, unless I have an accident. So, it’s best to remember to look after that second issue… just in case.
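That guard is a one-liner in most languages. Here is a Python sketch of my own (assuming the 3-bit signed system above) showing what happens to an overdriven sine wave with and without it:

```python
import math

BITS = 3
MAX_CODE = 2**(BITS - 1) - 1    # +3 : top-most positive value
MIN_CODE = -2**(BITS - 1)       # -4 : bottom-most negative value

def wrap(code: int) -> int:
    """No guard: a value beyond +3 wraps around the circle to -4."""
    span = 2**BITS
    return ((code - MIN_CODE) % span) + MIN_CODE

def clip(code: int) -> int:
    """The missing line of code: hold the signal at the limits."""
    return max(MIN_CODE, min(MAX_CODE, code))

# A sine wave scaled a little too hot for the quantiser (peaks at +4):
for i in range(8):
    ideal = round(4 * math.sin(2 * math.pi * i / 8))
    print(ideal, wrap(ideal), clip(ideal))
```

Note the sample where the ideal value is +4: the wrapped version jumps all the way to -4 (the huge transition that sounds so bad), while the clipped version just sits at +3.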
This method of encoding the quantisation values is called the “Two’s Complement” method. If you want to know more about it, read this.
The magnitude response* of any audio device is a measure of how much its output level deviates from the expected level at different frequencies. In a turntable, this can be measured in different ways.
Usually, the magnitude response is measured from a standard test disc with a sine wave sweep ranging from at least 20 Hz to at least 20 kHz. The output level of this signal is recorded at the output of the device, and the level is analysed to determine how much it differs from the expected output. Consequently, the measurement includes all components in the audio path from the stylus tip, through the RIAA preamplifier (if one is built into the turntable), to the line-level outputs.
Because all of these components are in the signal path, there is no way of knowing immediately whether deviations from the expected response are caused by the stylus, the preamplifier, or something else in the chain.
It’s also worth noting that a typical standard test disc (JVC TRS-1007 is a good example) will not have a constant output level, which you might expect if you’re used to measuring other audio devices. Usually, the swept sine signal has a constant amplitude in the low frequency bands (typically, below 1 kHz) and a constant modulation velocity in the high frequencies. This is to avoid over-modulation in the low end, and burning out the cutter head during mastering in the high end.
* This is the correct term for what is typically called the “frequency response”. The difference is that a magnitude response only shows output level vs. frequency, whereas the frequency response would include both level and phase information.
In theory, an audio playback device only outputs the audio signal that is on the recording without any extra contributions. In practice, however, every audio device adds signals to the output for various reasons. As was discussed above, in the specific case of a turntable, the audio signal is initially generated by very small movements of the stylus in the record groove. Therefore, in order for it to work at all, the system must be sensitive to very small movements in general. This means that any additional movement can (and probably will) be converted to an audio signal that is added to the recording.
This unwanted extraneous movement, and therefore signal, is usually the result of very low-frequency vibrations that come from various sources. These can include things like mechanical vibrations of the entire turntable transmitted through the table from the floor, vibrations in the system caused by the motor or imbalances in the moving parts, warped discs which cause a vertical movement of the stylus, and so on. These low-frequency signals are grouped together under the heading of rumble.
A rumble measurement is performed by playing a disc that has no signal on it, and measuring the output signal’s level. However, that output signal is first filtered to ensure that the level detection is not influenced by higher-frequency problems that may exist.
The characteristics of the filters are defined in internal standards such as DIN 45 539 (or IEC98-1964), shown below. Note that I’ve only plotted the target response. The specifications allow for some deviation of ±1 dB (except at 315 Hz). Notice that the low-pass filter is the same for both the Weighted and the Unweighted filters. Only the high-pass filter specifications are different for the two cases.
If the standard being used for the rumble measurement is the DIN 45 539 specification, then the resulting value is stated as the level difference between the measured filtered noise and the standard output level, equivalent to the output when playing a 1 kHz tone with a lateral modulation velocity of 70.7 mm/sec. This detail is also worth noting, since it shows that the rumble value is a relative and not an absolute output level.
Every recording / playback system, whether for audio or for video signals, is based on the fundamental principle that the recording and the playback happen at the same rate. For example, a film that was recorded at 24 frames (or photos) per second (FPS) must also be played at 24 FPS to avoid objects and persons moving too slowly or too quickly. It’s also necessary that neither the recording nor the playback speed changes over time.
A phonographic LP is mastered with the intention that it will be played back at a rotational speed of 33 1/3 RPM (Revolutions Per Minute) or 45 RPM, depending on the disc. (These correspond to 1 revolution either every 1.8 seconds or every 1 1/3 seconds respectively.) We assume that the rotational speed of the lathe that was used to cut the master was both very accurate and very stable. Although it is the job of the turntable to duplicate this accuracy and stability as closely as possible, measurable errors occur for a number of reasons, both mechanical and electrical. When these errors are measured using especially-created audio signals like pure sine tones, the results are filtered and analyzed to give an impression of how audible they are when listening to music. However, a problem arises in that a simple specification (such as a single number for “Wow and Flutter”, for example) can only be correctly interpreted with the knowledge of how the value is produced.
The first issue is the simple one of accuracy: is the turntable rotating the disc at the correct average speed? Most turntables have some kind of user control of this (both for the 33 and 45 RPM settings), since it will likely be necessary to adjust these occasionally over time, as the adjustment will drift with influences such as temperature and age.
Like any audio system, analogue or digital, a turntable’s playback speed will vary over time. As it increases and decreases, the pitch of the music at the output will increase and decrease proportionally. This is unavoidable. Therefore, there are two questions that result:
How much does the speed change?
What is the rate and pattern of the change?
In a turntable, the amount of the change in the rotational speed is directly proportional to the frequency shift in the audio output. So, for example, if the rotational speed decreases by 1% (from 33 1/3 RPM to exactly 33 RPM), the audio output will drop in frequency by 1% (so a 440 Hz tone will be played as a 440 * 0.99 = 435.6 Hz tone). Whether this is audible depends on different factors, including
the rate of change to the new speed (a 1% change 4 times a second is much easier to hear than a 1% change lasting 1 hour)
the listener’s abilities (for example, a person with “absolute pitch” may be able to recognise the change)
the audio signal (It is easier to detect a frequency shift of a single, long tone such as a note on a piano or pipe organ than it is of a short sound like a strike of claves or a sound with many enharmonic frequencies such as a snare drum.)
In an effort to simplify the specification of speed stability in analogue playback equipment such as turntables, four different classifications are used, each corresponding to a different range of modulation rates. These are drift, wow, flutter, and scrape. The two most commonly specified of these are wow and flutter, which are typically grouped into a single value that represents them both.
Frequency drift is the tendency of a playback device’s speed to change over time very slowly. Any variation that happens slower than once every 2 seconds (in other words, with a modulation frequency of less than 0.5 Hz) is considered to be drift. This is typically caused by changes such as temperature (as the playback device heats up) or variations in the power supply (due to changes in the mains supply, which can vary with changing loads throughout the day).
Wow is a modulation in the speed ranging from once every 2 seconds to 6 times a second (0.5 Hz to 6 Hz). Note that, for a turntable, the rotational speed of the disc is within this range. (At 33 1/3 RPM: 1 revolution every 1.8 seconds is equal to approximately 0.556 Hz.)
Flutter describes a modulation in the speed ranging from 6 to 100 times a second (6 Hz to 100 Hz).
Scrape or scrape flutter describes changes in the speed that are higher than 100 Hz. This is typically only a problem with analogue tape decks (caused by the magnetic tape sticking and slipping on components in its path) and is not often used when classifying turntable performance.
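The four bands above can be summarised in a tiny helper function (a sketch of my own; the behaviour exactly at the 6 Hz and 100 Hz boundaries is a judgement call, since the definitions meet at those edges):

```python
def classify_speed_variation(mod_freq_hz: float) -> str:
    """Bucket a speed-variation modulation frequency into the four
    classifications described above (boundary handling is my choice)."""
    if mod_freq_hz < 0.5:
        return "drift"
    elif mod_freq_hz < 6.0:
        return "wow"
    elif mod_freq_hz <= 100.0:
        return "flutter"
    return "scrape flutter"

# The rotation rate of a 33 1/3 RPM disc falls in the "wow" band:
print(classify_speed_variation(33.333 / 60))   # ~0.556 Hz -> "wow"
```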
Measurement and Weighting
The easiest way to accurately measure the stability of the turntable’s speed within the range of wow and flutter is to follow one of the standard methods, of which there are many, all of them similar. Examples of these standards are AES6-2008, CCIR 409-3, DIN 45507, and IEC-386. A special measurement disc containing a sine tone, usually with a frequency of 3150 Hz, is played to a measurement device which then does a frequency analysis of the signal. In a perfect system, the result would be a 3150 Hz sine tone. In practice, however, the frequency of the tone varies over time, and it is this variation that is measured and analysed.
There is general agreement that we are particularly sensitive to a modulation in frequency of about 4 Hz (4 cycles per second) applied to many audio signals. As the modulation gets slower or faster, we are less sensitive to it, as was illustrated in the example above: (a 1% change 4 times a second is much easier to hear than a 1% change lasting 1 hour).
So, for example, if the analysis of the 3150 Hz tone shows that it varies by ±1% at a frequency of 4 Hz, then this will have a bigger impact on the result than if it varies by ±1% at a frequency of 0.1 Hz or 40 Hz. The amount of impact the measurement at any given modulation frequency has on the total result is shown as a “weighting curve” in the figure below.
As can be seen in this curve, a modulation at 4 Hz has a much bigger weight (or impact) on the final result than a modulation at 0.315 Hz or at 140 Hz, where a 20 dB attenuation is applied to their contribution to the total result. Since attenuating a value by 20 dB is the same as dividing it by 10, a ±1% modulation of the 3150 Hz tone at 4 Hz will produce the same result as a ±10% modulation of the 3150 Hz tone at 140 Hz, for example.
This is just one example of why a single Wow and Flutter measurement value should be interpreted very cautiously.
Expressing the result
When looking at a Wow and Flutter specification, one will see something like <0.1%, <0.05% (DIN), or <0.1% (AES6). Like any audio specification, if the details of the measurement type are not included, then the value is useless. For example, “W&F: <0.1%” means nothing, since there is no way to know which method was used to arrive at this value. (Similarly, a specification like “Frequency Range: 20 Hz to 20 kHz” means nothing, since there is no information about the levels used to define the range.)
If the standard is included in the specification (DIN or AES6, for example), then it is still difficult to compare wow and flutter values. This is because, even when performing identical measurements and applying the same weighting curve shown in the figure above, there are different methods for arriving at the final value. The value that you see may be a peak value (the maximum deviation from the average speed), the peak-to-peak value (the difference between the minimum and the maximum speeds), the RMS (a version of the average deviation from the average speed), or something else.
The AES6-2008 standard, which is the currently accepted method of measuring and expressing the wow and flutter specification, uses a “2-sigma” method, which is a way of looking at the peak deviation to give a kind of “worst-case” scenario. In this method, the 3150 Hz tone is played from a disc and captured for as long a time as is possible or feasible. Firstly, the average value of the actual frequency of the output is found (in theory, it’s fixed at 3150 Hz, but this is never true). Next, the short-term variation of the actual frequency over time is compared to the average, and weighted using the filter shown above. The result shows the instantaneous frequency variations over the length of the captured signal, relative to the average frequency (however, the effect of very slow and very fast changes have been reduced by the filter). Finally, the standard deviation of the variation from the average is calculated, and multiplied by 2 (“2-Sigma”, or “two times the standard deviation”), resulting in the value that is shown as the specification. The reason two standard deviations is chosen is that (in the typical case where the deviation has a Gaussian distribution) the actual Wow & Flutter value should exceed this value no more than 5% of the time.
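The steps above can be sketched in a few lines of Python. This is my own simplification, not the standard's reference implementation: in particular, I omit the weighting filter described earlier, which a real measurement would apply to the deviation signal before taking the statistics:

```python
import math

def wow_flutter_2sigma(inst_freq_hz, ref_freq_hz=3150.0):
    """2-sigma wow & flutter from a trace of instantaneous frequency.

    Simplified sketch: the average is taken from the capture itself,
    and the standard weighting filter is omitted.
    """
    avg = sum(inst_freq_hz) / len(inst_freq_hz)
    # Deviation of each short-term reading from the average, in percent:
    dev_pct = [100.0 * (f - avg) / avg for f in inst_freq_hz]
    mean_dev = sum(dev_pct) / len(dev_pct)
    sigma = math.sqrt(sum((d - mean_dev) ** 2 for d in dev_pct) / len(dev_pct))
    return 2.0 * sigma   # "two times the standard deviation"

# A 3150 Hz tone frequency-modulated by +/-1% at 4 Hz (a 1 kHz "sample rate"
# is assumed here purely for illustration):
trace = [3150.0 * (1 + 0.01 * math.sin(2 * math.pi * 4 * t / 1000))
         for t in range(1000)]
print(round(wow_flutter_2sigma(trace), 2))   # ~1.41 for a +/-1% sinusoidal modulation
```

Note that a ±1% sinusoidal modulation yields a 2-sigma value of about 1.41%, not 1%, because the standard deviation of a sinusoid is its peak divided by √2, and the result is then doubled.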
The reason this method is preferred today is that it uses a single number to express not only the wow and flutter, but the probability of the device reaching that value. For example, if a device is stated to have a “Wow and Flutter of <1% (AES6)”, then the actual deviation from the average speed will be less than 1% for 95% of the time you are listening to music. The principal reason this method was not used in the “old days” is that it requires statistical calculations applied to a signal that was captured from the output of the turntable, an option that was not available decades ago. The older DIN method instead showed a long-term average level that was measured in real-time using analogue equipment such as the device shown below.
Unfortunately, however, it is still impossible to know whether a specification that reads “Wow and Flutter: 1% (AES6)” means 1% deviation with a modulation frequency of 4 Hz or 10% deviation with a modulation frequency of 140 Hz – or something else. It is also impossible to compare this value to a measurement done with one of the older standards such as the DIN method, for example.
As was discussed in Part 3, when a record master is cut on a lathe, the cutter head follows a straight-line path as it moves from the outer rim to the inside of the disk. This means that it is always modulating in a direction that is perpendicular to the groove’s relative direction of travel, regardless of its distance from the centre.
A turntable should be designed to ensure that the stylus tracks the groove made by the cutter head in all aspects. This means that this perpendicular angle should be maintained across the entire surface of the disk. However, in the case of a tonearm that pivots, this is not possible, since the stylus follows a circular path, resulting in an angular tracking error.
The location of the pivot point, the tonearm’s shape, and the mounting of the cartridge can all contribute to reducing this error. Typically, tonearms are designed so that the cartridge is angled to not be in-line with the pivot point. This is done to ensure that there can be two locations on the record’s surface where the stylus is angled correctly relative to the groove.
However, the only real solution is to move the tonearm in a straight line across the disc, maintaining a position that is tangential to the groove, and therefore keeping the stylus located so that its movement is perpendicular to the groove’s relative direction of travel, just as it was with the cutter head on the lathe.
In a perfect system, the movement of the tonearm would be completely synchronous with the sideways “movement” of the groove underneath it, however, this is almost impossible to achieve. In the Beogram 4000c, a detection system is built into the tonearm that responds to the angular deviation from the resting position. The result is that the tonearm “wiggles” across the disk: the groove pulls the stylus towards the centre of the disk for a small distance before the detector reacts and moves the back of the tonearm to correct the angle.
Typically, the distance moved by the stylus before the detector engages the tracking motor is approximately 0.1 mm, which corresponds to a tracking error of approximately 0.044º.
One of the primary artefacts caused by an angular tracking error is distortion of the audio signal: mainly second-order harmonic distortion of sinusoidal tones, and intermodulation distortion on more complex signals. (see “Have Tone Arm Designers Forgotten Their High-School Geometry?” in The Audio Critic, 1:31, Jan./Feb., 1977.) It can be intuitively understood that the distortion is caused by the fact that the stylus is being moved at a different angle than that for which it was designed.
It is possible to calculate an approximate value for this distortion level using the following equation:

HD = (ω × A × α) / (ω_r × R)

Where HD is the harmonic distortion in percent, ω is the angular frequency of the modulation caused by the audio signal (calculated using ω = 2πf), A is the peak amplitude in mm, α is the tracking error in degrees, ω_r is the angular frequency of rotation (the speed of the record in radians per second; for example, at 33 1/3 RPM, ω_r ≈ 3.49) and R is the radius (the distance of the groove from the centre of the disk). (see “Tracking Angle in Phonograph Pickups” by B.B. Bauer, Electronics (March 1945))
This equation can be re-written, separating the audio signal from the tonearm behaviour, as shown below.

HD = (ω × A) × (α / (ω_r × R))

which shows that, for a given audio frequency and disk rotation speed, the audio signal distortion is proportional to the horizontal tracking error over the distance of the stylus to the centre of the disk. (This is the reason one philosophy in the alignment of a pivoting tonearm is to ensure that the tracking error is reduced when approaching the centre of the disk, since the smaller the radius, the greater the distortion.)
It may be confusing as to why the position of the groove on the disk (the radius) has an influence on this value. The reason is that the distortion is dependent on the wavelength of the signal encoded in the groove. The longer the wavelength, the lower the distortion. As was shown in Figure 1 in Part 6 of this series, the wavelength of a constant frequency is longer on the outer groove of the disk than on the inner groove.
Using the Beogram 4000c as an example at its worst-case tracking error of 0.044º: if we have a 1 kHz sine wave with a modulation velocity of 34.1 mm/sec on a 33 1/3 RPM LP on the inner-most groove then the resulting 2nd-harmonic distortion will be 0.7% or about -43 dB relative to the signal. At the outer-most groove (assuming all other variables remain constant), the value will be roughly half of that, at 0.3% or -50 dB.
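These numbers can be checked with a short Python calculation (a sketch of my own: the 60 mm inner radius and 146 mm outer radius are my assumptions based on typical 12″ LP dimensions, the product ω × A is taken as the stated peak modulation velocity, and the factor of 100 expresses the result as a percentage):

```python
import math

def tracking_distortion_pct(mod_velocity_mm_s, error_deg, rpm, radius_mm):
    """2nd-harmonic distortion from lateral tracking error, following
    the equation above. omega * A is the peak modulation velocity in
    mm/s, and the tracking error is given in degrees."""
    omega_r = rpm * 2 * math.pi / 60   # disc rotation in rad/s (~3.49 at 33 1/3 RPM)
    return 100.0 * mod_velocity_mm_s * error_deg / (omega_r * radius_mm)

# Beogram 4000c worst case, inner groove (assumed radius: 60 mm):
print(round(tracking_distortion_pct(34.1, 0.044, 100 / 3, 60), 2))    # ~0.72
# Outer groove (assumed radius: 146 mm), roughly half the distortion:
print(round(tracking_distortion_pct(34.1, 0.044, 100 / 3, 146), 2))   # ~0.29
```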
Today I was working on a little acoustics simulation patcher in Cycling 74’s Max, and part of the code required the use of a modulo function. No problem, right?
Problem. I originally wrote the code in Matlab, and I was porting it to Max; and the numbers just weren’t working properly. After getting rid of my own home-made bugs, it still wasn’t working…
Turns out that there seems to be a disagreement in the code community about how to do the modulo of a negative number.
The best indication of the problem I was facing is found on this page, where you can see that different languages come up with different answers for -13 mod 3 and 13 mod -3. The problem is that neither Max nor Matlab are in the list. So: here are the results of those two, to add to the list.
The results are:

13 mod 3: Matlab = 1, Max = 1
-13 mod 3: Matlab = 2, Max = -1
13 mod -3: Matlab = -2, Max = 1
-13 mod -3: Matlab = -1, Max = -1
This means that Matlab behaves like Python, using the formula
mod(a, n) = a - n * floor(a / n)
whereas Max behaves like C and Java.
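The difference is easy to demonstrate in Python itself, since its % operator uses the same "floored" convention as Matlab (the result takes the sign of the divisor), while math.fmod() truncates toward zero like C, Java, and Max:

```python
import math

a, n = -13, 3

# Python's % is "floored", matching Matlab's mod():
print(a % n)                        # 2
print(a - n * math.floor(a / n))    # 2 : the formula above, written out

# math.fmod() truncates toward zero instead (result takes the sign
# of the dividend), matching C, Java, and Max:
print(math.fmod(a, n))              # -1.0
```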
So, if you, like me, move back and forth between Matlab and Max, beware!
In order to keep the stylus tip in the groove of the record, it must have some force pushing down on it. This force must be enough to keep the stylus in the groove. However, if it is too large, then both the vinyl and the stylus will wear more quickly. Thus a balance must be found between “too much” and “not enough”.
As can be seen in Figure 1, the typical tracking force of phonograph players has changed considerably since the days of gramophones playing shellac discs, with values under 10 g being standard since the introduction of vinyl microgroove records in 1948. The original recommended tracking force of the Beogram 4002 was 1 g, however, this has been increased to 1.3 g for the Beogram 4000c in order to help track more recent recordings with higher modulation velocities and displacements.
Effective Tip Mass
The stylus’s job is to track all of the vibrations encoded in the groove. It stays in that groove as a result of the adjustable tracking force holding it down, so the moving parts should be as light as possible in order to ensure that they can move quickly. The total apparent mass of the parts that are being moved as a result of the groove modulation is called the effective tip mass. Intuitively, this can be thought of as giving an impression of the amount of inertia in the stylus.
It is important to not confuse the tracking force and the effective tip mass, since these are very different things. Imagine a heavy object like a 1500 kg car, for example, lifted off the ground using a crane, and then slowly lowered onto a scale until it reads 1 kg. The “weight” of the car resting on the scale is equivalent to 1 kg. However, if you try to push the car sideways, you will obviously find that it is more difficult to move than a 1 kg mass, since you are trying to overcome the inertia of all 1500 kg, not the 1 kg that the scale “sees”. In this analogy, the reading on the scale is equivalent to the Tracking Force, and the mass that you’re trying to move is the Effective Tip Mass. Of course, in the case of a phonograph stylus, the opposite relationship is desirable; you want a tracking force high enough to keep the stylus in the groove, and an effective tip mass as close to 0 as possible, so that it is easy for the groove to move it.
Imagine an audio signal that is on the left channel only. In this case, the variation is only on one of the two groove walls, causing the stylus tip to ride up and down on those bumps. If the modulation velocity is high, and the effective tip mass is too large, then the stylus can lift off the wall of the groove just like a car leaving the surface of a road on the trailing side of a bump. In order to keep the car’s wheels on the road, springs are used to push them back down before the rest of the car starts to fall. The same is true for the stylus tip. It’s being pushed back down into the groove by the cantilever that provides the spring. The amount of “springiness” is called the compliance of the stylus suspension. (Compliance is the opposite of spring stiffness: the more compliant a spring is, the easier it is to compress, and the less it pushes back.)
Like many other stylus parameters, the compliance is balanced with other aspects of the system. In this case it is balanced with the effective mass of the tonearm (which includes the tracking force)(1), resulting in a resonant frequency. If that frequency is too high, then it can be audible as a tone that is “singing along” with the music. If it’s too low, then in a worst-case situation, the stylus can jump out of the record groove.
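For the curious, that resonant frequency comes from the standard mass-spring relationship f = 1 / (2π√(M × C)). This formula and the numbers below are not from the text above; the 10 g effective mass and 15 cu compliance are purely hypothetical values chosen to land in the typical tonearm range:

```python
import math

def arm_resonance_hz(effective_mass_g, compliance_cu):
    """Resonant frequency of the moving mass bouncing on the stylus
    suspension: f = 1 / (2*pi*sqrt(M*C)).

    1 cu = 1 mm/N = 1e-3 m/N; the mass is converted from g to kg."""
    m_kg = effective_mass_g * 1e-3
    c_m_per_n = compliance_cu * 1e-3
    return 1.0 / (2 * math.pi * math.sqrt(m_kg * c_m_per_n))

# A hypothetical 10 g effective moving mass with a 15 cu cartridge:
print(round(arm_resonance_hz(10, 15), 1))   # ~13 Hz
```

A result around 10 Hz sits usefully between the audio band above and the warp/rumble region below, which is one reason such combinations are common.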
If a turntable is very poorly adjusted, then a high tracking force and a high stylus compliance (therefore, a “soft” spring) result in the entire assembly sinking down onto the record surface. However, a high compliance is necessary for low-frequency reproduction; therefore the maximum tracking force is, in part, set by the compliance of the stylus.
If you are comparing the specifications of different cartridges, it may be of interest to note that compliance is often expressed in one of five different units, depending on the source of the information:
“Compliance Unit” or “cu”
mm/N millimetres of deflection per Newton of force
µm/mN micrometres of deflection per thousandth of a Newton of force
x 10^-6 cm/dyn hundredths of a micrometre of deflection per dyne of force
x 10^-6 cm / 10^-5 N hundredths of a micrometre of deflection per hundred-thousandth of a Newton of force
Since:

1 mm/N = 1000 µm / 1000 mN = 1 µm/mN
1 dyne = 0.00001 Newton

this means that all five of these expressions are identical, so they can be interchanged freely. In other words:

1 cu = 1 mm/N = 1 µm/mN = 1 x 10^-6 cm/dyn = 1 x 10^-6 cm / 10^-5 N
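That equivalence is easy to check numerically (a quick sketch of my own, converting each convention to SI metres per Newton):

```python
import math

value = 20.0   # a compliance of "20", expressed in each convention

# Convert each expression to SI units (metres per Newton):
as_mm_per_N     = value * 1e-3                    # mm/N
as_um_per_mN    = (value * 1e-6) / 1e-3           # µm/mN
as_cm_per_dyn   = (value * 1e-6 * 1e-2) / 1e-5    # x 10^-6 cm/dyn (1 dyn = 1e-5 N)
as_cm_per_1e5N  = (value * 1e-6 * 1e-2) / 1e-5    # x 10^-6 cm / 10^-5 N

# All four (plus "cu", which is defined as any of these) agree:
print(all(math.isclose(x, 0.020)
          for x in (as_mm_per_N, as_um_per_mN, as_cm_per_dyn, as_cm_per_1e5N)))
```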
The earliest styli were the needles that were used on 78 RPM gramophone players. These were typically made from steel wire that was tapered to a conical shape, and then the tip was rounded to a radius of about 150 µm, by tumbling them in an abrasive powder.(1) This rounded curve at the tip of the needle had a hemispherical form, and so styli with this shape are known as either conical or spherical.
The first styli made for “microgroove” LPs had the same basic shape as their steel predecessors, but were tipped with sapphire or diamond. The conical/spherical shape was a good choice due to the relative ease of manufacture, and a typical size of that spherical tip was about 36 µm in diameter. However, as recording techniques and equipment improved, it was realised that there are possible disadvantages to this design.
Remember that the side-to-side shape of the groove is a physical representation of the audio signal: the higher the frequency, the smaller the wave on the disc. However, since the disc has a constant speed of rotation, the speed of the stylus relative to the groove is dependent on how far away it is from the centre of the disc. The closer the stylus gets to the centre, the smaller the circumference, so the slower the groove speed.
If we look at a 12″ LP, the smallest allowable diameter for the modulated groove is about 120 mm, which gives us a circumference of about 377 mm (or 120 * π). The disc is rotating 33 1/3 times every minute which means that it is making 0.56 of a rotation per second. This, in turn, means that the stylus has a groove speed of 209 mm per second. If the audio signal is a 20,000 Hz tone at the end of the recording, then there must be 20,000 waves carved into every 209 mm on the disc, which means that each wave in the groove is about 0.011 mm or 11 µm long.
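The arithmetic in that paragraph can be wrapped into a small Python function (my own sketch, using the 60 mm radius that corresponds to the 120 mm minimum diameter above):

```python
import math

def groove_wavelength_um(freq_hz, radius_mm, rpm=100 / 3):
    """Physical length of one cycle of a tone cut at a given radius."""
    groove_speed_mm_s = 2 * math.pi * radius_mm * rpm / 60
    return 1000.0 * groove_speed_mm_s / freq_hz   # mm -> µm

# Inner groove of a 12" LP (60 mm radius, i.e. the 120 mm diameter above):
print(round(groove_wavelength_um(20_000, 60), 1))   # ~10.5 µm at 20 kHz
print(round(groove_wavelength_um(1_000, 60), 1))    # ~209.4 µm at 1 kHz
```

Note that at 1 kHz the wavelength is a comfortable fifth of a millimetre, which is why the tracking problem described next only bites at high frequencies.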
However, now we have a problem. If the “wiggles” in the groove have a total wavelength of 11 µm, but the tip of the stylus has a diameter of about 36 µm, then the stylus will not be able to track the groove because it’s simply too big (just like the tires of your car do not sink into every small crack in the road). Figure 3 shows to-scale representations of a conical stylus with a diameter of 36 µm in a 70 µm-wide groove on the inside radius of a 33 1/3 RPM LP (60 mm from the centre of the disc), viewed from above. The red lines show the bottom of the groove and the black lines show the edge where the groove meets the surface of the disc. The blue lines show the point where the stylus meets the groove walls. The top plot is a 1 kHz sine wave and the bottom plot is a 20 kHz sine wave, both with a lateral modulation velocity of 70 mm/sec. Notice that the stylus is simply too big to accurately track the 20 kHz tone.
One simple solution was to “sharpen” the stylus; to make the diameter of the spherical tip smaller. However, this can cause two possible side effects. The first is that the tip will sink deeper into the groove, making it more difficult for it to move independently in response to the two audio channels. The second is that the point of contact between the stylus and the vinyl becomes smaller, which concentrates the tracking force on a smaller “footprint” and can result in more wear on the groove itself. However, since the problem is in tracking the small wavelengths of high-frequency signals, it is only necessary to reduce the diameter of the stylus in one dimension, thus making the stylus tip elliptical instead of conical. In this design, the tip of the stylus is wide, to sit across the groove, but narrow along the groove’s length, making it small enough to accurately track high frequencies. An example showing a 0.2 mil x 0.7 mil (10 x 36 µm) stylus is shown in Figure 4. Notice that this shape can track the 20 kHz tone more easily, while sitting at the same height in the groove as the conical stylus in Figure 3.
Both the conical and the elliptical stylus designs have a common drawback in that the point of contact between the tip and the groove wall is extremely small. This can be seen in Figure 5, which shows various stylus shapes from the front. Notice the length of the contact between the red and black lines (the stylus and the groove wall). As a result, both the groove of the record and the stylus tip will wear over time, generally resulting in an increasing loss of high frequency output. This was particularly a problem when the CD-4 Quadradisc format was introduced, since it relies on signals as high as 45 kHz being played from the disc.(2) In order to solve this problem, a new stylus shape was invented by Norio Shibata at JVC in 1973. The idea behind this new design is that the sides of the stylus are shaped to follow a much larger-radius circle than is possible to fit into the groove, however, the tip has a small radius like a conical stylus. An example showing this general concept can be seen on the right side of Figure 5.
There have been a number of different designs following Shibata’s general concept, with names such as MicroRidge (which has an interesting, almost blade-like shape “across” the groove), Fritz-Geiger, Van-den-Hul, and Optimized Contour Contact Line. Generally, these designs have come to be known as line contact (or contact line) styli, because the area of contact between the stylus and the groove wall is a vertical line rather than a single point.
Originally, the Beogram 4002 was supplied with an MMC 6000 cartridge, which featured a stylus tip designed by Subir K. Pramanik, an engineer at Bang & Olufsen. This became known as the Pramanik diamond, and was designed to ensure maximum surface area with the groove wall on its vertical axis while maintaining a minimum contact along the horizontal axis.
Bonded vs. Nude
There is one small, but important point regarding a stylus’s construction. Although the tip of the stylus is almost always made of diamond today, in lower-cost units, that diamond tip is mounted or bonded to a titanium or steel pin which is, in turn, connected to the cantilever (the long “arm” that connects back to the cartridge housing). This bonded design is cheaper to manufacture, but it results in a high mass at the stylus tip, which means that it will not move easily at high frequencies.
In order to reduce mass, the steel pin is eliminated, and the entire stylus is made of diamond instead. This makes things more costly, but reduces the mass dramatically, so it is preferred if the goal is higher sound performance. This design is known as a nude stylus.
See “The High-fidelity Phonograph Transducer” B.B. Bauer, JAES 1977 Vol 25, Number 10/11, Oct/Nov 1977
The CD4 format used a 30 kHz carrier tone that was frequency-modulated ±15 kHz. This means that the highest frequency that should be tracked by the stylus is 30 kHz + 15 kHz = 45 kHz.
As mentioned above, when a wire is moved through a magnetic field, a current is generated in a wire that is proportional to the velocity of the movement. In order to increase the output, the wire can be wrapped into a coil, effectively lengthening the piece of wire moving through the field. Most phono cartridges make use of this behaviour by using the movement of the stylus to either:
move tiny magnets that are placed near coils of wire (a Moving Magnet or MM design), or
move tiny coils of wire that are placed near very strong magnets (a Moving Coil or MC design)
In either system, there is a relative physical movement that is used to generate the electrical signal from the cartridge. There are advantages and disadvantages associated with both of these systems, however, they’re well-discussed in other places, so I won’t talk about them here.
There is a third, less common design called a Moving Iron (or variable-reluctance(1)) system, which can be thought of as a variant of the Moving Magnet principle. In this design, the magnet and the coils remain stationary, and the stylus moves a small piece of iron instead. That iron is placed between the north and south poles of the magnet so that, when it moves, it modulates (or varies) the magnetic field. As the magnetic field modulates, it moves relative to the coils, and an electrical signal is generated. One of the first examples of this kind of pickup was the Western Electric 4A reproducer made in 1925.
In 1963, Erik Rørbaek Madsen of Bang & Olufsen filed a patent for a cartridge based on the Moving Iron principle. In it, a cross made of Mu-metal is mounted on the stylus. Each arm of the cross is aligned with the end of a small rod called a “pole piece” (because it was attached to the pole of a magnet on the opposite end). The cross is mounted diagonally, so the individual movements of the left and right channels on the groove cause the arms of the cross to move accordingly. For a left-channel signal, the bottom left and top right cross arms move in opposite directions – one forwards and one backwards. For a right-channel signal, the bottom right and top left arms move instead. The two coils that generate the current for each audio channel are wired in a push-pull relationship.
There are a number of advantages to this system over the MM and MC designs. Many of these are described in the original 1963 patent, as follows:
“The channel separation is very good and induction of cross talk from one channel to the other is minimized because cross talk components are in phase in opposing coils.”
“The moving mass which only comprises the armature and the stylus arm can be made very low which results in good frequency response.”
“Hum pick-up is very low due to the balanced coil construction”
“… the shielding effect of the magnetic housing … provides a completely closed magnetic circuit which in addition to shielding the coil from external fields prevents attraction to steel turntables.”
Finally, (although this is not mentioned in the patent) the push-pull wiring of the coils “reduces harmonic distortion induced by the non-linearity of the magnetic field.”(2)
reluctance is the magnetic equivalent of electrical resistance
Every audio device relies on a rather simple balancing act. The “signal”, whether it’s speech, music, or sound effects, should be loud enough to mask the noise that is inherent in the recording or transmission itself. The measurement of this “distance” in level is known as the Signal-to-Noise Ratio or SNR. However, the signal should not be so loud as to overload the system and cause distortion effects such as clipping, which results in what is commonly called Total Harmonic Distortion or THD.(1) One basic method to evaluate the quality of an audio signal or device is to group these two measurements into one value: the Total Harmonic Distortion plus Noise or THD+N value. The somewhat challenging issue with this value is that a portion of it (the noise floor) is typically independent of the signal level, since a device or signal will have some noise regardless of whether a signal is present or not. However, the distortion is typically directly related to the level of the signal.
In a modern digital PCM audio signal (assuming that it is correctly implemented, and ignoring any additional signal processing), the noise floor is the result of the dither that is used to randomise the inherent quantisation error in the encoding system. This noise is independent of the signal level, and entirely dependent on the resolution of the system (measured in the number of bits used to encode each sample). The maximum level that can be encoded without incurring additional distortion is reached when the maximum (or minimum) value in the audio signal reaches the highest possible signal value of the system. Any increase in the signal’s level beyond this will be clipped, and harmonic distortion artefacts will result.
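The dependence of the noise floor on bit depth can be expressed with the commonly-quoted rule of thumb of roughly 6 dB of dynamic range per bit. A minimal sketch (the function name is my own, and this is the idealised figure for a full-scale sine relative to quantisation noise, before the small penalty added by dither is taken into account):

```python
def ideal_snr_db(bits):
    """Rule-of-thumb dynamic range of an N-bit LPCM system:
    approximately 6.02 dB per bit plus 1.76 dB, for a full-scale
    sine measured against the quantisation noise floor."""
    return 6.02 * bits + 1.76

print(ideal_snr_db(16))  # about 98 dB
print(ideal_snr_db(24))  # about 146 dB
```

This is why the noise floors of the 16-bit and 24-bit curves in Figure 1 sit roughly 48 dB apart.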
Figure 1 shows two examples of the relationship between the level of the signal and the THD+N in a digital audio system. The red line shows a 24-bit encoding, and the blue line is for 16-bit. The “flat line” on the left of the plot is the result of the noise floor of the system. In this region, the signal level is so low that it’s below the noise floor of the system itself, so the only measurable output is the noise, not the signal. As we move towards the right, the input signal gets louder and rises above the noise floor, so the output level naturally increases as well. However, in a digital audio system, we reach a maximum possible input level of 0 dB FS. If we try to increase the signal’s level above this, the signal itself will not get any louder; it will only become more and more distorted. As a result, the distortion artefacts quickly become almost as loud as the signal itself, and so the plots drop dramatically.
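The right-hand side of those curves can be reproduced numerically. The sketch below (pure Python; the function name and parameter choices are my own) drives a test sine into a hard clipper at digital full scale and measures how much of the output is no longer the original signal:

```python
import math

def distortion_of_clipped_sine(level_dbfs, n=4096, cycles=7):
    """Drive a sine at a given level into a clipper at +/-1.0
    (digital full scale) and return the ratio of the residual
    (everything that is not the fundamental) to the fundamental."""
    amp = 10 ** (level_dbfs / 20)
    ref = [math.sin(2 * math.pi * cycles * k / n) for k in range(n)]
    out = [max(-1.0, min(1.0, amp * r)) for r in ref]
    # Project the output onto the fundamental to find its amplitude
    fund_amp = 2 / n * sum(o * r for o, r in zip(out, ref))
    residual = [o - fund_amp * r for o, r in zip(out, ref)]

    def rms(x):
        return math.sqrt(sum(v * v for v in x) / len(x))

    return rms(residual) / rms([fund_amp * r for r in ref])

# Below 0 dB FS there is (ideally) no distortion at all;
# above 0 dB FS, the residual grows very quickly.
print(distortion_of_clipped_sine(-6.0))  # essentially zero
print(distortion_of_clipped_sine(+6.0))  # substantial distortion
```

Converting the residual-to-fundamental ratio to decibels and plotting it against input level reproduces the characteristic “cliff” at the right edge of Figure 1.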
This is why good recording engineers typically attempt to align the levels of the microphones to ensure that the maximum peak of the entire recording will just barely reach the maximum possible level of the digital recording system. This ensures that they are keeping above the noise floor as much as possible without distorting the signals.
Audio signals recorded on analogue-only devices generally have the same behaviour; there is a noise floor that should be avoided and a maximum level above which distortion will start to increase. However, many analogue systems have a slightly different characteristic, as can be seen in the idealised model shown in Figure 2. Notice that, just like in the digital audio system, the noise floor is constant, and as the level of the input signal is increased, it rises above this. However, in an analogue system, the transition to a distorted signal is more gradual, seen as the more gentle slopes of the curves on the right side of the graph.
As a result, in a typical analogue audio system, there is an “optimal” level that is seen to be the best compromise between the signal being loud enough above the noise floor, but not distorting too much. The question of how much distortion is “too much” can then be debated — or even used as an artistic effect (as in the case of so-called “tape compression”).
If we limit our discussion to the stylus tracking a groove on a vinyl disc, converting that movement to an electrical signal that is amplified and filtered in an RIAA-spec preamplifier, then a phonograph recording is an analogue format. This means, generally speaking, that there is an optimal level for the audio signal which, in the case of vinyl, means a modulation velocity of the stylus, converted to an electrical voltage.
Although there are some minor differences of opinion, a commonly-accepted optimum level for the groove on a stereo recording is 35.4 mm/sec for a single audio channel at 1,000 Hz. In a case where both audio channels have the same 1 kHz signal recorded in phase (as a dual-monophonic signal), then this means that the lateral velocity of the stylus will be 50 mm/sec.(2)
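This relationship follows from the 45°/45° stereo cut: each channel’s modulation is at 45° to the surface of the disc, so two identical, in-phase signals sum vectorially to a purely lateral motion that is √2 times larger. A quick sketch (the function name is my own):

```python
import math

def lateral_velocity_mm_s(per_channel_mm_s):
    """Two equal, in-phase channels, each cut at 45 degrees to the
    disc surface, add vectorially to a single lateral movement."""
    return per_channel_mm_s * math.sqrt(2)

print(lateral_velocity_mm_s(35.4))  # about 50 mm/sec
```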
Of course, the higher the modulation velocity of the stylus, the higher the output of the turntable. However, a higher velocity also means that the groove on the vinyl disc requires more space, since it is being modulated more. This means that there is a relationship between the total playing time of a vinyl disc and the modulation velocity. In order to have 20 minutes of music on a 12″ LP spinning at 33 1/3 RPM, the standard method was to cut 225 “lines per inch” or “LPI” (about 89 lines per centimetre) on the disc. If a mastering engineer wishes to have a signal with a higher output, then the price is a shorter playing time, because the grooves must be spaced further apart to accommodate the higher modulation velocity. In well-mastered recordings, this spacing is varied according to the dynamic range of the audio signal. In fact, in some classical recordings, it is easy to see the louder passages in the music because the grooves are intentionally spaced further apart, as is illustrated in Figure 3.
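The playing-time arithmetic can be sketched as follows, under the simplifying assumption of a constant groove pitch (no variable spacing); the function names are my own:

```python
def rotations_needed(minutes, rpm=100/3):
    """One groove 'line' is laid down per rotation of the disc."""
    return minutes * rpm

def band_width_inches(minutes, lines_per_inch, rpm=100/3):
    """Radial width of the recorded band for a given playing time
    at a constant groove pitch."""
    return rotations_needed(minutes, rpm) / lines_per_inch

# 20 minutes at 33 1/3 RPM and 225 LPI uses just under 3 inches
# of the disc's radius:
print(band_width_inches(20, 225))  # about 2.96 inches
```

Halving the LPI to make room for a hotter cut would double that radial width, which is exactly the playing-time penalty described above.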
A large part of the performance of a turntable is dependent on the physical contact between the surface of the vinyl and the tip of the stylus. In general terms, as we’ve already seen, there is a groove with two walls that vary in height almost independently, and the tip of the stylus traces that movement accordingly. However, it is necessary to get down to the microscopic level to consider this behaviour in more detail.
When a record is mastered (meaning, when the master disc is created on a lathe), the groove is cut by a heated stylus that has a specific shape, shown in Figure 4. The depth of the groove can range from a minimum of 25 µm to a maximum of 127 µm, which, in turn, varies the width of the groove.(3)
The result is a groove with a varying width and depth that are dependent on the decisions made by the mastering engineer, and a modulation displacement (the left/right size of the “wiggle”) that is dependent on the level of the audio signal that is being reproduced.
In a perfect situation, the stylus that is used to play that signal back on a turntable would have exactly the same shape as the cutting stylus, since this would mean that the groove is traced in exactly the same way that it was cut. This, however, is not practical for a number of reasons. As a result, there are a number of options when choosing the shape of the playback stylus.
The assumption here is that the distortion produces harmonics of the signal, which is a simplified view of the truth, but an effect that is easy to measure.
(35.4*2) / sqrt(2) because the two channels are modulated at an angle of 45 degrees to the surface of the disc.
See “The High-fidelity Phonograph Transducer” B.B. Bauer, JAES 1977 Vol 25, Number 10/11, Oct/Nov 1977