Aliasing is weird: Part 1

One of the best-known things about digital audio is the fact that you cannot record a signal that has a frequency that is higher than 1/2 the sampling rate.

Now, to be fair, that statement is not true. You CAN record a signal that has a frequency that is higher than 1/2 the sampling rate. You just won’t be able to play it back properly, because what comes out of the playback will not be the original frequency, but an alias of it.

If you record a one-spoked wheel with a series of photographs (in the old days, we called this ‘a movie’), the photos (the frames of the movie) might look something like this:

As you can see there, the wheel happens to be turning at a speed that results in it rotating 45º every frame.

The equivalent of this in a digital audio world would be if we were recording a sine wave that rotated (yes…. rotated…) 45º every sample, like this:

Notice that the red lines indicating the sample values are equivalent to the height of the spoke at the wheel rim in the first figure.

If we speed up the wheel’s rotation so that it rotated 90º per frame, it looks like this:

And the audio equivalent would look like this:

Speeding up even more to 135º per frame, we get this:

and this:

Then we get to a magical speed where the wheel rotated 180º per frame. At this speed, it appears when we look at the playback of the film that the wheel has stopped, and it now has two spokes.

In the audio equivalent, it looks like the result is that we have no output, as shown below.

However, this isn’t really true. It’s just an artefact of the fact that I chose to plot a sine wave. If I were to change the phase of this to be a cosine wave (at the same frequency) instead, for example, then it would definitely have an output.

At this point, the frequency of the audio signal is 1/2 the sampling rate.

What happens if the wheel goes even faster (and audio signal’s frequency goes above this)?

Notice that the wheel is now making more than a half-turn per frame. We can still record it. However, when we play it back, it doesn’t look like what happened. It looks like the wheel is going backwards like this:

Similarly, if we record a sine wave that has a frequency that is higher than 1/2 the sampling rate like this:

Then, when we play it back, we get a lower frequency that fits the samples, like this:

Just a little math

There is a simple way to calculate the frequency of the signal that you get out of the system if you know the sampling rate and the frequency of the signal that you tried to record.

Let’s use the following abbreviations to make it easy to state:

Fs = Sampling rate
F_in = frequency of the input signal
F_out = frequency of the output signal

IF
F_in < Fs/2
THEN
F_out = F_in

IF
Fs > F_in > Fs/2
THEN
F_out = Fs/2 – (F_in – Fs/2) = Fs – F_in

Some examples:

If your sampling rate is 48 kHz, and you try to record a 25 kHz sine wave, then the signal that you will play back will be:
48000 – 25000 = 23000 Hz

If your sampling rate is 48 kHz, and you try to record a 42 kHz sine wave, then the signal that you will play back will be:
48000 – 42000 = 6000 Hz

So, as you can see there, as the input signal’s frequency goes up, the alias frequency of the signal (the one you hear at the output) will go down.

There’s one more thing…

Go back and look at that last figure showing the playback signal of the sine wave. It looks like the sine wave has an inverted polarity compared to the signal that came into the system (notice that it starts on a downwards-slope whereas the input signal started on an upwards-slope). However, the polarity of the sine wave is NOT inverted. Nor has the phase shifted. The sine wave that you’re hearing at the output is going backwards in time compared to the signal at the input, just like the wheel appears to be rotating backwards when it’s actually going forwards.

In Part 2, we’ll talk about why you don’t need to worry about this in the real world, except when you REALLY need to worry about it.

Distortion effects on Linear measurements, Part 4

In Part 3, I showed that a magnitude responses calculated from impulse responses produced by the MLS and swept sine methods produce different results when the measurement signals themselves are distorted.

In this posting, I’ll focus on the swept sine method which showed that the apparent magnitude response of the system looked like a strange version of a low shelving filter, but there’s a really easy explanation for this that goes back to something I wrote in Part 1.

The way these systems work is to cross-correlate the signal that comes back from the DUT with the signal that was sent to it. Cross-correlation (in this case) is a bit of math that tells you how similar two signals are when they’re compared over a change in time (sort of…). So, if the incoming signal is identical to the outgoing signal at one moment in time but no other, then the result (the impulse response) looks like a spike that hits 1 (meaning “identical”) at one moment, and is 0 (meaning “not at all alike in any way…”) at all other times.

However, one important thing to remember is that both an MLS signal and a swept sine wave take some time to play. So, on the one hand, it’s a little weird to think of a 10-second sweep or MLS signal being converted to a theoretically-infinitely short impulse. On the other hand, this can be done if the system doesn’t change in time and therefore never changes: something we call a Linear Time-Invariant (or LTI) system.

But what happens if the DUT’s behaviour DOES change over time? Then things get weird.

At the end of Part 1, I said

For both the MLS and the sine sweep, I’m applying a pre-emphasis filter to the signal sent to the DUT and a reciprocal de-emphasis filter to the signal coming from it. This puts a bass-heavy tilt on the signal to be more like the spectrum of music. However, it’s not a “pinking” filter, which would cause a loss of SNR due to the frequency-domain slope starting at too low a frequency.

Then, in Part 2 I said that, to distort the signals, I

look for the peak value of the measurement signal coming into the DUT, and then clip it.

It’s the combination of these two things that results in the magnitude response of the system measured using a swept sine wave looking the way it does.

If I look at the signal that I actually send to the input of the DUT, it looks like this:

I’m normalising this to have a maximum value of 1 and then clipping it at some value like ±0.5, for example, like this:

So, it should be immediately obvious that, by choosing to clip the signal at 1/2 of the maximum value of the whole sweep, I’m not clipping the entire thing. I’m only distorting signals below some frequency that is related to the level at which I’m clipping. The harder I clip, the higher the frequency I mess up.

This is why, when we look at the magnitude response, it looks like this:

In the very low frequencies, the magnitude response is flat, but lower than expected, because the signal is clipped by the same amount. In the high frequencies, the signal is not clipped at all, so everything is behaving. In between these two bands, there is a transition between “not-behaving” and “behaving”.

This means that

if the signal I was sending into the system was clipped by the same amount at all frequencies, OR
if the pre-emphasis wasn’t applied to the signal, boosting the low frequencies

Then the magnitude response would look almost flat, but lower than expected (by the amount that is related to how much it’s clipped). In other words, we would (mostly) see the linear response of the system, even though it was behaving non-linearly – almost like if we had only sent a click through it.

However, if we chose to not apply the pre-emphasis to the signal, then the DUT wouldn’t be behaving the way it normally does, since this is very roughly equivalent to the spectral balance of music. For example, if you send a swept sine wave from 20 Hz to 20,000 Hz to a loudspeaker without applying that bass boost, you’ll could either get almost nothing out of your woofer, or you’ll burn out your tweeter (depending on how loudly you’re playing the sweep).

How does the result look without the pre-emphasis filter applied to the swept sine wave? For example, if we sent this to the DUT:

… and then we clipped it at 1/2 the maximum value, so it looks like this:

(notice that everything is clipped)

then the impulse response and magnitude response look like this instead:

… which is more similar to the results when we clip the MLS measurement signal in that we see the effects on the top end of the signal. However, it’s still not a real representation of how the DUT “sounds” whatever that means…

Distortion effects on Linear measurements, Part 3

This posting will just be some more examples of the artefacts caused by symmetrical clipping of the measurement signal for the MLS and swept-sine methods, clipping at different levels.

Remember that the clip level is relative to the peak level of the measurement signal.

MLS

Swept Sine

Swept Sine, clipping at 0.5 of peak level

Swept Sine, clipping at 0.3 of peak level

Swept Sine, clipping at 0.1 of peak level

The take-home message here is that, although both the MLS and the swept sine methods suffer from showing you strange things when the DUT is clipping, the swept sine method is much less cranky…

In the next posting, I’ll explain why this is the case.

Distortion effects on Linear measurements, Part 2

Let’s make a DUT with a simple distortion problem: It clips the signal symmetrically at 0.5 of the peak value of the signal, so if I send in a sine wave at 1 kHz, it looks like this:

Figure 1: An example of a symmetrically-clipped sine wave with a fundamental frequency of 1 kHz.

Now, to be fair, what I’m REALLY doing here is to look for the peak value of the measurement signal coming into the DUT, and then clipping it. This would be equivalent to doing a measurement of the DUT and adjusting the input gain so that it looks like a peak level of – 6 dB relative to maximum is coming in.

Also, because what I’m about to do through this series is going to have radical effects on the level after processing, I’m normalising the levels. So, some things won’t look right from a perspective of how-loud-it-appears-to-be.

If I measure that DUT using the three methods, the results look like this:

Figure 2: The impulse response of a clipping DUT, measured with an impulse (plot has been normalised for level).

Figure 3: The impulse response of a clipping DUT, measured with an MLS sequence (plot has been normalised for level).

Figure 4: The impulse response of a clipping DUT, measured with a swept sine (plot has been normalised for level).

As can easily be seen above, the three systems show very different responses. So, unlike what I claimed this post (which is admittedly over-simplified, although intentionally so to make a point…), the fact that they are measuring the impulse response does not mean that we can’t see the effects of the non-linear response. We can obviously see artefacts in the linear response that are caused by the distortion, but those artefacts don’t look like distortion, and they don’t really show us the real linear response.

Distortion effects on Linear measurements, Part 1

In the last posting, I made a big assumption: that it’s normal to measure the magnitude response of a device via an impulse response measurement.

In order to illustrate the fact that an impulse response measurement shows you only the linear response of a system (and not distortion effects such as clipping), I did an impulse response measurement using an impulse. However, it only took about 24 hours for someone to email me to point out that it’s NOT typical to use an impulse to do an impulse response measurement.

These days, that is true. In the old days, it was pretty normal to do an impulse response measurement of a room by firing a gun or popping a balloon. However, unless your impulse is really loud, this method suffers from a low signal-to-noise ratio.

So, these days, mainly to get a better signal-to-noise ratio, we typically use another kind of signal that can be turned into an impulse response using a little clever math. One method is to send a Maximum Length Sequence (or MLS) through the device. The other method uses a sine wave with a smoothly swept-frequency.

There are other ways to do it, but these two are the most common for reasons that I won’t get into.

In both the MLS and the swept-sine cases, you take the incoming signal from the DUT, and do some math that compares the outgoing signal to the incoming signal and converts the result into an impulse response. You can then use that to do your analyses of the linear response of the DUT.

If your DUT is behaving perfectly linearly, then this will work fine. However, if your DUT has some kind of non-linear distortion, then the effects of the distortion on the measurement signal will result in some kinds of artefacts that show up in the impulse response in a potentially non-intuitive way.

This series of postings is going to be a set of illustrations of these artefacts for different types of distortion. For the most part, I’m not going to try to explain why the artefacts look the way they do. It’s just a bunch of illustrations that might help you to recognise the artefacts in the future and to help you make better choices about how you’re doing the measurements in the first place.

:To start, let’s take a “perfect” DUT and

measure its impulse response using the three methods (impulse, MLS, and swept sine)
for the MLS and swept sine methods, convert the incoming signal to an impulse response and plot it
find the magnitude response of the impulse response via an FFT and plot that

The results of these three measurement methods are shown below:

If you believe in conspiracy theories, then you might be suspicious that I actually just put up the same plot three times and changed the caption, but you’ll have to trust me. I didn’t do that. I actually ran the measurement three times.

If you’re familiar with the MLS and/or swept sine techniques, then you’ll be interested in a little more information:

The sampling rate is 48 kHz
Calculating in a floating point world with lots of resolution (I’m doing this all in Matlab and I’m not quantising anything… yet…)
The MLS signal is 2^16 samples long
I’m using one MLS sequence (for now)
I am not averaging the MLS measurement. I just ran it once.
The swept sine starts at 1 Hz and runs for 10 seconds.
For both the MLS and the sine sweep, I’m applying a pre-emphasis filter to the signal sent to the DUT and a reciprocal de-emphasis filter to the signal coming from it. This puts a bass-heavy tilt on the signal to be more like the spectrum of music. However, it’s not a “pinking” filter, which would cause a loss of SNR due to the frequency-domain slope starting at too low a frequency.
My DUT isn’t really a device. It’s just code that I’m applying to the signal, so there’s no input or output via some transmission system like analogue cabling, for example…

Most of that will be true for the other parts of the rest of the series. When it’s not true, I’ll mention it.

One measurement is worse than no measurements

Let’s say that we have to do an audio measurement of a Device Under Test (DUT) that has one input and one output, as shown below.

We don’t know anything about the DUT.

One of the first things we do in the audio world is to measure what most people call the “frequency response” but is more correctly called the “magnitude response”. (It would only be the “frequency response” if you’re also looking at the phase information.)

The standard way to do this is to use an impulse response measurement. This is a method that relies on the fact that an infinitely short, infinitely loud click contains all frequencies at equal magnitude. (Of course, in the real world, it cannot be infinitely short, and if it were infinitely loud, you would have a Big Bang on your hands… literally…)

If we measure the DUT with a single-sample impulse with a value of 1, and use an FFT to convert the impulse response to a frequency-domain magnitude response and we see this:

… then we might conclude that the DUT is as perfect as it can be, within the parameters of a digital audio system. The click comes out just like it went in, therefore the output is identical to the input.

If we measure a different DUT (we’ll call it DUT #2) and we see this:

… then we might conclude that DUT #2 is also perfect. It’s just an attenuator that drops the level by half (or -6.02 dB).

However, we’d be wrong.

I made both of those DUTs myself, and I can tell you that one of those two conclusions is definitely incorrect – but it illustrates the point I’m heading towards.

If I take DUT #1 and send in a sine tone at about 1 kHz and look at the output, I’ll see this:

As you can see there, the output is a sine wave. It looks like one on the top plot, and the bottom plot tells me that there ONLY signal at 1 kHz, which proves it.

If I send the same sine tone through DUT #2 and look at the output, I’ll see this:

As you can see there, DUT #2 clips the input signal so that it cannot exceed ±0.5. This turns the sine wave into the beginnings of a square wave, and generates lots of harmonics that can be seen in the lower half of the plot.

What’s the point?

The point is something that is well-known by people who make audio measurements, but is too easily forgotten:

An Impulse Response measurement only shows you the linear behaviour of an audio device. If the system is non-linear, then your impulse response won’t help you. In a worst case, you’ll think that you measured the system, you’ll think that it’s behaving, and it’s not – because you need to do other measurements to find out more.

The question is “what is ‘non-linear’ behaviour in an audio device?”

This is anything that causes the device to make it impossible to know what the input was by looking at the output. Anything that distorts the signal because of clipping is a simple example (because you don’t know what happened in the input signal when the output is clipped). But other things are also non-linear. For example, dynamic processors like compressors, limiters, expanders and noise gates are all non-linear devices. Modulating delays (like in a chorus or phaser effect), or a transmission system with a drifting clock are other examples. So are psychoaoustic lossy codecs like MP3 and AAC because the signal that gets preserved by the codec changes in time with the signal’s content. Even a “loudness” function can be considered to have a kind of non-linear behaviour (since you get a different filter at different settings of the volume control).

It’s also important to keep in mind that any convolution-based processing is using the impulse response as the filter that is applied to the signal. So, if you have a convolution-based effects unit, it cannot simulate the distortion caused by vacuum tubes using ONLY convolution. This doesn’t mean that there isn’t something else in the processor that’s simulating the distortion. It just means that the distortion cannot be simulated by the convolver.*

P.S.

The reason for the title: “One measurement is worse than no measurements” is that, when you do a measurement (like the impulse response measurement on DUT #2) you gain some certainty about how the device is behaving. In many cases, that single measurement can tell the truth, but only a portion of it – and the remainder of the (hidden) truth might be REALLY bad… So, your one measurement makes you THINK that you’re safe, but you’re really not… It’s not the measurement that’s bad. The problem is the certainty that results in having done it.

* Actually, one of the questions on my comprehensive exams for my Ph.D. was about compressors, with a specific sub-question asking me to explain why you can’t build a digital compressor based on convolution (which was a new-and-sexy way to do processing back then…). The simple answer is that you can’t use a linear time-invariant processor to do non-linear, time-variant processing. It would be like trying to carry water in a net: it’s simply the wrong tool for the job.

It’s not acoustics, but it’s close

Perfect symmetry

When working on the last series of posts, I stumbled on a signal that caused an FFT analysis to look a little strange to me. This post is about that strangeness.

If I make a sine wave (in a floating point world) that sits perfectly on an FFT bin, and I do an FFT of it, the noise floor that I see is the result a lack of precision of the calculations that were used to make the signal. An example of this is shown in Figure 1.

Figure 1. Two plots of the same analysis. The top plot has a logarithmic frequency scale and a range of 20 Hz to 20 kHz. The bottom plot is on a linear frequency scale, and has a range of 0 Hz to 2 kHz.

As can be seen there, the noise floor in the fit is typically at least 300 dB down from the signal level. This means that if the signal has a peak amplitude of 1, then each bin in my FFT has a peak amplitude of less than 0.000 000 000 000 001, which is very, very, low.

If I dither and quantise the sine wave with, say, a 24-bit LPCM precision, then the result would be different, as shown in Figure 2.

Figure 2. Two plots of the same analysis. The top plot has a logarithmic frequency scale and a range of 20 Hz to 20 kHz. The bottom plot is on a linear frequency scale, and has a range of 0 Hz to 2 kHz.

Now the noise floor seen in the FFT analysis is the noise that is intentionally generated as dither to randomise the quantisation error when converting to LPCM.

However, what happens if the signal is quantised but not dithered? Then the result looks like Figure 3.

This is interesting because, starting the first bin, every second bin has nothing in it, so on a decibel scale, the value is -∞ dB. Why does this happen?

The short answer is symmetry. By quantising the sine wave, I made it perfectly symmetrical.

This removes the DC content, since the positive-going portion of the waveform is identical to the negative-going portion. Therefore there’s nothing at 0 Hz (which is DC) or any of its “harmonics” (at least in the world of FFT bins…).

There is content of some kind in the other bins because our sine wave is not perfectly sinusoidal. All those steps that I put in it are an error that generates information at frequency centres other than the sine tone’s itself.

So, if you do an FFT on a sinusoidal signal and you see a result where half of the bins have nothing in them, one possible reason is that you’re dealing with a perfectly symmetrical signal.

Excruciating minutiae: Part 4

I mentioned in this posting that lately I’ve been doing some measurements of a DUT that:

required a frequency analysis with a very big dynamic range
… which meant that I was testing it using a sine tone with a frequency that was exactly the same as an FFT bin’s frequency centre
… and the sine tone had to be sent through the device by playing a standard audio file (wav and/or FLAC)

So, I did this, but I saw some weirdness that I didn’t expect down in the noise floor of the FFT output. Whenever I’m testing something and I see something weird, I start working my way back through the audio chain to verify that the weirdness is coming from the thing that I’m testing, and not from my test system itself.

So, the first step was to do of an FFT of both the .wav and the .flac files that I was sending through the DUT. The results of this test looked something like Figure 1.

Before I go further, let’s clarify exactly what I did to generate those three plots.

Using Matlab, I made a sine wave with a frequency identical to an FFT bin that was a close as possible to 997 Hz as I could get with a 65,536-point FFT at 48 kHz. (See this posting for more information about this.)
I exported the signal using Matlab’s “audiowrite” function, both as a 24-bit wav and a 24-bit FLAC.
I imported the two files back into Matlab
I ran an FFT on the original, and the two imported files

I would not expect the bottom two plots to be as “good” as the top plot, since they’ve been reduced to a 24-bit fixed point version of the original floating-point signal. However, there are two things to notice in Figure 1.

The most important thing is that the FLAC and WAVE imports produce different results. This is weird.
The less-important (but more interesting, later…) thing is that, for the FLAC import, every odd-numbered FFT bin is -∞ dB, which means that there is absolutely NO energy at those frequencies.

First things first

Let’s address that first issue first. The FFTs show us that the signals coming back from the .wav and .flac import are different. But I’m interested in (1) how they’re different and (2) why they’re different.

Let’s try to answer the first question first. I made a linear ramp that had the same number of samples as the number of quantisation values and had a range of -1 to 1 (just like my sine wave…). So, to test a 16-bit export, I made a ramp that was 2¹⁶ = 65,536 samples long (shown in the top plot in Figure 2). To test a 24-bit export, the ramp was 2²⁴ samples long.

In theory, if I export this ramp to a file type with the matching number of bits, then each sample should quantise to the next quantisation level from the bottom to the top. I then exported this ramp out to .wav and .flac, imported it again, and looked at the result, which is shown in Figure 2.

If I subtract the results of the imported files from the original, I get the result shown in the middle plot in Figure 2. I would NOT expect either the .wav or the .flac to be identical to the original, since information is lost in the export to a 16- or 24-bit fixed point LPCM format. However, I WOULD expect the .wav and .flac to be the same, which they obviously aren’t.

As can be seen in the bottom plot in Figure 2, there is a 1-quantisation level difference between the .wav and .flac files for signal values higher than 0.

Now the question is whether this difference is inherent in the file format, or if something else is going on. To test this, I did the same test on the 997-ish Hz sine wave (again) without dither, but with my own quantisation (using the code shown in this post). The result of this test is shown in Figure 3.

As you can see there, the imported .wav and .flac files behave identically. But, if you look carefully and compare to the .flac version in Figure 1, you’ll see that they’re different from THAT version.

The fact that the red and blue plots in Figure 3 are identical tell me that .wav and .flac are identical.

The fact that my quantisation results in identical results in .wav and .flac, but are different from Matlab’s “audiowrite” results (which produces .wav and .flac files that are different from each other) tells me that Matlab’s quantisation is different for .wav and .flac – and different from what I’m doing.

So, I go back to the ramp shown in Figure 2 and dig into the details again, zooming in on the samples near a value of -1, 0, and 1. These are shown below in Figure 4.

It’s a bit cryptic to see the results in Figure 4, but let’s walk through it.

The top plot shows the ramp signal that I encoded as a 16-bit audio file in 4 different ways: as a .wav and a .flac, using audiowrite’s quantiser and mine.
The second plot shows the differences in the imported files relative to the original for the first 20 samples, which correspond to the bottom 20 quantisation levels. As can be seen there, the audiowrite quantiser’s result appears to be identical to the original (they’re not, as we saw in the middle plot of Figure 1, but they’re close…). My quantiser is one level higher. This is because I’m scaling my original signal so that it can’t reach the bottom, as I talked about in Part 2.
The third plot shows the behaviour of the three quantisers (2 audiowrites and mine) around the 0 value ±10 quantisation levels. Note that there’s no sample with a value of 0 (Because two’s complement is not symmetrical around the 0 value.). It’s not immediately obvious there, but all three quantisers have an “error” of 1/2 a quantisation level step around 0.
- Below 0, both of audiowrite’s quantisers have a negative offset, and mine has a positive offset.
- Above 0, audiowrite’s .flac quantiser has a positive offset whereas both audiowrite’s .wav quantiser and mine have a negative offset

If the signal were a sine wave, then we’d see the same thing, it would just be harder to interpret, as shown in Figure 5. (There’s nothing useful shown in the third plot there because when you zoom in so closely , the slope of the sine wave as it passes 0 is really steep…)

I titled this series of posts “Excruciating minutiae” for a reason. The “error” (let’s call it a “difference” instead) is VERY small. It’s a difference of 1 quantisation level on a portion of the signal, which raises the very pragmatic question: “So what?”

Unless you’re REALLY digging into the bottom of the noise floor of a device, you probably never need to care about this. (In fact, even if you ARE digging into the bottom of the noise floor, you might not need to care.)

You CERTAINLY don’t have to worry about it if you’re just writing audio files to listen to, since you should be dithering those with TPDF dither, which will create a noise floor that is FAR above the “errors” caused by the differences I described above. This can be seen in Figures 6 and 7 below.

In other words, I’ve been using Matlab to export test files both in .wav and .flac for at least 20 years, and it’s only now that I’ve noticed this issue, which is another way of saying “don’t worry about it…”

Nota Bene

If you’re still awake, you might notice that there is one loose end… At the top of this posting I said

The less-important (but more interesting, later…) thing is that, for the FLAC import, every odd-numbered FFT bin is -∞ dB, which means that there is absolutely NO energy at those frequencies.

That will be the topic of another posting, since it’s more or less unrelated to this one – it was just an artefact of the test I described above.

Excruciating minutiae: Part 3

In Part 2 of this series, I wrote the following sentence:

The easiest (and possibly best) way to do this is to create white noise with a triangular probability distribution function and a peak-to-peak amplitude of ± 1 quantisation level.

That’s a very busy sentence, so let’s unpack it a little.

Rolling the dice

If you roll one die, you have an equal probability of rolling any number between 1 and 6 (inclusive). Let’s roll one die 100 times counting the number of times we get a 1, or a 2, or a 3, and so on up to 6.

Number rolled	Number of times the number was rolled	Percentage of times the number was rolled
1	17	17
2	14	14
3	15	15
4	15	15
5	21	21
6	18	18

(Note that the percentage of times each number was rolled is the same as the number of times each number was rolled only because I rolled the die 100 times.)

If I plot those results, it looks like Figure 1.

Figure 1. The results of rolling 1 die 100 times.

It may be weird, but I’ve plotted the number of times I rolled -5 or 13 (for example). These are 0 times because it’s impossible to get those numbers by rolling one die. But the reason I put those results in there will make more sense later.

Let’s keep rolling the die. If I do it 1,000,000 times instead of 100, I get these results:

ed	Number of times the number was rolled	Percentage of times the number was rolled
1	166225	16.6225
2	166400	16.6400
3	166930	16.6930
4	167055	16.7055
5	166501	16.6501
6	166889	16.6889

Now, since I rolled many, many, more times, it’s more obvious that the six results have an equal probability. The more I roll the die, the more those numbers get closer and closer to each other.

Take a look at the shape of the plot above. The area under the line from 1 to 6 (inclusive) is almost a rectangle because the six numbers are all almost the same.

The shape of that plot shows us the probability of rolling the six numbers on the die, so we call it a probability density function or PDF. In this case, we see a rectangular PDF.

But what happens if we roll two dice instead? Now things get a little more complicated, since there is more than one way to get a total result, as shown in the table below.

Total
2	1+1
3	1+2	2+1
4	1+3	2+2	3+1
5	1+4	2+3	3+2	4+1
6	1+5	2+4	3+3	4+2	5+1
7	1+6	2+5	3+4	4+3	5+2	6+1
8		2+6	3+5	4+4	5+3	6+2
9			3+6	4+5	5+4	6+3
10				4+6	5+5	6+4
11					5+6	6+5
12						6+6

As can be (hopefully) seen in the table, there is only one way to roll a 2, and there’s only one way to roll a 12. But there are 6 different ways to roll a 7. Therefore, if you’re rolling two dice, it’s 6 times more likely that you’ll roll a 7 than a 12, for example.

If I were to roll two dice 1,000,000 times, I would get a PDF like the one shown in Figure 3.

I won’t explain why this would be considered to be a triangular PDF.

Whether you roll one die or two dice, the number you get is random. In other words, you can’t use the past results to predict what the next number will be. However, if you are rolling one die, and you bet that you’ll roll a 6 every time, you’ll be right about 16.7% of the time. If you’re rolling two dice and you bet that you’ll roll a 12 every time, you’ll only be right about 2.8% of the time.

Let’s take two dice of different colours, say, one red die and one blue die. We’ll roll both dice again, but instead of adding the two values, we’ll subtract the blue value from the red one. If we do this 1,000,000 times, we’ll get something like the results shown below in Figure 4.

Notice that the probability density function keeps the same shape, it’s just moved down to a range of ±5 instead of 2 to 12.

Generating noise

In audio, noise is a sound that is completely random. In other words, just like the example with the dice, in a digital audio signal, you can’t predict what the next sample value will be based on the past sample values. However, there are many different ways of generating that random number and manipulating its characteristics.

Let’s start with a computer algorithm that can generate a random number between 0 and 1 (inclusive) with a rectangular PDF. We’ll then ask the algorithm to spit out 1,000,000 values. If the numbers really are random, and the computer has infinite precision, then we’ll probably get 1,000,000 different numbers. However, we’re not really interested in the numbers themselves – we’re interested in how they’re distributed between 0.00 and 1.00. Let’s say we divide up that range into 100 steps (or “buckets”) that are 0.01 wide and count how many of our random numbers fall into each group. So, we’ll count how many are between 0.0 and 0.01, between 0.01 and 0.02, and so on up to 0.99 to 1.00. We’ll get something like Figure 5.

I’ve only plotted the probabilities of the possible values: 0 to 1, which winds up showing only the top of the rectangle in the rectangular PDF.

If I generate 1,000,000 random numbers with that algorithm, and then subtract 1,000,000 other random numbers, one by one, and find the probabilities of the result, the answer will be familiar.

So, this is how we make the noise that’s added to the signal. If, for each sample, you generate two random numbers (making sure that your algorithm has a rectangular PDF) and subtract one from the other, you have the dither signal that will have a maximum level of ±1 quantisation level.

The signal (with a maximum range of ±1) is scaled up by multiplying it by 2^{(NumberOfBits-1)}-2
then you add the result of the dither generator
then the total is rounded to the nearest integer value
and then the result is scaled back down by a factor of 2^{(NumberOfBits-1)} to bring its back down to a range of ±1 to get it ready for exporting to a standard audio file format like .wav or .flac.

In other words, assuming that you have an audio signal called “Signal” that has a range of ±1 and consists of floating point values:

ScaleUp = 2^(Bitdepth-1)-2
ScaleDown = 2^(Bitdepth-1)

TpdfDither = rand(LengthOfSignal) - rand(LengthOfSignal) 

QuantisedDitheredSignal = round(Signal * ScaleUp + TpdfDither) / ScaleDown;

earfluff and eyecandy

mostly audio, but with some other stuff occasionally

Category: Analysis