Consider bandlimited continuous real white noise, and increase its double-sided spectral width while keeping the PSD constant. Since the variance, RXX(τ=0) = σ², is the bandwidth of the noise multiplied by the PSD of the noise, the variance grows at a rate of 2B, where B is the baseband bandwidth. The autocorrelation function is a sinc of height σ² = N0B/2, and the PSD has magnitude N0/4 = σ²/2B. As the bandwidth goes to infinity, the variance goes to infinity. Alternatively, if you fix the variance and produce a true continuous process with that fixed variance, you end up with an infinite spectral width and a PSD of σ²Ts, which goes to 0 as Ts → 0, such that the area under the PSD is still the power (the fixed variance). For this reason a true continuous white noise process must have infinite variance for the PSD to be non-zero. In a real-world situation the PSD of the thermal noise of a resistor is flat at kBT (Boltzmann constant multiplied by noise temperature) but tapers to 0 at higher frequencies, which is why the thermal noise of a resistor has finite power. When this is sampled at the Nyquist rate and reconstructed, the PSD is kBT. If it is sampled at double the Nyquist rate, the thermal noise remains within the same bandwidth and the PSD doesn't change, unlike quantisation noise. This is because the continuous thermal noise is not a true continuous process: oversampling it does not introduce more random variables; the extra samples are just samples of a series of continuous linear functionals of a fixed number of random variables. The extra samples of quantisation noise, by contrast, are treated as individual random variables, so that noise does spread equally over the whole sampling-frequency bandwidth. The bandwidth of noise depends on the spacing of the random variables.
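A small simulation can illustrate the fixed-PSD case: with the PSD level held constant, doubling the bandwidth of the noise doubles its variance. This is only a sketch, using an FFT brick-wall filter as the bandlimiter; the `bandlimit` helper and its `frac` parameter are illustrative names, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2**16
w = rng.standard_normal(N)              # unit-variance white noise, flat PSD

def bandlimit(x, frac):
    """Brick-wall lowpass keeping the fraction `frac` of the double-sided band."""
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x))          # normalised frequency, cycles/sample
    X[np.abs(f) > frac / 2] = 0.0       # zero everything outside the band
    return np.fft.ifft(X).real

# Variance = PSD level x bandwidth, so doubling the bandwidth
# doubles the variance while the PSD stays the same.
v1 = bandlimit(w, 0.25).var()
v2 = bandlimit(w, 0.50).var()
assert abs(v1 - 0.25) < 0.03            # PSD (= 1) x fractional bandwidth
assert abs(v2 / v1 - 2) < 0.1           # twice the bandwidth, twice the power
```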
A continuous process has infinite bandwidth, but its PSD then also tends to 0, which is why the variance of true white noise must be allowed to tend to infinity for the PSD to stay finite.
The PSD = kBT doesn't depend on the duration of the noise either. Because noise is a stochastic signal, the discrete PSD of sampled noise is always the variance, regardless of how many samples there are or how long the noise lasts; the reconstructed noise PSD is therefore always σ²Ts regardless of duration, i.e. always N0/4. Note the difference between N0, N0/2 and N0/4. Take the periodogram of a deterministic realisation of the noise process that happens to have the expected value of the noise energy but not the expected distribution of that energy over frequency. If a random energy must be randomly distributed over discrete frequencies, then the expected value at each frequency is E divided by the number of frequencies. One such realisation is all time samples equal to the standard deviation of the noise, forming a discrete rect window. On the periodogram, the amplitude at 0 rad frequency is σ²N and the other bins are 0. It has the expected energy σ²N, and the expected energy at each frequency is σ²N/N = σ². This does not change regardless of how many samples there are, i.e. regardless of the duration of the noise in time.
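The all-samples-equal-to-σ example can be checked numerically. A minimal sketch, with the 1/N periodogram normalisation assumed and arbitrary values for σ and N:

```python
import numpy as np

sigma, N = 1.5, 64
x = np.full(N, sigma)                 # every sample equals the standard deviation
P = np.abs(np.fft.fft(x))**2 / N      # periodogram with 1/N normalisation

# All the energy lands in the zero-frequency bin: P[0] = sigma^2 * N,
# every other bin is 0; the average energy per bin is sigma^2,
# independent of how many samples (how much time) there are.
assert np.isclose(P[0], sigma**2 * N)
assert np.allclose(P[1:], 0.0)
assert np.isclose(P.mean(), sigma**2)
```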
When you sample white noise at the Nyquist rate of the noise, the autocorrelation function is sampled and becomes σ²δ[n], so RXX(0) = σ². The PSD = σ² = N0B/2. As mentioned, the continuous white noise power spectrum does not change at all when the process is truncated in time, but when the bandwidth is increased the variance (power in the time domain) increases at a rate of 2B while the PSD does not change. Infinite-bandwidth finite-PSD white noise has infinite variance; when it is sampled, the discrete PSD becomes the variance, which is infinite. This is explained by the overlapping spectral densities of the aliased images, since any finite rate is below the Nyquist rate of such noise. So long as you sample at at least the Nyquist rate of bandlimited noise, there is no overlap. If you sample at half the Nyquist rate, the PSD of course doubles where the images overlap, and the samples are still uncorrelated (the discrete autocorrelation function is still 0 at all non-zero lags). But if you sample at a non-integer fraction of the Nyquist rate, e.g. 1/2.5, you end up with a square-wave-shaped PSD because of the way the images overlap, which reflects that the discrete autocorrelation function is now non-zero at lags other than 0 in the time domain.
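The Nyquist-rate case can be sketched in simulation: bandlimited noise has a sinc autocorrelation whose zero crossings sit every 1/(2B), so samples taken at the Nyquist rate 2B land exactly on those zeros and are uncorrelated, while samples taken faster are not. This approximates the ideal bandlimiting with an FFT brick-wall filter; the `rho` helper is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2**16
w = rng.standard_normal(N)

# Brick-wall lowpass to B = fs/4 (so fs is 2x the Nyquist rate for B).
X = np.fft.fft(w)
f = np.fft.fftfreq(N)
X[np.abs(f) > 0.25] = 0.0
x = np.fft.ifft(X).real

def rho(x, m):
    """Normalised autocorrelation estimate at lag m."""
    return np.dot(x[:-m], x[m:]) / np.dot(x, x)

# Oversampled: adjacent samples sit between sinc zeros, rho(1) ~ sinc(1/2) = 2/pi.
assert abs(rho(x, 1) - 2 / np.pi) < 0.05
# Decimating to the Nyquist rate 2B lands on the sinc zero crossings:
# the retained samples are uncorrelated.
y = x[::2]
assert abs(rho(y, 1)) < 0.05
```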
In the sampled bandlimited (to 2B) white noise case, the PSD is σ², but when the signal is reconstructed using Ts-width pulses, the PSD of the new continuous white noise process becomes σ²Ts, which is σ²/2B. The variance remains the same, but the PSD is now σ²Ts = N0/4 instead of σ² = N0B/2 (at least I think that's what this definition is). This continuous white noise process is formed from a discrete number of linear functionals of the discrete sample values, so it is effectively still a discrete process and not a true continuous process.
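The two conventions above are mutually consistent, which a line of arithmetic confirms: with σ² = N0B/2 and Nyquist-rate sampling, Ts = 1/(2B), the reconstructed PSD σ²Ts equals N0/4 for any N0 and B. The numeric values below are arbitrary.

```python
# Consistency check of the conventions used in the text:
# sigma^2 = N0*B/2 and Ts = 1/(2B)  =>  sigma^2 * Ts = N0/4.
N0, B = 3.0, 8.0          # arbitrary illustrative values
sigma2 = N0 * B / 2       # variance of the bandlimited noise
Ts = 1 / (2 * B)          # Nyquist-rate sampling period
assert abs(sigma2 * Ts - N0 / 4) < 1e-12
```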
Note that the PSD is the expected value of the periodogram. The periodogram is of a single record / realisation of the random process. A single realisation is a deterministic signal, and it is what's often shown when you see the 'PSD of white noise': the plot isn't perfectly smooth because it is the periodogram of a single realisation. If you sample continuous white noise at less than the Nyquist rate, there will be overlap for every realisation, with an expected value of 2σ² wherever just two images overlap. All of the power in the original signal (divided by the sampling period Ts) is always present within the sampling bandwidth, so the SNR doesn't change.
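The "PSD is the expected value of the periodogram" point can be demonstrated by averaging: any single periodogram of white noise is rough, but the average over many realisations converges to the flat level σ² at every bin. A simulation sketch with arbitrary σ, record length, and trial count:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, N, trials = 2.0, 256, 2000
P = np.zeros(N)
for _ in range(trials):
    x = sigma * rng.standard_normal(N)
    P += np.abs(np.fft.fft(x))**2 / N   # one rough periodogram per realisation
P /= trials                             # ensemble average over realisations

# The average flattens out towards the true PSD, sigma^2, at every bin.
assert abs(P.mean() - sigma**2) < 0.05
assert np.max(np.abs(P - sigma**2)) < 0.8
```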
SNR is defined as (RMS_S)² / (RMS_N)² for a deterministic signal and E[S(t)²] / E[N(t)²] for a WSS stochastic signal, where E is the ensemble average (a further time average is irrelevant because the mean and variance are the same for every variable in the process). Since E[S(t)²] = (μ_S)² + (σ_S)², the definition is ((μ_S)² + (σ_S)²) / ((μ_N)² + (σ_N)²), where (σ_S)² is the variance of every variable in the random process. This simplifies to (σ_S)² / (σ_N)² if the noise and the signal both have zero ensemble mean, and to (μ_S)² / (σ_N)² if the signal has no ensemble variance, i.e. it is a constant. Notation-wise, σ and μ should refer to the ensemble mean and variance, not to the RMS (temporal standard deviation) and temporal mean of a deterministic signal (I've also seen μ used for RMS and μ² for the mean square). Since the signal is usually deterministic and the noise random, the notation should be (RMS_S)² / E[N(t)²]. The notation E[X(t)] is interpreted as the ensemble mean, and is therefore equal to X(t) if X(t) is a deterministic signal; the temporal mean of a signal should instead be represented with a bar, though I think I've seen the expectation operator used for that too.
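The mixed case described above, a deterministic signal in zero-mean random noise, can be checked numerically as (RMS_S)² / E[N(t)²]. A sketch with an arbitrary sine amplitude and noise level; for a sine of amplitude A, (RMS_S)² = A²/2.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000
A, sigma_n = 2.0, 0.5

t = np.arange(N)
s = A * np.sin(2 * np.pi * 0.01 * t)    # deterministic signal
n = sigma_n * rng.standard_normal(N)    # zero-mean WSS noise

# Deterministic signal: temporal mean square (RMS_S)^2.
# Zero-mean noise: E[N(t)^2] = sigma_n^2, estimated by the sample mean square.
snr_est = np.mean(s**2) / np.mean(n**2)
snr_theory = (A**2 / 2) / sigma_n**2    # (RMS of a sine)^2 = A^2 / 2
assert abs(snr_est / snr_theory - 1) < 0.05
```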