Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change FFT normalization to exactly satisfy Parseval's Theorem #654

Draft
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

sjoerd-bouma
Copy link
Collaborator

As briefly mentioned in #653, the NuRadioMC convention for the real DFT (rfft) does not actually satisfy Parseval's theorem exactly. We multiply all frequency components by $\sqrt{2}$; however, the 0-frequency and (for even-length traces) the maximum frequency occur only once in the full frequency spectrum (including negative frequencies). For 'reasonable' numbers of samples and traces this effect is pretty small, but we may want to do this correctly.

This PR changes the FFT routines in NuRadioReco.utilities.fft to exactly satisfy Parseval's Theorem, i.e.

np.sum(trace**2 * dt), np.sum(np.abs(spectrum)**2 * df )

This will probably break the unit tests, and more importantly, we use frequency-domain parameterizations for e.g. our askaryan models, so if we decide to implement this some more work is needed.

MWE in case anyone wants to verify that Parseval's Theorem is not satisfied with the current normalization:

import numpy as np
from NuRadioReco.utilities import fft, units

np.random.seed(0)
n_samples = 2**11 + 1
trace = np.random.normal(0,10,n_samples)
vrms = np.std(trace) + np.mean(trace)
sampling_rate = 3.2
dt = 1 / sampling_rate
frequencies = np.fft.rfftfreq(n_samples, dt)
df = frequencies[1] - frequencies[0] # or analytically, df = sampling_rate / n_samples
spectrum = fft.time2freq(trace, sampling_rate)
P_time_domain = np.sum(trace**2 * dt)
P_frequency_domain = np.sum(np.abs(spectrum)**2 * df )
print(np.allclose(P_time_domain, P_frequency_domain), P_time_domain, P_frequency_domain)

# All frequencies components apart from the 0-frequency and (for even-length traces) the maximum frequency
# are modified with respect to the numpy convention in NuRadioMC. This is accounted for in the next line
n_is_even = (n is None) or ((n + 1) % 2)
spectrum_copy[len(spectrum)%2:len(spectrum) - n_is_even] /= np.sqrt(2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused by the len(spectrum) % 2. When the number of samples is even, i.e., len(spectrum) % 2 = 0 there is no 0-frequency bin?

Also please add spaces around %

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks.

@cg-laser
Copy link
Collaborator

I am quite surprised that the tests fail as the differences should be really small or non-existent for most cases. The zero frequency bin should be zero in most cases. And we should have almost always even trace lengths, especially when requesting Askaryan models. Did you investigate in more detail why the tests fail?

@cozzyd
Copy link

cozzyd commented Mar 19, 2024

So I'm confused by the sqrt(2) here. The reason you "need" it is because we rely on Hermitian symmetry (positive frequencies complex conjugate of negative frequencies when time domain is purely real), but is it really conceptually correct to scale the amplitudes at positive frequencies by sqrt(2)? That seems unexpected to me and I'm not sure it doesn't cause problems in other places (generally, if deriving something from the frequency domain, I would not expect it to be multiplied by sqrt(2), though perhaps everybody else understands this convention). In ANITA, we instead multiply the non-DC and non-nyquist terms by 2 when calculating a power spectrum (implicitly summing over the negative frequencies). I guess this distinction only matters when generating signals starting in the frequency domain?

@sjoerd-bouma
Copy link
Collaborator Author

I'm not sure why the tests fail, but I don't have time to look into it right now.
I'm semi-inclined to close this PR and instead live with satisfying Parseval's theorem approximately. As Cosmin hints, keeping track of an overall factor of sqrt(2) is hard enough without having to remember to exclude the 0 and Nyquist frequency for anything that is usually parameterized in the frequency domain (Askaryan models, noise, ...). Also, excluding these frequencies mean 'flat' frequency spectra (white noise, delta pulse) no longer look flat, because they are flat in the full frequency spectrum.

So maybe we should just better document our convention, including the fact that Parseval's theorem is only satisfied approximately, such that energy fluences should preferably be computed in the time domain. But I'm happy to listen to other opinions or suggestions.

@cg-laser
Copy link
Collaborator

Re Cosmin: Yes, it is most problematic for frequency domain parameterizations. (Especially because the old ZHS papers use a non-standard Fourier normalization with an extra factor of two ;-). So we went through extensive tests to make sure that everything agrees and have the rule to implement everything in the time domain (or rather all models must return a time domain signal) to avoid any ambiguities in the FFT normalization.
Internally, we then always use a consistent normalization.
The reason why we have the sqrt(2) is that we want to make it as easy as possible to get correct energy fluences, so that the user does not need to think of factors of 2.

Base automatically changed from documentation/fft to develop May 2, 2024 13:35
@fschlueter
Copy link
Collaborator

Can this PR be closed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants