Vectorize flatten #56

Open
lwpiotr opened this issue Jul 9, 2020 · 8 comments

Comments

lwpiotr commented Jul 9, 2020

Hello,

As far as I understand, currently, wotan can flatten only a single curve. However, there are cases (like mine), where there are multiple curves with the same time axis. In such a case, it would be nice if wotan could accept a 2D array of values instead of a single vector. Of course, one can loop wotan over all the curves, but especially in Python, this is very inefficient.
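
The loop being described can be hidden behind a small wrapper. The sketch below is only an illustration: `running_median_detrend` is a hypothetical stand-in for a single-curve detrender (it is not wotan's `flatten()`), and `detrend_many` is an assumed helper name. It removes the loop from user code but does not make the per-curve work any faster.

```python
from statistics import median


def running_median_detrend(time, flux, half_width=2):
    """Hypothetical stand-in for a single-curve detrender.

    Returns (flux - trend, trend), where trend is a running median over
    2 * half_width + 1 samples (fewer at the edges). The time axis is
    accepted only to mirror a (time, flux) call signature.
    """
    trend = []
    for i in range(len(flux)):
        lo = max(0, i - half_width)
        hi = min(len(flux), i + half_width + 1)
        trend.append(median(flux[lo:hi]))
    residual = [f - t for f, t in zip(flux, trend)]
    return residual, trend


def detrend_many(time, curves, detrend=running_median_detrend):
    """Apply a single-curve detrender to every curve sharing one time axis."""
    residuals, trends = [], []
    for flux in curves:
        residual, trend = detrend(time, flux)
        residuals.append(residual)
        trends.append(trend)
    return residuals, trends


time = list(range(6))
curves = [[1.0] * 6, [0, 0, 5, 0, 0, 0]]
residuals, trends = detrend_many(time, curves)
```

Passing the detrender as an argument keeps the wrapper independent of any particular library, so a real per-curve routine could be swapped in for the stand-in.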

Owner

hippke commented Jul 9, 2020

Hi, thanks for this idea. I have personally iterated over all of the Kepler light curves and had no performance issues. Can you describe the use case in more detail?

Author

lwpiotr commented Jul 10, 2020

It is quite a different case: a camera with only ~2000 pixels, based on photomultipliers, with a huge field of view. The detector is segmented and, in addition, the characteristics of individual pixels vary in time. Therefore standard flat fielding plus treating the detector as a whole is not the best choice. Each pixel has a very large field of view, which, combined with a small PSF, makes it reasonable to treat each pixel separately, i.e. to flatten it separately. And here we come to looping over >2000 pixels, many times.

Owner

hippke commented Jul 12, 2020

I see two requirements here:

  1. Vectorize to get rid of the loop (make the code prettier)
  2. Vectorize to gain speed.

I can add a loop "inside" wotan (in its Python part) which would solve (1) but not (2). Most of wotan's algorithms are written in numba or C and are already quite fast, so there is no gain from vectorization. One could gain a factor of a few from parallelization (n threads, each working on a separate light curve).
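
A minimal sketch of the "n threads, one light curve each" idea, using only the standard library. `subtract_median` is a hypothetical stand-in for one per-curve call and `detrend_parallel` is an assumed helper name; neither is part of wotan. Threads only give a real speed-up if the per-curve routine releases Python's global interpreter lock, which is an assumption here; otherwise `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface with separate processes.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def subtract_median(flux):
    """Hypothetical stand-in for one per-curve detrending call."""
    level = median(flux)
    return [f - level for f in flux]


def detrend_parallel(curves, detrend=subtract_median, n_threads=4):
    """Run one detrend call per curve on a pool of worker threads.

    Executor.map returns results in the same order as the input curves.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(detrend, curves))


results = detrend_parallel([[1, 2, 3], [10, 10, 40]])
```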

In most use cases, the Python overhead (compared to the actual calculations) should be small.

Which of wotan's algorithms do you use? Can you supply a piece of code for benchmarking? I'd like to understand if there is a way to speed it up, and if so, how.
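
One way to separate the loop overhead from the actual computation is to time the same loop twice: once with the real per-curve function and once with a function that does nothing. The sketch below uses a hypothetical `subtract_median` stand-in and synthetic data rather than wotan or real light curves, so the numbers it prints say nothing about wotan itself.

```python
import time
from statistics import median


def subtract_median(flux):
    """Hypothetical stand-in for a per-curve detrending call."""
    level = median(flux)
    return [f - level for f in flux]


def noop(flux):
    """Does no work, so timing it measures only the Python loop and call."""
    return flux


def time_loop(curves, func, repeats=3):
    """Return the best wall-clock time in seconds for one pass over all curves."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for flux in curves:
            func(flux)
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic data: 2000 curves of 200 samples each.
curves = [[float((i * j) % 7) for j in range(200)] for i in range(2000)]
total = time_loop(curves, subtract_median)
overhead = time_loop(curves, noop)
print(f"total: {total:.4f} s, loop and call overhead only: {overhead:.4f} s")
```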

Author

lwpiotr commented Aug 6, 2020

Sorry for the late reply. I use flatten(). I must say that the looping overhead is much smaller than what I initially benchmarked a few months ago, but still non-negligible. I attach the file to benchmark (strange extension because .py is not accepted by GitHub).
wotan_test.py.txt

I will use this opportunity to make some other remarks:

  1. The lightcurve returned by flatten() is in different units than the original. To get something in similar units, I need to get the trend and subtract it from the original data. I guess you have original_data-trend somewhere inside your code, so maybe you could add an option to return it instead of what is now returned as the lightcurve?
  2. If the original data contain many zeros, they are converted to NaN in trend/lc. Adding, for example, 0.001 to the original data helps, but I guess this may be a bug.
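
The two remarks can be illustrated with a small sketch. It assumes, without confirmation from this thread, that the returned light curve is the flux divided by the trend; `ratio_and_difference` is a hypothetical helper, not a wotan function. Under that assumption the ratio is dimensionless and centred on 1, the difference keeps the input units, and the ratio is undefined wherever the trend is zero, which is one possible route to NaN values in data with many zeros.

```python
import math


def ratio_and_difference(flux, trend):
    """Return (flux / trend, flux - trend), element by element.

    Where the trend is 0 the ratio is undefined, so NaN is stored there.
    """
    ratio = [f / t if t != 0 else math.nan for f, t in zip(flux, trend)]
    difference = [f - t for f, t in zip(flux, trend)]
    return ratio, difference


# Photon counts with a zero baseline in the middle sample.
flux = [2.0, 0.0, 3.0]
trend = [2.0, 0.0, 2.0]
ratio, difference = ratio_and_difference(flux, trend)
```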

And, finally, not very constructive but... I stumbled upon wotan when looking for a replacement for the CERN ROOT TSpectrum class and its Background() function (https://root.cern.ch/root/html532/TSpectrum.html). I think wotan's biweight in flatten() gives results closer to what I need than the Background() of TSpectrum; however, Background() is perhaps ~50% faster than flatten(), while it may be performing more computations. Initially, I thought that I got the time gain with Background() because I am calling it while looping over pixels in the C part of the code, but probably that is not the case.

Owner

hippke commented Aug 6, 2020

If you're working with XY+brightness pixel data, then wotan is not the right tool. It's built for time series data (time, flux, errors). Try this function instead.
Regarding your points:

  1. Wotan does not use any units, thus they cannot be different. The sizes of the input and output arrays should, however, be identical. If they are not, please open a bug issue.
  2. Wotan assumes the light curve to be centered around "1" as the nominal flux, as is common in astrophysics. Values of "0" would indicate "black" (= no photons). If wotan returns NaNs, that sounds like a bug (should be zero).

Author

lwpiotr commented Aug 6, 2020

I am working with time series data, separate for each pixel. I need to do what Wotan does - flatten a curve for each pixel. That's what is happening in the example that I gave you. The only differences from the standard astronomical cases are the flux units (single photons, can be 0), the timescale (either integer ordinal numbers or microseconds) and the number of curves (separate for each pixel, as each pixel is an independent detector).

  1. I think the issue may come from the expectation of lightcurves being centered around 1. This is not the case here.
  2. I get NaNs for zeros. I will open a bug issue.

Owner

hippke commented Aug 6, 2020

You could shift your light curves so that they center around one?
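
A minimal sketch of such a shift, assuming the curve's median is used as its centre; `shift_to_unity` is a hypothetical helper name. An additive shift is used rather than division by the median because photon-count data can have a median of zero.

```python
from statistics import median


def shift_to_unity(flux):
    """Shift a curve so that its median is 1.

    Returns (shifted_flux, offset). Subtracting the offset from a trend
    estimated on the shifted curve maps it back to the original units,
    provided the trend estimator is unaffected by a constant shift
    (true of a running median, for example).
    """
    offset = 1.0 - median(flux)
    return [f + offset for f in flux], offset


counts = [0, 0, 4, 0, 0]
shifted, offset = shift_to_unity(counts)
```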

Author

lwpiotr commented Aug 6, 2020

I could, this is not a problem. However, it is easier to just get the trend and subtract it from the data. I just thought that perhaps the original lightcurve - trend is used somewhere in the calculations.
