Vectorize flatten #56

lwpiotr · 2020-07-09T08:09:18Z

Hello,

As far as I understand, wotan can currently flatten only a single curve at a time. However, there are cases (like mine) where multiple curves share the same time axis. In such a case, it would be nice if wotan could accept a 2D array of values instead of a single vector. Of course, one can loop wotan over all the curves, but this is very inefficient, especially in Python.
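
A minimal sketch of the looping approach described above, assuming a shared time axis and a 2D array with one curve per row. The helper names `running_median_trend` and `flatten_many` are hypothetical, and the detrender is a plain NumPy stand-in, not wotan's algorithm; a thin wrapper around wotan's `flatten()` could presumably be passed in its place.

```python
import numpy as np

def running_median_trend(time, flux, window_length):
    """Stand-in detrender (not wotan): median of the flux in a sliding time window."""
    half = window_length / 2.0
    trend = np.empty_like(flux, dtype=float)
    for i, t in enumerate(time):
        mask = np.abs(time - t) <= half  # points inside the window centred on t
        trend[i] = np.median(flux[mask])
    return trend

def flatten_many(time, flux_2d, detrend, **kwargs):
    """Apply detrend(time, flux, **kwargs) to every row of flux_2d and stack the trends."""
    flux_2d = np.asarray(flux_2d, dtype=float)
    trends = np.empty_like(flux_2d)
    for k, flux in enumerate(flux_2d):
        trends[k] = detrend(time, flux, **kwargs)
    return trends

# Four synthetic curves sharing one time axis (made-up data for illustration).
time = np.linspace(0.0, 10.0, 200)
rng = np.random.default_rng(0)
flux_2d = 5.0 + rng.normal(0.0, 0.1, size=(4, time.size))

trends = flatten_many(time, flux_2d, running_median_trend, window_length=1.0)
```

The Python-level loop here runs once per curve, so its overhead only matters when the per-curve computation is very cheap.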

hippke · 2020-07-09T18:31:04Z

Hi, thanks for this idea. I have personally iterated over all of the Kepler light curves and had no performance issues. Can you describe the use case in more detail?

lwpiotr · 2020-07-10T02:13:12Z

It is quite a different case. A camera with only ~2000 pixels, based on photomultipliers, and a huge field of view. The detector is segmented and, in addition, the characteristics of individual pixels vary in time. Therefore, standard flat fielding plus treating the detector as a whole is not the best choice. Each pixel has a very large field of view, which, combined with a small PSF, makes it reasonable to treat each pixel separately, that is, to flatten it separately. And here we come to looping over >2000 pixels, many times.

hippke · 2020-07-12T10:06:50Z

I see two requirements here:

1. Vectorize to get rid of the loop (make the code prettier).
2. Vectorize to gain speed.

I can add a loop "inside" wotan (in its Python part), which would solve (1) but not (2). Most of wotan's algorithms are written in numba or C and are quite fast in themselves, so there is no gain from vectorization. One could gain a factor of a few from parallelization (n threads), with one thread for each of the light curves.

In most use cases, the Python overhead (compared to the actual calculations) should be small.

Which of wotan's algorithms do you use? Can you supply a piece of code for benchmarking? I'd like to understand if there is a way to speed it up, and if so, how.
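
The parallelization idea above could be sketched as follows, using Python's standard `concurrent.futures`. This is an illustration under assumptions: `median_trend` is a hypothetical NumPy stand-in rather than a wotan routine, and `detrend_parallel` is a made-up helper name. Whether threads give a real speed-up depends on whether the underlying routine releases the GIL, which is not established here for wotan; a process pool is the usual alternative when it does not.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def median_trend(time, flux, window_length=1.0):
    """Stand-in detrender (not wotan): sliding-window median over the time axis."""
    half = window_length / 2.0
    trend = np.empty_like(flux, dtype=float)
    for i, t in enumerate(time):
        trend[i] = np.median(flux[np.abs(time - t) <= half])
    return trend

def detrend_parallel(time, flux_2d, detrend, n_workers=4):
    """Detrend each row of flux_2d in a thread pool; row order is preserved."""
    flux_2d = np.asarray(flux_2d, dtype=float)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        rows = list(pool.map(lambda flux: detrend(time, flux), flux_2d))
    return np.vstack(rows)

# Eight synthetic curves on a shared time axis (made-up data).
time = np.linspace(0.0, 10.0, 150)
rng = np.random.default_rng(1)
flux_2d = 1.0 + rng.normal(0.0, 0.01, size=(8, time.size))

trends_parallel = detrend_parallel(time, flux_2d, median_trend, n_workers=4)
```

Timing this against the plain serial loop on real data would show whether the per-curve work is heavy enough for the pool to pay off.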

lwpiotr · 2020-08-06T05:49:41Z

Sorry for the late reply. I use flatten(). I must say that the looping overhead is much smaller than what I initially benchmarked a few months ago, but it is still non-negligible. I attach the benchmark file (the strange extension is because GitHub does not accept .py).
wotan_test.py.txt

I will use this opportunity to make some other remarks:

1. The lightcurve returned by flatten() is in different units than the original. To get something in similar units, I need to get the trend and subtract it from the original data. I guess you have original_data-trend somewhere inside your code, so maybe you could add an option to return it instead of what is now returned as the lightcurve?
2. If the original data contain many zeros, they are converted to NaN in trend/lc. Adding, for example, 0.001 to the original data helps, but I guess this may be a bug.

And, finally, not very constructive but... I stumbled upon wotan when looking for a replacement for the CERN ROOT TSpectrum class and its Background() function (https://root.cern.ch/root/html532/TSpectrum.html). I think wotan's biweight in flatten() gives a result closer to what I need than the Background() of TSpectrum. However, Background() is perhaps ~50% faster than flatten(), while it may be performing more computations. Initially, I thought that I got the speed gain with Background() because I am calling it while looping over pixels in the C part of the code, but that is probably not the case.

hippke · 2020-08-06T06:58:50Z

If you're working with XY+brightness pixel data, then wotan is not the right tool. It's built for time series data (time, flux, errors). Try this function instead.
Regarding your points:

1. Wotan does not use any units, thus they cannot be different. The sizes of the input and output arrays should, however, be identical. If they are not, please open a bug issue.
2. Wotan assumes the light curve to be centered around "1" as the nominal flux, as is common in astrophysics. Values of "0" would indicate "black" (= no photons). If wotan returns NaNs, that sounds like a bug (should be zero).

lwpiotr · 2020-08-06T07:33:55Z

I am working with time series data, separate for each pixel. I need to do what Wotan does: flatten a curve for each pixel. That's what is happening in the example that I gave you. The only differences from the standard astronomical cases are the flux units (single photons, can be 0), the timescale (either integer ordinal numbers or microseconds) and the number of curves (separate for each pixel, as each pixel is an independent detector).

I think the issue may come from the expectation of lightcurves being centered around 1. This is not the case here.
I get NaNs for zeros. I will open a bug issue.

hippke · 2020-08-06T14:01:24Z

You could shift your light curves so that they center around one?

lwpiotr · 2020-08-06T14:04:35Z

I could, this is not a problem. However, it is easier to just get the trend and subtract from the data. I just thought that perhaps the original lightcurve - trend is used somewhere in the calculations.
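
The two normalizations discussed in this thread, dividing by the trend versus subtracting it, can be compared with a short NumPy sketch. The function name `detrend_both_ways` and the sample values are made up for illustration, and the example does not claim to reproduce wotan's internals or the NaN behaviour reported above; it only shows that a ratio is undefined wherever both flux and trend are zero, while a difference stays finite and keeps the original units.

```python
import numpy as np

def detrend_both_ways(flux, trend):
    """Return (flux / trend, flux - trend) for a given flux and trend."""
    flux = np.asarray(flux, dtype=float)
    trend = np.asarray(trend, dtype=float)
    residual = flux - trend  # same units as the input, centred around 0
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = flux / trend  # dimensionless, centred around 1; 0/0 gives NaN
    return ratio, residual

# Photon-count-like data with zeros, and a hypothetical trend that also reaches zero.
flux = np.array([0.0, 0.0, 2.0, 0.0, 0.0])
trend = np.array([0.0, 0.4, 0.4, 0.4, 0.0])

ratio, residual = detrend_both_ways(flux, trend)
```

For count data that is not centered around 1, the subtracted form is the one that stays comparable to the original measurements.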
