Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New IndicatorChangesConfig: Continuous Linear Segments #53

Open
Datseris opened this issue Oct 15, 2023 · 7 comments
Open

New IndicatorChangesConfig: Continuous Linear Segments #53

Datseris opened this issue Oct 15, 2023 · 7 comments
Assignees
Labels
new IndicatorsChangesConfig New analysis pipeline for estimating transitions in timeseries

Comments

@Datseris
Copy link
Member

This issue is a continuation of the discussion originally in STOR-i/Changepoints.jl#36 .

There we had the quote:

From my point of view, is the very important part of "change analysis" the continuous piece-wise linear regression. See for info the following paper. The method presented in this paper is similar to the standard PELT method for linear segments estimation, but it is moreover forced to be continuous (without step changes between linear segments!!!). See the code on GitLab, too.

The paper is "A new and general approach to signal denoising and eye movement classification based on segmented linear regression". It appears to be a blend between standard PELT and estimating changes in slope in indicator timeseries which @JanJereczek is interested in.

Following the outline I gave in #52 , one could define yet another analysis pipeline type that does the "PELT-like-continuus-linear-segments".

@Datseris Datseris added the new IndicatorsChangesConfig New analysis pipeline for estimating transitions in timeseries label Oct 15, 2023
@Datseris Datseris changed the title New IndicatorChangesConfig: New IndicatorChangesConfig: Continuous Linear Segments Oct 15, 2023
@Datseris
Copy link
Member Author

Datseris commented Oct 16, 2023

Holger Kantz gave a presentation where something similar is done:

A bi-linear function is fitted to the data (i.e., one line, then change to a line of differnt slope. Continuous fnction but discontinuous derivative). This type of function is fitted to the data (with e.g., a simple LsqFit.jl) and the change point is where the two the linear segments change slope.

So this is an extremely basic form of the algorithms we are discussing in this issue.

@Datseris
Copy link
Member Author

@stelmo I was considering whther I could use LinearSegmentation.jl for this. in principle, if I could force LinearSegmentation.jl to ensure that all linear segments are continuous (end points and start points coincide sequentially), then this would solve my problem. Do you know if this is possible? So far all figures in the README are discontinuous.

@stelmo
Copy link

stelmo commented Oct 21, 2023

LinearSegmentation (LS) does give the overlap option, but that just enforces that neighboring segments share the same boundary points. The actual line of best fit does not need to be continuous, hence all the disjoint fits, even with overlap=true. It would be interesting to try to implement an option forcing a continuous fit. What time line are you talking about (I am going on holiday soon-ish for 3 weeks without internet access)? Also, I have found that LS is pretty sensitive to data noise and one needs to carefully tune the fitting thresholds to get nice fits, so that also needs a little work by me...

@Datseris
Copy link
Member Author

I don't have any timeline, this is a project without any deadlines! Therefore I'll wait for you because I don't know any internals of LS to add an option for enforcing continuity...

@stelmo
Copy link

stelmo commented Oct 23, 2023

Okay, I extended my sliding_window function to have a continuous option (on master). Now you can create fits like this:
image

However, the sliding window method is very suboptimal (but simple and fast), and the papers you linked to look like they have much better algorithms. I plan on reading them + perhaps implementing their algos eventually. I hope to take a stab at this before I leave (3 weeks), but if not then, definitely before the end of the year.

For reference, the code that generated the plot:

using LinearSegmentation, CairoMakie

N = 100
xs = collect(range(0, 3 * pi, length = N)) .+ 0.1 .* randn(N)
ys = sin.(xs) .+ 0.1 .* randn(N)

segments = sliding_window(
    xs, 
    ys;
    min_segment_length = 0.5,
    fit_threshold = 0.1,
    fit_function = :rmse,
    overlap = true,
    continuous = true, # only on master
)

f, a = scatter(xs, ys)
for seg in segments
    b0, b1 = LinearSegmentation.endpoints(xs[seg], ys[seg])
    lines!(a, xs[seg], b0 .+ b1 .* xs[seg], linewidth=5)
end
f

@Datseris
Copy link
Member Author

This is very interesting! Thank you so much for trying this out. In your code you have enforced a minimum segment length, but you haven't enforced a maximum number of segments to use. How does the algorithm optimize this choice?

In my application scenarios I have timeseries. So the x axis is equi-spaced and sequential. Hence, the "optimal" line segment fit would be one that fits length(x)-1 segments, by connecting each sequential point in the timeseries. Here "optimal" means with least RMSE to the data, as this fit would have exactly 0 RMSE.

@stelmo
Copy link

stelmo commented Oct 23, 2023

Ah, so the sliding window algorithm is very basic. There is no way to control the maximum number of returned segments, it merely makes a new segment when the error of a segment becomes "too large". In essence, it slides along the x-axis, appending data to a segment. When a user specified threshold is reached, it starts a new segment and keeps on sliding and adding data to this new segment. You can see the sub-optimality of this approach by looking at the attached figure at the trough: the yellow segment is a little wonky. The major advantage of this method is that it is fast (or can be made fast easily), as there is very little processing involved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new IndicatorsChangesConfig New analysis pipeline for estimating transitions in timeseries
Projects
None yet
Development

No branches or pull requests

2 participants