The output from BaseIntervalEvaluator is 2d so that it can return one univariate cost or score per input data column. This multivariate output is currently summed over columns, and the summation happens separately within each detector. However, many multivariate changepoint and anomaly detection methods differ precisely in how they aggregate information across the univariate components. This aggregation should be handled in a unified way, so that aggregators can be reused and customised easily.
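For illustration, a minimal sketch of the current behaviour, assuming a 2d cost matrix with one row per evaluated interval and one column per data column (the numbers are made up):

```python
import numpy as np

# Made-up cost matrix: one row per evaluated interval, one column per data column.
costs = np.array(
    [
        [1.2, 0.3, 2.1],
        [0.8, 1.5, 0.4],
    ]
)

# Current behaviour: each detector hard-codes a sum over columns.
aggregated = costs.sum(axis=1)  # shape (n_intervals,)
```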
Need to decide:
Where should aggregation occur? Is it a component of interval evaluators or detectors?
Aggregation design: Is it a class? Is it a function? Does the function take one row of costs, or a matrix of several cost evaluations?
Requirements:
Ease of customisation/extension/flexibility.
Performance. The aggregation operation can easily become a bottleneck in computations for high-dimensional data.
Option 1
Use np.apply_along_axis and let the user pass any function, which is then forwarded to np.apply_along_axis (see the sketch after this list).
Pros:
Simple and flexible.
Cons:
Slow. np.apply_along_axis applies the function row by row in a Python-level loop, and the user is forced to go through np.apply_along_axis.
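A minimal sketch of Option 1; the `aggregate` helper is hypothetical and only illustrates forwarding a user-supplied function to np.apply_along_axis:

```python
import numpy as np


def aggregate(costs: np.ndarray, func=np.sum) -> np.ndarray:
    """Reduce a (n_intervals, n_columns) cost matrix to one value per row."""
    # func is applied to each row separately, in a Python-level loop.
    return np.apply_along_axis(func, axis=1, arr=costs)


costs = np.random.default_rng(0).random((1000, 10))
summed = aggregate(costs, np.sum)
maxed = aggregate(costs, np.max)  # e.g. max-type aggregation for sparse changes
```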
Option 2
Allow custom aggregation functions: any function that takes a 2d array and returns a 1d array with length equal to the number of rows of the input (see the sketch after this list).
Pros:
Flexible.
Speed: Allows aggregation functions for entire cost/score matrices to be written in numba.
No need to implement aggregation functions in skchange itself.
Cons:
Maybe too flexible? How to validate the input function?
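A minimal sketch of Option 2, assuming the aggregator operates on the whole cost/score matrix at once; the function names are hypothetical, and numba is optional but shows the speed argument:

```python
import numba
import numpy as np


@numba.njit(cache=True)
def sum_aggregator(costs: np.ndarray) -> np.ndarray:
    # Vectorised over the whole (n_intervals, n_columns) matrix.
    return np.sum(costs, axis=1)


@numba.njit(cache=True)
def max_aggregator(costs: np.ndarray) -> np.ndarray:
    n_rows = costs.shape[0]
    out = np.empty(n_rows)
    for i in range(n_rows):
        out[i] = costs[i].max()
    return out


costs = np.random.default_rng(0).random((1000, 10))
assert sum_aggregator(costs).shape == (costs.shape[0],)
assert max_aggregator(costs).shape == (costs.shape[0],)
```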
Option 3
Option 2, but introduce an aggregation class that handles aggregator validation (see the sketch after this list).
Pros:
Same as Option 2.
Simpler to handle input validation.
Cons:
Yet another class for the user to learn.
Need to implement a range of common aggregators as classes in skchange.
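A minimal sketch of Option 3; BaseAggregator, SumAggregator and MaxAggregator are hypothetical names, not the actual skchange API, and only illustrate centralised validation:

```python
import numpy as np


class BaseAggregator:
    """Hypothetical base class: validates that 2d input maps to 1d output."""

    def __call__(self, costs: np.ndarray) -> np.ndarray:
        aggregated = np.asarray(self._aggregate(costs))
        if aggregated.ndim != 1 or aggregated.shape[0] != costs.shape[0]:
            raise ValueError(
                "Aggregator must return a 1d array with one value per input row."
            )
        return aggregated

    def _aggregate(self, costs: np.ndarray) -> np.ndarray:
        raise NotImplementedError


class SumAggregator(BaseAggregator):
    def _aggregate(self, costs: np.ndarray) -> np.ndarray:
        return costs.sum(axis=1)


class MaxAggregator(BaseAggregator):
    def _aggregate(self, costs: np.ndarray) -> np.ndarray:
        return costs.max(axis=1)


detector_scores = SumAggregator()(np.random.default_rng(0).random((5, 3)))
```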