[ENH] Enable custom cost/score aggregators in a unified way across detectors #33

Open
Tveten opened this issue Nov 22, 2024 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@Tveten
Collaborator

Tveten commented Nov 22, 2024

The output from BaseIntervalEvaluator is 2d so that it can hold univariate costs or scores per input data column. Such multivariate outputs are currently summed over columns, and this happens separately inside each detector. However, multivariate changepoint and anomaly detection methods often differ precisely in how they aggregate information across the univariate components. Aggregation should therefore be handled in a unified way, so that aggregators can be reused and customised easily.
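To illustrate why the choice of aggregator matters, here is a minimal sketch with made-up numbers. The `scores` matrix is a hypothetical stand-in for a 2d evaluator output (rows are evaluated intervals, columns are data columns); it is not produced by skchange. Summing favours a change spread across all columns, while taking the maximum favours a change concentrated in a single column.

```python
import numpy as np

# Hypothetical evaluator output: 3 evaluated intervals (rows) x 4 data columns.
scores = np.array(
    [
        [0.1, 0.2, 0.1, 0.3],  # no clear change
        [5.0, 0.1, 0.2, 0.1],  # change concentrated in one column
        [1.5, 1.5, 1.5, 1.5],  # moderate change in every column
    ]
)

# Current behaviour described above: sum over columns, one value per row.
sum_agg = scores.sum(axis=1)

# One possible alternative: the maximum over columns.
max_agg = scores.max(axis=1)

best_by_sum = int(np.argmax(sum_agg))  # row picked by the sum
best_by_max = int(np.argmax(max_agg))  # row picked by the max
```

With these numbers the two aggregators rank the rows differently, which is the kind of choice the issue wants to make configurable.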

Need to decide:

  • Where should aggregation occur? Is it a component of interval evaluators or detectors?
  • Aggregation design: Is it a class? Is it a function? Does the function take one row of costs, or a matrix of several cost evaluations?

Requirements:

  • Ease of customisation/extension/flexibility.
  • Performance. The aggregation operation can easily become a bottleneck in computations for high-dimensional data.

Option 1

Use np.apply_along_axis: the user passes any function that aggregates a single row of costs or scores, and the detector forwards it to np.apply_along_axis.

Pros:

  • Simple and flexible.

Cons:

  • Slow. np.apply_along_axis calls the Python function once per row, and every aggregator is forced through that path.
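A minimal sketch of what Option 1 could look like. The names `aggregate_rowwise` and `top_k_sum` are hypothetical and only serve as illustration; the only real API used is `np.apply_along_axis(func1d, axis, arr)`.

```python
import numpy as np


def aggregate_rowwise(scores, func):
    """Apply a user-supplied row aggregator to every row of a 2d array.

    Hypothetical helper: `func` takes one 1d row and returns a scalar.
    """
    scores = np.asarray(scores, dtype=float)
    # axis=1 means `func` receives one row (all columns) at a time.
    return np.apply_along_axis(func, 1, scores)


def top_k_sum(row, k=2):
    """Example user function: sum of the k largest entries in a row."""
    return np.sort(row)[-k:].sum()


example = aggregate_rowwise([[1.0, 2.0, 3.0], [4.0, 0.0, 1.0]], top_k_sum)
```

The user function is simple to write, but it is invoked from Python once per row, which is where the performance concern comes from.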

Option 2

Allow custom aggregation functions: any function that takes a 2d array and returns a 1d array with one entry per row of the input.

Pros:

  • Flexible
  • Speed: Allows aggregation functions for entire cost/score matrices to be written in numba.
  • skchange doesn't need to implement aggregation functions itself.

Cons:

  • Maybe too flexible? How to validate the input function?
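A minimal sketch of Option 2, including one possible answer to the validation question: probe the function with a small dummy matrix and check the output shape. Both `max_aggregator` and `check_aggregator` are hypothetical names, and the probing approach is an assumption, not something decided in this issue.

```python
import numpy as np


def max_aggregator(scores: np.ndarray) -> np.ndarray:
    """Example aggregator: 2d array in, 1d array with one value per row out."""
    return scores.max(axis=1)


def check_aggregator(func, n_rows=3, n_cols=2):
    """Hypothetical validation: call `func` on a dummy matrix, check the shape."""
    probe = np.zeros((n_rows, n_cols))
    out = np.asarray(func(probe))
    if out.shape != (n_rows,):
        raise ValueError(
            f"Aggregator must return shape ({n_rows},), got {out.shape}."
        )
    return func


# A valid aggregator passes through unchanged.
validated = check_aggregator(max_aggregator)

# np.sum without an axis collapses the matrix to a scalar and is rejected.
try:
    check_aggregator(np.sum)
    rejected = False
except ValueError:
    rejected = True
```

A probe like this only catches shape errors; it cannot tell whether the function aggregates sensibly, which is part of the "too flexible" concern.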

Option 3

Option 2, but introduce an aggregation class that handles aggregator validation.

Pros:

  • Same as Option 2.
  • Simpler to handle input validation.

Cons:

  • Yet another class that the user needs to learn.
  • Need to implement a range of common aggregators as classes in skchange.
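
A minimal sketch of how Option 3 might look. All class names here (`BaseAggregator`, `SumAggregator`, `TopKAggregator`) are hypothetical and not part of skchange; the base class does the input and output validation once, and subclasses only implement the matrix-level aggregation.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseAggregator(ABC):
    """Hypothetical base class: validates input and output shapes."""

    def __call__(self, scores):
        scores = np.asarray(scores, dtype=float)
        if scores.ndim != 2:
            raise ValueError("scores must be a 2d array.")
        out = np.asarray(self._aggregate(scores))
        if out.shape != (scores.shape[0],):
            raise ValueError("Aggregator must return one value per row.")
        return out

    @abstractmethod
    def _aggregate(self, scores: np.ndarray) -> np.ndarray:
        """Aggregate a 2d array over columns."""


class SumAggregator(BaseAggregator):
    """Sum over columns, matching the current behaviour."""

    def _aggregate(self, scores):
        return scores.sum(axis=1)


class TopKAggregator(BaseAggregator):
    """Sum of the k largest entries in each row."""

    def __init__(self, k: int):
        if k < 1:
            raise ValueError("k must be at least 1.")
        self.k = k

    def _aggregate(self, scores):
        k = min(self.k, scores.shape[1])
        # After partitioning, the last k entries of each row are its k largest.
        return np.partition(scores, -k, axis=1)[:, -k:].sum(axis=1)


demo = np.array([[1.0, 2.0, 3.0], [4.0, 0.0, 1.0]])
sum_result = SumAggregator()(demo)
top2_result = TopKAggregator(k=2)(demo)
```

Aggregators with parameters, such as `k` above, fit naturally into a class, at the cost of the extra concept listed under Cons.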
@Tveten Tveten changed the title Enable custom cost/score aggregators in a unified way across detectors [ENH] Enable custom cost/score aggregators in a unified way across detectors Dec 4, 2024
@Tveten Tveten self-assigned this Dec 4, 2024
@Tveten Tveten added the enhancement New feature or request label Dec 4, 2024