
Anomaly score changes with length of the same input signal #491

Open
dxiaosa opened this issue Dec 13, 2023 · 6 comments
Labels
question Further information is requested

Comments

@dxiaosa

dxiaosa commented Dec 13, 2023

  • Python version: 3.8.13
  • Operating System: windows 10

Description

When predicting anomaly scores on a dataset with the score_anomalies() function after training a TadGAN model, I find that different lengths of the same input series lead to different results. For example, if the signal X has 200 data points, the score at position 100 differs depending on whether I pass X[:] or X[50:150]. The sliding window size is set to 50.

Do you have any idea what might cause this? I have checked issue #288, but it seems to be a different problem.

Thanks.

@sarahmish
Collaborator

sarahmish commented Dec 18, 2023

Hi @dxiaosa, thank you for raising this question!

I suspect that the "windowing" concept of score_anomalies is the reason. By default, both the critic window and the error window are set to 1% of the length of the series:

critic_smooth_window = critic_smooth_window or math.trunc(y.shape[0] * 0.01)
error_smooth_window = error_smooth_window or math.trunc(y.shape[0] * 0.01)

you can specify the exact window you wish to use, e.g. critic_smooth_window=50.
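To see why the default matters, here is a minimal sketch of the 1%-truncation arithmetic from the snippet above (the example lengths are arbitrary):

```python
import math

# Default smoothing window = 1% of the series length, truncated,
# mirroring the score_anomalies defaults shown above.
for length in (200, 2000, 10000):
    default_window = math.trunc(length * 0.01)
    print(f"length {length} -> default smooth window {default_window}")
```

A 200-point series gets a window of 2 while a 2000-point one gets 20, so the same point is smoothed differently unless the window is pinned explicitly.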

To test whether the exact values match up, I recommend isolating the score_anomalies function and testing it with a fixed input. For example, here is a quick sketch of how to do this:

import numpy as np
from orion.primitives.tadgan import score_anomalies

X = np.random.random((2000, 50, 1))

critic_smooth_window = 50
error_smooth_window = 50

long_errors, _, _, _ = score_anomalies(
    X, ...,  # remaining arguments omitted in this sketch
    critic_smooth_window=critic_smooth_window,
    error_smooth_window=error_smooth_window
)

short_errors, _, _, _ = score_anomalies(
    X[50:150], ...,  # remaining arguments omitted in this sketch
    critic_smooth_window=critic_smooth_window,
    error_smooth_window=error_smooth_window
)

@sarahmish added the question (Further information is requested) label Dec 18, 2023
@dxiaosa
Author

dxiaosa commented Dec 18, 2023

@sarahmish Thank you for your suggestion above! I have tried setting the parameters for score_anomalies(), but the calculated results still differ.

Here is the sample code I ran:

import numpy as np
from orion.primitives.tadgan import score_anomalies

seed_value = 1
np.random.seed(seed_value)

X = np.random.random((2000, 50, 1))
y_hat = np.random.random((2000, 50, 1))
critic = np.random.random((2000, 50, 1))
X_index = np.arange(2000)
comb = 'mult'

critic_smooth_window = 50
error_smooth_window = 50

long_errors, _, _, _ = score_anomalies(
    X, 
    y_hat, 
    critic, 
    X_index, 
    comb=comb,    
    critic_smooth_window=critic_smooth_window, 
    error_smooth_window=error_smooth_window
)

short_errors, _, _, _ = score_anomalies(
    X[50:150],
    y_hat[50:150],
    critic[50:150],
    X_index[50:150],
    comb=comb,
    critic_smooth_window=critic_smooth_window,
    error_smooth_window=error_smooth_window
)

You can try this in your environment and check the results.

@sarahmish
Collaborator

Hi @dxiaosa! Apologies for taking so long to reply.

I revisited the source code for score_anomalies, and you are correct, the scores will not match. In mult mode the function normalizes the final result at the end, which prevents the values from matching:

rec_scores = stats.zscore(rec_scores)

However, the general shape (peaks and valleys) should follow the same trajectory.
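That z-score step can be reproduced in isolation. A minimal sketch with synthetic scores (the `zscore` helper below is a hand-rolled equivalent of `scipy.stats.zscore` with the default `ddof=0`, used so the example needs only NumPy):

```python
import numpy as np

def zscore(a):
    # Equivalent to scipy.stats.zscore(a) with the default ddof=0.
    return (a - a.mean()) / a.std()

rng = np.random.default_rng(0)
scores = rng.random(200)  # synthetic scores standing in for rec_scores

# z-scoring uses the mean/std of whatever slice is passed in, so the
# same underlying point maps to different normalized values.
full = zscore(scores)
short = zscore(scores[50:150])
print(full[100], short[50])  # position 100 overall == position 50 in the slice

# But z-scoring is an affine transform, so the overlapping region is
# perfectly correlated: identical peaks and valleys, different scale.
print(np.corrcoef(full[50:150], short)[0, 1])  # ~1.0
```

This matches the observation above: different absolute values, same trajectory.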

I made a colab notebook here to help clarify the results.
Moreover, if comb is set to 'rec' (reconstruction score only), I believe you will get the same output; you can test it out!

@dxiaosa
Author

dxiaosa commented Jan 22, 2024

Hello @sarahmish, thank you for checking this so patiently!

I tested your colab notebook and found that with comb='rec' and the results plotted, there are still small differences between the long and short sequences, as shown below.

(screenshot: plotted rec scores for the long vs. short sequences)

Since stats.zscore() only normalizes or rescales, and the results still differ, I suspect something in the reconstruction score computation itself:

rec_scores, predictions = reconstruction_errors(

Thanks.

@sarahmish
Collaborator

That is slightly odd, @dxiaosa. I imagine it has to do with whether or not the edges of the window are inclusive in the calculation. I'll investigate this a bit further and get back to you.

Thank you for your patience!
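The edge hypothesis above can be illustrated with a generic centered moving average. This is only a stand-in for Orion's internal smoothing (the window size, centering, and `min_periods` here are assumptions, not the library's exact code):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
signal = rng.random(200)  # synthetic stand-in for the raw error signal

def smooth(values, window=50):
    # Centered moving average; illustrative, not Orion's implementation.
    return pd.Series(values).rolling(window, center=True, min_periods=1).mean()

long_smooth = smooth(signal)
short_smooth = smooth(signal[50:150])

# A point in the middle of the slice sees the same 50 neighbors either way:
mid_diff = abs(long_smooth[100] - short_smooth[50])

# A point near the slice boundary loses part of its window when only the
# slice is passed in, so the smoothed values disagree there:
edge_diff = abs(long_smooth[55] - short_smooth[5])

print(mid_diff, edge_diff)
```

Interior points agree while boundary points drift apart, which is consistent with the small differences seen near the ends of the short sequence.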

@sarahmish
Collaborator

Hi @dxiaosa, apologies for the delay!

I updated the notebook to fix the issues we were observing.

  1. First, the main difference between the two runs is that score_anomalies does in fact rescale the data, so you will observe the same shape but on different scales.
  2. Second, I added reconstruction_errors to the notebook to show that, at that stage, we get the same output.

If you have further questions, please let me know!
