
Anomaly score changes with length of the same input signal #491

Open
dxiaosa opened this issue Dec 13, 2023 · 6 comments
Labels
question Further information is requested

Comments

@dxiaosa

dxiaosa commented Dec 13, 2023

  • Python version: 3.8.13
  • Operating System: windows 10

Description

When predicting anomaly scores on a dataset with the score_anomalies() function after training a TadGAN model, I find that different lengths of the same input series lead to different results. For example, if the signal X has 200 data points, the score at position 100 differs depending on whether I pass X[:] or X[50:150]. The sliding window size is set to 50.

Do you have any idea what might cause this? I have checked issue #288, but it seems to be a different problem.

Thanks.

@sarahmish
Collaborator

sarahmish commented Dec 18, 2023

Hi @dxiaosa, thank you for raising this question!

I suspect that the "windowing" concept of score_anomalies is the reason. By default, both the critic window and the error window are set to 1% of the length of the series:

critic_smooth_window = critic_smooth_window or math.trunc(y.shape[0] * 0.01)
error_smooth_window = error_smooth_window or math.trunc(y.shape[0] * 0.01)

you can specify the exact window you wish to use, e.g. critic_smooth_window=50.
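To see why the default matters, here is a minimal sketch of the 1%-truncation arithmetic from the snippet above (the example lengths are arbitrary):

```python
import math

# Default smoothing window = 1% of the series length, truncated,
# mirroring the score_anomalies defaults shown above.
for length in (200, 2000, 10000):
    default_window = math.trunc(length * 0.01)
    print(f"length {length} -> default smooth window {default_window}")
```

A 200-point series gets a window of 2 while a 2000-point one gets 20, so the same point is smoothed differently unless the window is pinned explicitly.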

To test whether the exact values match up, I recommend isolating the score_anomalies function and testing it with a fixed input. For example, here is a quick sketch of how to do this:

import numpy as np
from orion.primitives.tadgan import score_anomalies

X = np.random.random((2000, 50, 1))

critic_smooth_window = 50
error_smooth_window = 50

long_errors, _, _, _ = score_anomalies(
    X, ...,  # remaining arguments omitted in this sketch
    critic_smooth_window=critic_smooth_window,
    error_smooth_window=error_smooth_window
)

short_errors, _, _, _ = score_anomalies(
    X[50:150], ...,  # remaining arguments omitted in this sketch
    critic_smooth_window=critic_smooth_window,
    error_smooth_window=error_smooth_window
)

@sarahmish added the question (Further information is requested) label Dec 18, 2023
@dxiaosa
Author

dxiaosa commented Dec 18, 2023

@sarahmish Thank you for your suggestion above! I have tried setting the parameters for score_anomalies(), but the calculated results still differ.

Here is the sample code I ran:

import numpy as np
from orion.primitives.tadgan import score_anomalies

seed_value = 1
np.random.seed(seed_value)

X = np.random.random((2000, 50, 1))
y_hat = np.random.random((2000, 50, 1))
critic = np.random.random((2000, 50, 1))
X_index = np.arange(2000)
comb = 'mult'

critic_smooth_window = 50
error_smooth_window = 50

long_errors, _, _, _ = score_anomalies(
    X, 
    y_hat, 
    critic, 
    X_index, 
    comb=comb,    
    critic_smooth_window=critic_smooth_window, 
    error_smooth_window=error_smooth_window
)

short_errors, _, _, _ = score_anomalies(
    X[50:150],
    y_hat[50:150],
    critic[50:150],
    X_index[50:150],
    comb=comb,
    critic_smooth_window=critic_smooth_window,
    error_smooth_window=error_smooth_window
)

You can try this in your environment and check the results.

@sarahmish
Collaborator

Hi @dxiaosa! Apologies for taking so long to reply.

I revisited the source code for score_anomalies, and you are correct, the scores will not match. In mult mode the function normalizes the final result at the end, which prevents the values from matching:

rec_scores = stats.zscore(rec_scores)

However, the general shape (peaks and valleys) should follow the same trajectory.
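That z-score step can be reproduced in isolation. A minimal sketch with synthetic scores (the `zscore` helper below is a hand-rolled equivalent of `scipy.stats.zscore` with the default `ddof=0`, used so the example needs only NumPy):

```python
import numpy as np

def zscore(a):
    # Equivalent to scipy.stats.zscore(a) with the default ddof=0.
    return (a - a.mean()) / a.std()

rng = np.random.default_rng(0)
scores = rng.random(200)  # synthetic scores standing in for rec_scores

# z-scoring uses the mean/std of whatever slice is passed in, so the
# same underlying point maps to different normalized values.
full = zscore(scores)
short = zscore(scores[50:150])
print(full[100], short[50])  # position 100 overall == position 50 in the slice

# But z-scoring is an affine transform, so the overlapping region is
# perfectly correlated: identical peaks and valleys, different scale.
print(np.corrcoef(full[50:150], short)[0, 1])  # ~1.0
```

This matches the observation above: different absolute values, same trajectory.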

I made a colab notebook here to help clarify the results.
Moreover, if comb is set to 'rec' (reconstruction score only), I believe you will get the same output; you can test it out!

@dxiaosa
Author

dxiaosa commented Jan 22, 2024

Hello @sarahmish, thank you for checking this so patiently!

I tested your colab notebook and found that with comb='rec' and the results plotted, there are still small differences between the long and short sequences, as shown below.

(screenshot: plotted rec scores for the long vs. short sequences)

Since stats.zscore() only normalizes or rescales, and the results still differ, I suspect something in the reconstruction score computation itself:

rec_scores, predictions = reconstruction_errors(

Thanks.

@sarahmish
Collaborator

That is slightly odd, @dxiaosa. I imagine it has to do with whether or not the edges of the window are inclusive in the calculation. I'll investigate this a bit further and get back to you.

Thank you for your patience!
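The edge hypothesis above can be illustrated with a generic centered moving average. This is only a stand-in for Orion's internal smoothing (the window size, centering, and `min_periods` here are assumptions, not the library's exact code):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
signal = rng.random(200)  # synthetic stand-in for the raw error signal

def smooth(values, window=50):
    # Centered moving average; illustrative, not Orion's implementation.
    return pd.Series(values).rolling(window, center=True, min_periods=1).mean()

long_smooth = smooth(signal)
short_smooth = smooth(signal[50:150])

# A point in the middle of the slice sees the same 50 neighbors either way:
mid_diff = abs(long_smooth[100] - short_smooth[50])

# A point near the slice boundary loses part of its window when only the
# slice is passed in, so the smoothed values disagree there:
edge_diff = abs(long_smooth[55] - short_smooth[5])

print(mid_diff, edge_diff)
```

Interior points agree while boundary points drift apart, which is consistent with the small differences seen near the ends of the short sequence.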

@sarahmish
Collaborator

Hi @dxiaosa, apologies for the delay!

I updated the notebook to fix the issues we were observing.

  1. First, the main difference between the two runs is that score_anomalies does in fact rescale the data, so you will observe the same shape but on different scales.
  2. Second, I added reconstruction_errors to the notebook to show that, at that stage, we get the same output.

If you have further questions, please let me know!
