Automatic Stattest Selection

Pre-release
@emeli-dral emeli-dral released this 19 May 08:33

Release scope:

  1. Stat test auto selection algorithm update: https://docs.evidentlyai.com/reports/data-drift#how-it-works

For small data with <= 1000 observations in the reference dataset:

  • For numerical features (n_unique > 5): two-sample Kolmogorov-Smirnov test.
  • For categorical features or numerical features with n_unique <= 5: chi-squared test.
  • For binary categorical features (n_unique <= 2): proportion difference test for independent samples based on Z-score.

All tests use a 0.95 confidence level by default.
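
The small-data rules above can be sketched as a plain selection function. This is only an illustration of the stated rules, not the library's implementation: the name `select_stattest`, the `"num"`/`"cat"` labels, and the returned strings are hypothetical. It assumes the binary rule applies only to features declared categorical, as the wording above suggests, and it returns `None` for reference data with more than 1000 observations, which this sketch does not cover.

```python
from typing import Optional

# Reference-size boundary stated in the release notes.
SMALL_DATA_LIMIT = 1000


def select_stattest(feature_type: str, n_unique: int, reference_size: int) -> Optional[str]:
    """Return a label for the default test, or None when the small-data rules do not apply.

    feature_type is a hypothetical label: "num" for numerical, "cat" for categorical.
    """
    if feature_type not in ("num", "cat"):
        raise ValueError("feature_type must be 'num' or 'cat'")
    if reference_size > SMALL_DATA_LIMIT:
        # Larger reference data follows different rules that this sketch does not cover.
        return None
    if feature_type == "cat" and n_unique <= 2:
        # Binary categorical feature.
        return "z-test for proportions"
    if feature_type == "cat" or n_unique <= 5:
        # Categorical feature, or numerical feature with few distinct values.
        return "chi-squared test"
    # Numerical feature with more than 5 distinct values.
    return "two-sample Kolmogorov-Smirnov test"


if __name__ == "__main__":
    print(select_stattest("num", 50, 800))
    print(select_stattest("cat", 2, 800))
```

Checking the binary case before the general categorical case matters, because a binary feature also satisfies n_unique <= 5.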

For larger data with > 1000 observations in the reference dataset:

  2. Added options for setting a custom statistical test for the Categorical and Numerical Target Drift Dashboard/Profile:
    cat_target_stattest_func: Defines a custom statistical test to detect target drift in CatTargetDrift.
    num_target_stattest_func: Defines a custom statistical test to detect target drift in NumTargetDrift.

  3. Added options for setting a custom drift detection threshold for the Categorical and Numerical Target Drift Dashboard/Profile:
    cat_target_threshold: Optional[float] = None
    num_target_threshold: Optional[float] = None
    The meaning of these thresholds depends on the selected stattest: generally it is either a threshold for the p_value or a threshold for a distance.
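
To illustrate why the threshold depends on the selected test, here is a minimal standalone sketch. It does not use the Evidently API, and the function signature Evidently expects for a custom stattest is not stated in these notes, so the names `total_variation_distance` and `drift_detected` are hypothetical. The sketch assumes the usual convention: a p_value signals drift when it falls below the threshold, while a distance signals drift when it reaches or exceeds it.

```python
from collections import Counter
from typing import Sequence


def total_variation_distance(reference: Sequence[str], current: Sequence[str]) -> float:
    """Hypothetical custom test: half the L1 distance between category frequencies."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    categories = set(ref_counts) | set(cur_counts)
    return 0.5 * sum(
        abs(ref_counts[c] / len(reference) - cur_counts[c] / len(current))
        for c in categories
    )


def drift_detected(score: float, threshold: float, score_is_p_value: bool) -> bool:
    """Apply the threshold in the direction that matches the kind of score."""
    if score_is_p_value:
        # Small p_value: the samples are unlikely to share one distribution.
        return score < threshold
    # Large distance: the distributions are far apart.
    return score >= threshold


if __name__ == "__main__":
    reference = ["a"] * 8 + ["b"] * 2
    current = ["a"] * 5 + ["b"] * 5
    distance = total_variation_distance(reference, current)
    print(distance, drift_detected(distance, threshold=0.2, score_is_p_value=False))
```

The same numeric threshold therefore means opposite things for the two kinds of test, which is why a custom threshold should be chosen together with the custom test.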

Fixes:
#207