Automatic Stattest Selection
Pre-release
Release scope:
- Stat test auto selection algorithm update: https://docs.evidentlyai.com/reports/data-drift#how-it-works
For small data with <= 1000 observations in the reference dataset:
- For numerical features (n_unique > 5): two-sample Kolmogorov-Smirnov test.
- For categorical features or numerical features with n_unique <= 5: chi-squared test.
- For binary categorical features (n_unique <= 2), we use the proportion difference test for independent samples based on Z-score.
All tests use a 0.95 confidence level by default.
For larger data with > 1000 observations in the reference dataset:
- For numerical features (n_unique > 5): Wasserstein Distance.
- For categorical features or numerical features with n_unique <= 5: Jensen–Shannon divergence.
All tests use a threshold of 0.1 by default.
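The selection rules above can be sketched as a small standalone function. This is an illustration only, not the library's implementation: the function name `select_stattest`, its arguments, and the returned labels are hypothetical. The rules do not say how a numerical feature with two unique values is handled, so the sketch assumes the binary rule applies only to features typed as categorical.

```python
def select_stattest(feature_type: str, n_unique: int, reference_size: int) -> str:
    """Return a label for the drift method the rules above would pick.

    feature_type: "num" or "cat" (hypothetical labels used only in this sketch).
    n_unique: number of unique values of the feature.
    reference_size: number of observations in the reference dataset.
    """
    if feature_type not in ("num", "cat"):
        raise ValueError(f"unknown feature type: {feature_type!r}")

    # Numerical features with few unique values follow the categorical rules.
    treat_as_categorical = feature_type == "cat" or n_unique <= 5

    if reference_size <= 1000:
        # Assumption: the binary rule is applied only to categorical features.
        if feature_type == "cat" and n_unique <= 2:
            return "z-test"
        if treat_as_categorical:
            return "chi-squared"
        return "kolmogorov-smirnov"

    # Larger data: distance-based methods.
    if treat_as_categorical:
        return "jensen-shannon"
    return "wasserstein"


if __name__ == "__main__":
    for args in [("num", 50, 500), ("cat", 2, 500), ("num", 4, 5000), ("num", 50, 5000)]:
        print(args, "->", select_stattest(*args))
```

The two branches differ in what the default threshold means: the small-data tests produce a p-value, while the large-data methods produce a distance.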
- Added options for setting a custom statistical test for the Categorical and Numerical Target Drift Dashboard/Profile:
cat_target_stattest_func: Defines a custom statistical test to detect target drift in CatTargetDrift.
num_target_stattest_func: Defines a custom statistical test to detect target drift in NumTargetDrift.
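As an illustration of what a custom statistical test might look like, here is a minimal sketch of a permutation test on the difference in means. These notes do not state the signature the `*_stattest_func` options expect, so the shape used here (reference values and current values in, p-value out) is an assumption, and `permutation_mean_test` is a hypothetical name. Check the library documentation before passing a function like this to the options.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_mean_test(
    reference: Sequence[float],
    current: Sequence[float],
    n_permutations: int = 2000,
    seed: int = 0,
) -> float:
    """Two-sided permutation test for a difference in means; returns a p-value."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = abs(mean(reference) - mean(current))
    pooled = list(reference) + list(current)
    n_ref = len(reference)

    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign observations to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_ref]) - mean(pooled[n_ref:]))
        if diff >= observed:
            hits += 1

    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    reference = list(range(50))
    shifted = [value + 100 for value in reference]
    print("same data:", permutation_mean_test(reference, reference))
    print("shifted data:", permutation_mean_test(reference, shifted))
```

A small p-value suggests the target distribution has shifted between the reference and current data.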
- Added options for setting a custom threshold for drift detection for the Categorical and Numerical Target Drift Dashboard/Profile:
cat_target_threshold: Optional[float] = None
num_target_threshold: Optional[float] = None
These thresholds depend heavily on the selected stattest: a threshold is generally either a p-value threshold or a distance threshold.
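The two kinds of threshold point in opposite directions, which the following sketch makes explicit. It is not the library's code: `drift_detected` and the `score_kind` labels are hypothetical, and the exact boundary handling (strict versus non-strict comparison) is an assumption. For p-value tests, a 0.95 confidence level corresponds to a p-value threshold of 0.05.

```python
def drift_detected(score: float, threshold: float, score_kind: str) -> bool:
    """Decide whether a stattest result signals drift.

    score_kind: "p_value" for hypothesis tests, "distance" for distance metrics
    (hypothetical labels used only in this sketch).
    """
    if score_kind == "p_value":
        # A small p-value means the samples are unlikely to share a distribution.
        return score < threshold
    if score_kind == "distance":
        # A large distance means the distributions are far apart.
        return score >= threshold
    raise ValueError(f"unknown score kind: {score_kind!r}")


if __name__ == "__main__":
    print(drift_detected(0.01, 0.05, "p_value"))   # p-value below the threshold
    print(drift_detected(0.03, 0.1, "distance"))   # distance below the threshold
```

Because of this difference, a custom threshold should be chosen together with the stattest it applies to.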
Fixes:
#207