Add Permutation Importance #202

mayer79 · 2025-05-10T10:04:15Z

Implements #201

mayer79 · 2025-05-16T11:49:02Z

This is the current basic call:

import numpy as np
import polars as pl
from sklearn.linear_model import LinearRegression

from model_diagnostics.xai import plot_permutation_importance

rng = np.random.default_rng(1)
n = 1000

X = pl.DataFrame(
    {
        "area": rng.uniform(30, 120, n),
        "rooms": rng.choice([2.5, 3.5, 4.5], n),
        "age": rng.uniform(0, 100, n),
    }
)

y = X["area"] + 20 * X["rooms"] + rng.normal(0, 1, n)

model = LinearRegression()
model.fit(X, y)

_ = plot_permutation_importance(
    predict_function=model.predict,
    X=X,
    y=y,
)
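
For readers new to the method, here is a minimal NumPy sketch of what a permutation importance calculation does in general. It is not the code in this PR: the function name `permutation_importance_sketch`, its parameters, and the use of mean squared error are assumptions made for illustration only.

```python
import numpy as np


def permutation_importance_sketch(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in squared error after shuffling each column (illustrative only)."""
    rng = np.random.default_rng(seed)
    base_score = np.mean((y - predict(X)) ** 2)
    importances = {}
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()  # never modify the caller's data
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            score = np.mean((y - predict(X_perm)) ** 2)
            increases.append(score - base_score)
        importances[j] = float(np.mean(increases))
    return importances


# Toy data mirroring the example above: column 2 ("age") has no effect on y.
rng = np.random.default_rng(1)
n = 1000
X = np.column_stack(
    [
        rng.uniform(30, 120, n),
        rng.choice([2.5, 3.5, 4.5], n),
        rng.uniform(0, 100, n),
    ]
)
y = X[:, 0] + 20 * X[:, 1] + rng.normal(0, 1, n)


def predict(X):
    # Stand-in for a fitted model: the true signal, so no fitting step is needed.
    return X[:, 0] + 20 * X[:, 1]


result = permutation_importance_sketch(predict, X, y)
```

Because the stand-in model ignores column 2, shuffling it leaves the predictions unchanged and its importance is zero, while the other two columns show a clear increase in error.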

The extended features API allows permuting groups of columns together, like this:

_ = plot_permutation_importance(
    predict_function=model.predict,
    features={"size": ["area", "rooms"], "age": "age"},
    X=X,
    y=y,
)
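
A rough sketch of what permuting a group could mean: all columns in the group are shuffled with the same row order, so the relationship between them is kept while their link to the remaining columns is broken. The helper `permute_group` is hypothetical and works on plain NumPy arrays; how the PR implements this internally is not shown here.

```python
import numpy as np


def permute_group(X, columns, rng):
    """Return a copy of X whose given columns are shuffled with one shared row order."""
    X_perm = X.copy()
    idx = rng.permutation(X.shape[0])  # a single permutation for the whole group
    X_perm[:, columns] = X[idx][:, columns]
    return X_perm


# Columns 0 and 1 form one group (think "size"); column 2 is left alone.
X = np.arange(12.0).reshape(4, 3)
X_perm = permute_group(X, [0, 1], np.random.default_rng(0))
```

Shuffling the two columns independently instead would also destroy the dependence between them, which can produce unrealistic rows when the grouped features are strongly correlated.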

lorentzenchr · 2025-05-19T18:50:02Z

src/model_diagnostics/xai/permutation_importance.py

+from model_diagnostics.scoring import SquaredError
+
+
+def safe_copy(X):


Might be good to put safe_copy and safe_column_names into _utils.array and add tests for them. I think they cause the current CI failure.

mayer79 · 2025-05-20

My local tests are failing for the Python 3.9 environment only (pandas and pyarrow). I will move the functions to _utils.array, draft some unit tests, and rename safe_column_names() to get_column_names().
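
To make the discussion concrete, here is one way such helpers could look, using duck typing so that no data frame library has to be imported. This is only a sketch under assumptions: the names carry a `_sketch` suffix to mark them as hypothetical, and the actual helpers in the PR may behave differently.

```python
import copy

import numpy as np


def get_column_names_sketch(x):
    """Return column names for data-frame-like inputs, else positional indices."""
    if hasattr(x, "column_names"):  # e.g. pyarrow.Table
        return list(x.column_names)
    if hasattr(x, "columns"):  # e.g. pandas or polars DataFrame
        return list(x.columns)
    return list(range(np.asarray(x).shape[1]))


def safe_copy_sketch(x):
    """Return a copy of x so that later in-place shuffling cannot touch the input."""
    if hasattr(x, "clone"):  # e.g. polars.DataFrame
        return x.clone()
    if hasattr(x, "copy"):  # e.g. numpy.ndarray, pandas.DataFrame
        return x.copy()
    return copy.deepcopy(x)  # generic fallback for other containers


a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = safe_copy_sketch(a)
b[0, 0] = 99.0  # changing the copy must not change the original
names = get_column_names_sketch(a)
```

A unit test for the copy helper would check exactly this property: modify the copy, then assert that the original is unchanged.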

lorentzenchr · 2025-05-19T18:50:32Z

This will be a great addition! Thanks @mayer79

lorentzenchr · 2025-05-23T20:41:26Z

The failing test is in the Python 3.9 env with:
numpy 1.22.0
polars 1.0.0
scipy 1.10.0
pandas 1.5.3
pyarrow 11.0.0

Could you check if increasing one of the versions fixes the problem, e.g. polars version?

mayer79 · 2025-05-24T13:08:18Z

> The failing test is in the python 3.9 env with numpy 1.22.0, polars 1.0.0, scipy 1.10.0, pandas 1.5.3, pyarrow 11.0.0
>
> Could you check if increasing one of the versions fixes the problem, e.g. polars version?

The following changes in the Python 3.9 env would be necessary. I don't know how much it would hurt to drop support for pandas 1:

pyarrow 11 -> 13
pandas 1.5 -> 2.0

I have added some additional unit tests and moved safe_copy() and get_column_names() to array.py.

Add compute_permutation_importance()

1c594f8

mayer79 self-assigned this May 10, 2025

mayer79 added the enhancement New feature or request label May 10, 2025

mayer79 marked this pull request as draft May 10, 2025 10:04

Replace ipynb by py

775a150

mayer79 changed the title ~~Add compute_permutation_importance()~~ Add Permutation Importance May 10, 2025

mayer79 added 4 commits May 10, 2025 12:24

Catch None values of n_repeats

0bf083e

doctest failure

f8485d0

add plot_permutation_importance()

0c7e6f6

Improve docstring

98e611a

mayer79 added 19 commits May 16, 2025 13:55

Linter

3f44810

remove base_score and n_repeats from output

7d3a4c7

docstring on features argument

7cdc7b7

calculate base score before stacking

63f7825

use scipy special to calculate t quantile

293ca1d

remove reset_index()

c312f1f

Fix doctest

0132dc5

Allow max_display=None

2877208

Add unit tests for plot

42705b9

Add error message for max_display

262540b

Remove wrong Optional typing

e5dcc6f

Replace boolean function argument

5215e53

Linter

ff21a64

Expand docstring of plot()

e5c65dd

simpler safe_select_column()

7f696eb

Replace safe_get_column() by get_second_dimension()

617308e

drop safe_index_rows_1d()

32b13d2

Clarify that np.split() works on all relevant prediction containers except parrow

8b5d36b

First unit tests on calculate_permutation_importance()

62535c0

mayer79 added 2 commits May 18, 2025 18:07

Formatter

5329b23

remove empty line

59a6d6a

lorentzenchr reviewed May 19, 2025

mayer79 added 5 commits May 24, 2025 13:43

Move and rename helper functions

05f3143

Use small x instead of capital X

bd7bbb9

Add unit test for get_column_names()

196b2c9

Add unit test for safe_copy()

eba4c8a

Add unit test to check if calculations have side effects

6126b53

mayer79 added 2 commits May 24, 2025 15:09

test typing failures

c9397d0

Add unit test

8fc6117

mayer79 marked this pull request as ready for review May 29, 2025 08:37
