dcrt0(1/4): API2: add comments and docstring of the functions #220


Open — wants to merge 107 commits into main

Conversation

lionelkusch (Collaborator):

Following the sprint, a new API has been defined.
This PR is a first example of refactoring methods to this new API.

@bthirion (Collaborator) left a comment:

Please put the docstring handling in a _utils module.

We don't see anything in the figures: replace the boxplots with bars plus error bars.

@lionelkusch added the "API 2: Refactoring following the second version of the API" label on Apr 11, 2025
Collaborator:

Why do we need this? The docstrings were rendered properly before, no?

Collaborator Author:

This file contains the functions that create a function's docstring from the docstring of the corresponding class.

joblib_verbose=0,
fit_y=False,
n_tree=100,
problem_type="regression",
Collaborator:

This comment is not specific to the class implementation.
I am not sure this parameter does what we expect it to do: it is only passed to rf_distillation.
For both regression and classification, _fit_lasso is called (L206) using a regression model. Using Lasso for classification doesn't break, but we would probably rather use logistic regression with L1 regularization instead.
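To make the suggestion concrete, here is a minimal sketch (not the PR's code; the data and parameters are illustrative) of the alternative being proposed: for classification, an L1-penalized logistic regression plays the role that Lasso plays for regression.

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y_reg = X[:, 0] + 0.1 * rng.normal(size=100)
y_clf = (y_reg > 0).astype(int)

# Regression: Lasso, i.e. L1-penalized least squares.
reg = Lasso(alpha=0.1).fit(X, y_reg)

# Classification: the sparse analogue is logistic regression with an
# L1 penalty (requires a solver that supports it, e.g. liblinear).
clf = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y_clf)
```

Both models expose sparse coefficients, so downstream code that inspects `coef_` works the same way for either problem type.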

Collaborator Author:

See issue #253 for the discussion.

@bthirion (Collaborator) left a comment:

Thx. One important question concerns API homogeneity.

def importance(
self,
fpr=0.05,
scaled_statistics=False,
Collaborator:

importance should have a uniform API across classes. I think it should only accept X=None, y=None as arguments.
fpr is only useful for selection, thus not for importance computation.
scaled_statistics could be specified when creating the class.
WDYT?
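For illustration only (the class name and placeholder scores are hypothetical, not from the PR), the proposed uniform signature could look like this: fpr and scaled_statistics are fixed at construction time, and importance() takes only optional data that defaults to what was seen during fit.

```python
# Hypothetical sketch of the proposed uniform importance() API.
class SomeMethod:
    def __init__(self, scaled_statistics=False):
        self.scaled_statistics = scaled_statistics

    def fit(self, X, y):
        # Keep the training data so importance() can fall back on it.
        self.X_, self.y_ = X, y
        return self

    def importance(self, X=None, y=None):
        X = self.X_ if X is None else X
        y = self.y_ if y is None else y
        # Placeholder: a real method would compute per-feature scores.
        return [0.0] * len(X[0])
```

With this shape, calling `importance()` with no arguments scores the training data, while passing held-out data scores that instead — no fpr needed at this stage.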

Collaborator Author:

During my first refactoring, importance returned an importance value and a selection; in that case, fpr is required for importance.
From the discussion in issue #217, this API is not yet formalised.
Once there is more consensus on it, I will update this PR.

)
for idx in selection_set
)
elif self.statistic == "random_forest":
Collaborator:

I think that we will have to revise the use of this so-called self.statistic.

Collaborator Author:

One way of doing it is to let the user choose their own estimator and the associated loss.
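A minimal sketch of that idea (the function name and data are hypothetical, not from the PR): instead of a statistic="random_forest" string, the user passes any sklearn-style estimator together with the matching loss.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

def compute_statistic(estimator, loss, X_train, y_train, X_test, y_test):
    # Fit the user-supplied estimator and score it with the matching loss.
    estimator.fit(X_train, y_train)
    return loss(y_test, estimator.predict(X_test))

X = np.arange(40, dtype=float).reshape(20, 2)
y = X[:, 0]
# A random forest as the default estimator, per the discussion below.
rf = RandomForestRegressor(n_estimators=10, random_state=0)
stat = compute_statistic(rf, mean_squared_error, X[:15], y[:15], X[15:], y[15:])
```

Any other estimator/loss pair (e.g. a classifier with log loss) would plug into the same call without touching the class internals.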

Collaborator:

Indeed

Collaborator Author:

Do we want this?

Collaborator:

Let's leave that for another PR, but indeed, the estimator should be provided by the user (with a good default); I personally think RFs are a good default.

@bthirion (Collaborator) left a comment:

Moving forward, thx !

selection_features, X_res, sigma2, y_res
)
d0crt_lasso = D0CRT(screening=False, statistic="residual")
d0crt_lasso.fit(X, y)
Collaborator:

Suggested change:
- d0crt_lasso.fit(X, y)
+ d0crt_lasso = D0CRT(screening=False, statistic="residual").fit(X, y)

)

d0crt = D0CRT(problem_type="classification", screening=False)
d0crt.fit(X, y)
Collaborator:

Suggested change:
- d0crt.fit(X, y)
+ d0crt = D0CRT(problem_type="classification", screening=False).fit(X, y)


d0crt = D0CRT(problem_type="classification", screening=False)
d0crt.fit(X, y)
_, pval_dcrt = d0crt.importance(fpr=0.05)
Collaborator:

I think it is already mentioned in another PR, but importance should not take an fpr argument.

Collaborator Author:

No, this was mentioned in issue #217.
However, there is no consensus on whether this function should take X_test and y_test or not.

Collaborator:

All importance methods should have X=None, y=None as arguments, defaulting to the already provided training data. In some cases, these arguments would not even be taken into account.

Collaborator Author:

Whether all the importance methods will have an X and y is not yet decided.
Using an X_test and y_test will change the algorithm a bit, since the residuals would be computed on this test data.

As a user, it's more readable to add optional parameters to a function than to have parameters which don't have any effect.

Collaborator:

The point is about API uniformity: you want to be able to loop over methods with maximally similar arguments without triggering an error. This mechanism is fundamental in sklearn and related libraries.
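The looping argument can be illustrated with two hypothetical classes (not from the PR): one uses the held-out data, the other accepts it purely for uniformity.

```python
class MethodA:
    def importance(self, X=None, y=None):
        # Uses held-out data when provided.
        return [float(len(X))] * 2 if X is not None else [1.0, 2.0]

class MethodB:
    def importance(self, X=None, y=None):
        # Accepts X and y for uniformity but ignores them.
        return [0.5, 0.5]

X_test, y_test = [[0, 1], [2, 3]], [0, 1]
# Identical signatures mean the caller needs no special-casing.
results = [m.importance(X=X_test, y=y_test) for m in (MethodA(), MethodB())]
```

If MethodB rejected the arguments instead of ignoring them, the loop would raise a TypeError for that method only, which is exactly the kind of benchmark-breaking asymmetry being argued against.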

Collaborator Author:

Can you point me to one function in sklearn where some parameters are not used by the function because of the homogenisation of the API?

Threshold for variable screening (0-100)
Whether to perform variable screening step based on Lasso coefficients
screening_threshold : float, default=10
Percentile threshold for screening (0-100), larger values lead to the inclusion of more variables at the screening stage (screening_threshold=100 keeps all variables).
Collaborator:

This line is too long.

)
for idx in selection_set
)
elif self.statistic == "random_forest":
Collaborator:

Indeed

@lionelkusch (Collaborator Author):

> Moving forward, thx !

I cannot really move forward without knowing where I am going.
The API is still under discussion in issue #217. Once the discussion converges, I will update it.

@bthirion (Collaborator):

All unsupervised learning fit() methods (clustering, PCA, ICA, manifold learning, ...) in sklearn have an unused y parameter. But please ask the sklearn developers; they will give you better hints than me.
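This can be checked directly: sklearn's unsupervised estimators accept y only so that every estimator shares the fit(X, y) interface (e.g. inside pipelines), and silently ignore it.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.RandomState(0).normal(size=(30, 3))

# The y argument of unsupervised fit() exists only for interface
# uniformity; passing y=None (or omitting it) changes nothing.
pca = PCA(n_components=2).fit(X, y=None)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X, y=None)
```

Both calls behave identically to fit(X); the unused parameter is the precedent being cited for the importance(X=None, y=None) proposal.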

@bthirion (Collaborator):

Please LMK if you need a review on that one.

@lionelkusch (Collaborator Author):

Not yet; I will update the PR once the API is finalised.

Labels
API 2: Refactoring following the second version of the API
4 participants