
Mondrian Forests #10

Open · wants to merge 60 commits into main
Conversation

MarcoDiFrancesco
Contributor

@MarcoDiFrancesco MarcoDiFrancesco commented Apr 19, 2024

Mondrian Forest

Implementing Mondrian Forests (not Aggregated Mondrian Forests).

Logic

The logic mostly follows the implementation by Nel215 (nel215/mondrianforest), since that implementation has little abstraction, which makes it easier to read. It was later adapted to the River codebase.

p.s. if you feel like this logic is too complex, feel free to change it and adapt the code to the River library.
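For readers new to the data structure, the core node-extension step of a Mondrian tree can be sketched as follows: the split time is drawn from an exponential distribution whose rate is the sum of the feature range extents, and the split feature is chosen with probability proportional to its extent. This is a minimal illustrative sketch, not the PR's code; the uniform draws `u1`/`u2` are passed in explicitly so the sketch stays deterministic and needs no `rand` crate.

```rust
// Hedged sketch of the Mondrian split mechanism; names are illustrative.

/// Split time ~ Exponential(rate), where rate is the sum of range extents.
/// `u1` is a uniform(0, 1) draw; inverse-CDF sampling gives -ln(u1) / rate.
fn split_time(ranges: &[(f64, f64)], u1: f64) -> f64 {
    let rate: f64 = ranges.iter().map(|(lo, hi)| hi - lo).sum();
    -u1.ln() / rate
}

/// Split feature chosen with probability proportional to its extent.
/// `u2` is a uniform(0, 1) draw.
fn split_feature(ranges: &[(f64, f64)], u2: f64) -> usize {
    let rate: f64 = ranges.iter().map(|(lo, hi)| hi - lo).sum();
    let mut acc = 0.0;
    for (i, (lo, hi)) in ranges.iter().enumerate() {
        acc += (hi - lo) / rate;
        if u2 < acc {
            return i;
        }
    }
    ranges.len() - 1
}

fn main() {
    // Feature 0 spans 3.0 units, feature 1 spans 1.0 unit (total rate 4.0).
    let ranges = [(0.0, 3.0), (2.0, 3.0)];
    // u2 = 0.5 < 3/4, so the wider feature 0 is selected.
    assert_eq!(split_feature(&ranges, 0.5), 0);
    // u2 = 0.9 falls in the remaining 1/4, so feature 1 is selected.
    assert_eq!(split_feature(&ranges, 0.9), 1);
    println!("split time = {:.3}", split_time(&ranges, 0.5));
}
```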

How to run it

(You probably want to hide the warnings for now 😆 )

RUSTFLAGS=-Awarnings cargo run --release --example synthetic
RUSTFLAGS=-Awarnings cargo run --release --example synthetic-regression

Comparison with Python

(First download the file)

python python_baseline_synthetic.py

@AdilZouitine
Member

WOW, that's incredible work! Thank you 😄
When you are ready to start the review, ping me!

@MarcoDiFrancesco
Contributor Author

Hey, I found the problem! It was the one thing you pointed out at the beginning: variance-aware estimation. I have a DEBUG statement that overwrites it. For now I'd like to first go through the correctness of the implementation that I mentioned on Discord, and then look at how to reintroduce and fix the VAE.

@MarcoDiFrancesco MarcoDiFrancesco changed the title Mondrian Forests Mondrian Forests - Classification Jun 4, 2024
@MarcoDiFrancesco
Contributor Author

MarcoDiFrancesco commented Jun 4, 2024

I'm currently working on the Regression version ✨
Once you review the code and we merge I'll open the PR for regression.

If you are interested in the WIP it's here: diff classification..regression

@MarcoDiFrancesco MarcoDiFrancesco changed the title Mondrian Forests - Classification Mondrian Forests Jun 12, 2024
@MarcoDiFrancesco
Contributor Author

I finished the Regression version ✨

I pushed the commits here, so this PR includes both regression and classification.

@smastelini
Member

Good stuff! For posterity: @MarcoDiFrancesco identified a bug in the River implementation of Mondrian Forests. A fix was applied in the Rust version and is still to be replicated in Python.

use_aggregation=False,
)

df = pd.read_csv("/home/robotics/light-river/syntetic_dataset.csv")
Member


Do you think it is worth accessing this file directly from the URL, like you do on synthetic.rs?

Member

@smastelini smastelini left a comment


Hi @MarcoDiFrancesco, thanks, once again, for the amazing work with Mondrian Forests. It will be a nice addition to light-river :D

I finished a first pass over the code, focusing mainly on the tests you wrote and high-level stuff. I intend to do another pass and check the functionality per se.

In the meantime, I left some comments for further discussion.

For posterity: I intend to replicate the tests @MarcoDiFrancesco wrote to check whether the trees are "doing well" in the River implementation of AMF. Great stuff there!

.map(|(idx, _)| idx)
.unwrap();
// println!("probs: {}, pred_idx: {}, y (correct): {}, is_correct: {}", probs, pred_idx, y, pred_idx == y);
if pred_idx == y {
Member


It seems the return is either one or zero (correct or incorrect). Shouldn't it be the predicted class, instead? Also, predict_proba could become predict_proba_one for consistency.
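As a hedged illustration of the suggestion above, returning the predicted class index (the argmax over the per-class probabilities) rather than a correct/incorrect flag could look like this; the function name and signature are hypothetical, not the PR's API:

```rust
// Hedged sketch: return the argmax class index over a probability vector.
// `probs` stands in for whatever per-class scores the tree produces.

fn predict_one(probs: &[f64]) -> usize {
    probs
        .iter()
        .enumerate()
        // partial_cmp is safe here as long as no probability is NaN.
        .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
        .map(|(idx, _)| idx)
        .unwrap()
}

fn main() {
    assert_eq!(predict_one(&[0.1, 0.7, 0.2]), 1);
    assert_eq!(predict_one(&[0.9, 0.05, 0.05]), 0);
}
```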

}

impl<F: FType> NodeClassifier<F> {
pub fn update_internal(&mut self, left: NodeClassifier<F>, right: NodeClassifier<F>) {
Member


I like how straightforward this trait makes it to update the variance estimators in the future!
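One way a variance-aware version of such an internal update could work is the standard pairwise merge of count/mean/M2 statistics. This is a hedged sketch under assumed names (the struct and fields are illustrative, not the PR's types):

```rust
// Hedged sketch: rebuilding an internal node's running statistics from its
// two children, with a variance-aware merge (pairwise Welford-style update).

#[derive(Clone, Copy)]
struct Stats {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the mean
}

impl Stats {
    fn merge(left: Stats, right: Stats) -> Stats {
        let n = left.count + right.count;
        let delta = right.mean - left.mean;
        // Combined mean: weighted by the right child's share of the samples.
        let mean = left.mean + delta * right.count as f64 / n as f64;
        // Combined M2 adds a cross term proportional to the mean gap.
        let m2 = left.m2
            + right.m2
            + delta * delta * (left.count as f64 * right.count as f64) / n as f64;
        Stats { count: n, mean, m2 }
    }

    fn variance(&self) -> f64 {
        self.m2 / self.count as f64
    }
}

fn main() {
    let l = Stats { count: 2, mean: 1.0, m2: 0.0 }; // values {1, 1}
    let r = Stats { count: 2, mean: 3.0, m2: 2.0 }; // values {2, 4}
    let p = Stats::merge(l, r);
    assert_eq!(p.count, 4);
    assert!((p.mean - 2.0).abs() < 1e-12);
    assert!((p.variance() - 1.5).abs() < 1e-12);
}
```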

};
writeln!(
f,
"{}{}Node {}: time={:.3}, min={:?}, max={:?}, thrs={:.2}, f={}, sums={}, counts={}",
Member


Just thinking out loud: it would be nice to leverage the same implementation you made for the classifier, although I know the written stuff is not the same

node_idx
}

fn test_tree(&self) {
Member


Do you think the test suite could also be shared by the classification and the regression trees>

3 participants