
Mondrian Forests #10

Open · wants to merge 60 commits into main
Conversation

MarcoDiFrancesco
Contributor

@MarcoDiFrancesco MarcoDiFrancesco commented Apr 19, 2024

Mondrian Forest

Implementing Mondrian Forests (not Aggregated Mondrian Forests).

Logic

The logic mostly follows the implementation by Nel215 (nel215/mondrianforest), since that implementation has little abstraction, which makes it easier to read. It was later adapted to the River codebase.

p.s. if you feel like this logic is too complex, feel free to change it and adapt the code to the River library.
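For readers new to the data structure, the core node-extension step of a Mondrian tree can be sketched as follows: the split time is drawn from an exponential distribution whose rate is the sum of the feature range extents, and the split feature is chosen with probability proportional to its extent. This is a minimal illustrative sketch, not the PR's code; the uniform draws `u1`/`u2` are passed in explicitly so the sketch stays deterministic and needs no `rand` crate.

```rust
// Hedged sketch of the Mondrian split mechanism; names are illustrative.

/// Split time ~ Exponential(rate), where rate is the sum of range extents.
/// `u1` is a uniform(0, 1) draw; inverse-CDF sampling gives -ln(u1) / rate.
fn split_time(ranges: &[(f64, f64)], u1: f64) -> f64 {
    let rate: f64 = ranges.iter().map(|(lo, hi)| hi - lo).sum();
    -u1.ln() / rate
}

/// Split feature chosen with probability proportional to its extent.
/// `u2` is a uniform(0, 1) draw.
fn split_feature(ranges: &[(f64, f64)], u2: f64) -> usize {
    let rate: f64 = ranges.iter().map(|(lo, hi)| hi - lo).sum();
    let mut acc = 0.0;
    for (i, (lo, hi)) in ranges.iter().enumerate() {
        acc += (hi - lo) / rate;
        if u2 < acc {
            return i;
        }
    }
    ranges.len() - 1
}

fn main() {
    // Feature 0 spans 3.0 units, feature 1 spans 1.0 unit (total rate 4.0).
    let ranges = [(0.0, 3.0), (2.0, 3.0)];
    // u2 = 0.5 < 3/4, so the wider feature 0 is selected.
    assert_eq!(split_feature(&ranges, 0.5), 0);
    // u2 = 0.9 falls in the remaining 1/4, so feature 1 is selected.
    assert_eq!(split_feature(&ranges, 0.9), 1);
    println!("split time = {:.3}", split_time(&ranges, 0.5));
}
```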

How to run it

(You probably want to hide the warnings for now 😆 )

RUSTFLAGS=-Awarnings cargo run --release --example synthetic
RUSTFLAGS=-Awarnings cargo run --release --example synthetic-regression

Comparison with Python

(First download the file)

python python_baseline_synthetic.py

@AdilZouitine
Member

WOW, that's incredible work! Thank you 😄
When you are ready to start the review, ping me!

@MarcoDiFrancesco
Contributor Author

Hey, I found the problem! It was the one thing you pointed out at the beginning: variance-aware estimation. I have a DEBUG statement that overwrites it. For now I'd like to first go through the correctness of the implementation that I mentioned on Discord, and then look at how to reintroduce and fix the VAE.

@MarcoDiFrancesco MarcoDiFrancesco changed the title Mondrian Forests Mondrian Forests - Classification Jun 4, 2024
@MarcoDiFrancesco
Contributor Author

MarcoDiFrancesco commented Jun 4, 2024

I'm currently working on the Regression version ✨
Once you review the code and we merge I'll open the PR for regression.

If you are interested in the WIP it's here: diff classification..regression

@MarcoDiFrancesco MarcoDiFrancesco changed the title Mondrian Forests - Classification Mondrian Forests Jun 12, 2024
@MarcoDiFrancesco
Contributor Author

I finished the Regression version ✨

I pushed the commits here, so this PR includes both regression and classification.

@smastelini
Member

Good stuff! For posterity: @MarcoDiFrancesco identified a bug in the River implementation of Mondrian Forests. A fix was applied in the Rust version and is still to be replicated in Python.

use_aggregation=False,
)

df = pd.read_csv("/home/robotics/light-river/syntetic_dataset.csv")
Member


Do you think it is worth accessing this file directly from the URL, like you do on synthetic.rs?

Member

@smastelini smastelini left a comment


Hi @MarcoDiFrancesco, thanks, once again, for the amazing work with Mondrian Forests. It will be a nice addition to light-river :D

I finished a first pass over the code, focusing mainly on the tests you wrote and high-level stuff. I intend to do another pass and check the functionality per se.

In the meantime, I left some comments for further discussion.

For posterity: I intend to replicate the tests @MarcoDiFrancesco wrote to check whether the trees are "doing well" in the River implementation of AMF. Great stuff there!

.map(|(idx, _)| idx)
.unwrap();
// println!("probs: {}, pred_idx: {}, y (correct): {}, is_correct: {}", probs, pred_idx, y, pred_idx == y);
if pred_idx == y {
Member


It seems the return is either one or zero (correct or incorrect). Shouldn't it be the predicted class, instead? Also, predict_proba could become predict_proba_one for consistency.
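As a hedged illustration of the suggestion above, returning the predicted class index (the argmax over the per-class probabilities) rather than a correct/incorrect flag could look like this; the function name and signature are hypothetical, not the PR's API:

```rust
// Hedged sketch: return the argmax class index over a probability vector.
// `probs` stands in for whatever per-class scores the tree produces.

fn predict_one(probs: &[f64]) -> usize {
    probs
        .iter()
        .enumerate()
        // partial_cmp is safe here as long as no probability is NaN.
        .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
        .map(|(idx, _)| idx)
        .unwrap()
}

fn main() {
    assert_eq!(predict_one(&[0.1, 0.7, 0.2]), 1);
    assert_eq!(predict_one(&[0.9, 0.05, 0.05]), 0);
}
```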

}

impl<F: FType> NodeClassifier<F> {
pub fn update_internal(&mut self, left: NodeClassifier<F>, right: NodeClassifier<F>) {
Member


I like how straightforward this trait makes it to update the variance estimators in the future!
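One way a variance-aware version of such an internal update could work is the standard pairwise merge of count/mean/M2 statistics. This is a hedged sketch under assumed names (the struct and fields are illustrative, not the PR's types):

```rust
// Hedged sketch: rebuilding an internal node's running statistics from its
// two children, with a variance-aware merge (pairwise Welford-style update).

#[derive(Clone, Copy)]
struct Stats {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the mean
}

impl Stats {
    fn merge(left: Stats, right: Stats) -> Stats {
        let n = left.count + right.count;
        let delta = right.mean - left.mean;
        // Combined mean: weighted by the right child's share of the samples.
        let mean = left.mean + delta * right.count as f64 / n as f64;
        // Combined M2 adds a cross term proportional to the mean gap.
        let m2 = left.m2
            + right.m2
            + delta * delta * (left.count as f64 * right.count as f64) / n as f64;
        Stats { count: n, mean, m2 }
    }

    fn variance(&self) -> f64 {
        self.m2 / self.count as f64
    }
}

fn main() {
    let l = Stats { count: 2, mean: 1.0, m2: 0.0 }; // values {1, 1}
    let r = Stats { count: 2, mean: 3.0, m2: 2.0 }; // values {2, 4}
    let p = Stats::merge(l, r);
    assert_eq!(p.count, 4);
    assert!((p.mean - 2.0).abs() < 1e-12);
    assert!((p.variance() - 1.5).abs() < 1e-12);
}
```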

};
writeln!(
f,
"{}{}Node {}: time={:.3}, min={:?}, max={:?}, thrs={:.2}, f={}, sums={}, counts={}",
Member


Just thinking out loud: it would be nice to leverage the same implementation you made for the classifier, although I know the written stuff is not the same

node_idx
}

fn test_tree(&self) {
Member


Do you think the test suite could also be shared by the classification and the regression trees>

3 participants