-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mondrian Forests #10
base: main
Are you sure you want to change the base?
Mondrian Forests #10
Conversation
WOW, that's an incredible work! Thank you 😄 |
Hey, I found the problem! It was the one thing you pointed out at the beginning: variance aware estimation. I have a DEBUG statement that overwrites it. For now I'd like to go through the correctness of the implementation that I mentioned in discord first, and then looking at how to reintroduce and fix the VAE. |
I'm currently working in the Regression version ✨ If you are interested in the WIP it's here: diff classification..regression |
I finished the Regression version ✨ I pushed the commits here, so this PR includes both regression and classification. |
Good stuff! For posterity: @MarcoDiFrancesco identified a bug in the River implementation of Mondrian Forests. A fix was applied in the Rust version and is still to be replicated in Python. |
python_baseline_synthetic.py
Outdated
use_aggregation=False, | ||
) | ||
|
||
df = pd.read_csv("/home/robotics/light-river/syntetic_dataset.csv") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is worth accessing this file directly from the URL, like you do on synthetic.rs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @MarcoDiFrancesco, thanks, once again, for the amazing work with Mondrian Forests. It will be a nice addition to light-river :D
I finished a first pass on the finished code, focusing mainly on the tests you wrote and high-level stuff. I intend to do another pass and check the functionality per se.
In the meantime, I left some comments for further discussion.
For the posteriority: I intend to replicate the tests @MarcoDiFrancesco wrote to check if the trees are "doing well" in the River implementation of AMF. Great stuff there!
.map(|(idx, _)| idx) | ||
.unwrap(); | ||
// println!("probs: {}, pred_idx: {}, y (correct): {}, is_correct: {}", probs, pred_idx, y, pred_idx == y); | ||
if pred_idx == y { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the return is either one or zero (correct or incorrect). Shouldn't it be the predicted class, instead? Also, predict_proba
could become predict_proba_one
for consistency.
} | ||
|
||
impl<F: FType> NodeClassifier<F> { | ||
pub fn update_internal(&mut self, left: NodeClassifier<F>, right: NodeClassifier<F>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how straightforward this trait makes it to update the variance estimators in the future!
}; | ||
writeln!( | ||
f, | ||
"{}{}Node {}: time={:.3}, min={:?}, max={:?}, thrs={:.2}, f={}, sums={}, counts={}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking out loud: it would be nice to leverage the same implementation you made for the classifier, although I know the written stuff is not the same
node_idx | ||
} | ||
|
||
fn test_tree(&self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think the test suite could also be shared by the classification and the regression trees>
Mondrian Forest
Implementing Mondrian Forests (not Aggregated Mondrian Forests).
Logic
The logic mostly follows mainly the implementation by Nel215: nel215/mondrianforest since the implementation has no abstraction, which makes it easier to read. The implementation was later adapted it to the River codebase.
p.s. if you feel like this logic is too complex, feel free to change it and adapt the code to the River library.
How to run it
(You probably want to hide the warnings for now 😆 )
Comparison with Python
(First download the file)