Metric evaluation in stream learning - train test split #843
-
Hi! First of all, thanks to the developers for this awesome library. I'm not sure if this is the correct place to ask, as my question is not directly related to River, but I couldn't find a good answer elsewhere. I'm trying to compare the performance of batch learning vs stream learning; that is, I expect stream learning to perform better on datasets with significant concept drift. However, I still lack some understanding of stream learning, and I hope I can get my answer in this community.

My question is: do we split the dataset into train and test sets like in normal batch learning? I can't see a reason not to split them; otherwise I can't get a metric that reflects my model's performance in real operation. But none of the examples here, nor the papers about incremental learning that I've read, show the dataset being split. If it's not split, how can I make a fair comparison between batch learning and stream learning?

I also have some questions about River's evaluate module. What is the difference between using progressive_val_score and writing a for loop that predicts each input and updates the metric with its output? I noticed the evaluate module is much faster, but I'm not sure how to use it with a pandas DataFrame.

Lastly, how does incremental learning work in practice? It can't learn from unlabelled data (for supervised learning), correct? So after training, the model is no longer updated, right? Thanks
-
Hello there and welcome!
Usually, no. What you describe is akin to cross-validation, and is a batch machine learning concept. In an online setting, you usually do progressive validation. The consequence is that batch and online models are not really comparable.
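The predict-then-learn loop behind progressive validation can be sketched with no dependencies at all. The `RunningMean` model and the MAE bookkeeping below are made up purely for illustration; the point is the ordering: predict on a sample *before* its label is used for learning, so every prediction is an out-of-sample prediction.

```python
# Sketch of progressive (prequential) validation with a toy model.
# `RunningMean` is a hypothetical stand-in for any online model.

class RunningMean:
    """Predicts the mean of all targets seen so far."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def predict_one(self, x):
        return self.total / self.n if self.n else 0.0

    def learn_one(self, x, y):
        self.n += 1
        self.total += y

# A toy stream of (features, label) pairs.
stream = [({"i": i}, float(i % 2)) for i in range(100)]

model = RunningMean()
abs_errors = []
for x, y in stream:
    y_pred = model.predict_one(x)       # 1. predict before seeing the label
    abs_errors.append(abs(y - y_pred))  # 2. update the metric
    model.learn_one(x, y)               # 3. only then learn from the label

mae = sum(abs_errors) / len(abs_errors)
```

Every sample is thus used for testing exactly once and for training exactly once, which is why no separate hold-out split is needed.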
You could do cross-validation to evaluate an online model. If you're doing that, you're just treating batch as a special case of online.
That is essentially what `evaluate.progressive_val_score` does: it runs the same predict / update-the-metric / learn loop for you.
Incremental supervised models need labels to learn, yes. But there are also incremental unsupervised models. I hope that helps!