-
Okay, further to the above, I find the issue happens even when I don't save and reload! If I train a model from a CSV, run some predictions, and then call learn_one once immediately afterwards, every previous prediction, when re-run, returns the label from that latest learn_one call instead of its earlier answer.
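One way to pin this symptom down is to snapshot predictions, apply a single learn_one to a copy of the model, and list which predictions changed. The following is only a sketch: `ForgetfulModel` and `changed_predictions` are hypothetical names, not River APIs, and the stand-in model deliberately mimics the misbehaviour so the check has something to flag. It assumes the real model exposes `learn_one(x, y)` and `predict_one(x)` and can be copied with `copy.deepcopy`.

```python
import copy


class ForgetfulModel:
    """Hypothetical stand-in (not a River class) that mimics the symptom:
    it only remembers the label from the most recent learn_one call."""

    def __init__(self):
        self.last_label = None

    def learn_one(self, x, y):
        self.last_label = y

    def predict_one(self, x):
        return self.last_label


def changed_predictions(model, samples, new_x, new_y):
    """Return (sample, before, after) for each prediction that a single
    learn_one call changes. Works on a deep copy, so `model` is untouched."""
    probe = copy.deepcopy(model)
    before = [probe.predict_one(x) for x in samples]
    probe.learn_one(new_x, new_y)
    after = [probe.predict_one(x) for x in samples]
    return [(x, b, a) for x, b, a in zip(samples, before, after) if b != a]


model = ForgetfulModel()
model.learn_one({"description": "Tesco supermarket", "amount": -42.10}, "shopping")
model.learn_one({"description": "Pizza Express", "amount": -25.00}, "dining out")

samples = [
    {"description": "Tesco supermarket", "amount": -42.10},
    {"description": "Pizza Express", "amount": -25.00},
]
changed = changed_predictions(
    model, samples, {"description": "Car loan repayment", "amount": -300.00}, "loans"
)
for x, before, after in changed:
    print(x["description"], before, "->", after)
```

With a model that is learning properly, one extra example should change few or none of the earlier predictions. If every sample shows up in `changed` with the new label, the model is behaving as described above.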
-
Hello,
I'll just preface this by saying I'm not a data scientist, just a seasoned web developer who's trying to expand horizons.
I've made a rudimentary bank transaction classifier that infers the type of a transaction (e.g. shopping, loans, dining out) from its description and amount.
I've defined my pipeline as follows:
I then train it from a CSV and all works quite well!
Something like this:
I need to dump this to file and load it on a container on AWS to serve requests for both training and prediction.
Now, if I ever save and reload it using pickle and then call model.learn_one() once, ALL other (previously correct) predictions return the label that was passed to that single model.learn_one() call.
Save:
Load:
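As a sanity check for the save and load steps, a pickle round trip can be tested on its own by comparing predictions before saving and after loading, without any learn_one call in between. This is a minimal sketch, not the code from this post: `ToyClassifier`, `save_model`, `load_model`, `round_trip_preserves_predictions`, and the file name `model.pkl` are hypothetical, and the toy model only stands in for any object with `learn_one` and `predict_one`.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path


class ToyClassifier:
    """Hypothetical stand-in for an online model; not a River class."""

    def __init__(self):
        self.label_counts = {}  # token -> Counter of labels seen with it

    def learn_one(self, x, y):
        for token in x["description"].lower().split():
            self.label_counts.setdefault(token, Counter())[y] += 1

    def predict_one(self, x):
        votes = Counter()
        for token in x["description"].lower().split():
            votes.update(self.label_counts.get(token, Counter()))
        return votes.most_common(1)[0][0] if votes else None


def save_model(model, path):
    # Pickle files must be opened in binary mode.
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def round_trip_preserves_predictions(model, samples, path):
    """True if the reloaded model predicts exactly what the original did."""
    before = [model.predict_one(x) for x in samples]
    save_model(model, path)
    reloaded = load_model(path)
    after = [reloaded.predict_one(x) for x in samples]
    return before == after


model = ToyClassifier()
model.learn_one({"description": "Tesco supermarket", "amount": -42.10}, "shopping")
model.learn_one({"description": "Pizza Express", "amount": -25.00}, "dining out")

samples = [
    {"description": "Tesco supermarket", "amount": -42.10},
    {"description": "Pizza Express", "amount": -25.00},
]
with tempfile.TemporaryDirectory() as tmp:
    ok = round_trip_preserves_predictions(model, samples, Path(tmp) / "model.pkl")
print("round trip preserved predictions:", ok)
```

If this check passes on the real pipeline, serialization is probably not where the learned state is lost, and the learn_one step itself is the next thing to look at.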
From the documentation (https://riverml.xyz/latest/api/compose/Pipeline/), I see that if you're doing learn_one on a pipeline, it fits the model to this new value, quote: "Fit to a single instance."
Other learn_one functions (e.g. https://riverml.xyz/latest/api/multiclass/OneVsOneClassifier/) say: "Update the model with a set of features x and a label y."
My gut feel is that it's not saving/loading correctly, therefore losing all historical learning, making the next learn_one the only training it has ever received.
Questions:
I appreciate the help. I'm totally lost.