Developing for Production (Daemon or Instance?) #681
-
Hi everyone! Yes, I'm posting two questions in one day, I hope that's not a problem! :) I'll start with my use case and hopefully you can help me find a solution...

I'm working with time series data coming in from multiple sources, each of which formats its data differently. My current plan is to have a daemon or cron job scouring the data streams and storing them in a time series database in a normalized format, which my River learner can step through at its own pace.

The question is: should the learner algorithm also be a daemon, constantly waiting for new data to come in from the stream, or should it be a cron job which checks for new rows in the database since its last run? In both cases I think I'll need to save the model periodically — in the daemon's case to recover from catastrophic failure or manual restarts, and in the cron job's case to pick up where it left off without losing any learning. Either way, I think the important thing is a persistent and recoverable state for the learner. Do you have any ideas for how this could work? Thanks!
-
This is a very interesting question. Both approaches have pros and cons; I guess it depends on what works better for your application. Stream learning models are designed to process single instances of data — in other words, for online environments. However, they can be used in offline settings if that works for your application.

The traditional approach is to let the model learn on the go as new data appears on the stream. This is particularly relevant if your stream has a constant flow of data at a high rate, and it exploits the adaptive and incremental nature of the model.

It is also possible to read the database at intervals, but then your system depends on when learning is triggered, either at a fixed time or by some defined rules. This could impact the reaction time to changes in your data. In this scenario you could even exploit batch learning techniques; notice that this setup is the most common in production environments based on batch learning. Once again, this will depend on what works for your application.

Similarly, model persistence should be built around your application's requirements.
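For the cron-style option, the pattern described above — restore a saved model, learn on the rows added since the last run, then checkpoint again — might look like the sketch below. It uses only the standard library, with a toy `RunningMean` class standing in for a picklable incremental learner (in practice this would be a River model driven by `learn_one`); the table name, column names, and checkpoint path are all illustrative assumptions.

```python
# Sketch of the cron-style pattern: resume from a checkpoint, learn on rows
# added since the last run, then checkpoint again. RunningMean is a stand-in
# for a River model; any picklable incremental learner fits the same pattern.
import os
import pickle
import sqlite3
import tempfile

class RunningMean:
    """Toy incremental learner standing in for a River model."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def learn_one(self, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n

def load_state(path):
    # Recoverable state: the model plus the id of the last row it saw.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"model": RunningMean(), "last_id": 0}

def run_once(db, path):
    state = load_state(path)
    rows = db.execute(
        "SELECT id, value FROM ts WHERE id > ? ORDER BY id",
        (state["last_id"],),
    ).fetchall()
    for row_id, value in rows:
        state["model"].learn_one(value)
        state["last_id"] = row_id
    with open(path, "wb") as f:
        pickle.dump(state, f)  # checkpoint so the next run resumes here
    return state

# Demo: two "cron runs" over a growing table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ts (id INTEGER PRIMARY KEY, value REAL)")
ckpt = os.path.join(tempfile.mkdtemp(), "model.pkl")

db.executemany("INSERT INTO ts (value) VALUES (?)", [(1.0,), (2.0,), (3.0,)])
state = run_once(db, ckpt)
print(state["model"].mean, state["last_id"])  # 2.0 3

db.executemany("INSERT INTO ts (value) VALUES (?)", [(4.0,), (5.0,)])
state = run_once(db, ckpt)
print(state["model"].mean, state["last_id"])  # 3.0 5
```

A daemon variant would wrap the same `learn_one` loop around a blocking stream consumer and dump the checkpoint on a timer or every N rows, rather than once per run — the recoverable state is identical in both setups.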