Modeling
Using the pre-processed input data as described here, mave builds a model on the pre-retrofit data with each of the following methods: Dummy Regressor, Hour of Week Bin Regressor, K Nearest Neighbors Regressor, Random Forest Regressor, and Extra Trees Regressor.
Each of these methods has an associated set of configuration parameters. For example, the Hour of Week Bin Regressor can make its prediction using either the mean or the median of each bin. For simple methods like this one, mave explores all possible combinations of parameter values. The more complex methods typically have more parameters, with thousands (or millions) of possible combinations of parameter values. For these complex methods, mave trains a model using a randomly selected set of parameter values and repeats this process multiple times (default search_iterations=20).
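The difference between the two search strategies can be sketched in plain Python. This is an illustration only, not mave's code: the function names, the parameter names and the value ranges in both grids are assumptions chosen for the example, and this sketch samples with replacement, so a draw may repeat.

```python
import itertools
import random


def all_combinations(param_grid):
    """Yield every combination of parameter values (exhaustive search)."""
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[n] for n in names)):
        yield dict(zip(names, values))


def random_combinations(param_grid, search_iterations=20, seed=0):
    """Draw search_iterations random parameter sets (randomized search).

    Sampling is with replacement, so duplicates are possible.
    """
    rng = random.Random(seed)
    names = sorted(param_grid)
    return [
        {n: rng.choice(param_grid[n]) for n in names}
        for _ in range(search_iterations)
    ]


# Hypothetical grid for a simple method: two combinations in total.
simple_grid = {"statistic": ["mean", "median"]}

# Hypothetical grid for a tree ensemble: far too many combinations to try all.
complex_grid = {
    "n_estimators": list(range(10, 510, 10)),  # 50 values
    "max_depth": list(range(2, 32)),           # 30 values
    "min_samples_leaf": list(range(1, 21)),    # 20 values
    "max_features": [0.25, 0.5, 0.75, 1.0],    # 4 values
}

simple_candidates = list(all_combinations(simple_grid))
complex_candidates = random_combinations(complex_grid, search_iterations=20)

# Size of the full complex grid, for comparison with the 20 sampled sets.
grid_size = 1
for values in complex_grid.values():
    grid_size *= len(values)
```

With these assumed ranges the complex grid holds 120,000 combinations, of which only 20 are trained, while the simple grid is small enough to evaluate in full.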
For each method, mave selects the best-performing set of parameter values using k-fold cross-validation (default k=10), ranked by the R² value. Mave then selects the overall best model across methods, again by the highest R² value, and uses it as the final model for prediction. For the vast majority of datasets we have looked at, this is either the Random Forest Regressor or the Extra Trees Regressor.
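The selection step can be sketched with a small, standard-library-only example. It is not mave's implementation: the helper names, the synthetic load profile and the interleaved fold assignment are all assumptions made for illustration. Only a mean-predicting dummy and an hour-of-week bin model (mean or median) are compared, but the same loop would apply to any candidate.

```python
import random
import statistics


def fit_bins(hours, values, statistic="mean"):
    """Summarise training values per hour-of-week bin (0..167)."""
    summarise = statistics.mean if statistic == "mean" else statistics.median
    bins = {}
    for h, v in zip(hours, values):
        bins.setdefault(h, []).append(v)
    # The second item is a fallback for bins never seen in training.
    return {h: summarise(vs) for h, vs in bins.items()}, summarise(values)


def fit_dummy(hours, values):
    """Baseline that always predicts the training mean."""
    return {}, statistics.mean(values)


def predict(model, hours):
    table, fallback = model
    return [table.get(h, fallback) for h in hours]


def r2_score(actual, predicted):
    mean_actual = statistics.mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def cross_val_r2(fit, hours, values, k=10):
    """Mean R² over k folds; fold membership is index modulo k (an assumption)."""
    n = len(values)
    scores = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = fit([hours[i] for i in train], [values[i] for i in train])
        predicted = predict(model, [hours[i] for i in test])
        scores.append(r2_score([values[i] for i in test], predicted))
    return statistics.mean(scores)


# Synthetic pre-retrofit data: 8 weeks of hourly load, higher on weekday daytimes.
rng = random.Random(0)
hours = [t % 168 for t in range(168 * 8)]
values = [
    50.0
    + (30.0 if 8 <= h % 24 < 18 and h // 24 < 5 else 0.0)
    + rng.gauss(0.0, 2.0)
    for h in hours
]

candidates = {
    "dummy": fit_dummy,
    "hour_of_week_mean": lambda h, v: fit_bins(h, v, "mean"),
    "hour_of_week_median": lambda h, v: fit_bins(h, v, "median"),
}
scores = {name: cross_val_r2(fit, hours, values) for name, fit in candidates.items()}
best_name = max(scores, key=scores.get)  # candidate with the highest mean R²
```

On this synthetic profile the dummy scores near zero and one of the hour-of-week variants is selected, because the load depends strongly on the hour of the week.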
To be completed