Modeling

Using the pre-processed input data as described here, mave builds a model on the pre-retrofit data with each of the following methods: Dummy Regressor, Hour Of Week Bin Regressor, K Nearest Neighbors Regressor, Random Forest Regressor, and Extra Trees Regressor.

Parameter selection for each method

Each of these methods has an associated set of configuration parameters. For example, the simple Hour of Week Bin Regressor can make its prediction using either the mean or the median of each bin. For such simple methods, mave explores all possible combinations of parameter values. The more complex methods, however, typically have more parameters and thousands (or millions) of possible combinations. For these, mave trains a model using a randomly selected set of parameter values and repeats this process a fixed number of times (default search_iterations=20).
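The difference between the two search strategies can be sketched in a few lines of Python. This is an illustration only, not mave's actual code: the helper names (all_combinations, random_combinations), the parameter names, and the value ranges in both grids are assumptions chosen for the example. Only the mean/median choice and the default of 20 iterations come from the description above.

```python
import itertools
import random


def all_combinations(grid):
    """Yield every combination of parameter values (exhaustive search)."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def random_combinations(grid, search_iterations=20, seed=0):
    """Draw search_iterations randomly chosen parameter sets from the grid."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in sorted(grid.items())}
        for _ in range(search_iterations)
    ]


# Hypothetical grid for a simple method: two candidates, so try them all.
simple_grid = {"statistic": ["mean", "median"]}

# Hypothetical grid for a tree ensemble: far too many candidates to try all.
complex_grid = {
    "n_estimators": list(range(10, 510, 10)),
    "max_depth": list(range(1, 51)),
    "min_samples_leaf": list(range(1, 21)),
    "max_features": ["sqrt", "log2", None],
}

simple_candidates = list(all_combinations(simple_grid))
complex_candidates = random_combinations(complex_grid, search_iterations=20)

# Number of candidates an exhaustive search of complex_grid would need.
grid_size = 1
for values in complex_grid.values():
    grid_size *= len(values)

print(len(simple_candidates), len(complex_candidates), grid_size)
```

With these assumed ranges the exhaustive search covers 2 candidates for the simple grid, while the complex grid holds 150,000 combinations of which only 20 are sampled.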

Model selection

For each method, mave selects the best-performing set of parameter values using k-fold cross-validation (default k=10), scored by the R² value. The overall best model is then chosen across methods, again by highest R², and this is the final model used for prediction. For the vast majority of datasets we have looked at, this is either the Random Forest Regressor or the Extra Trees Regressor.
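As a rough sketch of this selection step, the example below scores the two parameter settings of a toy hour-of-week bin regressor with 10-fold cross-validation and keeps the one with the highest mean R². Everything here is an assumption made for illustration: the synthetic load data, the interleaved fold assignment, and the function names (fit_bins, cross_val_r2) are not taken from mave. The same comparison would apply across methods to pick the overall best model.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_bins(hours, values, statistic):
    """Aggregate values per hour-of-week bin; also return a global fallback."""
    agg = statistics.fmean if statistic == "mean" else statistics.median
    bins = {}
    for hour, value in zip(hours, values):
        bins.setdefault(hour, []).append(value)
    return {hour: agg(vals) for hour, vals in bins.items()}, agg(values)


def cross_val_r2(hours, values, statistic, k=10):
    """Mean R² over k folds (fold j holds out every k-th sample from j)."""
    n = len(values)
    scores = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model, fallback = fit_bins(
            [hours[i] for i in train_idx],
            [values[i] for i in train_idx],
            statistic,
        )
        preds = [model.get(hours[i], fallback) for i in test_idx]
        truth = [values[i] for i in test_idx]
        scores.append(r2_score(truth, preds))
    return statistics.fmean(scores)


# Synthetic pre-retrofit data: 8 weeks of hourly load, higher from 08:00-18:00.
rng = random.Random(0)
hours = [i % 168 for i in range(168 * 8)]
values = [
    50.0 + (20.0 if 8 <= h % 24 < 18 else 0.0) + rng.gauss(0.0, 2.0)
    for h in hours
]

# Score each candidate parameter setting, then keep the highest mean R².
scores = {stat: cross_val_r2(hours, values, stat) for stat in ("mean", "median")}
best = max(scores, key=scores.get)
print(best, scores)
```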

To be completed
