How to use best model and transformer from forecaster pipeline on new data without actual y #77
Hi, you don't need actual y values. You need historical data to train the models on, but the predictions over the unknown forecast horizon only use the Xvars you pass to the model. So for your data, calling `f.export('lvl_fcsts')` should give you the forecasted points over the next four periods.
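A minimal pure-Python sketch (not scalecast code) of why no future y is needed. It assumes, hypothetically, that the only Xvars are lags 4 through 8 of the series itself, similar to the `dengue_lag_4` through `dengue_lag_8` Xvars in the original question, and a forecast length of 4. Because every lag is at least as long as the horizon, each future feature row can be read straight from observed history. The helper names `lag_features` and `future_rows` are illustrative only.

```python
def lag_features(y, lags, t):
    """Return the feature row for time index t, using the observed values y[t - lag]."""
    return [y[t - lag] for lag in lags]


def future_rows(y, lags, horizon):
    """Build feature rows for the `horizon` periods after the end of y.

    Every row reads only observed history, which requires min(lags) >= horizon.
    """
    if min(lags) < horizon:
        raise ValueError("some lags would need unobserved future y values")
    n = len(y)
    return [lag_features(y, lags, n + h) for h in range(horizon)]


history = [float(v) for v in range(1, 21)]  # 20 observed periods: 1.0 .. 20.0
lags = [4, 5, 6, 7, 8]
rows = future_rows(history, lags, horizon=4)
print(rows[0])  # [17.0, 16.0, 15.0, 14.0, 13.0]
```

With lags shorter than the horizon, the missing future values would instead have to be filled in with the model's own earlier predictions, which is how recursive autoregressive forecasting works in general.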
Thanks heaps for clarifying and for your patience. Related to the code you previously shared: suppose `f2` contains more recent data in addition to the data used in `f1`. Would the following be equivalent to getting predictions over the new, unknown forecast horizon from the original model?

```python
from scalecast.Forecaster import Forecaster
from scalecast.util import find_optimal_transformation

f1 = Forecaster(...)
f2 = Forecaster(...)
f1.add_ar_terms(12)
f2.add_ar_terms(12)

# find optimal transformation on series 1
transformer, reverter = find_optimal_transformation(f1)
f1 = transformer.fit_transform(f1)

# tune lasso model on series 1
f1.set_estimator('lasso')
f1.tune()
chosen_params = f1.best_params # save best params -- these will also be in f1.history['lasso']['HyperParams']
f1.auto_forecast()

# apply transformation to series 2
f2 = transformer.fit_transform(f2)

# apply lasso model with optimal hyperparams to series 2
f2.set_estimator('lasso')
f2.manual_forecast(**chosen_params)
```
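The pattern in the code above (tune on one series, reuse the winning hyperparameters on another) can be sketched without scalecast. This pure-Python example is hypothetical: it uses a one-feature ridge regression with the closed-form slope `sum(x*y) / (sum(x*x) + alpha)` as a stand-in for lasso, picks `alpha` on a validation split of series 1, then fits a new model on series 2 with that `alpha`. Only the hyperparameter carries over; the coefficient is re-estimated from series 2's data.

```python
def fit_ridge_1d(x, y, alpha):
    """Closed-form ridge slope for y ~ beta * x (no intercept); alpha=0 is plain least squares."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + alpha)


def rmse(actual, pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)) ** 0.5


def tune_alpha(x, y, grid, val_len):
    """Return the alpha from `grid` with the lowest RMSE on the last `val_len` points."""
    x_tr, y_tr = x[:-val_len], y[:-val_len]
    x_va, y_va = x[-val_len:], y[-val_len:]
    scores = {}
    for alpha in grid:
        beta = fit_ridge_1d(x_tr, y_tr, alpha)
        scores[alpha] = rmse(y_va, [beta * v for v in x_va])
    return min(scores, key=scores.get)


# Series 1: used only to choose the hyperparameter.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y1 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
best_alpha = tune_alpha(x1, y1, grid=[0.0, 1.0, 10.0], val_len=2)

# Series 2: a fresh fit that reuses only the chosen alpha.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y2 = [3.0, 6.1, 8.9, 12.2, 15.0, 17.8, 21.1]
beta2 = fit_ridge_1d(x2, y2, best_alpha)
print(best_alpha, round(beta2, 3))  # 0.0 3.001
```

The slope fit on series 2 differs from the one fit on series 1, so reusing hyperparameters is not the same as reusing a fitted model.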
Oh, I think I know what you are asking. One of the nuances of scalecast is that models have to be retrained every time predictions are generated. To do what you are describing, you can use
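Fitting a model once and reusing that exact fitted model later can be sketched generically. The example below is not scalecast's API: it uses a hypothetical `LinearModel` class (one-feature ordinary least squares) and the standard-library `pickle` module to save the fitted state, reload it, and predict from new Xvars with no refit and no new y.

```python
import pickle


class LinearModel:
    """Hypothetical stand-in model: ordinary least squares with one feature and an intercept."""

    def fit(self, x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((a - mx) ** 2 for a in x)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        return self

    def predict(self, x):
        return [self.intercept + self.slope * a for a in x]


# Train once on historical data (y = 2x + 1 here).
model = LinearModel().fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
blob = pickle.dumps(model)  # persist the fitted state; could be written to disk instead

# Later: reload and predict from new Xvars only. No refit, no new y.
frozen = pickle.loads(blob)
print(frozen.predict([5.0, 6.0]))  # [11.0, 13.0]
```

Only unpickle data from trusted sources, since loading a pickle can execute arbitrary code.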
For our specific use case, we need to be able to monitor model drift. With regard to the workaround, could you share a code snippet showing how I might implement it?
Sure, I'll start mocking something up. One potentially easier workaround would be to iteratively try longer and longer forecast horizons. As long as you know the actuals, the model's predictions wouldn't change if you used shorter horizons without retraining the model.
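For drift monitoring, one generic approach, assumed here rather than taken from scalecast, is to keep the forecasts from a model that was fit once and, as actuals arrive, compare a rolling error against the error measured when the model was first evaluated. The function names and the `window`, `tolerance`, and `baseline_rmse` values below are hypothetical choices.

```python
def rolling_rmse(actuals, forecasts, window):
    """RMSE over each trailing window of `window` (actual, forecast) pairs."""
    errs = [(a - f) ** 2 for a, f in zip(actuals, forecasts)]
    out = []
    for end in range(window, len(errs) + 1):
        chunk = errs[end - window:end]
        out.append((sum(chunk) / window) ** 0.5)
    return out


def flag_drift(actuals, forecasts, baseline_rmse, window=4, tolerance=1.5):
    """Flag each window whose RMSE exceeds `tolerance` times the baseline RMSE."""
    return [r > tolerance * baseline_rmse for r in rolling_rmse(actuals, forecasts, window)]


# Forecasts from a model that was fit once and never retrained,
# compared with actuals as they arrive (hypothetical numbers).
forecasts = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
actuals = [10.0, 11.0, 9.0, 10.0, 14.0, 15.0]
print(flag_drift(actuals, forecasts, baseline_rmse=1.0))  # [False, True, True]
```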
- Added `Forecaster.transfer_predict()` method. Only univariate sklearn models supported for now (#77).
- Added `Forecaster.transfer_cis()` method.
- Added `carry_fit_models` attribute in `Forecaster` object that can be changed when object is initialized.
- Added `util.infer_apply_Xvar_selection()` function.
- Changed how many history attributes are stored for each evaluated model, making the `Forecaster` object more memory efficient.
- Refactored forecasting code for sklearn models so that model evaluation is more efficient.
- Changed the `max_ar = 'auto'` behavior in `Forecaster.auto_Xvar_select()`.
- Changed scikit-learn dependency to `<1.3.0` due to it not working with the shap library.
- Fixed an issue with combo modeling where defaults were not working when a previous model had been run test only.
Instead of code, I decided to build a method for the
Hi @mikekeith52. Sorry to ask again; I am still getting familiar with scalecast. This is related to #57.
I got the following from the forecaster pipeline I ran:
```
best model = knn
best params = {'n_neighbors': 43}
optimal transformer = Transformer(
    transformers = [
        ('DetrendTransform', {'loess': True}),
        ('DiffTransform', 1),
        ('ScaleTransform',)
    ]
)
f = Forecaster(
    DateStartActuals=2016-01-10T00:00:00.000000000
    DateEndActuals=2021-01-10T00:00:00.000000000
    Freq=None
    N_actuals=260
    ForecastLength=4
    Xvars=['month_8', 'quarter_2', 'quarter_3', 'COVID19', 'dengue_lag_4', 'dengue_lag_5', 'dengue_lag_6', 'dengue_lag_7', 'dengue_lag_8', 'symptoms_of_dengue_lag_8']
    TestLength=4
    ValidationMetric=rmse
    ForecastsEvaluated=['mlr', 'lasso', 'ridge', 'elasticnet', 'xgboost', 'lightgbm', 'knn']
    CILevel=None
    CurrentEstimator=knn
    GridsFile=Grids
)
```
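The optimal transformer listed above chains three steps (detrending, first differencing, scaling), so forecasts made on the transformed series must be reverted in reverse order before they are on the scale of y. The pure-Python sketch below shows that round trip. It substitutes a least-squares linear trend for the LOESS detrending (`'loess': True`) and assumes `ScaleTransform` means standardization; both are simplifying assumptions, not scalecast's exact implementation, and the helper names are hypothetical.

```python
from statistics import mean, pstdev


def fit_trend(y):
    """Least-squares line through (t, y[t]); a stand-in for LOESS detrending."""
    t = list(range(len(y)))
    mt, my = mean(t), mean(y)
    slope = sum((a - mt) * (b - my) for a, b in zip(t, y)) / sum((a - mt) ** 2 for a in t)
    return slope, my - slope * mt


def transform(y):
    """Detrend, take first differences, then standardize; return the series and the state to revert it."""
    slope, icpt = fit_trend(y)
    detrended = [v - (icpt + slope * t) for t, v in enumerate(y)]
    diffed = [b - a for a, b in zip(detrended, detrended[1:])]
    mu, sd = mean(diffed), pstdev(diffed)
    scaled = [(v - mu) / sd for v in diffed]
    state = {"slope": slope, "icpt": icpt, "mu": mu, "sd": sd,
             "first": detrended[0], "last": detrended[-1]}
    return scaled, state


def revert(scaled, state, anchor_t, anchor_value):
    """Undo the steps in reverse order: unscale, cumulatively sum from a known
    detrended level at time anchor_t, then add the trend back."""
    levels, prev = [], anchor_value
    for v in scaled:
        prev += v * state["sd"] + state["mu"]  # unscale to a first difference, then accumulate
        levels.append(prev)
    return [lv + state["icpt"] + state["slope"] * (anchor_t + 1 + i)
            for i, lv in enumerate(levels)]


y = [3.0, 5.0, 4.0, 7.0, 8.0, 7.5, 10.0]
scaled, state = transform(y)

# Round trip on the training data: differencing drops the first point, so this recovers y[1:].
recovered = revert(scaled, state, anchor_t=0, anchor_value=state["first"])
print([round(v, 6) for v in recovered])  # matches y[1:]

# A hypothetical 2-step forecast produced on the transformed scale, reverted to the level of y.
fcst_scaled = [0.5, -0.2]
print(revert(fcst_scaled, state, anchor_t=len(y) - 1, anchor_value=state["last"]))
```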
Is there a way I can use these in sklearn (if not scalecast) to forecast new values of y based only on the values of the Xvars? From what I understand, Forecaster needs y.
I would greatly appreciate your assistance.
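On the sklearn side of the question: a fitted KNN regressor predicts purely from a feature row, by averaging the targets of the `k` nearest training rows. The hand-rolled sketch below uses only the standard library, so it does not depend on scikit-learn being installed; with scikit-learn the counterpart would be `KNeighborsRegressor(n_neighbors=43)` from `sklearn.neighbors`, whose default settings (uniform weights, Euclidean distance) match this sketch. The helper name and toy data are hypothetical, and predictions come out on whatever scale y had when the model was fit, so a transformed y still has to be reverted.

```python
import math


def knn_predict(X_train, y_train, X_new, k):
    """Predict each row of X_new as the mean y of its k nearest training rows (Euclidean distance)."""
    preds = []
    for row in X_new:
        dists = sorted(
            (math.dist(row, train_row), target)
            for train_row, target in zip(X_train, y_train)
        )
        nearest = [target for _, target in dists[:k]]
        preds.append(sum(nearest) / len(nearest))
    return preds


# Hypothetical history: each row holds Xvar values; y_train holds the matching targets.
X_train = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 0.0], [4.0, 1.0]]
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]

# Future rows: only the Xvars are needed, no actual y.
X_future = [[3.9, 1.0]]
print(knn_predict(X_train, y_train, X_future, k=2))  # [4.5]
```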