train_loss Becomes Infinite When Using num_feat_dynamic_real in DeepAREstimator #3246

Open
LiPingYen opened this issue Mar 7, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@LiPingYen
Description

When using DeepAREstimator with the num_feat_dynamic_real argument set, the train_loss becomes infinite and does not decrease as training progresses. However, when num_feat_dynamic_real is not used, the train_loss decreases normally.

I need to add dynamic features to model training, so resolving this issue is important. Any help would be greatly appreciated!

To Reproduce

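# Imports were not shown in the original snippet; these paths are assumed
# for GluonTS 0.16 with the PyTorch backend.
from gluonts.dataset.pandas import PandasDataset
from gluonts.torch import DeepAREstimator
from gluonts.torch.distributions import ImplicitQuantileNetworkOutput
from lightning.pytorch.callbacks import RichProgressBar

# filtered_dta2 (presumably a DataFrame) and dynamic_features (the covariate
# column names) are defined elsewhere and were not included in the report.

freq = 'D'
prediction_length = 14
context_length = 1 * prediction_length
num_layers = 6
hidden_size = 128
batch_size = 128
dropout_rate = 0.1
num_batches_per_epoch = 100
max_epochs = 300

# Hold out the last prediction_length rows for testing.
train = PandasDataset(filtered_dta2[:-prediction_length], target='stock_close_price', feat_dynamic_real=dynamic_features, freq=freq)
test = PandasDataset(filtered_dta2, target='stock_close_price', feat_dynamic_real=dynamic_features, freq=freq)

estimator = DeepAREstimator(
    freq=freq,
    prediction_length=prediction_length,
    context_length=context_length,
    num_layers=num_layers,
    hidden_size=hidden_size,
    batch_size=batch_size,
    num_batches_per_epoch=num_batches_per_epoch,
    num_feat_dynamic_real=train.num_feat_dynamic_real,
    dropout_rate=dropout_rate,
    distr_output=ImplicitQuantileNetworkOutput(),
    trainer_kwargs={
        'accelerator': 'mps',
        'devices': 'auto',
        'strategy': 'auto',
        'callbacks': [RichProgressBar()],
        'deterministic': True,
        'max_epochs': max_epochs,
        'logger': False,
    },
)
predictor = estimator.train(train)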

Error message or code output

Epoch 0, global step 100: 'train_loss' reached inf (best inf), saving model to '/Users/tayloryen/Desktop/github project/time-series-in-R/stock/deepar/checkpoints/epoch=0-step=100-v8.ckpt' as top 1
Epoch 1, global step 200: 'train_loss' was not in top 1
Epoch 2, global step 300: 'train_loss' was not in top 1

Environment

  • Operating system: macOS 15.3.1
  • Python version: 3.12.8
  • GluonTS version: 0.16.0
  • PyTorch version: 2.6.0
@LiPingYen LiPingYen added the bug Something isn't working label Mar 7, 2025
@kashif
Contributor

kashif commented Mar 15, 2025

The issue may be with your data. Sometimes there are NaNs in it, and that can cause problems like this. Only the target goes through NaN checking and imputation; all other covariates are simply concatenated to the model input as they are.

So check your covariates. Most probably there are NaNs in them.
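
One way to act on this suggestion is to count the non-finite values in each covariate column before building the PandasDataset. The following is a minimal sketch, not part of GluonTS: the helper names `count_non_finite` and `fill_covariates`, the toy frame, and its `volume` and `rate` columns are all hypothetical, and it assumes the data is a single pandas DataFrame with numeric covariate columns. The forward-fill/back-fill policy is only one illustrative choice; whether it suits a given covariate is a modelling decision.

```python
import numpy as np
import pandas as pd


def count_non_finite(df: pd.DataFrame, columns: list[str]) -> dict[str, int]:
    """Return, per column, how many values are NaN or +/-inf."""
    values = df[columns].to_numpy(dtype=float)
    bad = ~np.isfinite(values)
    return dict(zip(columns, bad.sum(axis=0).tolist()))


def fill_covariates(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy whose covariate columns contain only finite values.

    Illustrative policy (an assumption, not a GluonTS recommendation):
    treat +/-inf as missing, then forward-fill and back-fill.
    """
    out = df.copy()
    cleaned = out[columns].replace([np.inf, -np.inf], np.nan)
    out[columns] = cleaned.ffill().bfill()
    return out


# Hypothetical toy data standing in for the reporter's frame.
toy = pd.DataFrame(
    {
        "stock_close_price": [10.0, 11.0, 12.0, 13.0],
        "volume": [100.0, np.nan, 120.0, 130.0],
        "rate": [0.1, 0.2, np.inf, 0.4],
    }
)
covariates = ["volume", "rate"]

before = count_non_finite(toy, covariates)
after = count_non_finite(fill_covariates(toy, covariates), covariates)
print("non-finite before:", before)
print("non-finite after: ", after)
```

With the names from the report, the same check would be `count_non_finite(filtered_dta2, dynamic_features)`, assuming `dynamic_features` is a list of column names. Any non-zero count points to a covariate that needs cleaning before training.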
