This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Question: Is there a typo in the paper for the weighting of sub-datasets? #22
Comments
You're absolutely right, there is a typo in the paper - see this with the corrected equations. A sub-dataset is simply each data source. LOTSA is a collection of many different open-source time series datasets. In the paper's notation, LOTSA is the dataset and each component data source is a sub-dataset.
I get it. Thanks for your answer!
The weights in the YAML file have a different meaning from the sampling proportions in the paper. The short explanation is that each PyTorch Dataset has to be reweighted to achieve the sampling proportion presented in the paper.
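To illustrate why a per-dataset weight can differ from a target sampling proportion, here is a minimal sketch. It assumes (this is not confirmed in the thread) that a weight acts as a multiplier on a dataset's effective length and that samples are then drawn uniformly from the combined pool. The function name `length_multipliers` and the example numbers are hypothetical and are not taken from the project's notebook.

```python
def length_multipliers(sizes, targets):
    """Return one multiplier per dataset so that, after scaling each
    dataset's length, uniform sampling over the pooled data matches
    the target proportions.

    Assumption: multiplier w_k scales dataset k's effective length to
    w_k * n_k. Solving w_k * n_k / sum_i(w_i * n_i) = p_k gives
    w_k proportional to p_k / n_k.
    """
    total = sum(sizes)
    # Scale so that the pooled effective length equals the original total.
    return [p * total / n for p, n in zip(targets, sizes)]


def effective_proportions(sizes, multipliers):
    """Sampling proportion of each dataset after applying the multipliers."""
    effective = [w * n for w, n in zip(multipliers, sizes)]
    pooled = sum(effective)
    return [e / pooled for e in effective]


# Hypothetical example: two datasets of very different size that
# should each be sampled half of the time.
sizes = [1000, 100]
targets = [0.5, 0.5]
multipliers = length_multipliers(sizes, targets)
proportions = effective_proportions(sizes, multipliers)
```

Under this assumption the smaller dataset receives a multiplier above 1 and the larger one a multiplier below 1, which is why the weights themselves do not look like the proportions they produce.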
Yes, I found the calculation of the weights in your provided notebook. Thanks for your great work!
In the paper, my understanding is that, to deal with data imbalance, the contribution proportion of each sub-dataset is guaranteed not to exceed 0.001, even if it has numerous samples; if it has few samples, its contribution proportion corresponds to its actual share of the samples.
If so, I'm wondering if the formula should be: $\min\left(\frac{|D_k|}{\sum_i |D_i|}, \epsilon\right)$.
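The capped proportion described in the question can be sketched in a few lines. This is only an illustration of the formula as the question states it, with the cap set to the 0.001 mentioned above. The function names and the example sizes are hypothetical, and the final renormalization step is an assumption: capped values no longer sum to 1, so some normalization is needed before they can be used as sampling probabilities.

```python
def capped_proportions(sizes, eps=0.001):
    """Compute min(|D_k| / sum_i |D_i|, eps) for each sub-dataset."""
    total = sum(sizes)
    return [min(n / total, eps) for n in sizes]


def normalize(weights):
    """Rescale weights so they sum to 1 (assumed step, not from the thread)."""
    s = sum(weights)
    return [w / s for w in weights]


# Hypothetical sub-dataset sizes: one very large source and two small ones.
sizes = [1_000_000, 500, 100]
capped = capped_proportions(sizes)
probs = normalize(capped)
```

Here the large source is clipped to the cap, while the two small sources keep their raw shares of the total, which matches the behavior the question describes.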
Furthermore, I have questions about the partitioning of the sub-datasets - i.e., according to what criteria are the sub-datasets divided? Is it based on the domains and frequency?