Training with new Random Seed does not shuffle data #32

Open

keeganq opened this issue Mar 22, 2022 · 0 comments
keeganq commented Mar 22, 2022

I've been adapting the example scripts to my own training task, and I've noticed that the scripts do not handle different random seeds as expected. I've found this problem in two places, but there might be more:

sampler=DistributedSampler(
    train_dataset,
    num_replicas=dist.get_world_size(),
    rank=dist.get_rank(),
    shuffle=True,
),

DistributedSampler(train_dataset, shuffle=True) # type: ignore

The problem is that DistributedSampler (as of PyTorch 1.9.0) only produces a different shuffle when its "seed" kwarg is changed; passing shuffle=True alone leaves it at the default seed=0. I believe the correct way to support training with different random seeds is to pass seed=_DOWNC.RANDOM_SEED when DistributedSampler is initialized in these two places, as sketched below. As for reshuffling on subsequent epochs, DistributedSampler adds the seed to the epoch number, so nothing needs to change in the epoch-setting code for the sampler.
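For concreteness, here is a minimal sketch of the change I have in mind, reusing the names from the snippets above (_DOWNC.RANDOM_SEED and train_dataset are assumed to be in scope as in the example scripts, so this fragment is not runnable on its own):

import torch.distributed as dist
from torch.utils.data import DistributedSampler

# Proposed change: thread the configured seed through to the sampler so that
# different RANDOM_SEED values actually change the shuffle order. Without the
# seed kwarg, DistributedSampler falls back to its default of seed=0.
sampler = DistributedSampler(
    train_dataset,
    num_replicas=dist.get_world_size(),
    rank=dist.get_rank(),
    shuffle=True,
    seed=_DOWNC.RANDOM_SEED,
)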

https://github.com/pytorch/pytorch/blob/d69c22dd61a2f006dcfe1e3ea8468a3ecaf931aa/torch/utils/data/distributed.py#L100
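A quick single-process demonstration of the behaviour the linked source describes (world size 1, so no process group is needed): the shuffle order is a function of seed + epoch, and two samplers with different seeds yield different orderings while a sampler left at the default seed reproduces the same order on every run.

from torch.utils.data import DistributedSampler

# Toy dataset; DistributedSampler only needs len(dataset) to generate indices.
dataset = list(range(10))

# seed defaults to 0, so this ordering is identical across runs.
s0 = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True)

# An explicit seed changes the ordering.
s1 = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True, seed=42)

print(list(s0))  # same every run
print(list(s1))  # differs from s0's ordering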

Please let me know your thoughts, or if I may have missed something.
