problems with setting up dataloader #1061

Open
lazyman001 opened this issue Aug 27, 2024 · 2 comments

Comments

@lazyman001

I set my dataloader like this :

 NUM_WORKERS = os.cpu_count()

 train_dataloader= DataLoader(
     dataset=train_data_simple,
     batch_size=BATCH_SIZE,
     shuffle=True,
     num_workers=1
 ) 

and an error is reported whenever num_workers is set to any value other than zero. The error is shown in the picture.
[image: screenshot of the error]

@mrdbourke
Owner

Hey @lazyman001 ,

Where are you getting this issue?

What's the code you're trying to run?

Have you tried putting all your code into a main() function? And then calling if __name__ == "__main__": main()?

For example:

This error occurs because you're using multiple worker processes in PyTorch without protecting the code that starts them. Specifically, the script's entry point needs to be guarded by an if __name__ == '__main__': clause. This is required when using the multiprocessing module on platforms that don't use the fork system call to start new processes (such as Windows).
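If you're not sure which case applies to your machine, here is a small sketch that asks Python's multiprocessing module for its start method. It uses only the standard library; describe_start_method is a made-up helper name for illustration, not part of PyTorch.

```python
import multiprocessing as mp


def describe_start_method() -> str:
    """Report how this Python interpreter creates child processes."""
    # get_start_method() returns "fork", "spawn", or "forkserver".
    # Note: calling it also fixes the default method for this process.
    method = mp.get_start_method()
    if method == "fork":
        # Children start as a copy of the parent, so the main script is not re-run.
        return f"{method}: guard not strictly required, but still good practice"
    # With spawn/forkserver the child imports the main script again, so any
    # unguarded top-level code (including DataLoader iteration) runs again.
    return f"{method}: the if __name__ == '__main__': guard is required"


if __name__ == "__main__":
    print(describe_start_method())
```

On Windows this should report spawn, which matches the situation described above.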

To fix this, make sure your script looks something like this:

import torch
from torch.utils.data import DataLoader

# Your other imports and code here

def main():
    # Your training or data loading code here
    # Example:
    # dataset = YourDataset()
    # dataloader = DataLoader(dataset, num_workers=4)

    pass

if __name__ == '__main__':
    main()
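
For something you can run as-is, here is a minimal sketch of the same pattern with the placeholders filled in. The TensorDataset with made-up numbers is only a stand-in for your own dataset, and build_dataloader is a hypothetical helper name, so adjust both to your code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def build_dataloader(num_workers: int = 0, batch_size: int = 4) -> DataLoader:
    # Stand-in data (an assumption for this sketch): 16 samples, 3 features each.
    features = torch.arange(48, dtype=torch.float32).reshape(16, 3)
    labels = torch.arange(16) % 2
    dataset = TensorDataset(features, labels)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
    )


def main() -> None:
    # Worker processes are started when the loader is iterated. That happens
    # here, inside main(), so it only runs in the parent process.
    dataloader = build_dataloader(num_workers=2)
    for X, y in dataloader:
        print(X.shape, y.shape)


if __name__ == "__main__":
    main()
```

Save this as a .py file and run it from a terminal. If it works there with num_workers=2, the same structure should work for your training script.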

Someone had a similar issue to this the other day, see: #1059

If you're still having trouble, please post the code you're trying to run and describe where you're running it.

@heisdenverr

This error occurs because you're trying to use multiple worker processes in PyTorch without guarding the script's entry point. Also, you created a global constant called NUM_WORKERS, but in the DataLoader call you hard-coded num_workers to 1, so the constant is never used.

To fix this, make sure your script looks something like this:

import os

import torch
from torch.utils.data import DataLoader

# Your other imports and code here


def main():
    # Your training or data loading code here
    # Example:
    # dataset = YourDataset()
    # NUM_WORKERS = os.cpu_count()
    # dataloader = DataLoader(dataset, num_workers=NUM_WORKERS)

    pass

if __name__ == '__main__':
    main()
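
One caveat with NUM_WORKERS = os.cpu_count(): it can return None, and using every core is not always the best choice. Below is a rough sketch of a helper that picks a safer value. choose_num_workers and the cap of 8 are my own assumptions for illustration, not anything PyTorch provides or recommends.

```python
import os


def choose_num_workers(requested=None, cap=8):
    """Pick a DataLoader worker count (hypothetical helper, not a PyTorch API)."""
    # os.cpu_count() may return None if the count cannot be determined.
    available = os.cpu_count() or 1
    if requested is None:
        requested = available
    # Clamp to the range [0, min(available, cap)].
    # 0 means the data is loaded in the main process with no workers.
    return max(0, min(requested, available, cap))


NUM_WORKERS = choose_num_workers()
```

You would then pass num_workers=NUM_WORKERS to the DataLoader inside main(), the same way as in the skeleton above.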
