forked from salesforce/densecap
Cannot re-initialize CUDA in forked subprocess #4
@moose-in-australia you may want to refer to this issue: salesforce#11
I am trying to train the end-to-end masked transformer on the ActivityNet dataset. I am running this on an AWS EC2 p2.xlarge instance, which has a single GPU, and I call the training script as follows:
CUDA_VISIBLE_DEVICES=0 python scripts/train.py --dist_url ./ss_model --cfgs_file cfgs/anet.yml --checkpoint_path ./checkpoint/ss_model --batch_size 14 --world_size 1 --cuda --sent_weight 0.25 --mask_weight 1.0 --gated_mask | tee log/ss_model-0
Unfortunately, training fails with the multiprocessing error in the title ("Cannot re-initialize CUDA in forked subprocess"), and so far I have not been able to debug it. When I switch to the 'spawn' start method, as the error message suggests, I get further errors. I would appreciate any help figuring out what I'm doing wrong.
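For context: this error generally means CUDA was initialized in the parent process before worker processes were created with the `fork` start method (the default on Linux), and a forked child cannot re-initialize CUDA. The `spawn` start method launches fresh interpreters instead, but it has its own requirements. The entry point must be guarded by `if __name__ == "__main__":`, and anything sent to a worker must be picklable (module-level functions rather than lambdas or nested functions). Violating either requirement is a common source of new errors after switching. The minimal sketch below uses only the standard-library `multiprocessing` module to show that pattern. `square` and `run_workers` are hypothetical names, not part of the densecap code. In a PyTorch script, the same idea would apply through `torch.multiprocessing.set_start_method("spawn")` or a `DataLoader`'s `multiprocessing_context` argument.

```python
import multiprocessing as mp


def square(x):
    # Module-level function: picklable by reference, so spawned workers
    # can find it when they re-import this module.
    return x * x


def run_workers(values, processes=2):
    # Use an explicit "spawn" context instead of changing the global default;
    # each worker starts a fresh interpreter rather than a fork of the parent.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Under "spawn", child processes re-import this module, so the entry point
    # must be guarded or each child would try to start its own pool.
    print(run_workers([1, 2, 3, 4]))
```

Running the script prints `[1, 4, 9, 16]`. Using `get_context("spawn")` keeps the change local to one pool, whereas `set_start_method("spawn")` changes the default for the whole program and may only be called once.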