
ValueError: matrix contains invalid numeric entries #540

Open
1653042420 opened this issue Oct 27, 2023 · 4 comments

Comments

@1653042420

Hi! Thanks for sharing your excellent work! When I trained the LiDAR branch, I only had one RTX 4080 and my batch size was 4. I encountered this issue when using the previously released code to train the LiDAR branch. Now I know that it was caused by an incorrect learning rate. I want to know: if I use the latest released code, do I still need to adjust the learning rate based on the total batch size?
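For reference, the usual way to handle this is the linear scaling rule: scale the learning rate by the ratio of your total batch size to the reference total batch size. Below is a minimal sketch; the reference values (8 GPUs × 4 samples per GPU, base lr 1e-4) are assumptions for illustration, not this repository's actual defaults.

```python
# A minimal sketch of the linear scaling rule (all reference numbers are
# assumptions for illustration, not this repository's actual defaults).

REFERENCE_TOTAL_BATCH_SIZE = 32   # e.g. 8 GPUs x 4 samples per GPU (assumed)
REFERENCE_LR = 1.0e-4             # base lr of the reference config (assumed)

def scaled_lr(num_gpus: int, samples_per_gpu: int) -> float:
    """Scale the learning rate linearly with the total batch size."""
    total_batch_size = num_gpus * samples_per_gpu
    return REFERENCE_LR * total_batch_size / REFERENCE_TOTAL_BATCH_SIZE

# One RTX 4080 with batch size 4 is 1/8 of the assumed reference total,
# so the lr would drop to 1/8 of the reference value.
print(scaled_lr(num_gpus=1, samples_per_gpu=4))  # 1.25e-05
```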

@971022jing

I have the same problem. Do you have any new progress?

@nanqiang-zhangzhaoxu

I have the same problem. Do you have any new progress?

@wyf0414

wyf0414 commented Apr 2, 2024

I have the same problem, too. Also, when I increase max_epoch, the corresponding lr needs to be smaller, so I have to adjust the lr again and again.

@gerardmartin2

Hi, I have also adjusted the learning rate, but in the 5th epoch training starts to slow down a lot. If you have made any modification to the lr schedule, can you share it? Since my batch_size is 3 (approx. 1/10 of the original), I have changed the lr (both the optimizer lr and min_lr_ratio) to 1/10 of the original. Before this change my training was stuck at epoch 2, and now it reaches epoch 5, but as said, it starts to go too slow.

Thanks in advance
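One thing worth checking: if the schedule is mmcv-style, min_lr_ratio is defined as a fraction of the base lr, so scaling both the optimizer lr and min_lr_ratio by 1/10 shrinks the end-of-schedule lr by the product of the two factors, which can make late epochs crawl. Below is a minimal sketch of how the two values could fit together; the optimizer type, base lr, warmup settings, and CosineAnnealing policy are all assumptions for illustration, not this repository's actual config.

```python
# Hypothetical mmcv-style schedule fragment; all concrete numbers are
# assumptions for illustration, not this repository's actual config.

reference_total_batch_size = 32            # assumed reference setup
my_total_batch_size = 3                    # 1 GPU x batch size 3
scale = my_total_batch_size / reference_total_batch_size

optimizer = dict(
    type='AdamW',                          # assumed optimizer type
    lr=1.0e-4 * scale,                     # linearly scaled base lr (assumed base value)
    weight_decay=0.01,
)

lr_config = dict(
    policy='CosineAnnealing',              # assumed policy that accepts min_lr_ratio
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    # min_lr_ratio is a fraction of the (already scaled) base lr, so it
    # normally does not need to be rescaled again when the base lr changes.
    min_lr_ratio=1.0e-2,
)
```

If training still stalls around epoch 5, it may help to log the actual per-iteration lr to confirm the schedule is not collapsing to near zero too early.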
