Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix the double-jump for out-of-bound check in DataLoaderLite #23

Closed
wants to merge 1 commit into from
Closed

Conversation

fatemi
Copy link

@fatemi fatemi commented Jun 13, 2024

self.current_position has already been moved at L246 and is ready for the next batch. The condition at L248 looks at yet another move, which is not correct [this way, it basically dismisses an additional block_size at the end of file]. It should simply look at currect_position + 1 to account only for the additional target token.

`self.current_position` has already been moved at L246 and is ready for the next batch. The condition at L248 looks at yet another move, which is not correct [this way, it basically dismisses an additional block_size at the end of file]. It should simply look at currect_position + 1 to account for additional target token.
@lukasugar
Copy link

I think we should check not for +1, but for B * T +1. Because we want to make sure that the next batch can be fetched.
We moved the self.current_position to the starting position of the new batch, so we need to check if that new batch can be fetched.

I've made a PR for that change: #42

@fatemi
Copy link
Author

fatemi commented Jun 28, 2024

Due to line 246, all the processes can fetch their next batch, no need for another B * T.

@lukasugar
Copy link

lukasugar commented Jun 28, 2024

I think that's not the case. Imagine this scenario:

  • there's 94 tokens in the shard, so len(self.tokens) equals 94
  • num_processes = 2
  • B=1
  • T=5

The first process will read tokens 0:6, 10:16...
After 8th step, line 246 will be self.current_position += B * T * self.num_processes which is 80 + 1*5*2 = 90.
In line 248, you'd check that 90 + 1 > 94 => False, so you'd try to fetch tokens[90:96] which is out of bounds.

Do you agree?

@fatemi fatemi closed this Jul 5, 2024
@fatemi fatemi deleted the patch-1 branch July 5, 2024 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants