Make IterToMap loading more lazy #1016

Open · SvenDS9 wants to merge 6 commits into main

Conversation

@SvenDS9 (Contributor) commented on Feb 15, 2023

Fixes #454

Changes

  • Only load from the source datapipe until the requested element has been loaded (see the sketch below)
  • Add a test for this behavior
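The core idea, as a minimal sketch (illustrative names, not the actual IterToMapConverter code): keep a partially filled dict plus a live iterator over the source, and pull items only until the requested key appears.

```python
class LazyIterToMap:
    def __init__(self, source):
        self._source = source   # iterable of (key, value) pairs
        self._map = {}          # partially materialized map
        self._itr = None        # live iterator, kept alive between lookups

    def __getitem__(self, key):
        if key in self._map:
            return self._map[key]
        if self._itr is None:
            self._itr = iter(self._source)
        # Pull from the source only until the requested key shows up.
        for k, v in self._itr:
            self._map[k] = v
            if k == key:
                return v
        self._itr = None        # source exhausted without finding the key
        raise KeyError(key)

dp = LazyIterToMap([("a", 1), ("b", 2), ("c", 3)])
print(dp["b"])  # loads "a" and "b" only; "c" stays unread
```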

@facebook-github-bot added the CLA Signed label on Feb 15, 2023
@SvenDS9 (Contributor, Author) commented on Feb 24, 2023

Some tests in torchvision fail. Apparently a stream isn't being closed, and my implementation of IterToMap's __getitem__() causes this.

I think the stream in question is created here:
https://github.com/pytorch/vision/blob/01ef0a68b6ec00452391251fc16c38e58b92bf07/test/builtin_dataset_mocks.py#L1356

This causes this test to fail:
https://github.com/pytorch/vision/blob/01ef0a68b6ec00452391251fc16c38e58b92bf07/test/test_prototype_datasets_builtin.py#L122-L140
While there is a small bug in this test (closing the stream inside the loop changes the dictionary's size during iteration, which causes subsequent tests to fail as well because the dict isn't fully emptied for the next test; a sketch of a fix is below), fixing it wouldn't solve the underlying issue.
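For reference, a sketch of a fix for that in-test bug (`open_streams` is a hypothetical name for the dict of tracked handles): iterate over a snapshot so closing and removing streams doesn't mutate the dict mid-iteration.

```python
# Snapshot the items first; deleting from the dict while iterating
# over it directly would raise RuntimeError.
for name, stream in list(open_streams.items()):
    stream.close()
    del open_streams[name]
```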

https://github.com/pytorch/vision/blob/01ef0a68b6ec00452391251fc16c38e58b92bf07/torchvision/prototype/datasets/_builtin/cub200.py#L194-L208

As I don't know how to fix it I would be happy if someone else could chime in.

@pmeier (Contributor) commented on Mar 9, 2023

@ejguan I guess you have the most context on the errors, given you fixed pytorch/vision#6997. We just merged pytorch/vision#7403 to make diagnosing this easier. The patch will be in tomorrow's nightly. Could you also have a look here?

@ejguan (Contributor) commented on Mar 9, 2023

> @ejguan I guess you have the most context on the errors, given you fixed pytorch/vision#6997. We just merged pytorch/vision#7403 to make diagnosing this easier. The patch will be in tomorrow's nightly. Could you also have a look here?

@pmeier Thanks, I think pytorch/vision#7403 does the right job. I will take a look at this PR to see why such an issue happens with it.

@ejguan (Contributor) left a review comment


I don't know the exact reason why this PR fails TorchVision's tests. I have a strong feeling it's because the iterator object is never properly cleaned up.

Review threads on torchdata/datapipes/iter/util/converter.py (resolved; one outdated)
@SvenDS9 (Contributor, Author) commented on Mar 9, 2023

I think so as well.

@pmeier Thank you for linking pytorch/vision#6997; I think I know what's happening now!
Forcefully depleting the dp (e.g. by adding a Prefetcher with a large enough buffer in cub200) solves the issues with the test, but it defeats the purpose of this PR, which is saving memory.

Probably not every value is retrieved from the map in the test, therefore the map is never fully loaded and the iterator of the previous dp never "finishes".

How would we address this if we want lazy loading?

@ejguan (Contributor) commented on Mar 9, 2023

> Forcefully depleting the dp (e.g. by adding a Prefetcher with a large enough buffer in cub200) solves the issues with the test, but it defeats the purpose of this PR, which is saving memory.

I think the problem is that the iterator which opens the file handles never reaches its finally clause to close them (e.g. this code pointer). Depleting works because it forces the iterator to exit and execute the finally block. And that's also why I think removing self._itr would help.
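For illustration, a minimal example of the mechanics described above (the file name is a placeholder): a generator that opens a resource only releases it via `finally` once it is exhausted or explicitly closed.

```python
def read_lines(path):
    f = open(path)    # resource acquired inside the generator
    try:
        yield from f
    finally:
        f.close()     # runs only on exhaustion or explicit close()

itr = read_lines("some_file.txt")
next(itr)    # one item pulled; the file handle is still open
# If `itr` is held alive (e.g. as self._itr) and never exhausted,
# `finally` never runs and the handle leaks.
itr.close()  # raises GeneratorExit inside the generator, running `finally`
```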

@pmeier (Contributor) commented on Mar 10, 2023

@SvenDS9 seems like the test now displays a proper error message: https://github.com/pytorch/data/actions/runs/4384715686/jobs/7676523545#step:9:2098. So the failure is still ongoing, but at least it is now clear what is happening.

@pmeier (Contributor) commented on Mar 10, 2023

Just as a heads-up: feel free to change anything under torchvision.prototype.datasets if it doesn't work well with torchdata. This is very much a work in progress and nothing is set in stone. So if there is a better way to solve the issue that unblocks this PR, send a PR to torchvision and ping me there.

@SvenDS9 (Contributor, Author) commented on Mar 13, 2023

> Probably not every value is retrieved from the map in the test, therefore the map is never fully loaded and the iterator of the previous dp never "finishes".

I am pretty sure that is what happens. That also explains why removing the reference to the iterator once it is depleted doesn't help: that code is never reached. As the iterator hasn't finished yet, the stream needs to remain open so that more elements can be retrieved from the dp. So in a way, this is expected behavior.

I think we should make load_map() public so that users can choose to load the entire map in one go. Currently (in the nightly build) this happens the first time an element is requested from the map.
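A hypothetical usage sketch, assuming the loading helper were exposed as `load_map()` (the name comes from this thread, not from an existing torchdata API):

```python
from torchdata.datapipes.iter import IterableWrapper

source_dp = IterableWrapper([("a", 1), ("b", 2)])
map_dp = source_dp.to_map_datapipe()
map_dp.load_map()    # hypothetical: eagerly drain the source so its
                     # iterator reaches `finally` and closes its streams
print(map_dp["a"])
```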

To fix the tests in torchvision we could either:

WDYT @ejguan @pmeier ?

@pmeier (Contributor) commented on Mar 13, 2023

Of the two options provided, I strongly favor 2., since 1. would only fix the test, not the behavior. I'll let @ejguan comment on whether the actual proposal is the way to go here.

That being said, we are currently not actively working on the datasets and thus I'm also ok with 1. to unblock. However, this means if we pick this up again in the future, we need a proper solution then.

@ejguan (Contributor) commented on Mar 13, 2023

So I guess we need to figure out a way to let users indicate when they are done with the MapDataPipe, and then delete/deplete the iterator of the prior DataPipe (it would be better if we can make this automatic).
Another note: adding __del__ to IterToMap won't help because it's only invoked when the garbage collector runs. So, at the time of TorchVision's test, there is a chance that self._itr has not been removed yet and the file handles are still open.

Here is my third proposal, which requires changes to PyTorch, TorchData, and TorchVision. In PyTorch core, add a base callback function to every MapDataPipe that runs at the end of an epoch. For cub200, instead of using Mapper, we can rely on MapKeyZipper to combine the pipes and use merge_fn to drop the value from split_dp. In TorchData, MapKeyZipper knows when the IterDataPipe ends and can then invoke the callback on the MapDataPipe. A rough sketch is below.
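A self-contained sketch of that flow (hypothetical names, not existing torchdata APIs):

```python
class SimpleMap:
    # Stand-in for a MapDataPipe with the proposed end-of-epoch callback.
    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def on_epoch_end(self):
        # Base callback: IterToMap would close its live iterator
        # (and thereby any open file handles) here.
        print("epoch ended, releasing resources")

def map_key_zipper(source, map_dp, key_fn, merge_fn):
    # Zip an IterDataPipe-like iterable with a map; fire the callback
    # once the iterable side is exhausted.
    try:
        for item in source:
            yield merge_fn(item, map_dp[key_fn(item)])
    finally:
        map_dp.on_epoch_end()

labels = SimpleMap({1: "cat", 2: "dog"})
for pair in map_key_zipper([1, 2], labels, lambda x: x, lambda x, v: (x, v)):
    print(pair)
```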

Any suggestions are welcome!
