Prefetcher hangs indefinitely on shutdown(). The faulthandler stack traces indicate that the main thread is blocked on https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/util/prefetcher.py#L113 while the child thread is blocked on https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/util/prefetcher.py#L81, but I don't know why time.sleep could block on exit.
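For anyone trying to capture the same kind of stack traces, here is a minimal sketch of dumping every thread's stack with the standard-library faulthandler module. It does not use torchdata: `polling_worker`, `stop_event`, and `dump_all_thread_stacks` are hypothetical names, and the worker is only a stand-in for a thread that polls with time.sleep, not the actual Prefetcher code.

```python
import faulthandler
import tempfile
import threading
import time


def dump_all_thread_stacks() -> str:
    """Return the current stack of every thread as text, using faulthandler."""
    # faulthandler writes to a file descriptor, so use a real temporary file.
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


def polling_worker(stop_event: threading.Event) -> None:
    # Hypothetical stand-in for a background thread that polls with time.sleep.
    while not stop_event.is_set():
        time.sleep(0.01)


stop_event = threading.Event()
worker = threading.Thread(target=polling_worker, args=(stop_event,), daemon=True)
worker.start()
time.sleep(0.05)  # give the worker time to enter its loop

# Capture the stacks while the worker is still polling.
stacks = dump_all_thread_stacks()
print(stacks)

# Ask the worker to stop, then join with a timeout so the caller cannot hang.
stop_event.set()
worker.join(timeout=5)
worker_exited = not worker.is_alive()
```

When the hang only shows up at shutdown, calling `faulthandler.dump_traceback_later(timeout)` before the suspect call is another option: it dumps the stacks automatically if the call has not returned within `timeout` seconds.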
Repro:
    import random

    from torchdata.dataloader2 import (
        DataLoader2,
        DistributedReadingService,
        MultiProcessingReadingService,
        SequentialReadingService,
    )
    from torchdata.datapipes import functional_datapipe
    from torchdata.datapipes.iter import IterableWrapper, IterDataPipe


    @functional_datapipe("frame_slicer")
    class FrameSlicer(IterDataPipe):
        def __init__(self, source_datapipe) -> None:
            self.source_datapipe = source_datapipe

        def __iter__(self):
            # Expand each (video_id, seg_start, seg_end) entry into one item per frame
            for fields in self.source_datapipe:
                video_id, seg_start, seg_end = fields
                for i in range(int(seg_start), int(seg_end) + 1):
                    yield (video_id, i)


    def generate_entries():
        lines = []
        # start with a prime number to make sure we have uneven dataloaders
        random.seed(10)
        for i in range(37):
            frame_count = random.randint(5, 10)
            lines.append([f'video-{i}', 10, 10 + frame_count])
        return lines


    def build_one_datapipe():
        entries = generate_entries()
        total_frames = sum([x[2] - x[1] + 1 for x in entries])
        dp = IterableWrapper(entries)
        dp = dp.shuffle()
        dp = dp.sharding_filter()
        dp = dp.frame_slicer()
        return dp, total_frames


    def build_dataloader2():
        dp, total_frames = build_one_datapipe()
        mp_rs = MultiProcessingReadingService(num_workers=2)
        dist_rs = DistributedReadingService()
        rs = SequentialReadingService(dist_rs, mp_rs)
        dl = DataLoader2(dp, reading_service=rs)
        dl.seed(2)
        counter = 0
        video_ids = set()
        for data in dl:
            video_ids.add(data[0])
            counter += 1
        dl.shutdown()  # hang here
Versions

PyTorch version: 2.0.0a0+gite9ebda2
Is debug build: False
CUDA used to build PyTorch: 12.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 12.0.1 (https://github.com/conda-forge/clangdev-feedstock d44358f44aef33e9fa7c5f93e2481ee8f1a04ab6)
CMake version: version 3.19.1
Libc version: glibc-2.31

Python version: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-64-generic-x86_64-with-glibc2.10
Is CUDA available: False
CUDA runtime version: 12.0.140
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] mypy-protobuf==3.3.0
[pip3] numpy==1.23.5
[pip3] pytorch3d==0.6.2
[pip3] torch==2.0.1+1684801906.cuda120.cudnn891.nccl218.ap
[pip3] torch-mlir==1684442443
[pip3] torch-scatter==2.1.0
[pip3] torch-tb-profiler==0.4.1
[pip3] torchdata==0.7.0.dev20230601
[pip3] torchfile==0.1.0
[pip3] torchvision==0.15.1a0+42759b1
[conda] magma-cuda121 2.6.1 1 pytorch
[conda] mkl 2020.4 h726a3e6_304 conda-forge
[conda] mkl-include 2023.1.0 h84fe81f_48680 conda-forge
[conda] numpy 1.23.5 py38h7042d01_0 conda-forge
[conda] pytorch3d 0.6.2 pypi_0 pypi
[conda] torch 2.0.1+1684801906.cuda120.cudnn891.nccl218.ap pypi_0 pypi
[conda] torch-mlir 1684442443 pypi_0 pypi
[conda] torch-scatter 2.1.0 pypi_0 pypi
[conda] torch-tb-profiler 0.4.1 pypi_0 pypi
[conda] torchfile 0.1.0 pypi_0 pypi
[conda] torchvision 0.15.1a0+42759b1 pypi_0 pypi
I've run into the same problem. Is there any fix or workaround?