
[AutoParallel] Fix dataloader in to_static mode. #64334

Merged
merged 11 commits into PaddlePaddle:develop on May 30, 2024

Conversation

GhostScreaming
Contributor

PR Category

Auto Parallel

PR Types

Bug fixes

Description

Pcard-73145
Fix dataloader in to_static mode. Getting data from the DataLoader iterator directly may affect the data-generation randomness of BatchSampler when `shuffle=True`, which may cause a difference in data feeding between dynamic and to_static mode.
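The randomness issue described above can be seen with a toy stand-in (plain Python `random`, not Paddle's actual BatchSampler; all names here are illustrative): a shuffling sampler consumes global RNG state each time it is iterated, so peeking one batch to derive input specs shifts every later shuffle relative to a run that never peeked.

```python
import random

class ToyShuffleSampler:
    """Toy stand-in for BatchSampler(shuffle=True): reshuffles indices
    on every iteration, consuming global RNG state."""

    def __init__(self, n, batch_size):
        self.n = n
        self.batch_size = batch_size

    def __iter__(self):
        idx = list(range(self.n))
        random.shuffle(idx)  # advances the global RNG
        for i in range(0, self.n, self.batch_size):
            yield idx[i:i + self.batch_size]

sampler = ToyShuffleSampler(8, 2)

random.seed(0)
baseline = list(sampler)           # batches dynamic mode would feed

random.seed(0)
peek = next(iter(sampler))         # "peek one batch" to build input specs
state_after_peek = random.getstate()

random.seed(0)
state_clean = random.getstate()

# The peek reproduces the first baseline batch, but it has already moved
# the RNG forward, so later shuffles no longer match the baseline run.
assert peek == baseline[0]
assert state_after_peek != state_clean
```

This is why the PR reads a sample from the dataset instead of pulling a batch through the iterator when preparing input specs.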


paddle-bot bot commented May 15, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Contributor

@zhiqiu zhiqiu left a comment

add unittest for it

@@ -274,7 +274,8 @@ def __init__(
     def _prepare_data_spec_from_dataloader(self, dataloader):
         inputs_spec = []
         labels_spec = []
-        data = next(iter(dataloader))
+        data = dataloader._get_input_spec()
Contributor

if hasattr(dataloader, '_get_input_spec'):
    data = dataloader._get_input_spec()
else:
    data = next(iter(dataloader))
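The fallback zhiqiu suggests could be exercised like this minimal sketch (toy loader classes with hypothetical names, not Paddle's ShardDataloader): prefer the spec-extraction hook when the loader provides one, otherwise fall back to pulling a batch.

```python
class PlainLoader:
    """Toy loader without the spec hook."""
    def __iter__(self):
        return iter([[1, 2], [3, 4]])

class SpecLoader(PlainLoader):
    """Toy loader exposing a _get_input_spec hook."""
    def _get_input_spec(self):
        return "spec-from-hook"  # stand-in for real spec extraction

def peek_data(loader):
    # Use the hook if present; otherwise pull one batch (which, as noted
    # above, consumes sampler randomness).
    if hasattr(loader, '_get_input_spec'):
        return loader._get_input_spec()
    return next(iter(loader))

assert peek_data(SpecLoader()) == "spec-from-hook"
assert peek_data(PlainLoader()) == [1, 2]
```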

Contributor Author

Only ShardDataloader calls _prepare_data_spec_from_dataloader.

Contributor

Will the else branch below run into the shuffle problem?

Contributor Author

_prepare_data_spec reads from the dataset directly. An IterableDataset does not support shuffle, so the problem does not arise there. For an ordinary Dataset, fetching item 0 directly does not affect randomness.

# of BatchSampler when `Shuffle=True`. It may cause difference of data feeding
# between dynamic and to_static mode.
def _get_input_spec(self):
    batch_data = self._dataloader.batch_sampler.dataset.__getitem__(0)
Contributor

what about IterableDataset?

Contributor Author

IterableDataset also has a __getitem__(self, index) method.

Contributor Author

This will be revised; IterableDataset still needs special handling after all.

collate_fn = self._dataloader.collate_fn
batch_data = collate_fn(batch_data)
if isinstance(batch_data, dict):
    batch_data = [batch_data]
Contributor

batch data is already a tensor?

Contributor Author

__getitem__(0) returns an np.ndarray.

Contributor

Why doesn't this branch need conversion to tensor?

Contributor Author

The order here was written wrong. collate_fn iterates over the batch, so its input batch_data must be a list; after collate_fn finishes, _get_batch requires Tensor input.
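The ordering described here (wrap the single sample in a list, run collate_fn over it, then convert for downstream consumption) can be sketched with numpy as a stand-in collate function; `default_collate` is an illustrative name, not Paddle's actual API.

```python
import numpy as np

def default_collate(batch):
    # Minimal stand-in for a DataLoader collate_fn: expects a *list* of
    # samples and stacks them along a new leading batch dimension.
    return np.stack(batch, axis=0)

sample = np.arange(4, dtype=np.float32)   # what dataset.__getitem__(0) returns
batch_data = [sample]                     # collate_fn iterates a list of samples
batch_data = default_collate(batch_data)  # shape (1, 4): batch of one sample
# Only after collation is the batch handed on (converted to Tensor before
# _get_batch in the real code).
assert batch_data.shape == (1, 4)
```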

Contributor

@zhiqiu zhiqiu left a comment

LGTM

@GhostScreaming GhostScreaming merged commit f406545 into PaddlePaddle:develop May 30, 2024
32 checks passed