[Adam/AdamW] Update adamw.py #216

Open · wants to merge 1 commit into base: master
4 changes: 2 additions & 2 deletions in plsc/optimizer/adamw.py

Also, it looks like `moment2_max` should be passed in as well (as `None`), for example:

In [45]: out = _legacy_C_ops.adamw(
    ...:     param,
    ...:     grad,
    ...:     lr,
    ...:     moment1,
    ...:     moment2,
    ...:     None,   # `moment2_max`; pass `None` when `amsgrad` is not needed
    ...:     beta1_pow,
    ...:     beta2_pow,
    ...:     None,   # `master_param`; pass a tensor here if actually needed
    ...:     param,
    ...:     moment1,
    ...:     moment2,
    ...:     None,   # `moment2_max`; pass `None` when `amsgrad` is not needed
    ...:     beta1_pow,
    ...:     beta2_pow,
    ...:     None,   # `master_param`; pass a tensor here if actually needed
    ...:     'epsilon',
    ...:     epsilon,
    ...:     'lazy_mode',
    ...:     False,
    ...:     'min_row_size_to_use_multithread',
    ...:     1000,
    ...:     'beta1',
    ...:     beta1,
    ...:     'beta2',
    ...:     beta2,
    ...:     "with_decay",
    ...:     True,
    ...:     'coeff',
    ...:     0.5,
    ...:     'multi_precision',
    ...:     False,
    ...:     'lr_ratio',
    ...:     1.0,
    ...:     'amsgrad',  # the `amsgrad` argument
    ...:     False,      # the `amsgrad` argument
    ...: )

In [46]: out
Out[46]: 
(Tensor(shape=[102, 105], dtype=float32, place=Place(gpu:0), stop_gradient=True,
        [[ 0.20898449,  0.89920276,  0.67242330, ...,  0.26126957,
           0.79362839,  0.82994431],
         [-0.41922054, -0.49964213, -0.72876191, ...,  0.64584875,
           0.38303095,  0.07835867],
         [ 0.82518733, -0.13006617, -0.18193051, ..., -0.83834726,
          -0.48943013,  0.28921935],
         ...,
         [-0.11833674, -0.87520474,  0.71153826, ...,  0.88105798,
          -0.84247899, -0.03978884],
         [ 0.03530697, -0.51926482, -0.60509771, ..., -0.93831873,
          -0.40703350,  0.06399230],
         [-0.96511489,  0.76393193,  0.27214301, ..., -0.11625432,
          -0.12905845, -0.89011657]]),
 Tensor(shape=[102, 105], dtype=float32, place=Place(gpu:0), stop_gradient=True,
        [[ 0.11505818, -0.12621984, -0.01848486, ...,  0.22509800,
          -0.09953045,  0.56122953],
         [ 0.05467438,  0.15807387, -0.00716698, ..., -0.16195901,
          -0.27719164,  0.73311120],
         [-0.21436734, -0.12942475, -0.19450223, ...,  0.26065761,
          -0.09446586,  0.61139995],
         ...,
         [ 0.01895352, -0.44893426, -0.35492349, ...,  0.78992486,
          -0.46006823,  0.70998996],
         [-0.83486271,  0.08806995, -0.31217605, ..., -0.04586679,
           0.63772619, -0.62238657],
         [ 0.40758044,  0.40442133,  0.29918492, ...,  0.60868609,
           0.73768240,  0.27699226]]),
 Tensor(shape=[102, 105], dtype=float32, place=Place(gpu:0), stop_gradient=True,
        [[0.04723934, 0.06505238, 0.23348683, ..., 0.10994066, 0.22805302,
          0.44461903],
         [0.03345066, 0.11195458, 0.18727647, ..., 0.22885345, 0.15290459,
          0.71300763],
         [0.08683055, 0.22121914, 0.06403468, ..., 0.06969104, 0.13063110,
          0.51002014],
         ...,
         [0.11580127, 0.34216970, 0.38666964, ..., 0.77789527, 0.38964579,
          0.50754905],
         [0.68814623, 0.03124339, 0.27777275, ..., 0.22106472, 0.46024293,
          0.40622491],
         [0.32853597, 0.28381625, 0.27526388, ..., 0.47823176, 0.61193144,
          0.35178134]]),
 Tensor(Not initialized),
 Tensor(shape=[], dtype=float32, place=Place(gpu:0), stop_gradient=True,
        0.01142096),
 Tensor(shape=[], dtype=float32, place=Place(gpu:0), stop_gradient=True,
        0.03978442),
 Tensor(Not initialized))

The corresponding `moment2_max_out` is `Tensor(Not initialized)`.
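
A minimal sketch of how the discarded slot could be inspected, assuming the 7-element output order shown in the session above (param_out, moment1_out, moment2_out, moment2_max_out, beta1_pow_out, beta2_pow_out, master_param_out); the variable names here are illustrative, not from the PR:

param_out, moment1_out, moment2_out, moment2_max_out, *rest = out

# With moment2_max=None and amsgrad=False, the op leaves the
# moment2_max output slot uninitialized, so it is safe to discard.
print(moment2_max_out)  # Tensor(Not initialized)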

@@ -117,7 +117,7 @@ def step(self):
sub_exp_avg_sq = paddle.gather(
exp_avg_sq, index, axis=axis)

- _, _, _, _, _, _ = _C_ops.adamw(
+ _, _, _, _, _, *_ = _C_ops.adamw(
sub_p, grad,
paddle.to_tensor(lr), sub_exp_avg, sub_exp_avg_sq,
beta1_pow, beta2_pow, master_param, sub_p, sub_exp_avg,
@@ -133,7 +133,7 @@ def step(self):
exp_avg_sq.scatter_(index, sub_exp_avg_sq)

else:
- _, _, _, _, _, _ = _C_ops.adamw(
+ _, _, _, _, _, *_ = _C_ops.adamw(
p, grad,
paddle.to_tensor(lr), exp_avg, exp_avg_sq, beta1_pow,
beta2_pow, master_param, p, exp_avg, exp_avg_sq,
…
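
The switch from a fixed 6-target unpacking to a starred target is what lets this call site work across op versions: once `moment2_max_out` is added as a seventh output, `_, _, _, _, _, _ = ...` would raise a ValueError. A minimal sketch of the pattern, using a hypothetical stand-in for the op:

# Hypothetical stand-in for an op whose return arity grew from 6 to 7.
def adamw_like_op():
    return tuple(range(7))

# Old pattern: breaks once a 7th output (e.g. moment2_max_out) is added.
# _, _, _, _, _, _ = adamw_like_op()  # ValueError: too many values to unpack

# Pattern from this PR: the starred target absorbs any trailing outputs,
# so the same call site works with both 6- and 7-output versions.
_, _, _, _, _, *_ = adamw_like_op()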