style(nyz): fix flake8 code style (ci skip)
PaParaZz1 committed Jan 27, 2025
1 parent dae7673 commit 3292384
Showing 2 changed files with 9 additions and 8 deletions.
12 changes: 6 additions & 6 deletions ding/model/template/qvac.py
@@ -13,10 +13,10 @@
class ContinuousQVAC(nn.Module):
"""
Overview:
- The neural network and computation graph of algorithms related to Actor-Critic that have both Q-value and V-value critic, such as \
- IQL. This model now supports continuous and hybrid action space. The ContinuousQVAC is composed of \
- four parts: ``actor_encoder``, ``critic_encoder``, ``actor_head`` and ``critic_head``. Encoders are used to \
- extract the feature from various observation. Heads are used to predict corresponding Q-value and V-value or action logit. \
+ The neural network and computation graph of algorithms related to Actor-Critic that have both Q-value and \
+ V-value critic, such as IQL. This model now supports continuous and hybrid action space. The ContinuousQVAC is \
+ composed of four parts: ``actor_encoder``, ``critic_encoder``, ``actor_head`` and ``critic_head``. Encoders \
+ are used to extract the feature. Heads are used to predict corresponding value or action logit.
In high-dimensional observation space like 2D image, we often use a shared encoder for both ``actor_encoder`` \
and ``critic_encoder``. In low-dimensional observation space like 1D vector, we often use different encoders.
Interfaces:
@@ -34,7 +34,7 @@ def __init__(
actor_head_layer_num: int = 1,
critic_head_hidden_size: int = 64,
critic_head_layer_num: int = 1,
- activation: Optional[nn.Module] = nn.SiLU(), #nn.ReLU(),
+ activation: Optional[nn.Module] = nn.SiLU(),
norm_type: Optional[str] = None,
encoder_hidden_size_list: Optional[SequenceType] = None,
share_encoder: Optional[bool] = False,
@@ -319,7 +319,7 @@ def compute_critic(self, inputs: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
- logit (:obj:`torch.Tensor`): Discrete action logit, only in hybrid action_space.
- action_args (:obj:`torch.Tensor`): Continuous action arguments, only in hybrid action_space.
Returns:
- - outputs (:obj:`Dict[str, torch.Tensor]`): The output dict of QVAC's forward computation graph for critic, \
+ - outputs (:obj:`Dict[str, torch.Tensor]`): The output of QVAC's forward computation graph for critic, \
including ``q_value``.
ReturnKeys:
- q_value (:obj:`torch.Tensor`): Q value tensor with same size as batch size.
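The reflowed docstring above describes the four-part layout of ContinuousQVAC. As a rough illustration only (plain PyTorch, not the DI-engine implementation; all dimensions, layer choices, and variable names here are made up), the composition of the two encoders and the actor/critic heads looks roughly like this:

import torch
import torch.nn as nn

# Toy stand-ins for the four parts named in the docstring: actor_encoder,
# critic_encoder, actor_head and critic_head (here split into Q and V heads).
obs_dim, act_dim, hidden = 17, 6, 64
actor_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.SiLU())
critic_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.SiLU())
actor_head = nn.Linear(hidden, act_dim)             # predicts the continuous action
q_head = nn.Linear(hidden + act_dim, 1)             # Q(s, a) critic head
v_head = nn.Linear(hidden, 1)                       # V(s) critic head

obs = torch.randn(4, obs_dim)                       # batch of 1D vector observations
action = torch.tanh(actor_head(actor_encoder(obs)))                  # "compute_actor" path
feat = critic_encoder(obs)
q_value = q_head(torch.cat([feat, action], dim=-1)).squeeze(-1)      # "compute_critic": Q-value
v_value = v_head(feat).squeeze(-1)                                    # "compute_critic": V-value

For a 1D observation like this, the docstring suggests separate encoders; for image observations a single shared encoder would typically replace the two encoder modules above.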
5 changes: 3 additions & 2 deletions ding/policy/iql.py
@@ -25,7 +25,8 @@ def asymmetric_l2_loss(u, tau):
class IQLPolicy(Policy):
"""
Overview:
- Policy class of Implicit Q-Learning (IQL) algorithm for continuous control. Paper link: https://arxiv.org/abs/2110.06169.
+ Policy class of Implicit Q-Learning (IQL) algorithm for continuous control.
+ Paper link: https://arxiv.org/abs/2110.06169.
Config:
== ==================== ======== ============= ================================= =======================
@@ -243,7 +244,7 @@ def _init_learn(self) -> None:

self._tau = self._cfg.learn.tau
self._beta = self._cfg.learn.beta
- self._policy_start_training_counter = 10000 #300000
+ self._policy_start_training_counter = 10000 # 300000

def _forward_learn(self, data: List[Dict[str, Any]]) -> Dict[str, Any]:
"""
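The first hunk header above references asymmetric_l2_loss(u, tau), the expectile regression loss at the core of IQL. For context only, the standard form from the IQL paper (https://arxiv.org/abs/2110.06169) is sketched below; it is not copied from the repository, and the tau value and variable names are illustrative.

import torch

def asymmetric_l2_loss(u: torch.Tensor, tau: float) -> torch.Tensor:
    # Expectile loss: residuals with u < 0 are weighted by (1 - tau) and residuals
    # with u >= 0 by tau, so tau > 0.5 biases V toward an upper expectile of Q.
    return torch.mean(torch.abs(tau - (u < 0).float()) * u ** 2)

# Typical use in IQL-style value learning: u = target_q - predicted_v.
u = torch.randn(8)
loss = asymmetric_l2_loss(u, tau=0.7)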
