I noticed that in the following snippet the std of nn.Embedding is set to 0.02:
def _init_weights(self, module):
    if isinstance(module, nn.Linear):
        std = 0.02
        if hasattr(module, 'NANOGPT_SCALE_INIT'):
            # scale down the init of residual projections by 1/sqrt(2 * n_layer)
            std *= (2 * self.config.n_layer) ** -0.5
        torch.nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            torch.nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
The official implementation sets it to 0.01, as noted in the video. It only matters for the positional embeddings, because the weight-sharing scheme ties wte to lm_head, whose init is handled by the nn.Linear branch.
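If one wanted to match the official GPT-2 init (std 0.01 for positional embeddings, 0.02 for token embeddings), a minimal sketch would be to re-initialize wpe after the generic apply; this assumes the model keeps its embeddings under self.transformer.wte / self.transformer.wpe as in the video's code:

    # Sketch only: assumes embeddings live at self.transformer.wte / self.transformer.wpe.
    # Run the generic per-module init first, then override wpe with std=0.01.
    self.apply(self._init_weights)
    torch.nn.init.normal_(self.transformer.wpe.weight, mean=0.0, std=0.01)

Since wte's weight is shared with lm_head, its effective init already follows the 0.02 Linear branch either way, so only the wpe override changes anything.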