
Architecture #1106

Open
daniel-deychakiwsky opened this issue Apr 21, 2024 · 0 comments
daniel-deychakiwsky commented Apr 21, 2024

Hey Meta.

I noticed that the Llama 1 paper states:

2.2 Architecture
Following recent work on large language models, our network is based on the transformer architecture (Vaswani et al., 2017). We leverage various improvements that were subsequently proposed, and used in different models such as PaLM. Here are the main difference with the original architecture, and where we were found the inspiration for this change (in bracket):

However, I don't see a listed "difference" in that paper indicating the model is decoder-only.

I also noticed that the Llama 2 paper states:

2.2 Training Details
We adopt most of the pretraining setting and model architecture from Llama 1. We use the standard transformer architecture (Vaswani et al., 2017), apply pre-normalization using RMSNorm (Zhang and Sennrich, 2019), use the SwiGLU activation function (Shazeer, 2020), and rotary positional embeddings (RoPE, Su et al. 2022).

These passages led me to believe Llama 1 and Llama 2 are encoder-decoder models based on the original 2017 transformer architecture. The code in this repo, however, reads as if the model is decoder-only, which is stated explicitly for the new Llama 3. Can you confirm what the Llama 1 and Llama 2 architectures are, and potentially document that in this repo?
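For context on what I mean by "decoder-only": here is a minimal, illustrative sketch (not the repo's actual code, which uses RMSNorm, RoPE, SwiGLU, etc.) of a decoder-only block. The key point is that it only has causal self-attention over the input sequence, with no encoder stack and no cross-attention, unlike the full encoder-decoder transformer in Vaswani et al., 2017.

```python
import torch
import torch.nn as nn

class DecoderOnlyBlock(nn.Module):
    """Illustrative only: a decoder-style block with causal self-attention
    and no cross-attention (hence no encoder). Hypothetical names; the
    repo's model.py differs in normalization, activations, and attention."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim)
        )
        # Pre-normalization; Llama actually uses RMSNorm rather than LayerNorm.
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: each position attends only to itself and earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x
```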
