Architecture #1106
Hey Meta.
I noticed the llama one paper states that the network is based on the transformer architecture (Vaswani et al., 2017) and then lists the main differences from that original architecture. Except I don't see a "difference" in that list indicating the model is decoder-only.
Similarly, the llama two paper states that it uses the standard transformer architecture (Vaswani et al., 2017).
These publications led me to believe that llama one and two are encoder-decoder models based on the original 2017 transformer architecture. The code in this repo, however, reads as if the model is decoder-only, which is stated explicitly for the new llama three. Can you confirm what the llama one and two architectures actually are, and perhaps document that in this repo?
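For context, here is a minimal sketch (my own illustration, not code from this repo) of the distinction I'm asking about: a decoder-only transformer applies causal self-attention over its own token sequence in every block, with no encoder stack and no cross-attention layer, which is the pattern I see in llama/model.py.

```python
# Minimal sketch of decoder-only causal self-attention (illustration only,
# not the repo's actual code; projection weights are omitted for brevity).
import torch
import torch.nn.functional as F

def causal_self_attention(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, seqlen, dim)
    seqlen = x.size(1)
    scores = x @ x.transpose(-2, -1) / x.size(-1) ** 0.5
    # Causal mask: position i may only attend to positions <= i.
    # llama/model.py builds this kind of mask with torch.full + torch.triu.
    mask = torch.triu(torch.full((seqlen, seqlen), float("-inf")), diagonal=1)
    return F.softmax(scores + mask, dim=-1) @ x

# An encoder-decoder model (as in Vaswani et al., 2017) would additionally
# run cross_attention(decoder_x, encoder_out) in each decoder block; I don't
# see anything like that in the Transformer/TransformerBlock classes here.
x = torch.randn(1, 4, 8)
print(causal_self_attention(x).shape)  # torch.Size([1, 4, 8])
```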