
Question about GODEL_XL (GPT-J) model size #19

Open
wooters opened this issue Jul 20, 2022 · 3 comments

Comments

wooters (Contributor) commented Jul 20, 2022

First of all, thank you for making this work public!

I'm curious about the model size shown in the README for the released GODEL_XL model (based on GPT-J). The table in the README lists the model size as "2.7B", but my understanding is that GPT-J has 6B parameters.

Is the number of parameters for GODEL XL listed in the README correct?

@meatflavourdev

The parameter count isn't a very reliable indicator of a model's capability. With newer models that exploit sparsely connected networks and model distillation, one can drastically reduce the number of parameters while improving the speed and performance stats of the model (i.e., faster and better, with fewer parameters).
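For what it's worth, here's a minimal sketch of what distillation optimizes: the smaller student is trained to match the larger teacher's softened output distribution. This is generic PyTorch (the standard Hinton-style objective), not anything from the GODEL codebase:

```python
# A generic knowledge-distillation loss (illustrative, not GODEL's code):
# the student matches the teacher's softened output distribution, which is
# how a much smaller model can recover most of a large model's behavior.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Soften both distributions with a temperature, then penalize the KL
    # divergence between them; the T^2 factor keeps the gradient scale stable.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```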


meatflavourdev commented Jul 26, 2022

Also, models that exploit knowledge retrieval can run circles around large language models.
https://analyticsindiamag.com/deepminds-language-model-retro-proves-bigger-is-not-always-better/
So that's a 7B-parameter model that can outperform GPT-3. (GODEL exploits knowledge retrieval, FYI.)
It's the difference between implicit vs. explicit representations of learned data.
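To illustrate that implicit-vs-explicit distinction, here's a toy sketch of retrieval-grounded prompting. The two-passage store and the word-overlap retriever are made-up stand-ins for illustration, not GODEL's or RETRO's actual dense retrievers:

```python
# A toy illustration of the retrieval idea: knowledge lives in an external
# store (explicit) rather than only in the weights (implicit), and the LM
# is conditioned on whatever gets retrieved.
knowledge_store = [
    "GPT-J is a 6B-parameter autoregressive language model.",
    "RETRO augments a small LM with passages retrieved from a huge text database.",
]

def retrieve(query: str) -> str:
    # Pick the passage sharing the most words with the query (toy retriever).
    q_words = set(query.lower().split())
    return max(knowledge_store, key=lambda doc: len(q_words & set(doc.lower().split())))

query = "how many parameters does gpt-j have?"
prompt = f"Context: {retrieve(query)}\nQuestion: {query}\nAnswer:"
print(prompt)  # a retrieval-augmented LM would generate from this grounded prompt
```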

wooters (Contributor, Author) commented Jul 26, 2022

@meatflavourdev thanks for your responses. While I don't dispute anything you said, it doesn't address my question. To be clear, here is my issue:

  • GPT-J has 6B parameters
  • The paper says that the released GODEL_XL model was initialized from GPT-J, and so it should have 6B parameters (just as the released GODEL_B and GODEL_L models match the sizes of the T5 models from which they were initialized)
  • The table in the README says that the released GODEL_XL model has 2.7B parameters
  • The paper doesn't say anything about reducing the number of parameters for the released GODEL_XL/GPT-J model

My guess is that the number of parameters listed in the table in the README of this repo for the GODEL_XL model is a typo and it should say "6B" instead of "2.7B". This is a relatively minor point, but I was hoping that one of the authors could confirm just to be sure.
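For reference, the parameter count of a released checkpoint is easy to verify directly. Here's a minimal sketch using the Hugging Face transformers library with the public EleutherAI/gpt-j-6B checkpoint; the exact identifier or local path for the released GODEL_XL weights would have to be substituted, as it isn't confirmed here:

```python
# Count a checkpoint's parameters to check the README's figure.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")  # GPT-J prints roughly 6.05B
```

If the released GODEL_XL weights load the same way, the printed figure should settle whether the README's "2.7B" is a typo.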
