Reddit Data #41

Open
KeremTurgutlu opened this issue Feb 18, 2023 · 3 comments

Comments

@KeremTurgutlu

Data preparation involves downloading Reddit comment and submission data from https://files.pushshift.io/reddit/, and it is written that the total data is around 700GB. However, the actual size of the data is around 2TB. Up to which YYYY-MM did you use Reddit data for training GODEL?

@KyriaAnnwyn

There is no longer any data at this link.

@KeremTurgutlu
Author

You can torrent it.

@KyriaAnnwyn

> You can torrent it.

Thank you, your reply is just super!
