other sources? #194
I'm also looking for a decent training set of casual conversations, in my case for a language-learning chatbot.
But this project seems to have only ~200k of logs. It's a start, but...
What other sources do you know of? I'm sharing what I've found so far, and I hope others can suggest where else to look.
Cornell's ConvoKit
provides an API to some really good datasets, such as the well-known movie dialogue corpus, plus a structured API for some subreddits
https://convokit.cornell.edu/
Facebook's ParlAI
has a standardized API to lots of datasets
https://parl.ai/about/
e.g. https://arxiv.org/pdf/1801.07243.pdf
Tatoeba
has a good sentence database but no conversation turns
https://tatoeba.org/eng/
I'm keeping archives of a few things I find. Here are a bunch of logs for teaching English conversation:
https://github.com/dcsan/corpus/blob/master/convo/esl-china/esl06.csv
some of which could be converted for use here.
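As a rough idea of what such a conversion might look like, here is a minimal sketch that turns a CSV log into (prompt, reply) pairs. It assumes the file has one utterance per row with `speaker` and `text` columns. Those column names, the `rows_to_turn_pairs` helper, and the sample data are all hypothetical. The real files may be laid out differently, and the output would still need to be written in whatever format this project expects.

```python
import csv
import io


def rows_to_turn_pairs(csv_text, speaker_col="speaker", text_col="text"):
    """Pair each utterance with the next one whenever the speaker changes.

    The column names are assumptions; adjust them to the actual CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep only rows that actually contain some text.
    rows = [
        ((row.get(speaker_col) or "").strip(), (row.get(text_col) or "").strip())
        for row in reader
        if (row.get(text_col) or "").strip()
    ]
    pairs = []
    for (prev_speaker, prev_text), (speaker, text) in zip(rows, rows[1:]):
        # Consecutive lines from the same speaker are not a prompt/reply pair.
        if speaker != prev_speaker:
            pairs.append((prev_text, text))
    return pairs


# Made-up sample data, only to show the expected input shape.
sample = """speaker,text
teacher,How was your weekend?
student,It was good. I went hiking.
teacher,Where did you go?
"""

pairs = rows_to_turn_pairs(sample)
for prompt, reply in pairs:
    print(f"{prompt!r} -> {reply!r}")
```

Each utterance is used once as a reply and once as the next prompt, which keeps short exchanges usable as training pairs.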
What other sources have people found for conversations?