Where is the source text dataset for the n-grams of those 73 languages? I would like to see whether it differs from the UDHR data used in wooorm/franc#78, and whether it gives more accurate results.
It is in data/resources, which contains thousands of tweets scraped using the script provided in the bin folder.
You could feed the datasets from franc to our scripts and see what they output. In our final implementation we feed the scripts anonymised WhatsApp messages, because we wanted to detect SMS-style text, but tweets worked well and are what we provide in the library.
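If it helps with the comparison, here is a rough Python sketch of how two training corpora (for example tweets versus UDHR text) could be compared by their character trigram profiles. This is not the library's own script: the function names `trigram_profile` and `profile_overlap`, the `top_k` cutoff of 300, and the Jaccard overlap measure are all assumptions made for illustration.

```python
from collections import Counter


def trigram_profile(texts, top_k=300):
    """Return the top_k most frequent character trigrams in a list of texts.

    Hypothetical helper: the cutoff and the normalisation (lowercasing,
    collapsing whitespace, padding with one space) are illustrative choices.
    """
    counts = Counter()
    for text in texts:
        # Lowercase, collapse runs of whitespace, pad so word edges count.
        padded = " " + " ".join(text.lower().split()) + " "
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return [gram for gram, _ in counts.most_common(top_k)]


def profile_overlap(profile_a, profile_b):
    """Jaccard overlap between two profiles (1.0 means identical sets)."""
    a, b = set(profile_a), set(profile_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy samples only; replace with the real tweet and UDHR texts.
    tweets = ["omg this is so good lol", "cant wait for the weekend"]
    udhr = ["All human beings are born free and equal in dignity and rights."]
    overlap = profile_overlap(trigram_profile(tweets), trigram_profile(udhr))
    print(f"profile overlap: {overlap:.2f}")
```

A low overlap for the same language would suggest the two sources produce noticeably different profiles, which is what you would expect between informal short messages and formal legal text. It says nothing about accuracy by itself; for that you would still need to run both trained models against a labelled test set.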