The corpus of texts on which the analyses of the GOBBYKID Project are based

gobbykid/corpus

Corpus (Pre-Balancing)

The corpus of texts on which the analyses of the GOBBYKID Project are based.

The main corpus is subdivided into two corpora: one containing the texts written by female authors and one containing the texts written by male authors. Each file name consists of the date of first publication of the work, the surname of the author, and the title of the book, in that order.
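As a minimal sketch of how the naming convention above could be used programmatically, the snippet below splits a file name into its three parts. The README does not state which separator or extension the files use, so the `<year>_<surname>_<title>.txt` layout, the `TextInfo` type, and the `parse_file_name` helper are all assumptions made for illustration; adjust the pattern to match the actual file names.

```python
import re
from typing import NamedTuple, Optional


class TextInfo(NamedTuple):
    """Hypothetical container for the three parts of a corpus file name."""
    year: int
    surname: str
    title: str


# ASSUMPTION: file names look like "<year>_<surname>_<title>.txt".
# The underscore separator and the ".txt" extension are guesses,
# not something the README confirms.
_FILE_NAME_PATTERN = re.compile(
    r"^(?P<year>\d{4})_(?P<surname>[^_]+)_(?P<title>.+)\.txt$"
)


def parse_file_name(file_name: str) -> Optional[TextInfo]:
    """Return the year, surname, and title, or None if the name does not match."""
    match = _FILE_NAME_PATTERN.match(file_name)
    if match is None:
        return None
    return TextInfo(
        year=int(match["year"]),
        surname=match["surname"],
        # Underscores inside the title are turned back into spaces.
        title=match["title"].replace("_", " "),
    )
```

Returning `None` for names that do not match, rather than raising, makes it easy to skip unrelated files (such as the CSV) when walking a folder.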

All texts come from Project Gutenberg and are encoded in UTF-8. The accompanying CSV file contains metadata about each text in the corpus.
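A short sketch of reading the corpus with the encoding stated above, using only the Python standard library. The helper names `load_metadata` and `read_text` are hypothetical, and no CSV column names are assumed: each row is returned as a dictionary keyed by whatever header the CSV file actually has.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_metadata(csv_path: str) -> List[Dict[str, str]]:
    """Read the metadata CSV into a list of row dictionaries.

    ASSUMPTION: the first row of the CSV is a header row.
    """
    # newline="" is what the csv module recommends for files it reads.
    with open(csv_path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def read_text(text_path: str) -> str:
    """Read one corpus text, decoding it explicitly as UTF-8."""
    return Path(text_path).read_text(encoding="utf-8")
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform default, which is not UTF-8 on every system.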
