ebertti/language-resource


Language Resource

Collection of stopwords, frequent words and other things.

To help build applications that use NLP (Natural Language Processing), such as:

  • Stemming
  • Text simplification
  • Text-to-speech
  • Text-proofing
  • Natural language search
  • Query expansion
  • Automated essay scoring
  • Truecasing

or search engines.

Languages

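| Language (ISO 639-1) | Name       | Stopwords | Frequent Words | Notes |
|----------------------|------------|-----------|----------------|-------|
| bg                   | Bulgarian  | Yes       | No             | UTF-8 |
| cz                   | Czech      | Yes       | No             | UTF-8 |
| de                   | German     | Yes       | Yes            |       |
| en                   | English    | Yes       | Yes            |       |
| es                   | Spanish    | Yes +     | Yes            |       |
| fi                   | Finnish    | Yes       | Yes            |       |
| fr                   | French     | Yes       | Yes            |       |
| hu                   | Hungarian  | Yes       | No             | UTF-8 |
| it                   | Italian    | Yes       | Yes            | UTF-8 |
| pl                   | Polish     | Yes       | No             | UTF-8 |
| pt                   | Portuguese | Yes +     | No             |       |
| ru                   | Russian    | Yes       | No             | UTF-8 |
| sv                   | Swedish    | Yes       | Yes            |       |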
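
As a rough illustration of how a stopword list like these might be used, here is a minimal Python sketch. It assumes a list is plain text with one word per line, which this README does not confirm. The names `load_stopwords` and `remove_stopwords` are hypothetical helpers and not part of this repository, and the sample words are an inline stand-in for a real file.

```python
from typing import Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a lowercase stopword set (assumption: one word per line)."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word:  # skip blank lines
            words.add(word)
    return words


def remove_stopwords(tokens: List[str], stopwords: Set[str]) -> List[str]:
    """Return the tokens whose lowercase form is not a stopword."""
    return [token for token in tokens if token.lower() not in stopwords]


# Inline sample standing in for the contents of a real stopword file.
sample_lines = ["the\n", "a\n", "of\n", "\n", "and\n"]
stopwords = load_stopwords(sample_lines)

tokens = "The history of the Roman Empire".split()
filtered = remove_stopwords(tokens, stopwords)
print(filtered)  # ['history', 'Roman', 'Empire']
```

To read a real list, pass an open file object instead of `sample_lines`. Opening it with `encoding="utf-8"` is a reasonable starting point for the lists marked UTF-8 in the table; the encoding of the other lists is not stated here.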

Reference

Almost everything was extracted from http://members.unine.ch/jacques.savoy/clef/

Contributing

Fork the repository, make your changes, and open a pull request.

Please also update this README file to reflect your changes!

Thanks for your help!
