renatotn7 edited this page Mar 23, 2016 · 5 revisions

This project has several subprojects for language processing.

Statistics of unknown words in captions

This script computes statistics on the unknown words found in captions and links them with the WordNet dictionary. It produces output in CSV format.

Script

The script is in the wordsFromCaptions directory:

  • wordsStatisticsFromCaptions.py

Input:

The following files must exist in the same directory as the script:

  • known.csv (file with previously known words)
  • captions.srt (file with the captions)
  • captionsForStatistics.txt (file with other captions that will serve as the basis for the statistics)
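
To illustrate how these inputs could fit together, here is a minimal sketch of pulling words out of SRT caption text and dropping the ones that are already known. It is not the code of wordsStatisticsFromCaptions.py: the function names `caption_words` and `unknown_words` are hypothetical, the sample captions are made up, and the known words are given as a plain Python set because the layout of known.csv is not described here.

```python
import re

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def caption_words(srt_text):
    """Return lowercase words from SRT text, skipping cue numbers and timing lines."""
    words = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMING.match(line):
            continue
        words.extend(re.findall(r"[a-z']+", line.lower()))
    return words


def unknown_words(words, known):
    """Keep only the words that are not in the known set (hypothetical helper)."""
    return [w for w in words if w not in known]


# Made-up sample standing in for captions.srt.
sample = """1
00:00:01,000 --> 00:00:04,000
The cat sat on the mat.

2
00:00:05,000 --> 00:00:07,500
The cat purred.
"""

# Assumption: the known words are already loaded into a set.
known = {"the", "on"}
print(unknown_words(caption_words(sample), known))
```

The simple `[a-z']+` pattern only covers unaccented English words; captions in other languages would need a different tokenizer.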

Output:

Number of word occurrences ; word ; WordNet dictionary
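
As a rough sketch of this output layout, the snippet below counts words and writes one semicolon-separated line per word. It is an illustration under assumptions, not the script's actual code: `write_statistics` is a hypothetical name, and the `definitions` dictionary is a stand-in for whatever the script retrieves from WordNet.

```python
import csv
import io
from collections import Counter


def write_statistics(words, definitions, out):
    """Write 'count;word;dictionary text' lines, most frequent word first."""
    counts = Counter(words)
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for word, n in counts.most_common():
        # Assumption: words without a dictionary entry get an empty third field.
        writer.writerow([n, word, definitions.get(word, "")])


# Made-up data; a real run would write to a file instead of a string buffer.
buf = io.StringIO()
write_statistics(["cat", "sat", "cat"], {"cat": "feline mammal"}, buf)
print(buf.getvalue())
```

Using the csv module rather than manual string joining means a dictionary text that itself contains a semicolon is quoted automatically.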
