tertiarycourses/TextMiningR

Text Mining with R

These are the exercise files used for the Text Mining with R course.

The course outline can be found at:

https://www.tertiarycourses.com.sg/text-mining-with-r.html

https://www.tertiarycourses.com.my/text-mining-with-r-malaysia.html

Module 1: Introduction

  • What is text mining
  • Applications of text mining

Module 2: Basic Text Functions

  • Text manipulation functions
  • Working with strings
  • Working with gsub
  • Advanced methods
  • Convert to corpus
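
As a rough illustration of the string and `gsub` topics above, the following sketch cleans a sentence using base R only. The sample text and variable names are made up for this example and are not taken from the course files.

```r
# Hypothetical sample text, not taken from the course files
text <- "  Text Mining with R:   100% fun, 0% boredom!  "

# Trim leading/trailing whitespace and lower-case the string
clean <- tolower(trimws(text))

# gsub() replaces every match of a regular expression
clean <- gsub("[[:punct:]]", "", clean)   # drop punctuation
clean <- gsub("[[:digit:]]+", "", clean)  # drop digits
clean <- gsub("\\s+", " ", clean)         # collapse runs of whitespace
clean <- trimws(clean)

# Split the cleaned string into words on single spaces
words <- strsplit(clean, " ", fixed = TRUE)[[1]]
print(words)
```

The order of the `gsub` calls matters here: removing punctuation and digits leaves extra spaces behind, so whitespace is collapsed last.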

Module 3: Importing Data

  • Converting docx into corpus
  • Converting pdf into corpus
  • Converting html to corpus
  • Web scraping
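
One possible way to pull text out of HTML is sketched below, assuming the `rvest` package is installed; the course files may use a different package. The HTML is an inline, made-up string so the example does not depend on any live website.

```r
library(rvest)

# Hypothetical inline HTML; a real scrape would pass a URL to read_html()
html <- "<html><body><h1>Title</h1><p>First.</p><p>Second.</p></body></html>"
page <- read_html(html)

# Select every <p> element with a CSS selector and extract its text
paras <- html_text2(html_elements(page, "p"))
print(paras)
```

The resulting character vector holds one element per paragraph and can then be passed to whichever corpus constructor is in use.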

Module 4: Tidytext Package

  • Tidying text objects
  • Tidying document term matrix objects
  • Tidying document-feature matrix (dfm) objects
  • Tidying corpus objects
  • Mining literary works
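
The tidy text format stores one token per row. A minimal sketch, assuming the `dplyr` and `tidytext` packages are installed and using two made-up sentences in place of the course data:

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document data set
docs <- tibble(
  doc  = c(1, 2),
  text = c("Text mining turns text into data.",
           "Tidy data has one token per row.")
)

# unnest_tokens() splits the text column into one lower-cased word per row
tidy_docs <- docs %>%
  unnest_tokens(word, text)

# Count how often each word occurs across both documents
word_counts <- tidy_docs %>%
  count(word, sort = TRUE)
print(word_counts)
```

Because the result is an ordinary data frame, the usual `dplyr` verbs (filtering, joining, grouping) apply to it directly.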

Module 5: Word Frequencies & Relationships

  • Pre-processing text
  • Wordcloud
  • Frequency analysis
  • N-grams & bigrams
  • Bigrams for sentiment analysis
  • Visualizing bigram networks
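
To show what a bigram frequency count involves, here is a small base R sketch on a made-up sentence. The course files may instead rely on a package such as `tidytext`, whose `unnest_tokens()` accepts `token = "ngrams"`.

```r
# Hypothetical sample sentence
text  <- "the quick brown fox jumps over the quick brown dog"
words <- strsplit(text, " ", fixed = TRUE)[[1]]

# Pair each word with its successor to form bigrams
bigrams <- paste(head(words, -1), tail(words, -1))

# Tabulate the bigrams and sort by frequency, most common first
bigram_freq <- sort(table(bigrams), decreasing = TRUE)
print(head(bigram_freq, 3))
```

A text of n words yields n - 1 bigrams; the same shifting idea extends to longer n-grams by pasting more offset copies of the word vector.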

Module 6: Sentiment Analysis

  • Sentiment libraries
  • Analyzing positive & negative words
  • Comparing 3 sentiment libraries
  • Common positive & negative words
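
Lexicon-based sentiment analysis amounts to looking words up in a list of labelled terms. The sketch below uses a toy six-word lexicon invented for this example; real sentiment lexicons contain thousands of entries and may score words differently.

```r
# Toy lexicon, purely illustrative
lexicon <- data.frame(
  word      = c("good", "great", "happy", "bad", "terrible", "sad"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical review text
review <- "The plot was great and the cast was good, but the ending was terrible."
words  <- strsplit(gsub("[[:punct:]]", "", tolower(review)), "\\s+")[[1]]

# Look up each word; words missing from the lexicon give NA and are dropped
matched <- lexicon$sentiment[match(words, lexicon$word)]
matched <- matched[!is.na(matched)]

# Count positive and negative hits and compute a simple net score
counts    <- table(factor(matched, levels = c("positive", "negative")))
net_score <- counts[["positive"]] - counts[["negative"]]
print(counts)
print(net_score)
```

Swapping in a different lexicon changes only the lookup table, which is why results from several lexicons can be compared on the same text.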

Module 7: Topic Modelling

  • Latent Semantic Indexing (LSI)
  • Latent Dirichlet Allocation (LDA)
  • Word-topic probabilities
  • Document-topic probabilities
  • Chapter probabilities
  • Per-document classification
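
Latent Semantic Indexing can be sketched in base R as a truncated singular value decomposition of a term-document matrix. The five short documents below are made up for illustration, and keeping `k = 2` dimensions is an assumption chosen to match their two themes; LDA itself is not shown here.

```r
# Hypothetical five-document corpus with two themes (pets, finance)
docs <- c(
  "cat dog pet",
  "dog pet vet",
  "cat pet vet",
  "stock market trade",
  "market trade bank"
)

tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Term-document matrix: rows are terms, columns are documents
tdm <- sapply(tokens, function(tok) as.numeric(table(factor(tok, levels = vocab))))
rownames(tdm) <- vocab

# Truncated SVD: keep the k largest singular values as latent dimensions
k <- 2
s <- svd(tdm)
doc_coords <- s$v[, 1:k] %*% diag(s$d[1:k])  # one row per document

# Cosine similarity between two document vectors in the latent space
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

sim_same_theme  <- cosine(doc_coords[1, ], doc_coords[2, ])
sim_cross_theme <- cosine(doc_coords[1, ], doc_coords[4, ])
print(round(c(same = sim_same_theme, cross = sim_cross_theme), 3))
```

In this toy corpus the two themes share no vocabulary, so documents on the same theme end up pointing in the same latent direction while documents on different themes are orthogonal.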

Module 8: Document Similarity & Classifier

  • Text alignment & pairwise comparison
  • Minhashing and locality sensitive hashing
  • Extract keywords
  • Classify by location, language, topic
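
Minhashing estimates the Jaccard similarity between sets of shingles. The base R sketch below computes that similarity exactly for two made-up sentences, which is the quantity a minhash signature would approximate; the helper names `shingles` and `jaccard` are hypothetical.

```r
# Break a text into overlapping word n-grams ("shingles")
shingles <- function(text, n = 2) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  unique(vapply(starts,
                function(i) paste(words[i:(i + n - 1)], collapse = " "),
                character(1)))
}

# Jaccard similarity: shared shingles divided by all distinct shingles
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Hypothetical documents to compare
doc_a <- "the cat sat on the mat"
doc_b <- "the cat sat on the sofa"

sim <- jaccard(shingles(doc_a), shingles(doc_b))
print(sim)
```

Exact pairwise comparison like this scales poorly with corpus size, which is the motivation for hashing-based approximations on larger collections.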

Module 9: Working with Internet and Social Media (Optional)

  • Extracting data from Amazon
  • Extracting data from Twitter
  • Extracting YouTube comments
  • Extracting Facebook comments