These are the exercise files for the Text Mining with R course.
The course outline can be found at:
https://www.tertiarycourses.com.sg/text-mining-with-r.html
https://www.tertiarycourses.com.my/text-mining-with-r-malaysia.html
Module 1: Introduction
- What is text mining
- Applications of text mining
Module 2: Basic Text Functions
- Text manipulation functions
- Working with strings
- Working with gsub
- Advanced methods
- Convert to corpus
Module 3: Importing Data
- Converting docx into corpus
- Converting pdf into corpus
- Converting html to corpus
- Web scraping
Module 4: Tidytext Package
- Tidying text objects
- Tidying document-term matrix objects
- Tidying document-feature matrix objects
- Tidying corpus objects
- Mining literary works
Module 5: Word Frequencies & Relationships
- Pre-processing text
- Wordcloud
- Frequency analysis
- N-grams & bigrams
- Bigrams for sentiment analysis
- Visualizing bigrams network
Module 6: Sentiment Analysis
- Sentiment libraries
- Analyzing positive & negative words
- Comparing 3 sentiment libraries
- Common positive & negative words
Module 7: Topic Modelling
- Latent Semantic Indexing (LSI)
- Latent Dirichlet Allocation (LDA)
- Word-topic probabilities
- Document-topic probabilities
- Chapter probabilities
- Per document classification
Module 8: Document Similarity & Classifier
- Text alignment & pairwise comparison
- Minhashing and locality-sensitive hashing
- Extract keywords
- Classify by location, language, topic
Module 9: Working with Internet and Social Media (Optional)
- Extracting data from Amazon
- Extracting data from Twitter
- Extracting YouTube comments
- Extracting Facebook comments