This research report was written for the PB312: Research Apprenticeship module during my BSc.
I applied NLP techniques, including topic modelling (latent Dirichlet allocation) and supplementary co-occurrence network analysis, to examine the latent structure of dialogue norms across 11,000 subreddits, with data extracted using the Reddit API. Models were refined using a range of hyperparameter optimisation techniques, including k-fold cross-validation run with parallel computing.
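To make the co-occurrence step concrete, here is a minimal sketch of how term pairs might be counted to form network edges. It is written in Python purely for illustration and is not the repository's R script; the function name `cooccurrence_edges`, the `min_count` threshold, and the toy rules are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents, min_count=1):
    """Count how often each unordered pair of terms shares a document.

    `documents` is a list of token lists (assumed already lowercased and
    tokenised). Returns a dict mapping (term_a, term_b) to a count, keeping
    only pairs seen in at least `min_count` documents.
    """
    counts = Counter()
    for tokens in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        unique_terms = sorted(set(tokens))
        counts.update(combinations(unique_terms, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical toy community rules, not data from the report.
rules = [
    ["no", "personal", "attacks"],
    ["no", "spam"],
    ["no", "personal", "information"],
]
edges = cooccurrence_edges(rules, min_count=2)
print(edges)
```

Each surviving pair can be treated as a weighted edge between two term nodes; raising `min_count` prunes rare pairings and keeps the resulting network readable.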
This socio-technical investigation aimed to extend our understanding of the typology of community rules on online platforms, which is essential to designing moderation practices and technologies.
The repository contains the research report and the analysis script, written in R.