This repository contains code for the DIOPTRA-L project by Haidee Kotze, Gys-Walt van Egdom, Corina Koolen and Utrecht University's Research Software Lab. It can be used to reproduce the results of the following publication:
Kotze, Haidee, Janssen, Berit, Koolen, Corina, Plas, Luka & van Egdom, Gys-Walt (2021). Norms, affect and evaluation in the reception of literary translations in multilingual online reading communities: Deriving cognitive-evaluative templates from big data. Translation, Cognition & Behavior, 4. doi:10.1075/tcb.00060.kot
Most of the scripts require Python 3.6. To install the dependencies, run:
pip install -r requirements.txt
The statistical analysis and visualization were performed in R, using the following libraries:
- coin
- dplyr
- ggplot2
- Hmisc
- irr
- lme4
- reshape2
- rstatix

The repository is organized into the following folders:
- scrapers: Python scripts used to scrape reviews from Goodreads. Usage is documented in that folder's README.
- preprocessing: Python scripts used to clean the data and, more specifically, to tokenize it.
- embeddings: Jupyter notebooks for training and evaluating word embeddings using word2vec. As the dataset is relatively small, the resulting embeddings were not informative for further research.
- analysis: Python scripts to collect and count translation lemmas, based on human annotations.
- collocations: Python scripts for finding collocations surrounding translation lemmas.
- sentiment: Python scripts to count positive, negative, and hedge terms in collocations.
- model: R scripts used to generate statistics and visualizations of the data.
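
To give a rough idea of what the collocations and sentiment steps do, here is a minimal sketch. It is not the code in this repository: the word lists, the window size, and the function names `context_windows` and `count_terms` are all hypothetical, and it matches plain lowercase tokens rather than annotated lemmas.

```python
from collections import Counter

# Hypothetical word lists, for illustration only (not the project's lexicons).
TRANSLATION_TERMS = {"translation", "translator", "translate", "translated"}
POSITIVE = {"beautiful", "excellent", "smooth"}
NEGATIVE = {"clunky", "awkward", "poor"}
HEDGES = {"maybe", "perhaps", "somewhat"}


def context_windows(tokens, targets, size=4):
    """Yield the tokens within `size` positions of each target token."""
    for i, token in enumerate(tokens):
        if token in targets:
            left = tokens[max(0, i - size):i]
            right = tokens[i + 1:i + 1 + size]
            yield left + right


def count_terms(tokens, targets=TRANSLATION_TERMS, size=4):
    """Count positive, negative and hedge terms near the target tokens."""
    counts = Counter()
    for window in context_windows(tokens, targets, size):
        for word in window:
            if word in POSITIVE:
                counts["positive"] += 1
            elif word in NEGATIVE:
                counts["negative"] += 1
            elif word in HEDGES:
                counts["hedge"] += 1
    return counts


# A made-up review, lowercased and split on whitespace.
review = "the translation is perhaps somewhat clunky but the story is beautiful".split()
print(count_terms(review))
```

In this sketch only words inside the window around "translation" are counted, so "beautiful", which describes the story rather than the translation, is ignored.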