This set of Jupyter Notebooks is set up to analyze the DTRS datasets.
Before starting, please check that you have the following software installed. If not, follow the instructions below to install it.
This should work on Python 3.6 or newer, but if you do not have Python installed on your system, you are better off installing version 3.8. Assuming you are on a Mac, download this installer here. Once the installation is done, open Terminal and type the following command:
which python3
...and make sure the path in the response mentions version 3.8. You can also run python3 --version, which should report 3.8.x.
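The same check can be done from inside Python, which is useful if several interpreters are installed. This is only a minimal sketch: the helper names `version_string` and `matches_recommended` are hypothetical, and the comparison against 3.8 simply mirrors the version recommended above.

```python
import sys

def version_string(version_info=sys.version_info):
    """Format the first three version components, e.g. '3.8.10'."""
    return ".".join(str(part) for part in version_info[:3])

def matches_recommended(version_info=sys.version_info, recommended=(3, 8)):
    """Return True if major.minor equals the recommended release (assumed 3.8)."""
    return tuple(version_info[:2]) == tuple(recommended)

# Show which interpreter is running and whether it is a 3.8.x release.
print("Python executable:", sys.executable)
print("Python version:", version_string())
print("Is 3.8.x:", matches_recommended())
```

Save it to a file and run it with python3 to see which interpreter that command actually starts.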
If you installed Python as described above, you should already have access to pip. To check, type the following command:
which pip3
If you find out that pip is not installed, download the file get-pip.py using the following command:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
Assuming you are still in the same directory, you can then type
python3 get-pip.py
We use a number of Python libraries for this work. Please check if you have them all, or install them using the commands provided below:
For numerical computing (matrices/vectors etc.)
pip3 install numpy
Data manipulation and analysis tool.
pip3 install pandas
Plotting library, works well with Pandas.
pip3 install seaborn
One of the newer and faster NLP libraries.
pip3 install spacy
Download the English language model for spaCy:
python3 -m spacy download en_core_web_sm
To create custom linguistic categories
pip3 install empath
Computational notebook based on Python.
pip3 install jupyter
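To confirm that everything above was installed for the interpreter you plan to use, a short script along these lines may help. It is a sketch, not part of the repository: `REQUIRED` and `missing_packages` are hypothetical names, and the script only checks that each package can be located, not that it is a particular version.

```python
import importlib.util

# Import names of the packages installed above. The spaCy English model is
# installed as a regular Python package, so it can be looked up the same way.
REQUIRED = [
    "numpy",
    "pandas",
    "seaborn",
    "spacy",
    "en_core_web_sm",
    "empath",
    "jupyter",
]

def missing_packages(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Run it with python3 so that it checks the same interpreter that pip3 installed into.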
Now that you have all the installation out of the way, you can clone this repository, or download it as a zip file (see link on top).
You should already have access to the datasets separately.
Copy the .txt files and paste them into the output folder. If the folder does not exist, create it in the top-level folder that contains the .ipynb files (the Jupyter Notebooks).
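If you prefer to script the copy step, something like the following could work. It is a sketch under assumptions: `copy_txt_files` is a hypothetical helper, the source path in the usage note is a placeholder, and the folder name is a parameter so that you can match whatever name the notebooks expect.

```python
import shutil
from pathlib import Path

def copy_txt_files(source_dir, repo_dir, folder_name="output"):
    """Copy every .txt file from source_dir into repo_dir/folder_name.

    The target folder is created if it does not exist yet.
    Returns the sorted list of copied file names.
    """
    target = Path(repo_dir) / folder_name
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for txt in sorted(Path(source_dir).glob("*.txt")):
        shutil.copy2(txt, target / txt.name)
        copied.append(txt.name)
    return copied
```

For example, calling `copy_txt_files("path/to/your/datasets", ".")` from the top-level folder of the repository would copy the files into place, where `path/to/your/datasets` stands for wherever you keep the .txt files.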
If all goes well, you should be able to launch Jupyter using Terminal. Open Terminal, navigate to the cloned/downloaded directory, and type the following:
jupyter notebook
Sometimes there may be issues with the shortcuts (some $PATH variables need to be set up), and Jupyter may not launch. If this happens, try this alternative:
python3 -m jupyter notebook
If all goes well, you should have your default browser automatically open with the notebook. If not, open your default browser and type the following into the address bar:
localhost:8888/tree
You do not need to run the notebook titled file_parsing.ipynb. You already have the output of that notebook in your outputs folder.
You can go to one of the LIWC or Empath notebooks and run them.