This project provides insights into the type of YouTube content you consume and how you use the platform. It analyzes how many videos you watch, when you watch them, and their topics and tags. The analysis covers several time periods, so you can see how your viewing habits have changed over time.
git clone https://github.com/mattiaferrarini/YouTube-History-Analysis.git
pip install -r requirements.txt
Create a project in the Google Developers Console and obtain an API key. Create a file named .env in the project folder and add the created key to it as follows:
YOUTUBE_API_KEY=your-key
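For reference, a minimal sketch of how a script could read this key using only the standard library. The helper names `load_env_file` and `get_api_key` are hypothetical and are not taken from the project, which may load the key differently (for example through a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines into a dict, skipping blanks and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("YOUTUBE_API_KEY") or load_env_file(path).get("YOUTUBE_API_KEY")
    if not key:
        raise RuntimeError("YOUTUBE_API_KEY is not set; add it to .env")
    return key
```

Calling `get_api_key()` from the project folder returns the value stored after `YOUTUBE_API_KEY=`, or raises an error if the key is missing.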
You can download your YouTube history from Google Takeout.
Click Deselect all at the top of the export panel, then scroll all the way down to YouTube and YouTube Music. Check its box, include only history in the export, and choose JSON as the format.
Once you have downloaded your history, add it to the project folder and rename it to watch-history.json if necessary.
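Before processing, it can help to check that the export loaded correctly. The sketch below counts entries per year. It assumes the Takeout file is a JSON array of objects and that each object carries an ISO 8601 `time` string such as `2023-05-01T12:34:56.789Z`. Both points are assumptions about the export format, and `summarize_history` is a hypothetical helper, not part of the project.

```python
import json
from collections import Counter


def summarize_history(path="watch-history.json"):
    """Return (total entries, {year: count}) for a Takeout watch history file."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)

    per_year = Counter()
    for entry in entries:
        # Assumption: "time" is an ISO 8601 string, so the year is the first 4 characters.
        timestamp = entry.get("time", "")
        if len(timestamp) >= 4:
            per_year[timestamp[:4]] += 1

    return len(entries), dict(sorted(per_year.items()))
```

Running `summarize_history()` in the project folder gives a quick sanity check: a total of zero, or years that do not match your account's lifetime, suggests the wrong file was exported.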
You should create an isolated virtual Python environment for the project using virtualenv to avoid problems with dependencies and versions.
Install virtualenv:
pip3 install virtualenv
Create a virtual environment:
virtualenv <your-env>
Activate it:
source <your-env>/bin/activate
To deactivate it afterwards:
deactivate
Run the history_processor.py script:
python history_processor.py
The script processes the watch-history.json file, filtering out ads and other entries that are not useful for the analysis. It also queries the YouTube Data API to get the topics and tags of the videos you watched. The final output is a processed-history.json file.
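The filtering and lookup steps described above can be sketched as follows. This is an illustration under stated assumptions, not the code in history_processor.py: it assumes ad views are marked with a `details` item named `From Google Ads`, and that watched videos carry a `titleUrl` of the form `https://www.youtube.com/watch?v=<id>`. All function names are hypothetical. The request targets the `videos.list` endpoint of the YouTube Data API v3, which accepts up to 50 comma-separated IDs per call and returns tags under `snippet` and topics under `topicDetails`.

```python
from urllib.parse import urlparse, parse_qs, urlencode

API_URL = "https://www.googleapis.com/youtube/v3/videos"


def is_ad(entry):
    # Assumption: ad views carry a "details" item named "From Google Ads".
    return any(d.get("name") == "From Google Ads" for d in entry.get("details", []))


def video_id(entry):
    # Assumption: watched videos have a titleUrl with a "v" query parameter.
    query = parse_qs(urlparse(entry.get("titleUrl", "")).query)
    ids = query.get("v")
    return ids[0] if ids else None


def batches(items, size=50):
    # videos.list accepts at most 50 IDs per request.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_request_url(ids, api_key):
    # snippet holds the tags, topicDetails holds the topic categories.
    params = {"part": "snippet,topicDetails", "id": ",".join(ids), "key": api_key}
    return f"{API_URL}?{urlencode(params)}"
```

Grouping IDs into batches matters for the quota, since each `videos.list` call is charged once regardless of how many IDs it carries.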
Since the YouTube Data API has a default quota allocation of 10,000 units per day, you may need to run the script on multiple days to process the entire history.
To manage this, history_processor.py generates a last-processed-history.json file which stores the index of the last processed history item. If this file exists when the script is run again, processing will resume from where it left off in the previous session.
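The resume behaviour can be sketched as a small checkpoint loop. The layout of last-processed-history.json is not documented here, so the `{"last_index": N}` shape below is an assumption, as are the function names and the `budget` parameter, which stands in for the daily quota.

```python
import json
from pathlib import Path

CHECKPOINT = "last-processed-history.json"


def load_checkpoint(path=CHECKPOINT):
    """Return the index of the last processed item, or -1 if there is no checkpoint."""
    p = Path(path)
    if not p.exists():
        return -1
    # Assumption: the file holds a single object such as {"last_index": 41}.
    return json.loads(p.read_text(encoding="utf-8")).get("last_index", -1)


def save_checkpoint(index, path=CHECKPOINT):
    Path(path).write_text(json.dumps({"last_index": index}), encoding="utf-8")


def process_with_resume(items, handle, budget, path=CHECKPOINT):
    """Process up to `budget` items after the checkpoint; return how many were handled."""
    start = load_checkpoint(path) + 1
    done = 0
    for index in range(start, len(items)):
        if done >= budget:
            break
        handle(items[index])
        # Save after every item so an interrupted run loses no progress.
        save_checkpoint(index, path)
        done += 1
    return done
```

Deleting the checkpoint file in this sketch restarts processing from the first item.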
Once you have a processed-history.json file, you can analyse it by running the analysis.ipynb notebook.