My GitHub Activity Dashboard


Jupyter-based dashboards to help visualise activity in issues and Pull Requests across many repositories and organisations - all in one place!

Click here to view the activity dashboard! 👉 Binder

Click here to view the past activity summary! 👉 Binder



How the dashboards work

Python script

get-data.py is a Python script that calls the GitHub REST API to collect information about issues and pull requests. Specifically, it makes requests to the search endpoint, which lets us search for issues and pull requests just as we would in GitHub's own search bar. For example, is:issue is:open assignee:sgibson91 would return all open issues assigned to me. This turned out to be much more efficient than using the 'list issues assigned to the authenticated user' endpoint, since it made fewer individual requests and was therefore less likely to get the script rate limited.
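The exact code in get-data.py is not reproduced here. Below is a minimal sketch of a paginated call to the search endpoint that uses only the standard library. The helper names (build_search_url, fetch_json, search_issues) are hypothetical. The endpoint URL, the q/per_page/page parameters and the Accept/Authorization headers are part of GitHub's REST API. search_issues takes an injectable fetch function so the pagination logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

SEARCH_URL = "https://api.github.com/search/issues"
PER_PAGE = 100  # the largest page size the search endpoint accepts


def build_search_url(query: str, page: int = 1) -> str:
    """Encode a GitHub search query as the request URL for one page of results."""
    params = urllib.parse.urlencode({"q": query, "per_page": PER_PAGE, "page": page})
    return f"{SEARCH_URL}?{params}"


def fetch_json(url: str, token: str) -> dict:
    """GET a URL with the access token attached and decode the JSON body."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def search_issues(
    query: str, token: str, fetch: Callable[[str, str], dict] = fetch_json
) -> Iterator[dict]:
    """Yield every issue/PR matching `query`, following pagination."""
    page = 1
    while True:
        items = fetch(build_search_url(query, page), token).get("items", [])
        yield from items
        if len(items) < PER_PAGE:  # a short page means there are no more results
            break
        page += 1
```

One request returns up to 100 matches. Note that GitHub's search API returns at most 1,000 results for any single query.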

The script searches for all issues and pull requests that meet any of the following criteria:

  • the user is either assigned to them or has created them;
  • they involve the user and were closed in the last month;
  • they involve the user and were closed or updated in the last week;
  • or they are pull requests on which the user's review has been requested.
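The criteria above could be translated into search queries using GitHub's assignee:, author:, involves:, closed:, updated: and review-requested: qualifiers. This is a sketch rather than the script's actual queries. build_queries is hypothetical, "last month" is assumed to mean 30 days, and each criterion is issued once with is:issue and once with is:pr.

```python
from datetime import date, timedelta


def build_queries(user: str, today: date) -> list[str]:
    """Return one search query per criterion, for issues and PRs separately."""
    last_week = (today - timedelta(days=7)).isoformat()
    last_month = (today - timedelta(days=30)).isoformat()  # assumption: a "month" is 30 days
    criteria = [
        f"assignee:{user}",                        # assigned to the user
        f"author:{user}",                          # created by the user
        f"involves:{user} closed:>={last_month}",  # closed in the last month
        f"involves:{user} closed:>={last_week}",   # closed in the last week
        f"involves:{user} updated:>={last_week}",  # updated in the last week
    ]
    queries = [f"{kind} {c}" for kind in ("is:issue", "is:pr") for c in criteria]
    # Review requests only exist on pull requests.
    queries.append(f"is:pr review-requested:{user}")
    return queries
```

Because several queries can match the same item, their combined results need de-duplicating before they are saved.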

The results are compiled into a pandas dataframe, along with some metadata, and then written to a CSV file called github-activity.csv.
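The project uses pandas for this step. To stay dependency-free, the sketch below swaps in the standard-library csv module to show the same flatten, de-duplicate and write idea. The column names and helper names are hypothetical, and the real github-activity.csv may differ. The fields read from each item (repository_url, html_url, pull_request and the timestamps) are fields of GitHub's search results.

```python
import csv
from typing import IO, Iterable

# Hypothetical column set, not necessarily that of the real github-activity.csv.
FIELDS = ["repo", "number", "title", "type", "state",
          "created_at", "updated_at", "closed_at", "url"]


def item_to_row(item: dict) -> dict:
    """Flatten one search-API item into a CSV row."""
    # repository_url has the form https://api.github.com/repos/OWNER/REPO
    owner, name = item["repository_url"].split("/")[-2:]
    return {
        "repo": f"{owner}/{name}",
        "number": item["number"],
        "title": item["title"],
        # Pull requests carry an extra "pull_request" key in search results.
        "type": "pull_request" if "pull_request" in item else "issue",
        "state": item["state"],
        "created_at": item["created_at"],
        "updated_at": item["updated_at"],
        "closed_at": item.get("closed_at"),  # None is written as an empty cell
        "url": item["html_url"],
    }


def write_activity_csv(items: Iterable[dict], fh: IO[str]) -> int:
    """Write unique items to fh as CSV and return how many rows were written."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    seen = set()
    for item in items:
        # The same item can match several queries; keep only the first copy.
        if item["html_url"] in seen:
            continue
        seen.add(item["html_url"])
        writer.writerow(item_to_row(item))
    return len(seen)
```

When writing to disk, open the file with newline="" (for example `open("github-activity.csv", "w", newline="")`), as the csv module documentation recommends.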

You can provide a .repoignore file to prevent results from specific repos turning up in the dataset. This is a plain text file listing one repository to ignore per line, in the form ORG_OR_USER/REPO_NAME. Regular expressions are also supported; e.g., to ignore a whole organisation, use ORG_NAME/.*.
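Here is a minimal sketch of how .repoignore patterns might be applied. It assumes each non-blank line is a Python regular expression matched against the whole ORG_OR_USER/REPO_NAME string, so that sgibson91/old-repo does not also hide sgibson91/old-repo-2. The script may match differently, and the helper names are hypothetical.

```python
import re


def load_ignore_patterns(text: str) -> list[re.Pattern]:
    """Compile one regular expression per non-blank line of a .repoignore file."""
    return [re.compile(line.strip()) for line in text.splitlines() if line.strip()]


def is_ignored(repo: str, patterns: list[re.Pattern]) -> bool:
    """True if any pattern matches the entire "ORG_OR_USER/REPO_NAME" string."""
    return any(p.fullmatch(repo) for p in patterns)
```

Using fullmatch rather than search means a plain repository name only ever hides that exact repository, while ORG_NAME/.* still hides everything in the organisation.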

Continuous Delivery of data

The get-data.py script is run by a GitHub Actions workflow on a regular cron schedule. The job runs the script just as you would locally, then commits the updated CSV file to the main branch.

Visualising the data

The data are visualised using the activity-dashboard.ipynb and past-activity-summary.ipynb Jupyter Notebooks. Each implements widgets for interacting with the data, so that users can filter by an individual repository and sort by time created, updated, or closed (past activity summary only). The Notebooks are executed with voila to give the dashboards a more aesthetically pleasing look.
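The notebooks' widget code is not shown here. The sketch below is hypothetical: it shows the filter-and-sort logic that a repository selector and a sort selector might call. It assumes rows are dicts with repo, updated_at and closed_at keys, and none of these names are taken from the notebooks.

```python
from typing import Iterable, Optional


def filter_and_sort(
    rows: Iterable[dict], repo: Optional[str] = None, sort_by: str = "updated_at"
) -> list[dict]:
    """Keep rows from one repo (or all rows if repo is None), newest first by sort_by."""
    selected = [r for r in rows if repo is None or r["repo"] == repo]
    # GitHub timestamps are ISO 8601 in UTC (e.g. 2024-03-01T10:00:00Z), so string
    # order is chronological; empty values (e.g. closed_at of open items) sort last.
    return sorted(selected, key=lambda r: r[sort_by] or "", reverse=True)
```

Keeping this logic in a plain function, separate from the widgets, makes it easy to check outside a running dashboard.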

Binder and nbgitpuller

The dashboards can be launched in Binder for a quick view without needing to use the repository locally. Binder usually rebuilds the Docker image of the repository for every new commit it sees on the provided git reference. However, since the CSV file is updated regularly, Binder was rebuilding far more often than necessary: only the data were changing, not the Notebooks or the environment they require.

To reduce the number of rebuilds Binder needs to make, the requirements.txt file, which contains only the packages needed to run the Notebooks, has been separated out onto the notebook-env branch. This is the branch Binder builds. We then use nbgitpuller to dynamically pull in the content from the main branch. The result is a Binder environment that is only rebuilt when the Notebooks' requirements change, but that still operates on the most up-to-date data from the main branch.

Binder needs BOTH the main branch and the notebook-env branch to operate in this way! If you are using this project as a template or forking it, DO NOT remove the notebook-env branch without ALSO updating the Binder link!

Get your own dashboards!

  1. Create your own version of this repository by clicking the "Use this template" button at the top of this page. 🔥 Make sure to check the "Include all branches" box when creating your repo, as you will need the notebook-env branch for the Binder links to work! 🔥 You can delete any other branches except main and notebook-env.


  2. Delete the github-activity.csv file from your repo. (It will be regenerated when the CI job next runs!)

  3. Delete the .repoignore file, or edit it to contain a list of repos you'd like excluded from the dataset, in the form ORG_OR_USER/REPO_NAME.

  4. Create a Personal Access Token with the public_repo scope and add it as a repository secret called ACCESS_TOKEN.

  5. Edit the README and update the Binder badges at the top of the document using the snippet below, replacing every instance of {{ YOUR_GITHUB_HANDLE_HERE }} (including the {{ }}!) with your GitHub handle:

    [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/{{ YOUR_GITHUB_HANDLE_HERE }}/github-activity-dashboard/notebook-env?urlpath=git-pull%3Frepo%3Dhttps%253A%252F%252Fgithub.com%252F{{ YOUR_GITHUB_HANDLE_HERE }}%252Fgithub-activity-dashboard%26urlpath%3D%252Fvoila%252Frender%252Fgithub-activity-dashboard%252Factivity-dashboard.ipynb%26branch%3Dmain)
    [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/{{ YOUR_GITHUB_HANDLE_HERE }}/github-activity-dashboard/notebook-env?urlpath=git-pull%3Frepo%3Dhttps%253A%252F%252Fgithub.com%252F{{ YOUR_GITHUB_HANDLE_HERE }}%252Fgithub-activity-dashboard%26urlpath%3D%252Fvoila%252Frender%252Fgithub-activity-dashboard%252Fpast-activity-summary.ipynb%26branch%3Dmain)

    🚨 Be careful not to edit anything else in the URL! 🚨
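Each badge URL above nests two links. The outer link tells Binder to build the notebook-env branch. The inner link, URL-encoded twice, is an nbgitpuller git-pull request that pulls the main branch and then opens a Voila page. Below is a minimal sketch that regenerates these URLs so that only the handle changes. The function name is hypothetical; the defaults are read off the badge URLs above.

```python
from urllib.parse import quote


def build_binder_url(
    handle: str,
    notebook: str,
    repo: str = "github-activity-dashboard",
    env_branch: str = "notebook-env",
    content_branch: str = "main",
) -> str:
    """Build a Binder link that builds env_branch and pulls content_branch via nbgitpuller."""
    # Inner link: which repo and branch nbgitpuller should pull, and which
    # Voila page to open once the content is in place.
    repo_url = quote(f"https://github.com/{handle}/{repo}", safe="")
    voila_path = quote(f"/voila/render/{repo}/{notebook}", safe="")
    git_pull = f"git-pull?repo={repo_url}&urlpath={voila_path}&branch={content_branch}"
    # Outer link: Binder builds env_branch, then opens the git-pull link,
    # which is encoded a second time so it survives as a single urlpath value.
    return (
        f"https://mybinder.org/v2/gh/{handle}/{repo}/{env_branch}"
        f"?urlpath={quote(git_pull, safe='')}"
    )
```

For example, `build_binder_url("YOUR_HANDLE", "past-activity-summary.ipynb")` returns a URL you can paste into the second badge.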

You can get started straight away by manually triggering the 'Update GitHub Activity' workflow, or wait for the cron job to run it and produce your github-activity.csv. Once that file has been added to your repo, click your edited Binder badges to see your dashboards!

Using the tools locally

Installation requirements

This project requires a Python installation. Any Python 3 version should suffice, but this hasn't been tested, so proceed with caution!

The packages required to run this project are stored in requirements.txt and can be installed via pip:

pip install -r requirements.txt

Getting the data

  1. If you have not already done so, create a Personal Access Token with the public_repo scope.

  2. Add this as a variable called ACCESS_TOKEN to your shell environment:

    export ACCESS_TOKEN="PASTE YOUR TOKEN HERE"

  3. Run the Python script to generate the github-activity.csv file:

    python get-data.py

🚨 If you see the message "You are rate limited! 😱", you will need to wait ~1 hour before trying to run the script again. 🚨
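GitHub reports rate-limit status in the X-RateLimit-Remaining and X-RateLimit-Reset response headers, the latter as a Unix timestamp in seconds. A script can use them to compute the exact wait instead of guessing. Below is a minimal sketch; the helper is hypothetical, and get-data.py may detect rate limiting differently.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before retrying, or 0 if requests remain.

    Note: a plain dict is case-sensitive, whereas the headers object returned
    by urllib (response.headers) looks names up case-insensitively.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))  # Unix epoch seconds
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

Passing `now` explicitly keeps the function deterministic; in normal use it falls back to the current time.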

Viewing the dashboards

Once github-activity.csv has been generated, view either dashboard by running the corresponding command:

voila activity-dashboard.ipynb
voila past-activity-summary.ipynb

A browser window should open automatically.