A crawler for AUS' Banner Page. Probably a great tool for practicing data science and statistics.

DeadPackets/AUSCrawl

AUSCrawl

AUSCrawl is a web scraper and crawler that collects data from AUS Banner on every course, instructor, level, and attribute for every semester at AUS since 2005, and saves it to an SQLite database that can then be queried.

Note: There is a WIP Python re-write of this project.

Why create this project?

I created this project to practice scraping data at scale with a headless browser while also learning asynchronous code, using the Sequelize ORM, and optimizing my code in general. I also think the dataset this project produces can help others practice data science or build applications on top of it.

Prerequisites

To run this project, you will need Node.js. I recommend using any version after v14.

How to get started

  1. Download the repository: git clone https://github.com/DeadPackets/AUSCrawl
  2. Enter the project and download required libraries: cd AUSCrawl && npm install
  3. Now, simply run the project: node crawl.js
    1. Additionally, if you want verbose output, run the following: VERBOSE=true node crawl.js
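
As a rough illustration of what the VERBOSE=true switch in the last step could control, here is a minimal sketch of an environment-gated logger in Node.js. The names isVerbose and createLogger are hypothetical and are not taken from crawl.js, which may read the variable differently; the sketch only assumes that the literal string "true" enables extra output.

```javascript
// Hypothetical helper, not taken from crawl.js.
// Treats VERBOSE as enabled only when it is set to the exact string "true".
function isVerbose(env = process.env) {
  return env.VERBOSE === 'true';
}

// Builds a logger whose debug() output is printed only in verbose mode.
// `sink` is the function that receives each formatted line (console.log by default).
function createLogger(env = process.env, sink = console.log) {
  const verbose = isVerbose(env);
  return {
    info: (msg) => sink(`[info] ${msg}`),
    debug: (msg) => {
      if (verbose) sink(`[debug] ${msg}`);
    },
  };
}

const log = createLogger();
log.info('Starting crawl');
log.debug('This line appears only when run with VERBOSE=true');
```

Under these assumptions, running the script normally prints only the info line, while running it with VERBOSE=true set in the environment also prints the debug line.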

Libraries used in the project

  • Chalk is used for coloring the console output
  • Sequelize is the database ORM used to save the crawled data into SQLite
  • Puppeteer is the headless browser library used to browse and crawl the data from Banner

How does it work?

I am planning on writing a blog post soon.

Contribution

Contributions are welcome. Simply fork the project, add your feature or fix, and open a pull request. I will review it as soon as I can.
