edx-discussion board crawler

edx-discussion board crawler is a Python-based cross-platform tool for mining text data from discussion threads of the enrolled edX courses available on a user's dashboard. THe edx discussion board is a dynamic page, thus we utilize the webdriver and selenium API to scrape the content. It was developed by teaching assistants at Tokyo Tech Online Education Development Office.

Prerequisites

Python libraries and modules:

Python - version 3.5+
Selenium with Python - API to access webdriver
Chrome webdriver - Chrome webdriver which requires the installation of official Chrome brower.

How to run

Before running, please make sure that the desired course(s) have already been listed in dashboard of your edx account.

Download Chrome webdriver from the link provided at Prerequisites.
Add your edX account and password at "account info.json"
(Optional) add the desired edX course URLs in "course table.xlsx"
open terminal and change a directory to the DB crawler folder
Run a python script edx-discussion.py with the following command

python edx-discussion.py
The program will ask you the option for crawling.

type 1 and press enter if you want to manually select courses to be crawled
type any number and press enter if you want the program to crawl courses listed in course table.xlsx file

the crawler starts

The output is a JSON file ('all_dis2.json'), which is stored by default in "HTMLs[COURSE_NAME]" folder .

Extra files and folders

[DATETIME]_selected_file.csv contain the list of selected courses for crawling

[DATETIME]_logfile_discussion.csv is the log file

Test environment

Python 3.5, Windows 10 and Ubuntu 14 and 16

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
README.md		README.md
account info.json		account info.json
course table.xlsx		course table.xlsx
edx-discussion.py		edx-discussion.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

edx-discussion board crawler

Prerequisites

How to run

Extra files and folders

Test environment

About

Releases

Packages

Languages

TokyoTechX-TAs/edXDBcrawler

Folders and files

Latest commit

History

Repository files navigation

edx-discussion board crawler

Prerequisites

How to run

Extra files and folders

Test environment

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages