This repository has been archived by the owner on Aug 19, 2021. It is now read-only.

Gabighz/Digital-Aristotle

Digital Aristotle

The purpose of this project is to create a chatbot that will augment studying for Computer Science students.

Development Setup

Assuming a completely clean/fresh installation (of Ubuntu or a similar distro) or lack of development tools, run the following commands from a directory above your clone of this repository:

sudo apt install python3 python3-pip libpq-dev python3-dev -y
sudo apt install postgresql
sudo service postgresql start
pip3 install virtualenv
virtualenv Digital-Aristotle
cd Digital-Aristotle
source bin/activate
pip install psycopg2
pip install psycopg2-binary
pip install -r requirements.txt
python download_nltk_data.py

To start the Django server for the first time (which uses PostgreSQL):

sudo -u postgres psql
postgres=# CREATE DATABASE chatbot;
postgres=# CREATE USER admin WITH PASSWORD 'password';
postgres=# ALTER USER admin CREATEDB;
postgres=# \q
python manage.py migrate
python manage.py createsuperuser
python manage.py runserver
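
For `python manage.py migrate` to reach the database created above, the Django settings need a matching PostgreSQL entry. The fragment below is only a sketch of what that entry might look like: the database name, user, and password come from the commands above, while `HOST` and `PORT` are assumed defaults and the project's actual settings file may differ.

```python
# Hypothetical excerpt from a Django settings module.
# NAME, USER and PASSWORD mirror the psql commands above;
# HOST and PORT are assumptions (a local server on the default port).
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "chatbot",
        "USER": "admin",
        "PASSWORD": "password",
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```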

If you install new packages with pip install and regenerate the requirements file, you can avoid picking up Python 2 packages from the lib directory of your virtual environment with:

pipreqs . --force --ignore lib

If you are using macOS, I highly recommend following this article.

Troubleshooting

If cloning the repository fails with errors such as RPC failed or Early EOF, try a shallow clone first and then fetch the rest of the history:

git clone <Repository URL> --depth 1
cd <repo>
git fetch --unshallow

If this does not solve the issue, follow GitHub's steps to generate an SSH key and clone the repository over SSH: https://help.github.com/en/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent

Information Retrieval System

Contains a keyword extractor, a keyphrase extractor, and an AIML generator.

This system is used to create AIML files from lecture notes, such as PDF and PPTX files. Firstly, the lecture notes are converted to XML files. This happens automatically when the website's administrator uploads a PDF or PPTX file.

The keyword extractor (could be improved):

  • Reads all the data in the XML files, identifying all words and their XML features.
  • Using a custom-made feature selection module, classification features are attached to each word.
  • The classification features are summed up to represent data points.
  • These data points are fed to a K-means clustering step configured for two clusters (keywords and non-keywords).
  • Every keyword is stored in a list for further use in the keyphrase extractor.
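
The steps above can be sketched roughly as follows. This is not the project's code: the feature names (`bold`, `large_font`, `in_title`) are hypothetical, the K-means loop is a minimal one-dimensional version written out by hand, and treating the higher-scoring cluster as the keywords is an assumption.

```python
def two_means_1d(values, iterations=20):
    """Split scalar scores into two clusters with a basic K-means loop.

    Returns one label per value: 0 for the low cluster, 1 for the high one.
    """
    if not values:
        return []
    low, high = min(values), max(values)  # initial centroids
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to the nearest centroid.
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        group0 = [v for v, label in zip(values, labels) if label == 0]
        group1 = [v for v, label in zip(values, labels) if label == 1]
        # Update step: move each centroid to the mean of its group.
        new_low = sum(group0) / len(group0) if group0 else low
        new_high = sum(group1) / len(group1) if group1 else high
        if (new_low, new_high) == (low, high):
            break  # converged
        low, high = new_low, new_high
    return labels


def extract_keywords(words):
    """words: list of (text, features) pairs, features being a dict of numbers."""
    # Each data point is the sum of the word's classification features.
    scores = [sum(features.values()) for _, features in words]
    labels = two_means_1d(scores)
    # Assumption: the cluster with the higher scores holds the keywords.
    return [text for (text, _), label in zip(words, labels) if label == 1]


# Hypothetical feature values for a handful of words.
sample_words = [
    ("the", {"bold": 0, "large_font": 0, "in_title": 0}),
    ("recursion", {"bold": 1, "large_font": 1, "in_title": 1}),
    ("of", {"bold": 0, "large_font": 0, "in_title": 0}),
    ("stack", {"bold": 1, "large_font": 1, "in_title": 0}),
]
keywords = extract_keywords(sample_words)
```

Summing the features down to a single number keeps the clustering one-dimensional, which is why such a small loop is enough here; a library implementation would be the more usual choice for multi-dimensional features.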

The keyphrase extractor (incomplete):

  • Reads all the keywords from the list produced by the keyword extractor.
  • Extracts all sentences which contain keywords from the XML files.
  • For each keyword, every phrase that contains it is ranked. The highest-ranking phrase is then considered a keyphrase.
  • Every keyword-keyphrase pair is stored in a list for further use in the AIML generator.
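
A minimal sketch of this ranking step is shown below. The ranking criterion is not specified here, so the example assumes a simple one: a sentence scores one point for every keyword occurrence it contains, and the first sentence with the highest score wins. The function name and the tokenisation are likewise illustrative only.

```python
import re


def extract_keyphrases(sentences, keywords):
    """Return (keyword, keyphrase) pairs, one per keyword that appears in a sentence."""
    keyword_set = {keyword.lower() for keyword in keywords}
    pairs = []
    for keyword in keywords:
        best_sentence, best_score = None, -1
        for sentence in sentences:
            tokens = re.findall(r"[a-z0-9']+", sentence.lower())
            if keyword.lower() not in tokens:
                continue  # only sentences containing the keyword are candidates
            # Assumed ranking: count how many tokens are keywords.
            score = sum(1 for token in tokens if token in keyword_set)
            if score > best_score:
                best_sentence, best_score = sentence, score
        if best_sentence is not None:
            pairs.append((keyword, best_sentence))
    return pairs


sample_sentences = [
    "Recursion is a function calling itself.",
    "Recursion uses the call stack to store each frame.",
    "A stack is a last-in first-out structure.",
]
pairs = extract_keyphrases(sample_sentences, ["recursion", "stack"])
```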

The AIML generator (incomplete):

  • Reads the output of the Keyword Extractor to generate AIML patterns (questions).
  • Reads the output of the Keyphrase Extractor to generate AIML templates (answers).
  • AIML categories (pairs of patterns and templates) are stored for use on the website.
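
To illustrate the output format, the sketch below builds AIML categories from keyword-keyphrase pairs using Python's standard `xml.etree.ElementTree` module. The "WHAT IS ..." question wording and the `build_aiml` name are assumptions made for the example, not the project's actual generator.

```python
import xml.etree.ElementTree as ET


def build_aiml(pairs):
    """Build an AIML document from (keyword, keyphrase) pairs and return it as a string."""
    root = ET.Element("aiml", version="1.0.1")
    for keyword, keyphrase in pairs:
        category = ET.SubElement(root, "category")
        # Pattern (question): AIML patterns are conventionally upper case.
        # The "WHAT IS" wording is an assumed question form.
        ET.SubElement(category, "pattern").text = f"WHAT IS {keyword.upper()}"
        # Template (answer): the keyphrase selected for this keyword.
        ET.SubElement(category, "template").text = keyphrase
    return ET.tostring(root, encoding="unicode")


aiml_text = build_aiml(
    [("recursion", "Recursion uses the call stack to store each frame.")]
)
```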

Contributing

  1. Fork it
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -m 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request

Support

Please open an issue for support.

License

GNU General Public License, Version 3