A machine-learning based open-domain QA chatbot from scratch 🤖

nileshsah/RoboMax

RoboMax 🤖

Meet RoboMax! Your personal assistant for all your queries about the US election ¯\(ツ)/¯

The Project

This repository contains self-contained Jupyter notebooks used to train our assistant RoboMax to answer open-domain questions. We use tweets as the source for our knowledge base and attempt to reflect the world's opinion on your question of interest. At the moment, we've tuned RoboMax to answer questions about the 2016 US election using tweets made available at https://www.kaggle.com/kinguistics/election-day-tweets/#election_day_tweets.csv

Getting Started

The notebook robomax-training-notebook.ipynb serves as the starting point for this project and contains the major data exploration and feature engineering tasks.

The notebook robomax-election-tweets-bot.ipynb adapts RoboMax to answer questions based on election tweets.

Dataset

Due to the unavailability of a Twitter-based question-answer dataset, we resorted to using the standard SQuAD reading comprehension dataset in a modified way. Instead of predicting the factual answer itself, we trained our model to identify the sentence that contains it.
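As a rough illustration of this relabeling, the sketch below turns one paragraph and one answer offset into sentence-level labels. It assumes the SQuAD-style character offset (`answer_start`) is available; the helper name `label_sentences`, the naive regex sentence splitter, and the toy paragraph are hypothetical and not taken from the notebooks.

```python
import re

def label_sentences(context, answer_start):
    """Split a paragraph into sentences and flag the one covering answer_start."""
    rows = []
    # Naive splitter (an assumption): a sentence runs up to '.', '!' or '?'
    # plus any trailing whitespace.
    for match in re.finditer(r"[^.!?]+[.!?]*\s*", context):
        start, end = match.span()
        sentence = match.group().strip()
        if sentence:
            # Label 1 if the answer's first character falls inside this sentence.
            rows.append((sentence, int(start <= answer_start < end)))
    return rows

# Toy paragraph, made up for illustration only.
context = ("The election was held on 8 November 2016. "
           "Turnout was reported the next day.")
answer = "8 November 2016"
rows = label_sentences(context, context.index(answer))
```

Each paragraph then yields one positive row and several negative ones, which is where the class imbalance in the training data comes from.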

Training

We built our model on a fairly basic set of features with a baseline Random Forest classifier, which leaves a huge scope for improvement. AUC served as our optimization metric because of class imbalance: sentences that contain the answer are the minority class. We prioritized recall on the sentences containing the correct answer over precision.
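To see why AUC is a more useful target than accuracy under this kind of imbalance, here is a small plain-Python sketch. It computes AUC as the fraction of (positive, negative) pairs that the scores rank correctly. The labels and scores are invented for illustration, and `roc_auc` is a hypothetical helper, not the function used in the notebooks.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical scores: one answer-bearing sentence among five candidates.
labels = [0, 0, 1, 0, 0]
scores = [0.10, 0.40, 0.35, 0.05, 0.20]

auc = roc_auc(labels, scores)

# A model that predicts "no answer" for every sentence still looks accurate,
# even though it never finds the answer.
all_negative_accuracy = labels.count(0) / len(labels)
```

Accuracy rewards ignoring the rare positive class, while AUC only improves when answer-bearing sentences are ranked above the rest.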

Prediction

We combine indexing, prediction, and summarization to formulate an answer to a given question. Whoosh serves as our indexing library. Our pre-trained model scores the results returned by the indexer by how close each tweet is to the question. The best-scoring results are then condensed with an Edmundson summarizer to produce the final answer.
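The flow of the three stages can be sketched in plain Python. Every stage here is a simplified stand-in and an assumption of this sketch: token overlap replaces the Whoosh index, Jaccard similarity replaces the trained classifier, and joining the top results replaces the Edmundson summarizer. The function names and the sample tweets are hypothetical.

```python
import re

def tokens(text):
    """Lowercased word set; a crude substitute for real text analysis."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def retrieve(question, tweets, limit=10):
    """Stand-in for the indexer: keep tweets sharing a token with the question."""
    q = tokens(question)
    hits = [t for t in tweets if q & tokens(t)]
    return hits[:limit]

def score(question, tweet):
    """Stand-in for the trained model: Jaccard overlap of the two token sets."""
    q, t = tokens(question), tokens(tweet)
    return len(q & t) / len(q | t)

def answer(question, tweets, top_k=2):
    candidates = retrieve(question, tweets)
    ranked = sorted(candidates, key=lambda t: score(question, t), reverse=True)
    # Stand-in for the summarizer: join the best-scoring tweets.
    return " ".join(ranked[:top_k])

# Made-up tweets for illustration only.
tweets = [
    "Polls close at 8pm in most states",
    "Long lines at polls this morning",
    "Coffee first, then work",
]
best = answer("When do the polls close?", tweets, top_k=1)
```

Retrieving first keeps the more expensive scoring step limited to a handful of candidates rather than the whole tweet collection.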
