
Machine learning question and answer system, open source software that I developed during my software developer fellowship period in 2019. Check it out and read my mind.


masete/Machine-Learning-


Policy Question Answering System

A web application that takes a user's question, surveys open access research articles about a given policy, and returns an answer to the user.

Project Overview

Every year, millions of research articles on different policies and their consequences are published. Each of these is a rich source of information that can help policy advisors determine which policies to implement. This application carries out the following functions:

  1. Provide a chatbot that the user interacts with.
  2. Based on the user's answers, generate a question about a given policy consequence.
  3. Process the question to obtain keywords and then use these keywords to fetch open access research articles about the policy.
  4. Perform claim detection on the abstract of each article to determine whether the article found evidence for or against a given policy consequence.
  5. Count the total number of articles that are for or against a given policy consequence.
  6. Display this data in a visualization and also provide short summaries of the policy and its consequence.
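Steps 3 to 5 can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the repository's implementation: the stop-word list, the cue-word sets, and the function names (`extract_keywords`, `classify_abstract`, `tally_stances`) are all hypothetical, and the cue words assume a consequence phrased as a reduction (for example, "cap and trade reduces carbon emissions"). The actual claim detection is more involved.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists used only for illustration.
STOP_WORDS = {"what", "is", "the", "of", "on", "a", "an", "and", "effect"}
SUPPORT_CUES = {"reduces", "reduced", "decreases", "decreased", "lowers"}
OPPOSE_CUES = {"increases", "increased", "ineffective", "failed"}


def extract_keywords(question):
    """Lower-case the question and drop stop words, keeping word order."""
    words = re.findall(r"[a-z]+", question.lower())
    return [w for w in words if w not in STOP_WORDS]


def classify_abstract(abstract):
    """Label an abstract 'for', 'against', or 'unclear' by counting cue words."""
    words = set(re.findall(r"[a-z]+", abstract.lower()))
    support = len(words & SUPPORT_CUES)
    oppose = len(words & OPPOSE_CUES)
    if support > oppose:
        return "for"
    if oppose > support:
        return "against"
    return "unclear"


def tally_stances(abstracts):
    """Count how many abstracts fall under each label."""
    return Counter(classify_abstract(a) for a in abstracts)
```

The resulting counts are the kind of data that step 6 would then visualize.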

Running locally

Make sure that you have Redis, Python 3.6+, pip, and virtualenv installed on your computer.

Clone the git repository and change into the top directory.

git clone https://github.com/Prosper21/Policy-Question-Answering-System
cd Policy-Question-Answering-System

Create a virtual environment and activate it.

virtualenv venv
. venv/bin/activate

Install all the required packages by running the command:

pip install -r requirements.txt

Create a .env file to store your environment variables. In our case, this would have the following values.

SECRET_KEY = 'xxx' # Use your own secret key here
REDIS_URL = 'redis://localhost:6379'
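If you need a value for SECRET_KEY, one option is to generate it with Python's standard `secrets` module. This is only a suggestion; the repository does not prescribe how the key should be produced.

```python
import secrets

# 32 random bytes, hex-encoded, give a 64-character key.
secret_key = secrets.token_hex(32)

# Print a line that can be pasted into the .env file.
print(f"SECRET_KEY = '{secret_key}'")
```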

We now have all the files and packages that we need to run the application on our local host. In one terminal, start the Redis server by running:

redis-server

In a second terminal, change into the Policy-Question-Answering-System directory, activate the virtual environment, and start the Celery workers by running:

celery worker -A answer_policy_question.celery -O fair

In a third terminal, change into the Policy-Question-Answering-System directory, activate the virtual environment, and start your application by running:

python app.py

You should see the application running and you can access it on localhost:5000.

Deploying on Heroku

Open an account on Heroku at https://signup.heroku.com/.

Install the Heroku CLI on your computer and carry out any other required configurations. The details can be found here https://devcenter.heroku.com/articles/heroku-cli.

Clone the git repository and change into the top directory.

git clone https://github.com/Prosper21/Policy-Question-Answering-System
cd Policy-Question-Answering-System

Create an app on Heroku, which prepares Heroku to receive your source code.

heroku create

In this case, Heroku generates a random name for your app. You can also provide your own name by running:

heroku create <app-name>

Because this application uses Heroku add-ons, we need to create those first. In our case, we need the Redis add-on. We do this by running:

heroku addons:create heroku-redis:hobby-dev

This will automatically set the REDIS_URL configuration variable for our application but we also need to set up our SECRET_KEY configuration variable. We do this by running:

heroku config:set SECRET_KEY='xxx' # Use your own secret key here

We can now deploy to Heroku by running:

git push heroku master

Go to the Heroku dashboard and make sure that your app's web and worker dynos are both switched on.

You can now visit the app at the URL derived from its name, i.e. https://<app-name>.herokuapp.com, or simply open the app by running:

heroku open

Skills Needed

  • Python 3.6+
  • JavaScript/HTML/CSS
  • Heroku

Notes

If you make any changes, for example by installing new packages and running

pip freeze > requirements.txt

then you will have to add

https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm

to your requirements.txt. This ensures that the required spaCy model is loaded. If this causes a 'Double requirement given' error upon deployment, look for

en-core-web-sm==x.x.x

in your requirements.txt file and delete it.
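The manual fix above could also be scripted. The helper below is a minimal sketch, not part of the repository: the name `fix_requirements` is hypothetical, and it assumes that `pip freeze` writes the model as a pinned `en-core-web-sm==x.x.x` line.

```python
SPACY_MODEL_URL = (
    "https://github.com/explosion/spacy-models/releases/download/"
    "en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm"
)


def fix_requirements(text):
    """Drop any pinned en-core-web-sm line and append the model URL once."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        name = stripped.split("==")[0].strip().lower().replace("_", "-")
        if "==" in stripped and name == "en-core-web-sm":
            continue  # this pin is what causes 'Double requirement given'
        if stripped == SPACY_MODEL_URL:
            continue  # re-added below, so the URL appears exactly once
        kept.append(line)
    kept.append(SPACY_MODEL_URL)
    return "\n".join(kept) + "\n"
```

To apply it, read requirements.txt, pass its contents through `fix_requirements`, and write the result back. Running it twice leaves the file unchanged.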

References

Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing Opinions on the Web." Proceedings of the 14th International World Wide Web conference (WWW-2005), May 10-14, 2005, Chiba, Japan.

TODO

Front-End

Chatbot UI

At the moment, the chatbot assumes that the user provides very specific replies rather than full sentences. For example, when the bot asks 'What policy would you like to research today?', it expects a direct answer like 'cap and trade' rather than 'I would like to research cap and trade'. The same applies to the question 'And what effect of cap and trade on carbon emissions are you interested in?', where we expect an answer like 'carbon emissions' rather than 'I would like to know its effect on carbon emissions'. The reason is that the user's replies are handled by JavaScript in the HTML files that render the page, so natural language processing techniques cannot be applied there to identify the policy or phenomenon of interest in a full-sentence reply. I have not yet figured out how to process full-sentence replies on the backend while maintaining a conversational flow, but I believe this can be done.
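As a starting point, a backend helper could strip common lead-in phrases from a reply before the topic is used. The sketch below is an assumption-laden illustration, not the project's code: the function name `extract_topic` and the phrase list are hypothetical, and it uses plain pattern matching rather than real natural language processing, so a topic that itself begins with a word like 'research' would be clipped.

```python
import re

# Hypothetical lead-in phrases; a real system would need a broader list
# or a proper NLP parser.
LEAD_IN = re.compile(
    r"^\s*(?:i\s+would\s+like\s+to|i'd\s+like\s+to|i\s+want\s+to|please)?\s*"
    r"(?:research|know|look\s+into|study)?\s*"
    r"(?:about|its\s+effect\s+on|the\s+effect\s+on)?\s*",
    re.IGNORECASE,
)


def extract_topic(reply):
    """Remove a leading phrase such as 'I would like to research'."""
    topic = LEAD_IN.sub("", reply, count=1)
    # Trim surrounding spaces and sentence punctuation, then normalize case.
    return topic.strip(" .!?").lower()
```

Short replies such as 'cap and trade' pass through unchanged, so the current behavior would be preserved for users who already answer tersely.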

Another possible improvement is moving all the JavaScript code that is in the HTML files to the JavaScript folder in the static directory. The recommended practice is that JavaScript and HTML code should not be mixed.

Back-End

Duplicates

Since we are fetching abstracts from three APIs, the same abstract can appear more than once in our results. These duplicates should be removed, but sometimes an extra character or space essentially makes the two versions of the abstract different. One option is to clean up the abstracts with regular expressions before comparing them, so that near-identical copies are recognized as duplicates.
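A minimal sketch of that idea follows. The function names `normalize_abstract` and `remove_duplicates` are hypothetical, and the normalization rules (lower-casing, dropping punctuation, collapsing whitespace) are one reasonable choice rather than a tested recipe for the three APIs.

```python
import re


def normalize_abstract(text):
    """Lower-case, drop punctuation, and collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def remove_duplicates(abstracts):
    """Keep the first copy of each abstract, comparing normalized text."""
    seen = set()
    unique = []
    for abstract in abstracts:
        key = normalize_abstract(abstract)
        if key not in seen:
            seen.add(key)
            unique.append(abstract)
    return unique
```

Because only the comparison key is normalized, the abstracts that are kept retain their original text. This catches differences in spacing, case, and punctuation, but not a stray extra letter; that would need fuzzy matching.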

Claim detection

At the moment, the claim detection algorithm is still a work in progress and could be improved further for better results.
