Opinion Generation using Abstractive Text Summarization

Prerequisites: To run the project, make sure you have Python 3.7 installed. Install the libraries listed below:

TensorFlow 1.13.1
scikit-learn (sklearn)
tqdm
NLTK
Keras
Pandas
gensim
numpy
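Before starting the long preprocessing runs, it can help to confirm that the libraries above are importable. The following is a minimal sketch, not part of the original project; the `check_dependencies` helper and the `REQUIRED` list are hypothetical names introduced here for illustration.

```python
import importlib

# Import names for the libraries listed above.
# Note that scikit-learn is imported as "sklearn".
REQUIRED = ["tensorflow", "sklearn", "tqdm", "nltk",
            "keras", "pandas", "gensim", "numpy"]


def check_dependencies(names=REQUIRED):
    """Map each import name to its version string, or None if it is missing."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every package exposes __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

Calling `check_dependencies()` and printing the entries whose value is `None` shows which libraries still need to be installed.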

Also download the GoogleNews-vectors-negative300.bin file, which contains the pretrained word2vec model we used. Several scripts require this file, so give its path in each of them.
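Since several scripts need the same file, one option is to resolve its path in a single place. The sketch below assumes a hypothetical environment variable `W2V_BIN_PATH` (the original scripts take the path by hand) and uses gensim's `KeyedVectors.load_word2vec_format` to read the binary file.

```python
import os

# Hypothetical environment variable, introduced here for illustration only.
W2V_ENV_VAR = "W2V_BIN_PATH"
DEFAULT_W2V_PATH = "GoogleNews-vectors-negative300.bin"


def resolve_w2v_path(path=None):
    """Return the word2vec file path, failing early if the file is absent."""
    candidate = path or os.environ.get(W2V_ENV_VAR, DEFAULT_W2V_PATH)
    if not os.path.isfile(candidate):
        raise FileNotFoundError(f"word2vec file not found: {candidate}")
    return candidate


def load_word_vectors(path=None, limit=None):
    """Load the pretrained vectors; `limit` caps how many words are read."""
    # Imported here so the path helper works even without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(
        resolve_w2v_path(path), binary=True, limit=limit
    )
```

The full file is large, so passing a `limit` such as `load_word_vectors(limit=200000)` may be a reasonable way to reduce memory use while experimenting.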

Getting started: Download the dataset given in the data folder. The initial datasets are yelp_academic_dataset_business and yelp_academic_dataset_review.

mult_prep.py: Preprocessing starts with these files. In mult_prep.py, give the paths of the two files mentioned above (lines 23 and 29) and run the script. It may take 3-4 hours to run and generates a new file, combined_data.txt, which contains all the reviews in a single file.
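The combining step can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in mult_prep.py: it assumes the Yelp files are JSON lines with `business_id`, `categories`, and `text` fields, that only businesses whose categories mention "Restaurants" are kept, and that each output line is a business ID, a tab, and the review text. The function names and the output layout are hypothetical.

```python
import json


def restaurant_ids(business_path):
    """Collect the IDs of businesses whose categories mention Restaurants."""
    ids = set()
    with open(business_path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            record = json.loads(line)
            categories = record.get("categories") or ""
            # Some dumps store categories as a list, others as one string.
            if isinstance(categories, list):
                categories = ", ".join(categories)
            if "Restaurants" in categories:
                ids.add(record["business_id"])
    return ids


def combine_reviews(business_path, review_path, out_path="combined_data.txt"):
    """Write one 'business_id<TAB>review text' line per restaurant review."""
    keep = restaurant_ids(business_path)
    written = 0
    with open(review_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue
            review = json.loads(line)
            if review["business_id"] in keep:
                # Collapse newlines and repeated spaces so each review is one line.
                text = " ".join(review["text"].split())
                dst.write(f"{review['business_id']}\t{text}\n")
                written += 1
    return written
```

Streaming the review file line by line keeps memory use low, since only the set of restaurant IDs is held in memory.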

write_to_individual_files.py: Use the combined_data.txt file generated in the previous step for the next stage of preprocessing. Give the path of this file in write_to_individual_files.py (line 14) and execute the script. It may take up to 6 hours on a typical laptop with 16 GB of RAM, a 6 GB GPU, and an Intel Core i7 processor. After the script finishes, close to 60,000 files will have been generated, each containing the reviews for one restaurant.
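As an illustration of this splitting step, here is a minimal sketch. It assumes each line of combined_data.txt holds a business ID, a tab, and the review text; the real layout produced by the project may differ, and the function name and the `.txt` naming of the output files are hypothetical.

```python
import os
from collections import defaultdict


def write_individual_files(combined_path, out_dir):
    """Group reviews by business ID and write one file per restaurant."""
    os.makedirs(out_dir, exist_ok=True)
    grouped = defaultdict(list)
    with open(combined_path, encoding="utf-8") as fh:
        for line in fh:
            business_id, sep, text = line.rstrip("\n").partition("\t")
            # Skip lines that lack the tab separator or have no review text.
            if sep and text:
                grouped[business_id].append(text)
    for business_id, reviews in grouped.items():
        target = os.path.join(out_dir, business_id + ".txt")
        with open(target, "w", encoding="utf-8") as out:
            out.write("\n".join(reviews) + "\n")
    return len(grouped)
```

Grouping in memory means each output file is opened only once, at the cost of holding all review text in RAM; for a very large combined file, appending to the per-restaurant files while streaming would trade speed for memory.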

cleaned_features.py: This script cleans the data in every individual review file and generates the cleaned features needed in later steps. In this file, provide the path of the individual review files generated in the previous step.
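The exact cleaning rules live in cleaned_features.py and are not described here. As a rough idea of what such a step typically involves, the sketch below lowercases a review, strips non-letter characters, and drops stopwords. The tiny `STOPWORDS` set is a stand-in for a fuller list such as the one shipped with NLTK, and `clean_review` is a hypothetical name.

```python
import re

# Small illustrative stopword set; a real run would likely use a fuller list.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "was", "were",
             "it", "this", "that", "to", "of", "in"}


def clean_review(text):
    """Return the lowercase word tokens of a review, minus stopwords."""
    text = text.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Drop stopwords and single-letter leftovers (e.g. the "s" from "it's").
    return [tok for tok in text.split()
            if tok not in STOPWORDS and len(tok) > 1]
```

For example, `clean_review("The food was GREAT, and it's cheap!")` yields `["food", "great", "cheap"]`.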

w2v_features.py: This script trains the word embedding model on our corpus. It is run as part of the birectional_lstm.py script.

unsupervised-summaries.py: This script generates summaries in an unsupervised way. Choose any file you want to summarize from the handwritten summaries folder and give its path (line 118). The output is a set of short summaries, from which you can pick the highest-scoring ones.
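Picking the highest-scoring summaries can be done by hand, or with a few lines of code. The sketch below assumes the candidates are available as (summary, score) pairs where a higher score is better; the actual output format of the script may differ, and `top_summaries` is a hypothetical helper.

```python
import heapq


def top_summaries(scored, k=3):
    """Return the k highest-scoring (summary, score) pairs, best first."""
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

For example, `top_summaries([("a", 0.2), ("b", 0.9), ("c", 0.5)], k=2)` returns `[("b", 0.9), ("c", 0.5)]`.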

birectional_lstm.py: This script is self-contained as long as all the underlying scripts are in the working directory.
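A quick check that the supporting scripts are present can save a failed run. This is a minimal sketch; the `EXPECTED_SCRIPTS` list is an assumption based on the script names mentioned in this README and should be adjusted to whatever the LSTM script actually imports.

```python
import os

# Assumed list, for illustration only; edit to match the real imports.
EXPECTED_SCRIPTS = ["w2v_features.py", "cleaned_features.py"]


def missing_scripts(directory=".", expected=EXPECTED_SCRIPTS):
    """Return the expected script names that are not found in `directory`."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]
```

An empty list from `missing_scripts()` means every expected script was found in the working directory.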

Data link: https://drive.google.com/open?id=15tW9ZoX9M1wbu-wGW7DmIeF8Mc1QqesC

SumedhSankhe/Opinion-Generation-using-Abstractive-Text-Summarization
