NLP: Measuring Semantic Sentence Similarity Using Baseline and Neural Models

This project measures semantic similarity between sentences by designing and implementing a range of Machine Learning and Deep Learning models for Natural Language Processing (NLP). The goal is to minimize each model's mean squared error and to evaluate the semantic distance between words or sentences in vector space using metrics such as cosine similarity and Word Mover's Distance.
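
As a rough illustration of the two quantities named above, the sketch below computes cosine similarity between two vectors and the mean squared error between gold and predicted scores using NumPy. The function names, vectors, and scores are made up for illustration and are not taken from the project's notebooks.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0
    return float(np.dot(u, v) / denom)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between gold and predicted scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Toy vectors standing in for sentence embeddings (hypothetical values).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a
c = [-3.0, 0.0, 1.0]  # orthogonal to a

sim_ab = cosine_similarity(a, b)
sim_ac = cosine_similarity(a, c)

# Toy gold and predicted similarity scores (hypothetical values).
mse = mean_squared_error([1.0, 2.0], [1.5, 2.5])

print(sim_ab, sim_ac, mse)
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score close to 0, regardless of their lengths.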

Pre-Requisites

To get started with this project, you will need the following:

Python Environment: Python 3, with an IDE such as PyCharm or a distribution such as Anaconda.
Jupyter Notebook: For running and experimenting with the code.
PyTorch: Latest version (GPU not required).
Foundational Knowledge: Basic understanding of Machine Learning, Deep Learning, and NLP.
Required Python Packages:
- NLTK
- scikit-learn (imported as sklearn)
- numpy
- pandas
- scipy
- sent2vec
Embeddings:
- BioSentVec embedding (required for 4BioSentvec.ipynb): Download from NCBI.
- Bio Word2Vec embedding (required for 3Word2Vec.ipynb): Download from Bio NLP Lab.
Dataset:
- Request access from Harvard DBMI Portal.
- Task 2 dataset available at OHNLP 2018.
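
One way to confirm the environment before opening the notebooks is a quick import check. The sketch below is only a convenience and is not part of the repository; it assumes the import names nltk, sklearn, numpy, pandas, scipy, sent2vec, and torch correspond to the packages listed above.

```python
from importlib.util import find_spec

# Import names assumed to match the prerequisites listed above.
REQUIRED_MODULES = ["nltk", "sklearn", "numpy", "pandas", "scipy", "sent2vec", "torch"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```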

Research Papers

The following research papers provide valuable context and insights for this project:

Wang Y, Afzal N, Liu S, Rastegar-Mojarad M, Wang L, Shen F, Fu S, Liu H. Overview of the BioCreative/OHNLP Challenge 2018 Task 2: Clinical Semantic Textual Similarity. Proceedings of the BioCreative/OHNLP Challenge, 2018.
Wang Y, Afzal N, Fu S, Wang L, Shen F, Rastegar-Mojarad M, Liu H. MedSTS: A Resource for Clinical Semantic Textual Similarity. Language Resources and Evaluation, 2018.
Chen Q, Peng Y, Lu Z. BioSentVec: Creating Sentence Embeddings for Biomedical Texts. Proceedings of the 7th IEEE International Conference on Healthcare Informatics, 2019.

Deployment

Docker Image: Available at /aswaths/semantic_similarity:v3
Streamlit Application: Semantic Similarity Streamlit App
Inference API: Hosted on Render - Semantic Similarity API
