This project implements an n-gram language model with k-smoothing for next-word prediction, and evaluates the model using perplexity.
An n-gram language model is a probabilistic language model that predicts the next item in a sequence, in this case the next word in a sentence. This implementation uses k-smoothing to assign nonzero probability to n-grams that never occur in the training data, and measures model quality with the perplexity metric.
- Predict the next word based on previous words using n-gram probabilities.
- Implement k-smoothing to handle unseen n-grams (see the sketch after this list).
- Evaluate model performance with perplexity scores.
- Support for n-gram and (n+1)-gram counts, so the model can condition on the previous n words.
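Concretely, add-k smoothing adds a constant k to every count, so the estimated probability becomes P(w | context) = (count(context, w) + k) / (count(context) + k * |V|), where |V| is the vocabulary size; unseen n-grams therefore receive a small but nonzero probability. The function below is a minimal sketch of that estimate, assuming the counts are dicts keyed by token tuples; the names mirror those used in the notebook, but the body is illustrative rather than the notebook's exact code.

```python
def estimate_probability(word, previous_n_gram, n_gram_counts,
                         n_plus1_gram_counts, vocabulary, k=1.0):
    """k-smoothed estimate of P(word | previous_n_gram).

    A sketch only: counts are assumed to be dicts mapping tuples of
    tokens to integer counts, and `vocabulary` a list of known words.
    """
    previous_n_gram = tuple(previous_n_gram)
    # How often the context itself was seen (0 if never).
    context_count = n_gram_counts.get(previous_n_gram, 0)
    # How often the context was followed by `word`.
    next_count = n_plus1_gram_counts.get(previous_n_gram + (word,), 0)
    # Add-k smoothing: +k in the numerator, +k*|V| in the denominator,
    # so unseen n-grams still receive a small nonzero probability.
    return (next_count + k) / (context_count + k * len(vocabulary))
```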
To use this project, follow these steps:
Clone the repository:

```bash
git clone https://github.com/RAW-si18/AUTO-COMPLETE-SYSTEM.git
```
To use the n-gram model, open and run the Jupyter notebook `main.ipynb`. The notebook includes the following key functions:

- `estimate_probabilities`: Estimate the probability of each candidate next word.
- `suggest_next_word`: Suggest the most likely next word based on previous tokens.
- `compute_perplexity`: Calculate the perplexity of the model on a test set.
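For instance, `estimate_probabilities` could be used as below. This is an assumed usage: the signature is taken to mirror `suggest_next_word`'s (see the examples further down), and the return value is assumed to be a dict mapping each candidate word to its smoothed probability.

```python
previous_tokens = ['i', 'like']

# Assumed: returns a dict mapping each word in the vocabulary to its
# k-smoothed probability given the previous tokens.
probabilities = estimate_probabilities(previous_tokens, n_gram_counts,
                                       n_plus1_gram_counts, vocabulary, k=1.0)

# Show the three most likely next words.
top_three = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)[:3]
print(top_three)
```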
- Open the Jupyter notebook:

```bash
jupyter notebook main.ipynb
```
- Execute the cells in the notebook to train the model, predict the next word, and evaluate it using perplexity.
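Under the hood, training an n-gram model amounts to counting how often each n-gram and (n+1)-gram occurs in the tokenized corpus; those counts are what the examples below consume. The helper below is a minimal sketch of that counting step (the function name `count_n_grams` and the start/end padding tokens are assumptions, not necessarily the notebook's exact code):

```python
def count_n_grams(sentences, n, start_token='<s>', end_token='<e>'):
    """Count every n-gram in a list of tokenized sentences.

    Each sentence is padded with n start tokens and one end token,
    then every window of n consecutive tokens is tallied.
    """
    counts = {}
    for sentence in sentences:
        tokens = [start_token] * n + sentence + [end_token]
        for i in range(len(tokens) - n + 1):
            n_gram = tuple(tokens[i:i + n])
            counts[n_gram] = counts.get(n_gram, 0) + 1
    return counts

# Example: counts for a model conditioning on the two previous words.
sentences = [['i', 'like', 'to', 'play'], ['i', 'like', 'football']]
n_gram_counts = count_n_grams(sentences, 2)        # contexts
n_plus1_gram_counts = count_n_grams(sentences, 3)  # context + next word
```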
Here are a few example usages of the functions provided in the notebook:
- Predict the next word:

```python
previous_tokens = ['i', 'like']
suggestion, probability = suggest_next_word(previous_tokens, n_gram_counts,
                                            n_plus1_gram_counts, vocabulary, k=1.0)
print(f"The suggested next word is '{suggestion}' with a probability of {probability:.4f}")
```
- Compute perplexity:

```python
test_sentence = ['i', 'like', 'to', 'play', 'football']
perplexity = compute_perplexity(test_sentence, n_gram_counts,
                                n_plus1_gram_counts, vocabulary, k=1.0)
print(f"The perplexity of the test sentence is {perplexity:.4f}")
```
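Perplexity is the geometric mean of the inverse conditional probabilities, PP = exp(-(1/N) * sum_i log P(w_i | context_i)), so lower values mean the model finds the test text more predictable. The sketch below (a hypothetical `perplexity_sketch`, not the notebook's `compute_perplexity`) shows one way to compute it under the same smoothing and padding assumptions as the sketches above:

```python
import math

def perplexity_sketch(sentence, n_gram_counts, n_plus1_gram_counts,
                      vocabulary, k=1.0, n=2):
    """Perplexity of `sentence` under a k-smoothed model that
    conditions on the previous n tokens. Lower is better."""
    tokens = ['<s>'] * n + sentence + ['<e>']
    log_prob_sum = 0.0
    for i in range(n, len(tokens)):
        context = tuple(tokens[i - n:i])
        word = tokens[i]
        context_count = n_gram_counts.get(context, 0)
        next_count = n_plus1_gram_counts.get(context + (word,), 0)
        # k-smoothing keeps every probability strictly positive.
        prob = (next_count + k) / (context_count + k * len(vocabulary))
        log_prob_sum += math.log(prob)
    return math.exp(-log_prob_sum / (len(tokens) - n))
```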