SLM-BigramLanguageModel-

Colab notebook showcasing the implementation of an SLM (small language model): a character-level bigram language model.

1. Environment Setup: I set up the environment by importing the necessary libraries and checking for GPU availability. The block_size, batch_size, and training hyperparameters were defined, and the device is set to 'cuda' when a GPU is available and to 'cpu' otherwise (a sketch of each step follows this list).

2. Data Preprocessing: I loaded the text data from "wizard_of_oz.txt". The unique characters in the text were sorted and assigned numerical indices, and I created encode and decode functions for converting between characters and indices. The encoded data was stored in a PyTorch tensor (see the preprocessing sketch after this list).

3. Model Definition: I defined a bigram language model using PyTorch. The model consists of a single embedding layer (token_embedding_table). The forward method computes logits from the input indices and, when targets are provided, the cross-entropy loss; the generate method is used for text generation (see the model sketch after this list).

4. Training Setup: I instantiated the model, set up the AdamW optimizer, and defined a training loop: get a batch of data (get_batch), compute the logits and loss, backpropagate, and update the model parameters. The training and validation losses are evaluated and printed periodically (see the training sketch after this list).

5. Text Generation: I initialized a context tensor and used the trained model to generate text. At each step, the generate method predicts the next character from the current context (see the generation sketch below).
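
Below is a minimal sketch of the environment setup from step 1, assuming PyTorch as the only required library; the specific hyperparameter values are illustrative assumptions, not necessarily the ones used in the notebook.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Training hyperparameters (illustrative values).
block_size = 8        # maximum context length fed to the model
batch_size = 32       # number of sequences processed in parallel
max_iters = 10000     # total optimization steps
eval_interval = 500   # how often to report train/val loss
learning_rate = 3e-4  # AdamW step size
```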
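
A sketch of the preprocessing in step 2. The file name comes from the notebook; the mapping-dictionary names and the 90/10 train/validation split are assumptions.

```python
import torch

# Read the raw text and build a character-level vocabulary.
with open('wizard_of_oz.txt', 'r', encoding='utf-8') as f:
    text = f.read()

chars = sorted(set(text))
vocab_size = len(chars)

# Map characters to integer indices and back (dictionary names are assumptions).
string_to_int = {ch: i for i, ch in enumerate(chars)}
int_to_string = {i: ch for i, ch in enumerate(chars)}

def encode(s):
    return [string_to_int[c] for c in s]

def decode(indices):
    return ''.join(int_to_string[i] for i in indices)

# Encode the whole corpus as a tensor and split it (assumed 90/10) into train/val.
data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]
```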
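
A sketch of the model from step 3. The token_embedding_table, forward, and generate names follow the description above; the class name and the remaining details are a plausible reconstruction rather than the exact notebook code.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class BigramLanguageModel(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # Row i of the table holds the logits for the token that follows token i.
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, index, targets=None):
        # index and targets are (B, T) tensors of token indices.
        logits = self.token_embedding_table(index)  # (B, T, vocab_size)
        if targets is None:
            loss = None
        else:
            B, T, C = logits.shape
            loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss

    def generate(self, index, max_new_tokens):
        # Autoregressively sample one character at a time.
        for _ in range(max_new_tokens):
            logits, _ = self(index)
            logits = logits[:, -1, :]                      # logits for the last position
            probs = F.softmax(logits, dim=-1)
            next_index = torch.multinomial(probs, num_samples=1)
            index = torch.cat((index, next_index), dim=1)  # append to the context
        return index
```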
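
A sketch of the training setup in step 4, continuing from the previous sketches (it assumes train_data, val_data, the hyperparameters, and the model class defined above); the estimate_loss helper and its eval_iters value are assumptions standing in for the periodic loss evaluation described.

```python
import torch

def get_batch(split):
    # Sample batch_size random context/target windows of length block_size.
    data = train_data if split == 'train' else val_data
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x.to(device), y.to(device)

@torch.no_grad()
def estimate_loss(model, eval_iters=200):
    # Average the loss over a few batches for each split (helper name is an assumption).
    out = {}
    model.eval()
    for split in ('train', 'val'):
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            xb, yb = get_batch(split)
            _, loss = model(xb, yb)
            losses[k] = loss.item()
        out[split] = losses.mean().item()
    model.train()
    return out

model = BigramLanguageModel(vocab_size).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

for step in range(max_iters):
    if step % eval_interval == 0:
        losses = estimate_loss(model)
        print(f"step {step}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")

    xb, yb = get_batch('train')
    _, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```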
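
Finally, a sketch of the text generation in step 5; the single-zero-token starting context and the 500-character generation length are assumptions.

```python
import torch

# Start from a minimal context (a single token with index 0) and let the trained
# model extend it; decode() maps the sampled indices back to characters.
context = torch.zeros((1, 1), dtype=torch.long, device=device)
generated = model.generate(context, max_new_tokens=500)
print(decode(generated[0].tolist()))
```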

Additional Notes

The model is trained to predict the next character in a sequence given its context, by minimizing the cross-entropy loss between the predicted logits and the actual next characters in the training data. The bigram model only captures dependencies between consecutive characters, which is the simplest form of language modeling. The generated text reflects the patterns and structure learned from the training data.

Final Thoughts

This notebook provides a concise example of building and training a bigram language model with PyTorch for text generation. The training loop and text generation steps showcase the practical application of such models.
