A project exploring LSTM and Transformer-like models to generate text with implementations in Python & Pytorch.
Welcome to TolkienFormer, a personal project that dives into the task of text generation. This project explores LSTMs and Transformer-like models to generate text reminiscent of J.R.R. Tolkien's The Lord of the Rings. While the models aim to produce reasonable output, the primary goal is not state-of-the-art performance but to build proficiency with Transformers, LSTMs, and PyTorch.
Text generation poses significant challenges in terms of data and computational resources. Thus, TolkienFormer employs the technique of Teacher Forcing to stabilize and expedite training and testing.
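The idea behind teacher forcing can be sketched in a few lines of PyTorch: instead of feeding the model its own previous prediction at each step, the ground-truth token is used as the next input, so the loss can be computed over the whole sequence in one pass. This is a minimal illustrative sketch, not the repository's actual code; all names and dimensions here are made up.

```python
import torch
import torch.nn as nn

# Toy dimensions, purely illustrative
vocab_size, embed_dim, hidden_dim = 50, 16, 32
embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, vocab_size)
criterion = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (1, 10))   # one dummy token sequence
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # teacher forcing: ground-truth
                                                 # inputs, shifted-by-one targets

hidden_states, _ = lstm(embedding(inputs))
logits = head(hidden_states)                     # (1, 9, vocab_size)
loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                  # gradients for one training step
```

Because every step is conditioned on the true previous token rather than a possibly wrong prediction, gradients are less noisy and training converges faster.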
An example of the output generated by TolkienFormer after training and fine-tuning a model can be seen below:
First, clone the repo, create a conda environment, and add the project root to your PYTHONPATH to enable local imports:
# 1. Clone this repository
git clone https://github.com/LuisWinckelmann/TolkienFormer.git
cd TolkienFormer
# 2. Setup conda env
conda create --name tolkienformer
conda activate tolkienformer
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# 3. Enable local imports by adding the root to your pythonpath:
# 3a) Linux:
export PYTHONPATH=$PYTHONPATH:$PWD
# 3b) Windows:
set PYTHONPATH=%PYTHONPATH%;%cd%
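To verify the environment was set up correctly, a quick sanity check (not part of the repo, just a suggestion) is to confirm that PyTorch imports and report whether CUDA is visible:

```python
# Sanity check: PyTorch is installed and CUDA visibility is reported
import torch

print(torch.__version__)          # e.g. "2.x.x"
print(torch.cuda.is_available())  # True if the GPU setup worked
```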
Afterwards, you need to prepare the data to be able to train the models. For the specifics, please follow the instructions below.
For the data, you can use any *.txt file that you want. In the current setup, the file will be parsed row-wise.
The example dataset chapter1, provided in src/data/chapter1,
includes chapter 1 of Tolkien's The Fellowship of the Ring, obtained from here.
To use your own dataset simply copy the text file(s) into src/data
and run:
cd src/data
python data_preparation.py
If your data is stored somewhere other than src/data,
you can use --path_to_folder_with_txt_filess
to point to the root folder containing the .txt files.
If your data has another format you'll need to adjust your custom dataset in src/utils/datasets.py
accordingly.
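A custom dataset for another format would follow the usual PyTorch Dataset pattern. The sketch below shows the row-wise parsing described above; the class name and constructor argument are illustrative, not the repo's actual API in src/utils/datasets.py.

```python
from torch.utils.data import Dataset

class LineDataset(Dataset):
    """Illustrative row-wise text dataset: one sample per non-empty line."""

    def __init__(self, path):
        with open(path, encoding="utf-8") as f:
            # Keep each non-empty line of the .txt file as one sample
            self.lines = [line.strip() for line in f if line.strip()]

    def __len__(self):
        return len(self.lines)

    def __getitem__(self, idx):
        return self.lines[idx]
```

Adapting to a different format mostly means changing how `__init__` splits the raw text into samples; `__len__` and `__getitem__` stay the same.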
To run training of the LSTM run:
cd src/models/lstm
python train.py
To run training of the transformer-like model run:
cd src/models/transformer
python train.py
All currently available hyperparameters can be changed in the corresponding config.json files located in src/models/lstm
or src/models/transformer,
respectively.
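Such a config.json typically holds the training hyperparameters as plain key-value pairs. The fragment below is only a hypothetical illustration; the actual keys and values are defined in the config.json files in the repo.

```json
{
  "hidden_size": 256,
  "num_layers": 2,
  "learning_rate": 0.001,
  "batch_size": 32,
  "epochs": 150
}
```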
After training, you can generate model output, as shown in the description, by running:
# LSTM model
cd src/models/lstm
python test.py
# Transformer-like model
cd src/models/transformer
python test.py
# Optional Parameters to edit when running test.py:
# --num_sentences 5
# --model_epoch 150
To specify the number of predicted sentences, use the --num_sentences
flag; to select one of the saved checkpoints, use the --model_epoch
flag.
Other parameters for the evaluation can be changed in the model's config.json.
- Switch from printing to logging
- Write description with a showcase
- Publish some additional results
- Confirm setup and functionality works and README is clearly written
- Get rid of code duplication by merging the LSTM & Transformer folders, specifically train.py & test.py
- Easier setup via shell script(s)
Distributed under the MIT License. See LICENSE.txt
for more information.
Luis Winckelmann - [email protected]
Project Link: https://github.com/LuisWinckelmann/TolkienFormer