This repository provides a simple implementation of an attention mechanism, both with and without PyTorch. The primary goal is to understand the forward pass of this architecture; minimal sketches of that forward pass are included after the lists below.
- Embedding Layer: Initialized randomly using a normal distribution.
- Model Dimension: `dim_model = 64`
- Sequence Length: `seq_length = 10`
- Vocabulary Size: `vocab_size = 100`
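
As a point of reference for how these pieces fit together, here is a minimal NumPy sketch of a single-head scaled dot-product attention forward pass under the configuration above (the non-PyTorch path). The function and weight names (`attention_forward`, `w_q`, `w_k`, `w_v`) are hypothetical and not necessarily those used in this repository.

```python
import numpy as np

# Hyperparameters from the configuration above
vocab_size = 100
seq_length = 10
dim_model = 64

rng = np.random.default_rng(0)

# Embedding layer: initialized randomly from a normal distribution
embedding = rng.normal(size=(vocab_size, dim_model))

# Query/key/value projection matrices (hypothetical names, also normally initialized)
w_q = rng.normal(size=(dim_model, dim_model))
w_k = rng.normal(size=(dim_model, dim_model))
w_v = rng.normal(size=(dim_model, dim_model))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_forward(token_ids):
    """Single-head scaled dot-product self-attention, forward pass only."""
    x = embedding[token_ids]                  # (seq_length, dim_model)
    q, k, v = x @ w_q, x @ w_k, x @ w_v      # project to queries, keys, values
    scores = q @ k.T / np.sqrt(dim_model)    # (seq_length, seq_length)
    weights = softmax(scores, axis=-1)       # each row of weights sums to 1
    return weights @ v                        # (seq_length, dim_model)


tokens = rng.integers(0, vocab_size, size=seq_length)
out = attention_forward(tokens)
print(out.shape)  # (10, 64)
```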
Requirements:
- Python 3.9
- PyTorch (optional)
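
For the optional PyTorch path, a comparable forward-pass sketch might look like the following. The `SingleHeadAttention` module name and its layer names are assumptions for illustration, not the repository's actual API; `nn.Embedding` already initializes its weights from a standard normal distribution by default.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, seq_length, dim_model = 100, 10, 64


class SingleHeadAttention(nn.Module):
    """Minimal single-head self-attention block (forward pass only)."""

    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim_model)  # N(0, 1) init by default
        self.w_q = nn.Linear(dim_model, dim_model, bias=False)
        self.w_k = nn.Linear(dim_model, dim_model, bias=False)
        self.w_v = nn.Linear(dim_model, dim_model, bias=False)

    def forward(self, token_ids):
        x = self.embedding(token_ids)                       # (batch, seq_length, dim_model)
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)     # queries, keys, values
        scores = q @ k.transpose(-2, -1) / dim_model ** 0.5  # scaled dot products
        weights = F.softmax(scores, dim=-1)                  # attention weights
        return weights @ v                                   # (batch, seq_length, dim_model)


tokens = torch.randint(0, vocab_size, (1, seq_length))
out = SingleHeadAttention()(tokens)
print(out.shape)  # torch.Size([1, 10, 64])
```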
This project is licensed under the MIT License.
