RWKV_UNKNOWN

This is my implementation of the RWKV language model.

cd src
deepspeed main.py --deepspeed --deepspeed_config=./configs/ds_config.config

TODO

  • jsonl data loading
  • ckpt saving/loading
  • logging dump
  • config dump
  • megatron
  • rwkv5
  • attention mask
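
For the jsonl data loading item, one possible shape is sketched below. This is only an illustration: the function name `iter_jsonl` and the `text` field are assumptions, not names from this repository, and the real loader may need tokenization and batching on top.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: str, field: str = "text") -> Iterator[str]:
    """Yield the `field` value from each non-empty line of a JSONL file.

    `field` defaults to "text", which is an assumed key, not a repo convention.
    """
    with Path(path).open("r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                # Report the offending line so bad records are easy to find.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from err
            yield record[field]
```

Reading lazily, one line at a time, keeps memory use flat for large corpora; a caller can wrap the iterator in whatever dataset class the training loop expects.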

Reference

  • RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced as "RwaKuv", from 4 major params: R W K V)
  • Data preprocessor from TrainChatGalRWKV
  • neromous for the initial code.
  • RWKV-infctx-trainer for the model initialization code.