MPP-Qwen14B

It seems the project was reposted by 爱可可 🥹; thank you all for your interest!

For MiniGPT4Qwen, see MiniGPT4Qwen_README.md

MPP-Qwen14B

Zhihu blog: https://zhuanlan.zhihu.com/p/687106694

MPP-Qwen-14B now supports DeepSpeed pipeline-parallel training: pretraining on 2 RTX4090 24GB GPUs and SFT on 6 RTX4090 24GB GPUs!

SFT weights (Baidu Netdisk):

Link: https://pan.baidu.com/s/1Jy_zlQTBfSd9WmqZFFkBAg?pwd=0930
Extraction code: 0930

Quick Start

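You can use automap_inference.ipynb for simple inference.

For the weight files to prepare, see the Model Download section below, plus the SFT file from Baidu Netdisk.

Then you can quickly try a conversation with automap_inference.ipynb.

Remember to change checkpoint_path to the path of the SFT weights you downloaded from Baidu Netdisk.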

Introduction

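LLaVA1.5, released last November, achieved very strong performance with a manageable amount of data (558K Pretrain + 665K SFT), using Vicuna-v1.5-13B as the base model. It has since been widely followed in both academia and industry.

Its GitHub README shows that 24GB consumer GPUs (RTX3090, RTX4090, etc.) can only handle training with Vicuna-v1.5-7B as the base model, and the released configuration for that case is a LoRA one.

So that a tight budget does not limit imagination, this project builds on the DeepSpeed pipeline-parallel framework from MiniGPT4Qwen-14B and introduces MPP-Qwen14B (Multimodal Pipeline Parallel-Qwen14B). Both the Pretrain stage, which trains only the linear layer, and the SFT stage, which trains all LLM parameters, run entirely on RTX4090 24GB GPUs.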

Related projects

Zhihu blog: MiniGPT4Qwen-14B
Zhihu blog: MiniGPT4Qwen
Clean, flexible Trainer: https://github.com/Coobiw/MiniGPT4Qwen/tree/master/lavis_trainer_cleaned
- Zhihu: https://zhuanlan.zhihu.com/p/670572461
grad-checkpoint + amp tutorials: https://github.com/Coobiw/MiniGPT4Qwen/tree/master/amp_and_grad-checkpointing
- Zhihu: https://zhuanlan.zhihu.com/p/671165275?
deepspeed tutorials: https://github.com/Coobiw/MiniGPT4Qwen/tree/master/deepspeed_tutorials
- Zhihu: https://zhuanlan.zhihu.com/p/673359684

Required compute resources

MPP-Qwen14B Pretrain: 2 x RTX 4090 24GB
MPP-Qwen14B SFT: 6 x RTX 4090 24GB
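
As a rough sanity check on why more than one 24GB card is needed, the sketch below estimates the memory taken by the LLM weights alone. The parameter count (about 14e9) and 2 bytes per parameter (fp16/bf16) are assumptions made for illustration, not figures from this repository; activations, gradients, optimizer state, the ViT and the Q-Former are all ignored, so real usage is higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def per_gpu_weight_gb(num_params: float, num_gpus: int, bytes_per_param: int = 2) -> float:
    """Weights per GPU if the layers are split evenly across pipeline stages."""
    return weight_memory_gb(num_params, bytes_per_param) / num_gpus


if __name__ == "__main__":
    params = 14e9  # assumed: roughly 14B parameters
    print(weight_memory_gb(params))      # total weight memory in GB
    print(per_gpu_weight_gb(params, 2))  # per GPU with 2 pipeline stages
```

Under these assumptions the weights alone exceed a single 24GB card, while an even split over two pipeline stages leaves headroom on each card.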

TODO LIST

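Support model-parallel inference (uses transformers' device_map="auto")
Release the SFT weights (Hugging Face or Baidu Netdisk)
Release the pretrain weights
Release the processed pretrain and SFT dataset JSON files
Release the pretrain and SFT code and configs
Support DeepSpeed pipeline parallelism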

Installation

conda create -n minigpt4qwen python=3.8
conda activate minigpt4qwen
pip install -e .

Getting Started

Model download

Download the model weights and place them all under cache/ckpt

mkdir cache
cd cache
mkdir ckpt
mkdir dataset
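
If you prefer to create the same layout from Python (for example inside a setup script), a minimal sketch follows. The function name `make_cache_dirs` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def make_cache_dirs(root: str = ".") -> list:
    """Create cache/ckpt and cache/dataset under root; return the created paths."""
    created = []
    for sub in ("ckpt", "dataset"):
        path = Path(root) / "cache" / sub
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


if __name__ == "__main__":
    for p in make_cache_dirs():
        print(p)
```

Unlike the plain mkdir commands, this can be re-run safely because existing directories are left untouched.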

1. Download the BLIP2 weights

(a) eva vit-g

eva_vit_g.pth

wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth

(b) bert-base-uncased

From Hugging Face; only the following files need to be downloaded

(c) blip2_pretrained_flant5xxl

blip2_pretrained_flant5xxl.pth

wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_flant5xxl.pth

2. Download the Qwen-14B-Chat weights

Qwen-14B-chat on Hugging Face

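3. Get the checkpoint from the pretrain stage (optional, if you want to run SFT directly on top of it)

(Recommended location: lavis/output/pp_14b/pretrain)

The checkpoint is available in this repository's release and can be downloaded directly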

wget https://github.com/Coobiw/MiniGPT4Qwen/releases/download/MPP-Qwen14B_ckpt-and-data/ckpt-and-data.zip
unzip ckpt-and-data.zip

Directory structure:

├── cache
│   ├── ckpt
│   │   ├── bert-base-uncased
│   │   ├── blip2
│   │   │   ├── blip2_pretrained_flant5xxl.pth
│   │   ├── eva
│   │   │   ├── eva_vit_g.pth
│   │   ├── Qwen-14B-chat
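
Before training or inference, it can help to confirm that the files are where the tree above expects them. The following is a minimal sketch; the helper `missing_paths` is hypothetical and only checks for existence, not for file integrity.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED = [
    "cache/ckpt/bert-base-uncased",
    "cache/ckpt/blip2/blip2_pretrained_flant5xxl.pth",
    "cache/ckpt/eva/eva_vit_g.pth",
    "cache/ckpt/Qwen-14B-chat",
]


def missing_paths(root: str = ".", expected=EXPECTED) -> list:
    """Return the expected paths that do not exist under root."""
    return [rel for rel in expected if not (Path(root) / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All expected checkpoint paths are present.")
```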

SFT weights (Baidu Netdisk):

Link: https://pan.baidu.com/s/1Jy_zlQTBfSd9WmqZFFkBAg?pwd=0930
Extraction code: 0930

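Training

Data preparation

MPP-Qwen14B uses LLaVA's pretraining and instruction-tuning datasets, so the overall data acquisition process largely follows the instructions in the LLaVA repository.

Pretraining data: the 558K subset of the LAION-CC-SBU dataset with BLIP captions. Download images.zip and blip_laion_cc_sbu_558k.json from the Hugging Face link.

Instruction-tuning data: download the images from COCO train2017: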

wget http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip

Annotation JSON files in MPP-Qwen14B format: available in this repository's release (https://github.com/Coobiw/MiniGPT4Qwen/releases/tag/MPP-Qwen14B_ckpt-and-data):

wget https://github.com/Coobiw/MiniGPT4Qwen/releases/download/MPP-Qwen14B_ckpt-and-data/ckpt-and-data.zip
unzip ckpt-and-data.zip

Finally, place the datasets under ./cache/dataset, organized as follows:

├── cache
│   └── dataset
│       ├── llava_pretrain
│       │   ├── blip_laion_cc_sbu_558k
│       │   │   ├── images
│       │   │   ├── llava_pretrain_minigpt4qwen_format.json
│       ├── llava_instuct
│       │   ├── coco
│       │   │   ├── train2017
│       │   ├── llava_instruction_100k.json

Token count analysis of the data

python tokenize_analysis.py

Based on this analysis, max_txt_len in the training config files is set to 256 for pretrain and 512 for SFT.
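
The contents of tokenize_analysis.py are not reproduced here. As a hedged illustration of the kind of analysis that can motivate a max_txt_len choice, the sketch below measures what fraction of samples fit under a given cap without truncation. The helpers `token_lengths` and `coverage` are hypothetical, and whitespace splitting is only a stand-in tokenizer; a real analysis would presumably use the Qwen tokenizer.

```python
from typing import Callable, Iterable, List


def token_lengths(texts: Iterable[str], tokenize: Callable[[str], List]) -> List[int]:
    """Number of tokens per text under the given tokenizer."""
    return [len(tokenize(t)) for t in texts]


def coverage(lengths: List[int], max_len: int) -> float:
    """Fraction of samples whose length is at most max_len (not truncated)."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n <= max_len) / len(lengths)


if __name__ == "__main__":
    # Whitespace splitting is a stand-in tokenizer, used only for illustration.
    samples = ["a photo of a cat", "two dogs playing in the park near a lake"]
    lengths = token_lengths(samples, str.split)
    print(lengths, coverage(lengths, 256))
```

A cap is reasonable when coverage is close to 1.0; a lower value means a noticeable share of samples would be truncated.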

Run train_pipeline.py for pipeline-parallel training

Pretrain:

python -m torch.distributed.run --nproc_per_node=2 train_pipeline.py --cfg-path lavis/projects/pp_qwen14b/pretrain_pp.yaml --num-stages 2

SFT:

python -m torch.distributed.run --nproc_per_node=6 train_pipeline.py --cfg-path lavis/projects/pp_qwen14b/sft_100k_pp.yaml --num-stages 6
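
In both commands --nproc_per_node equals --num-stages, so each GPU hosts exactly one pipeline stage. As a hedged sketch of that relationship: DeepSpeed's pipeline module expects the number of processes to be divisible by the number of stages, and the quotient is the data-parallel degree. The helper name below is hypothetical and does not come from train_pipeline.py.

```python
def data_parallel_degree(world_size: int, num_stages: int) -> int:
    """Data-parallel replicas left after assigning one process per stage."""
    if num_stages <= 0 or world_size <= 0:
        raise ValueError("world_size and num_stages must be positive")
    if world_size % num_stages != 0:
        raise ValueError(
            f"num_stages ({num_stages}) must divide world size ({world_size})"
        )
    return world_size // num_stages


if __name__ == "__main__":
    print(data_parallel_degree(2, 2))  # pretrain command above
    print(data_parallel_degree(6, 6))  # SFT command above
```

With the settings above there is a single pipeline replica; launching more processes than stages would add data-parallel replicas, each needing its own full set of GPUs.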

Converting DeepSpeed weights to pth files

Pretrain stage

(Converts only the linear projection layer)

python pipe_proj2pth.py --ckpt-dir lavis/output/pp_14b/pretrain/global_stepxxx

After conversion, the model file is saved under ckpt_dir as model.pth

SFT stage

(Converts the projection layer and all LLM parameters)

python pipemodel2pth.py --ckpt-dir lavis/output/pp_14b/sft/global_stepxxx

After conversion, the model file is saved under ckpt_dir as unfreeze_llm_model.pth
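
The global_stepxxx part of the paths above depends on where training stopped. A small helper for picking the highest-numbered step directory is sketched below. It assumes step directories are named global_step followed by digits, as in the paths used in this README; the function name `latest_step_dir` is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

STEP_DIR = re.compile(r"^global_step(\d+)$")


def latest_step_dir(output_dir: str) -> Optional[Path]:
    """Return the global_step<N> subdirectory with the largest N, or None."""
    best = None
    best_step = -1
    for child in Path(output_dir).iterdir():
        match = STEP_DIR.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best, best_step = child, int(match.group(1))
    return best


if __name__ == "__main__":
    out = Path("lavis/output/pp_14b/sft")
    print(latest_step_dir(str(out)) if out.is_dir() else "no output directory yet")
```

The returned path can then be passed as --ckpt-dir to the conversion scripts.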

Inference

Running the command-line demo

Single-GPU inference (requires at least 32GB of GPU memory):

python cli_demo.py --model-type qwen14b_chat -c lavis/output/pp_14b/sft/global_step296/unfreeze_llm_model.pth

Multi-GPU (the LLM is loaded with device_map="auto"; requires two or more GPUs whose combined memory exceeds 32GB; this project used 2 x 24GB RTX 4090 on AutoDL):

python cli_demo.py --model-type qwen14b_chat -c lavis/output/pp_14b/sft/global_step296/unfreeze_llm_model.pth --llm_device_map "auto"

Figure: GPU memory usage when using auto-map.

CPU (extremely slow):

python cli_demo.py -c xxxxxx --model-type qwen14b_chat --cpu-only # if you have enough GPU memory (>32GB), you can omit --cpu-only

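After launching, enter the image path when prompted; the conversation then begins.

Common commands:

:help show help

:clear clear the current terminal screen

:clh clear the conversation history (the input image is kept)

:his show the conversation history

:img show the path of the input image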
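
To make the behaviour of the colon commands listed above concrete, here is an illustrative dispatch function. It is a hypothetical sketch and not the implementation in cli_demo.py; the history format (a list of question/answer pairs) and the returned strings are assumptions.

```python
from typing import List, Tuple


def handle_command(line: str, history: List[Tuple[str, str]], image_path: str) -> str:
    """Handle one ':' command; return the text to show the user."""
    cmd = line.strip()
    if cmd == ":help":
        return ":help :clear :clh :his :img"
    if cmd == ":clear":
        return "\033[2J\033[H"  # ANSI escape: clear screen, move cursor home
    if cmd == ":clh":
        history.clear()  # the image path is deliberately left untouched
        return "history cleared"
    if cmd == ":his":
        return "\n".join(f"user: {q}\nmodel: {a}" for q, a in history) or "(empty)"
    if cmd == ":img":
        return image_path
    return f"unknown command: {cmd}"
```

Note that :clh empties the history in place while the image path stays the same, matching the description above.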

Running the Gradio WebUI demo

Single-GPU inference (requires at least 32GB of GPU memory):

python webui_demo.py --model-type qwen14b_chat -c lavis/output/pp_14b/sft/global_step296/unfreeze_llm_model.pth

Multi-GPU (the LLM is loaded with device_map="auto"; requires two or more GPUs whose combined memory exceeds 32GB; this project used 2 x 24GB RTX 4090 on AutoDL):

python webui_demo.py --model-type qwen14b_chat -c lavis/output/pp_14b/sft/global_step296/unfreeze_llm_model.pth --llm_device_map "auto"

CPU:

python webui_demo.py -c xxxxxx --model-type qwen14b_chat --cpu-only # if you have enough GPU memory (>32GB), you can omit --cpu-only
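
The three inference modes above differ only in their flags, so the choice can be expressed as a small decision rule. The sketch below builds the command line from a list of per-GPU memory sizes, using the 32GB threshold and the flags quoted in this README. The function `demo_command` is hypothetical, and the memory sizes must be supplied by the caller (for example, read from nvidia-smi).

```python
from typing import List


def demo_command(script: str, checkpoint: str, gpu_mem_gb: List[float],
                 required_gb: float = 32.0) -> List[str]:
    """Build the demo command line for the available GPUs (memory in GB)."""
    cmd = ["python", script, "--model-type", "qwen14b_chat", "-c", checkpoint]
    if any(mem >= required_gb for mem in gpu_mem_gb):
        return cmd  # one GPU is large enough: single-GPU inference
    if len(gpu_mem_gb) >= 2 and sum(gpu_mem_gb) > required_gb:
        return cmd + ["--llm_device_map", "auto"]  # shard the LLM across GPUs
    return cmd + ["--cpu-only"]  # fall back to (very slow) CPU inference


if __name__ == "__main__":
    # Two 24GB cards, as in the multi-GPU setup described above.
    print(" ".join(demo_command("webui_demo.py", "xxxxxx", [24, 24])))
```

The same rule applies to cli_demo.py; only the script name changes.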

MPP-Qwen14B conversation examples

========

Acknowledgement

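Lavis: this repository is built on Lavis and uses the ViT and Q-Former from its BLIP2
QwenLM: the language model in this repository is Qwen-14B-Chat
DeepSpeed 👍
DeepSpeedExamples 👍👍
LLaVA: this repository follows its training paradigm and uses its pretraining and instruction-tuning data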

License

Much of the code in this repository is based on Lavis, which is released under the BSD 3-Clause License.
This repository uses Qwen-14B-Chat, which supports commercial, research, and development use; see its LICENSE for the terms.
