
How to load local Qwen/Qwen2-VL-2B-Instruct finetuned checkpoint using deepspeed #2894

Closed
cs-mshah opened this issue Jan 9, 2025 · 4 comments


cs-mshah commented Jan 9, 2025

Describe the bug
How should the Qwen/Qwen2-VL-2B-Instruct finetuned checkpoint using deepspeed be loaded for inference?

[attached screenshot]

Your hardware and system info
Python=3.10
ms-swift latest
vllm==0.6.3.post1

Additional context
I am using the https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_mllm.py script for inference with a custom fine-tuned checkpoint, which has the following structure:

❯ cd /vmdata/manan/vlm_training/v11-20250103-072801/checkpoint-33000/
❯ tree -L 3
.
├── adapter_config.json
├── adapter_model.safetensors
├── additional_config.json
├── args.json
├── global_step33000
│   ├── bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
│   ├── bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
│   ├── bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt
│   ├── bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt
│   ├── bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt
│   ├── bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt
│   ├── bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt
│   ├── bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt
│   ├── zero_pp_rank_0_mp_rank_00_model_states.pt
│   ├── zero_pp_rank_1_mp_rank_00_model_states.pt
│   ├── zero_pp_rank_2_mp_rank_00_model_states.pt
│   ├── zero_pp_rank_3_mp_rank_00_model_states.pt
│   ├── zero_pp_rank_4_mp_rank_00_model_states.pt
│   ├── zero_pp_rank_5_mp_rank_00_model_states.pt
│   ├── zero_pp_rank_6_mp_rank_00_model_states.pt
│   └── zero_pp_rank_7_mp_rank_00_model_states.pt
├── latest
├── merged
│   └── pytorch_model.bin
├── README.md
├── rng_state_0.pth
├── rng_state_1.pth
├── rng_state_2.pth
├── rng_state_3.pth
├── rng_state_4.pth
├── rng_state_5.pth
├── rng_state_6.pth
├── rng_state_7.pth
├── scheduler.pt
├── trainer_state.json
├── training_args.bin
└── zero_to_fp32.py

The merged/ folder contains the final DeepSpeed-merged checkpoint. Has the checkpointing structure changed in swift version 3.0?
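For reference, the global_step33000/ shards are DeepSpeed ZeRO partitions of model and optimizer state, and the bundled zero_to_fp32.py script exists to consolidate them. A minimal sketch of the equivalent DeepSpeed Python API (deepspeed.utils.zero_to_fp32), assuming the checkpoint layout shown above; the variable names are illustrative:

```python
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

ckpt_dir = "/vmdata/manan/vlm_training/v11-20250103-072801/checkpoint-33000"

# DeepSpeed reads the `latest` tag file to locate global_step33000/ and
# reconstructs a single consolidated fp32 state dict from the per-rank shards.
state_dict = get_fp32_state_dict_from_zero_checkpoint(ckpt_dir)
```

The same conversion can also be run from the shell inside the checkpoint directory with `python zero_to_fp32.py . <output>`.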

Jintao-Huang (Collaborator) commented:

These are the weights: adapter_model.safetensors

Everything else is only used for resuming training.
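In other words, inference only needs the base model plus the LoRA adapter files (adapter_config.json and adapter_model.safetensors). A minimal sketch using transformers + peft directly, which is one way to load such an adapter (the checkpoint path is the one from the issue; ms-swift's own infer engine or its merge-lora export are alternatives per its docs):

```python
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

ckpt = "/vmdata/manan/vlm_training/v11-20250103-072801/checkpoint-33000"

# Load the base model, then attach the LoRA adapter from the checkpoint dir;
# only adapter_config.json and adapter_model.safetensors are read from it.
base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ckpt)
model = model.merge_and_unload()  # optional: fold the adapter into the base weights
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
```

After merge_and_unload, the result is a plain Qwen2-VL model that can be saved with save_pretrained and served (e.g. by vllm), which should correspond to what the merged/ folder above contains.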

Jintao-Huang (Collaborator) commented:

cs-mshah (Author) commented:

Thanks. Could this be updated in the docs as well?

Jintao-Huang (Collaborator) commented:

The documentation will be updated later.
