How to set dtype=half? #32

Open
Hermi-Mire opened this issue Feb 16, 2025 · 1 comment

Comments

@Hermi-Mire
Hi, I tried to reproduce this on a 2080 Ti, but got this error message:

  File "/workspace/dialogue/deepscaler/verl/verl/trainer/main_ppo.py", line 114, in main
    ray.get(main_task.remote(config))
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/ray/_private/worker.py", line 2772, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/ray/_private/worker.py", line 919, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::main_task() (pid=6802, ip=172.17.0.4)
  File "/workspace/dialogue/deepscaler/verl/verl/trainer/main_ppo.py", line 199, in main_task
    trainer.init_workers()
  File "/workspace/dialogue/deepscaler/verl/verl/trainer/ppo/ray_trainer.py", line 530, in init_workers
    self.actor_rollout_wg.init_model()
  File "/workspace/dialogue/deepscaler/verl/verl/single_controller/ray/base.py", line 42, in func
    output = ray.get(output)
ray.exceptions.RayTaskError(ValueError): ray::WorkerDict.actor_rollout_init_model() (pid=7224, ip=172.17.0.4, actor_id=a1b97f929b7bcffade92906d01000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7f60c7f6b040>)
  File "/workspace/dialogue/deepscaler/verl/verl/single_controller/ray/base.py", line 399, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/workspace/dialogue/deepscaler/verl/verl/single_controller/base/decorator.py", line 404, in inner
    return func(*args, **kwargs)
  File "/workspace/dialogue/deepscaler/verl/verl/workers/fsdp_workers.py", line 358, in init_model
    self.rollout, self.rollout_sharding_manager = self._build_rollout()
  File "/workspace/dialogue/deepscaler/verl/verl/workers/fsdp_workers.py", line 293, in _build_rollout
    rollout = vLLMRollout(actor_module=self.actor_module_fsdp,
  File "/workspace/dialogue/deepscaler/verl/verl/workers/rollout/vllm_rollout/vllm_rollout.py", line 98, in __init__
    self.inference_engine = LLM(actor_module,
  File "/workspace/dialogue/deepscaler/verl/verl/third_party/vllm/vllm_v_0_6_3/llm.py", line 147, in __init__
    self.llm_engine = LLMEngine.from_engine_args(model, tokenizer, engine_args)  # TODO: check usagecontext
  File "/workspace/dialogue/deepscaler/verl/verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py", line 393, in from_engine_args
    engine = cls(
  File "/workspace/dialogue/deepscaler/verl/verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py", line 212, in __init__
    self.model_executor = executor_class(
  File "/workspace/dialogue/deepscaler/verl/verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py", line 71, in __init__
    self._init_executor(model, distributed_init_method)
  File "/workspace/dialogue/deepscaler/verl/verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py", line 78, in _init_executor
    self._init_workers_sp(model, distributed_init_method)
  File "/workspace/dialogue/deepscaler/verl/verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py", line 111, in _init_workers_sp
    self.worker.init_device()
  File "/workspace/dialogue/deepscaler/verl/verl/third_party/vllm/vllm_v_0_6_3/worker.py", line 163, in init_device
    _check_if_gpu_supports_dtype(self.model_config.dtype)
  File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/vllm/worker/worker.py", line 473, in _check_if_gpu_supports_dtype
    raise ValueError(
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA GeForce RTX 2080 Ti GPU has compute capability 7.5. You can use float16 instead by explicitly setting the`dtype` flag in CLI, for example: --dtype=half.
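The check that raises this error (in vLLM's `_check_if_gpu_supports_dtype`) boils down to comparing the GPU's CUDA compute capability against 8.0. A minimal sketch of that logic, using a hypothetical helper name (in a real program you would obtain the capability pair from `torch.cuda.get_device_capability()`):

```python
def gpu_supports_bf16(major: int, minor: int) -> bool:
    """Return True if a GPU with compute capability (major, minor) can run bfloat16.

    Bfloat16 requires compute capability >= 8.0 (Ampere or newer).
    """
    return (major, minor) >= (8, 0)

# A 2080 Ti reports compute capability 7.5, so bfloat16 is unavailable
# and the engine must fall back to float16 ("half"):
print(gpu_supports_bf16(7, 5))  # False
print(gpu_supports_bf16(8, 0))  # True
```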
@michaelzhiluo
Contributor

Set this: https://github.com/agentica-project/deepscaler/blob/main/verl/verl/trainer/config/ppo_trainer.yaml#L69 to float16 or half.

Bfloat16 is only supported on GPUs with compute capability of at least 8.0 (A100/H100/... and other Ampere-or-newer GPUs).
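For reference, the change in the linked config would look roughly like this (a sketch only; the exact nesting and key names should be checked against the ppo_trainer.yaml file linked above):

```yaml
actor_rollout_ref:
  rollout:
    # 2080 Ti (compute capability 7.5) cannot run bfloat16; use half precision.
    dtype: float16  # was: bfloat16
```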
