🐛 Describe the bug

Building the Docker image succeeds, but running the resulting container then gives:

ubuntu@compute-permanent-node-406:~/arctic_vllm$ docker logs eb62662fb7fb
INFO 05-01 21:15:34 pynccl.py:49] Loading nccl from environment variable VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2
INFO 05-01 21:15:36 api_server.py:149] vLLM API server version 0.4.0.post1
INFO 05-01 21:15:36 api_server.py:150] args: Namespace(host='0.0.0.0', port=5010, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model='Snowflake/snowflake-arctic-instruct', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, download_dir='/home/ubuntu/.cache/huggingface/hub', load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=8, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=1234, swap_space=4, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=131072, max_num_seqs=256, max_logprobs=5, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', image_input_type=None, image_token_id=None, image_input_shape=None, image_feature_size=None, scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, model_loader_extra_config=None, engine_use_ray=False, disable_log_requests=False, max_log_len=100)
2024-05-01 21:15:38,268 INFO worker.py:1749 -- Started a local Ray instance.
INFO 05-01 21:15:39 llm_engine.py:87] Initializing an LLM engine (v0.4.0.post1) with config: model='Snowflake/snowflake-arctic-instruct', speculative_config=None, tokenizer='Snowflake/snowflake-arctic-instruct', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir='/home/ubuntu/.cache/huggingface/hub', load_format=auto, tensor_parallel_size=8, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=1234)
WARNING 05-01 21:15:39 tokenizer.py:123] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
(pid=6228) INFO 05-01 21:15:40 pynccl.py:49] Loading nccl from environment variable VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2
(pid=6595) INFO 05-01 21:15:46 pynccl.py:49] Loading nccl from environment variable VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 [repeated 3x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(pid=6844) INFO 05-01 21:15:53 pynccl.py:49] Loading nccl from environment variable VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 [repeated 3x across cluster]
INFO 05-01 21:15:57 selector.py:28] Using FlashAttention backend.
(RayWorkerVllm pid=6310) INFO 05-01 21:15:57 selector.py:28] Using FlashAttention backend.
(RayWorkerVllm pid=6400) INFO 05-01 21:15:58 pynccl_utils.py:45] vLLM is using nccl==2.19.3
INFO 05-01 21:15:58 pynccl_utils.py:45] vLLM is using nccl==2.19.3
INFO 05-01 21:16:10 utils.py:129] reading GPU P2P access cache from /home/ubuntu/.config/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 157, in <module>
engine = AsyncLLMEngine.from_engine_args(
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 348, in from_engine_args
engine = cls(
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 313, in __init__
self.engine = self._init_engine(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 422, in _init_engine
return engine_class(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 128, in __init__
self.model_executor = executor_class(
File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 41, in __init__
self._init_executor()
File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 45, in _init_executor
self._init_workers_ray(placement_group)
File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 196, in _init_workers_ray
self._run_workers(
File "/usr/local/lib/python3.10/dist-packages/vllm/executor/ray_gpu_executor.py", line 312, in _run_workers
driver_worker_output = getattr(self.driver_worker,
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 113, in load_model
self.model_runner.load_model()
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 157, in load_model
self.model = get_model(
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
return loader.load_model(model_config=model_config,
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 222, in load_model
model = _initialize_model(model_config, self.load_config,
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 87, in _initialize_model
model_class = get_model_architecture(model_config)[0]
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/utils.py", line 31, in get_model_architecture
model_cls = ModelRegistry.load_model_cls(arch)
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/__init__.py", line 98, in load_model_cls
module = importlib.import_module(
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/arctic.py", line 7, in <module>
from transformers import ArcticConfig
ImportError: cannot import name 'ArcticConfig' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)
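To confirm what the traceback points at, here is a minimal check run inside the container (it just prints whatever transformers version the image actually ships):

```python
# Verifies that the installed transformers build does not export
# ArcticConfig, which is exactly what the import at line 7 of
# vllm/model_executor/models/arctic.py trips over.
import transformers

print(transformers.__version__)
print(hasattr(transformers, "ArcticConfig"))  # False on this image
```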
I'm guessing the arctic branch is not quite consistent: `vllm/model_executor/models/arctic.py` does `from transformers import ArcticConfig`, but the transformers build installed in the image doesn't export that name.
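As a stopgap, the config class itself is reachable without a transformers release: a hedged sketch, assuming the Snowflake checkpoint still bundles its own `configuration_arctic.py` as remote code (this is a workaround idea, not the branch's actual fix):

```python
# Hypothetical workaround sketch: load the Arctic config through AutoConfig
# with trust_remote_code, which pulls the ArcticConfig class from the
# Hugging Face repo instead of importing it from the transformers package.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True,
)
print(type(config).__name__)  # expected: "ArcticConfig"
```

Presumably the proper fix is for the branch to vendor ArcticConfig inside vLLM, or pin a transformers build that exports it, rather than importing it from a release transformers.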
pseudotensor changed the title from "[Bug]: building docker image fails with ImportError: cannot import name 'ArcticConfig' from 'transformers'" to "[Bug]: building docker image, but running fails with ImportError: cannot import name 'ArcticConfig' from 'transformers'" on May 1, 2024.