
Prebuilt StableLM 1.6B model compilation not working #2283

Closed
saurav-pwh-old opened this issue May 6, 2024 · 6 comments
Labels
bug Confirmed bugs


@saurav-pwh-old

🐛 Bug

I am trying to work with the StableLM 1.6B model, but I am getting an error at the model compilation step.

To Reproduce

Steps to reproduce the behavior:

  1. Library Installation:
!python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu122 mlc-ai-nightly-cu122
!git lfs install
!mkdir -p dist
!git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt_libs

  2. Downloading the model and compiling it:

!cd dist && git clone https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f32_1-MLC
!mkdir ./dist/libs
!mlc_llm compile /content/dist/stablelm-2-zephyr-1_6b-q4f32_1-MLC/mlc-chat-config.json \
    --device cuda -o /content/dist/libs/stablelm-2-zephyr-1_6b-q4f32_1-cuda.so

Error

Cloning into 'stablelm-2-zephyr-1_6b-q4f32_1-MLC'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (44/44), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 47 (delta 5), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (47/47), 2.38 MiB | 4.56 MiB/s, done.
Filtering content: 100% (27/27), 882.66 MiB | 85.51 MiB/s, done.
mkdir: cannot create directory ‘./dist/libs’: File exists
[2024-05-06 13:36:58] INFO auto_config.py:69: Found model configuration: /content/dist/stablelm-2-zephyr-1_6b-q4f32_1-MLC/mlc-chat-config.json
[2024-05-06 13:37:02] INFO auto_device.py:79: Found device: cuda:0
[2024-05-06 13:37:02] INFO auto_target.py:71: Found configuration of target device "cuda:0": {"thread_warp_size": 32, "arch": "sm_75", "max_threads_per_block": 1024, "max_num_threads": 1024, "kind": "cuda", "max_shared_memory_per_block": 49152, "tag": "", "keys": ["cuda", "gpu"]}
[2024-05-06 13:37:02] INFO auto_target.py:103: Found host LLVM triple: x86_64-redhat-linux-gnu
[2024-05-06 13:37:02] INFO auto_target.py:104: Found host LLVM CPU: skylake-avx512
[2024-05-06 13:37:02] INFO auto_target.py:317: Generating code for CUDA architecture: sm_75
[2024-05-06 13:37:02] INFO auto_target.py:318: To produce multi-arch fatbin, set environment variable MLC_MULTI_ARCH. Example: MLC_MULTI_ARCH=70,72,75,80,86,87,89,90a
[2024-05-06 13:37:02] INFO auto_config.py:153: Found model type: stablelm_epoch. Use `--model-type` to override.
Traceback (most recent call last):
  File "/usr/local/bin/mlc_llm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/__main__.py", line 25, in main
    cli.main(sys.argv[2:])
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/cli/compile.py", line 120, in main
    parsed.model_type = detect_model_type(parsed.model_type, parsed.model)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/auto_config.py", line 155, in detect_model_type
    raise ValueError(f"Unknown model type: {model_type}. Available ones: {list(MODELS.keys())}")
ValueError: Unknown model type: stablelm_epoch. Available ones: ['llama', 'mistral', 'gemma', 'gpt2', 'mixtral', 'gpt_neox', 'gpt_bigcode', 'phi-msft', 'phi', 'qwen', 'qwen2', 'stablelm', 'baichuan', 'internlm', 'rwkv5', 'orion', 'llava', 'rwkv6', 'chatglm', 'eagle']
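The traceback shows the failure is in the model-type lookup: the `mlc-chat-config.json` shipped with the weights records `model_type: stablelm_epoch`, which is not in the compiler's registry. A minimal sketch of the check that raises this error, with simplified, hypothetical names (the real logic lives in `mlc_llm/support/auto_config.py`):

```python
import json

# Registry of supported model types, as listed in the error message above.
MODELS = {name: None for name in [
    "llama", "mistral", "gemma", "gpt2", "mixtral", "gpt_neox",
    "gpt_bigcode", "phi-msft", "phi", "qwen", "qwen2", "stablelm",
    "baichuan", "internlm", "rwkv5", "orion", "llava", "rwkv6",
    "chatglm", "eagle",
]}

def detect_model_type(model_type: str, config_path: str) -> str:
    """Mimic the failing lookup: read model_type from the config file
    unless it was overridden on the command line ("auto" means detect)."""
    if model_type == "auto":
        with open(config_path) as f:
            model_type = json.load(f)["model_type"]
    if model_type not in MODELS:
        raise ValueError(
            f"Unknown model type: {model_type}. Available ones: {list(MODELS)}"
        )
    return model_type

# Passing an explicit type skips config detection entirely, which is why a
# `--model-type stablelm` override can get past this particular check.
print(detect_model_type("stablelm", "mlc-chat-config.json"))  # prints: stablelm
```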

Expected behavior

A file named stablelm-2-zephyr-1_6b-q4f32_1-cuda.so should be created inside the libs directory.

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): CUDA

  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu Linux

  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...):

  • How you installed MLC-LLM (conda, source):

  • How you installed TVM-Unity (pip, source): pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu122 mlc-ai-nightly-cu122

  • Python version (e.g. 3.10): 3.10.12

  • GPU driver version (if applicable):

  • CUDA/cuDNN version (if applicable): 12.2

  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
    USE_NVTX: OFF
    USE_GTEST: AUTO
    SUMMARIZE: OFF
    TVM_DEBUG_WITH_ABI_CHANGE: OFF
    USE_IOS_RPC: OFF
    USE_MSC: OFF
    USE_ETHOSU:
    CUDA_VERSION: 12.2
    USE_LIBBACKTRACE: AUTO
    DLPACK_PATH: 3rdparty/dlpack/include
    USE_TENSORRT_CODEGEN: OFF
    USE_THRUST: ON
    USE_TARGET_ONNX: OFF
    USE_AOT_EXECUTOR: ON
    BUILD_DUMMY_LIBTVM: OFF
    USE_CUDNN: OFF
    USE_TENSORRT_RUNTIME: OFF
    USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
    USE_CCACHE: AUTO
    USE_ARM_COMPUTE_LIB: OFF
    USE_CPP_RTVM:
    USE_OPENCL_GTEST: /path/to/opencl/gtest
    TVM_LOG_BEFORE_THROW: OFF
    USE_MKL: OFF
    USE_PT_TVMDSOOP: OFF
    MLIR_VERSION: NOT-FOUND
    USE_CLML: OFF
    USE_STACKVM_RUNTIME: OFF
    USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
    ROCM_PATH: /opt/rocm
    USE_DNNL: OFF
    USE_MSCCL: OFF
    USE_VITIS_AI: OFF
    USE_MLIR: OFF
    USE_RCCL: OFF
    USE_LLVM: llvm-config --ignore-libllvm --link-static
    USE_VERILATOR: OFF
    USE_TF_TVMDSOOP: OFF
    USE_THREADS: ON
    USE_MSVC_MT: OFF
    BACKTRACE_ON_SEGFAULT: OFF
    USE_GRAPH_EXECUTOR: ON
    USE_NCCL: ON
    USE_ROCBLAS: OFF
    GIT_COMMIT_HASH: ced07e88781c0d6416e276d9cd084bb46aaf3da5
    USE_VULKAN: ON
    USE_RUST_EXT: OFF
    USE_CUTLASS: ON
    USE_CPP_RPC: OFF
    USE_HEXAGON: OFF
    USE_CUSTOM_LOGGING: OFF
    USE_UMA: OFF
    USE_FALLBACK_STL_MAP: OFF
    USE_SORT: ON
    USE_RTTI: ON
    GIT_COMMIT_TIME: 2024-04-25 21:07:15 -0400
    USE_HEXAGON_SDK: /path/to/sdk
    USE_BLAS: none
    USE_ETHOSN: OFF
    USE_LIBTORCH: OFF
    USE_RANDOM: ON
    USE_CUDA: ON
    USE_COREML: OFF
    USE_AMX: OFF
    BUILD_STATIC_RUNTIME: OFF
    USE_CMSISNN: OFF
    USE_KHRONOS_SPIRV: OFF
    USE_CLML_GRAPH_EXECUTOR: OFF
    USE_TFLITE: OFF
    USE_HEXAGON_GTEST: /path/to/hexagon/gtest
    PICOJSON_PATH: 3rdparty/picojson
    USE_OPENCL_ENABLE_HOST_PTR: OFF
    INSTALL_DEV: OFF
    USE_PROFILER: ON
    USE_NNPACK: OFF
    LLVM_VERSION: 15.0.7
    USE_MRVL: OFF
    USE_OPENCL: OFF
    COMPILER_RT_PATH: 3rdparty/compiler-rt
    RANG_PATH: 3rdparty/rang/include
    USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
    USE_OPENMP: OFF
    USE_BNNS: OFF
    USE_FLASHINFER: ON
    USE_CUBLAS: ON
    USE_METAL: OFF
    USE_MICRO_STANDALONE_RUNTIME: OFF
    USE_HEXAGON_EXTERNAL_LIBS: OFF
    USE_ALTERNATIVE_LINKER: AUTO
    USE_BYODT_POSIT: OFF
    USE_HEXAGON_RPC: OFF
    USE_MICRO: OFF
    DMLC_PATH: 3rdparty/dmlc-core/include
    INDEX_DEFAULT_I64: ON
    USE_RELAY_DEBUG: OFF
    USE_RPC: ON
    USE_TENSORFLOW_PATH: none
    TVM_CLML_VERSION:
    USE_MIOPEN: OFF
    USE_ROCM: OFF
    USE_PAPI: OFF
    USE_CURAND: OFF
    TVM_CXX_COMPILER_PATH: /opt/rh/gcc-toolset-11/root/usr/bin/c++
    HIDE_PRIVATE_SYMBOLS: ON

  • Any other relevant information:

Information about GPU:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   61C    P0              28W /  70W |   1267MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|

Additional context

I am using Google Colab for all of this.

@saurav-pwh-old saurav-pwh-old added the bug Confirmed bugs label May 6, 2024
@tqchen
Contributor

tqchen commented May 9, 2024

@tlopex can you look a bit into this model?

@tlopex
Contributor

tlopex commented May 10, 2024

@saurav-pwh-old Hi, as a temporary workaround, you can add `--model-type stablelm` to your command to override the detected type.

@ollmer
Contributor

ollmer commented May 13, 2024

Hi, I have the same issue. When I tried to use --model-type stablelm, I got a new error:

TypeError: StableLmConfig.__init__() missing 2 required positional arguments: 'layer_norm_eps' and 'partial_rotary_factor'
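This second error suggests the compiler's `StableLmConfig` now requires fields that the older uploaded `mlc-chat-config.json` does not contain. A simplified, hypothetical sketch of why constructing a config object from an outdated JSON dict raises exactly this kind of `TypeError` (the real class in mlc_llm has many more fields):

```python
from dataclasses import dataclass

@dataclass
class StableLmConfig:
    # Stand-in for the real config class: these two fields are required,
    # but an older mlc-chat-config.json does not provide them.
    layer_norm_eps: float
    partial_rotary_factor: float

old_config = {}  # fields missing from the obsolete uploaded config

try:
    StableLmConfig(**old_config)
except TypeError as e:
    # prints: StableLmConfig.__init__() missing 2 required positional
    # arguments: 'layer_norm_eps' and 'partial_rotary_factor'
    print(e)
```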

@tlopex
Contributor

tlopex commented May 14, 2024

@ollmer Thank you for pointing that out! The reason is that the official StableLM 2 model has been updated and there are some differences in its parameters. What we have on huggingface.co/mlc-ai may be obsolete. We will upload an updated version soon.

@tlopex
Contributor

tlopex commented May 24, 2024

Hello, everyone! Sorry for the long wait. Thanks to @MasterJH5574's help, I have uploaded the stablelm2_1.6b models below:
[image]
And I tested it here:

(mlc-prebuilt) tlopex@tlopex-OMEN-by-HP-Laptop-17-ck1xxx:~/mlc-llm$ python -m  mlc_llm chat HF://mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC     --device "cuda:0"     --overrides context_window_size=4096     --opt "O2" 
[2024-05-24 18:40:07] INFO config.py:106: Overriding context_window_size from None to 4096
[2024-05-24 18:40:09] INFO auto_device.py:79: Found device: cuda:0
[2024-05-24 18:40:09] INFO chat_module.py:362: Downloading model from HuggingFace: HF://mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC
[2024-05-24 18:40:09] INFO download.py:42: [Git] Cloning https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC.git to /tmp/tmpq4lci24o/tmp
[2024-05-24 18:40:12] INFO download.py:78: [Git LFS] Downloading 0 files with Git LFS: []
0it [00:00, ?it/s]
[2024-05-24 18:40:18] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_3.bin to /tmp/tmpq4lci24o/tmp/params_shard_3.bin
[2024-05-24 18:40:21] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_2.bin to /tmp/tmpq4lci24o/tmp/params_shard_2.bin
[2024-05-24 18:40:26] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_4.bin to /tmp/tmpq4lci24o/tmp/params_shard_4.bin
[2024-05-24 18:40:32] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_0.bin to /tmp/tmpq4lci24o/tmp/params_shard_0.bin
[2024-05-24 18:40:33] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_6.bin to /tmp/tmpq4lci24o/tmp/params_shard_6.bin
[2024-05-24 18:40:36] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_1.bin to /tmp/tmpq4lci24o/tmp/params_shard_1.bin
[2024-05-24 18:40:36] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_5.bin to /tmp/tmpq4lci24o/tmp/params_shard_5.bin
[2024-05-24 18:40:36] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_7.bin to /tmp/tmpq4lci24o/tmp/params_shard_7.bin
[2024-05-24 18:40:38] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_8.bin to /tmp/tmpq4lci24o/tmp/params_shard_8.bin
[2024-05-24 18:40:40] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_9.bin to /tmp/tmpq4lci24o/tmp/params_shard_9.bin
[2024-05-24 18:40:41] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_10.bin to /tmp/tmpq4lci24o/tmp/params_shard_10.bin
[2024-05-24 18:40:46] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_12.bin to /tmp/tmpq4lci24o/tmp/params_shard_12.bin
[2024-05-24 18:40:46] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_13.bin to /tmp/tmpq4lci24o/tmp/params_shard_13.bin
[2024-05-24 18:40:48] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_11.bin to /tmp/tmpq4lci24o/tmp/params_shard_11.bin
[2024-05-24 18:40:51] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_15.bin to /tmp/tmpq4lci24o/tmp/params_shard_15.bin
[2024-05-24 18:40:51] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_14.bin to /tmp/tmpq4lci24o/tmp/params_shard_14.bin
[2024-05-24 18:40:52] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_16.bin to /tmp/tmpq4lci24o/tmp/params_shard_16.bin
[2024-05-24 18:40:52] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_17.bin to /tmp/tmpq4lci24o/tmp/params_shard_17.bin
[2024-05-24 18:40:56] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_18.bin to /tmp/tmpq4lci24o/tmp/params_shard_18.bin
[2024-05-24 18:40:57] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_20.bin to /tmp/tmpq4lci24o/tmp/params_shard_20.bin
[2024-05-24 18:40:58] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_21.bin to /tmp/tmpq4lci24o/tmp/params_shard_21.bin
[2024-05-24 18:41:00] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_22.bin to /tmp/tmpq4lci24o/tmp/params_shard_22.bin
[2024-05-24 18:41:01] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_23.bin to /tmp/tmpq4lci24o/tmp/params_shard_23.bin
[2024-05-24 18:41:03] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_24.bin to /tmp/tmpq4lci24o/tmp/params_shard_24.bin
[2024-05-24 18:41:05] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_26.bin to /tmp/tmpq4lci24o/tmp/params_shard_26.bin
[2024-05-24 18:41:05] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_25.bin to /tmp/tmpq4lci24o/tmp/params_shard_25.bin
[2024-05-24 18:41:08] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_19.bin to /tmp/tmpq4lci24o/tmp/params_shard_19.bin
100%|███████████████████████████████████████████| 27/27 [00:55<00:00,  2.05s/it]
[2024-05-24 18:41:08] INFO download.py:155: Moving /tmp/tmpq4lci24o/tmp to /home/tlopex/.cache/mlc_llm/model_weights/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC
[2024-05-24 18:41:08] INFO chat_module.py:781: Now compiling model lib on device...
[2024-05-24 18:41:08] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-05-24 18:41:08] INFO jit.py:160: Using cached model lib: /home/tlopex/.cache/mlc_llm/model_lib/489dc4831dc725c82bd025a54da84013.so
[2024-05-24 18:41:09] INFO model_metadata.py:96: Total memory usage: 1756.66 MB (Parameters: 882.66 MB. KVCache: 0.00 MB. Temporary buffer: 874.00 MB)
[2024-05-24 18:41:09] INFO model_metadata.py:105: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out the latest stats (token/sec)
  /reset              restart a fresh chat
  /set [overrides]    override settings in the generation config. For example,
                      `/set temperature=0.5;max_gen_len=100;stop=end,stop`
                      Note: Separate stop words in the `stop` option with commas (,).
  Multi-line input: Use escape+enter to start a new line.

<|user|>: Hello!
<|assistant|>: 
Hello! How can I assist you today?

So I believe you all can use it as well. Enjoy trying it!

@tqchen tqchen closed this as completed May 24, 2024
@tqchen
Contributor

tqchen commented May 24, 2024

Thanks @tlopex !
