
Prebuilt StableLM 1.6B model compilation not working #2283

Closed
saurav-pwh-old opened this issue May 6, 2024 · 6 comments
Labels
bug Confirmed bugs


@saurav-pwh-old

🐛 Bug

I am trying to work with the StableLM 1.6B model, but I am getting an error at the model compilation step.

To Reproduce

Steps to reproduce the behavior:

  1. Library Installation:
!python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu122 mlc-ai-nightly-cu122
!git lfs install
!mkdir -p dist
!git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt_libs

  2. Downloading the model and compiling it:

!cd dist && git clone https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f32_1-MLC
!mkdir ./dist/libs
!mlc_llm compile /content/dist/stablelm-2-zephyr-1_6b-q4f32_1-MLC/mlc-chat-config.json \
    --device cuda -o /content/dist/libs/stablelm-2-zephyr-1_6b-q4f32_1-cuda.so

Error

Cloning into 'stablelm-2-zephyr-1_6b-q4f32_1-MLC'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (44/44), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 47 (delta 5), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (47/47), 2.38 MiB | 4.56 MiB/s, done.
Filtering content: 100% (27/27), 882.66 MiB | 85.51 MiB/s, done.
mkdir: cannot create directory ‘./dist/libs’: File exists
[2024-05-06 13:36:58] INFO auto_config.py:69: Found model configuration: /content/dist/stablelm-2-zephyr-1_6b-q4f32_1-MLC/mlc-chat-config.json
[2024-05-06 13:37:02] INFO auto_device.py:79: Found device: cuda:0
[2024-05-06 13:37:02] INFO auto_target.py:71: Found configuration of target device "cuda:0": {"thread_warp_size": 32, "arch": "sm_75", "max_threads_per_block": 1024, "max_num_threads": 1024, "kind": "cuda", "max_shared_memory_per_block": 49152, "tag": "", "keys": ["cuda", "gpu"]}
[2024-05-06 13:37:02] INFO auto_target.py:103: Found host LLVM triple: x86_64-redhat-linux-gnu
[2024-05-06 13:37:02] INFO auto_target.py:104: Found host LLVM CPU: skylake-avx512
[2024-05-06 13:37:02] INFO auto_target.py:317: Generating code for CUDA architecture: sm_75
[2024-05-06 13:37:02] INFO auto_target.py:318: To produce multi-arch fatbin, set environment variable MLC_MULTI_ARCH. Example: MLC_MULTI_ARCH=70,72,75,80,86,87,89,90a
[2024-05-06 13:37:02] INFO auto_config.py:153: Found model type: stablelm_epoch. Use `--model-type` to override.
Traceback (most recent call last):
  File "/usr/local/bin/mlc_llm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/__main__.py", line 25, in main
    cli.main(sys.argv[2:])
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/cli/compile.py", line 120, in main
    parsed.model_type = detect_model_type(parsed.model_type, parsed.model)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/auto_config.py", line 155, in detect_model_type
    raise ValueError(f"Unknown model type: {model_type}. Available ones: {list(MODELS.keys())}")
ValueError: Unknown model type: stablelm_epoch. Available ones: ['llama', 'mistral', 'gemma', 'gpt2', 'mixtral', 'gpt_neox', 'gpt_bigcode', 'phi-msft', 'phi', 'qwen', 'qwen2', 'stablelm', 'baichuan', 'internlm', 'rwkv5', 'orion', 'llava', 'rwkv6', 'chatglm', 'eagle']
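The traceback shows the failure is in the model-type lookup: the `mlc-chat-config.json` shipped with the weights records `model_type: stablelm_epoch`, which is not in the compiler's registry. A minimal sketch of the check that raises this error, with simplified, hypothetical names (the real logic lives in `mlc_llm/support/auto_config.py`):

```python
import json

# Registry of supported model types, as listed in the error message above.
MODELS = {name: None for name in [
    "llama", "mistral", "gemma", "gpt2", "mixtral", "gpt_neox",
    "gpt_bigcode", "phi-msft", "phi", "qwen", "qwen2", "stablelm",
    "baichuan", "internlm", "rwkv5", "orion", "llava", "rwkv6",
    "chatglm", "eagle",
]}

def detect_model_type(model_type: str, config_path: str) -> str:
    """Mimic the failing lookup: read model_type from the config file
    unless it was overridden on the command line ("auto" means detect)."""
    if model_type == "auto":
        with open(config_path) as f:
            model_type = json.load(f)["model_type"]
    if model_type not in MODELS:
        raise ValueError(
            f"Unknown model type: {model_type}. Available ones: {list(MODELS)}"
        )
    return model_type

# Passing an explicit type skips config detection entirely, which is why a
# `--model-type stablelm` override can get past this particular check.
print(detect_model_type("stablelm", "mlc-chat-config.json"))  # prints: stablelm
```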

Expected behavior

A file named stablelm-2-zephyr-1_6b-q4f32_1-cuda.so should be created inside the libs directory.

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): CUDA

  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu Linux

  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...):

  • How you installed MLC-LLM (conda, source):

  • How you installed TVM-Unity (pip, source): pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu122 mlc-ai-nightly-cu122

  • Python version (e.g. 3.10): 3.10.12

  • GPU driver version (if applicable):

  • CUDA/cuDNN version (if applicable): 12.2

  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
    USE_NVTX: OFF
    USE_GTEST: AUTO
    SUMMARIZE: OFF
    TVM_DEBUG_WITH_ABI_CHANGE: OFF
    USE_IOS_RPC: OFF
    USE_MSC: OFF
    USE_ETHOSU:
    CUDA_VERSION: 12.2
    USE_LIBBACKTRACE: AUTO
    DLPACK_PATH: 3rdparty/dlpack/include
    USE_TENSORRT_CODEGEN: OFF
    USE_THRUST: ON
    USE_TARGET_ONNX: OFF
    USE_AOT_EXECUTOR: ON
    BUILD_DUMMY_LIBTVM: OFF
    USE_CUDNN: OFF
    USE_TENSORRT_RUNTIME: OFF
    USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
    USE_CCACHE: AUTO
    USE_ARM_COMPUTE_LIB: OFF
    USE_CPP_RTVM:
    USE_OPENCL_GTEST: /path/to/opencl/gtest
    TVM_LOG_BEFORE_THROW: OFF
    USE_MKL: OFF
    USE_PT_TVMDSOOP: OFF
    MLIR_VERSION: NOT-FOUND
    USE_CLML: OFF
    USE_STACKVM_RUNTIME: OFF
    USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
    ROCM_PATH: /opt/rocm
    USE_DNNL: OFF
    USE_MSCCL: OFF
    USE_VITIS_AI: OFF
    USE_MLIR: OFF
    USE_RCCL: OFF
    USE_LLVM: llvm-config --ignore-libllvm --link-static
    USE_VERILATOR: OFF
    USE_TF_TVMDSOOP: OFF
    USE_THREADS: ON
    USE_MSVC_MT: OFF
    BACKTRACE_ON_SEGFAULT: OFF
    USE_GRAPH_EXECUTOR: ON
    USE_NCCL: ON
    USE_ROCBLAS: OFF
    GIT_COMMIT_HASH: ced07e88781c0d6416e276d9cd084bb46aaf3da5
    USE_VULKAN: ON
    USE_RUST_EXT: OFF
    USE_CUTLASS: ON
    USE_CPP_RPC: OFF
    USE_HEXAGON: OFF
    USE_CUSTOM_LOGGING: OFF
    USE_UMA: OFF
    USE_FALLBACK_STL_MAP: OFF
    USE_SORT: ON
    USE_RTTI: ON
    GIT_COMMIT_TIME: 2024-04-25 21:07:15 -0400
    USE_HEXAGON_SDK: /path/to/sdk
    USE_BLAS: none
    USE_ETHOSN: OFF
    USE_LIBTORCH: OFF
    USE_RANDOM: ON
    USE_CUDA: ON
    USE_COREML: OFF
    USE_AMX: OFF
    BUILD_STATIC_RUNTIME: OFF
    USE_CMSISNN: OFF
    USE_KHRONOS_SPIRV: OFF
    USE_CLML_GRAPH_EXECUTOR: OFF
    USE_TFLITE: OFF
    USE_HEXAGON_GTEST: /path/to/hexagon/gtest
    PICOJSON_PATH: 3rdparty/picojson
    USE_OPENCL_ENABLE_HOST_PTR: OFF
    INSTALL_DEV: OFF
    USE_PROFILER: ON
    USE_NNPACK: OFF
    LLVM_VERSION: 15.0.7
    USE_MRVL: OFF
    USE_OPENCL: OFF
    COMPILER_RT_PATH: 3rdparty/compiler-rt
    RANG_PATH: 3rdparty/rang/include
    USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
    USE_OPENMP: OFF
    USE_BNNS: OFF
    USE_FLASHINFER: ON
    USE_CUBLAS: ON
    USE_METAL: OFF
    USE_MICRO_STANDALONE_RUNTIME: OFF
    USE_HEXAGON_EXTERNAL_LIBS: OFF
    USE_ALTERNATIVE_LINKER: AUTO
    USE_BYODT_POSIT: OFF
    USE_HEXAGON_RPC: OFF
    USE_MICRO: OFF
    DMLC_PATH: 3rdparty/dmlc-core/include
    INDEX_DEFAULT_I64: ON
    USE_RELAY_DEBUG: OFF
    USE_RPC: ON
    USE_TENSORFLOW_PATH: none
    TVM_CLML_VERSION:
    USE_MIOPEN: OFF
    USE_ROCM: OFF
    USE_PAPI: OFF
    USE_CURAND: OFF
    TVM_CXX_COMPILER_PATH: /opt/rh/gcc-toolset-11/root/usr/bin/c++
    HIDE_PRIVATE_SYMBOLS: ON

  • Any other relevant information:

Information about GPU:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   61C    P0              28W /  70W |   1267MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|

Additional context

I am using Google Colab for all of this.

@saurav-pwh-old saurav-pwh-old added the bug Confirmed bugs label May 6, 2024
@tqchen
Contributor

tqchen commented May 9, 2024

@tlopex can you look a bit into this model?

@tlopex
Contributor

tlopex commented May 10, 2024

@saurav-pwh-old Hi, as a temporary workaround, you can add `--model-type stablelm` to your command to override the detected type.

@ollmer
Contributor

ollmer commented May 13, 2024

Hi, I have the same issue. When I tried to use --model-type stablelm, I got a new error:

TypeError: StableLmConfig.__init__() missing 2 required positional arguments: 'layer_norm_eps' and 'partial_rotary_factor'
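This second error suggests the compiler's `StableLmConfig` now requires fields that the older uploaded `mlc-chat-config.json` does not contain. A simplified, hypothetical sketch of why constructing a config object from an outdated JSON dict raises exactly this kind of `TypeError` (the real class in mlc_llm has many more fields):

```python
from dataclasses import dataclass

@dataclass
class StableLmConfig:
    # Stand-in for the real config class: these two fields are required,
    # but an older mlc-chat-config.json does not provide them.
    layer_norm_eps: float
    partial_rotary_factor: float

old_config = {}  # fields missing from the obsolete uploaded config

try:
    StableLmConfig(**old_config)
except TypeError as e:
    # prints: StableLmConfig.__init__() missing 2 required positional
    # arguments: 'layer_norm_eps' and 'partial_rotary_factor'
    print(e)
```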

@tlopex
Contributor

tlopex commented May 14, 2024

@ollmer Thank you for pointing that out! The reason is that the official StableLM 2 model has been updated and there are some differences in its parameters. What we have on huggingface.co/mlc-ai may be obsolete. We will upload an updated version soon.

@tlopex
Contributor

tlopex commented May 24, 2024

Hello, everyone! Sorry for the long wait. Thanks to @MasterJH5574's help, I have uploaded the stablelm2_1.6b models below:
[image]
And I tested it here:

(mlc-prebuilt) tlopex@tlopex-OMEN-by-HP-Laptop-17-ck1xxx:~/mlc-llm$ python -m  mlc_llm chat HF://mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC     --device "cuda:0"     --overrides context_window_size=4096     --opt "O2" 
[2024-05-24 18:40:07] INFO config.py:106: Overriding context_window_size from None to 4096
[2024-05-24 18:40:09] INFO auto_device.py:79: Found device: cuda:0
[2024-05-24 18:40:09] INFO chat_module.py:362: Downloading model from HuggingFace: HF://mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC
[2024-05-24 18:40:09] INFO download.py:42: [Git] Cloning https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC.git to /tmp/tmpq4lci24o/tmp
[2024-05-24 18:40:12] INFO download.py:78: [Git LFS] Downloading 0 files with Git LFS: []
0it [00:00, ?it/s]
[2024-05-24 18:40:18] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_3.bin to /tmp/tmpq4lci24o/tmp/params_shard_3.bin
[2024-05-24 18:40:21] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_2.bin to /tmp/tmpq4lci24o/tmp/params_shard_2.bin
[2024-05-24 18:40:26] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_4.bin to /tmp/tmpq4lci24o/tmp/params_shard_4.bin
[2024-05-24 18:40:32] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_0.bin to /tmp/tmpq4lci24o/tmp/params_shard_0.bin
[2024-05-24 18:40:33] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_6.bin to /tmp/tmpq4lci24o/tmp/params_shard_6.bin
[2024-05-24 18:40:36] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_1.bin to /tmp/tmpq4lci24o/tmp/params_shard_1.bin
[2024-05-24 18:40:36] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_5.bin to /tmp/tmpq4lci24o/tmp/params_shard_5.bin
[2024-05-24 18:40:36] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_7.bin to /tmp/tmpq4lci24o/tmp/params_shard_7.bin
[2024-05-24 18:40:38] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_8.bin to /tmp/tmpq4lci24o/tmp/params_shard_8.bin
[2024-05-24 18:40:40] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_9.bin to /tmp/tmpq4lci24o/tmp/params_shard_9.bin
[2024-05-24 18:40:41] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_10.bin to /tmp/tmpq4lci24o/tmp/params_shard_10.bin
[2024-05-24 18:40:46] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_12.bin to /tmp/tmpq4lci24o/tmp/params_shard_12.bin
[2024-05-24 18:40:46] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_13.bin to /tmp/tmpq4lci24o/tmp/params_shard_13.bin
[2024-05-24 18:40:48] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_11.bin to /tmp/tmpq4lci24o/tmp/params_shard_11.bin
[2024-05-24 18:40:51] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_15.bin to /tmp/tmpq4lci24o/tmp/params_shard_15.bin
[2024-05-24 18:40:51] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_14.bin to /tmp/tmpq4lci24o/tmp/params_shard_14.bin
[2024-05-24 18:40:52] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_16.bin to /tmp/tmpq4lci24o/tmp/params_shard_16.bin
[2024-05-24 18:40:52] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_17.bin to /tmp/tmpq4lci24o/tmp/params_shard_17.bin
[2024-05-24 18:40:56] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_18.bin to /tmp/tmpq4lci24o/tmp/params_shard_18.bin
[2024-05-24 18:40:57] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_20.bin to /tmp/tmpq4lci24o/tmp/params_shard_20.bin
[2024-05-24 18:40:58] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_21.bin to /tmp/tmpq4lci24o/tmp/params_shard_21.bin
[2024-05-24 18:41:00] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_22.bin to /tmp/tmpq4lci24o/tmp/params_shard_22.bin
[2024-05-24 18:41:01] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_23.bin to /tmp/tmpq4lci24o/tmp/params_shard_23.bin
[2024-05-24 18:41:03] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_24.bin to /tmp/tmpq4lci24o/tmp/params_shard_24.bin
[2024-05-24 18:41:05] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_26.bin to /tmp/tmpq4lci24o/tmp/params_shard_26.bin
[2024-05-24 18:41:05] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_25.bin to /tmp/tmpq4lci24o/tmp/params_shard_25.bin
[2024-05-24 18:41:08] INFO download.py:154: Downloaded https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC/resolve/main/params_shard_19.bin to /tmp/tmpq4lci24o/tmp/params_shard_19.bin
100%|███████████████████████████████████████████| 27/27 [00:55<00:00,  2.05s/it]
[2024-05-24 18:41:08] INFO download.py:155: Moving /tmp/tmpq4lci24o/tmp to /home/tlopex/.cache/mlc_llm/model_weights/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC
[2024-05-24 18:41:08] INFO chat_module.py:781: Now compiling model lib on device...
[2024-05-24 18:41:08] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-05-24 18:41:08] INFO jit.py:160: Using cached model lib: /home/tlopex/.cache/mlc_llm/model_lib/489dc4831dc725c82bd025a54da84013.so
[2024-05-24 18:41:09] INFO model_metadata.py:96: Total memory usage: 1756.66 MB (Parameters: 882.66 MB. KVCache: 0.00 MB. Temporary buffer: 874.00 MB)
[2024-05-24 18:41:09] INFO model_metadata.py:105: To reduce memory usage, tweak `prefill_chunk_size`, `context_window_size` and `sliding_window_size`
You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out the latest stats (token/sec)
  /reset              restart a fresh chat
  /set [overrides]    override settings in the generation config. For example,
                      `/set temperature=0.5;max_gen_len=100;stop=end,stop`
                      Note: Separate stop words in the `stop` option with commas (,).
  Multi-line input: Use escape+enter to start a new line.

<|user|>: Hello!
<|assistant|>: 
Hello! How can I assist you today?

So I believe you all can use it as well. Enjoy trying it!

@tqchen tqchen closed this as completed May 24, 2024
@tqchen
Contributor

tqchen commented May 24, 2024

Thanks @tlopex !
