
weight_only_qlinear_prepack_int4 not found #864

@legend-7-7

Describe the bug

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "./gpt-oss-20b-int4-AutoRound-FP8KV"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Load the model (using the custom modeling_gpt_oss.py)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # dispatch automatically across GPU/CPU
    torch_dtype=torch.float32,  # pick the precision (int4 weights + fp16/32 compute here)
    trust_remote_code=True,     # allow the local modeling_gpt_oss.py
)

# Input
text = "There is a girl who likes adventure,"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
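As a side note, the log below warns that `torch_dtype` is deprecated and suggests `offload_buffers=True` for the offload-buffer warning. A minimal sketch of the same load call with those two adjustments; neither is expected to affect the missing-op crash itself, which happens later during IPEX repacking:

```python
# Hedged variant of the load call above: `dtype` replaces the deprecated
# `torch_dtype` keyword, and `offload_buffers=True` follows the accelerate
# hint from the log. Both keywords exist in recent transformers releases.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    dtype=torch.float32,
    trust_remote_code=True,
    offload_buffers=True,
)
```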

```
C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Scripts\python.exe C:\Users\aiouya\PycharmProjects\PythonProject4\test.py
`torch_dtype` is deprecated! Use `dtype` instead!
[W925 22:37:28.000000000 OperatorEntry.cpp:225] Warning: Warning only once for all operators, other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::geometric_(Tensor(a!) self, float p, *, Generator? generator=None) -> Tensor(a!)
    registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src\ATen\RegisterSchema.cpp:6
  dispatch key: XPU
  previous kernel: registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\VmapModeRegistrations.cpp:37
       new kernel: registered at H:\frameworks.ai.pytorch.ipex-gpu\build\Release\csrc\gpu\csrc\gpu\xpu\ATen\RegisterXPU_0.cpp:172 (function operator ())
C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\accelerate\utils\memory.py:204: UserWarning: The XPU `mem_get_info` API is available in IPEX version >=2.5 or PyTorch >=2.6. The current returned available memory is incorrect. Please consider upgrading your IPEX or PyTorch version.
  warnings.warn(
2025-09-25 22:37:29,656 INFO modeling.py L1004: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\accelerate\utils\modeling.py:1582: UserWarning: Current model requires 10268467328 bytes of buffer for offloaded layers, which seems does not fit any GPU's remaining memory. If you are experiencing a OOM later, please consider using offload_buffers=True.
  warnings.warn(
2025-09-25 22:37:30,030 INFO modeling.py L1592: Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:
  - 0: 4633067520 bytes required
These minimum requirements are specific to this allocation attempt and may vary. Consider increasing the available memory for these devices to at least the specified minimum, or adjusting the model config.
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00,  1.69s/it]
2025-09-25 22:37:35,606 WARNING big_modeling.py L442: Some parameters are on the meta device because they were offloaded to the cpu and disk.
repacking to CPU/XPU format:   0%|          | 0/1536 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\test.py", line 10, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\models\auto\auto_factory.py", line 597, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\modeling_utils.py", line 288, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\modeling_utils.py", line 5283, in from_pretrained
    hf_quantizer.postprocess_model(model, config=config)
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\quantizers\base.py", line 251, in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\quantizers\quantizer_auto_round.py", line 71, in _process_model_after_weight_loading
    post_init(model, self.used_backends)
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\auto_round\inference\convert_model.py", line 506, in post_init
    layer.post_init()
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\auto_round_extension\ipex\qlinear_ipex_gptq.py", line 127, in post_init
    self.ipex_linear = ipex.llm.quantization.IPEXWeightOnlyQuantizedLinear.from_weight(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\intel_extension_for_pytorch\llm\quantization\woq_linear.py", line 64, in from_weight
    woq_linear_impl = woq_linear_impl_cls.from_weight(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\intel_extension_for_pytorch\nn\modules\weight_only_quantization.py", line 452, in from_weight
    return cls.from_int4_weight(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\intel_extension_for_pytorch\nn\modules\weight_only_quantization.py", line 387, in from_int4_weight
    qlinear._op_context = torch.ops.ipex_prepack.weight_only_qlinear_prepack_int4(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\torch\_ops.py", line 1353, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' 'ipex_prepack' object has no attribute 'weight_only_qlinear_prepack_int4'

Process finished with exit code 1
```
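For triage, here is a minimal diagnostic sketch (not from the original run) that checks whether this IPEX build registers the op the AutoRound quantizer calls. `torch.ops.ipex_prepack` raises AttributeError for unknown ops, so `hasattr` gives a clean True/False per variant:

```python
import torch
import intel_extension_for_pytorch as ipex  # importing IPEX registers its custom ops

print("torch:", torch.__version__, "| ipex:", ipex.__version__)

# Check both the generic prepack op and the int4 variant the traceback hits.
for op in ("weight_only_qlinear_prepack", "weight_only_qlinear_prepack_int4"):
    print(op, "->", hasattr(torch.ops.ipex_prepack, op))
```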

Versions

```
(.venv) PS C:\Users\aiouya\PycharmProjects\PythonProject4\auto-round-0.7.1> pip show intel-extension-for-pytorch
Name: intel_extension_for_pytorch
Version: 2.8.10+xpu
Summary: Intel® Extension for PyTorch*
Home-page: https://github.com/intel/intel-extension-for-pytorch
Author: Intel Corp.
Author-email:
License: https://www.apache.org/licenses/LICENSE-2.0
Location: C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages
Requires: numpy, packaging, psutil
Required-by:

(.venv) PS C:\Users\aiouya\PycharmProjects\PythonProject4\auto-round-0.7.1> pip show torch
Name: torch
Version: 2.8.0+xpu
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3-Clause
Location: C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages
Requires: dpcpp-cpp-rt, filelock, fsspec, intel-cmplr-lib-rt, intel-cmplr-lib-ur, intel-cmplr-lic-rt, intel-opencl-rt, intel-openmp, intel-pti, intel-sycl-rt, jinja2, mkl, networkx, onemkl-sycl-blas, onemkl-sycl-dft, onemkl-sycl-lapack, onemkl-sycl-rng, onemkl-sycl-sparse, pytorch-triton-xpu, sympy, tbb, tcmlib, typing-extensions, umf
Required-by: accelerate, auto_gptq, auto_round, lm_eval, peft, torchaudio, torchvision
```
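If it helps, the full set of schemas this wheel registers under `ipex_prepack` can be dumped with a private (and therefore unstable) torch API; a sketch, not part of the original report:

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  (the import side effect registers the ops)

# _jit_get_all_schemas is private torch API and its output format may change
# between releases, but it lists every registered op schema.
for schema in torch._C._jit_get_all_schemas():
    if "ipex_prepack::" in str(schema):
        print(schema)
```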
