
weight_only_qlinear_prepack_int4 not found #864

@legend-7-7

Describe the bug

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "./gpt-oss-20b-int4-AutoRound-FP8KV"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Load the model (using the custom modeling_gpt_oss.py)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # dispatch automatically across GPU/CPU
    torch_dtype=torch.float32,  # pick the precision (int4 weights + fp16/32 compute here)
    trust_remote_code=True,     # allow the local modeling_gpt_oss.py
)

# Input
text = "There is a girl who likes adventure,"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
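As a side note, the log below warns that `torch_dtype` is deprecated and suggests `offload_buffers=True` for the offload-buffer warning. A minimal sketch of the same load call with those two adjustments; neither is expected to affect the missing-op crash itself, which happens later during IPEX repacking:

```python
# Hedged variant of the load call above: `dtype` replaces the deprecated
# `torch_dtype` keyword, and `offload_buffers=True` follows the accelerate
# hint from the log. Both keywords exist in recent transformers releases.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    dtype=torch.float32,
    trust_remote_code=True,
    offload_buffers=True,
)
```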

```
C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Scripts\python.exe C:\Users\aiouya\PycharmProjects\PythonProject4\test.py
`torch_dtype` is deprecated! Use `dtype` instead!
[W925 22:37:28.000000000 OperatorEntry.cpp:225] Warning: Warning only once for all operators, other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::geometric_(Tensor(a!) self, float p, *, Generator? generator=None) -> Tensor(a!)
    registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src\ATen\RegisterSchema.cpp:6
  dispatch key: XPU
  previous kernel: registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\VmapModeRegistrations.cpp:37
       new kernel: registered at H:\frameworks.ai.pytorch.ipex-gpu\build\Release\csrc\gpu\csrc\gpu\xpu\ATen\RegisterXPU_0.cpp:172 (function operator ())
C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\accelerate\utils\memory.py:204: UserWarning: The XPU `mem_get_info` API is available in IPEX version >=2.5 or PyTorch >=2.6. The current returned available memory is incorrect. Please consider upgrading your IPEX or PyTorch version.
  warnings.warn(
2025-09-25 22:37:29,656 INFO modeling.py L1004: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\accelerate\utils\modeling.py:1582: UserWarning: Current model requires 10268467328 bytes of buffer for offloaded layers, which seems does not fit any GPU's remaining memory. If you are experiencing a OOM later, please consider using offload_buffers=True.
  warnings.warn(
2025-09-25 22:37:30,030 INFO modeling.py L1592: Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:
  - 0: 4633067520 bytes required
These minimum requirements are specific to this allocation attempt and may vary. Consider increasing the available memory for these devices to at least the specified minimum, or adjusting the model config.
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00,  1.69s/it]
2025-09-25 22:37:35,606 WARNING big_modeling.py L442: Some parameters are on the meta device because they were offloaded to the cpu and disk.
repacking to CPU/XPU format:   0%|          | 0/1536 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\test.py", line 10, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\models\auto\auto_factory.py", line 597, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\modeling_utils.py", line 288, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\modeling_utils.py", line 5283, in from_pretrained
    hf_quantizer.postprocess_model(model, config=config)
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\quantizers\base.py", line 251, in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\quantizers\quantizer_auto_round.py", line 71, in _process_model_after_weight_loading
    post_init(model, self.used_backends)
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\auto_round\inference\convert_model.py", line 506, in post_init
    layer.post_init()
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\auto_round_extension\ipex\qlinear_ipex_gptq.py", line 127, in post_init
    self.ipex_linear = ipex.llm.quantization.IPEXWeightOnlyQuantizedLinear.from_weight(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\intel_extension_for_pytorch\llm\quantization\woq_linear.py", line 64, in from_weight
    woq_linear_impl = woq_linear_impl_cls.from_weight(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\intel_extension_for_pytorch\nn\modules\weight_only_quantization.py", line 452, in from_weight
    return cls.from_int4_weight(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\intel_extension_for_pytorch\nn\modules\weight_only_quantization.py", line 387, in from_int4_weight
    qlinear._op_context = torch.ops.ipex_prepack.weight_only_qlinear_prepack_int4(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\torch\_ops.py", line 1353, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' 'ipex_prepack' object has no attribute 'weight_only_qlinear_prepack_int4'

Process finished with exit code 1
```
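For triage, here is a minimal diagnostic sketch (not from the original run) that checks whether this IPEX build registers the op the AutoRound quantizer calls. `torch.ops.ipex_prepack` raises AttributeError for unknown ops, so `hasattr` gives a clean True/False per variant:

```python
import torch
import intel_extension_for_pytorch as ipex  # importing IPEX registers its custom ops

print("torch:", torch.__version__, "| ipex:", ipex.__version__)

# Check both the generic prepack op and the int4 variant the traceback hits.
for op in ("weight_only_qlinear_prepack", "weight_only_qlinear_prepack_int4"):
    print(op, "->", hasattr(torch.ops.ipex_prepack, op))
```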

Versions

```
(.venv) PS C:\Users\aiouya\PycharmProjects\PythonProject4\auto-round-0.7.1> pip show intel-extension-for-pytorch
Name: intel_extension_for_pytorch
Version: 2.8.10+xpu
Summary: Intel® Extension for PyTorch*
Home-page: https://github.com/intel/intel-extension-for-pytorch
Author: Intel Corp.
Author-email:
License: https://www.apache.org/licenses/LICENSE-2.0
Location: C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages
Requires: numpy, packaging, psutil
Required-by:

(.venv) PS C:\Users\aiouya\PycharmProjects\PythonProject4\auto-round-0.7.1> pip show torch
Name: torch
Version: 2.8.0+xpu
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3-Clause
Location: C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages
Requires: dpcpp-cpp-rt, filelock, fsspec, intel-cmplr-lib-rt, intel-cmplr-lib-ur, intel-cmplr-lic-rt, intel-opencl-rt, intel-openmp, intel-pti, intel-sycl-rt, jinja2, mkl, networkx, onemkl-sycl-blas, onemkl-sycl-dft, onemkl-sycl-lapack, onemkl-sycl-rng, onemkl-sycl-sparse, pytorch-triton-xpu, sympy, tbb, tcmlib, typing-extensions, umf
Required-by: accelerate, auto_gptq, auto_round, lm_eval, peft, torchaudio, torchvision
```
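If it helps, the full set of schemas this wheel registers under `ipex_prepack` can be dumped with a private (and therefore unstable) torch API; a sketch, not part of the original report:

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  (the import side effect registers the ops)

# _jit_get_all_schemas is private torch API and its output format may change
# between releases, but it lists every registered op schema.
for schema in torch._C._jit_get_all_schemas():
    if "ipex_prepack::" in str(schema):
        print(schema)
```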
