Describe the bug
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "./gpt-oss-20b-int4-AutoRound-FP8KV"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Load the model (using the custom modeling_gpt_oss.py)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # automatically place layers across GPU/CPU
    torch_dtype=torch.float32,  # pick the precision (here: int4 + fp16/32 mixed)
    trust_remote_code=True,     # allow the local modeling_gpt_oss.py to be used
)

# Input
text = "There is a girl who likes adventure,"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
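Before running the full load, a quick pre-flight check can tell whether this torch/IPEX pairing registers the prepack kernel that AutoRound's IPEX backend will ask for. This is a diagnostic sketch, not part of the original repro; the op name `weight_only_qlinear_prepack_int4` is taken from the traceback below:

```python
import torch
import intel_extension_for_pytorch as ipex  # importing IPEX registers its custom ops

print("torch:", torch.__version__, "| ipex:", ipex.__version__)
print("XPU available:", torch.xpu.is_available())

# torch.ops namespaces resolve op names lazily via __getattr__, so hasattr()
# reports whether this build registered the kernel, without calling it.
print(hasattr(torch.ops.ipex_prepack, "weight_only_qlinear_prepack_int4"))
```

Running the script produces the following output: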
```
C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Scripts\python.exe C:\Users\aiouya\PycharmProjects\PythonProject4\test.py
torch_dtype is deprecated! Use dtype instead!
[W925 22:37:28.000000000 OperatorEntry.cpp:225] Warning: Warning only once for all operators, other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::geometric_(Tensor(a!) self, float p, *, Generator? generator=None) -> Tensor(a!)
    registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src\ATen\RegisterSchema.cpp:6
  dispatch key: XPU
  previous kernel: registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\VmapModeRegistrations.cpp:37
       new kernel: registered at H:\frameworks.ai.pytorch.ipex-gpu\build\Release\csrc\gpu\csrc\gpu\xpu\ATen\RegisterXPU_0.cpp:172 (function operator ())
C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\accelerate\utils\memory.py:204: UserWarning: The XPU `mem_get_info` API is available in IPEX version >=2.5 or PyTorch >=2.6. The current returned available memory is incorrect. Please consider upgrading your IPEX or PyTorch version.
  warnings.warn(
2025-09-25 22:37:29,656 INFO modeling.py L1004: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set max_memory in to a higher value to use more memory (at your own risk).
C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\accelerate\utils\modeling.py:1582: UserWarning: Current model requires 10268467328 bytes of buffer for offloaded layers, which seems does not fit any GPU's remaining memory. If you are experiencing a OOM later, please consider using offload_buffers=True.
  warnings.warn(
2025-09-25 22:37:30,030 INFO modeling.py L1592: Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:
  - 0: 4633067520 bytes required
These minimum requirements are specific to this allocation attempt and may vary. Consider increasing the available memory for these devices to at least the specified minimum, or adjusting the model config.
Loading checkpoint shards: 100%|██████████| 3/3 [00:05<00:00,  1.69s/it]
2025-09-25 22:37:35,606 WARNING big_modeling.py L442: Some parameters are on the meta device because they were offloaded to the cpu and disk.
repacking to CPU/XPU format:   0%|          | 0/1536 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\test.py", line 10, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\models\auto\auto_factory.py", line 597, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\modeling_utils.py", line 288, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\modeling_utils.py", line 5283, in from_pretrained
    hf_quantizer.postprocess_model(model, config=config)
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\quantizers\base.py", line 251, in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\transformers\quantizers\quantizer_auto_round.py", line 71, in _process_model_after_weight_loading
    post_init(model, self.used_backends)
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\auto_round\inference\convert_model.py", line 506, in post_init
    layer.post_init()
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\auto_round_extension\ipex\qlinear_ipex_gptq.py", line 127, in post_init
    self.ipex_linear = ipex.llm.quantization.IPEXWeightOnlyQuantizedLinear.from_weight(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\intel_extension_for_pytorch\llm\quantization\woq_linear.py", line 64, in from_weight
    woq_linear_impl = woq_linear_impl_cls.from_weight(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\intel_extension_for_pytorch\nn\modules\weight_only_quantization.py", line 452, in from_weight
    return cls.from_int4_weight(
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\intel_extension_for_pytorch\nn\modules\weight_only_quantization.py", line 387, in from_int4_weight
    qlinear._op_context = torch.ops.ipex_prepack.weight_only_qlinear_prepack_int4(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages\torch\_ops.py", line 1353, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' 'ipex_prepack' object has no attribute 'weight_only_qlinear_prepack_int4'
Process finished with exit code 1
```
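The final AttributeError is the core failure: when `qlinear_ipex_gptq.py` repacks the int4 weights, it calls `torch.ops.ipex_prepack.weight_only_qlinear_prepack_int4`, and that op is not registered by this installed IPEX build. To see which prepack ops the build does expose, a small sketch over PyTorch's schema registry (a generic diagnostic, not an official IPEX API):

```python
import torch
import intel_extension_for_pytorch  # noqa: F401 -- import so IPEX registers its ops

# Dump every operator schema registered under the ipex_prepack namespace;
# weight_only_qlinear_prepack_int4 should be listed if this build ships it.
for schema in torch._C._jit_get_all_schemas():
    if schema.name.startswith("ipex_prepack::"):
        print(schema)
```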
Versions
```
(.venv) PS C:\Users\aiouya\PycharmProjects\PythonProject4\auto-round-0.7.1> pip show intel-extension-for-pytorch
Name: intel_extension_for_pytorch
Version: 2.8.10+xpu
Summary: Intel® Extension for PyTorch*
Home-page: https://github.com/intel/intel-extension-for-pytorch
Author: Intel Corp.
Author-email:
License: https://www.apache.org/licenses/LICENSE-2.0
Location: C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages
Requires: numpy, packaging, psutil
Required-by:

(.venv) PS C:\Users\aiouya\PycharmProjects\PythonProject4\auto-round-0.7.1> pip show torch
Name: torch
Version: 2.8.0+xpu
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3-Clause
Location: C:\Users\aiouya\PycharmProjects\PythonProject4\.venv\Lib\site-packages
Requires: dpcpp-cpp-rt, filelock, fsspec, intel-cmplr-lib-rt, intel-cmplr-lib-ur, intel-cmplr-lic-rt, intel-opencl-rt, intel-openmp, intel-pti, intel-sycl-rt, jinja2, mkl, networkx, onemkl-sycl-blas, onemkl-sycl-dft, onemkl-sycl-lapack, onemkl-sycl-rng, onemkl-sycl-sparse, pytorch-triton-xpu, sympy, tbb, tcmlib, typing-extensions, umf
Required-by: accelerate, auto_gptq, auto_round, lm_eval, peft, torchaudio, torchvision
```
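One more data point worth collecting: the warnings above show `device_map="auto"` offloading layers to cpu and disk before the repack step runs. As an experiment (not a verified fix), loading the model onto a single explicit device removes the mixed CPU/XPU placement from the equation; the `dtype=` keyword follows the deprecation notice in the log:

```python
import torch
from transformers import AutoModelForCausalLM

# Experiment, not a confirmed workaround: keep the whole model on one device
# so the AutoRound repacking step targets a single backend.
model = AutoModelForCausalLM.from_pretrained(
    "./gpt-oss-20b-int4-AutoRound-FP8KV",
    device_map={"": "cpu"},  # or {"": "xpu"} if the int4 weights fit in GPU memory
    dtype=torch.float16,     # dtype= replaces the deprecated torch_dtype=
    trust_remote_code=True,
)
```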