Enable graph mode for LLM inference

Hi,
I have read the "examples\NPU compilation tutorial.ipynb" about graph mode and eager mode, which helped me a lot.
I was wondering if I could use graph mode in LLM inference to reduce the weights copying between CPU and NPU.
So i simply changed the return value of function `horizontal_fusion_linear` into `return fx_model.to('npu')`, after converting the model, the inference error is:
AttributeError: 'Tensor' object has no attribute 'is_contiguous'
It seems this operation cannot be performed in the NPU？If i want to use graph mode in LLM inference, the above change is correct?

Any comment or advice is appreciated, thanks !




Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enable graph mode for LLM inference #89

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Enable graph mode for LLM inference #89

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions