How to port the ChatGLM2-6B INT4 model to an NPU device #649

### Is there an existing issue for this?

- [X] I have searched the existing issues

### Current Behavior

I found the code below in quantize.py. It seems that `quantization_code` only supports running on a GPU.
Are there any suggestions for deploying the model on an NPU?
Could you provide the source behind `quantization_code`? Then I could rewrite it to run on an NPU device.


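    import base64
    import bz2
    from typing import List

    # LazyKernelCModule and KernelFunction come from the cpm_kernels package,
    # which loads and launches CUDA modules (GPU only).
    from cpm_kernels.kernels.base import LazyKernelCModule, KernelFunction

    class Kernel:
        """Exposes named functions of a compiled CUDA module as attributes."""

        def __init__(self, code: bytes, function_names: List[str]):
            self.code = code
            self._function_names = function_names
            # The module is loaded onto the current CUDA device lazily, on first use.
            self._cmodule = LazyKernelCModule(self.code)

            for name in self._function_names:
                setattr(self, name, KernelFunction(self._cmodule, name))

    # base64-encoded, bz2-compressed kernel binary (contents elided here as "XXXX")
    quantization_code = "XXXX"

    kernels = Kernel(
        bz2.decompress(base64.b64decode(quantization_code)),
        [
            "int4WeightCompression",
            "int4WeightExtractionFloat",
            "int4WeightExtractionHalf",
            "int8WeightExtractionFloat",
            "int8WeightExtractionHalf",
        ],
    )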
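For reference, the int4 kernels named above presumably do two things: compress signed int4 weights two per int8 byte, and extract them back to float/half by unpacking and multiplying by a scale. Below is a minimal pure-Python sketch of that logic as a starting point for a portable rewrite. It is not the actual kernel code: the nibble order (first value in the high nibble), the value range [-8, 7], and the one-scale-per-row layout are assumptions that should be checked against the real CUDA kernels, and the function names `pack_int4`, `unpack_int4`, and `dequantize_rows` are hypothetical.

```python
from typing import List


def pack_int4(values: List[int]) -> List[int]:
    """Pack pairs of signed int4 values into signed int8 bytes.

    ASSUMPTION: the first value of each pair goes into the high nibble.
    """
    if len(values) % 2:
        raise ValueError("need an even number of int4 values")
    packed = []
    for hi, lo in zip(values[0::2], values[1::2]):
        if not (-8 <= hi <= 7 and -8 <= lo <= 7):
            raise ValueError("int4 values must lie in [-8, 7]")
        byte = ((hi & 0x0F) << 4) | (lo & 0x0F)  # unsigned value in 0..255
        packed.append(byte - 256 if byte > 127 else byte)  # reinterpret as int8
    return packed


def _sign_extend_4bit(nibble: int) -> int:
    # Map a 4-bit two's-complement nibble (0..15) to -8..7.
    return nibble - 16 if nibble > 7 else nibble


def unpack_int4(packed: List[int]) -> List[int]:
    """Inverse of pack_int4: split each int8 byte back into two int4 values."""
    out = []
    for b in packed:
        u = b & 0xFF  # view the signed byte as unsigned
        out.append(_sign_extend_4bit(u >> 4))
        out.append(_sign_extend_4bit(u & 0x0F))
    return out


def dequantize_rows(packed_rows: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Recover float weights: each unpacked row times its (assumed) per-row scale."""
    return [[v * s for v in unpack_int4(row)] for row, s in zip(packed_rows, scales)]
```

Every step here is a shift, a mask, or a multiply. On an NPU the same operations should be expressible as ordinary elementwise tensor ops in whatever framework the device supports, so a custom kernel may not be strictly required, though it would likely be slower than a fused kernel.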

