[BUG/Help] Example code raises an error when run in Colab #689

### Is there an existing issue for this?

- [x] I have searched the existing issues

### Current Behavior

Environment: Colab, python==3.11. After installing the packages from the requirements file, running the example code from Hugging Face produces the following error:

```
ValueError                                Traceback (most recent call last)
[<ipython-input-4-d0416a7bae87>](https://localhost:8080/#) in <cell line: 0>()
----> 1 response, history = model.chat(tokenizer, "你好", history=[])
      2 print(response)
      3 
      4 
      5 response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)

5 frames
[~/.cache/huggingface/modules/transformers_modules/THUDM/chatglm2-6b-int4/a954550736dda16d022a7019b9ffecd753aa1b84/modeling_chatglm.py](https://localhost:8080/#) in _update_model_kwargs_for_generation(self, outputs, model_kwargs, is_encoder_decoder)
    868     ) -> Dict[str, Any]:
    869         # update past_key_values
--> 870         cache_name, cache = self._extract_past_from_model_output(outputs)
    871         model_kwargs[cache_name] = cache
    872 

ValueError: too many values to unpack (expected 2)
```
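For reference, this `ValueError` is the generic Python error raised when a sequence with more than two items is unpacked into exactly two names. The sketch below reproduces only that mechanism and does not call transformers. Both helper functions and their return values are hypothetical stand-ins. Whether `_extract_past_from_model_output` really returns a different shape in the installed transformers version is an assumption that has not been verified here.

```python
def extract_cache_bare_tuple():
    # Hypothetical stand-in: returns one entry per layer, with no name attached.
    return ("layer0", "layer1", "layer2")


def extract_cache_named_pair():
    # Hypothetical stand-in: returns a (name, cache) pair, which is the shape
    # that `cache_name, cache = ...` in the traceback expects.
    return ("past_key_values", ("layer0", "layer1", "layer2"))


try:
    # Three items unpacked into two names raises ValueError.
    cache_name, cache = extract_cache_bare_tuple()
except ValueError as exc:
    message = str(exc)
    print(message)

# A two-item return value unpacks without error.
cache_name, cache = extract_cache_named_pair()
print(cache_name, len(cache))
```

If this is what happens here, the remote `modeling_chatglm.py` and the installed transformers version disagree about the return value of that helper.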

### Expected Behavior

_No response_

### Steps To Reproduce

1. In Colab, use the default runtime setup with a T4 GPU.
2. Install the dependencies listed in the requirements file:
```
!pip install protobuf
!pip install transformers==4.30.2
!pip install cpm_kernels
!pip install "torch>=2.0"
!pip install gradio
!pip install mdtex2html
!pip install sentencepiece
!pip install accelerate
!pip install sse-starlette
!pip install "streamlit>=1.24.0"
```

3. Run the following code to download the model:
```
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).half().cuda()
model = model.eval()
```

4. Run the following code to try the model; it raises the error:
```
response, history = model.chat(tokenizer, "你好", history=[])
print(response)


response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)
```

### Environment

```markdown
- OS: colab
- Python: 3.11
- Transformers: 4.30.2
- PyTorch: 2.6.0+cu124
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True
```
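The versions above can be read from inside the notebook to confirm that the pinned `transformers==4.30.2` is what is actually installed. This is a minimal sketch using only the standard library `importlib.metadata`; the helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["transformers", "torch"]).items():
    print(f"{name}: {version}")
```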

### Anything else?

_No response_
