Issue with attention mask unsqueeze in attention weights #4

Open
meadewaking opened this issue Jan 24, 2024 · 1 comment

@meadewaking commented Jan 24, 2024

While implementing batch inference, I discovered a bug in the attention weight computation: the attention mask is added to the attention weights after an unsqueeze along the wrong dimension. It should be unsqueezed along the second dimension (dim 1) rather than the first (dim 0). Here is the corrected line:

```python
attn_weights = attn_weights + attention_mask[position_ids, :attn_weights.shape[3]].unsqueeze(1)
```
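For context, here is a minimal shape check (not from the repository; the shapes are my assumption of the usual layout, with attention weights of shape `(batch, n_heads, q_len, k_len)`, a `(max_seq, max_seq)` causal mask, and `position_ids` of shape `(batch, q_len)`) showing why the mask has to be unsqueezed along dimension 1 so that it broadcasts over the head dimension:

```python
import torch

# Toy sizes, chosen only for illustration
batch, n_heads, q_len, k_len, max_seq = 2, 4, 1, 10, 16

attn_weights = torch.zeros(batch, n_heads, q_len, k_len)
attention_mask = torch.full((max_seq, max_seq), float("-inf")).triu(1)  # causal mask
position_ids = torch.full((batch, q_len), k_len - 1, dtype=torch.long)

# Indexing gives shape (batch, q_len, k_len); unsqueeze(1) makes it
# (batch, 1, q_len, k_len), which broadcasts over the head dimension.
mask = attention_mask[position_ids, :attn_weights.shape[3]].unsqueeze(1)
print((attn_weights + mask).shape)  # torch.Size([2, 4, 1, 10])

# unsqueeze(0) would give (1, batch, q_len, k_len), which only happens to
# broadcast when batch == 1, so the bug never shows up for a single prompt.
```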

Batch inference code:

```python
import torch

# load_safetensors, ChatTokenizer and llama are the functions/classes from this repository
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = None

model_filename = "data/model.safetensors"
tokenizer_filename = "data/tokenizer.model"

state_dict = load_safetensors(model_filename, device, dtype)
tokenizer = ChatTokenizer(tokenizer_filename)
prompts = ['import numpy as np', '#include<stdio.h>']

token_ids = [tokenizer.encode(i) for i in prompts]

cache = {}
max_text_len = 100
pad_id = -1

# Right-pad every prompt to max_text_len with pad_id
token_ids_torch = torch.full((len(token_ids), max_text_len), pad_id, dtype=torch.long, device=device)
for k, t in enumerate(token_ids):
    token_ids_torch[k, : len(t)] = torch.tensor(t, dtype=torch.long, device=device)
# True wherever a position is still part of the original prompt
prompt_tokens_mask = token_ids_torch != pad_id

# Generate tokens one by one
for cur_pos in range(1, max_text_len):

    inputs = token_ids_torch[:, cur_pos-1:cur_pos]
    # Position ids must be long tensors on the same device as the inputs
    position_ids = torch.full(inputs.size(), cur_pos-1, dtype=torch.long, device=device)

    # Predict logits
    logits = llama(inputs, position_ids, cache, state_dict)

    # Choose the most likely token (greedy decoding)
    new_token_ids = logits[:, -1].argmax(-1)

    # Stop once every sequence in the batch has produced the end token
    if (new_token_ids == tokenizer.end_token_id).all():
        break

    # Keep the prompt token where one exists, otherwise take the generated token
    new_token_ids = torch.where(prompt_tokens_mask[:, cur_pos], token_ids_torch[:, cur_pos], new_token_ids)
    token_ids_torch[:, cur_pos] = new_token_ids

token_ids = token_ids_torch.cpu().tolist()

response = [tokenizer.decode(i) for i in token_ids]
print(response)
```
@99991 (Owner) commented Jan 24, 2024

Good catch! And thank you for the correction. I must have forgotten to test with batch size > 1 at some point.
