Motivation and description
Let's say we have an input array of shape (embedding_size, seq_len, batch_size). Our padding mask then has shape (seq_len, batch_size), which cannot be passed as the mask of the multi-head attention layer. The only mask we can use is a causal mask, which has shape (seq_len, seq_len).
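A minimal NumPy sketch of the shape manipulation being asked for, assuming the attention layer accepts a boolean mask broadcastable to (seq_len_k, seq_len_q, batch_size), with True meaning "this key may be attended to". That layout and the helper names below (padding_to_attention_mask, causal_mask, combined_mask, masked_softmax) are hypothetical, for illustration only, not an existing API. The idea: insert a singleton query axis into the (seq_len, batch_size) padding mask so each key's validity broadcasts across all queries, and give the (seq_len, seq_len) causal mask a trailing batch axis so the two can be combined with a logical AND.

```python
import numpy as np


def padding_to_attention_mask(pad_mask):
    """Expand a (seq_len, batch_size) padding mask to (seq_len_k, seq_len_q, batch_size).

    Assumption (hypothetical convention): True marks a real token, and a key
    position may be attended to only if it is a real token.
    """
    seq_len, batch_size = pad_mask.shape
    # Insert a query axis of length 1 so each key's validity is repeated for every query.
    key_valid = pad_mask[:, None, :]  # (seq_len_k, 1, batch_size)
    return np.broadcast_to(key_valid, (seq_len, seq_len, batch_size))


def causal_mask(seq_len):
    """(seq_len_k, seq_len_q) mask: key k is visible to query q iff k <= q."""
    k = np.arange(seq_len)[:, None]
    q = np.arange(seq_len)[None, :]
    return k <= q


def combined_mask(pad_mask):
    """Causal mask AND padding mask, shape (seq_len_k, seq_len_q, batch_size)."""
    seq_len = pad_mask.shape[0]
    # Trailing batch axis lets the (seq_len, seq_len) causal mask broadcast over the batch.
    return causal_mask(seq_len)[:, :, None] & padding_to_attention_mask(pad_mask)


def masked_softmax(scores, mask):
    """Softmax over the key axis (axis 0), giving masked-out keys zero weight.

    Assumes every query has at least one visible key; a fully masked column
    would produce NaN.
    """
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=0, keepdims=True)


# Two sequences of length 4; the second one ends with two padding tokens.
pad = np.array([[True, True], [True, True], [True, False], [True, False]])
mask = combined_mask(pad)                      # shape (4, 4, 2)
weights = masked_softmax(np.zeros(mask.shape), mask)
```

For the padded sequence, the last query can only see keys 0 and 1, so mask[:, 3, 1] is [True, True, False, False] and the padded keys receive zero attention weight.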
Possible Implementation
No response