
Different inference results between flash attention and manually implemented attention #50

Open
Jaeckel-d opened this issue Jul 2, 2024 · 0 comments


When I loaded the smallest GPT-2 model's weights from Hugging Face and ran inference with both flash attention and a manually implemented attention under the same random seed, each method produced consistent results across its own runs. However, the outputs of the two methods did not match each other, and the manually implemented attention seemed to produce more reasonable text. Is this normal?
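For anyone wanting to reproduce the effect, here is a minimal sketch (not the original poster's code) that compares PyTorch's fused `F.scaled_dot_product_attention`, which dispatches to a flash attention kernel on supported GPUs, against a naive softmax(QKᵀ/√d)·V implementation on identical random tensors. The shapes mirror GPT-2 small's per-head dimensions (12 heads, 64-dim heads) but are otherwise arbitrary assumptions for illustration.

```python
# Minimal sketch: fused attention vs. manual attention on the same inputs.
# Small numerical differences are expected; they come from different
# floating-point reduction orders inside the kernels (and, in fp16/bf16,
# from the fused kernel's fp32 accumulation).
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)

B, H, T, D = 1, 12, 64, 64  # batch, heads, sequence length, head dim (GPT-2 small sizes)
q = torch.randn(B, H, T, D)
k = torch.randn(B, H, T, D)
v = torch.randn(B, H, T, D)

# Manual causal attention: softmax(QK^T / sqrt(D)) @ V with a lower-triangular mask.
att = (q @ k.transpose(-2, -1)) / math.sqrt(D)
mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
att = att.masked_fill(~mask, float("-inf"))
y_manual = F.softmax(att, dim=-1) @ v

# Fused attention (uses a flash attention kernel when hardware/dtype allow).
y_fused = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print("max abs difference:", (y_manual - y_fused).abs().max().item())
```

Differences on the order of 1e-6 or so per layer are normal here. During autoregressive generation, such tiny logit differences can flip which token gets sampled at some step, after which the two transcripts diverge completely, so different-looking generations alone do not indicate a bug in either implementation.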
