Dear Authors,
Thank you for providing such excellent work for the community to use!
I have a question regarding an implementation detail. In Line 338, it appears that the code is adapted from Llama. On closer inspection, however, the DeepSeek implementation seems to differ, particularly in Lines 363 to 367, from Llama’s implementation at Line 223.
Could you explain the reasoning behind this difference? Were there specific considerations that led to this change?
I look forward to your response. Thank you again for your great work!
Best regards,