
[UPDATE] Clarify Token-to-Embedding Conversion in 'What Are LLMs' Section #225

Open
venkatmanish opened this issue Feb 22, 2025 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

@venkatmanish

Feedback: Clarification on Token-to-Embedding Conversion

Hi Hugging Face Team,

I recently went through the AI agents course, and I wanted to suggest a clarification in the section explaining tokenization and embeddings.

The current passage:

"Once the input text is tokenized, the model computes a representation of the sequence that captures information about the meaning and the position of each token in the input sequence. This representation goes into the model, which outputs scores that rank the likelihood of each token in its vocabulary as being the next one in the sequence."

This wording may confuse learners because it sounds as though the representation is computed outside the model and then fed back into it. In reality, the model handles the tokens internally, converting them into embeddings and then computing the representation of the sequence.

To make this clearer, I suggest modifying the explanation to emphasize that the embedding layer computes the token representations within the model itself, and then those representations are further processed by the model’s layers.

Here’s a revised version:

"Once the input text is tokenized, it is passed through the model's embedding layer, which computes a representation for each token, capturing both its meaning and position within the sequence. These embeddings are then passed through the model's deeper layers, which capture complex relationships between tokens, and the model outputs scores that rank the likelihood of each token in its vocabulary as being the next one in the sequence."

I hope this adjustment will make it clearer that the token representations are computed within the model, not outside of it, and avoid any confusion about the flow of data.
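For concreteness, here is a minimal sketch of the flow I'm describing, assuming the transformers library and GPT-2 as an example model (the specific model and prompt are just for illustration):

```python
# Minimal sketch: token IDs -> embedding layer (inside the model) -> deeper layers -> logits.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# 1. Tokenization: text is converted into token IDs before entering the model.
inputs = tokenizer("The capital of France is", return_tensors="pt")

# 2. Inside the model, the embedding layer maps each token ID to a vector
#    representation (positional information is added by the model as well).
embeddings = model.get_input_embeddings()(inputs["input_ids"])
print(embeddings.shape)  # (batch, sequence_length, hidden_size)

# 3. The deeper layers process these representations, and the model outputs
#    scores (logits) ranking every vocabulary token as the possible next token.
with torch.no_grad():
    logits = model(**inputs).logits
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode([next_token_id]))
```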

Thank you for the great course!

Best regards,
Venkat Manish

venkatmanish added the documentation Improvements or additions to documentation label Feb 22, 2025