error fixed at chapter 6 /7.md #604

Open · wants to merge 2 commits into main
4 changes: 2 additions & 2 deletions chapters/en/chapter6/7.mdx
@@ -64,11 +64,11 @@ So, the sum of all frequencies is 210, and the probability of the subword `"ug"`

Now, to tokenize a given word, we look at all the possible segmentations into tokens and compute the probability of each according to the Unigram model. Since all tokens are considered independent, this probability is just the product of the probability of each token. For instance, the tokenization `["p", "u", "g"]` of `"pug"` has the probability:

$$P([``p", ``u", ``g"]) = P(``p") \times P(``u") \times P(``g") = \frac{5}{210} \times \frac{36}{210} \times \frac{20}{210} = 0.000389$$
$$P([``p", ``u", ``g"]) = P(``p") \times P(``u") \times P(``g") = \frac{17}{210} \times \frac{36}{210} \times \frac{20}{210} = 0.001322$$

Comparatively, the tokenization `["pu", "g"]` has the probability:

$$P([``pu", ``g"]) = P(``pu") \times P(``g") = \frac{5}{210} \times \frac{20}{210} = 0.0022676$$
$$P([``pu", ``g"]) = P(``pu") \times P(``g") = \frac{17}{210} \times \frac{20}{210} = 0.007709$$

so that one is way more likely. In general, tokenizations with the least tokens possible will have the highest probability (because of that division by 210 repeated for each token), which corresponds to what we want intuitively: to split a word into the least number of tokens possible.

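As a quick sanity check of the corrected values (not part of the diff), here is a minimal Python sketch that recomputes both probabilities from the chapter's subword frequencies. The `freqs` dictionary and the `tokenization_prob` helper are illustrative names for this check, not code from the chapter:

```python
from math import prod

# Subword frequencies from the chapter's toy corpus (only the ones needed here);
# 210 is the sum of all subword frequencies.
freqs = {"p": 17, "u": 36, "g": 20, "pu": 17}
total = 210

def tokenization_prob(tokens):
    # Unigram model: tokens are independent, so the probability of a
    # tokenization is the product of the individual token probabilities.
    return prod(freqs[t] / total for t in tokens)

print(tokenization_prob(["p", "u", "g"]))  # ≈ 0.001322
print(tokenization_prob(["pu", "g"]))      # ≈ 0.00771
```

The two-token segmentation comes out roughly six times more likely than the three-token one, which matches the corrected numbers in the diff and the chapter's point that shorter segmentations win.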