Here, we have two given parameters:

- `en_embeddings`: English words and their corresponding embeddings.
- `fr_embeddings`: French words and their corresponding embeddings.
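As a toy sketch of what these inputs look like (the words and 3-dimensional vectors here are made up; real word embeddings are typically 300-dimensional):

```python
import numpy as np

# Toy stand-ins for the two inputs: each dictionary maps a word to its
# embedding vector (real embeddings are typically 300-dimensional).
en_embeddings = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
}
fr_embeddings = {
    "chat": np.array([0.8, 0.2, 0.0]),
    "chien": np.array([0.7, 0.3, 0.1]),
}

print(sorted(en_embeddings), sorted(fr_embeddings))
```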
Now, we have to create an English embedding matrix and a French embedding matrix. Given dictionaries of English and French word embeddings, we will create a transformation matrix `R`. In other words, given an English word embedding `e`, we need to multiply `e` with `R` (i.e., `eR`) to generate a new word embedding `f` in the French embedding space.
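A minimal sketch of building the two matrices (the `en_fr_pairs` dictionary of translation pairs and the helper name `get_matrices` are my assumptions, not names from the text): row `i` of `X` and row `i` of `Y` must describe the same translation pair.

```python
import numpy as np

def get_matrices(en_fr_pairs, en_embeddings, fr_embeddings):
    """Stack aligned embeddings into matrices X (English) and Y (French).

    en_fr_pairs is an assumed {english_word: french_word} mapping; a pair
    is kept only if both words have an embedding available.
    """
    x_rows, y_rows = [], []
    for en_word, fr_word in en_fr_pairs.items():
        if en_word in en_embeddings and fr_word in fr_embeddings:
            x_rows.append(en_embeddings[en_word])
            y_rows.append(fr_embeddings[fr_word])
    return np.vstack(x_rows), np.vstack(y_rows)

# Toy example with 3-dimensional embeddings.
en_emb = {"cat": np.array([0.9, 0.1, 0.0]), "dog": np.array([0.8, 0.2, 0.1])}
fr_emb = {"chat": np.array([0.8, 0.2, 0.0]), "chien": np.array([0.7, 0.3, 0.1])}
X, Y = get_matrices({"cat": "chat", "dog": "chien"}, en_emb, fr_emb)
print(X.shape, Y.shape)  # (2, 3) (2, 3)
```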
We can describe our translation problem as finding a matrix `R` that minimizes the following expression, where `X` is the matrix of English embeddings (one per row) and `Y` is the matrix of the corresponding French embeddings:

argmin_R ‖XR − Y‖_F

For this, we calculate the loss by modifying the original Frobenius norm:

Original Frobenius norm: ‖A‖_F = √(Σᵢ Σⱼ aᵢⱼ²)

Modified Frobenius norm: its square, ‖XR − Y‖_F², which is easier to differentiate.

Finally, averaging over the m training pairs, our loss function will look something like this:

L(X, Y, R) = (1/m) · ‖XR − Y‖_F²
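A minimal NumPy sketch of this loss, assuming `X` holds the English embeddings (one per row), `Y` the corresponding French embeddings, and `m` is the number of rows:

```python
import numpy as np

def compute_loss(X, Y, R):
    """L(X, Y, R) = (1/m) * squared Frobenius norm of (XR - Y),
    where m is the number of training pairs (rows of X)."""
    m = X.shape[0]
    diff = X @ R - Y              # transformed English minus true French
    return np.sum(diff ** 2) / m  # squared Frobenius norm, averaged over m

# With R = identity and Y = 0, the loss is the mean squared row of X.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.zeros((2, 2))
print(compute_loss(X, Y, np.eye(2)))  # 1.0
```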
Now, in order to optimize the value of R (from a random square matrix to an optimal value), we need to calculate the gradient of the loss we calculated in Step 2 w.r.t. the transformation matrix R. The formula for calculating the gradient of the loss function L(X, Y, R) is:

g = dL/dR = (2/m) · Xᵀ(XR − Y)
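A direct NumPy translation of the gradient dL/dR = (2/m) · Xᵀ(XR − Y), with a finite-difference sanity check (variable names are mine):

```python
import numpy as np

def compute_gradient(X, Y, R):
    """dL/dR = (2/m) * X^T (XR - Y) for L(X, Y, R) = (1/m)||XR - Y||_F^2."""
    m = X.shape[0]
    return (2.0 / m) * X.T @ (X @ R - Y)

# Sanity check against a finite-difference estimate of dL/dR[0, 0].
rng = np.random.default_rng(0)
X, Y, R = rng.random((4, 3)), rng.random((4, 3)), rng.random((3, 3))
loss = lambda R: np.sum((X @ R - Y) ** 2) / X.shape[0]
eps = 1e-6
E = np.zeros_like(R); E[0, 0] = eps
numeric = (loss(R + E) - loss(R)) / eps
print(np.isclose(compute_gradient(X, Y, R)[0, 0], numeric, atol=1e-4))  # True
```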
Gradient descent is an iterative algorithm used to search for the optimum of a function. Here, we calculate the gradient g of the loss with respect to the matrix R (Step 3). Next, we update the original R with the formula:

R := R − α · g

where α (alpha) is the learning rate (a scalar quantity).
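Putting the loss, gradient, and update rule together, a minimal training loop might look like this (the step count, learning rate, and random seed are illustrative choices, not values from the text):

```python
import numpy as np

def align_embeddings(X, Y, steps=3000, alpha=0.2, seed=0):
    """Learn R by gradient descent on L(X, Y, R) = (1/m)||XR - Y||_F^2."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    R = rng.random((d, d))                      # random square matrix to start
    for _ in range(steps):
        gradient = (2.0 / m) * X.T @ (X @ R - Y)
        R -= alpha * gradient                   # R := R - alpha * g
    return R

# On a toy problem where Y really is a linear transform of X,
# the learned R should drive XR very close to Y.
rng = np.random.default_rng(1)
X = rng.random((10, 3))
Y = X @ rng.random((3, 3))
R = align_embeddings(X, Y)
print(np.allclose(X @ R, Y, atol=1e-3))
```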
k-NN (k-Nearest Neighbours) is a method which takes a vector as input and finds the other vectors in the dataset that are closest to it. The k is the number of "nearest neighbours" to find (e.g. k=2 finds the closest two neighbours).
Since we're approximating the translation function from English to French embeddings by a linear transformation matrix `R`, most of the time we won't get the exact embedding of a French word when we transform the embedding `e` of some particular English word into the French embedding space. By using 1-NN with `eR` as input, we can search for an embedding `f` (as a row) in the matrix `Y` which is the closest to the transformed vector `eR`. In order to find the similarity between two vectors `u` and `v`, we calculate the cosine of the angle between them using the formula:

cos(u, v) = (u · v) / (‖u‖ ‖v‖)
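A sketch of this 1-NN lookup with cosine similarity (the helper names are mine):

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def nearest_neighbor(v, candidates):
    """Return the index of the row of `candidates` most similar to v (1-NN)."""
    similarities = [cosine_similarity(v, row) for row in candidates]
    return int(np.argmax(similarities))

# Toy example: the transformed vector eR points almost along the
# second row of Y, so 1-NN returns index 1.
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
eR = np.array([0.1, 0.9])
print(nearest_neighbor(eR, Y))  # 1
```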
Finally, we calculate the accuracy of our translation model using the following formula:

accuracy = (number of correct predictions) / (total number of predictions)
Using the formula above, our model achieved an accuracy of 55.7% on unseen data. This translation was achieved by using some basic linear algebra and learning a mapping of words from one language to another!
You can find the entire code (Python) for Machine Translation (English-to-French) over here.