Quiz Week 1
Recurrent Neural Networks
1. Suppose your training examples are sentences (sequences of words). Which of the following refers to the j^{th} word in the i^{th} training example?
x^{(i)<j>}
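As a sketch of what this notation means in code (the sentences below are a made-up toy dataset): x^{(i)<j>} is the j-th word of the i-th training example, which with 0-based Python indexing becomes x[i-1][j-1].

```python
# Hypothetical toy dataset: a list of tokenized sentences.
sentences = [
    ["the", "cat", "sat"],
    ["dogs", "bark", "loudly", "today"],
]

# 1-based indices as in the course notation: i-th example, j-th word.
i, j = 2, 3
word = sentences[i - 1][j - 1]  # x^{(i)<j>}
print(word)  # "loudly"
```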
2. Consider this RNN: This specific type of architecture is appropriate when:
T_x = T_y
3. To which of these tasks would you apply a many-to-one RNN architecture? (Check all that apply).
Sentiment classification (input a piece of text and output a 0/1 to denote positive or negative sentiment)
Gender recognition from speech (input an audio clip and output a label indicating the speaker’s gender)
4. You are training this RNN language model. At the t^{th} time step, what is the RNN doing? Choose the best answer.
Estimating P(y^{<t>} | y^{<1>}, y^{<2>}, ..., y^{<t-1>})
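As a minimal sketch of this step (all shapes and weights below are made up for illustration): the RNN's hidden state a^{<t>} summarizes y^{<1>}, ..., y^{<t-1>}, and a softmax over the output projection turns it into a distribution over the vocabulary.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical shapes: vocabulary of 5 words, hidden size 4.
rng = np.random.default_rng(0)
a_t = rng.standard_normal(4)        # hidden state after seeing y^<1..t-1>
W_ya = rng.standard_normal((5, 4))  # output projection (made-up weights)
b_y = np.zeros(5)

# Estimated P(y^<t> | y^<1>, ..., y^<t-1>): one probability per vocab word.
p = softmax(W_ya @ a_t + b_y)
```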
5. You have finished training a language model RNN and are using it to sample random sentences, as follows:
What are you doing at each time step t?
(i) Use the probabilities output by the RNN to randomly sample a word for that time step as y^{<t>}. (ii) Then pass this selected word to the next time step.
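Step (i) can be sketched as follows (the vocabulary and probabilities below are made up; a real model would produce p from its softmax output at time t):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["<eos>", "the", "cat", "sat", "mat"]
p = np.array([0.05, 0.4, 0.25, 0.2, 0.1])  # made-up RNN output probabilities

# (i) Randomly sample this time step's word from the distribution p ...
idx = rng.choice(len(vocab), p=p)
y_t = vocab[idx]
# (ii) ... then y_t would be fed back in as the input at time step t+1.
```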
6. You are training an RNN, and find that your weights and activations are all taking on the value of NaN (“Not a Number”). Which of these is the most likely cause of this problem?
Exploding gradient problem.
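A standard remedy for exploding gradients is gradient clipping. A minimal numpy sketch (the function name and threshold are illustrative, not from the course code):

```python
import numpy as np

def clip_gradients(grads, max_norm):
    """Rescale a list of gradient arrays so their global norm is at most max_norm."""
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if norm > max_norm:
        grads = [g * (max_norm / norm) for g in grads]
    return grads

# Gradients that have blown up:
exploding = [np.full(3, 1e6), np.full(2, -1e6)]
clipped = clip_gradients(exploding, max_norm=5.0)
```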
7. Suppose you are training an LSTM. You have a 10000 word vocabulary, and are using an LSTM with 100-dimensional activations a^{<t>}. What is the dimension of Γu at each time step?
100
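The gate is computed from the hidden state, so it has the same dimension as a^{<t>} (100), not the vocabulary size. A shape-checking sketch with made-up random weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_a, n_x = 100, 10000  # hidden size, vocabulary size (one-hot inputs)
rng = np.random.default_rng(2)

a_prev = rng.standard_normal(n_a)
x_t = np.zeros(n_x)
x_t[42] = 1.0  # one-hot encoded word (index 42 chosen arbitrarily)

W_u = rng.standard_normal((n_a, n_a + n_x)) * 0.01  # made-up gate weights
b_u = np.zeros(n_a)

gamma_u = sigmoid(W_u @ np.concatenate([a_prev, x_t]) + b_u)
print(gamma_u.shape)  # (100,) -- same dimension as a^<t>
```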
8. Here are the update equations for the GRU.
Betty’s model (removing Γr), because if Γu ≈ 0 for a timestep, the gradient can propagate back through that timestep without much decay.
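A numpy sketch of one full GRU step (weights, shapes, and the large negative bias are made up to force Γu ≈ 0) shows why the memory barely changes when the update gate is off:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, Wu, bu, Wr, br, Wc, bc):
    """One step of the full GRU (with both the update and relevance gates)."""
    z = np.concatenate([c_prev, x_t])
    gamma_u = sigmoid(Wu @ z + bu)  # update gate
    gamma_r = sigmoid(Wr @ z + br)  # relevance gate
    c_tilde = np.tanh(Wc @ np.concatenate([gamma_r * c_prev, x_t]) + bc)
    # c^<t> = Gamma_u * c_tilde + (1 - Gamma_u) * c^<t-1>
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev

n_c, n_x = 3, 2
rng = np.random.default_rng(3)
Wu = rng.standard_normal((n_c, n_c + n_x))
bu = np.full(n_c, -50.0)  # large negative bias so Gamma_u is approximately 0
Wr = rng.standard_normal((n_c, n_c + n_x)); br = np.zeros(n_c)
Wc = rng.standard_normal((n_c, n_c + n_x)); bc = np.zeros(n_c)

c_prev = rng.standard_normal(n_c)
x_t = rng.standard_normal(n_x)
c_t = gru_step(c_prev, x_t, Wu, bu, Wr, br, Wc, bc)
# With Gamma_u near 0, c^<t> is almost exactly c^<t-1>: the memory passes
# through the timestep essentially unchanged.
```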
9. Here are the equations for the GRU and the LSTM: From these, we can see that the Update Gate and Forget Gate in the LSTM play a role similar to _______ and ______ in the GRU. What should go in the blanks?
Γu and 1-Γu
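The correspondence can be checked numerically (the values below are random placeholders): tying the LSTM's forget gate to 1 - Γu makes its memory update identical to the GRU's.

```python
import numpy as np

rng = np.random.default_rng(4)
c_prev = rng.standard_normal(4)   # previous memory cell
c_tilde = rng.standard_normal(4)  # candidate memory
gamma_u = rng.uniform(size=4)     # update gate values in (0, 1)

# GRU memory update: the single gate Gamma_u plays both roles.
c_gru = gamma_u * c_tilde + (1 - gamma_u) * c_prev

# LSTM memory update with the forget gate tied to 1 - Gamma_u:
gamma_f = 1 - gamma_u
c_lstm = gamma_u * c_tilde + gamma_f * c_prev
```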
10. You have a pet dog whose mood is heavily dependent on the current and past few days’ weather. You’ve collected data for the past 365 days on the weather, which you represent as a sequence as x^{<1>}, …, x^{<365>}. You’ve also collected data on your dog’s mood, which you represent as y^{<1>}, …, y^{<365>}. You’d like to build a model to map from x → y. Should you use a Unidirectional RNN or Bidirectional RNN for this problem?
Unidirectional RNN, because the value of y^{<t>} depends only on x^{<1>}, …, x^{<t>}, but not on x^{<t+1>}, …, x^{<365>}