Commit aba42ee

Update 1.3 Shallow neural networks.md
1 parent 7161a18

1 file changed: +5 -5 lines changed

1.3 Shallow neural networks.md

Lines changed: 5 additions & 5 deletions
@@ -87,20 +87,20 @@ For example, if X has 3 training examples, each example with 2 values.
 
 - $a = \tanh(z) = \frac{e^z-e^{-z}}{e^z+e^{-z}}$
 - Range: a ∈ [-1, 1]
-- Almost always works better than sigmoid: because the values lie between -1 and 1, the activations are close to having mean 0 (the effect is similar to centering yhe data)
+- Almost always works better than sigmoid: because the values lie between -1 and 1, the activations are close to having mean 0 (the effect is similar to centering the data)
 - If z is very large or very small, the slope of the gradient is almost 0, so this can slow down gradient descent.
 
 - ReLU
 
 - $a = \max(0, z)$
 - Derivative is almost 0 when z is negative and 1 when z is positive
-- Due to this derivative property, training can be faster than with tank
+- Due to this derivative property, training can be faster than with tanh
 
-- Leacy ReLU
+- Leaky ReLU
 
 Rules of thumb:
 
-- If the output is a 0/1 value (binary classification) ->sigmoid
+- If the output is a 0/1 value (binary classification) -> sigmoid
 - If you don't know which to use -> ReLU
 
 ## Why a non-linear activation function?
@@ -211,4 +211,4 @@ The general methodology to build a Neural Network is to:
 predictions = (A2>0.5)
 ```
 
-
+
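To make the activation functions and the prediction step in this diff concrete, here is a minimal NumPy sketch of sigmoid, tanh, ReLU, and Leaky ReLU with the derivative behaviour the notes describe, plus the `A2 > 0.5` thresholding quoted in the second hunk. The helper names, the 0.01 leak factor, and the toy `A2` values are illustrative assumptions, not code from this repository.

```python
# Minimal sketch (assumed names and values, not from the repository) of the
# activations discussed above and the 0/1 thresholding of the output layer.
import numpy as np

def sigmoid(z):
    # Output in (0, 1); the rule of thumb reserves this for binary-classification output layers.
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # a = (e^z - e^{-z}) / (e^z + e^{-z}); output in [-1, 1], so activations are roughly zero-mean.
    return np.tanh(z)

def relu(z):
    # a = max(0, z); the default choice when unsure which activation to use.
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Small slope alpha for z < 0 keeps the gradient from being exactly 0 there.
    return np.where(z > 0, z, alpha * z)

def tanh_grad(z):
    # d/dz tanh(z) = 1 - tanh(z)^2: almost 0 for large |z|, which slows gradient descent.
    return 1.0 - np.tanh(z) ** 2

def relu_grad(z):
    # 0 for z < 0 and 1 for z > 0 (a convention of 0 is used at z == 0).
    return (z > 0).astype(float)

# Thresholding the output activations into hard 0/1 predictions, as in
# `predictions = (A2>0.5)` from the diffed notes; A2 here is a toy example.
A2 = sigmoid(np.array([[-2.0, 0.3, 1.5]]))
predictions = (A2 > 0.5)
print(predictions)  # [[False  True  True]]
```

This mirrors the rules of thumb above: sigmoid only where a 0/1 output is needed, ReLU (or Leaky ReLU) as the default hidden-layer activation.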
