"\nNeural Networks\n===============\n\nNeural networks can be constructed using the ``torch.nn`` package.\n\nNow that you had a glimpse of ``autograd``, ``nn`` depends on\n``autograd`` to define models and differentiate them.\nAn ``nn.Module`` contains layers, and a method ``forward(input)``\\ that\nreturns the ``output``.\n\nFor example, look at this network that classifies digit images:\n\n.. figure:: /_static/img/mnist.png\n :alt: convnet\n\n convnet\n\nIt is a simple feed-forward network. It takes the input, feeds it\nthrough several layers one after the other, and then finally gives the\noutput.\n\nA typical training procedure for a neural network is as follows:\n\n- Define the neural network that has some learnable parameters (or\n weights)\n- Iterate over a dataset of inputs\n- Process input through the network\n- Compute the loss (how far is the output from being correct)\n- Propagate gradients back into the network\u2019s parameters\n- Update the weights of the network, typically using a simple update rule:\n ``weight = weight - learning_rate * gradient``\n\nDefine the network\n------------------\n\nLet\u2019s define this network:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass Net(nn.Module):\n\n def __init__(self):\n super(Net, self).__init__()\n # 1 input image channel, 6 output channels, 5x5 square convolution\n # kernel\n self.conv1 = nn.Conv2d(1, 6, 5)\n self.conv2 = nn.Conv2d(6, 16, 5)\n # an affine operation: y = Wx + b\n self.fc1 = nn.Linear(16 * 5 * 5, 120)\n self.fc2 = nn.Linear(120, 84)\n self.fc3 = nn.Linear(84, 10)\n\n def forward(self, x):\n # Max pooling over a (2, 2) window\n x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))\n # If the size is a square you can only specify a single number\n x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n x = x.view(-1, self.num_flat_features(x))\n x = F.relu(self.fc1(x))\n x = F.relu(self.fc2(x))\n x = self.fc3(x)\n return x\n\n def num_flat_features(self, x):\n size = x.size()[1:] # all dimensions except the batch dimension\n num_features = 1\n for s in size:\n num_features *= s\n return num_features\n\n\nnet = Net()\nprint(net)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You just have to define the ``forward`` function, and the ``backward``\nfunction (where gradients are computed) is automatically defined for you\nusing ``autograd``.\nYou can use any of the Tensor operations in the ``forward`` function.\n\nThe learnable parameters of a model are returned by ``net.parameters()``\n\n"
"Let try a random 32x32 input\nNote: Expected input size to this net(LeNet) is 32x32. To use this net on\nMNIST dataset, please resize the images from the dataset to 32x32.\n\n"
"<div class=\"alert alert-info\"><h4>Note</h4><p>``torch.nn`` only supports mini-batches. The entire ``torch.nn``\n package only supports inputs that are a mini-batch of samples, and not\n a single sample.\n\n For example, ``nn.Conv2d`` will take in a 4D Tensor of\n ``nSamples x nChannels x Height x Width``.\n\n If you have a single sample, just use ``input.unsqueeze(0)`` to add\n a fake batch dimension.</p></div>\n\nBefore proceeding further, let's recap all the classes you\u2019ve seen so far.\n\n**Recap:**\n - ``torch.Tensor`` - A *multi-dimensional array* with support for autograd\n operations like ``backward()``. Also *holds the gradient* w.r.t. the\n tensor.\n - ``nn.Module`` - Neural network module. *Convenient way of\n encapsulating parameters*, with helpers for moving them to GPU,\n exporting, loading, etc.\n - ``nn.Parameter`` - A kind of Tensor, that is *automatically\n registered as a parameter when assigned as an attribute to a*\n ``Module``.\n - ``autograd.Function`` - Implements *forward and backward definitions\n of an autograd operation*. Every ``Tensor`` operation, creates at\n least a single ``Function`` node, that connects to functions that\n created a ``Tensor`` and *encodes its history*.\n\n**At this point, we covered:**\n - Defining a neural network\n - Processing inputs and calling backward\n\n**Still Left:**\n - Computing the loss\n - Updating the weights of the network\n\nLoss Function\n-------------\nA loss function takes the (output, target) pair of inputs, and computes a\nvalue that estimates how far away the output is from the target.\n\nThere are several different\n`loss functions <http://pytorch.org/docs/nn.html#loss-functions>`_ under the\nnn package .\nA simple loss is: ``nn.MSELoss`` which computes the mean-squared error\nbetween the input and the target.\n\nFor example:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"output = net(input)\ntarget = torch.randn(10) # a dummy target, for example\ntarget = target.view(1, -1) # make it the same shape as output\ncriterion = nn.MSELoss()\n\nloss = criterion(output, target)\nprint(loss)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, if you follow ``loss`` in the backward direction, using its\n``.grad_fn`` attribute, you will see a graph of computations that looks\nlike this:\n\n::\n\n input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n -> view -> linear -> relu -> linear -> relu -> linear\n -> MSELoss\n -> loss\n\nSo, when we call ``loss.backward()``, the whole graph is differentiated\nw.r.t. the loss, and all Tensors in the graph that has ``requires_grad=True``\nwill have their ``.grad`` Tensor accumulated with the gradient.\n\nFor illustration, let us follow a few steps backward:\n\n"
"Backprop\n--------\nTo backpropagate the error all we have to do is to ``loss.backward()``.\nYou need to clear the existing gradients though, else gradients will be\naccumulated to existing gradients.\n\n\nNow we shall call ``loss.backward()``, and have a look at conv1's bias\ngradients before and after the backward.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"net.zero_grad() # zeroes the gradient buffers of all parameters\n\nprint('conv1.bias.grad before backward')\nprint(net.conv1.bias.grad)\n\nloss.backward()\n\nprint('conv1.bias.grad after backward')\nprint(net.conv1.bias.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we have seen how to use loss functions.\n\n**Read Later:**\n\n The neural network package contains various modules and loss functions\n that form the building blocks of deep neural networks. A full list with\n documentation is `here <http://pytorch.org/docs/nn>`_.\n\n**The only thing left to learn is:**\n\n - Updating the weights of the network\n\nUpdate the weights\n------------------\nThe simplest update rule used in practice is the Stochastic Gradient\nDescent (SGD):\n\n ``weight = weight - learning_rate * gradient``\n\nWe can implement this using simple python code:\n\n.. code:: python\n\n learning_rate = 0.01\n for f in net.parameters():\n f.data.sub_(f.grad.data * learning_rate)\n\nHowever, as you use neural networks, you want to use various different\nupdate rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.\nTo enable this, we built a small package: ``torch.optim`` that\nimplements all these methods. Using it is very simple:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch.optim as optim\n\n# create your optimizer\noptimizer = optim.SGD(net.parameters(), lr=0.01)\n\n# in your training loop:\noptimizer.zero_grad() # zero the gradient buffers\noutput = net(input)\nloss = criterion(output, target)\nloss.backward()\noptimizer.step() # Does the update"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
".. Note::\n\n Observe how gradient buffers had to be manually set to zero using\n ``optimizer.zero_grad()``. This is because gradients are accumulated\n as explained in the `Backprop`_ section.\n\n"
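"To make the accumulation behaviour concrete, here is a small illustrative sketch (not part of the tutorial; the tensor ``x`` is just an example):\n\n.. code:: python\n\n    # illustrative sketch of gradient accumulation\n    x = torch.ones(3, requires_grad=True)\n    (x * 2).sum().backward()\n    print(x.grad)     # tensor([2., 2., 2.])\n    (x * 2).sum().backward()\n    print(x.grad)     # tensor([4., 4., 4.]) -- accumulated\n    x.grad.zero_()    # clear the buffer before the next backward pass\n\n"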