---
title: "Notes on Perceptron"
author: "Faiyaz Hasan"
date: "July 19, 2016"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Introduction
------------
The perceptron is a simple learning algorithm designed by Frank Rosenblatt. We'll get into the formalism in a bit. First consider an extreme example: a data set containing a list of weights and the corresponding binary labels saying either elephant or mouse, depending on the weight. The perceptron takes in the training dataset and eventually learns to identify the correct class based on the weight of the object. Of course, certain conditions need to be satisfied for the convergence properties of the model to hold.
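As a rough illustration of the kind of input the algorithm expects, here is a made-up version of that toy dataset (the weights and labels below are invented purely for illustration):
```{r, toy weight example}
# hypothetical animal weights in kilograms with binary labels:
# +1 for elephant, -1 for mouse (values invented for illustration)
animal_weights <- data.frame(
  weight = c(4800, 5200, 0.02, 6100, 0.03, 0.025),
  label  = c(1, 1, -1, 1, -1, -1)
)
animal_weights
```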
Perceptron - Binary classification algorithm
--------------------------------------------
Here, we use the Iris data set.
```{r, iris exploratory data analysis + clean up data frame}
# load iris data set
data(iris)
# subset of iris data frame - extract only species versicolor and setosa
# we will only focus on the sepal and petal lengths of the dataset
irissubdf <- iris[1:100, c(1, 3, 5)]
names(irissubdf) <- c("sepal", "petal", "species")
head(irissubdf)
# plot data - a picture is worth a thousand words
library(ggplot2)
ggplot(irissubdf, aes(x = sepal, y = petal)) +
  geom_point(aes(colour = species, shape = species), size = 3) +
  xlab("sepal length") +
  ylab("petal length") +
  ggtitle("Species vs sepal and petal lengths")
# add binary labels corresponding to species - initialize all values to 1,
# then set the setosa labels to -1. The binary +1/-1 labels go in the
# fourth column. It is cleaner to then split this into two objects: one
# holding the attributes and the other holding the class labels.
irissubdf[, 4] <- 1
irissubdf[irissubdf[, 3] == "setosa", 4] <- -1
x <- irissubdf[, c(1, 2)]
y <- irissubdf[, 4]
# head and tail of data
head(x)
head(y)
```
In this section, let us simply apply the perceptron learning algorithm to this data. In the next section, we will look at the convergence of the weight vector and experiment with the learning rate.
```{r, perceptron algorithm}
# Write a function that takes in the data frame, the learning rate (eta), and
# the number of epochs (niter) and updates the weight vector. At this stage, I
# am only concerned with the final weight and the number of epochs required
# for the weight to converge.
perceptron <- function(x, y, eta, niter) {
  # initialize weight vector
  weight <- rep(0, dim(x)[2] + 1)
  errors <- rep(0, niter)
  # loop over number of epochs niter
  for (jj in 1:niter) {
    # loop through training data set
    for (ii in 1:length(y)) {
      # Predict binary label using the Heaviside activation function
      z <- sum(weight[2:length(weight)] * as.numeric(x[ii, ])) + weight[1]
      if (z < 0) {
        ypred <- -1
      } else {
        ypred <- 1
      }
      # Update weight - the formula doesn't change anything
      # if the predicted value is correct
      weightdiff <- eta * (y[ii] - ypred) * c(1, as.numeric(x[ii, ]))
      weight <- weight + weightdiff
      # Update error count
      if ((y[ii] - ypred) != 0.0) {
        errors[jj] <- errors[jj] + 1
      }
    }
  }
  # weight to decide between the two species
  print(weight)
  return(errors)
}
err <- perceptron(x, y, 1, 10)
plot(1:10, err, type="l", lwd=2, col="red", xlab="epoch #", ylab="errors")
title("Errors vs epoch - learning rate eta = 1")
```
Note that increasing the learning rate does exactly what its name suggests: the weight updates become larger, so fewer epochs are needed to settle. The caveat is that too large a rate can make the updates overshoot and bounce around before settling. As long as convergence is ensured (the data is linearly separable), a larger learning rate (between 0 and 1) leads to faster convergence.
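One way to check the effect of the learning rate empirically is simply to rerun the perceptron with different rates on the same data and compare the error curves. A minimal sketch (the two eta values below are arbitrary choices for illustration):
```{r, learning rate comparison}
# rerun the perceptron on the same data with a small and a large learning rate
err_small <- perceptron(x, y, 0.01, 10)
err_large <- perceptron(x, y, 1, 10)
# overlay the per-epoch error counts for the two rates
plot(1:10, err_small, type = "l", lwd = 2, col = "blue",
     xlab = "epoch #", ylab = "errors")
lines(1:10, err_large, lwd = 2, col = "red")
legend("topright", legend = c("eta = 0.01", "eta = 1"),
       col = c("blue", "red"), lwd = 2)
title("Errors vs epoch for two learning rates")
```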
How to implement a multiclass classification in the perceptron?
---------------------------------------------------------------
The iris data set we began with has three different species. We removed virginica for the sake of binary classification. Now, by chaining binary classifiers, I will extend the perceptron algorithm to deal with all three species.
The game plan is as follows:
* First create a data frame with sepal and petal lengths.
* Label the corresponding species data as 1 for virginica and -1 for setosa **OR** versicolor.
* The weight obtained after convergence will decide between virginica and not virginica.
* Then, the weight from the previous section can be used to determine between setosa and versicolor.
* A possible issue is that the sepal and petal lengths might not be enough to separate the species. So, begin by plotting them.
```{r, species vs sepal and petal length}
# iris data subset
irisdata <- iris[, c(1, 3, 5)]
names(irisdata) <- c("sepal", "petal", "species")
# ggplot the data
ggplot(irisdata, aes(x = sepal, y = petal)) +
  geom_point(aes(colour = species, shape = species), size = 3) +
  xlab("sepal length") +
  ylab("petal length") +
  ggtitle("Species vs sepal and petal lengths")
```
From the plot, we can see that versicolor and virginica overlap, so the data is not linearly separable based on these two attributes alone. More attributes are needed. So, I'll keep all four attributes of the iris data set (sepal and petal lengths and widths) to see whether the weight can still converge.
```{r, virginica classification}
# subset of properties of flowers of iris data set
x <- iris[, 1:4]
names(x) <- tolower(names(x))
# create species labels
y <- rep(-1, dim(x)[1])
y[iris[, 5] == "virginica"] <- 1
# compute and plot error
err <- perceptron(x, y, 0.01, 50)
plot(1:50, err, type="l", lwd=2, col="red", xlab="epoch #", ylab="errors")
title("Errors in differentiating Virginica vs epoch - learning rate eta = 0.01")
```
This plot is even more informative than the differentiation between setosa and versicolor. First of all, note that although the weight stabilizes, the error count does not converge to 0. The minimum number of errors we get is $2$. Interestingly enough, we can now assign a semi-meaningful confidence level to our predictions, since we got `r 2/dim(iris)[1]` of the predictions wrong with our weights.
Another thought that arises is that I should separate out setosa from non-setosa first, since that split is linearly separable. The only uncertainty is then in the next step, where we try to distinguish between the non linearly-separable classes (versicolor and virginica). If we do it in the reverse order, the uncertainty from the first split propagates and becomes larger.
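A minimal sketch of how the two stages could be chained at prediction time, under the assumption that we have the two converged weight vectors in hand (the names `w_setosa`, `w_virginica`, and `predict_species` below are hypothetical; the `perceptron()` function above would also need to return its weights rather than only print them):
```{r, two-stage prediction sketch, eval=FALSE}
# classify one flower (a numeric vector of the four attributes) by chaining
# two binary perceptrons: setosa vs non-setosa first, then virginica vs
# versicolor for the remaining flowers. Assumes both weight vectors were
# trained on all four attributes with the label conventions used above
# (setosa = -1, virginica = +1).
predict_species <- function(flower, w_setosa, w_virginica) {
  # stage 1: the linearly separable split
  if (w_setosa[1] + sum(w_setosa[-1] * flower) < 0) {
    return("setosa")
  }
  # stage 2: the harder, non-separable split
  if (w_virginica[1] + sum(w_virginica[-1] * flower) >= 0) {
    "virginica"
  } else {
    "versicolor"
  }
}
```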
List of questions and curiosities
---------------------------------
1. **How to figure out the optimum learning rate from the data?**
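One crude way to start exploring this empirically is a grid search over candidate rates. A minimal sketch, assuming `x`, `y`, and `perceptron()` from the previous section are still in scope and using an arbitrary grid of eta values:
```{r, learning rate grid search}
# rerun the perceptron for each candidate eta and record the smallest
# per-epoch error count it reaches within 50 epochs
etas <- c(0.001, 0.01, 0.1, 0.5, 1)
min_errors <- sapply(etas, function(eta) min(perceptron(x, y, eta, 50)))
data.frame(eta = etas, min_errors = min_errors)
```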