# (PART) Unsupervised Models {-}
# Unsupervised or Descriptive modeling
From the descriptive (unsupervised) point of view, the aim is to find patterns in the data, which can then be used to describe it or to estimate future behaviour. This includes association rules and clustering, for example tree (hierarchical) clustering, which groups objects (e.g., animals) into successively larger clusters using some measure of similarity or distance. The dataset has the same form as the previous table, but without the class attribute $C$:
| Att~1~ | ... | Att~n~ |
|--------|-----|--------|
| a~11~ | ... | a~1n~ |
| a~21~ | ... | a~2n~ |
| ... | ... | ... |
| a~m1~ | ... | a~mn~ |
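
The tree clustering idea described above can be sketched with base R alone. This is only an illustration under stated assumptions: the data are synthetic (not KC1), and the Euclidean distance and average linkage are arbitrary choices made for the example.

```r
# Synthetic data: two well-separated groups of 10 points each (not KC1).
set.seed(42)
toy <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)
colnames(toy) <- c("Att1", "Att2")

# Pairwise Euclidean distances between all rows.
d <- dist(toy, method = "euclidean")

# Agglomerative clustering: start with one cluster per row and repeatedly
# merge the two closest clusters (average linkage).
hc <- hclust(d, method = "average")

# Cut the tree into two groups and count the members of each.
groups <- cutree(hc, k = 2)
table(groups)
```

Calling `plot(hc)` draws the dendrogram, which shows the order in which the objects were merged into successively larger clusters.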
## Clustering
```{r warning=FALSE, message=FALSE}
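library(foreign)  # read.arff()
library(fpc)      # dbscan()

kc1 <- read.arff("./datasets/defectPred/D1/KC1.arff")

# Split into training and test datasets (roughly 70% / 30%)
set.seed(1)
ind <- sample(2, nrow(kc1), replace = TRUE, prob = c(0.7, 0.3))
kc1.train <- kc1[ind == 1, ]
kc1.test <- kc1[ind == 2, ]

# Drop the class attribute: clustering uses only the metric attributes
kc1.train$Defective <- NULL

# Density-based clustering; eps is the neighbourhood radius and MinPts the
# minimum number of points required within it
ds <- fpc::dbscan(kc1.train, eps = 0.42, MinPts = 5)

# k-means with two clusters
kc1.kmeans <- kmeans(kc1.train, 2)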
```
### k-Means
```{r warning=FALSE, message=FALSE}
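library(reshape, quietly = TRUE)  # rescaler()
library(graphics)

# Rescale every attribute to [0, 1] so that no single metric dominates the distances
kc1.scaled <- sapply(na.omit(kc1.train), rescaler, "range")
kc1kmeans <- kmeans(kc1.scaled, 10)

# Plot the first two attributes, coloured by cluster, with the centres marked
# plot(kc1.scaled[, 1:2], col = kc1kmeans$cluster)
# points(kc1kmeans$centers[, 1:2], col = 1:10, pch = 8)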
```
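
The number of clusters (2 and 10 above) has to be chosen by the analyst. One common heuristic, sketched here as an assumption and not as the method used for KC1, is to run k-means for several values of k on range-scaled data and look for the "elbow" in the total within-cluster sum of squares. The data are synthetic and `range01` is a hypothetical helper that does the same min-max scaling as `rescaler(x, "range")`.

```r
set.seed(1)

# Synthetic data with three groups of 30 points each (not KC1).
toy <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 6), ncol = 2),
  matrix(rnorm(60, mean = 12), ncol = 2)
)

# Hypothetical helper: min-max scaling of one column to [0, 1].
range01 <- function(x) (x - min(x)) / (max(x) - min(x))
toy.scaled <- apply(toy, 2, range01)

# Total within-cluster sum of squares for k = 1, ..., 8.
# nstart = 10 repeats each run from 10 random starts and keeps the best.
ks <- 1:8
wss <- sapply(ks, function(k) {
  kmeans(toy.scaled, centers = k, nstart = 10)$tot.withinss
})

plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The curve typically drops steeply until k reaches the number of natural groups and flattens afterwards; where that bend lies is a judgement call, not a formal test.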
## Association rules
```{r warning=FALSE, message=FALSE}
library(arules)
library(arulesViz)

# Exploratory look at a single attribute before discretizing
# x <- as.numeric(kc1$LOC_TOTAL)
# str(x)
# summary(x)
# hist(x, breaks = 30, main = "LoC Total")
# xDisc <- discretize(x, method = "interval", breaks = 5)
# table(xDisc)

# Discretize the 21 numeric attributes into 5 equal-width intervals
for (i in 1:21) kc1[, i] <- discretize(kc1[, i], method = "interval", breaks = 5)

# Mine rules whose consequent is Defective=Y
rules <- apriori(kc1,
                 parameter = list(minlen = 3, supp = 0.05, conf = 0.35),
                 appearance = list(rhs = c("Defective=Y"), default = "lhs"),
                 control = list(verbose = FALSE))

# Alternative: allow both class values as the consequent
# rules <- apriori(kc1,
#                  parameter = list(minlen = 2, supp = 0.05, conf = 0.3),
#                  appearance = list(rhs = c("Defective=Y", "Defective=N"),
#                                    default = "lhs"))

inspect(rules)
plot(rules)
```
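
To make the `supp` and `conf` thresholds above more concrete, the following sketch computes support, confidence and lift by hand for a single rule, `{Size=high} => {Defective=Y}`. The data frame is a made-up toy example (the `Size` attribute and its values are hypothetical, not taken from KC1), and only base R is used.

```r
# Toy, already discretized data (hypothetical values, not KC1).
toy <- data.frame(
  Size      = c("high", "high", "high", "low", "low", "low", "high", "low"),
  Defective = c("Y",    "Y",    "N",    "N",   "N",   "N",   "Y",    "Y")
)

lhs <- toy$Size == "high"      # rows where the antecedent holds
rhs <- toy$Defective == "Y"    # rows where the consequent holds

# Support: fraction of all rows where antecedent and consequent both hold.
support <- mean(lhs & rhs)

# Confidence: among rows matching the antecedent, fraction matching the consequent.
confidence <- sum(lhs & rhs) / sum(lhs)

# Lift: confidence relative to the overall frequency of the consequent.
lift <- confidence / mean(rhs)

c(support = support, confidence = confidence, lift = lift)
```

With `supp = 0.05` and `conf = 0.35`, `apriori()` would keep a rule only if its support is at least 0.05 and its confidence at least 0.35. A lift above 1 suggests that the antecedent makes the consequent more likely than its base rate.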