---
title: "Lab 4c. Deep Learning - iNaturalist"
editor_options:
chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE)
```
# Deep Learning with R / Python Exercises
You'll first learn about computer vision techniques by working through the Chapter 5 lab exercises:
- 5.1 Introduction to convnets
  R: [html](./lab4c_5.1.intro-convnets.html), [Rmd](https://raw.githubusercontent.com/bbest/eds232-ml/main/lab4c_5.1.intro-convnets.Rmd) ; Python: [html](https://github.com/bbest/deep-learning-with-python-notebooks/blob/master/first_edition/5.1-introduction-to-convnets.ipynb), [ipynb](https://github.com/bbest/deep-learning-with-python-notebooks/raw/master/first_edition/5.1-introduction-to-convnets.ipynb)
- 5.2 Training a convnet from scratch on a small dataset
  R: [html](./lab4c_5.2.small-convnets.html), [Rmd](https://raw.githubusercontent.com/bbest/eds232-ml/main/lab4c_5.2.small-convnets.Rmd) ; Python: [html](https://github.com/bbest/deep-learning-with-python-notebooks/blob/master/first_edition/5.2-using-convnets-with-small-datasets.ipynb), [ipynb](https://github.com/bbest/deep-learning-with-python-notebooks/raw/master/first_edition/5.2-using-convnets-with-small-datasets.ipynb)
The subsequent lab exercises run up against the limits of using a CPU rather than a GPU, which is not available on `taylor.bren.ucsb.edu`. Here's as far as I was able to get for demonstration's sake; you're not expected to run this, but you might want to try if you have a personal computer with a GPU set up.
- 5.3 Using a pretrained convnet
  R: [html](./lab4c_5.3-using-a-pretrained-convnet.html), [Rmd](https://raw.githubusercontent.com/bbest/eds232-ml/main/lab4c_5.3-using-a-pretrained-convnet.Rmd) ; Python: [html](https://github.com/bbest/deep-learning-with-python-notebooks/blob/master/first_edition/5.3-using-a-pretrained-convnet.ipynb), [ipynb](https://github.com/bbest/deep-learning-with-python-notebooks/raw/master/first_edition/5.3-using-a-pretrained-convnet.ipynb)
# iNaturalist
The main lab that you'll turn in applies these techniques to a small subset of the iNaturalist species imagery. These data were downloaded from the links provided at [github.com/visipedia/inat_comp:2021/](https://github.com/visipedia/inat_comp/tree/master/2021). Of the 10,000 species, each with many images across the training (Train), training mini (Train Mini), validation (Val) and test sets, you'll draw only from the Train Mini set of images:
![](https://github.com/visipedia/inat_comp/raw/master/2021/assets/train_val_distribution.png)
```{r, echo=F, eval=F}
# in Terminal:
#   cd /courses/EDS232; mkdir 'inaturalist-2021'; cd 'inaturalist-2021'
#   curl -o train_mini.tar.gz https://ml-inat-competition-datasets.s3.amazonaws.com/2021/train_mini.tar.gz
#   tar -xf train_mini.tar.gz
librarian::shelf(
  dplyr, glue, jsonlite, listviewer, purrr, readr, tidyjson, tidyr)

# read the Train Mini metadata (lists of images, annotations and categories)
train_mini <- jsonlite::read_json("~/Desktop/iNat/train_mini.json")

# flatten one metadata element (m) into a rectangular CSV
write_meta <- function(m){
  train_mini[[m]] %>%
    tidyjson::spread_all() %>%
    tibble() %>%
    select(-document.id, -`..JSON`) %>%
    write_csv(
      glue("~/Desktop/iNat/train_mini_{m}.csv"))
}
write_meta("images")
write_meta("annotations")
write_meta("categories")
```
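If you'd like to inspect the metadata, the CSVs written above can be read back in. A minimal sketch, assuming the files were written to the same folder as in the chunk above:

```{r, eval=F}
librarian::shelf(dplyr, readr)

# peek at the species categories, one row per species
categories <- read_csv("~/Desktop/iNat/train_mini_categories.csv")
glimpse(categories)
```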
The first step is to move the images into the directory structure expected by the various models. The `keras::`[`flow_images_from_directory()`](https://keras.rstudio.com/reference/flow_images_from_directory.html) function expects its first argument `directory` to "contain one subdirectory per class". We are building models for two species `spp2` (binary) and ten species `spp10` (multi-class), and we want `train` (n=30), `validation` (n=10) and `test` (n=10) images assigned to each species. So we want a directory structure that looks something like this:
```
├── spp10
│   ├── test
│   │   ├── 01172_Animalia_Arthropoda_Insecta_Lepidoptera_Geometridae_Circopetes_obtusata
│   │   │   ├── cfd17d74-c7aa-49a2-9417-0a4e6aa4170d.jpg
│   │   │   ├── d6c2cf8f-89ef-40a2-824b-f51c85be030b.jpg
│   │   │   └── ...[+n_img=8]
│   │   ├── 06033_Plantae_Tracheophyta_Liliopsida_Asparagales_Orchidaceae_Epipactis_atrorubens
│   │   │   └── ...[n_img=10]
│   │   └── ...[+n_spp=8]
│   ├── train
│   │   ├── 01172_Animalia_Arthropoda_Insecta_Lepidoptera_Geometridae_Circopetes_obtusata
│   │   │   └── ...[n_img=30]
│   │   └── ...[+n_spp=9]
│   └── validation
│       ├── 01172_Animalia_Arthropoda_Insecta_Lepidoptera_Geometridae_Circopetes_obtusata
│       │   └── ...[n_img=10]
│       └── ...[+n_spp=9]
└── spp2
    ├── test
    │   └── ...[n_spp=2]
    ├── train
    │   └── ...[n_spp=2]
    └── validation
        └── ...[n_spp=2]
```
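With this structure in place, feeding images to Keras is straightforward. Here's a minimal sketch of how reading them might look, assuming the `~/inat` destination directory created in the next chunk and an arbitrary 150x150 target size (any consistent size works; it's not a value prescribed by the lab):

```{r, eval=F}
librarian::shelf(keras)

# rescale pixel values from [0, 255] to [0, 1]
train_datagen <- image_data_generator(rescale = 1/255)

# class labels are inferred from the subdirectory names
train_generator <- flow_images_from_directory(
  "~/inat/spp2/train",        # one subdirectory per class
  train_datagen,
  target_size = c(150, 150),  # resize all images to a consistent shape
  batch_size  = 5,
  class_mode  = "binary")     # use "categorical" for the 10-species models
```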
```{r}
librarian::shelf(
  digest, dplyr, DT, glue, purrr, readr, stringr, tidyr)

# path to folder containing species directories of images
dir_src  <- "/courses/EDS232/inaturalist-2021/train_mini"
dir_dest <- "~/inat"
dir.create(dir_dest, showWarnings = FALSE)

# get list of directories, one per species (n = 10,000 species)
dirs_spp <- list.dirs(dir_src, recursive = FALSE, full.names = TRUE)
n_spp <- length(dirs_spp)

# set a seed based on your username (unique within the class) for
# reproducible results; set it just before sampling, otherwise
# you'll get different results
Sys.info()[["user"]] %>%
  digest::digest2int() %>%
  set.seed()
i10 <- sample(1:n_spp, 10)

# show the 10 indices sampled of the 10,000 possible
i10

# show the 10 species directory names
basename(dirs_spp)[i10]

# show the first 2 species directory names
i2 <- i10[1:2]
basename(dirs_spp)[i2]

# set up data frame with source (src) and destination (dest) paths to images
d <- tibble(
  set     = c(rep("spp2", 2), rep("spp10", 10)),
  dir_sp  = c(dirs_spp[i2], dirs_spp[i10]),
  tbl_img = map(dir_sp, function(dir_sp){
    # each Train Mini species folder has exactly 50 images:
    # first 30 train, next 10 validation, last 10 test
    tibble(
      src_img = list.files(dir_sp, full.names = TRUE),
      subset  = c(rep("train", 30), rep("validation", 10), rep("test", 10))) })) %>%
  unnest(tbl_img) %>%
  mutate(
    sp       = basename(dir_sp),
    img      = basename(src_img),
    dest_img = glue("{dir_dest}/{set}/{subset}/{sp}/{img}"))

# show source and destination for first 10 rows of tibble
d %>%
  select(src_img, dest_img)

# iterate over rows, creating directory if needed and copying files
d %>%
  pwalk(function(src_img, dest_img, ...){
    dir.create(dirname(dest_img), recursive = TRUE, showWarnings = FALSE)
    file.copy(src_img, dest_img) })

# uncomment to show the entire tree of your destination directory
# system(glue("tree {dir_dest}"))
```
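Before modeling, it's worth a quick sanity check that the copy produced the expected 30/10/10 split per species. A sketch, reusing `dir_dest` from the chunk above:

```{r, eval=F}
# count the files actually copied: expect n = 30 train, 10 validation,
# 10 test per species within each set
tibble(path = list.files(dir_dest, recursive = TRUE)) %>%
  separate(path, c("set", "subset", "sp", "img"), sep = "/") %>%
  count(set, subset, sp)
```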
Your task is to apply your deep learning skills to build the following models:
1. **2 Species (binary classification) - neural net**. Draw from [3.4 🍿 Movies (binary classification)](./lab4b_examples.html). You'll first need to pre-process the images into a consistent shape, though (see 5.2.4 Data preprocessing, and the sketch after this list).
1. **2 Species (binary classification) - convolutional neural net**. Draw from the [dogs vs cats example](https://bbest.github.io/eds232-ml/lab4c_5.2.small-convnets.html).
1. **10 Species (multi-class classification) - neural net**. Draw from [3.5 📰 Newswires (multi-class classification)](./lab4b_examples.html).
1. **10 Species (multi-class classification) - convolutional neural net**. Draw from the [dogs vs cats example](https://bbest.github.io/eds232-ml/lab4c_5.2.small-convnets.html) and update the necessary values to go from binary to multi-class classification.
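For the standard neural nets (models 1 and 3), you'll want image tensors rather than generators. A minimal sketch of one way to do the pre-processing, assuming the `~/inat` paths from above; the helper `dir_to_tensor()` is hypothetical (not from the book), and the 32x32 size is only to keep the dense layers small:

```{r, eval=F}
librarian::shelf(keras, purrr)

# hypothetical helper: read all images under dir into one 4-D array,
# resized to a consistent shape and rescaled to [0, 1]
dir_to_tensor <- function(dir, size = c(32, 32)){
  paths <- list.files(dir, full.names = TRUE, recursive = TRUE)
  imgs  <- map(paths, function(p){
    image_load(p, target_size = size) %>%
      image_to_array() })
  x <- array(unlist(imgs), dim = c(size, 3, length(imgs)))
  aperm(x, c(4, 1, 2, 3)) / 255  # reorder to (image, height, width, channel)
}

x_train <- dir_to_tensor("~/inat/spp2/train")
# flatten to one row per image for a dense network
x_train <- array_reshape(x_train, c(dim(x_train)[1], 32 * 32 * 3))
```

Labels can be derived from the species subdirectory in each file path.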
In your models, be sure to include the following:
- Split the original images per species (n=50) into train (n=30), validation (n=10) and test (n=10). These are absurdly few images to feed into such complex deep learning models, but they will serve as a good learning example.
- Include the accuracy metric and validation data in the fitting process, and plot the fitting history.
- Evaluate loss and accuracy on your test results for each model. Compare the standard neural network and convolutional neural network results.
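To give a feel for the overall shape of the convnet models (2 and 4), here is a minimal sketch, assuming generators like `train_generator` sketched earlier (the validation and test generators are assumed to be built the same way); the layer sizes and epoch count are placeholders to adapt, not prescribed values, and older keras versions may need `fit_generator()` / `evaluate_generator()` in place of `fit()` / `evaluate()`:

```{r, eval=F}
librarian::shelf(keras)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  # binary: 1 unit + sigmoid; 10 species: 10 units + softmax
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  loss      = "binary_crossentropy",  # "categorical_crossentropy" for 10 species
  optimizer = optimizer_rmsprop(learning_rate = 1e-4),
  metrics   = "accuracy")

history <- model %>% fit(
  train_generator,                    # 60 train images / batch_size 5
  steps_per_epoch  = 12,
  epochs           = 10,
  validation_data  = validation_generator,
  validation_steps = 4)
plot(history)

# evaluate loss and accuracy on the held-out test set
model %>% evaluate(test_generator, steps = 4)
```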
# Deep Learning Cheat Sheets
Parameterizing all the image processing and layers can be confusing. Here are a few cheat sheets to help with a deeper understanding:
* [Deep learning with Keras cheatsheet | RStudio](https://raw.githubusercontent.com/rstudio/cheatsheets/main/keras.pdf)
* [Understanding Convolutions and Pooling in Neural Networks: a simple explanation | by Miguel Fernández Zafra | Towards Data Science](https://towardsdatascience.com/understanding-convolutions-and-pooling-in-neural-networks-a-simple-explanation-885a2d78f211)
* [Classification: Image Classification Cheatsheet | Codecademy](https://www.codecademy.com/learn/dlsp-classification-track/modules/dlsp-image-classification/cheatsheet)
* [CS 230 - Convolutional Neural Networks Cheatsheet | Stanford](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks)
* [CS231n Convolutional Neural Networks for Visual Recognition | Stanford](https://cs231n.github.io/convolutional-networks/)
# Submit Lab 4
To submit Lab 4, please submit the path (`/Users/*`) on [taylor.bren.ucsb.edu](https://taylor.bren.ucsb.edu) to your iNaturalist R Markdown (`*.Rmd`) or Jupyter Notebook (`*.ipynb`) file here:
- [Submit Lab 4. iNaturalist](https://docs.google.com/forms/d/e/1FAIpQLSddHuLejY_V-PAIQOO19TLHAyyQRyTUSsfEwpgeJP-Jx4MClA/viewform?usp=sf_link){target="_blank"}