cataract detection using cnn added
17arindam committed Oct 14, 2024
1 parent c045aaa commit ff96e95
Showing 210 changed files with 802 additions and 0 deletions.
242 changes: 242 additions & 0 deletions Prediction Models/Cataract-Detection/README.md
@@ -0,0 +1,242 @@
# BvS
**Dawn of AI**
An image classifier that identifies whether a given image is Batman or Superman, using a CNN with high accuracy.
(No Dogs vs. Cats here: we go from collecting images on Google to saving our trained model for reuse.)

# What are we gonna do:
* We will build a 3-layered **community-standard CNN image classifier** to classify whether a given image is an image of Batman or Superman.
* Learn how to build an accurate model from scratch in TensorFlow.
* How to train and test it.
* How to save it and use it later.

**Setup:**
* Python 3.5
* Tensorflow 1.5.0
* CUDA 9.0
* cuDNN 7.0.5

In-depth explanation of each section:
[Medium post with detailed step by step explanation](https://medium.com/@ipaar3/how-i-built-a-convolutional-image-classifier-using-tensorflow-from-scratch-f852c34e1c95) for deeper understanding of CNNs and architecture of the network.

# Data:

### Collect data:
* [Google Images Downloader](https://github.com/hardikvasa/google-images-download). It's fast, easy, simple and efficient.
* I've collected 300 images each for Supes and Batsy, but more data is highly preferable. Try to collect as much clean data as possible.
<p align="center">
<img src="https://github.com/perseus784/BvS/blob/master/media/image_collection.png" width="800" height="400">
</p>

### Augmentation:
* 300 images per class is nowhere near enough for deep learning. So we must augment the images to generate more from what we collected.
* You can use the following to do it easily, [Augmentor](https://github.com/mdbloice/Augmentor).
* [This](https://github.com/perseus784/BvS/blob/master/augment.py) is the code I've used for augmenting my images.
* The same image, augmented with various transformations. After augmentation I had 3500 images per class.
*Careful: While augmenting, be careful about which transformations you use. You can mirror-flip a Bat logo, but you cannot turn it upside down.*

<p align="center">
<img src="https://github.com/perseus784/BvS/blob/master/media/augment.png" width="800" height="400">
</p>
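The flip caveat above can be made concrete with a toy NumPy sketch (a made-up 2x3 array standing in for an image, not the actual augmentation code): a horizontal mirror preserves a symmetric logo, while a vertical flip turns it upside down.

```python
import numpy as np

# A toy 2x3 "image" (rows x cols), single channel.
img = np.array([[1, 2, 3],
                [4, 5, 6]])

# Horizontal mirror: safe for a left-right-symmetric logo like the Bat signal.
mirrored = np.fliplr(img)

# Vertical flip: turns the logo upside down -- usually NOT a valid augmentation here.
flipped = np.flipud(img)

print(mirrored)  # [[3 2 1], [6 5 4]]
print(flipped)   # [[4 5 6], [1 2 3]]
```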

### Standardize:
* After augmentation, make a folder named rawdata in the current working directory.
* Create folders with the respective class names and put all the images in their respective folders.
* Run [this](https://github.com/perseus784/BvS/blob/master/preprocessing.py) file in the same directory as rawdata.
* This will resize all the images to a standard resolution and a common format and put them in a new folder named data.
**Note:** As it is embedded in *trainer.py*, there is no need to run it explicitly.
**Update:** You can get the **data** folder itself from [here (50 MB)](https://drive.google.com/open?id=1GUPBBdLlqStnxjhISkxT1qOf1XPnmRcF). Just download and extract!

<p align="left">
<img src="https://github.com/perseus784/BvS/blob/master/media/convert.png" width="400" height="200">
<img src="https://github.com/perseus784/BvS/blob/master/media/file_structure.png" width="300" height="400">
</p>
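The resizing step can be pictured with a tiny nearest-neighbour resize in pure NumPy (the actual preprocessing presumably uses a library call, as the predict.py snippet later does with cv2.resize; this stand-in just shows the idea of mapping every target pixel back to a source pixel):

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of an (H, W[, C]) array -- illustrative only."""
    h, w = img.shape[:2]
    # For each output row/column, pick the nearest source row/column.
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]

img = np.arange(16).reshape(4, 4)   # toy 4x4 "image"
small = resize_nearest(img, 2, 2)
print(small.shape)  # (2, 2)
print(small)        # [[0 2], [8 10]]
```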


# Architecture:
### A Simple Architecture:
> For a detailed explanation of the architecture and of CNNs, please read the medium [post](https://medium.com/@ipaar3/how-i-built-a-convolutional-image-classifier-using-tensorflow-from-scratch-f852c34e1c95).
I've explained CNNs in depth there; I highly recommend reading it.

<p align="center">
<img src="https://github.com/perseus784/BvS/blob/master/media/convolution_nn_medium_post.png" width="800" height="400">
</p>

In code:

#level 1 convolution
network=model.conv_layer(images_ph,5,3,16,1)
network=model.pooling_layer(network,5,2)
network=model.activation_layer(network)

#level 2 convolution
network=model.conv_layer(network,4,16,32,1)
network=model.pooling_layer(network,4,2)
network=model.activation_layer(network)

#level 3 convolution
network=model.conv_layer(network,3,32,64,1)
network=model.pooling_layer(network,3,2)
network=model.activation_layer(network)

#flattening layer
network,features=model.flattening_layer(network)

#fully connected layer
network=model.fully_connected_layer(network,features,1024)
network=model.activation_layer(network)
#output layer
network=model.fully_connected_layer(network,1024,no_of_classes)
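With 'SAME' padding, the stride-1 convolutions keep the spatial size, and each stride-2 pooling shrinks it to ceil(n/2): a 100x100 input becomes 50, then 25, then 13 across the three levels, so the flattening layer yields 13 * 13 * 64 = 10816 features. A quick sanity check in plain Python:

```python
import math

def same_pool_out(size, stride):
    # Output size of a 'SAME'-padded pooling layer: ceil(input / stride)
    return math.ceil(size / stride)

size = 100                     # input height/width from config.py
for level in range(3):         # three conv+pool levels, pool stride 2
    size = same_pool_out(size, 2)
    print(size)                # 50, 25, 13

features = size * size * 64    # last conv level has 64 output channels
print(features)                # 10816
```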

### A Brief Architecture:
With dimensional information:
<p align="center">
<img src="https://github.com/perseus784/BvS/blob/master/media/brief_architecture.png" width="800" height="400">
</p>

# Training:
* Clone this repo.
* Do the Augmentation.
* Put the images in their respective folders in *rawdata*.

rawdata/batman: 3810 images
rawdata/superman: 3810 images

**Update:** You can get the **data** folder itself from [here (50 MB)](https://drive.google.com/open?id=1GUPBBdLlqStnxjhISkxT1qOf1XPnmRcF). Just download and extract!

Our file structure should look like this,
<p align="left">
<img src="https://github.com/perseus784/BvS/blob/master/media/file_placement.png" width="500" height="400">
<img src="https://github.com/perseus784/BvS/blob/master/media/fstr.png" width="300" height="400">
</p>

The ***data*** folder will be generated automatically by trainer.py from *rawdata* if it does not already exist.

* **Configuration:** If you want to edit something, you can do it using [this](https://github.com/perseus784/BvS/blob/master/config.py) file.

raw_data='rawdata'
data_path='data'
height=100
width=100
all_classes = os.listdir(data_path)
number_of_classes = len(all_classes)
color_channels=3
epochs=300
batch_size=10
model_save_name='checkpoints\\'


* **Run** [trainer.py](https://github.com/perseus784/BvS/blob/master/trainer.py).
* Wait for a few hours.
* For me it took **8 hrs for 300 epochs** on a laptop with an **i5 processor, 8 GB of RAM and an Nvidia GeForce 930M 2 GB**. You can stop the process anytime once it saturates, as the model is saved frequently.

<p align="center">
<img src="https://github.com/perseus784/BvS/blob/master/media/train_info.png" width="800" height="400">
</p>

<p align="center">
<img src="https://github.com/perseus784/BvS/blob/master/media/training_loss.png" width="800" height="400">
</p>

### Saving our model:
Once training is over, a folder named checkpoints is created containing our trained model. These two simple lines do that for us in TensorFlow:

saver = tf.train.Saver(max_to_keep=4)
saver.save(session, model_save_name)
You can get my pretrained model [here.](https://drive.google.com/file/d/1l9_ByLxtGqRMJxWvNr9Ls7XDFRxsqywN/view?usp=sharing)

Our checkpoints folder contains three kinds of files:
* .meta file - it has your graph structure saved.
* .index - it identifies the respective checkpoint file.
* .data - it stores the values of all the variables.

How to use it?
TensorFlow is so well built that it does all the heavy lifting for us. We just have to write four simple lines to load our model and run inference.

#Create a saver object to load the model
saver = tf.train.import_meta_graph(os.path.join(model_folder, '.meta'))
#restore the model from our checkpoints folder
saver.restore(session, os.path.join('checkpoints', '.\\'))
#Create a graph object to get the same network architecture
graph = tf.get_default_graph()
#Get the last layer of the network by its name; it includes all the previous layers too
network = graph.get_tensor_by_name("add_4:0")
Yeah, simple. Now that we have our network as well as its tuned values, we have to pass an image to it using the same placeholders (image, labels).

im_ph= graph.get_tensor_by_name("Placeholder:0")
label_ph = graph.get_tensor_by_name("Placeholder_1:0")

If you run it now, you will see output like [1234, -4322]. This is correct, since the index of the maximum value indicates the class, but it is not as convenient as a 1-and-0 representation like [1, 0]. For that, we should add one line of code before running it:

network=tf.nn.sigmoid(network)

While we could have done this in the training architecture itself and nothing would have changed, I want to show you that you can add layers to the model even now, at prediction time. Flexibility.
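The squashing that sigmoid performs can be seen in plain NumPy (the logits below are invented for illustration, not actual model output):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw logits from the output layer.
logits = np.array([8.2, -5.1])
squashed = sigmoid(logits)
print(np.round(squashed).astype(int))  # [1 0]
```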

# Inference time:
> Your training is nothing, If you don't have the will to act - Ra's Al Ghul.
To run a simple prediction,
* Edit the image name in [predict.py](https://github.com/perseus784/BvS/blob/master/predict.py).
* Download the model files and extract in the same folder.
* Run [predict.py](https://github.com/perseus784/BvS/blob/master/predict.py).

image='sup.jpg'
img=cv2.imread(image)
session=tf.Session()
img=cv2.resize(img,(100,100))
img=img.reshape(1,100,100,3)
labels = np.zeros((1, 2))
# Create the feed_dict required to feed the input placeholders:
feed_dict_testing = {im_ph: img, label_ph: labels}
result=session.run(network, feed_dict=feed_dict_testing)
print(result)

<p align="center">
<img src="https://github.com/perseus784/BvS/blob/master/media/output_screeenshot.png" width="800" height="300">
</p>

You can see the results as [1,0] {Batman} or [0,1] {Superman}, corresponding to the index.
*Please note that this is not one-hot encoding.*
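Mapping the output vector back to a class name is a one-line argmax lookup; a small sketch (the class order is assumed to follow os.listdir(data_path), and the result array here is hypothetical):

```python
import numpy as np

classes = ['batman', 'superman']    # assumed order from os.listdir(data_path)
result = np.array([[0.98, 0.02]])   # hypothetical squashed output for one image
print(classes[int(np.argmax(result[0]))])  # batman
```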

# Accuracy:
It is actually pretty good. It is right almost all the time. I even gave it an image containing both Batman and Superman, and it returned values of almost the same magnitude (after removing the sigmoid layer we added just before).

*Comment out **network=tf.nn.sigmoid(network)** in predict.py to see the real magnitudes, since sigmoid only gives squashed outputs.*

From here on, you can do whatever you want with those values.
Loading the model initially takes some time (about 70 seconds), but once it is loaded you can put a for loop or similar around the inference step to throw in images and *get output in a second or two!*

# Tensorboard:
I have added some extra lines to the training code for TensorBoard. Using TensorBoard, we can track the progress of training both during and after a run. You can also see your network structure and all the other components inside it. *It is very useful for visualizing what is happening.*
To start it, just go to the directory and open a command line:

tensorboard --logdir checkpoints

You should see the following,

<p align="center">
<img src="https://github.com/perseus784/BvS/blob/master/media/Inkedtensorboard_start_LI.jpg" width="800" height="300">
</p>

Now type the same address in your browser. Your TensorBoard is now running. Play with it.

# Graph Structure Visualization:
Yeah, you can see our entire model here, with the dimensions of each layer and its operations!

<p align="center">
<img src="https://github.com/perseus784/BvS/blob/master/media/tensorboard_graph.png" width="800" height="400">
</p>

# Future Implementations:
While this works for binary classification, it will also work for multiclass classification, though not as well. We may need to alter the architecture and build a larger model depending on the number of classes we want.
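For more than two classes, a softmax output is the usual choice, since it turns the raw logits into a probability distribution over all classes. A small NumPy sketch (the four-class logits are invented for illustration, not from the model):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract max for numerical stability
    return e / e.sum()

# Hypothetical 4-class logits (e.g. batman, superman, joker, bane).
logits = np.array([2.0, 1.0, 0.1, -1.0])
probs = softmax(logits)
print(probs.sum())            # 1.0 (up to float rounding)
print(int(np.argmax(probs)))  # 0
```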

> So, that's how Batman wins!
<p align="center">
<img src="https://github.com/perseus784/BvS/blob/master/media/lego-batman-movie-tuxedo.jpg" alt="Batwin" width="800" height="400">
</p>
Please star the repo if you like it.
For any suggestions, doubts or clarifications, please mail [email protected] or raise an issue!
"# Cataract-Detection"
Binary file not shown.
15 changes: 15 additions & 0 deletions Prediction Models/Cataract-Detection/augment.py
@@ -0,0 +1,15 @@
import Augmentor

folder_name='folder'
p = Augmentor.Pipeline(source_directory=folder_name, save_format="png")
p.flip_left_right(0.5)
p.black_and_white(0.1)
p.gaussian_distortion(probability=0.4, grid_width=7, grid_height=6,
                      magnitude=6, corner="ul", method="in",
                      mex=0.5, mey=0.5, sdx=0.05, sdy=0.05)

p.rotate(0.3, 10, 10)
p.skew(0.4, 0.5)
p.skew_tilt(0.6, 0.8)
p.skew_left_right(0.5, magnitude=0.8)
p.sample(10000)
59 changes: 59 additions & 0 deletions Prediction Models/Cataract-Detection/build_model.py
@@ -0,0 +1,59 @@
import tensorflow as tf


# model's unit definitions
class model_tools:
    # Helper methods for all the basic TensorFlow components needed to build a model.
    # Each method is described in its comments.

    def add_weights(self, shape):
        # a common method to create all sorts of weight connections
        # takes in shapes of previous and new layer as a list e.g. [2,10]
        # starts with random values of that shape.
        return tf.Variable(tf.truncated_normal(shape=shape, stddev=0.05))

    def add_biases(self, shape):
        # a common method to create biases with default=0.05
        # takes in shape of the current layer e.g. x=10
        return tf.Variable(tf.constant(0.05, shape=shape))

    def conv_layer(self, layer, kernel, input_shape, output_shape, stride_size):
        # convolution occurs here.
        # create weights and biases for the given layer shape
        weights = self.add_weights([kernel, kernel, input_shape, output_shape])
        biases = self.add_biases([output_shape])
        # stride=[image_jump,row_jump,column_jump,color_jump]=[1,1,1,1] mostly
        stride = [1, stride_size, stride_size, 1]
        # does a convolution scan on the given image
        layer = tf.nn.conv2d(layer, weights, strides=stride, padding='SAME') + biases
        return layer

    def pooling_layer(self, layer, kernel_size, stride_size):
        # reduces complexity by keeping only the most important features
        # there are many types of pooling: average pooling, max pooling, ...
        # max pooling takes the maximum within the given kernel
        # kernel=[image_jump,rows,columns,depth]
        kernel = [1, kernel_size, kernel_size, 1]
        # stride=[image_jump,row_jump,column_jump,color_jump]=[1,2,2,1] mostly
        stride = [1, stride_size, stride_size, 1]
        return tf.nn.max_pool(layer, ksize=kernel, strides=stride, padding='SAME')

    def flattening_layer(self, layer):
        # make it single-dimensional
        input_size = layer.get_shape().as_list()
        new_size = input_size[-1] * input_size[-2] * input_size[-3]
        return tf.reshape(layer, [-1, new_size]), new_size

    def fully_connected_layer(self, layer, input_shape, output_shape):
        # create weights and biases for the given layer shape
        weights = self.add_weights([input_shape, output_shape])
        biases = self.add_biases([output_shape])
        # most important operation
        layer = tf.matmul(layer, weights) + biases  # mX+b
        return layer

    def activation_layer(self, layer):
        # we use the Rectified Linear Unit (ReLU), the standard activation layer.
        # there are others like sigmoid, tanh, etc., but ReLU is more efficient.
        # function: 0 if x<0 else x.
        return tf.nn.relu(layer)
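What pooling_layer does can be pictured with a toy NumPy example: a 2x2 max pool with stride 2 keeps only the largest value in each 2x2 block (a pure-NumPy sketch of the idea behind tf.nn.max_pool, not its general implementation).

```python
import numpy as np

# Toy 4x4 feature map.
fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])

# Split into 2x2 blocks and take the max of each block.
pooled = fm.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[4 2], [2 8]]
```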
17 changes: 17 additions & 0 deletions Prediction Models/Cataract-Detection/config.py
@@ -0,0 +1,17 @@
import preprocessing as ppr
import os

#Parameters
raw_data='rawdata'
data_path='data'
height=100
width=100
if not os.path.exists(data_path):
    ppr.image_processing(raw_data,data_path,height,width)
all_classes = os.listdir(data_path)
number_of_classes = len(all_classes)
color_channels=3
epochs=300
batch_size=10
batch_counter=0
model_save_name='checkpoints/'
42 changes: 42 additions & 0 deletions Prediction Models/Cataract-Detection/model_architecture.py
@@ -0,0 +1,42 @@
from build_model import model_tools
import tensorflow as tf

model = model_tools()

def generate_model(images_ph, number_of_classes):
    # MODEL ARCHITECTURE:
    # level 1 convolution
    network = model.conv_layer(images_ph, 5, 3, 16, 1)
    network = model.pooling_layer(network, 5, 2)
    network = model.activation_layer(network)
    print(network)

    # level 2 convolution
    network = model.conv_layer(network, 4, 16, 32, 1)
    network = model.pooling_layer(network, 4, 2)
    network = model.activation_layer(network)
    print(network)

    # level 3 convolution
    network = model.conv_layer(network, 3, 32, 64, 1)
    network = model.pooling_layer(network, 3, 2)
    network = model.activation_layer(network)
    print(network)

    # flattening layer
    network, features = model.flattening_layer(network)
    print(network)

    # fully connected layer
    network = model.fully_connected_layer(network, features, 1024)
    network = model.activation_layer(network)
    print(network)

    # output layer
    network = model.fully_connected_layer(network, 1024, number_of_classes)
    print(network)
    return network


if __name__ == "__main__":
    images_ph = tf.placeholder(tf.float32, shape=[None, 100, 100, 3])
    generate_model(images_ph, 2)