streetview_cnn

CNN for reading street view images of building addresses.

Detailed Project Description

This project explores different implementations of convolutional neural networks for the task of reading street view images of building addresses. We used PyTorch to implement an adaptation of the LeNet architecture in addition to a basic exploratory CNN.
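
As a rough illustration of what a LeNet-style network looks like in PyTorch, here is a minimal sketch. The class name `LeNetSketch`, the layer sizes, the 3x32x32 input shape, and the 10 output classes are all assumptions made for this example; the architecture actually used lives in Project.ipynb and may differ.

```python
import torch
from torch import nn


class LeNetSketch(nn.Module):
    """LeNet-style classifier; layer sizes are illustrative, not the project's."""

    def __init__(self, in_channels: int = 3, num_classes: int = 10):
        super().__init__()
        # Two conv/pool stages, as in the classic LeNet layout.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 6, kernel_size=5),  # 32x32 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                           # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),           # 14x14 -> 10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                           # 10x10 -> 5x5
        )
        # Three fully connected layers ending in one score per class.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = LeNetSketch()
batch = torch.randn(4, 3, 32, 32)  # four random stand-in images (assumed shape)
logits = model(batch)
print(logits.shape)                # one row of class scores per image
```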

Project Layout

Run tree streetview_cnn/ -A -L x from the directory above the project to print the project layout, where x, the argument to -L, is the number of directory levels to descend. An example:

streetview_cnn/
├── README.md
├── Project.ipynb
├── Pipfile
├── Pipfile.lock
├── data
├── models
│   ├── LeNet_cnn.pth
│   └── basic_cnn.pth
└── results
    ├── Confusion_BasicCNN.png
    ├── Confusion_LeNet.png
    ├── Error_Examples_BasicCNN.png
    └── Error_Examples_LeNet.png

Project.ipynb: Notebook used for downloading data, training models, and analyzing results.
data: Directory for data; it is filled automatically after running Project.ipynb.
models: Directory for saved models.
results: Directory for resulting plots and analysis of model performance.

Running the Project

This repository uses Pipenv to manage the virtual environment, with the Pipfile specifying all necessary packages. The virtual environment lets anyone run the project without dependency or version issues. You may be able to run the project without it, but to avoid problems we recommend following the instructions below.

1. Start the virtual environment:
- NOTE: Our environment uses Python 3.8, so it must be installed if it is not already present on your local machine.
  - https://www.python.org/downloads/release/python-380/
- NOTE: If pipenv is not installed, it can be installed from:
  - https://pypi.org/project/pipenv/
- In a terminal, navigate into your local copy of the repository and run pipenv shell
2. Open JupyterLab (or any IDE of your choice):
- JupyterLab is already included in the virtual environment, so this method works regardless of what is installed on your local machine.
  - Run jupyter lab in the terminal you used in step 1.
- Alternatively, use any IDE, such as VS Code, as long as the virtual environment is properly activated.
3. Run Project.ipynb:
- Running the first two cells downloads the data and fills the ‘data’ folder.
- NOTE: Running the entire notebook retrains all the models, which may take some time depending on your machine.
  - You can skip the training cells (cells 8 and 12) and load the models instead, since they are already saved in the ‘models’ folder.
4. Close JupyterLab once you are finished:
- Press Control+C in the terminal window (if using macOS).
5. Close the virtual environment:
- Run exit in the terminal window.
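
Skipping the training cells relies on restoring weights from a .pth file. The sketch below shows one common PyTorch pattern for this, saving and reloading a `state_dict`. It is an assumption that the files in ‘models’ were written this way; they could instead hold whole pickled models. The `build_model` helper and the file name `example_cnn.pth` are hypothetical, and a temporary directory is used so the example does not touch the repository.

```python
import os
import tempfile

import torch
from torch import nn


def build_model() -> nn.Module:
    # Hypothetical stand-in network; the real architectures are in Project.ipynb.
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))


trained = build_model()  # pretend this model has just been trained

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_cnn.pth")  # hypothetical file name
    torch.save(trained.state_dict(), path)       # save the weights only

    # Rebuild the same architecture, then load the saved weights into it.
    restored = build_model()
    state = torch.load(path, map_location="cpu")
    restored.load_state_dict(state)

restored.eval()  # switch to inference mode before evaluating
x = torch.randn(2, 3, 32, 32)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
print(same)
```

With this pattern the architecture must be defined in code before loading, which is why the notebook cells that define the models still need to be run even when training is skipped.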

Results & Analysis

Of the two models tested, the LeNet variant outperformed the basic CNN architecture, with an accuracy of 88.9% versus 87.67%. We believe this improvement, while marginal, suggests that the LeNet model strikes a reasonable tradeoff between computational cost and complexity (it has more fully connected layers) on one hand and overfitting on the other. An interesting aspect of the models to analyze is the misclassification of digits. For one, the most common misclassification in both models was of the digit ‘3’, as can be seen in the following confusion matrices.

Confusion matrices: Basic CNN (results/Confusion_BasicCNN.png) | LeNet CNN (results/Confusion_LeNet.png)

Because CNN layers are difficult to interpret, it is hard to pinpoint exactly why that is the case. From a qualitative standpoint, however, the shape of the ‘9’ digit is very similar to that of the ‘3’ digit: if the top curve of a ‘3’ comes close to the center, it can look like a closed loop that forms a ‘9’. Furthermore, the slightly more saturated colors in the LeNet confusion matrix correspond to that model's higher accuracy.
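
For readers who want to reproduce this kind of analysis, the following sketch builds a confusion matrix from label tensors and finds the most frequent off-diagonal confusion. The `confusion_matrix` helper is hypothetical and the labels are made up purely for illustration; they are not the project's results.

```python
import torch


def confusion_matrix(targets: torch.Tensor, preds: torch.Tensor,
                     num_classes: int = 10) -> torch.Tensor:
    """Rows are true labels, columns are predicted labels."""
    # Encode each (true, predicted) pair as one integer, then count occurrences.
    flat = targets * num_classes + preds
    counts = torch.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


# Made-up labels for illustration only.
targets = torch.tensor([3, 3, 3, 9, 9, 1, 1, 0])
preds = torch.tensor([3, 9, 9, 9, 3, 1, 1, 0])

cm = confusion_matrix(targets, preds)
accuracy = cm.diag().sum().item() / cm.sum().item()  # correct / total

# Zero the diagonal so only the errors remain, then locate the largest cell.
errors = cm.clone()
errors.fill_diagonal_(0)
true_label, pred_label = divmod(errors.argmax().item(), cm.shape[1])

print(accuracy)                # 0.625
print(true_label, pred_label)  # 3 9
```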

The achieved accuracy was also in line with that of other models examined during the preliminary literature review. This suggests a degree of interpretability of the images at such a small scale, consistent with the phenomena discussed in the introduction. A sample of the misclassified images from the better-performing LeNet model is shown below:

[Figure: misclassified examples from the LeNet model (results/Error_Examples_LeNet.png)]

A few commonalities can be observed here. For one, most misclassifications occur when multiple digits remain within the cropped and centered source image. From this we can infer that features detected from the non-centered digits confuse the CNN and lead to a misclassification. Beyond that, misclassified images containing a single digit are frequently blurry to a degree that even a human would have trouble identifying them. The takeaway is that there is likely an upper bound on the achievable accuracy for the dataset as a whole.
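
A sample of misclassified images like the one discussed above can be gathered by comparing predictions with labels. The sketch below is one way to do that, not necessarily how the notebook does it. The `misclassified_indices` helper is hypothetical, and the logits and labels are tiny made-up values with three classes to keep the example readable.

```python
import torch


def misclassified_indices(logits: torch.Tensor, targets: torch.Tensor):
    """Return the positions of wrong predictions and the predicted labels there."""
    preds = logits.argmax(dim=1)                       # most likely class per image
    wrong = torch.nonzero(preds != targets).flatten()  # indices where model erred
    return wrong, preds[wrong]


# Made-up scores for four images over three classes.
logits = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 3.0],
                       [0.9, 0.8, 0.1]])
targets = torch.tensor([1, 2, 2, 0])

wrong, wrong_preds = misclassified_indices(logits, targets)
print(wrong.tolist())        # [1]
print(wrong_preds.tolist())  # [0]
```

The returned indices can then be used to pull the corresponding images out of the test set for plotting.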

Contact Details

Kevin Tang ([email protected])
Daniel O'Brien ([email protected])
Isha Shah ([email protected])
Shayan Saadat ([email protected])
