Munge

Module for DICOM data preprocessing and loading

General Setup

Install package dependencies

$ conda install --yes --file requirements.txt

config.json

Download the data and put it in a directory and make changes to config.json accordingly. Currently I have my data in the data (git ignored) folder and hence the content of the file is:

{
    "dicom_path_template": "data/dicoms/{}/{}.dcm",
    "icontour_dir_template": "data/contourfiles/{}/i-contours/",
    "ocontour_dir_template": "data/contourfiles/{}/o-contours/",
    "link_file_path": "data/link.csv"
}

Jupyter Notebook

The usage of the package is easily illustrated in this Jupyter Notebook. Detailed explanation is also given here in Phase 1 Solutions.

Part 1: Parse the o-contours

Modified Dataset class get_all and get_by_study methods to get the o-contour for each image if it exists
Modified DataElement class to accept the optional argument of o-contour while constructing
Added overlay_contours method to DataElement to view both contours at the same time
Did not make any change to DataLoader class, assuming that the primary label for the ML training pipeline will be the i-contours

Part 2: Heuristic LV Segmentation approaches

This jupyter notebook has all the code samples that were used to generate the below figures and to do the operations described below.

Analysis of the O-Contour Histogram

Looking at the above histogram of the o-contour of the image (SCD0000101/59.dcm) we can see that it is a bi-modal distribution with two peaks corresponding to low intensity and high intensity regions respectively. Looking at the data, it is obvious that we are interested only in the high intensity region. So we should automatically choose a threshold that will retain only the high intensity pixels.

It is not always the case that the o-contour histogram is bi-modal in nature. Some times the image has low intensity (black), medium intensity (gray) and high intensity (white) pixels thus having a tri-modal nature like the one shown above (SCD0000201/80.dcm). In this case we should choose a threshold that will retain the medium and the high intensity pixels because in most of the cases the i-contour region is comprised of medium and high intensity pixels and the region between o-contour and i-contour mostly has low intensity pixels.

Fitting a Gaussian Mixture Model to the histogram

Looking at the above histograms, it is obvious that they are distributions comprising of a mixture of gaussians. Hence we can fit a Gaussian Mixture Model (GMM) to the histogram to automatically choose the optimum threshold value for that image. This threshold is chosen as,

The average of the gaussians' means in case of a bi-modal distribution (this is the point where the two curves cross - refer figure below)
The average of the first two gaussians' means in case of a tri-modal distribution

The below figures show the histograms along with the gaussians that were used to fit and the chosen threshold value.

Bi-modal GMM Fit

Tri-modal GMM Fit

Finding the i-contour by simple thresholding

Once the threshold has been automatically found out by fitting a GMM model, we do simple thresholding on the o-contour region i.e set all pixels greater than the threshold value to white and all other pixels black. Then we get the co-ordinates of these white pixels which is nothing but the detected i-contour.

Qualitative evaluation - Visualization

Once we have the detected i-contour, we can overlay this with the ground truth i-contour and look at the image to qualitatively assess the performance of the segmentation. One such visualization is shown below.

Quantitative evaluation - Jaccard Coefficient

Jaccard Coefficient or Jaccard Index is a widely used metric to measure the performance of segmentation. Below is the overlay visualization along with the jaccard coefficient metric. For the first image, the metric is 0.908, which shows that it is a very good prediction. For the Second image, the metric is 0.728, which shows that it is not a great prediction and there is room for improvement.

Final comments on simple thresholding

From the above results and discussions, it is evident that a simple thresholding scheme is very much possible to do segmentation but that alone would not suffice when the images have salt and pepper noise and poorly defined boundaries and intensity differences. For example, the performance of the above scheme purely depends on how well separated the region intensities are, which might not be the case always.

Alternate methods

Morphological Operations and Connected Components: In the first image below, we see tiny speckles in the predicted contour and also minor cracks in the boundary of the detected contour region. The tiny speckles can be removed by the morphological operation erosion and the cracks can be filled by the morphological operation dilation.

From the below two images we can see that dilation has improved the jaccard score from 0.728 to 0.826. Here I have dilated using a disk shaped structural element with an arbitrary radius of 3.

Before Dilation:

After Dilation:

After dilation we can further erode the image to get rid of tiny speckles and finally do connected components to find the i-contour region.

Scale Invariant Feature Transform (SIFT): Apart from using raw intensities as heuristic, we can do smarter by detecting local features like edges and disks using SIFT operations. Using SIFT we can also do key point detection and description to localize the i-contour region.

Auto generated documentation using Sphinx

Documentation can be found here.

Future Work

Improve GMM by automatically finding the modality (bi-modal or tri-modal) of the image either by using Bayesian Information Criterion(BIC) or Akaike Information Criterion (AIC) or some other metric.
The threshold is currently chosen by taking average of the two gaussians. This will be incorrect when the two gaussians have different standard deviations. So instead take weighted average to compute the threshold.
Currently dilation is being done using a disk-shaped structural element of arbitrary radius 3. This radius can be decided based on the radius of the annotated o-contour region.

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
_source		_source
_static		_static
data		data
docs		docs
htmlcov		htmlcov
munge		munge
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
Makefile		Makefile
README.md		README.md
Thresholding.ipynb		Thresholding.ipynb
Usage.ipynb		Usage.ipynb
__init__.py		__init__.py
config.json		config.json
coverage.svg		coverage.svg
data_loader.log		data_loader.log
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Munge

Module for DICOM data preprocessing and loading

General Setup

Jupyter Notebook

Part 1: Parse the o-contours

Part 2: Heuristic LV Segmentation approaches

Analysis of the O-Contour Histogram

Fitting a Gaussian Mixture Model to the histogram

Finding the i-contour by simple thresholding

Qualitative evaluation - Visualization

Quantitative evaluation - Jaccard Coefficient

Final comments on simple thresholding

Alternate methods

Auto generated documentation using Sphinx

Future Work

About

Releases

Packages

Languages

srivathsapv/dicom-munge

Folders and files

Latest commit

History

Repository files navigation

Munge

Module for DICOM data preprocessing and loading

General Setup

Jupyter Notebook

Part 1: Parse the o-contours

Part 2: Heuristic LV Segmentation approaches

Analysis of the O-Contour Histogram

Fitting a Gaussian Mixture Model to the histogram

Finding the i-contour by simple thresholding

Qualitative evaluation - Visualization

Quantitative evaluation - Jaccard Coefficient

Final comments on simple thresholding

Alternate methods

Auto generated documentation using Sphinx

Future Work

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages