
Preprocess your data

The data preprocessing pipeline consists of five steps, as follows:

Step 1: Patchifying the atlas image and saving the anatomical landmark locations

python 1_patchify_atlas.py --atlas_image path_to/atlas_image.nii.gz \
                           --atlas_roi_mask path_to/atlas_lung_mask.nii.gz \
                           --output_dir ./patch_data_32_6_reg --patch_size 32 --step_size 26

The file passed to --atlas_roi_mask (path_to/atlas_lung_mask.nii.gz above) is the ROI mask for the atlas image; we use lungmask to segment the lung region as the ROI. The script prints the number of patches per subject, which is needed later as the --num_patch value in Step 5.
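To illustrate how --patch_size and --step_size interact, here is a minimal sketch of sliding-window patchifying restricted to an ROI mask. It assumes a regular grid of cubic windows moved by step_size voxels along each axis, keeping any window that overlaps the mask; the function names and the min_fraction criterion are hypothetical, and 1_patchify_atlas.py may select patches differently.

```python
import numpy as np

def patch_grid(shape, patch_size=32, step_size=26):
    """Return the (z, y, x) corner of every window on a regular sliding grid."""
    # Along an axis of length L there are floor((L - patch_size) / step_size) + 1 windows.
    starts = [range(0, dim - patch_size + 1, step_size) for dim in shape]
    return [(z, y, x) for z in starts[0] for y in starts[1] for x in starts[2]]

def roi_patch_centers(roi_mask, patch_size=32, step_size=26, min_fraction=0.0):
    """Keep windows whose ROI coverage exceeds min_fraction; return their centers.

    Hypothetical helper: the real script's selection rule is not documented here.
    """
    centers = []
    for z, y, x in patch_grid(roi_mask.shape, patch_size, step_size):
        block = roi_mask[z:z + patch_size, y:y + patch_size, x:x + patch_size]
        if block.mean() > min_fraction:
            half = patch_size // 2
            centers.append((z + half, y + half, x + half))
    return np.array(centers, dtype=int).reshape(-1, 3)
```

With patch_size=32 and step_size=26, neighbouring patches overlap by 6 voxels. On a 64-voxel axis the windows start at 0 and 26, so a 64×64×64 volume yields 2×2×2 = 8 candidate windows before ROI filtering.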

The atlas image we used for the COPDGene (lung) dataset is available here, and the resulting landmark locations for lung CT are available here.

Step 2: Lung segmentation

python 2_segment_lung.py --input_csv ./dataset.csv

The dataset.csv file must contain at least two columns, sid and image: the sid column holds a unique ID for each subject, and the image column holds the path to that subject's image.
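Before launching the later steps, it can help to check that dataset.csv has the required layout. A minimal sketch using Python's standard csv module is shown below; read_dataset_csv is a hypothetical helper, not part of this repository. A valid file looks like this (example paths are made up):

    sid,image
    subject_001,/data/ct/subject_001.nii.gz
    subject_002,/data/ct/subject_002.nii.gz

```python
import csv

REQUIRED_COLUMNS = {"sid", "image"}

def read_dataset_csv(path):
    """Return (sid, image) pairs, checking required columns and unique sids.

    Hypothetical validation helper; extra columns are allowed and ignored.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"dataset.csv is missing columns: {sorted(missing)}")
        rows = [(row["sid"], row["image"]) for row in reader]
    sids = [sid for sid, _ in rows]
    if len(sids) != len(set(sids)):
        raise ValueError("sid values must be unique")
    return rows
```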

Step 3: Registration

python 3_registration.py --atlas_image ./misc/atlas_lung_mask.nii.gz \
                         --input_csv ./dataset.csv

We register the lung masks rather than the raw images for faster convergence and more robust performance. This is the most time-consuming step; it takes 7 minutes per sample.
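Because registration dominates the preprocessing time, a quick back-of-the-envelope estimate is useful when planning a run. The sketch below assumes the 7-minute figure above holds on your hardware and that you run several registrations concurrently, for example by splitting dataset.csv across independent processes; 3_registration.py is shown without a parallelism flag, so that setup is an assumption, and registration_hours is a hypothetical helper.

```python
import math

MINUTES_PER_SAMPLE = 7  # per-sample registration time reported above

def registration_hours(num_samples, num_parallel_jobs=1,
                       minutes_per_sample=MINUTES_PER_SAMPLE):
    """Rough wall-clock estimate in hours.

    Samples are processed in ceil(num_samples / num_parallel_jobs) waves,
    each wave taking minutes_per_sample minutes.
    """
    waves = math.ceil(num_samples / num_parallel_jobs)
    return waves * minutes_per_sample / 60
```

For example, 60 samples run one at a time take 7 hours, while 4 concurrent jobs bring that down to 15 waves, or 1.75 hours.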

Step 4: Mapping landmarks and patchifying

python ./src/preprocess/4_patchify_images.py --atlas_image ./misc/atlas_lung_mask.nii.gz \
                            --atlas_patch_loc ./misc/atlas_patch_loc.npy \
                            --lowerThreshold -1024 --upperThreshold 240 \
                            --input_csv ./dataset.csv \
                            --output_dir ./results/processed_patch \
                            --num_processor 4 \
                            --patch_size 32 \
                            --step_size 26

The file passed to --atlas_patch_loc (atlas_patch_loc.npy) is the patch location file output by Step 1.
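The --lowerThreshold and --upperThreshold options suggest an intensity window applied before patches are cropped. The sketch below assumes they clip CT values (in HU) to [-1024, 240] followed by min-max scaling to [0, 1], and that the atlas patch centers have already been mapped into subject space using the Step 3 transform (not shown). Both functions are hypothetical illustrations, not the code in 4_patchify_images.py.

```python
import numpy as np

def window_intensity(ct, lower=-1024, upper=240):
    """Clip CT intensities (assumed HU) to [lower, upper] and rescale to [0, 1]."""
    ct = np.clip(ct.astype(np.float32), lower, upper)
    return (ct - lower) / (upper - lower)

def extract_patches(volume, centers, patch_size=32):
    """Crop a cubic patch around each (z, y, x) center, zero-padding at borders.

    After windowing, 0 corresponds to lowerThreshold (air), so zero padding
    behaves like surrounding air.
    """
    half = patch_size // 2
    padded = np.pad(volume, half, mode="constant", constant_values=0)
    patches = np.empty((len(centers),) + (patch_size,) * 3, dtype=volume.dtype)
    for i, (z, y, x) in enumerate(centers):
        # Original index c maps to c + half in padded coordinates, so the patch
        # spanning [c - half, c - half + patch_size) starts at c after padding.
        patches[i] = padded[z:z + patch_size, y:y + patch_size, x:x + patch_size]
    return patches
```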

Step 5: Grouping patches (for pre-training only)

python 5_group_patch.py --num_patch 581 \
                        --batch_size 48 \
                        --num_jobs 28 \
                        --root_dir ./results/processed_patch/

This step reduces I/O demand and speeds up training.
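The on-disk layout written by 5_group_patch.py is not described here. As an assumption, the sketch below interprets --batch_size as the number of subjects whose patch arrays are bundled into one array, so a training batch can be loaded with a single read instead of one read per subject. group_patches is a hypothetical helper.

```python
import numpy as np

def group_patches(subject_patches, batch_size=48):
    """Stack per-subject patch arrays into groups of up to batch_size subjects.

    subject_patches: list of arrays, each shaped (num_patch, D, H, W).
    Returns a list of arrays shaped (<= batch_size, num_patch, D, H, W).
    """
    num_patch = subject_patches[0].shape[0]
    if any(p.shape[0] != num_patch for p in subject_patches):
        raise ValueError("every subject must have the same number of patches")
    return [np.stack(subject_patches[i:i + batch_size])
            for i in range(0, len(subject_patches), batch_size)]
```

Each group could then be written with np.save, giving one file per batch. With 100 subjects and batch_size=48, this yields groups of 48, 48, and 4 subjects. Requiring a fixed num_patch is what makes stacking possible, which fits with every subject receiving the same atlas-defined patch set.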

After these five steps, the preprocessed dataset folder ./results/processed_patch/ is ready for pre-training the model.
