HS2P is an open-source project largely based on CLAM tissue segmentation and patching code.
install requirements via pip3 install -r requirements.txt
- [Optional] Configure wandb
If you want to benefit from wandb logging, you need to follow these simple steps:
- grab your wandb API key under your profile and export
- run the following command in your terminal:
export WANDB_API_KEY=<your_personal_key>
- change wandb paramters in the configuration file under
config/
(setenable
toTrue
)
- Create a .csv file containing paths to the desired slides:
slide_id,slide_path
slide_id_1,path/to/slide_1.tif
slide_id_2,path/to/slide_2.tif
...
You can optionally provide paths to pre-computed segmentation masks under the 'segmentation_mask_path' column
slide_id,slide_path,segmentation_mask_path
slide_id_1,path/to/slide_1.tif,path/to/slide_1_mask.tif
slide_id_2,path/to/slide_2.tif,path/to/slide_2_mask.tif
...
- Create a configuration file under
config/extraction/
A good starting point is to use the default configuration file config/extraction/default.yaml
where parameters are documented.
- Run the following command to kick off the algorithm:
python3 patch_extraction.py --config-file </path/to/config/file>
- Depending on which flags have been set to True, it will produce (part of) the following results:
Patch extraction output
hs2p/
├── output/<experiment_name>/
│ ├── masks/
│ │ ├── slide_id_1.jpg
│ │ ├── slide_id_2.jpg
│ │ └── ...
│ ├── patches/<patch_size>/<format>/
│ │ ├── slide_id_1/
│ │ │ ├── slide_id_1.h5
│ │ │ ├── slide_id_1.npy
│ │ │ └── imgs/
│ │ │ ├── x0_y0.<format>
│ │ │ ├── x1_y0.<format>
│ │ │ └── ...
│ │ ├── slide_id_2/
│ │ └── ...
│ ├── visualization/
│ │ └── <patch_size>/
│ │ ├── slide_id_1.jpg
│ │ ├── slide_id_2.jpg
│ │ └── ...
│ ├── tiles.csv
│ └── process_list.csv
masks/
will contain a downsampled view of the slide with tissue segmentation overlayed
visualization/
will contain a downsampled view of the slide where extracted patches are highlighted
tiles.csv
contain patching information for each slide that ended up having patches extracted:
slide_id,tile_size,spacing,level,level_dim,x,y,contour
slide_id_1,2048,0.5,0,"(10496, 20992)",752,5840,0
...
Extracted patches will be saved as x_y.jpg
where x
and y
represent the true location in the slide at level 0:
- if spacing at level 0 is
0.25
and you extract [256, 256] patches at spacing0.25
, two consecutive patches will be distant from256
pixels (either alongx
ory
axis) - if spacing at level 0 is
0.25
and you extract [256, 256] patches at spacing0.5
, two consecutive patches will be distant from512
pixels (either alongx
ory
axis)
- [Optional] Configure wandb
see above
- Create a .csv file containing paths to the desired slides & associated annotation masks:
slide_id,slide_path,annotation_mask_path
slide_id_1,path/to/slide_1.tif,path/to/slide_1_annot_mask.tif
slide_id_2,path/to/slide_2.tif,path/to/slide_2_annot_mask.tif
...
In the same way as for patch extraction, you can optionally provide paths to pre-computed segmentation masks under the 'segmentation_mask_path' column.
- Create a configuration file under
config/sampling/
A good starting point is to use the default configuration file config/sampling/default.yaml
where parameters are documented.
- Run the following command to kick off the algorithm:
python3 patch_sampling.py --config-file </path/to/config/file>
- Depending on your config, it will produce (part of) the following results:
Patch sampling output
hs2p/
├── output/<experiment_name>/
│ ├── annotation_mask/
│ │ ├── slide_id_1.jpg
│ │ ├── slide_id_2.jpg
│ │ └── ...
│ ├── segmentation_mask/
│ │ ├── slide_id_1.jpg
│ │ ├── slide_id_2.jpg
│ │ └── ...
│ ├── patches/
│ │ ├── raw/
│ │ │ ├── category_1/
│ │ │ │ ├── slide_id_1_x0_y0.<format>
│ │ │ │ ├── slide_id_1_x1_y0.<format>
│ │ │ │ └── ...
│ │ │ ├── category_2/
│ │ │ └── ...
│ │ ├── mask/
│ │ │ ├── category_1/
│ │ │ │ ├── slide_id_1_x0_y0_mask.<format>
│ │ │ │ ├── slide_id_1_x1_y0_mask.<format>
│ │ │ │ └── ...
│ │ │ ├── category_2/
│ │ │ └── ...
│ │ └── h5/
│ │ ├── slide_id_1.h5
│ │ ├── slide_id_2.h5
│ │ └── ...
│ ├── visualization/
│ │ ├── slide_id_1.jpg
│ │ ├── slide_id_2.jpg
│ │ └── ...
│ └── sampled_tiles.csv
annotation_mask/
will contain a downsampled view of the slide with corresponding annotation mask overlayed
segmentation_mask/
will contain a downsampled view of the slide with tissue segmentation overlayed
visualization/
will contain a downsampled view of the slide where sampled patches are highlighted
sampled_patches.csv
contain information for each patch that ended up being extracted:
slide_id,category,x,y,pct
slide_id_1,category_1,3488,2512,0.8203125
...
Again, extracted patches will be saved as x_y.jpg
where x
and y
represent the true location in the slide at level 0.
If the generated visualization are noisy, you'll need to change libpixman
version. Running the following command should fix this issue:
wget https://www.cairographics.org/releases/pixman-0.40.0.tar.gz
tar -xf pixman-0.40.0.tar.gz
cd pixman-0.40.0
./configure
make
make install
export LD_PRELOAD=/usr/local/lib/libpixman-1.so.0.40.0