
colon crypt segmentation

  • 2 - Image Segmentation Task

Deep learning framework used: PyTorch

This repository contains PyTorch code for colon crypt segmentation.

Repository structure

models/
overlays/
Datapreparation.ipynb
Inference.ipynb
Mask_from_RLE.ipynb
Training.ipynb
Viz.ipynb
utils.py
patch_extractor.py 
requirements.txt
submission.csv

Data Description:

Dataset - https://drive.google.com/file/d/1RHQjRav-Kw1CWT30iLJMDQ2hPMyIwAUi/view?usp=sharing

After downloading the data and extracting the files into the directory structure above, you will have the following folders.

Colonic_crypt_dataset/
    train/
    train_mask/
    test/
    test_mask/
    train.csv
    test.csv
    colon-dataset_information.csv

train/ contains the training H&E images and their annotation JSONs.

test/ contains the test H&E images and their annotation JSONs.

train_mask/ contains the training data masks.

test_mask/ contains the test data masks.

train.csv contains the training image ids and their corresponding run-length encoding (RLE) annotations.

test.csv contains the test image ids and their corresponding RLE annotations.

Data Preprocessing

  • Mask Generation

    • The dataset already includes generated masks for the training and test sets. If needed, the Mask_from_RLE.ipynb notebook generates masks from the RLE annotations in the train and test csv files (see the first sketch after this list).
  • Patch Generation

    • The image shape is (4536, 4704, 3), so patches of shape (512, 512, 3) were generated for training.

    • For patch generation, a window of size (512, 512) with a stride of (256, 256) was used, so adjacent patches overlap slightly. Only fully-contained ("valid") windows are kept. Code to skip patches covered by the black rectangular artifacts found on the training and test WSIs is also implemented (see the second sketch after this list).

    • Patch generation is done in the Datapreparation.ipynb notebook. It creates data/images and data/masks directories to save the image patches and their corresponding masks, and also writes a csv, train_data.csv, with the following three columns.

      • Train_image_path
      • Train_mask_path
      • Class (1 if the mask contains a crypt annotation, else 0)
    • We use the csv generated here as the input to our PyTorch Dataset to access images and masks.
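
As referenced in the Mask Generation step, here is a minimal sketch of RLE decoding. It assumes Kaggle-style encoding (space-separated start/length pairs, 1-indexed, running down columns in Fortran order); Mask_from_RLE.ipynb is the authoritative implementation.

```python
import numpy as np

def rle_to_mask(rle, shape):
    """Decode an RLE string of space-separated (start, length) pairs into a
    binary mask. Assumes Kaggle-style encoding: 1-indexed starts, pixels
    counted down the columns (Fortran order)."""
    s = np.asarray(rle.split(), dtype=int)
    starts, lengths = s[0::2] - 1, s[1::2]  # convert starts to 0-indexed
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        mask[start:start + length] = 1
    return mask.reshape(shape, order="F")   # column-major reshape
```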
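
And a minimal sketch of the sliding-window patch extraction with artifact filtering. The near-black pixel threshold and the cut-off fraction are illustrative assumptions, not the exact values used in patch_extractor.py.

```python
import numpy as np

def extract_patches(image, mask, size=512, stride=256, black_cutoff=0.5):
    """Slide a (size, size) window with the given stride, keeping only
    fully-contained ("valid") windows, and skip patches dominated by the
    black rectangular scanner artifacts."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            img_p = image[y:y + size, x:x + size]
            # fraction of near-black pixels in the window
            black_frac = (img_p.max(axis=-1) < 10).mean()
            if black_frac > black_cutoff:
                continue  # likely a black artifact region
            patches.append((img_p, mask[y:y + size, x:x + size]))
    return patches
```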

Training

  • Dataset
    • We use the train_data.csv previously generated to access the training data.
    • Applied the following transforms to the training data (in utils.py):

```python
import cv2
import albumentations as A

train_transform = A.Compose([
    A.HorizontalFlip(),
    A.VerticalFlip(),
    A.RandomRotate90(),
    A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.2, rotate_limit=15, p=0.9,
                       border_mode=cv2.BORDER_REFLECT),
    A.OneOf([
        A.ElasticTransform(p=0.3),
        A.GaussianBlur(p=0.3),
        A.GaussNoise(p=0.3),
        A.OpticalDistortion(p=0.3),
        A.GridDistortion(p=0.1),
        A.PiecewiseAffine(p=0.3),
    ], p=0.3),
    A.OneOf([
        A.HueSaturationValue(15, 25, 0),
        A.CLAHE(clip_limit=2),
        A.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.3),
    ], p=0.3),
])
```

    • I trained segmentation models using the Linknet, Unet, UnetPlusPlus and Multi-Attention Network (MAnet) architectures with different backbone encoders such as efficientnet-b2, efficientnet-b3 and resnet50.
    • Unet and UnetPlusPlus performed best, with Unet + efficientnet-b2 giving the best dice score.
    • Trained models over 5 cross-validation folds.
    • Used cross-entropy loss with the Ranger optimizer (RAdam + Lookahead). Ranger requires the torch-optimizer package:

```
pip3 install torch-optimizer
```

    • Implemented early stopping with patience = 5, monitoring the validation loss.
    • The training metrics and losses for each epoch are stored as csv files in the models/ folder. A minimal sketch of this setup follows the list.
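
A minimal sketch of the training setup described above, assuming a single-channel output with binary cross-entropy (BCEWithLogitsLoss) and masks stored as 0/255 images; the learning rate is a placeholder, not the repository's exact value.

```python
import cv2
import pandas as pd
import torch
import segmentation_models_pytorch as smp
import torch_optimizer
from torch.utils.data import Dataset

class CryptDataset(Dataset):
    """Minimal Dataset over train_data.csv; column names as listed above."""
    def __init__(self, csv_path, transform=None):
        self.df = pd.read_csv(csv_path)
        self.transform = transform  # the albumentations pipeline from utils.py

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = cv2.cvtColor(cv2.imread(row["Train_image_path"]), cv2.COLOR_BGR2RGB)
        mask = cv2.imread(row["Train_mask_path"], cv2.IMREAD_GRAYSCALE)
        if self.transform is not None:
            augmented = self.transform(image=image, mask=mask)
            image, mask = augmented["image"], augmented["mask"]
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
        mask = torch.from_numpy(mask).float().unsqueeze(0) / 255.0  # assumes 0/255 masks
        return image, mask

# Best-performing configuration reported above: Unet with an efficientnet-b2 encoder.
model = smp.Unet(encoder_name="efficientnet-b2", encoder_weights="imagenet",
                 in_channels=3, classes=1)
optimizer = torch_optimizer.Ranger(model.parameters(), lr=1e-3)  # RAdam + Lookahead
criterion = torch.nn.BCEWithLogitsLoss()  # cross-entropy on crypt vs. background
```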

Inference

  • The inference notebook (Inference.ipynb) performs inference on the test data and generates the submission csv. It contains helper functions for predicting masks at the patch level, stitching the predicted mask patches back into the full mask, calculating the dice score, and generating the submission csv.
  • This notebook generates the following csvs:
    • submission.csv (contains test data image ids and RLE of the respective predicted mask)
    • dice_report_submission.csv (contains test set dice scores)
    • dice_report_train_predictions.csv (contains train set dice scores)
  • Test data scores

    | id | dice |
    | --- | --- |
    | CL_HandE_1234_B004_bottomleft | 0.922929778972226 |
    | HandE_B005_CL_b_RGB_bottomleft | 0.7634746349031302 |
    | Average | 0.8432022069376781 |
  • Train data dice scores

    | id | dice |
    | --- | --- |
    | CL_HandE_1234_B004_bottomright | 0.936980592055458 |
    | CL_HandE_1234_B004_topleft | 0.9195726177015423 |
    | CL_HandE_1234_B004_topright | 0.9216310830724985 |
    | HandE_B005_CL_b_RGB_bottomright | 0.8983568682591906 |
    | HandE_B005_CL_b_RGB_topleft | 0.9104829978846856 |
    | Average | 0.9174048317946749 |

Note: inference is performed by predicting with all 5 trained fold models and averaging their pixel-wise probabilities, as sketched below.
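
A minimal sketch of the fold averaging and the dice computation, assuming single-channel sigmoid outputs and an illustrative 0.5 binarisation threshold; the notebook is the authoritative implementation.

```python
import torch

@torch.no_grad()
def ensemble_predict(models, patch, threshold=0.5):
    """Average pixel-wise sigmoid probabilities over the fold models and
    binarise. `patch` is a (1, 3, 512, 512) tensor."""
    probs = torch.stack([torch.sigmoid(m(patch)) for m in models]).mean(dim=0)
    return (probs > threshold).float()

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks of the same shape."""
    intersection = (pred * target).sum()
    return float((2 * intersection + eps) / (pred.sum() + target.sum() + eps))
```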

Predictions

  • Inference.ipynb includes code to overlay predictions on the original patches. Below are sample outputs.

[sample prediction overlay images]

Incorrectly segmented outputs

  • The below images are some examples where the model didn't perform well.

[example overlay image]

I think the model didn't perform well on the example above because the majority of crypts in the training data have a darker boundary texture and, more often than not, a lighter interior; the unsegmented crypt above lacks a definite dark boundary.

[example overlay image]

Here is another example where the model didn't perform well.

I think the number of crypt examples in the training data is lower than that of the background class; adding more crypt data should improve the segmentation results.

Visualizations

notebook - Viz.ipynb

  • Feature representation generation
    • Took the trained segmentation model (with weights) and set it to return the output feature maps of the last convolutional block of the encoder.
    • Passed the train and test patches through the network, generated their feature maps, and flattened them into one-dimensional vectors.
    • Used these feature representations of the patches for the visualizations below (a sketch follows this list).
    • Due to GitHub's file size limit, the feature_representation.pkl file could not be uploaded. Please find it here - https://drive.google.com/file/d/1RcX8Pvi6rEx3J-RNjKbOEUFbkkToo-na/view?usp=sharing
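
A minimal sketch of the feature extraction, assuming a segmentation_models_pytorch model, whose encoder returns one feature map per stage with the last entry being the final convolutional block.

```python
import numpy as np
import torch

@torch.no_grad()
def patch_features(model, patches):
    """Return flattened feature vectors from the deepest encoder stage."""
    model.eval()
    vectors = []
    for patch in patches:  # each patch: a (3, 512, 512) float tensor
        stages = model.encoder(patch.unsqueeze(0))   # one feature map per stage
        vectors.append(stages[-1].flatten().cpu().numpy())
    return np.stack(vectors)
```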

PCA

[PCA projection of train and test patch features]

As the plot shows, the training and test data occupy largely the same region of feature space.

Here

  • class 1 represents the feature representation of a patch with at least one crypt.
  • class 0 represents the feature representation of a patch with no crypts at all.

[PCA projections coloured by crypt / non-crypt class]

The plots above show the overlap between the crypt and non-crypt class datapoints. The classes overlap because patches containing crypts also carry significant background (non-crypt) information.
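
A minimal sketch of the projection, assuming the flattened feature vectors and class labels from above.

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_projection(features, labels, title="PCA"):
    """Project the flattened feature vectors to 2-D and colour by class
    (1 = patch with at least one crypt, 0 = background-only patch)."""
    coords = PCA(n_components=2).fit_transform(features)
    plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="coolwarm", s=5)
    plt.title(title)
    plt.show()
```

Swapping PCA for sklearn.manifold.TSNE(n_components=2), sklearn.manifold.MDS(n_components=2), or umap.UMAP() reproduces the t-SNE, MDS, and UMAP plots in the sections below.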

TSNE

[t-SNE projection of train and test patch features]

The training and test set datapoints overlap.

Here

  • class 1 represents the feature representation of a patch with at least one crypt.
  • class 0 represents the feature representation of a patch with no crypts at all.

[t-SNE projections coloured by crypt / non-crypt class]

The plots above show the overlap between the crypt and non-crypt class datapoints.

UMAP

[UMAP projection of train and test patch features]

Here

  • class 1 represents the feature representation of a patch with at least one crypt.
  • class 0 represents the feature representation of a patch with no crypts at all.

[UMAP projections coloured by crypt / non-crypt class]

The plots above show the overlap between the crypt and non-crypt class datapoints. Again, the overlap arises because patches containing crypts also carry significant background (non-crypt) information.

MDS

[MDS projection of train and test patch features]

The training and test datapoints overlap.

Here

  • class 1 represents the feature representation of a patch with at least one crypt.
  • class 0 represents the feature representation of a patch with no crypts at all.

[MDS projections coloured by crypt / non-crypt class]

The plots above show the overlap between the crypt and non-crypt class datapoints. The classes overlap because patches containing crypts also carry significant background (non-crypt) information.

Summary of Deep learning for cellular image analysis

file - Summary_Deep learning for cellular image analysis _Erick Moen et.al.pdf