
add instance segmentation support #854

Open
dgketchum opened this issue Nov 14, 2019 · 16 comments

Comments

@dgketchum

dgketchum commented Nov 14, 2019

It would be helpful to have access to torchvision.models.detection.maskrcnn_resnet50_fpn via raster-vision.

Hi @lewfish

I've been slowly working through the code in my fork, adding modules that mimic the semantic segmentation functionality.

At this point I'm planning on using 2048x2048 NAIP images (RGB) as the training source. I have 2048x2048 labels where I've rasterized the overlying vector data (agricultural fields), each feature a new 'instance' with an integer instance label from 1 to number_features; background is 0.

I think within RV I'll split the instance-encoded image into binary masks, one for each feature, and get the bounding box from each as they do in this torchvision tutorial. Then I'll prepare the (image, {boxes, labels, masks, id, area}) tuples for the DataLoader in data.build_databunch(), as expected by torchvision.models.detection.maskrcnn_resnet50_fpn.
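A minimal sketch of that splitting step, assuming a numpy label array with background 0 and instance ids 1..N (the function name is mine, not RV's):

```python
import numpy as np

def masks_and_boxes(label_img):
    """Split an instance-encoded label image (H, W) with background 0 and
    instance ids 1..N into binary masks and their bounding boxes."""
    instance_ids = np.unique(label_img)
    instance_ids = instance_ids[instance_ids != 0]  # drop background
    # broadcast-compare to get one (H, W) boolean mask per instance id
    masks = label_img[None, :, :] == instance_ids[:, None, None]  # (N, H, W)
    boxes = []
    for mask in masks:
        ys, xs = np.nonzero(mask)
        boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])  # xmin, ymin, xmax, ymax
    return masks, np.array(boxes, dtype=np.float32)
```

This mirrors the approach in the torchvision tutorial, where masks and boxes are derived on the fly in `__getitem__` rather than stored on disk.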

Hope this seems reasonable; if so I'll just leave a running commentary on what I've done on this issue thread. Any input from you will be greatly appreciated.

This is by far the most sophisticated project I've touched. Fun to learn though.

@lewfish
Contributor

lewfish commented Nov 14, 2019

Sounds good! In the future we might want to generalize this to handle more than one category, but this is a good simplifying assumption to get started. I would try to get each command working in the order in which they run (chip, train, etc). Getting some debug chips made at the beginning of the training step will be a good sanity check.

One slightly tricky issue you need to deal with when working with instance based methods (like object detection and instance segmentation) is how to deal with instances that straddle chip boundaries during prediction. In RV, we use a sliding window to be able to make predictions over large images. For object detection, we use a sliding window with 50% overlap and make the chip size > the instance size. This ensures that each instance is glimpsed in its entirety by some window. We then do a de-duplication step to remove any instances that were predicted by multiple windows.
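The window layout described above (50% overlap, clamped at the edges so every instance is seen whole by some window) can be sketched like this; `sliding_windows` is a hypothetical helper, not RV's actual implementation:

```python
def sliding_windows(extent_h, extent_w, chip_size, stride=None):
    """Return (ymin, xmin) offsets of chip-sized windows covering the
    extent, with 50% overlap by default and the last row/column clamped
    so every window stays inside the extent."""
    stride = stride or chip_size // 2
    last_y = max(extent_h - chip_size, 0)
    last_x = max(extent_w - chip_size, 0)
    ys = list(range(0, last_y + 1, stride))
    xs = list(range(0, last_x + 1, stride))
    if ys[-1] != last_y:
        ys.append(last_y)  # make sure the bottom edge is covered
    if xs[-1] != last_x:
        xs.append(last_x)  # make sure the right edge is covered
    return [(y, x) for y in ys for x in xs]
```

After predicting per window, overlapping detections can be de-duplicated, e.g. with non-maximum suppression (`torchvision.ops.nms`).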

However, recently, we realized that you can train a model on small chips (eg 200x200) and do a forward pass (during prediction) on large chips (1000x1000+) without any drop in accuracy (sometimes it's actually more accurate due to more context). It might be possible to just make a prediction on whole 2048x2048 images, and then you don't need to worry about this issue.

@dgketchum
Author

Thanks for your feedback!

Interesting point about instances straddling chip boundaries; straddles will be common for my problem, since the instances (fields) are large relative to the chip and scene size. Do you have prior knowledge of how big your instances might be, and do you set the chip size accordingly?

At this point I have almost gotten the model to take a training step. I added code in InstanceSegmentationDataset.__getitem__ to build an (image, target) tuple of the form (Tensor[C, H, W], {boxes: FloatTensor[N, 4], labels: Int64Tensor[N], masks: UInt8Tensor[N, H, W]}). detection.MaskRCNN actually expects lists of image tensors and target dicts, so to conform to that design I'm calling out = model([x], [target]) in train.train_epoch(). It took me a while to figure out that DataLoader's default collation adds a batch dimension (size = batch_sz) to the training tensors (including the target items), which is higher-dimensional than MaskRCNN expects. Maybe I can get (image, target) instances back in the correct form using a collate function.

@dgketchum
Author

Added the rv collate_fn to the DataLoader call, removed the squeeze operation on x, and sent the tensors to device in train_epoch(), and it took a training step!
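For anyone following along, a minimal version of such a collate function looks like this (names are mine, not RV's exact code); it keeps each sample as a list element instead of letting the DataLoader stack everything into batched tensors:

```python
def detection_collate(batch):
    """Keep (image, target) pairs as parallel lists instead of stacking
    them into batched tensors, which is the input format torchvision's
    Mask R-CNN expects."""
    images = [item[0] for item in batch]
    targets = [item[1] for item in batch]
    return images, targets

# hypothetical usage, assuming `dataset` yields (Tensor[C, H, W], target_dict):
# loader = DataLoader(dataset, batch_size=4, collate_fn=detection_collate)
# for images, targets in loader:
#     loss_dict = model(images, targets)
```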

@dgketchum
Author

Hi @lewfish ,
I was able to get the mask rcnn model to take training steps using the data I had built for agricultural fields instance segmentation, though I can't really tell yet if it is learning. Since I have little experience in this area, I thought it would be good to take a step back and set up an experiment to run the COCO dataset through rv and check to make sure it's working, as we have access to a pre-trained model.

This brought up a design question: what level of pre-processing should rv expect for instance segmentation? At this point, for the fields data, I have fed it raster labels of shape (1, img_sz, img_sz), where each instance had a unique integer assignment > 0. This won't be ideal, because while it is convenient to pass in a single-channel label and subsequently break it into (nb_features, img_sz, img_sz), encoding multiple instances from multiple classes would get confusing. I imagine eventually it would be ideal to pass rv a vector source and have labels rasterized with instance-aware class labels using rasterio?

With COCO, I have been stacking masks as (nb_features, img_sz_x, img_sz_y), which allows me to create a separate mask for each instance and label it with the COCO category integer. This just necessitates some modification of the chipping code, where in several places I sum the masks over axis=2 to identify empty locations, and to save in process_scene_data, etc. Hopefully getting instance segmentation to run on COCO in rv will help me see where I can generalize the code.
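A sketch of that stacking/summing idea, assuming an instance-encoded label image and a hypothetical `categories` mapping from instance id to COCO category (note I sum over the instance axis, which is axis=0 for an (N, H, W) stack):

```python
import numpy as np

def stack_instance_masks(label_img, categories):
    """Split an instance-encoded label image into a (nb_features, H, W)
    stack of binary masks, one per instance, with per-instance category
    labels. `categories` maps instance id -> category id."""
    ids = np.unique(label_img)
    ids = ids[ids != 0]  # drop background
    masks = (label_img[None] == ids[:, None, None]).astype(np.uint8)
    labels = np.array([categories[i] for i in ids], dtype=np.int64)
    # summing over the instance axis flags pixels covered by any instance,
    # which is handy for finding empty locations during chipping
    any_instance = masks.sum(axis=0) > 0
    return masks, labels, any_instance
```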

Cheers,

@lewfish
Contributor

lewfish commented Nov 20, 2019

Do you have prior knowledge of how big your instances might be and set the chip size accordingly?

Roughly yes. I think I've only ever used object detection for buildings and cars.

@lewfish
Contributor

lewfish commented Nov 20, 2019

You said you were trying to mimic semantic segmentation functionality, but now that I think about it, the object detection code is probably more relevant to what you're doing, although you probably realize that. Also, this is helpful and not sure if you saw it: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

I imagine eventually it would be ideal to pass rv a vector source and have labels rasterized with instance-aware class labels using rasterio?

Yes, that sounds right.

Hopefully getting instance segmentation to run on COCO in rv will help me see where I can generalize the code.

Getting things working on a non-geospatial well-known dataset is a good idea, and something I've done in the past. Just keep in mind that COCO is a really big dataset though, and it takes a lot of GPU time to train a model, so you might want to try something smaller.

@lewfish
Contributor

lewfish commented Nov 20, 2019

Sidenote which may be of interest: since COCO images are large, most people use an 8 GPU machine where each GPU has a batch size of 2. But an alternative approach called SNIPER breaks the large images into small chips first, and can then train with a large batch size on a single GPU if desired. It's the first time I've seen this sliding window style approach applied in a non-GIS setting: https://arxiv.org/pdf/1805.09300.pdf

@dgketchum
Author

dgketchum commented Nov 21, 2019

Thanks for your thoughts @lewfish !

now that I think about it, the object detection code is probably more relevant to what you're doing

I have spent the majority of my time studying the semantic segmentation-related rv code, so it will be necessary for me to study the relevant object detection-related rv functionality as well.

not sure if you saw it: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

Yes I've read lots of torchvision tutorials and discussion forums and found them quite useful. Also the multimodal learning implementation of mask rcnn has been informative, as the algorithm is relatively well exposed in pytorch.

I've gotten rv to evaluate on COCO images using the COCO pretrained model. I want to set up the label output so I can visualize it and ensure that the output is sane. Then I'll figure out why I sometimes get empty evaluations (advice?) and why one of the five losses goes to inf (advice?), then try to transfer the pretrained backbone to my agricultural fields dataset.

since COCO images are large, most people use an 8 GPU machine where each GPU has a batch size of 2

Only after I convince myself that I can transfer the COCO backbone to my agriculture problem on 3 channels will I worry about training anything with fine-tuning or from scratch, let alone multispectral. I've started discussions with colleagues at NASA who are working on getting me an account with NASA Earth Exchange, which has a 32 GB GPU setup. I also have AWS money on an upcoming grant for 2020. Right now I have an RTX 2080 8GB on my research machine. Hopefully I can tap your expertise when it comes time for real training on a powerful machine.

@ammarsdc
Contributor

Hi. Any updates on implementing instance segmentation in rv?

@lewfish
Contributor

lewfish commented Nov 23, 2022

Hi. Any updates on implementing instance segmentation in rv?

It's on a list of potential new features, but I don't think it's likely that it will be implemented soon. What application were you thinking of using it for? It would require a lot of code to be written, but it should be relatively straightforward, since it should just require adding various subclasses following the pattern of semantic segmentation and object detection. If you or someone else were interested in taking it on, we would be happy to provide guidance.

@ammarsdc
Contributor

ammarsdc commented Nov 30, 2022

What application were you thinking of using it for?

Here we're working on digitising solar panels on orthoimagery.

We have received panel annotations that were originally drawn very close to each other. Since semantic segmentation reads adjacent annotations as one label, we found it wasn't possible to digitise each panel separately. So we added a buffer between them by updating the annotations to cover only the inner part of each panel, but it is still challenging for the trained model to distinguish and digitise each panel: some panels are separated correctly, but many still aren't.

Hence, after reading up on the concept, instance segmentation seems like it could solve this, since inference output can be produced per panel.

If you or someone else were interested in taking it on, we would be happy to provide guidance.

That is a very good offer @lewfish! I still have a lot to learn, since I'm new to this ML/DL field, but I am happy to contribute. Could you please guide me?

@MathiasBaumgartinger

Similar application as @ammarsdc. I would gladly contribute as well, given some guidance.

@lewfish
Contributor

lewfish commented Dec 6, 2022

Here is a rough list of the tasks that would need to be completed to add instance segmentation to RV. I've made some very rough estimates about how many days each task would take assuming you are proficient with Python and PyTorch, have some experience working on large codebases, and things go relatively smoothly. This would be a lot of work! I would start with the first part about reading and visualizing a dataset. That should give you a better idea of the approach to take and how much work the whole thing will take. It would also be a good contribution to RV even if you weren't able to complete the whole thing. Where I've listed the name of a class, the idea would be to extend it for instance segmentation, and I've linked to the corresponding class for semantic segmentation to give you an idea of what that would look like (although object detection would also be relevant). @AdeelH Please chime in if you have anything to add, and feel free to directly edit this list.

@dgketchum
Copy link
Author

dgketchum commented Dec 6, 2022 via email

@ammarsdc
Contributor

ammarsdc commented Dec 7, 2022

Thanks @lewfish. What would be the best way to share our progress on that checklist?
Glad to hear that @MathiasBaumgartinger!
Also thanks to @dgketchum. Will go through them.

@lewfish
Contributor

lewfish commented Dec 7, 2022

Thanks @lewfish. What would be the best way to share our progress on that checklist?

You can make a PR with your work in progress code and get feedback if you'd like. There's a draft mode you can put a PR in. Any higher-level conceptual discussion can go in this issue.
