Surgical Tool Localization

Surgical Tool Classification, Detection, and Localization to assess surgical performance and efficiency

Purpose of ML App

Outline

Software Dependencies

  • Nvidia DALI Pipeline
  • TensorFlow-GPU 2.7.3 or PyTorch 1.1
  • Unity 2021 LTS

Setup Software Dev Environment

How to Run Demo

ML Pipeline Approaches

Approach 1

Methodology

In research paper [1A], the authors designed a lightweight attention-guided CNN that inherits the advantages of single-stage and two-stage detection methods and works more accurately and efficiently than RefineDet.

Their proposed approach performs surgical tool detection (STD) via a coarse detection module (CDM) and a refined detection module (RDM). The method achieves end-to-end training by using a multi-task loss function, and a distance intersection-over-union non-maximum suppression (DIoU-NMS) step is proposed to post-process the tool detection results. Refer to the figure 1 diagram in paper [1A]. The CDM is powered by a modified VGG-16 and performs binary classification to decide whether an anchor is a tool or background, filtering out a large number of negative anchors (cases where there isn't a tool in the frame). The RDM consists of multiple convolutional layers and an SENet block to generate accurate locations and classification scores of surgical tools.
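As a concrete illustration of the DIoU-NMS post-processing step, here is a minimal NumPy sketch (not the authors' implementation; the box format and the 0.5 threshold are assumptions): a remaining box is suppressed only when its IoU with the current highest-scoring box, penalized by the normalized center distance, exceeds the threshold.

```python
import numpy as np

def diou_nms(boxes, scores, threshold=0.5):
    """Sketch of DIoU-NMS. boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,)."""
    order = scores.argsort()[::-1]                 # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]

        # IoU between the top-scoring box and each remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)

        # DIoU penalty: squared center distance over the squared diagonal
        # of the smallest box enclosing both boxes.
        cx_i, cy_i = (boxes[i, 0] + boxes[i, 2]) / 2, (boxes[i, 1] + boxes[i, 3]) / 2
        cx_r, cy_r = (boxes[rest, 0] + boxes[rest, 2]) / 2, (boxes[rest, 1] + boxes[rest, 3]) / 2
        rho2 = (cx_i - cx_r) ** 2 + (cy_i - cy_r) ** 2
        ex1, ey1 = np.minimum(boxes[i, 0], boxes[rest, 0]), np.minimum(boxes[i, 1], boxes[rest, 1])
        ex2, ey2 = np.maximum(boxes[i, 2], boxes[rest, 2]), np.maximum(boxes[i, 3], boxes[rest, 3])
        c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9

        diou = iou - rho2 / c2
        order = rest[diou <= threshold]            # keep only boxes that survive suppression
    return keep
```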

Approach 2

For this custom approach, there are 2 DNNs that come from research paper [2A].

  1. A heatmap hourglass CNN outputs heatmaps that represent the location of each instrument's tip area (see the heatmap-peak sketch at the end of this approach).

  2. A bounding-box regression network that is a modified version of VGG-16, originally used for image classification.

They compared their model against Faster R-CNN, YOLOv3, and RetinaNet, using non-maximum suppression (NMS) with a threshold of 0.5.

| Model | mAP1 | mAP2 | Detection time per frame (s) |
| --- | --- | --- | --- |
| Their DNN networks | 91.60 % | 100.00 % | 0.023 |
| YOLOv3 (Darknet-53) | 90.92 % | 99.07 % | 0.034 |

  • Refer to network diagrams in research paper [2A].
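As a rough sketch of how instrument-tip locations could be read off the hourglass network's heatmaps, here is a minimal peak-extraction example (the array layout and the 0.5 score threshold are assumptions, not details from [2A]):

```python
import numpy as np

def tips_from_heatmaps(heatmaps, score_threshold=0.5):
    """heatmaps: (num_instruments, H, W) array, one channel per instrument tip."""
    tips = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)   # location of the peak response
        if hm[y, x] >= score_threshold:                    # ignore weak peaks (no tip visible)
            tips.append((int(x), int(y), float(hm[y, x])))
    return tips

# Example: two fake heatmaps standing in for the hourglass output.
fake = np.zeros((2, 64, 64))
fake[0, 20, 30] = 0.9
print(tips_from_heatmaps(fake))   # [(30, 20, 0.9)]
```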

Approach 3

  1. Train a surgical tool detector on image-level labels and learn the whole region boundaries of the surgical tools in laparoscopic videos.

  2. Employ a Convolutional LSTM (ConvLSTM) to learn the spatio-temporal coherence across the surgical video frames for tool localization and tracking.

Methodology

In research paper [3A], their models are built on ResNet-18.

Their detector is their own custom FCN baseline built on ResNet-18: it produces a 7-channel Lh-map, and spatial pooling over this map outputs class scores for the 7 surgical tools.
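A minimal PyTorch sketch of this kind of weakly supervised detector, assuming a ResNet-18 trunk, a 1x1 convolution to a 7-channel Lh-map, and global max pooling standing in for the paper's spatial pooling (the layer choices are illustrative, not the authors' exact architecture):

```python
import torch
import torch.nn as nn
import torchvision

class WeaklySupervisedToolFCN(nn.Module):
    def __init__(self, num_tools=7):
        super().__init__()
        resnet = torchvision.models.resnet18()            # no pretrained weights for brevity
        # Keep everything up to the last residual stage (drop avgpool/fc).
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.to_lh_map = nn.Conv2d(512, num_tools, kernel_size=1)

    def forward(self, x):
        features = self.backbone(x)          # (B, 512, H/32, W/32)
        lh_map = self.to_lh_map(features)    # (B, 7, H/32, W/32) localization heat-map
        logits = lh_map.amax(dim=(2, 3))     # spatial pooling -> image-level tool logits
        return lh_map, logits

# Training needs only binary tool-presence labels (multi-label BCE), no boxes.
model = WeaklySupervisedToolFCN()
images = torch.randn(2, 3, 224, 224)
lh_map, logits = model(images)
presence = torch.randint(0, 2, (2, 7)).float()
loss = nn.BCEWithLogitsLoss()(logits, presence)
```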

Their tracker is their own custom ConvLSTM, and it leverages the separation of tool types in the 7-channel Lh-map from the FCN detector to build a baseline model for tool tracking (a minimal ConvLSTM cell sketch appears at the end of this approach).

IMPORTANT: Their ConvLSTM tracker model, trained on videos at 1 fps, can generalize to unlabelled videos at 25 fps, potentially making it unconstrained by the frame rate.

  • Refer to network diagrams in research paper [3A].
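For the temporal side, here is a minimal ConvLSTM cell of the kind such a tracker builds on, run over a sequence of 7-channel Lh-maps (channel counts, kernel size, and the plain per-frame loop are assumptions for illustration, not the paper's exact tracker):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Gates computed with convolutions so the hidden state keeps its spatial layout."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.gates = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g                      # update cell memory
        h = o * torch.tanh(c)                  # new hidden state (spatial map)
        return h, c

# Smooth a sequence of per-frame Lh-maps (B, T, 7, H, W) over time.
cell = ConvLSTMCell(in_channels=7, hidden_channels=7)
seq = torch.randn(2, 5, 7, 28, 28)
h = torch.zeros(2, 7, 28, 28)
c = torch.zeros(2, 7, 28, 28)
tracked = []
for t in range(seq.size(1)):
    h, c = cell(seq[:, t], (h, c))
    tracked.append(h)                          # temporally coherent per-tool maps
```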

Approach 4

For this custom approach, there are 2 DNNs that come from research paper [4A].

  • First, a Faster R-CNN based on a VGG-16 network takes a laparoscopic surgical video as input. A Region Proposal Network (RPN) then shares convolutional features with the object detection network: for each input image, the RPN generates region proposals likely to contain an object, and features are pooled over these regions before being passed to a final classification and bounding-box refinement network. The output is the spatial bounding-box positions of detected surgical tools in the video frame (a torchvision-based sketch follows this list).

  • Refer to network diagrams in research paper [4A].
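A hedged sketch of wiring up a Faster R-CNN with a VGG-16 trunk using torchvision's generic detection components (the anchor sizes and the 7-tools-plus-background class count are assumptions; this is not the paper's exact training setup):

```python
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# VGG-16 convolutional trunk as the shared feature extractor.
backbone = torchvision.models.vgg16().features
backbone.out_channels = 512            # FasterRCNN needs the feature depth

# RPN proposes candidate regions over the shared feature map.
anchor_generator = AnchorGenerator(sizes=((64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
# Features are pooled over each proposal before classification / box refinement.
roi_pooler = MultiScaleRoIAlign(featmap_names=["0"], output_size=7, sampling_ratio=2)

model = FasterRCNN(backbone,
                   num_classes=8,      # 7 surgical tools + background
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

model.eval()
frame = torch.rand(3, 480, 640)        # one laparoscopic video frame (C, H, W) in [0, 1]
predictions = model([frame])           # list of dicts with "boxes", "labels", "scores"
```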

Dataset Links

  • Download the Surgical Tool Localization in Endoscopic Videos dataset (109 GB, 24K video clips) from the Endoscopic Vision Challenge 2022

  • Download the 2016 M2CAI Tool Presence Detection Challenge, which includes the m2cai16-tool dataset

    • m2cai16-tool consists of 15 videos of cholecystectomy procedures performed at the University Hospital of Strasbourg, France, recorded at 25 fps; each video is labeled with binary annotations indicating tool presence
    • From the m2cai16-tool dataset, the first 10 videos are used for training the R-CNN model and videos 11-15 are used for testing it
    • To the best of the Stanford team's knowledge, no such dataset existed for real-world laparoscopic surgical videos. Thus, from the m2cai16-tool dataset, they created m2cai16-tool-locations with spatial annotations of the tools.
    • For m2cai16-tool-locations, they labeled 2,532 of the frames, under supervision and spot-checking from a surgeon, with the coordinates of spatial bounding boxes around the tools. They used 50%, 30%, and 20% for the training, validation, and test splits (an illustrative split sketch follows this list). The 2,532 frames were selected from among the 23,000 total frames.
      • Again, these 23,000 frames come from videos whose durations range from 20 to 75 minutes, downsampled to 1 fps for processing and labeled with binary annotations indicating the presence or absence of seven surgical tools: grasper, bipolar, hook, scissors, clip applier, irrigator, and specimen bag.
  • Download JIGSAWS: The JHU-ISI Gesture and Skill Assessment Working Set. This dataset is used with deep learning models for evaluating and identifying surgical actions and measuring performance, as noted in research paper [6A].

    • More info on the dataset can be found in research paper [7A].
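As an illustration of the 50%/30%/20% split of the 2,532 annotated frames mentioned above, a minimal sketch (a seeded random shuffle; the published m2cai16-tool-locations split may be defined differently):

```python
import random

def split_frames(frame_ids, seed=0):
    """Split frame identifiers 50/30/20 into train/val/test."""
    ids = list(frame_ids)
    random.Random(seed).shuffle(ids)
    n_train, n_val = int(0.5 * len(ids)), int(0.3 * len(ids))
    return {"train": ids[:n_train],
            "val": ids[n_train:n_train + n_val],
            "test": ids[n_train + n_val:]}

splits = split_frames(range(2532))
print({k: len(v) for k, v in splits.items()})   # {'train': 1266, 'val': 759, 'test': 507}
```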

Research Publication Links

AI/DL Perspective

UI/UX Perspective

  • [1B] Gasques, D., Johnson, J.G., Sharkey, T., Feng, Y., Wang, R., Xu, Z.R., Zavala, E., Zhang, Y., Xie, W., Zhang, X., Davis, K.L., Yip, M.C., & Weibel, N. (2021). ARTEMIS: A Collaborative Mixed-Reality System for Immersive Surgical Telementoring. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems.