Relevance Detection in Cataract Surgery Videos

Our paper has been accepted at ICPR 2020.

Title: Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization

Authors: Negin Ghamsarian, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Klaus Schoeffmann

Abstract: In cataract surgery, the operation is performed with the help of a microscope. Since the microscope allows at most two people to watch the surgery in real time, a major part of surgical training is conducted using recorded videos. To optimize the training procedure with the video content, surgeons require an automatic relevance detection approach. In addition to relevance-based retrieval, these results can be further used for skill assessment and irregularity detection in cataract surgery videos. In this paper, a three-module framework is proposed to detect and classify the relevant phase segments in cataract videos. Using an idle frame recognition network, the framework divides the video into idle and action segments. To boost the performance in relevance detection, Mask R-CNN is utilized to detect in each frame the cornea, where the relevant surgical actions are conducted. The spatio-temporally localized segments, containing higher-resolution information about the pupil texture and actions, together with complementary temporal information from the same phase, are fed into the relevance detection module. This module consists of four parallel recurrent CNNs, each responsible for detecting one of four relevant phases that have been defined with medical experts. Their results are then integrated to classify the action phases as irrelevant or as one of the four relevant phases. Experimental results reveal that the proposed approach outperforms static CNNs and different configurations of feature-based and end-to-end recurrent networks.

Relevance-based Compression of Cataract Surgery Videos

Our recent work on relevance-based compression of cataract surgery videos has been accepted as a full paper at ACM Multimedia 2020.

Title: Relevance-based Compression of Cataract Surgery Videos Using Convolutional Neural Networks

Authors: Negin Ghamsarian, Hadi Amirpour, Christian Timmerer, Mario Taschwer, and Klaus Schöffmann

Abstract: Recorded cataract surgery videos play a prominent role in training and investigating the surgery, and enhancing the surgical outcomes. Due to storage limitations in hospitals, however, the recorded cataract surgeries are deleted after a short time and this precious source of information cannot be fully utilized. Lowering the quality to reduce the required storage space is not advisable since the degraded visual quality results in the loss of relevant information that limits the usage of these videos. To address this problem, we propose a relevance-based compression technique consisting of two modules: (i) relevance detection, which uses neural networks for semantic segmentation and classification of the videos to detect relevant spatio-temporal information, and (ii) content-adaptive compression, which restricts the amount of distortion applied to the relevant content while allocating less bitrate to irrelevant content. The proposed relevance-based compression framework is implemented considering five scenarios based on the definition of relevant information from the target audience’s perspective. Experimental results demonstrate the capability of the proposed approach in relevance detection. We further show that the proposed approach can achieve high compression efficiency by abstracting substantial redundant information while retaining the high quality of the relevant content.

Keywords: Video Coding, Convolutional Neural Networks, HEVC, ROI Detection, Medical Multimedia.

Pupil Segmentation in Cataract Videos

Our workshop paper on iris and pupil segmentation in cataract surgery videos has been accepted for presentation at the ISBI 2020 conference.

Title: Pixel-Based Iris and Pupil Segmentation in Cataract Surgery Videos Using Mask R-CNN

Authors: Natalia Sokolova, Mario Taschwer, Stephanie Sarny, Doris Putzgruber-Adamitsch, Klaus Schoeffmann

Abstract: Automatically detecting clinically relevant events in surgery video recordings is becoming increasingly important for documentation, educational, and scientific purposes in the medical domain. From a medical image analysis perspective, such events need to be treated individually and associated with specific visible objects or regions. In the field of cataract surgery (lens replacement in the human eye), pupil reaction (dilation or constriction) during surgery may lead to complications and hence represents a clinically relevant event. Its detection requires automatic segmentation and measurement of pupil and iris in recorded video frames. In this work, we contribute to research on pupil and iris segmentation methods by (1) providing a dataset of 82 annotated images for training and evaluating suitable machine learning algorithms, and (2) applying the Mask R-CNN algorithm to this problem, which – in contrast to existing techniques for pupil segmentation – predicts free-form pixel-accurate segmentation masks for iris and pupil. The proposed approach achieves consistently high segmentation accuracy on several metrics while delivering acceptable prediction efficiency, establishing a promising basis for further segmentation and event detection approaches on eye surgery videos.
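The abstract notes that detecting a pupil reaction requires measuring pupil and iris once they have been segmented. As a rough illustration only, the following hypothetical Python sketch (not the method from the paper) turns two binary masks into a relative pupil size: it computes the equivalent-circle diameter of the pupil and of the outer iris boundary and returns their ratio. The function names and the idea of normalizing by the iris are our assumptions; masks are plain nested lists of 0/1 values standing in for predicted segmentation masks.

```python
import math
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask: nonzero = object pixel


def mask_area(mask: Mask) -> int:
    """Number of foreground pixels in a binary mask."""
    return sum(1 for row in mask for value in row if value)


def union_area(a: Mask, b: Mask) -> int:
    """Number of pixels that are foreground in at least one of two masks."""
    return sum(1 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b) if x or y)


def equivalent_diameter(area: int) -> float:
    """Diameter (in pixels) of a circle with the given area."""
    return 2.0 * math.sqrt(area / math.pi)


def pupil_iris_ratio(pupil_mask: Mask, iris_mask: Mask) -> float:
    """Hypothetical relative pupil size: pupil diameter / outer iris diameter.

    The union of both masks gives the region inside the outer iris boundary,
    whether the iris mask includes the pupil or is a ring around it.
    """
    outer_area = union_area(pupil_mask, iris_mask)
    if outer_area == 0:
        raise ValueError("masks are empty")
    return equivalent_diameter(mask_area(pupil_mask)) / equivalent_diameter(outer_area)


# Toy 4x4 frame: a 2x2 pupil inside a ring-shaped iris mask (hypothetical values)
pupil = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
iris_ring = [[1, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
ratio = pupil_iris_ratio(pupil, iris_ring)  # approximately 0.5 (= sqrt(4 / 16))
```

Tracking such a ratio frame by frame would make a rise (dilation) or drop (constriction) visible independently of the microscope zoom level, since both diameters scale together.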

Tool Segmentation in Cataract Videos

Our conference paper on instrument segmentation in cataract surgery videos has been accepted for presentation at the CBMS 2020 conference.

Title: Pixel-Based Tool Segmentation in Cataract Surgery Videos with Mask R-CNN

Authors: Markus Fox, Mario Taschwer, and Klaus Schoeffmann

Abstract: Automatically detecting surgical tools in recorded surgery videos is an important building block of further content-based video analysis. In ophthalmology, the results of such methods can support training and teaching of operation techniques and enable investigation of medical research questions on a dataset of recorded surgery videos. Our work applies a recent deep-learning segmentation method (Mask R-CNN) to localize and segment surgical tools used in ophthalmic cataract surgery. We add ground-truth annotations for multi-class instance segmentation to two existing datasets of cataract surgery videos and make the resulting datasets publicly available for research purposes. In the absence of comparable results in the literature, we tune and evaluate Mask R-CNN on these datasets for instrument segmentation/localization and achieve promising results (61% mean average precision at 50% intersection over union for instance segmentation, working even better for bounding box detection or binary segmentation), establishing a reasonable baseline for further research. Moreover, we experiment with common data augmentation techniques and analyze the achieved segmentation performance with respect to each class (instrument), providing evidence for future improvements of this approach.
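For readers unfamiliar with the metric quoted above: at an intersection-over-union (IoU) threshold of 50%, a predicted instance mask is typically counted as a true positive only if it has the correct class and overlaps a ground-truth mask with an IoU of at least 0.5. The Python sketch below shows this matching criterion under that common (COCO-style) convention; it is not the evaluation code used in the paper, and the helper names and instrument labels are hypothetical. Computing the full mean average precision additionally requires ranking predictions by confidence, matching each ground-truth instance at most once, and averaging precision over recall levels and classes, which is omitted here.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask: nonzero = instrument pixel


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            intersection += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return intersection / union if union else 0.0


def is_true_positive(
    pred_mask: Mask, pred_label: str,
    gt_mask: Mask, gt_label: str,
    iou_threshold: float = 0.5,
) -> bool:
    """Match criterion behind AP at 50% IoU: same class and IoU >= threshold."""
    return pred_label == gt_label and mask_iou(pred_mask, gt_mask) >= iou_threshold


pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
print(mask_iou(pred, gt))  # 2 shared pixels / 4 covered pixels = 0.5
```

With the toy masks above, the prediction sits exactly on the 0.5 threshold, so it counts as a match for the same (hypothetical) instrument class but would fail a stricter threshold such as 0.75.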

Relevance-based Exploration of Cataract Videos

Negin's doctoral symposium paper on ‘Relevance-based Exploration of Cataract Videos’ has been accepted for publication at the ACM International Conference on Multimedia Retrieval (ICMR 2020).

Title: Enabling Relevance-Based Exploration of Cataract Videos

Author: Negin Ghamsarian

Abstract: Training new surgeons, one of the major duties of experienced expert surgeons, demands considerable supervisory effort from them. To expedite the training process and subsequently reduce the extra workload on their tight schedules, surgeons are seeking a surgical video retrieval system. Automatic workflow analysis approaches can optimize the training procedure by indexing the surgical video segments to be used for online video exploration. The aim of the doctoral project described in this paper is to provide the basis for a cataract video exploration system that is able to (i) automatically analyze and extract the relevant segments of videos from cataract surgery, and (ii) provide interactive exploration means for browsing archives of cataract surgery videos. In particular, we apply deep-learning-based classification and segmentation approaches to cataract surgery videos to enable automatic phase and action recognition and similarity detection.

Deblurring Cataract Surgery Videos

Our recent work on deblurring surgery videos has been accepted for publication at the ISBI 2020 conference.

Title: Deblurring Cataract Surgery Videos Using a Multi-Scale Deconvolutional Neural Network

Authors: Negin Ghamsarian, Klaus Schoeffmann, Mario Taschwer

Abstract: A common quality impairment observed in surgery videos is blur, caused by object motion or a defocused camera. Degraded image quality hampers the progress of machine-learning-based approaches in learning and recognizing semantic information in surgical video frames like instruments, phases, and surgical actions. This problem can be mitigated by automatically deblurring video frames as a preprocessing method for any subsequent video analysis task. In this paper, we propose and evaluate a multi-scale deconvolutional neural network to deblur cataract surgery videos. Experimental results confirm the effectiveness of the proposed approach in terms of the visual quality of frames as well as PSNR improvement.
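The PSNR improvement mentioned in the abstract compares each processed frame against a sharp reference frame. Below is a minimal Python sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), assuming 8-bit grayscale frames given as nested lists (MAX = 255); the function name and the toy pixel values are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence

Frame = Sequence[Sequence[float]]  # grayscale frame as rows of pixel intensities


def psnr(reference: Frame, test: Frame, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(reference) != len(test) or any(
        len(r) != len(t) for r, t in zip(reference, test)
    ):
        raise ValueError("frames must have the same shape")
    squared_error = 0.0
    pixel_count = 0
    for ref_row, test_row in zip(reference, test):
        for ref_px, test_px in zip(ref_row, test_row):
            squared_error += (ref_px - test_px) ** 2
            pixel_count += 1
    if pixel_count == 0:
        raise ValueError("frames must not be empty")
    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 example (hypothetical pixel values)
sharp = [[100, 120], [130, 140]]
blurred = [[110, 120], [130, 130]]
deblurred = [[104, 120], [130, 136]]
improvement = psnr(sharp, deblurred) - psnr(sharp, blurred)
```

A PSNR improvement is then the difference psnr(sharp, deblurred) - psnr(sharp, blurred); a positive value means the deblurred frame is closer to the sharp reference than the blurred input was. In the toy example the MSE drops from 50 to 8, an improvement of 10 · log10(50 / 8), about 7.96 dB.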

MMM’20: Evaluating the Generalization Performance of Instrument Classification in Cataract Surgery Videos

Our paper has been accepted for publication at the MMM 2020 Conference on Multimedia Modeling.

Authors: Natalia Sokolova, Klaus Schoeffmann, Mario Taschwer, Doris Putzgruber-Adamitsch, Yosuf El-Shabrawi

Abstract: In the field of ophthalmic surgery, many clinicians nowadays record their microscopic procedures with a video camera and use the recorded footage for later purposes, such as forensics, teaching, or training. However, to use the video material efficiently after surgery, the video content needs to be analyzed automatically. Operation instruments are important semantic content to be analyzed and indexed in these short videos, since they provide an indication of the corresponding operation phase and surgical action. Related work has already shown that it is possible to accurately detect instruments in cataract surgery videos. However, its underlying dataset (from the CATARACTS challenge) has very good visual quality, which does not reflect the typical quality of videos acquired in general hospitals. In this paper, we therefore analyze the generalization performance of deep learning models for instrument recognition in terms of dataset change. More precisely, we trained models such as ResNet-50, Inception v3, and NASNet Mobile on a dataset of high visual quality (CATARACTS) and tested them on another dataset with low visual quality (Cataract-101), and vice versa. Our results show that generalizability is rather low in general, but clearly worse for the models trained on the high-quality dataset. Another important observation is that the trained models are able to detect similar instruments in the other dataset even if their appearance is different.



Second PhD student joined the project

In February 2019, a second PhD student (Negin Ghamsarian) joined the OVID project. She will work on surgical workflow analysis and video preprocessing.

Since a third PhD student withdrew her application at a late stage of the hiring process, we decided to conduct the project with only two PhD students for now.

OVID project started on October 1, 2018

The OVID project officially started on October 1, 2018. The research project is supported by the Austrian Science Fund FWF and is scheduled to run for three years.

The FWF grant supports employment of three PhD students (for 3 years each) and one student assistant (for 15 months). Currently, one PhD student (Natalia Sokolova) and the student assistant (Markus Fox) have taken up their work. The other two PhD students will follow within the next few months.