Surgical gesture classification from video and kinematic data

Bibliographic Details
Published in: Medical Image Analysis, 2013-10, Vol. 17 (7), pp. 732–745
Main Authors: Zappella, Luca; Béjar, Benjamín; Hager, Gregory; Vidal, René
Format: Article
Language: English
Description
Summary:

Highlights:
• Surgical gesture recognition from video and kinematic data.
• Explanation of linear dynamical systems (LDSs) and their metrics.
• Extensive explanation of the bag-of-features (BoF) framework and its variations.
• Combination of LDS and BoF via multiple kernel learning (MKL).
• Combination of heterogeneous data (video and kinematic).
• Recognition rates that outperform state-of-the-art rates based on kinematic data.

Much of the existing work on automatic classification of gestures and skill in robotic surgery is based on dynamic cues (e.g., time to completion, speed, forces, torque) or kinematic data (e.g., robot trajectories and velocities). While videos could be equally or more discriminative (e.g., videos contain semantic information not present in kinematic data), they are typically not used because of the difficulties associated with automatic video interpretation. In this paper, we propose several methods for automatic surgical gesture classification from video data. We assume that the video of a surgical task (e.g., suturing) has been segmented into video clips corresponding to a single gesture (e.g., grabbing the needle, passing the needle) and propose three methods to classify the gesture of each video clip. In the first, we model each video clip as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new video clips. In the second, we use spatio-temporal features extracted from each video clip to learn a dictionary of spatio-temporal words, and use a bag-of-features (BoF) approach to classify new video clips. In the third, we use multiple kernel learning (MKL) to combine the LDS and BoF approaches. Since the LDS approach is also applicable to kinematic data, we also use MKL to combine both types of data in order to exploit their complementarity. Our experiments on a typical surgical training setup show that methods based on video data perform as well as, if not better than, state-of-the-art approaches based on kinematic data. In turn, the combination of kinematic and video data outperforms any algorithm based on one type of data alone.
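
To make the abstract's pipeline concrete, the following is a minimal Python sketch (NumPy and scikit-learn) of the three ingredients it names: PCA-based identification of an LDS per clip, a finite-horizon approximation of the Martin distance between LDSs, BoF histograms over a k-means dictionary, and a fixed-weight sum of the two kernels standing in for MKL. Everything here (synthetic clips, random stand-in descriptors, dimensions, the 0.5/0.5 kernel weights) is an illustrative assumption, not the paper's implementation.

```python
# Minimal sketch of the abstract's pipeline on synthetic data:
# (1) fit an LDS to each clip, (2) build BoF histograms, (3) combine the
# two kernels and classify with an SVM. Real inputs would be video frames
# and spatio-temporal interest-point descriptors, and MKL would *learn*
# the kernel weights; here they are fixed for illustration.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# ---- (1) LDS modeling: fit (A, C) to a clip via PCA-based identification --
def fit_lds(Y, n=5):
    """Y is pixels x frames; returns dynamics A (n x n) and observation C."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                              # observation matrix
    X = np.diag(s[:n]) @ Vt[:n, :]            # estimated state trajectory
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])  # least-squares dynamics
    return A, C

def martin_distance(m1, m2, horizon=10):
    """Finite-horizon approximation of the Martin distance: principal
    angles between extended observability subspaces [C; CA; CA^2; ...]."""
    def obs_basis(A, C):
        blocks, M = [], C
        for _ in range(horizon):
            blocks.append(M)
            M = M @ A
        return np.linalg.qr(np.vstack(blocks))[0]
    cos = np.linalg.svd(obs_basis(*m1).T @ obs_basis(*m2), compute_uv=False)
    return -2.0 * np.log(np.clip(cos, 1e-8, 1.0)).sum()

# ---- (2) Bag of features: dictionary of "words" + histogram per clip ------
def bof_histograms(descriptor_sets, k=8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(np.vstack(descriptor_sets))
    H = [np.bincount(km.predict(D), minlength=k) for D in descriptor_sets]
    H = np.array(H, dtype=float)
    return H / H.sum(axis=1, keepdims=True)   # L1-normalized histograms

# ---- synthetic "clips": two gesture classes, 20 clips each ----------------
p, T = 40, 30
labels = np.repeat([0, 1], 20)
As = [0.95 * np.linalg.qr(rng.standard_normal((5, 5)))[0] for _ in range(2)]
Cs = [rng.standard_normal((p, 5)) for _ in range(2)]

def make_clip(A, C):
    x, frames = rng.standard_normal(5), []
    for _ in range(T):
        x = A @ x + 0.05 * rng.standard_normal(5)
        frames.append(C @ x + 0.05 * rng.standard_normal(p))
    return np.array(frames).T                 # pixels x frames

clips = [make_clip(As[c], Cs[c]) for c in labels]
descs = [rng.standard_normal((60, 10)) + 1.5 * c for c in labels]  # stand-ins

# ---- build the two kernels ------------------------------------------------
models = [fit_lds(Y) for Y in clips]
N = len(clips)
D = np.array([[martin_distance(models[i], models[j]) for j in range(N)]
              for i in range(N)])
K_lds = np.exp(-D / D.mean())   # distance -> similarity (not guaranteed PSD;
                                # acceptable for a sketch)
K_bof = chi2_kernel(bof_histograms(descs))

# ---- (3) fixed-weight kernel combination + SVM ("MKL-lite") ----------------
K = 0.5 * K_lds + 0.5 * K_bof   # actual MKL would learn these weights
train = np.arange(N) % 2 == 0   # even clips train, odd clips test
clf = SVC(kernel="precomputed").fit(K[np.ix_(train, train)], labels[train])
print("combined-kernel accuracy:",
      clf.score(K[np.ix_(~train, train)], labels[~train]))
```

A true MKL step would learn the kernel weights jointly with the classifier, and, as the abstract notes, the same LDS machinery applies to kinematic signals, so a kinematic kernel could be added to the weighted sum in the same way.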
ISSN: 1361-8415, 1361-8423
DOI: 10.1016/j.media.2013.04.007