
Classifying cancer pathology reports with hierarchical self-attention networks

Bibliographic Details
Published in: Artificial Intelligence in Medicine, 2019-11, Vol. 101, Article 101726
Main Authors: Gao, Shang, Qiu, John X., Alawad, Mohammed, Hinkle, Jacob D., Schaefferkoetter, Noah, Yoon, Hong-Jun, Christian, Blair, Fearn, Paul A., Penberthy, Lynne, Wu, Xiao-Cheng, Coyle, Linda, Tourassi, Georgia, Ramanathan, Arvind
Format: Article
Language: English
Description
Summary:
• HiSANs are a neural architecture designed for classifying cancer pathology reports.
• HiSANs achieve better accuracy and macro F-score than existing classifiers.
• HiSANs are an order of magnitude faster than the previous state-of-the-art, HANs.
• HiSANs allow easy visualization of their decision-making process.

We introduce a deep learning architecture, hierarchical self-attention networks (HiSANs), designed for classifying pathology reports, and show how its unique architecture leads to a new state-of-the-art in accuracy, faster training, and clear interpretability. We evaluate performance on a corpus of 374,899 pathology reports obtained from the National Cancer Institute's (NCI) Surveillance, Epidemiology, and End Results (SEER) program. Each pathology report is associated with five clinical classification tasks: site, laterality, behavior, histology, and grade. We compare the performance of the HiSAN against other machine learning and deep learning approaches commonly used on medical text data: Naive Bayes, logistic regression, convolutional neural networks, and hierarchical attention networks (HANs, the previous state-of-the-art). We show that HiSANs are superior to these other text classifiers in both accuracy and macro F-score across all five classification tasks. Compared to HANs, HiSANs are not only an order of magnitude faster to train but also achieve about 1% better relative accuracy and 5% better relative macro F-score.
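The abstract reports improvements in macro F-score, the metric that averages per-class F1 so that rare classes (e.g. uncommon cancer sites) count as much as common ones. As a minimal sketch of how that metric is computed (a generic pure-Python illustration, not code from the paper; the label names are hypothetical):

```python
def macro_f1(y_true, y_pred):
    """Macro F-score: compute F1 for each class separately, then
    average unweighted, so rare classes count equally with common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        # Per-class counts of true positives, false positives, false negatives.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical labels for a "site" task: one rare class, one common class.
truth = ["lung", "lung", "lung", "breast"]
preds = ["lung", "lung", "breast", "breast"]
print(macro_f1(truth, preds))
```

On a class-imbalanced corpus such as SEER pathology reports, accuracy can look high while rare classes are mostly misclassified; macro F-score exposes that gap, which is why the paper reports both.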
ISSN: 0933-3657; 1873-2860
DOI: 10.1016/j.artmed.2019.101726