VisionCervix: Papanicolaou cervical smears classification using novel CNN-Vision ensemble approach

Bibliographic Details
Published in: Biomedical signal processing and control, 2023-01, Vol. 79, p. 104156, Article 104156
Main Authors: Maurya, Ritesh; Nath Pandey, Nageshwar; Kishore Dutta, Malay
Format: Article
Language: English
Description
Summary:
• Manual diagnosis of cervical cells is challenging.
• A Vision Transformer-based method is proposed for Pap smear cervical cell image classification.
• The proposed method is compared with an LSTM-based method.
• The proposed method achieved state-of-the-art classification accuracy for this classification task.

About half a million women worldwide are affected by cervical cancer, and about 0.3 million deaths per year are due to it. Cytologists perform Pap smear tests by screening images of cervical cells. This manual screening is also prone to error. Therefore, automated computer-aided detection systems have been proposed for the classification of cervical cancer cell images. In this work, an ensemble of a Vision Transformer (ViT) network and a convolutional neural network (CNN) is proposed for the classification of cervical cell Pap smear images. ViT is known for its minimal inductive bias and for classification performance competitive with state-of-the-art convolutional neural networks. Fine-tuning a large ViT network is computationally intensive; therefore, as an alternative to the ViT-CNN approach, a second, transfer learning-based approach is also proposed, in which features extracted from pre-trained CNNs are combined and classified with a resource-efficient Long Short-Term Memory (LSTM) network. The two approaches are compared on classification performance, test time, generalization ability and attention maps. Experimental results show that the ViT-CNN ensemble approach achieved 97.65% classification accuracy, whereas the LSTM-based approach achieved 95.80%.
The ViT-CNN ensemble approach achieves better classification accuracy at the cost of a much higher computational demand, in particular a large amount of random access memory (RAM) on the graphics processing unit (GPU), whereas the CNN-LSTM approach is less accurate but computationally cheaper.
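
The summary does not state how the ViT and CNN outputs are fused in the ensemble. As an illustration only, the sketch below assumes soft voting, that is, a weighted average of each model's class-probability vector followed by an argmax. The function names `soft_vote` and `predict`, the three-class setup and the probability values are hypothetical and are not taken from the article.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the weighted average of per-model class-probability vectors.

    Assumption: soft voting is used here for illustration; the article's
    actual fusion rule is not given in the summary.
    """
    if not model_probs:
        raise ValueError("need at least one model output")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same classes")
    if weights is None:
        weights = [1.0] * len(model_probs)  # equal weighting by default
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Average class by class across the models.
    return [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def predict(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = soft_vote(model_probs, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs for one image over three made-up classes.
vit_probs = [0.20, 0.70, 0.10]
cnn_probs = [0.50, 0.30, 0.20]

fused = soft_vote([vit_probs, cnn_probs])
label = predict([vit_probs, cnn_probs])
```

With equal weights the two models disagree on their top class and the fused vector decides between them; passing `weights` lets one model count for more, which is one way an ensemble can trade off a stronger but costlier model against a cheaper one.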
ISSN: 1746-8094, 1746-8108
DOI: 10.1016/j.bspc.2022.104156