VisionCervix: Papanicolaou cervical smears classification using novel CNN-Vision ensemble approach
Published in: | Biomedical signal processing and control 2023-01, Vol.79, p.104156, Article 104156 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | • Manual diagnosis of cervical cells is challenging. • A Vision Transformer-based method is proposed for Pap smear cervical cell image classification. • The proposed method is compared with an alternative LSTM-based method. • The proposed method achieves state-of-the-art classification accuracy for this classification task.
About half a million women worldwide are affected by cervical cancer, and it causes about 0.3 million deaths per year. Cytologists screen for it by examining Pap smear images of cervical cells. This manual screening is also prone to error. Therefore, automated computer-aided detection systems have been proposed for the classification of cervical cancer cell images. This work proposes an ensemble of a Vision Transformer (ViT) network and a convolutional neural network (CNN) for the classification of cervical cell Pap smear images. ViT is known for its minimal inductive bias and for classification performance competitive with state-of-the-art convolutional neural networks. Fine-tuning a large ViT network is computationally intensive; therefore, as an alternative to the ViT-CNN approach, a second, transfer learning-based approach is also proposed, in which features extracted from pre-trained CNNs are combined and classified with a resource-efficient Long Short-Term Memory (LSTM) network. The two approaches are compared on classification performance, test time, generalization ability and attention maps. Experimental results show that the ViT-CNN ensemble approach achieved 97.65% classification accuracy, whereas the LSTM-based approach achieved 95.80%. The ViT-CNN ensemble achieves better classification accuracy at a high computational cost, in particular a large amount of graphics processing unit (GPU) random access memory (RAM), whereas the CNN-LSTM approach is less accurate but computationally cheaper. |
---|---|
ISSN: | 1746-8094 1746-8108 |
DOI: | 10.1016/j.bspc.2022.104156 |
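
The summary above describes combining a ViT and a CNN into an ensemble but does not say how their outputs are merged. As a rough illustration only, the sketch below uses soft voting (averaging per-class probabilities), which is one common ensembling rule and is an assumption here, not necessarily the rule used in the article. The function names `softmax` and `soft_vote`, the two-class setup, and the logit values are all hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def soft_vote(logit_sets, weights=None):
    """Average class probabilities from several models (soft voting).

    logit_sets: list of arrays, each of shape (n_images, n_classes).
    weights:    optional per-model weights; defaults to a plain average.
    Returns the averaged probabilities and the predicted class per image.
    """
    probs = np.stack([softmax(np.asarray(l, dtype=float)) for l in logit_sets])
    if weights is None:
        weights = np.ones(len(logit_sets))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    avg = np.tensordot(weights, probs, axes=1)  # shape: (n_images, n_classes)
    return avg, avg.argmax(axis=1)


# Hypothetical logits for 3 images and 2 classes (made-up numbers, not
# results from the article): one array per model.
vit_logits = np.array([[2.0, 0.0], [0.2, 0.1], [-1.0, 1.0]])
cnn_logits = np.array([[1.0, 0.5], [-0.5, 1.5], [0.0, 0.5]])

avg_probs, predictions = soft_vote([vit_logits, cnn_logits])
print(predictions)
```

For the second image the two hypothetical models disagree, and the more confident one decides the averaged prediction. That behaviour is the usual motivation for soft voting over majority voting when only two models are combined.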