
Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy


Bibliographic Details
Published in: Applied Soft Computing, 2016-01, Vol. 38, p. 714-726
Main Authors: Krawczyk, Bartosz, Galar, Mikel, Jeleń, Łukasz, Herrera, Francisco
Format: Article
Language: English
Description
Summary:
• Automatic clinical decision support system for breast cancer malignancy grading.
• Different methodologies for segmentation and feature extraction from FNA slides.
• An efficient classifier ensemble for imbalanced problems with difficult data.
• Ensemble combines boosting with evolutionary undersampling.
• Extensive computational experiments on a large database collected by the authors.

In this paper, we propose a complete, fully automatic and efficient clinical decision support system for breast cancer malignancy grading. Estimating the level of cancer malignancy is important for assessing how far the disease has progressed and for designing a personalized therapy. Our system uses both image processing and machine learning techniques to analyze biopsy slides. Three image segmentation methods (fuzzy c-means color segmentation, the level set active contours technique and the grey-level quantization method) are considered for extracting the features used by the proposed classification system. In this classification problem, the highest malignancy grade is the most important one to detect early, even though it occurs in the fewest cases; malignancy grading is therefore an imbalanced classification problem. To overcome this difficulty, we propose the use of an efficient ensemble classifier named EUSBoost, which combines a boosting scheme with evolutionary undersampling to produce a balanced training set for each base classifier in the final ensemble. The evolutionary approach allows us to select the most significant samples for the classifier learning step (in terms of accuracy and a new diversity term included in the fitness function), thus alleviating the problems caused by the imbalanced scenario in a guided and effective way.
Experiments carried out on a large dataset collected by the authors confirm the high efficiency of the proposed system, show that the level set active contours technique yields features with the highest discriminative power, and prove that EUSBoost is able to outperform state-of-the-art ensemble classifiers in a real-life imbalanced medical problem.
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2015.08.060
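
The summary describes EUSBoost only at a high level: in each boosting round, an evolutionary search picks a balanced subset of majority-class samples, scored on accuracy plus a diversity term. The following is a minimal sketch of that idea in pure Python, not the authors' implementation. The summary does not specify the base classifier, the fitness formula, the evolutionary operators or the boosting variant, so everything here is an assumption made for illustration: decision stumps, a fitness of geometric mean plus an overlap-based diversity bonus, a mutation-only search, an AdaBoost-style weight update, and all function and parameter names (`eusboost`, `evolve_subset`, `beta`, and so on).

```python
import math
import random

def train_stump(X, y, w):
    """Fit a weighted decision stump; returns (feature, threshold, sign)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for s in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (s if row[f] > t else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, s)
    return best[1:]

def stump_predict(stump, row):
    f, t, s = stump
    return s if row[f] > t else -s

def gmean(stump, X, y):
    """Geometric mean of the true positive and true negative rates."""
    preds = [stump_predict(stump, r) for r in X]
    pos = sum(1 for yi in y if yi == 1)
    tp = sum(1 for p, yi in zip(preds, y) if p == yi == 1)
    tn = sum(1 for p, yi in zip(preds, y) if p == yi == -1)
    return math.sqrt((tp / pos) * (tn / (len(y) - pos)))

def fit_on(X, y, w, idx):
    """Train a stump on the samples listed in idx only."""
    return train_stump([X[i] for i in idx], [y[i] for i in idx], [w[i] for i in idx])

def evolve_subset(X, y, w, mino, maj, previous, rng, gens=10, pop=8, beta=0.2):
    """Search for a majority subset with the same size as the minority class."""
    def fitness(sub):
        stump = fit_on(X, y, w, mino + sorted(sub))
        # Assumed diversity term: share of samples not in the closest earlier subset.
        div = min((1 - len(sub & p) / len(sub) for p in previous), default=0.0)
        return gmean(stump, X, y) + beta * div

    population = [frozenset(rng.sample(maj, len(mino))) for _ in range(pop)]
    for _ in range(gens):
        parents = sorted(population, key=fitness, reverse=True)[:pop // 2]
        children = []
        for parent in parents:  # mutation: swap one selected sample for another
            child = set(parent)
            child.remove(rng.choice(sorted(child)))
            child.add(rng.choice([i for i in maj if i not in child]))
            children.append(frozenset(child))
        population = parents + children
    return max(population, key=fitness)

def eusboost(X, y, rounds=5, seed=0):
    """Labels must be +1 (minority class) or -1 (majority class)."""
    rng = random.Random(seed)
    n = len(X)
    mino = [i for i in range(n) if y[i] == 1]
    maj = [i for i in range(n) if y[i] == -1]
    w = [1.0 / n] * n
    ensemble, previous = [], []
    for _ in range(rounds):
        sub = evolve_subset(X, y, w, mino, maj, previous, rng)
        previous.append(sub)
        stump = fit_on(X, y, w, mino + sorted(sub))
        preds = [stump_predict(stump, r) for r in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if err >= 0.5:  # no better than chance: discard, restart from uniform weights
            w = [1.0 / n] * n
            continue
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, row):
    """Weighted vote of the stumps; returns +1 or -1."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else -1
```

Calling `eusboost(X, y)` with feature rows `X` and labels `y` in {+1, -1} returns a list of `(alpha, stump)` pairs that `predict` combines by weighted vote. Each stump is trained on all minority samples plus the evolved majority subset, while the boosting error and the fitness are both measured on the full training set, which is one plausible reading of "balanced training sets for each one of the base classifiers".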