Loading…
Learning image by-parts using early and late fusion of auto-encoder features
A novel sub-part learning scheme is introduced in our work for the purpose of recognizing handwritten numeral images. The idea is borrowed from the concept of visual perception and part-wise integration of visual information by the cortical regions of the brain. In this context, each numeral image i...
Saved in:
Published in: | Multimedia tools and applications 2021-08, Vol.80 (19), p.29601-29615 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | A novel sub-part learning scheme is introduced in our work for the purpose of recognizing handwritten numeral images. The idea is borrowed from the concept of visual perception and part-wise integration of visual information by the cortical regions of the brain. In this context, each numeral image is divided into four half-parts: top-half, bottom-half, left-half and right-half; the other half of the image being kept masked. An efficient data representation is derived in an unsupervised manner, from each image part, using convolutional auto-encoders (CAE), for our learning scheme that involves both early and late fusion of features. The chief advantage of the features derived from convolutional auto-encoders is the preservation of 2D spatial locality while the features are being filtered layer-by-layer through the convolutional architecture. The features derived from each individual CAE are fused by concatenation in our early fusion scheme, and learnt using an appropriate classifier. The late fusion strategy involves learning the probability density pertaining to the predicted values emanating from the four base classifiers using a meta-learner classifier. The early-cum-late fusion is proposed in the later stage of our work to combine the goodness of both schemes and enhance the performance. The support vector machine is used in all the classification stages. Experiments on the benchmark MNIST dataset of handwritten English numerals prove that our method competes favorably to the state of the art, as inferred from the high classification scores achieved. Our method thus provides a computationally simple and effective methodology for sub-part learning and part-wise integration of information from different parts of the image. The method also contributes to saving in computational expense since, at a time, only a small part of the image is processed, speeding up the inferencing process. |
---|---|
ISSN: | 1380-7501 1573-7721 |
DOI: | 10.1007/s11042-021-11092-8 |