A fine-grained approach for visual interpretability of convolutional neural networks
Published in: Applied Soft Computing, 2025-02, Vol. 170, p. 112635, Article 112635
Main Authors: , , , , , ,
Format: Article
Language: English
Summary: In this paper, we propose the Multilayer network-based Visual Interpreter (MuVI), a framework for the visual interpretability of Convolutional Neural Networks (CNNs) based on mapping them into multilayer networks. The distinctive feature of MuVI is that it constructs a pixel-level heatmap of the salient parts of an image processed by a CNN, where the importance of each pixel depends on all layers of the CNN rather than only on the final ones, as in existing approaches in the literature. MuVI first maps the CNN into a multilayer network. It then uses this representation to identify the parts of the CNN that most influence the prediction by extracting the paths in the multilayer network whose nodes correspond to the most active areas of the feature maps. The weight of a path is the sum of the weights of the arcs corresponding to the activations across all feature maps of the CNN; this allows MuVI to take all layers of the CNN into account, not just the last ones. Finally, MuVI builds the visual interpretability heatmap by selecting the paths with the highest weights. Experimental tests show that MuVI achieves very satisfactory results in terms of AUC insertion (0.25), AUC deletion (0.11), % Increase in Confidence (12.32), Average Drop % (51.22), Pointing Game accuracy (0.28), and computation time (26.226 s). Taken together, these results are better than those obtained by classical approaches such as SmoothGrad, Grad-CAM, Grad-CAM++, and RISE, and comparable to state-of-the-art approaches such as Score-CAM and HSIC.
Highlights:
• A framework for visual interpretability of CNNs through a multilayer network.
• Approaches to compute heatmaps from multilayer networks.
• Visual interpretability for VGG16 and ImageNet.
• Computation of heatmaps with pixel-level rather than area-level precision.
• Competitive performance of the proposed framework compared to existing ones.
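The summary only sketches the algorithm, so the following is a minimal, hedged illustration of the core idea it describes: accumulating activation weight across all layers, not just the final ones, and keeping only the highest-weight "paths" to form a pixel-level heatmap. This is not the authors' implementation; the function names (path_weight_heatmap, resize_nearest), the use of channel-averaged 2D layer maps, the simple per-pixel accumulation in place of explicit multilayer-network path extraction, and the top_fraction thresholding are all assumptions introduced purely for illustration.

```python
# Illustrative sketch only -- not the MuVI implementation described in the paper.
# Assumes the CNN's feature maps are already available as a list of 2D
# channel-averaged activation arrays (one per layer), and approximates the
# "sum of activation weights across ALL layers, keep the strongest paths" idea.
import numpy as np


def resize_nearest(a, shape):
    """Nearest-neighbour resize of a 2D array to `shape` (no SciPy needed)."""
    rows = (np.arange(shape[0]) * a.shape[0] // shape[0]).clip(0, a.shape[0] - 1)
    cols = (np.arange(shape[1]) * a.shape[1] // shape[1]).clip(0, a.shape[1] - 1)
    return a[np.ix_(rows, cols)]


def path_weight_heatmap(layer_maps, image_shape, top_fraction=0.1):
    """Accumulate activations of every layer into a pixel-level heatmap.

    layer_maps   : list of 2D arrays, one channel-averaged map per layer.
    image_shape  : (H, W) of the input image.
    top_fraction : keep only the pixels whose accumulated weight falls in the
                   top fraction (a stand-in for selecting the heaviest paths).
    """
    heatmap = np.zeros(image_shape, dtype=np.float64)
    for fmap in layer_maps:
        # Upsample each layer's activation map to pixel resolution and add it,
        # so every layer contributes to the heatmap, not only the last one.
        heatmap += resize_nearest(fmap, image_shape)
    # Keep only the strongest accumulated weights (the "highest-weight paths").
    threshold = np.quantile(heatmap, 1.0 - top_fraction)
    return np.where(heatmap >= threshold, heatmap, 0.0)


if __name__ == "__main__":
    # Toy feature maps standing in for channel-averaged CNN activations
    # at typical VGG16-like spatial resolutions.
    rng = np.random.default_rng(0)
    maps = [rng.random((s, s)) for s in (56, 28, 14, 7)]
    hm = path_weight_heatmap(maps, image_shape=(224, 224))
    print(hm.shape, float(hm.max()))
```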
ISSN: 1568-4946
DOI: 10.1016/j.asoc.2024.112635