Applying feature-similarity-metrics for long-tailed problem of phytoplankton microscopic images classification

Bibliographic Details
Published in:	Algal research (Amsterdam) 2024-08, Vol.82, p.103673, Article 103673
Main Authors:	Liang, Tianhong, Yin, Gaofang, Zhao, Nanjing, Jia, Renqing, Zhang, Xiaoling, Xu, Min, Zhang, Zihao, Dong, Ming, Hu, Xiang, Huang, Peng
Format:	Article
Language:	English
Subjects:	Deep learning; Feature-center-constraint; Feature-similarity-metrics; Long-tailed distribution; Microscopic image; Phytoplankton

Description
Summary:	The distribution of phytoplankton in natural water bodies is important for maintaining healthy aquatic environments. Deep learning has become a popular approach to automating phytoplankton identification. However, its performance suffers from the uneven distribution of species in the natural world, which biases classification models towards the more advantaged species. To address this long-tailed problem, the paper proposes a novel deep learning idea termed feature-similarity-metrics. It is based on the principle of decreasing intra-class differences and increasing inter-class differences. A method called feature-center-constraint is introduced to implement the idea. First, a center is preset for each category, and these centers are distributed evenly and sparsely within the feature space using a uniform-distribution initialization method. During model training, the Euclidean distance measures the similarity between each sample's features and the corresponding class center. This approach enhances the model's ability to predict disadvantaged phytoplankton species without compromising the classification accuracy of advantaged ones. The method was evaluated on a phytoplankton dataset of 26,564 images from 68 genera collected from Chaohu Lake, where it outperforms existing techniques. Specifically, it achieves a 5.69 % increase in micro-average F1 score and an 11.85 % rise in macro-average F1 score over the general method. The F1 score is widely recognized as a reliable metric of classification performance. These improvements increase the accuracy of automated phytoplankton identification and also support more effective monitoring of biodiversity and ecological health in aquatic systems.
This makes the research highly beneficial for practical applications in environmental science, offering rapid and accurate insights into phytoplankton community structure for ecological assessments.
•Proposed a novel idea, feature-similarity-metrics, which decreases intra-class differences and increases inter-class differences.
•Introduced an implementation called feature-center-constraint, which outperforms the SOTA methods from macro-domains.
•The feature-center-constraint method is a plugin-like algorithm that can easily be incorporated into other methods.
•Collected 26,564 images covering 68 genera and built a phytoplankton microscopic image dataset with a long-tailed distribution.
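The feature-center-constraint step described in the summary can be sketched as follows. This is a minimal, hypothetical illustration in plain Python and is not the authors' implementation. Several details are assumptions, because the summary states only two things: that one center per class is initialized from a uniform distribution, and that the Euclidean distance between a sample's features and its class center is used during training. The assumed details are the function names, the sampling bound `bound`, keeping the centers fixed rather than learned, and the weighted sum in `total_loss` with its `weight` parameter. Independent uniform sampling is also only one simple reading of "evenly and sparsely" distributed centers.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def init_centers(num_classes: int, dim: int, bound: float = 1.0,
                 seed: int = 0) -> List[Vector]:
    """Preset one center per class, each coordinate drawn from U(-bound, bound).

    Hypothetical: `bound` and `seed` are illustrative choices; the summary
    only says the centers come from a uniform-distribution initialization.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-bound, bound) for _ in range(dim)]
            for _ in range(num_classes)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def center_constraint_loss(features: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           centers: Sequence[Sequence[float]]) -> float:
    """Mean Euclidean distance from each sample's features to its class center.

    Minimizing this pulls same-class features together (smaller intra-class
    differences). Because the preset centers are spread apart, features of
    different classes are pulled toward distant points (larger inter-class
    differences).
    """
    if not features or len(features) != len(labels):
        raise ValueError("need a non-empty batch with one label per feature")
    total = sum(euclidean(f, centers[y]) for f, y in zip(features, labels))
    return total / len(features)


def total_loss(ce_loss: float, center_loss: float, weight: float = 0.1) -> float:
    """Hypothetical combination: classification loss plus a weighted center term."""
    return ce_loss + weight * center_loss


if __name__ == "__main__":
    centers = [[0.0, 0.0], [3.0, 4.0]]
    feats = [[0.0, 0.0], [3.0, 0.0]]
    labels = [0, 1]
    # Distances are 0.0 and 4.0, so the mean is 2.0.
    print(center_constraint_loss(feats, labels, centers))  # prints 2.0
```

In a real training loop, the features would come from the network's embedding layer, and the center term would be minimized jointly with the classification loss. This record does not say how the authors weight or combine the two terms.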
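The summary reports gains in both micro-average and macro-average F1. The sketch below uses plain Python and hypothetical toy labels, not data from the paper, to show how these two standard averages are computed. It also shows why macro-F1 responds more strongly to rare (tail) classes. Micro-F1 pools true-positive, false-positive, and false-negative counts over all samples, so in single-label classification it equals accuracy. Macro-F1 averages the per-class F1 scores, so every genus counts equally regardless of how many images it has.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: Sequence[Hashable],
                   y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    # Micro: pool counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


if __name__ == "__main__":
    # Toy long-tailed case: a model that always predicts the head class.
    y_true = ["head"] * 8 + ["tail"] * 2
    y_pred = ["head"] * 10
    micro, macro = micro_macro_f1(y_true, y_pred)
    print(round(micro, 3), round(macro, 3))  # prints: 0.8 0.444
```

Ignoring the tail class costs little in micro-F1 (0.8) but halves the tail's contribution to macro-F1 (0.444). This is why a larger macro-F1 gain is the expected signature of better tail-class recognition.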
ISSN:	2211-9264
DOI:	10.1016/j.algal.2024.103673