Mel-frequency cepstral coefficients outperform embeddings from pre-trained convolutional neural networks under noisy conditions for discrimination tasks of individual gibbons

Bibliographic Details
Published in:	Ecological informatics 2024-05, Vol.80, p.102457, Article 102457
Main Authors:	Lakdari, Mohamed Walid; Ahmad, Abdul Hamid; Sethi, Sarab; Bohn, Gabriel A.; Clink, Dena J.
Format:	Article
Language:	English
Subjects:	Acoustic indices; acoustics; automation; Convolutional neural networks; females; humans; Hylobates; Mel-frequency cepstral coefficients; signal-to-noise ratio; Sound feature extraction; trees; Vocal individuality

Description
Summary:	Passive acoustic monitoring – an approach that utilizes autonomous acoustic recording units – allows for non-invasive monitoring of individuals, assuming that it is possible to acoustically distinguish individuals. However, identifying effective analytical approaches for individual identification remains a challenge. Our study investigates how the use of different feature representations impacts our ability to distinguish between individual female Northern grey gibbons (Hylobates funereus). We broadcast pre-recorded calls from twelve gibbon females and re-recorded the calls at varying distances (directly under the tree to ∼400 m away) using autonomous recording units. We evaluated the effectiveness of using different automated feature extraction approaches to classify gibbon calls: Mel-frequency cepstral coefficients (MFCCs), embeddings from three pre-trained neural networks (BirdNET, VGGish, and Wav2Vec2), and four commonly used acoustic indices. We used a supervised classification approach (random forest) to classify calls to the respective female and compared two unsupervised clustering approaches (affinity propagation clustering and hierarchical density-based spatial clustering) to evaluate which features were most effective for distinguishing female calls without using class labels. We used MFCCs as a baseline as previous work has shown they can be used to distinguish high-quality calls of individual gibbon females. Human annotators could only identify calls in spectrograms from recordings 10 dB), while the remaining features did not perform well. Contrary to our expectations, we found that MFCCs outperformed all other features for the unsupervised clustering tasks at closer distances and none of the features performed well at farther distances. The ability to acoustically discriminate animals under noisy conditions and from low signal-to-noise ratio calls has important implications for monitoring populations of endangered animals, such as gibbons. 
Focusing only on high signal-to-noise ratio calls for individual discrimination may not be possible for rare sounds, and future work should focus on
ISSN:	1574-9541
DOI:	10.1016/j.ecoinf.2023.102457
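
The summary names MFCCs as the baseline feature representation but this record does not say how they were computed. As a rough illustration only, the sketch below shows one common way to derive MFCCs from a call using NumPy: framing, Hann windowing, power spectrum, triangular mel filterbank, log compression, and a DCT-II. The function names (`mfcc`, `call_features`) and every parameter value (frame length, hop, number of filters and coefficients) are assumptions made for this example, not settings taken from the study.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced in mel from 0 Hz to Nyquist.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        lower, centre, upper = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lower) / (centre - lower)
        falling = (upper - fft_freqs) / (upper - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mfcc(signal, sample_rate, n_mfcc=12, n_filters=26, frame_len=1024, hop=512):
    """Return MFCCs with shape (n_frames, n_mfcc). All defaults are illustrative."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    # Small constant avoids log(0) for silent frames or empty filters.
    log_mel = np.log(power @ bank.T + 1e-10)
    # Unnormalised DCT-II basis, keeping the first n_mfcc coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_mfcc)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T


def call_features(signal, sample_rate):
    """Summarise one call as the per-coefficient mean and standard deviation."""
    coeffs = mfcc(signal, sample_rate)
    return np.concatenate([coeffs.mean(axis=0), coeffs.std(axis=0)])


if __name__ == "__main__":
    sr = 16000  # hypothetical sample rate
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 800.0 * t)  # synthetic stand-in for a recorded call
    print(call_features(tone, sr).shape)
```

A fixed-length vector per call, such as the one returned by `call_features`, is the kind of input a random forest classifier or a clustering algorithm would accept; how the study itself summarised MFCCs per call is not stated here.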