Neural-network-based voice-tracking algorithm

Bibliographic Details
Published in: The Journal of the Acoustical Society of America, 2002-11, Vol. 112 (5_Supplement), p. 2304
Main Authors: Baker, Mary; Stevens, Charise; Chaparro, Brennen; Paschall, Dwayne
Format: Article
Language: English
Description
Summary: A voice-tracking algorithm was developed and tested for electronically separating the voice signals of simultaneous talkers. Many individuals have hearing disorders that inhibit their ability to focus on a single speaker in a multiple-speaker environment (the cocktail party effect). Digital hearing aid technology makes it possible to implement complex speech-processing algorithms in both the time and frequency domains. In this work, an average magnitude difference function (AMDF) was applied to mixed voice signals to determine the fundamental frequencies present in the signals. A time-prediction neural network was trained to recognize normal human voice inflection patterns, including rising, falling, rising–falling, and falling–rising patterns. Through this training, the neural network was designed to track the fundamental frequency of a single talker. The output of the neural network can be used to design an active filter for speaker segregation. Tests used audio mixtures of two to three speakers uttering short phrases. The AMDF accurately identified the fundamental frequencies present in the signal. The neural network was tested using a single speaker uttering a short sentence, and it accurately tracked that speaker's fundamental frequency.
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/1.4779277
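
The summary names the average magnitude difference function (AMDF) as the step that finds fundamental frequencies, but the record gives no implementation details. The following is a minimal sketch of a textbook AMDF pitch estimator for a single voiced frame, not the authors' code. The function names (`amdf`, `estimate_f0`), the 80 to 400 Hz search range, the relative threshold, and the synthetic test tone are all assumptions chosen for illustration. The AMDF at lag k is the mean of |x[n] - x[n+k]|; it dips toward zero when k matches the pitch period, so the estimator looks for the first deep dip and converts that lag to a frequency.

```python
import math


def amdf(frame, min_lag, max_lag):
    """Return {lag: mean |x[n] - x[n + lag]|} for lags in [min_lag, max_lag]."""
    n = len(frame)
    result = {}
    for lag in range(min_lag, max_lag + 1):
        count = n - lag  # number of sample pairs available at this lag
        total = sum(abs(frame[i] - frame[i + lag]) for i in range(count))
        result[lag] = total / count
    return result


def estimate_f0(frame, sample_rate, f_min=80.0, f_max=400.0, rel_threshold=0.1):
    """Estimate one fundamental frequency (Hz) from a voiced frame.

    f_min, f_max and rel_threshold are illustrative assumptions, not values
    taken from the abstract.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    if max_lag >= len(frame):
        raise ValueError("frame is too short for the requested f_min")

    d = amdf(frame, min_lag, max_lag)
    lowest, highest = min(d.values()), max(d.values())
    cutoff = lowest + rel_threshold * (highest - lowest)

    # Take the first lag that enters a deep dip; this avoids picking a
    # multiple of the true period (an octave error).
    lag = next(k for k in range(min_lag, max_lag + 1) if d[k] <= cutoff)
    # Walk down to the bottom of that dip.
    while lag < max_lag and d[lag + 1] < d[lag]:
        lag += 1
    return sample_rate / lag


if __name__ == "__main__":
    # Synthetic "voiced" frame: 200 Hz fundamental plus one harmonic (assumed test signal).
    fs = 8000
    frame = [
        math.sin(2 * math.pi * 200 * i / fs) + 0.5 * math.sin(2 * math.pi * 400 * i / fs)
        for i in range(400)
    ]
    print(estimate_f0(frame, fs))
```

This sketch returns a single estimate per frame. For the mixed-talker signals described in the summary, one would presumably inspect several distinct AMDF dips per frame instead of only the first, and pass the resulting candidates to a tracker; the abstract does not say how the authors handled that step.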