
A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech

Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics etc. Often in such applications only a few seconds of speech data is available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately usin...

Bibliographic Details
Main Authors: Kalluri, Shareef Babu, Vijayasenan, Deepu, Ganapathy, Sriram
Format: Conference Proceeding
Language: English
container_end_page 6584
container_issue
container_start_page 6580
container_title
container_volume
creator Kalluri, Shareef Babu
Vijayasenan, Deepu
Ganapathy, Sriram
description Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics, etc. Often in such applications, only a few seconds of speech data are available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately using different estimation algorithms. In this work, we propose a unified DNN architecture to predict both the height and age of a speaker from short durations of speech. A novel initialization scheme for the deep neural architecture is introduced that avoids the requirement for a large training dataset. We evaluate the system on the TIMIT dataset, where the mean duration of speech segments is around 2.5 s. The DNN system improves the age RMSE by at least 0.6 years compared to a conventional support vector regression system trained on Gaussian Mixture Model mean supervectors. The system achieves an RMSE of 6.85 cm and 6.29 cm for male and female height prediction, respectively. For age estimation, the RMSEs are 7.60 and 8.63 years for male and female speakers, respectively. Analysis of shorter speech segments reveals that even with 1 second of speech input, the performance degradation is at most 3% compared to the full-duration speech files.
doi_str_mv 10.1109/ICASSP.2019.8683397
format conference_proceeding
eisbn 9781479981311
1479981311
publisher IEEE
date 2019-05
tpages 5
fulltext fulltext_linktorsrc
identifier EISSN: 2379-190X
ispartof ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, p.6580-6584
issn 2379-190X
language eng
recordid cdi_ieee_primary_8683397
source IEEE Xplore All Conference Series
subjects Automatic Joint Height and Age Estimation
Deep neural network
Feature extraction
Neural networks
Prediction algorithms
Predictive models
Short duration
Support vector machines
Support Vector Regression
Training
Training data
title A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T15%3A11%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20Deep%20Neural%20Network%20Based%20End%20to%20End%20Model%20for%20Joint%20Height%20and%20Age%20Estimation%20from%20Short%20Duration%20Speech&rft.btitle=ICASSP%202019%20-%202019%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing%20(ICASSP)&rft.au=Kalluri,%20Shareef%20Babu&rft.date=2019-05&rft.spage=6580&rft.epage=6584&rft.pages=6580-6584&rft.eissn=2379-190X&rft_id=info:doi/10.1109/ICASSP.2019.8683397&rft.eisbn=9781479981311&rft.eisbn_list=1479981311&rft_dat=%3Cieee_CHZPO%3E8683397%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-6efd946727adf51a35baa0547fbb2ef4b6e2a9c4b0cafe0d090c868201b8ed4c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8683397&rfr_iscdi=true