
A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech

Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics etc. Often in such applications only a few seconds of speech data is available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately usin...

Bibliographic Details
Main Authors:	Kalluri, Shareef Babu; Vijayasenan, Deepu; Ganapathy, Sriram
Format:	Conference Proceeding
Language:	English
Subjects:	Automatic Joint Height and Age Estimation; Deep neural network; Feature extraction; Neural networks; Prediction algorithms; Predictive models; Short duration; Support vector machines; Support Vector Regression; Training; Training data
Online Access:	Request full text

cited_by
cites
container_end_page	6584
container_issue
container_start_page	6580
container_title
container_volume
creator	Kalluri, Shareef Babu; Vijayasenan, Deepu; Ganapathy, Sriram
description	Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics, etc. Often in such applications only a few seconds of speech data are available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately using different estimation algorithms. In this work, we propose a unified DNN architecture to predict both the height and age of a speaker from short durations of speech. A novel initialization scheme for the deep neural architecture is introduced that avoids the requirement for a large training dataset. We evaluate the system on the TIMIT dataset, where the mean duration of speech segments is around 2.5 s. The DNN system improves the age RMSE by at least 0.6 years compared to a conventional support vector regression system trained on Gaussian Mixture Model mean supervectors. The system achieves an RMSE of 6.85 cm and 6.29 cm for male and female height prediction, respectively. For age estimation, the RMSEs are 7.60 and 8.63 years for male and female speakers, respectively. Analysis of shorter speech segments reveals that even with 1 second of speech input the performance degradation is at most 3% compared to the full-duration speech files.
doi_str_mv	10.1109/ICASSP.2019.8683397
format	conference_proceeding
fullrecord	<record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_8683397</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8683397</ieee_id><sourcerecordid>8683397</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-6efd946727adf51a35baa0547fbb2ef4b6e2a9c4b0cafe0d090c868201b8ed4c3</originalsourceid><addsrcrecordid>eNotkNtKAzEYhKMgWGufoDf_C2xNNtlNclnb2ir1AKvgXclu_rRrDynZiPj2htqrD2ZgmBlChoyOGKP67nEyrqq3UU6ZHqlSca7lBRloqZiQWivGGbskvZxLnTFNP6_JTdd9UUqVFKpHtmOYIh7hBb-D2SXEHx-2cG86tDA7WIj-hGdvcQfOB3jy7SHCAtv1JoJJ1niNMOtiuzex9Qdwwe-h2vgQYZoyT1p1RGw2t-TKmV2HgzP75ONh9j5ZZMvXeVqxzFomi5iV6KwWpcylsa5ghhe1MbQQ0tV1jk7UJeZGN6KmjXFILdW0ScPTAbVCKxreJ8P_3BYRV8eQmoXf1fkb_gdI-lkv</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech</title><source>IEEE Xplore All Conference Series</source><creator>Kalluri, Shareef Babu ; Vijayasenan, Deepu ; Ganapathy, Sriram</creator><creatorcontrib>Kalluri, Shareef Babu ; Vijayasenan, Deepu ; Ganapathy, Sriram</creatorcontrib><description>Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics etc. Often in such applications only a few seconds of speech data is available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately using different estimation algorithms. In this work, we propose a unified DNN architecture to predict both height and age of a speaker for short durations of speech. A novel initialization scheme for the deep neural architecture is introduced, that avoids the requirement for a large training dataset. We evaluate the system on TIMIT dataset where the mean duration of speech segments is around 2.5s. 
The DNN system is able to improve the age RMSE by at least 0.6 years as compared to a conventional support vector regression system trained on Gaussian Mixture Model mean supervectors. The system achieves an RMSE error of 6.85 and 6.29 cm for male and female height prediction. In case of age estimation, the RMSE errors are 7.60 and 8.63 years for male and female respectively. Analysis of shorter speech segments reveals that even with 1 second speech input the performance degradation is at most 3% compared to the full duration speech files.</description><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 9781479981311</identifier><identifier>EISBN: 1479981311</identifier><identifier>DOI: 10.1109/ICASSP.2019.8683397</identifier><language>eng</language><publisher>IEEE</publisher><subject>Automatic Joint Height and Age Estimation ; Deep neural network ; Feature extraction ; Neural networks ; Prediction algorithms ; Predictive models ; Short duration ; Support vector machines ; Support Vector Regression ; Training ; Training data</subject><ispartof>ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, p.6580-6584</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8683397$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8683397$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Kalluri, Shareef Babu</creatorcontrib><creatorcontrib>Vijayasenan, Deepu</creatorcontrib><creatorcontrib>Ganapathy, Sriram</creatorcontrib><title>A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration 
Speech</title><title>ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title><addtitle>ICASSP</addtitle><description>Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics etc. Often in such applications only a few seconds of speech data is available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately using different estimation algorithms. In this work, we propose a unified DNN architecture to predict both height and age of a speaker for short durations of speech. A novel initialization scheme for the deep neural architecture is introduced, that avoids the requirement for a large training dataset. We evaluate the system on TIMIT dataset where the mean duration of speech segments is around 2.5s. The DNN system is able to improve the age RMSE by at least 0.6 years as compared to a conventional support vector regression system trained on Gaussian Mixture Model mean supervectors. The system achieves an RMSE error of 6.85 and 6.29 cm for male and female height prediction. In case of age estimation, the RMSE errors are 7.60 and 8.63 years for male and female respectively. 
Analysis of shorter speech segments reveals that even with 1 second speech input the performance degradation is at most 3% compared to the full duration speech files.</description><subject>Automatic Joint Height and Age Estimation</subject><subject>Deep neural network</subject><subject>Feature extraction</subject><subject>Neural networks</subject><subject>Prediction algorithms</subject><subject>Predictive models</subject><subject>Short duration</subject><subject>Support vector machines</subject><subject>Support Vector Regression</subject><subject>Training</subject><subject>Training data</subject><issn>2379-190X</issn><isbn>9781479981311</isbn><isbn>1479981311</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2019</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotkNtKAzEYhKMgWGufoDf_C2xNNtlNclnb2ir1AKvgXclu_rRrDynZiPj2htqrD2ZgmBlChoyOGKP67nEyrqq3UU6ZHqlSca7lBRloqZiQWivGGbskvZxLnTFNP6_JTdd9UUqVFKpHtmOYIh7hBb-D2SXEHx-2cG86tDA7WIj-hGdvcQfOB3jy7SHCAtv1JoJJ1niNMOtiuzex9Qdwwe-h2vgQYZoyT1p1RGw2t-TKmV2HgzP75ONh9j5ZZMvXeVqxzFomi5iV6KwWpcylsa5ghhe1MbQQ0tV1jk7UJeZGN6KmjXFILdW0ScPTAbVCKxreJ8P_3BYRV8eQmoXf1fkb_gdI-lkv</recordid><startdate>201905</startdate><enddate>201905</enddate><creator>Kalluri, Shareef Babu</creator><creator>Vijayasenan, Deepu</creator><creator>Ganapathy, Sriram</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201905</creationdate><title>A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech</title><author>Kalluri, Shareef Babu ; Vijayasenan, Deepu ; Ganapathy, 
Sriram</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-6efd946727adf51a35baa0547fbb2ef4b6e2a9c4b0cafe0d090c868201b8ed4c3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Automatic Joint Height and Age Estimation</topic><topic>Deep neural network</topic><topic>Feature extraction</topic><topic>Neural networks</topic><topic>Prediction algorithms</topic><topic>Predictive models</topic><topic>Short duration</topic><topic>Support vector machines</topic><topic>Support Vector Regression</topic><topic>Training</topic><topic>Training data</topic><toplevel>online_resources</toplevel><creatorcontrib>Kalluri, Shareef Babu</creatorcontrib><creatorcontrib>Vijayasenan, Deepu</creatorcontrib><creatorcontrib>Ganapathy, Sriram</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Kalluri, Shareef Babu</au><au>Vijayasenan, Deepu</au><au>Ganapathy, Sriram</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech</atitle><btitle>ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</btitle><stitle>ICASSP</stitle><date>2019-05</date><risdate>2019</risdate><spage>6580</spage><epage>6584</epage><pages>6580-6584</pages><eissn>2379-190X</eissn><eisbn>9781479981311</eisbn><eisbn>1479981311</eisbn><abstract>Automatic height and age prediction of a speaker 
has a wide variety of applications in speaker profiling, forensics etc. Often in such applications only a few seconds of speech data is available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately using different estimation algorithms. In this work, we propose a unified DNN architecture to predict both height and age of a speaker for short durations of speech. A novel initialization scheme for the deep neural architecture is introduced, that avoids the requirement for a large training dataset. We evaluate the system on TIMIT dataset where the mean duration of speech segments is around 2.5s. The DNN system is able to improve the age RMSE by at least 0.6 years as compared to a conventional support vector regression system trained on Gaussian Mixture Model mean supervectors. The system achieves an RMSE error of 6.85 and 6.29 cm for male and female height prediction. In case of age estimation, the RMSE errors are 7.60 and 8.63 years for male and female respectively. Analysis of shorter speech segments reveals that even with 1 second speech input the performance degradation is at most 3% compared to the full duration speech files.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2019.8683397</doi><tpages>5</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	EISSN: 2379-190X
ispartof	ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, p.6580-6584
issn	2379-190X
language	eng
recordid	cdi_ieee_primary_8683397
source	IEEE Xplore All Conference Series
subjects	Automatic Joint Height and Age Estimation; Deep neural network; Feature extraction; Neural networks; Prediction algorithms; Predictive models; Short duration; Support vector machines; Support Vector Regression; Training; Training data
title	A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T15%3A11%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20Deep%20Neural%20Network%20Based%20End%20to%20End%20Model%20for%20Joint%20Height%20and%20Age%20Estimation%20from%20Short%20Duration%20Speech&rft.btitle=ICASSP%202019%20-%202019%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing%20(ICASSP)&rft.au=Kalluri,%20Shareef%20Babu&rft.date=2019-05&rft.spage=6580&rft.epage=6584&rft.pages=6580-6584&rft.eissn=2379-190X&rft_id=info:doi/10.1109/ICASSP.2019.8683397&rft.eisbn=9781479981311&rft.eisbn_list=1479981311&rft_dat=%3Cieee_CHZPO%3E8683397%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-6efd946727adf51a35baa0547fbb2ef4b6e2a9c4b0cafe0d090c868201b8ed4c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8683397&rfr_iscdi=true