A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech
Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics etc. Often in such applications only a few seconds of speech data is available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately usin...
Main Authors: | Kalluri, Shareef Babu; Vijayasenan, Deepu; Ganapathy, Sriram |
---|---|
Format: | Conference Proceeding |
Language: | English |
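Subjects: | Automatic Joint Height and Age Estimation; Deep neural network; Feature extraction; Neural networks; Prediction algorithms; Predictive models; Short duration; Support vector machines; Support Vector Regression; Training; Training data |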
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 6584 |
container_issue | |
container_start_page | 6580 |
container_title | |
container_volume | |
creator | Kalluri, Shareef Babu; Vijayasenan, Deepu; Ganapathy, Sriram |
description | Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics, etc. Often in such applications, only a few seconds of speech data are available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately using different estimation algorithms. In this work, we propose a unified DNN architecture to predict both the height and age of a speaker from short durations of speech. A novel initialization scheme for the deep neural architecture is introduced that avoids the requirement for a large training dataset. We evaluate the system on the TIMIT dataset, where the mean duration of speech segments is around 2.5 s. The DNN system improves the age RMSE by at least 0.6 years compared to a conventional support vector regression system trained on Gaussian Mixture Model mean supervectors. The system achieves an RMSE of 6.85 cm and 6.29 cm for male and female height prediction, respectively. For age estimation, the RMSEs are 7.60 and 8.63 years for male and female speakers, respectively. Analysis of shorter speech segments reveals that even with 1 second of speech input, the performance degradation is at most 3% compared to the full-duration speech files. |
doi_str_mv | 10.1109/ICASSP.2019.8683397 |
format | conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2379-190X |
ispartof | ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, p.6580-6584 |
issn | 2379-190X |
language | eng |
recordid | cdi_ieee_primary_8683397 |
source | IEEE Xplore All Conference Series |
subjects | Automatic Joint Height and Age Estimation; Deep neural network; Feature extraction; Neural networks; Prediction algorithms; Predictive models; Short duration; Support vector machines; Support Vector Regression; Training; Training data |
title | A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T15%3A11%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20Deep%20Neural%20Network%20Based%20End%20to%20End%20Model%20for%20Joint%20Height%20and%20Age%20Estimation%20from%20Short%20Duration%20Speech&rft.btitle=ICASSP%202019%20-%202019%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing%20(ICASSP)&rft.au=Kalluri,%20Shareef%20Babu&rft.date=2019-05&rft.spage=6580&rft.epage=6584&rft.pages=6580-6584&rft.eissn=2379-190X&rft_id=info:doi/10.1109/ICASSP.2019.8683397&rft.eisbn=9781479981311&rft.eisbn_list=1479981311&rft_dat=%3Cieee_CHZPO%3E8683397%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-6efd946727adf51a35baa0547fbb2ef4b6e2a9c4b0cafe0d090c868201b8ed4c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8683397&rfr_iscdi=true |