Semantic Tagging of Singing Voices in Popular Music Recordings

Singing voice is a key sound source in popular music. As recent music streaming and entertainment services call for more intelligent solutions to retrieve songs or evaluate musical characteristics, automatic analysis of popular music targeted to singing voice has been a significant research subject....

Bibliographic Details
Published in:IEEE/ACM transactions on audio, speech, and language processing, 2020, Vol.28, p.1656-1668
Main Authors: Kim, Keunhyoung Luke; Lee, Jongpil; Kum, Sangeun; Park, Chae Lin; Nam, Juhan
Format: Article
Language:English
cited_by cdi_FETCH-LOGICAL-c295t-d1ad44f63ae415796c75dc8a0e517869684d163cd70b6210bb21a0ca909873a53
cites cdi_FETCH-LOGICAL-c295t-d1ad44f63ae415796c75dc8a0e517869684d163cd70b6210bb21a0ca909873a53
container_end_page 1668
container_issue
container_start_page 1656
container_title IEEE/ACM transactions on audio, speech, and language processing
container_volume 28
creator Kim, Keunhyoung Luke
Lee, Jongpil
Kum, Sangeun
Park, Chae Lin
Nam, Juhan
description Singing voice is a key sound source in popular music. As recent music streaming and entertainment services call for more intelligent solutions to retrieve songs or evaluate musical characteristics, automatic analysis of popular music targeted to singing voice has been a significant research subject. The majority of studies have focused on quantitative or objective information of singing voice such as pitch, lyrics or singer identity. However, singing voice has a wide variety of dimensions that are somewhat difficult to quantify and therefore we often describe by words. In this article, we address the qualitative analysis of singing voice as a music auto-tagging task that annotates songs with a set of tag words. To this end, we build a music tag dataset dedicated to singing voice. Specifically, we define a vocabulary that describes timbre and singing styles of K-pop vocalists and collect human annotations for individual tracks. We then conduct statistical analysis to understand the global and temporal characteristics of the tag words. Using the dataset, we train a deep neural network model to automatically predict the voice-specific tags from popular music recordings and evaluate the model in different conditions. We discuss the results by comparing them to the statistical analysis of tag words. Finally, we show potential applications of the vocal tagging system in music retrieval, music thumbnailing and singing evaluation.
doi_str_mv 10.1109/TASLP.2020.2993893
format article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_9097399</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9097399</ieee_id><sourcerecordid>2410516948</sourcerecordid><originalsourceid>FETCH-LOGICAL-c295t-d1ad44f63ae415796c75dc8a0e517869684d163cd70b6210bb21a0ca909873a53</originalsourceid><addsrcrecordid>eNo9kEtLAzEUhYMoWGr_gG4GXE-9eUwydyOU4gsqFlvdhjSTKVPaSU1mFv57U1tdnbP4zr3wEXJNYUwp4N1yspjNxwwYjBkiL5GfkQHjDHPkIM7_OkO4JKMYNwBAQSEqMSD3C7czbdfYbGnW66ZdZ77OFikP9dM31sWsabO53_dbE7LXPib03VkfqkTEK3JRm210o1MOycfjw3L6nM_enl6mk1luGRZdXlFTCVFLbpyghUJpVVHZ0oArqColylJUVHJbKVhJRmG1YtSANQhYKm4KPiS3x7v74L96Fzu98X1o00vNBIWCShRlotiRssHHGFyt96HZmfCtKeiDKv2rSh9U6ZOqNLo5jhrn3P8gfVY8ET-QCWMm</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2410516948</pqid></control><display><type>article</type><title>Semantic Tagging of Singing Voices in Popular Music Recordings</title><source>IEEE Electronic Library (IEL) Journals</source><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><creator>Kim, Keunhyoung Luke ; Lee, Jongpil ; Kum, Sangeun ; Park, Chae Lin ; Nam, Juhan</creator><creatorcontrib>Kim, Keunhyoung Luke ; Lee, Jongpil ; Kum, Sangeun ; Park, Chae Lin ; Nam, Juhan</creatorcontrib><description>Singing voice is a key sound source in popular music. As recent music streaming and entertainment services call for more intelligent solutions to retrieve songs or evaluate musical characteristics, automatic analysis of popular music targeted to singing voice has been a significant research subject. The majority of studies have focused on quantitative or objective information of singing voice such as pitch, lyrics or singer identity. However, singing voice has a wide variety of dimensions that are somewhat difficult to quantify and therefore we often describe by words. 
In this article, we address the qualitative analysis of singing voice as a music auto-tagging task that annotates songs with a set of tag words. To this end, we build a music tag dataset dedicated to singing voice. Specifically, we define a vocabulary that describes timbre and singing styles of K-pop vocalists and collect human annotations for individual tracks. We then conduct statistical analysis to understand the global and temporal characteristics of the tag words. Using the dataset, we train a deep neural network model to automatically predict the voice-specific tags from popular music recordings and evaluate the model in different conditions. We discuss the results by comparing them to the statistical analysis of tag words. Finally, we show potential applications of the vocal tagging system in music retrieval, music thumbnailing and singing evaluation.</description><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASLP.2020.2993893</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Annotations ; Artificial neural networks ; convolutional neural networks ; Datasets ; Instruments ; K-Pop ; Marking ; music tagging ; Musical performances ; Popular music ; Qualitative analysis ; semantic analysis ; Singing ; Singing voice ; Sound sources ; Statistical analysis ; Tagging ; Timbre ; Vocabulary ; vocal ; Voice ; Words (language)</subject><ispartof>IEEE/ACM transactions on audio, speech, and language processing, 2020, Vol.28, p.1656-1668</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c295t-d1ad44f63ae415796c75dc8a0e517869684d163cd70b6210bb21a0ca909873a53</citedby><cites>FETCH-LOGICAL-c295t-d1ad44f63ae415796c75dc8a0e517869684d163cd70b6210bb21a0ca909873a53</cites><orcidid>0000-0002-1126-0081 ; 0000-0001-8167-5752 ; 0000-0003-2664-2119</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9097399$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,4010,27900,27901,27902,54771</link.rule.ids></links><search><creatorcontrib>Kim, Keunhyoung Luke</creatorcontrib><creatorcontrib>Lee, Jongpil</creatorcontrib><creatorcontrib>Kum, Sangeun</creatorcontrib><creatorcontrib>Park, Chae Lin</creatorcontrib><creatorcontrib>Nam, Juhan</creatorcontrib><title>Semantic Tagging of Singing Voices in Popular Music Recordings</title><title>IEEE/ACM transactions on audio, speech, and language processing</title><addtitle>TASLP</addtitle><description>Singing voice is a key sound source in popular music. As recent music streaming and entertainment services call for more intelligent solutions to retrieve songs or evaluate musical characteristics, automatic analysis of popular music targeted to singing voice has been a significant research subject. The majority of studies have focused on quantitative or objective information of singing voice such as pitch, lyrics or singer identity. However, singing voice has a wide variety of dimensions that are somewhat difficult to quantify and therefore we often describe by words. In this article, we address the qualitative analysis of singing voice as a music auto-tagging task that annotates songs with a set of tag words. To this end, we build a music tag dataset dedicated to singing voice. 
Specifically, we define a vocabulary that describes timbre and singing styles of K-pop vocalists and collect human annotations for individual tracks. We then conduct statistical analysis to understand the global and temporal characteristics of the tag words. Using the dataset, we train a deep neural network model to automatically predict the voice-specific tags from popular music recordings and evaluate the model in different conditions. We discuss the results by comparing them to the statistical analysis of tag words. Finally, we show potential applications of the vocal tagging system in music retrieval, music thumbnailing and singing evaluation.</description><subject>Annotations</subject><subject>Artificial neural networks</subject><subject>convolutional neural networks</subject><subject>Datasets</subject><subject>Instruments</subject><subject>K-Pop</subject><subject>Marking</subject><subject>music tagging</subject><subject>Musical performances</subject><subject>Popular music</subject><subject>Qualitative analysis</subject><subject>semantic analysis</subject><subject>Singing</subject><subject>Singing voice</subject><subject>Sound sources</subject><subject>Statistical analysis</subject><subject>Tagging</subject><subject>Timbre</subject><subject>Vocabulary</subject><subject>vocal</subject><subject>Voice</subject><subject>Words (language)</subject><issn>2329-9290</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNo9kEtLAzEUhYMoWGr_gG4GXE-9eUwydyOU4gsqFlvdhjSTKVPaSU1mFv57U1tdnbP4zr3wEXJNYUwp4N1yspjNxwwYjBkiL5GfkQHjDHPkIM7_OkO4JKMYNwBAQSEqMSD3C7czbdfYbGnW66ZdZ77OFikP9dM31sWsabO53_dbE7LXPib03VkfqkTEK3JRm210o1MOycfjw3L6nM_enl6mk1luGRZdXlFTCVFLbpyghUJpVVHZ0oArqColylJUVHJbKVhJRmG1YtSANQhYKm4KPiS3x7v74L96Fzu98X1o00vNBIWCShRlotiRssHHGFyt96HZmfCtKeiDKv2rSh9U6ZOqNLo5jhrn3P8gfVY8ET-QCWMm</recordid><startdate>2020</startdate><enddate>2020</enddate><creator>Kim, Keunhyoung 
Luke</creator><creator>Lee, Jongpil</creator><creator>Kum, Sangeun</creator><creator>Park, Chae Lin</creator><creator>Nam, Juhan</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-1126-0081</orcidid><orcidid>https://orcid.org/0000-0001-8167-5752</orcidid><orcidid>https://orcid.org/0000-0003-2664-2119</orcidid></search><sort><creationdate>2020</creationdate><title>Semantic Tagging of Singing Voices in Popular Music Recordings</title><author>Kim, Keunhyoung Luke ; Lee, Jongpil ; Kum, Sangeun ; Park, Chae Lin ; Nam, Juhan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c295t-d1ad44f63ae415796c75dc8a0e517869684d163cd70b6210bb21a0ca909873a53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Annotations</topic><topic>Artificial neural networks</topic><topic>convolutional neural networks</topic><topic>Datasets</topic><topic>Instruments</topic><topic>K-Pop</topic><topic>Marking</topic><topic>music tagging</topic><topic>Musical performances</topic><topic>Popular music</topic><topic>Qualitative analysis</topic><topic>semantic analysis</topic><topic>Singing</topic><topic>Singing voice</topic><topic>Sound sources</topic><topic>Statistical analysis</topic><topic>Tagging</topic><topic>Timbre</topic><topic>Vocabulary</topic><topic>vocal</topic><topic>Voice</topic><topic>Words (language)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kim, Keunhyoung Luke</creatorcontrib><creatorcontrib>Lee, Jongpil</creatorcontrib><creatorcontrib>Kum, Sangeun</creatorcontrib><creatorcontrib>Park, Chae 
Lin</creatorcontrib><creatorcontrib>Nam, Juhan</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kim, Keunhyoung Luke</au><au>Lee, Jongpil</au><au>Kum, Sangeun</au><au>Park, Chae Lin</au><au>Nam, Juhan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Semantic Tagging of Singing Voices in Popular Music Recordings</atitle><jtitle>IEEE/ACM transactions on audio, speech, and language processing</jtitle><stitle>TASLP</stitle><date>2020</date><risdate>2020</risdate><volume>28</volume><spage>1656</spage><epage>1668</epage><pages>1656-1668</pages><issn>2329-9290</issn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>Singing voice is a key sound source in popular music. As recent music streaming and entertainment services call for more intelligent solutions to retrieve songs or evaluate musical characteristics, automatic analysis of popular music targeted to singing voice has been a significant research subject. The majority of studies have focused on quantitative or objective information of singing voice such as pitch, lyrics or singer identity. 
However, singing voice has a wide variety of dimensions that are somewhat difficult to quantify and therefore we often describe by words. In this article, we address the qualitative analysis of singing voice as a music auto-tagging task that annotates songs with a set of tag words. To this end, we build a music tag dataset dedicated to singing voice. Specifically, we define a vocabulary that describes timbre and singing styles of K-pop vocalists and collect human annotations for individual tracks. We then conduct statistical analysis to understand the global and temporal characteristics of the tag words. Using the dataset, we train a deep neural network model to automatically predict the voice-specific tags from popular music recordings and evaluate the model in different conditions. We discuss the results by comparing them to the statistical analysis of tag words. Finally, we show potential applications of the vocal tagging system in music retrieval, music thumbnailing and singing evaluation.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TASLP.2020.2993893</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-1126-0081</orcidid><orcidid>https://orcid.org/0000-0001-8167-5752</orcidid><orcidid>https://orcid.org/0000-0003-2664-2119</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 2329-9290
ispartof IEEE/ACM transactions on audio, speech, and language processing, 2020, Vol.28, p.1656-1668
issn 2329-9290
2329-9304
language eng
recordid cdi_ieee_primary_9097399
source IEEE Electronic Library (IEL) Journals; Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
subjects Annotations
Artificial neural networks
convolutional neural networks
Datasets
Instruments
K-Pop
Marking
music tagging
Musical performances
Popular music
Qualitative analysis
semantic analysis
Singing
Singing voice
Sound sources
Statistical analysis
Tagging
Timbre
Vocabulary
vocal
Voice
Words (language)
title Semantic Tagging of Singing Voices in Popular Music Recordings
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T06%3A20%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Semantic%20Tagging%20of%20Singing%20Voices%20in%20Popular%20Music%20Recordings&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Kim,%20Keunhyoung%20Luke&rft.date=2020&rft.volume=28&rft.spage=1656&rft.epage=1668&rft.pages=1656-1668&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASLP.2020.2993893&rft_dat=%3Cproquest_ieee_%3E2410516948%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c295t-d1ad44f63ae415796c75dc8a0e517869684d163cd70b6210bb21a0ca909873a53%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2410516948&rft_id=info:pmid/&rft_ieee_id=9097399&rfr_iscdi=true