AERO: Audio Super Resolution in the Spectral Domain

We present AERO, an audio super-resolution model that processes speech and music signals in the spectral domain. AERO is based on an encoder-decoder architecture with U-Net-like skip connections. We optimize the model using both time- and frequency-domain loss functions. Specifically, we consider a set...

Bibliographic Details
Main Authors:	Mandel, Moshe; Tal, Or; Adi, Yossi
Format:	Conference Proceeding
Language:	English
Subjects:	audio super-resolution; Bandwidth; bandwidth extension; Codes; Frequency-domain analysis; High frequency; Multiple signal classification; Speech processing; speech synthesis; Superresolution
Online Access:	Request full text

cited_by
cites
container_end_page	5
container_issue
container_start_page	1
container_title
container_volume
creator	Mandel, Moshe; Tal, Or; Adi, Yossi
description	We present AERO, an audio super-resolution model that processes speech and music signals in the spectral domain. AERO is based on an encoder-decoder architecture with U-Net-like skip connections. We optimize the model using both time- and frequency-domain loss functions. Specifically, we consider a set of reconstruction losses together with perceptual ones in the form of adversarial and feature discriminator loss functions. To better handle phase information, the proposed method operates over the complex-valued spectrogram using two separate channels. Unlike prior work, which mainly considers low- and high-frequency concatenation for audio super-resolution, the proposed method directly predicts the full frequency range. We demonstrate high performance across a wide range of sample rates, considering both speech and music. AERO outperforms the evaluated baselines in terms of Log-Spectral Distance, ViSQOL, and the subjective MUSHRA test. Audio samples and code are available here.
doi_str_mv	10.1109/ICASSP49357.2023.10095382
format	conference_proceeding
fullrecord	<record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10095382</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10095382</ieee_id><sourcerecordid>10095382</sourcerecordid><originalsourceid>FETCH-LOGICAL-i1702-497f5ed6ef4b4bdabf8551622865d288896348c176ad6b67ae0ad2732f45efa43</originalsourceid><addsrcrecordid>eNo1j8tKw0AUQEdBsK3-gYvxAxJn7rzdhdpqodDSKLgrk84dHEmTkMfCv1dQV2d3OIeQe85yzpl72CyLstxLJ5TJgYHIOWNOCQsXZM4NWK4FGHNJZiCMy7hj79dkPgyfjDFrpJ0RUawOu0daTCG1tJw67OkBh7aextQ2NDV0_EBadngae1_Tp_bsU3NDrqKvB7z944K8rVevy5dsu3v-CdpmiRsGmXQmKgwao6xkFXwVrVJcA1itAlhrnRbSnrjRPuhKG4_MBzAColQYvRQLcvfrTYh47Pp09v3X8f9QfANhwUTv</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>AERO: Audio Super Resolution in the Spectral Domain</title><source>IEEE Xplore All Conference Series</source><creator>Mandel, Moshe ; Tal, Or ; Adi, Yossi</creator><creatorcontrib>Mandel, Moshe ; Tal, Or ; Adi, Yossi</creatorcontrib><description>We present AERO, a audio super-resolution model that processes speech and music signals in the spectral domain. AERO is based on an encoder-decoder architecture with UNet like skip connections. We optimize the model using both time and frequency domain loss functions. Specifically, we consider a set of reconstruction losses together with perceptual ones in the form of adversarial and feature discriminator loss functions. To better handle phase information the proposed method operates over the complex-valued spectrogram using two separate channels. Unlike prior work which mainly considers low and high frequency concatenation for audio super-resolution, the proposed method directly predicts the full frequency range. We demonstrate high performance across a wide range of sample rates considering both speech and music. 
AERO outperforms the evaluated baselines considering Log-Spectral Distance, ViSQOL, and the subjective MUSHRA test. Audio samples and code are available {\color{Blue}\text{here}}.</description><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 1728163277</identifier><identifier>EISBN: 9781728163277</identifier><identifier>DOI: 10.1109/ICASSP49357.2023.10095382</identifier><language>eng</language><publisher>IEEE</publisher><subject>audio super-resolution ; Bandwidth ; bandwidth extension ; Codes ; Frequency-domain analysis ; High frequency ; Multiple signal classification ; Speech processing ; speech synthesis ; Superresolution</subject><ispartof>ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, p.1-5</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10095382$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,777,781,786,787,4036,4037,23911,23912,25121,27906,54536,54913</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10095382$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Mandel, Moshe</creatorcontrib><creatorcontrib>Tal, Or</creatorcontrib><creatorcontrib>Adi, Yossi</creatorcontrib><title>AERO: Audio Super Resolution in the Spectral Domain</title><title>ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title><addtitle>ICASSP</addtitle><description>We present AERO, a audio super-resolution model that processes speech and music signals in the spectral domain. AERO is based on an encoder-decoder architecture with UNet like skip connections. We optimize the model using both time and frequency domain loss functions. 
Specifically, we consider a set of reconstruction losses together with perceptual ones in the form of adversarial and feature discriminator loss functions. To better handle phase information the proposed method operates over the complex-valued spectrogram using two separate channels. Unlike prior work which mainly considers low and high frequency concatenation for audio super-resolution, the proposed method directly predicts the full frequency range. We demonstrate high performance across a wide range of sample rates considering both speech and music. AERO outperforms the evaluated baselines considering Log-Spectral Distance, ViSQOL, and the subjective MUSHRA test. Audio samples and code are available {\color{Blue}\text{here}}.</description><subject>audio super-resolution</subject><subject>Bandwidth</subject><subject>bandwidth extension</subject><subject>Codes</subject><subject>Frequency-domain analysis</subject><subject>High frequency</subject><subject>Multiple signal classification</subject><subject>Speech processing</subject><subject>speech synthesis</subject><subject>Superresolution</subject><issn>2379-190X</issn><isbn>1728163277</isbn><isbn>9781728163277</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2023</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNo1j8tKw0AUQEdBsK3-gYvxAxJn7rzdhdpqodDSKLgrk84dHEmTkMfCv1dQV2d3OIeQe85yzpl72CyLstxLJ5TJgYHIOWNOCQsXZM4NWK4FGHNJZiCMy7hj79dkPgyfjDFrpJ0RUawOu0daTCG1tJw67OkBh7aextQ2NDV0_EBadngae1_Tp_bsU3NDrqKvB7z944K8rVevy5dsu3v-CdpmiRsGmXQmKgwao6xkFXwVrVJcA1itAlhrnRbSnrjRPuhKG4_MBzAColQYvRQLcvfrTYh47Pp09v3X8f9QfANhwUTv</recordid><startdate>2023</startdate><enddate>2023</enddate><creator>Mandel, Moshe</creator><creator>Tal, Or</creator><creator>Adi, Yossi</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>2023</creationdate><title>AERO: Audio 
Super Resolution in the Spectral Domain</title><author>Mandel, Moshe ; Tal, Or ; Adi, Yossi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i1702-497f5ed6ef4b4bdabf8551622865d288896348c176ad6b67ae0ad2732f45efa43</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2023</creationdate><topic>audio super-resolution</topic><topic>Bandwidth</topic><topic>bandwidth extension</topic><topic>Codes</topic><topic>Frequency-domain analysis</topic><topic>High frequency</topic><topic>Multiple signal classification</topic><topic>Speech processing</topic><topic>speech synthesis</topic><topic>Superresolution</topic><toplevel>online_resources</toplevel><creatorcontrib>Mandel, Moshe</creatorcontrib><creatorcontrib>Tal, Or</creatorcontrib><creatorcontrib>Adi, Yossi</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Mandel, Moshe</au><au>Tal, Or</au><au>Adi, Yossi</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>AERO: Audio Super Resolution in the Spectral Domain</atitle><btitle>ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</btitle><stitle>ICASSP</stitle><date>2023</date><risdate>2023</risdate><spage>1</spage><epage>5</epage><pages>1-5</pages><eissn>2379-190X</eissn><eisbn>1728163277</eisbn><eisbn>9781728163277</eisbn><abstract>We present AERO, a audio super-resolution model that processes speech and music signals in the spectral domain. 
AERO is based on an encoder-decoder architecture with UNet like skip connections. We optimize the model using both time and frequency domain loss functions. Specifically, we consider a set of reconstruction losses together with perceptual ones in the form of adversarial and feature discriminator loss functions. To better handle phase information the proposed method operates over the complex-valued spectrogram using two separate channels. Unlike prior work which mainly considers low and high frequency concatenation for audio super-resolution, the proposed method directly predicts the full frequency range. We demonstrate high performance across a wide range of sample rates considering both speech and music. AERO outperforms the evaluated baselines considering Log-Spectral Distance, ViSQOL, and the subjective MUSHRA test. Audio samples and code are available {\color{Blue}\text{here}}.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP49357.2023.10095382</doi><tpages>5</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	EISSN: 2379-190X
ispartof	ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, p.1-5
issn	2379-190X
language	eng
recordid	cdi_ieee_primary_10095382
source	IEEE Xplore All Conference Series
subjects	audio super-resolution; Bandwidth; bandwidth extension; Codes; Frequency-domain analysis; High frequency; Multiple signal classification; Speech processing; speech synthesis; Superresolution
title	AERO: Audio Super Resolution in the Spectral Domain
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T20%3A56%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=AERO:%20Audio%20Super%20Resolution%20in%20the%20Spectral%20Domain&rft.btitle=ICASSP%202023%20-%202023%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing%20(ICASSP)&rft.au=Mandel,%20Moshe&rft.date=2023&rft.spage=1&rft.epage=5&rft.pages=1-5&rft.eissn=2379-190X&rft_id=info:doi/10.1109/ICASSP49357.2023.10095382&rft.eisbn=1728163277&rft.eisbn_list=9781728163277&rft_dat=%3Cieee_CHZPO%3E10095382%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i1702-497f5ed6ef4b4bdabf8551622865d288896348c176ad6b67ae0ad2732f45efa43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10095382&rfr_iscdi=true