Hearing Anything Anywhere

Bibliographic Details
Published in: arXiv.org 2024-06
Main Authors: Wang, Mason, Sawata, Ryosuke, Clarke, Samuel, Gao, Ruohan, Wu, Shangzhe, Wu, Jiajun
Format: Article
Language: English
creator Wang, Mason
Sawata, Ryosuke
Clarke, Samuel
Gao, Ruohan
Wu, Shangzhe
Wu, Jiajun
description Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic characteristics of an arbitrary environment given only a sparse set of (roughly 12) room impulse response (RIR) recordings and a planar reconstruction of the scene, a setup that is easily achievable by ordinary users. To this end, we introduce DiffRIR, a differentiable RIR rendering framework with interpretable parametric models of salient acoustic features of the scene, including sound source directivity and surface reflectivity. This allows us to synthesize novel auditory experiences through the space with any source audio. To evaluate our method, we collect a dataset of RIR recordings and music in four diverse, real environments. We show that our model outperforms state-of-the-art baselines on rendering monaural and binaural RIRs and music at unseen locations, and learns physically interpretable parameters characterizing acoustic properties of the sound source and surfaces in the scene.
format article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-06
issn 2331-8422
language eng
recordid cdi_proquest_journals_3067020634
source Publicly Available Content Database
subjects Acoustic properties
Acoustics
Computer graphics
Computer vision
Directivity
Image reconstruction
Impulse response
Mixed reality
Music
Rendering
Sound sources
title Hearing Anything Anywhere
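
Illustrative note: the description above explains that DiffRIR reconstructs room impulse responses (RIRs) so that audio can be rendered at unseen listener positions. The short Python sketch below shows only the generic final step of that pipeline, convolving a monaural or binaural RIR with dry source audio; it is not the authors' DiffRIR code, and the function name, sample rate, and synthetic decaying RIR are assumptions made purely for illustration.

import numpy as np

def render_at_listener(dry_audio: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve dry (anechoic) source audio with an RIR to simulate the
    sound heard at the position the RIR was measured or predicted for."""
    if rir.ndim == 1:  # monaural RIR -> single-channel output
        return np.convolve(dry_audio, rir, mode="full")
    # binaural RIR of shape (2, rir_len) -> convolve each ear separately
    return np.stack([np.convolve(dry_audio, ear, mode="full") for ear in rir])

# Toy example: 1 s of noise standing in for dry music, plus a synthetic
# exponentially decaying 0.25 s RIR (illustrative, not measured data).
sr = 16_000
dry = np.random.randn(sr).astype(np.float32)
toy_rir = np.exp(-np.linspace(0.0, 8.0, sr // 4)) * np.random.randn(sr // 4)
wet = render_at_listener(dry, toy_rir)
print(wet.shape)  # (19999,) = sr + sr // 4 - 1

In DiffRIR itself, the RIR for a query location is produced by the learned parametric model (sound source directivity, surface reflectivity, and related terms) rather than measured directly; the convolution step above is standard signal processing and is shown only to make the rendering idea concrete.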