Hearing Anything Anywhere

Bibliographic Details
Published in: arXiv.org 2024-06
Main Authors: Wang, Mason, Sawata, Ryosuke, Clarke, Samuel, Gao, Ruohan, Wu, Shangzhe, Wu, Jiajun
Format: Article
Language: English
creator Wang, Mason
Sawata, Ryosuke
Clarke, Samuel
Gao, Ruohan
Wu, Shangzhe
Wu, Jiajun
description Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic characteristics of an arbitrary environment given only a sparse set of (roughly 12) room impulse response (RIR) recordings and a planar reconstruction of the scene, a setup that is easily achievable by ordinary users. To this end, we introduce DiffRIR, a differentiable RIR rendering framework with interpretable parametric models of salient acoustic features of the scene, including sound source directivity and surface reflectivity. This allows us to synthesize novel auditory experiences through the space with any source audio. To evaluate our method, we collect a dataset of RIR recordings and music in four diverse, real environments. We show that our model outperforms state-of-the-art baselines on rendering monaural and binaural RIRs and music at unseen locations, and learns physically interpretable parameters characterizing acoustic properties of the sound source and surfaces in the scene.
format article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-06
issn 2331-8422
language eng
recordid cdi_proquest_journals_3067020634
source Publicly Available Content Database
subjects Acoustic properties
Acoustics
Computer graphics
Computer vision
Directivity
Image reconstruction
Impulse response
Mixed reality
Music
Rendering
Sound sources
title Hearing Anything Anywhere
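
Illustrative note: the description above explains that DiffRIR reconstructs room impulse responses (RIRs) so that audio can be rendered at unseen listener positions. The short Python sketch below shows only the generic final step of that pipeline, convolving a monaural or binaural RIR with dry source audio; it is not the authors' DiffRIR code, and the function name, sample rate, and synthetic decaying RIR are assumptions made purely for illustration.

import numpy as np

def render_at_listener(dry_audio: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve dry (anechoic) source audio with an RIR to simulate the
    sound heard at the position the RIR was measured or predicted for."""
    if rir.ndim == 1:  # monaural RIR -> single-channel output
        return np.convolve(dry_audio, rir, mode="full")
    # binaural RIR of shape (2, rir_len) -> convolve each ear separately
    return np.stack([np.convolve(dry_audio, ear, mode="full") for ear in rir])

# Toy example: 1 s of noise standing in for dry music, plus a synthetic
# exponentially decaying 0.25 s RIR (illustrative, not measured data).
sr = 16_000
dry = np.random.randn(sr).astype(np.float32)
toy_rir = np.exp(-np.linspace(0.0, 8.0, sr // 4)) * np.random.randn(sr // 4)
wet = render_at_listener(dry, toy_rir)
print(wet.shape)  # (19999,) = sr + sr // 4 - 1

In DiffRIR itself, the RIR for a query location is produced by the learned parametric model (sound source directivity, surface reflectivity, and related terms) rather than measured directly; the convolution step above is standard signal processing and is shown only to make the rendering idea concrete.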