Hearing Anything Anywhere
Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic characteristics of an arbitrary environment given only a sparse set of (roughly 12) room impulse response (RIR) recordings and a planar reconstruction of the scene, a setup that is easily achievable by ordinary users. To this end, we introduce DiffRIR, a differentiable RIR rendering framework with interpretable parametric models of salient acoustic features of the scene, including sound source directivity and surface reflectivity. This allows us to synthesize novel auditory experiences through the space with any source audio. To evaluate our method, we collect a dataset of RIR recordings and music in four diverse, real environments. We show that our model outperforms state-of-the-art baselines on rendering monaural and binaural RIRs and music at unseen locations, and learns physically interpretable parameters characterizing acoustic properties of the sound source and surfaces in the scene.
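The abstract describes rendering RIRs at unseen listener positions and then synthesizing "novel auditory experiences ... with any source audio." The paper's DiffRIR renderer is not reproduced here; the snippet below is only a minimal sketch of that final auralization step, assuming a mono dry recording and a mono RIR are already available as WAV files (the file names are hypothetical placeholders).

```python
# Minimal sketch (not the paper's DiffRIR code): auralizing dry source
# audio at a listener position, given a room impulse response (RIR).
# Assumes mono signals; file names are hypothetical.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

dry, sr_dry = sf.read("dry_music.wav")      # anechoic (dry) source signal
rir, sr_rir = sf.read("rendered_rir.wav")   # RIR at the target listener position
assert sr_dry == sr_rir, "source and RIR must share a sample rate"

# Convolving the dry signal with the RIR simulates what a listener
# at the RIR's position would hear in that room.
wet = fftconvolve(dry, rir, mode="full")

# Normalize to avoid clipping, then save the result.
wet /= np.max(np.abs(wet)) + 1e-12
sf.write("auralized.wav", wet, sr_dry)
```

In the paper's setting, the RIR would come from the learned DiffRIR model rather than a measurement, and binaural rendering would apply the same convolution once per ear channel; the sketch above only illustrates the generic convolution step.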
Published in: | arXiv.org 2024-06 |
---|---|
Main Authors: | Wang, Mason; Sawata, Ryosuke; Clarke, Samuel; Gao, Ruohan; Wu, Shangzhe; Wu, Jiajun |
Format: | Article |
Language: | English |
Subjects: | Acoustic properties; Acoustics; Computer graphics; Computer vision; Directivity; Image reconstruction; Impulse response; Mixed reality; Music; Rendering; Sound sources |
Online Access: | Get full text |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Wang, Mason; Sawata, Ryosuke; Clarke, Samuel; Gao, Ruohan; Wu, Shangzhe; Wu, Jiajun |
description | Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic characteristics of an arbitrary environment given only a sparse set of (roughly 12) room impulse response (RIR) recordings and a planar reconstruction of the scene, a setup that is easily achievable by ordinary users. To this end, we introduce DiffRIR, a differentiable RIR rendering framework with interpretable parametric models of salient acoustic features of the scene, including sound source directivity and surface reflectivity. This allows us to synthesize novel auditory experiences through the space with any source audio. To evaluate our method, we collect a dataset of RIR recordings and music in four diverse, real environments. We show that our model outperforms state-of-the-art baselines on rendering monaural and binaural RIRs and music at unseen locations, and learns physically interpretable parameters characterizing acoustic properties of the sound source and surfaces in the scene. |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-06 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3067020634 |
source | Publicly Available Content Database |
subjects | Acoustic properties; Acoustics; Computer graphics; Computer vision; Directivity; Image reconstruction; Impulse response; Mixed reality; Music; Rendering; Sound sources |
title | Hearing Anything Anywhere |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T11%3A44%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Hearing%20Anything%20Anywhere&rft.jtitle=arXiv.org&rft.au=Wang,%20Mason&rft.date=2024-06-11&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3067020634%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_30670206343%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3067020634&rft_id=info:pmid/&rfr_iscdi=true |