Blind Spatial Impulse Response Generation from Separate Room- and Scene-Specific Information
For audio in augmented reality (AR), knowledge of the users' real acoustic environment is crucial for rendering virtual sounds that seamlessly blend into the environment. As acoustic measurements are usually not feasible in practical AR applications, information about the room needs to be inferred from available sound sources. Then, additional sound sources can be rendered with the same room acoustic qualities. Crucially, these are placed at different positions than the sources available for estimation. Here, we propose to use an encoder network trained using a contrastive loss that maps input sounds to a low-dimensional feature space representing only room-specific information. Then, a diffusion-based spatial room impulse response generator is trained to take the latent space and generate a new response, given a new source-receiver position. We show how both room- and position-specific parameters are considered in the final output.
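The abstract describes a two-stage pipeline. As an illustration of the first stage, below is a minimal PyTorch sketch of a contrastively trained room encoder; the convolutional architecture, embedding size, and NT-Xent-style loss are assumptions for illustration, not the authors' exact design. The idea is that two different sounds recorded in the same room form a positive pair, while sounds from other rooms in the batch act as negatives, pushing the embedding to retain room-specific information and discard source- and position-specific detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoomEncoder(nn.Module):
    """Maps a (log-mel) spectrogram of reverberant audio to a
    low-dimensional room embedding. Architecture is illustrative."""
    def __init__(self, n_mels=64, embed_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # pool over time and frequency
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, spec):                  # spec: (batch, 1, n_mels, frames)
        h = self.conv(spec).flatten(1)        # (batch, 128)
        return F.normalize(self.proj(h), dim=-1)   # unit-norm room embedding

def contrastive_loss(z_a, z_b, temperature=0.1):
    """NT-Xent-style loss: z_a[i] and z_b[i] come from two different
    sounds in the SAME room; all other rows act as negatives."""
    logits = z_a @ z_b.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

# Usage with hypothetical shapes: one room per batch row.
enc = RoomEncoder()
spec_a = torch.randn(8, 1, 64, 128)   # 8 rooms, 64 mel bands, 128 frames
spec_b = torch.randn(8, 1, 64, 128)   # different sounds, same 8 rooms
loss = contrastive_loss(enc(spec_a), enc(spec_b))
```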
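For the second stage, the sketch below illustrates how a diffusion-based generator can be conditioned on the room embedding together with a new source-receiver position, using a standard denoising objective (predicting the added noise). The network body, the four-channel (first-order Ambisonics) response format, the response length, and the cosine noise schedule are all assumptions; the paper's actual generator may differ in any of these choices.

```python
import torch
import torch.nn as nn

class SRIRDenoiser(nn.Module):
    """Predicts the noise added to a spatial room impulse response,
    conditioned on a room embedding and a source-receiver position.
    Illustrative stand-in for the paper's generator network."""
    def __init__(self, n_channels=4, rir_len=4096, embed_dim=32):
        super().__init__()
        # condition = room embedding + 3-D source pos + 3-D receiver pos + noise level
        self.cond = nn.Linear(embed_dim + 3 + 3 + 1, 128)
        self.net = nn.Sequential(
            nn.Linear(n_channels * rir_len + 128, 1024), nn.SiLU(),
            nn.Linear(1024, 1024), nn.SiLU(),
            nn.Linear(1024, n_channels * rir_len),
        )
        self.n_channels, self.rir_len = n_channels, rir_len

    def forward(self, noisy_rir, t, room_z, src_pos, rcv_pos):
        c = self.cond(torch.cat([room_z, src_pos, rcv_pos, t[:, None]], dim=-1))
        x = torch.cat([noisy_rir.flatten(1), c], dim=-1)
        return self.net(x).view(-1, self.n_channels, self.rir_len)

def diffusion_training_step(model, rir, room_z, src_pos, rcv_pos):
    """One DDPM-style step: corrupt the response at a random noise
    level, then train the model to predict the added noise."""
    t = torch.rand(rir.size(0))                          # noise level in [0, 1)
    alpha = torch.cos(t * torch.pi / 2)[:, None, None]   # simple cosine schedule
    noise = torch.randn_like(rir)
    noisy = alpha * rir + (1 - alpha**2).sqrt() * noise  # variance-preserving mix
    pred = model(noisy, t, room_z, src_pos, rcv_pos)
    return ((pred - noise) ** 2).mean()

# Usage with hypothetical shapes: batch of 8 four-channel responses.
model = SRIRDenoiser()
rir = torch.randn(8, 4, 4096)
room_z = torch.randn(8, 32)            # from the (frozen) room encoder
src_pos, rcv_pos = torch.randn(8, 3), torch.randn(8, 3)
loss = diffusion_training_step(model, rir, room_z, src_pos, rcv_pos)
```

At inference time, one would encode any available sound from the target room, then iteratively denoise from Gaussian noise under the desired source-receiver conditioning to obtain a new spatial response.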
Published in: | arXiv.org, 2024-09 |
---|---|
Main Authors: | Lluís, Francesc; Meyer-Kahlen, Nils |
Format: | Article |
Language: | English |
EISSN: | 2331-8422 |
Subjects: | Acoustic measurement; Audio data; Augmented reality; Impulse response; Sound sources; Virtual reality |
Source: | Publicly Available Content Database (ProQuest) |