Scene-Constrained Neural Radiance Fields for High-Quality Sports Scene Rendering Based on Visual Sensor Network
Published in: IEEE Sensors Journal, 2024-11, Vol. 24 (21), pp. 35900-35913
Main Authors: , , , , , , ,
Format: Article
Language: English
Summary: Free-viewpoint videos offer audiences a more immersive and liberated way to watch sports. The rendering of sports scenes encompasses two essential elements: dynamic targets and static scenes. While much current research focuses on achieving high-quality rendering of human bodies, rendering large-scale sports scenes presents various challenges. Sports arenas are characterized by large spatial extents, restricted camera placement, uncontrollable lighting, weak textures, and repetitive patterns, all of which pose significant obstacles to high-quality scene rendering. In this work, we propose a neural radiance field rendering method based on scene-prior geometric constraints. We introduce prior 3-D geometric dimensions and 2-D semantic masks to derive high-precision ground plane depth maps from camera imaging parameters. This is a geometry-based method that does not rely on visual features and is thus unaffected by insufficient texture, repetition, and reflections. Subsequently, we apply the ground depth maps as geometric consistency constraints to optimize the neural rendering network, thereby reducing the impact of color inconsistencies across viewpoints. The visual sensor network we build can synchronously capture static fields and dynamic targets in sports scenes. Based on this visual sensor network, we collected multiviewpoint datasets of large-scale sports scenes at Invengo and Xidian Gymnasium for performance evaluation. Experimental results demonstrate that our method generates high-precision, cross-viewpoint scale-consistent depth constraints and helps reduce holes and artifacts in synthesized views. Our method outperforms the state of the art (SOTA) in novel view rendering for challenging large-scale sports scenes.
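The abstract's first step, deriving a ground plane depth map purely from calibration and known scene geometry, amounts to ray-plane intersection restricted to semantically labeled ground pixels. The sketch below is an illustrative reconstruction of that idea, not the authors' code; the names (K, R, t, ground_mask) and the assumption that the world ground plane is z = 0 are ours.

```python
import numpy as np

def ground_plane_depth(K, R, t, ground_mask):
    """Per-pixel depth of the ground plane (assumed z = 0 in world frame).

    K: (3, 3) camera intrinsics; R, t: world-to-camera extrinsics;
    ground_mask: (H, W) boolean mask of pixels labeled "ground".
    Returns an (H, W) depth map; non-ground pixels are NaN.
    """
    H, W = ground_mask.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, H*W)

    dirs_cam = np.linalg.inv(K) @ pix   # ray directions in camera frame (z-component 1)
    dirs_world = R.T @ dirs_cam         # ray directions in world frame
    center = -R.T @ t                   # camera center in world frame

    # Intersect each ray with the plane z = 0: center_z + s * dir_z = 0.
    # Because each camera-frame direction has z-component 1, the ray
    # parameter s equals the z-depth of the intersection in the camera frame.
    with np.errstate(divide="ignore"):
        depth = -center[2] / dirs_world[2]

    depth = depth.reshape(H, W)
    depth[~ground_mask] = np.nan        # keep only semantically-ground pixels
    depth[~np.isfinite(depth)] = np.nan # drop rays parallel to the plane
    depth[depth <= 0] = np.nan          # drop intersections behind the camera
    return depth
```

Because this construction uses only the calibrated camera model and the plane equation, it is independent of image texture, which is why the abstract can claim robustness to weak textures, repetition, and reflections.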
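The second step uses these depth maps as a geometric consistency constraint during NeRF optimization. One common way to realize such a constraint, shown here as a hedged sketch rather than the paper's actual loss, is to penalize the gap between the volume-rendered expected termination depth and the calibrated ground depth on valid pixels; lambda_depth is a tuning assumption, not a published value.

```python
import torch

def depth_consistency_loss(weights, z_vals, depth_gt, mask):
    """L1 penalty between the NeRF's expected ray termination depth and the
    geometrically derived ground depth, on rays where that depth is defined.

    weights:  (N_rays, N_samples) volume-rendering weights per sample
    z_vals:   (N_rays, N_samples) sample depths along each ray
    depth_gt: (N_rays,) ground-plane depth from calibration (NaN elsewhere)
    mask:     (N_rays,) bool, True where depth_gt is valid
    """
    depth_pred = (weights * z_vals).sum(dim=-1)  # expected termination depth
    return torch.abs(depth_pred[mask] - depth_gt[mask]).mean()

def total_loss(rgb_pred, rgb_gt, weights, z_vals, depth_gt, mask,
               lambda_depth=0.1):
    """Photometric loss plus a weighted depth-consistency term (illustrative)."""
    photo = ((rgb_pred - rgb_gt) ** 2).mean()
    return photo + lambda_depth * depth_consistency_loss(
        weights, z_vals, depth_gt, mask)
```

Supervising expected ray depth in this way is a standard mechanism for suppressing floaters and holes in low-texture regions, consistent with the reductions in holes and artifacts the abstract reports.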
ISSN: 1530-437X, 1558-1748
DOI: 10.1109/JSEN.2024.3452436