Lightweight Text Spotting for Interactive User Experience in Mixed Reality

Bibliographic Details
Main Authors:	Chen, Xi-Wen, Chen, Jian-Yu, Lin, Yu-Kai, Huang, Chih-Wei, Chern, Jann-Long
Format:	Conference Proceeding
Language:	English
Subjects:	Augmented reality; client-server model; Computational modeling; Computer architecture; Glass; Mixed reality; Semantics; text spotting; Virtual reality; Wearable computers

Description
Summary:	We propose a machine learning-aided framework for semantic understanding of surrounding scenes, enabling intelligent human-computer interaction in mixed reality (MR). The framework extracts semantic information from the front-view camera of MR glasses using fast and accurate machine learning-based scene text spotting models. It also allows MR glasses to automatically generate virtual objects that match the surrounding scene without further user intervention. To achieve near real-time performance, the scene text spotting models run as a remote service under a client-server model, overcoming the computing bottleneck of wearable devices. We demonstrate the framework on Microsoft HoloLens 2, and experimental results on self-collected real-world scenarios show that it can feasibly improve the user experience. The proposed client-server architecture takes 0.77 seconds of computation time per frame on average, which is 11.8 times faster on average than the client-only architecture and achieves near real-time computation. To investigate the usability of text spotting algorithms in real-world applications, we also compare several state-of-the-art scene text spotting approaches in terms of recognition precision and computation time.
ISSN:	2158-4001
DOI:	10.1109/ICCE56470.2023.10043519