Loading…

A Spatio-Temporal Representation Learning as an Alternative to Traditional Glosses in Sign Language Translation and Production

This work addresses the challenges associated with the use of glosses in both Sign Language Translation (SLT) and Sign Language Production (SLP). While glosses have long been used as a bridge between sign language and spoken language, they come with two major limitations that impede the advancement...

Full description

Saved in:

Bibliographic Details
Published in:	arXiv.org 2024-12
Main Authors:	Hwang, Eui Jun, Cho, Sukmin, Lee, Huije, Yoon, Youngwoo, Park, Jong C
Format:	Article
Language:	English
Subjects:	Annotations Effectiveness Gloss Language translation Representations Self-supervised learning Sign language
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	This work addresses the challenges associated with the use of glosses in both Sign Language Translation (SLT) and Sign Language Production (SLP). While glosses have long been used as a bridge between sign language and spoken language, they come with two major limitations that impede the advancement of sign language systems. First, annotating the glosses is a labor-intensive and time-consuming process, which limits the scalability of datasets. Second, the glosses oversimplify sign language by stripping away its spatio-temporal dynamics, reducing complex signs to basic labels and missing the subtle movements essential for precise interpretation. To address these limitations, we introduce Universal Gloss-level Representation (UniGloR), a framework designed to capture the spatio-temporal features inherent in sign language, providing a more dynamic and detailed alternative to the use of the glosses. The core idea of UniGloR is simple yet effective: We derive dense spatio-temporal representations from sign keypoint sequences using self-supervised learning and seamlessly integrate them into SLT and SLP tasks. Our experiments in a keypoint-based setting demonstrate that UniGloR either outperforms or matches the performance of previous SLT and SLP methods on two widely-used datasets: PHOENIX14T and How2Sign.
ISSN:	2331-8422