A Gloss-Free Sign Language Production with Discrete Representation

Bibliographic Details
Main Authors: Hwang, Eui Jun; Lee, Huije; Park, Jong C.
Format: Conference Proceeding
Language: English
Description
Summary: Gloss-free Sign Language Production (SLP) translates spoken language sentences directly into sign language, bypassing the need for gloss intermediaries. Previous autoregressive SLP methods have not achieved true autoregression, as they often depend on ground-truth data during inference. To fill this gap, we introduce the Sign language Vector Quantization Network (SignVQNet), which leverages discrete spatio-temporal representations of sign poses. With such a discrete representation, our method can incorporate beam search, a decoding strategy widely used in Natural Language Processing. Furthermore, we align the discrete representation with linguistic features from pre-trained language models such as BERT. Our results show the superior performance of our method over prior SLP methods in generating accurate and realistic sign pose sequences. Additionally, our analysis shows that Back-Translation and Fréchet Gesture Distance are reliable evaluation metrics, in contrast to DTW-MJE. The code and models are available at https://github.com/eddie-euijun-hwang/SignVQNet.
ISSN: 2770-8330
DOI: 10.1109/FG59268.2024.10581980
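
The summary notes that a discrete representation makes beam search applicable to sign pose generation. The sketch below illustrates generic beam search over discrete token indices only; it is not taken from the SignVQNet code. The names `beam_search` and `toy_model`, the three-entry vocabulary, and all probabilities are assumptions made up for illustration, with `toy_model` standing in for whatever autoregressive decoder would score the next discrete token.

```python
import math
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[List[int], float]  # (token indices, total log-probability)


def beam_search(
    step_log_probs: Callable[[Sequence[int]], List[float]],
    beam_width: int,
    max_len: int,
    eos_id: int,
) -> List[Hypothesis]:
    """Generic beam search; returns hypotheses sorted from best to worst."""
    beams: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, lp in enumerate(step_log_probs(tokens)):
                candidates.append((tokens + [tok], score + lp))
        # Keep only the `beam_width` highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos_id:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Hypotheses that hit max_len without the end token are kept as well.
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


def toy_model(prefix: Sequence[int]) -> List[float]:
    """Hypothetical next-token scorer over a 3-entry vocabulary (2 = end)."""
    if not prefix:
        probs = [0.5, 0.4, 0.1]
    elif prefix[-1] == 0:
        probs = [0.4, 0.3, 0.3]
    else:
        probs = [0.05, 0.05, 0.9]
    return [math.log(p) for p in probs]


best_tokens, best_score = beam_search(toy_model, beam_width=2, max_len=3, eos_id=2)[0]
print(best_tokens, math.exp(best_score))
```

In this toy setup a greedy decoder would commit to token 0 at the first step, whereas the beam keeps token 1 alive and finds a higher-probability complete sequence. This kind of search is only possible when the output space is a finite set of indices, which is what a vector-quantized pose representation provides.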