A Gloss-Free Sign Language Production with Discrete Representation
Main Authors: | , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Online Access: | Request full text |
Summary: | Gloss-free Sign Language Production (SLP) offers a direct translation of spoken language sentences into sign language, bypassing the need for gloss intermediaries. Previous autoregressive SLP methods have not fully achieved true autoregression, as they often depend on ground-truth data during inference. To fill this gap, we introduce the Sign language Vector Quantization Network (SignVQNet), which leverages discrete spatio-temporal representations of sign poses. With such a discrete representation, our method incorporates beam search, a decoding strategy widely used in Natural Language Processing. Furthermore, we align the discrete representation with linguistic features from pre-trained language models such as BERT. Our results show the superior performance of our method over prior SLP methods in generating accurate and realistic sign pose sequences. Additionally, our analysis shows that Back-Translation and Fréchet Gesture Distance are reliable evaluation metrics, in contrast to DTW-MJE. The code and models are available at https://github.com/eddie-euijun-hwang/SignVQNet. |
---|---|
ISSN: | 2770-8330 |
DOI: | 10.1109/FG59268.2024.10581980 |