Efficient human motion prediction using temporal convolutional generative adversarial network
Published in: Information Sciences, 2021-02, Vol. 545, p. 427–447
Main Authors: , , , ,
Format: Article
Language: English
Summary:
• We exploit TCNs to efficiently model the long-term temporal dependencies of human motion sequences.
• We incorporate SN into the model to achieve reproducibility.
• Two discriminators are introduced to ensure better performance.
• On three human action benchmarks, our model outperforms the state-of-the-art methods.
Predicting human motion from its historical poses is an essential task in computer vision and has been successfully applied in human-machine interaction and intelligent driving. Recently, significant progress has been made with variants of RNNs and LSTMs. Although these models alleviate the vanishing gradient problem, chain-structured RNNs often produce deformed poses and converge to the mean pose because of their limited ability to capture long-term dependencies. To address these problems, we propose a temporal convolutional generative adversarial network (TCGAN) to forecast high-fidelity future poses. The TCGAN uses hierarchical temporal convolution to model the long-term patterns of human motion effectively. In contrast to RNNs, the hierarchical convolution structure has recently proved more efficient for sequence-to-sequence learning in terms of computational complexity, number of model parameters, and parallelism. Moreover, unlike traditional GANs, the model embeds spectral normalization (SN) to alleviate mode collapse. Compared with typical recurrent methods, the proposed model is feedforward and can produce future poses in real time. Extensive experiments on various human activity analysis benchmarks (i.e., the H3.6M, CMU, and 3DPW MoCap datasets) demonstrate that the model consistently outperforms state-of-the-art methods in terms of accuracy and visual quality for both short-term and long-term predictions.
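The two mechanisms the abstract names — causal dilated convolution (the building block of TCNs, whose receptive field grows exponentially with depth) and spectral normalization (estimating the largest singular value of a weight matrix so it can be rescaled to stabilize GAN training) — can be illustrated with a minimal NumPy sketch. This is not the authors' code; all function names and shapes are hypothetical, and the paper's full model adds a generator, two discriminators, and learned parameters on top of these primitives.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution: the output at time t depends only
    on inputs at times t, t-d, t-2d, ... (no leakage from the future),
    which lets TCNs process a motion sequence in parallel yet causally."""
    k, T = len(w), len(x)
    pad = dilation * (k - 1)
    xp = np.concatenate([np.zeros(pad), x])  # left-pad to keep length T
    y = np.zeros(T)
    for t in range(T):
        # taps at padded positions t, t+d, ..., t+(k-1)d
        # == original positions t-(k-1)d, ..., t
        taps = xp[t : t + pad + 1 : dilation]
        y[t] = taps @ w[::-1]  # y[t] = sum_i w[i] * x[t - i*d]
    return y

def receptive_field(kernel_size, num_layers):
    """Receptive field of a stack of layers with dilations 2**0..2**(L-1):
    it grows exponentially with depth, which is how hierarchical temporal
    convolution captures long-term dependencies with few layers."""
    return 1 + (kernel_size - 1) * (2 ** num_layers - 1)

def spectral_norm_estimate(W, n_iters=100):
    """Estimate the largest singular value of W by power iteration.
    Spectral normalization divides W by this value, bounding the layer's
    Lipschitz constant to alleviate mode collapse in GAN training."""
    u = np.random.default_rng(0).normal(size=W.shape[0])
    v = np.zeros(W.shape[1])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return float(u @ W @ v)
```

For example, `receptive_field(3, 4)` is 31: four dilated layers with kernel size 3 already cover 31 past frames, whereas an RNN must propagate state step by step over that span.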
ISSN: 0020-0255; 1872-6291
DOI: 10.1016/j.ins.2020.08.123