The Evaluation Study of the Deep Learning Model Transformer in Speech Translation
Format: Conference Proceeding
Language: English
Summary: Neural machine translation (NMT) uses prevailing deep learning techniques to build a single deep neural network (DNN) that directly maps input speech utterances in one language to the corresponding text in another language. Compared with conventional statistical machine translation, which optimizes each component model (such as the acoustic model and the language model) separately and in series, NMT trains the DNN end to end to directly maximize overall translation performance. In particular, a novel encoder-decoder DNN architecture called the Transformer, developed by Google, has been applied to NMT and has shown outstanding translation performance. In this study, we investigate and evaluate a Transformer-based speech translation system by varying the model settings used during training. The experiments follow a tutorial script provided in the TensorFlow forum and are conducted on the TED Talks dataset for Portuguese-to-English translation, which consists of 50,000 utterances for training, 1,100 for validation, and 2,000 for testing. The baseline system, which sets the encoding dimension to 128, the number of encoder/decoder layers to 4, the dropout rate to 0.1, and the negative exponent to −1.5, achieves a translation accuracy of 68.01%. When the encoding dimension is increased to 512, the translation accuracy improves to 76.02%. Likewise, changing the number of layers to 2, the dropout rate to 0.01, and the negative exponent to 1 yields accuracies of 70.98%, 80.97%, and 75.40%, respectively. The experimental results indicate that the translation performance of the Transformer can be further improved by properly tuning the underlying hyper-parameters.
ISSN: 2768-4156
DOI: 10.1109/ICASI52993.2021.9568450
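
The hyper-parameter settings described in the summary correspond roughly to the configuration knobs of the public TensorFlow Transformer tutorial that the experiments follow. Below is a minimal sketch of how such a configuration might be expressed; the variable names, the mapping of the "encoding dimension" to `d_model`, and the reading of the "negative exponent" as the exponent in the tutorial's warm-up learning-rate schedule are illustrative assumptions, not the authors' exact script.

```python
import tensorflow as tf

# Baseline hyper-parameters from the summary (assumed mapping to the
# TensorFlow Transformer tutorial's settings; names are illustrative).
D_MODEL = 128        # "encoding dimension" (512 in the improved run)
NUM_LAYERS = 4       # encoder/decoder layers (2 in the improved run)
DROPOUT_RATE = 0.1   # dropout rate (0.01 in the improved run)
NEG_EXPONENT = -1.5  # exponent in the warm-up term below (assumption)


class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Warm-up schedule from the TensorFlow Transformer tutorial:
    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^exponent)."""

    def __init__(self, d_model, warmup_steps=4000, exponent=NEG_EXPONENT):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps
        self.exponent = exponent

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        arg1 = tf.math.rsqrt(step)
        arg2 = step * (self.warmup_steps ** self.exponent)
        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)


# Optimizer setup as in the tutorial; the model itself (embedding size
# D_MODEL, NUM_LAYERS encoder/decoder layers, DROPOUT_RATE dropout) is
# built and trained separately.
learning_rate = CustomSchedule(D_MODEL)
optimizer = tf.keras.optimizers.Adam(
    learning_rate, beta_1=0.9, beta_2=0.98, epsilon=1e-9)
```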