TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

In this paper, we propose Text-Aware Pre-training (TAP) for Text-VQA and Text-Caption tasks. These two tasks aim at reading and understanding scene text in images for question answering and image caption generation, respectively. In contrast to conventional vision-language pretraining that fails to...

Bibliographic Details
Main Authors: Yang, Zhengyuan; Lu, Yijuan; Wang, Jianfeng; Yin, Xi; Florencio, Dinei; Wang, Lijuan; Zhang, Cha; Zhang, Lei; Luo, Jiebo
Format: Conference Proceeding
Language: English