
Prompting large language model with context and pre-answer for knowledge-based VQA

Bibliographic Details
Published in: Pattern Recognition, 2024-07, Vol. 151, p. 110399, Article 110399
Main Authors: Hu, Zhongjian; Yang, Peng; Jiang, Yuanshuang; Bai, Zijian
Format: Article
Language: English
Description
Summary: Existing studies apply large language models (LLMs) to knowledge-based Visual Question Answering (VQA) with encouraging results. However, because their input information is insufficient, previous methods still fall short in constructing prompts for the LLM and cannot fully activate its capacity. In addition, previous works use GPT-3 for inference, which is costly. In this paper, we propose PCPA, a framework that Prompts the LLM with Context and Pre-Answer for VQA. Specifically, we adopt a vanilla VQA model to generate in-context examples and candidate answers, and add a pre-answer selection layer to generate pre-answers. We integrate the in-context examples and pre-answers into the prompt to inspire the LLM. In addition, we choose LLaMA, an open and free model, instead of GPT-3, and build a small dataset to fine-tune it. Compared with existing baselines, PCPA improves accuracy by more than 2.1 and 1.5 on OK-VQA and A-OKVQA, respectively.
Highlights:
• We propose a novel framework that prompts an LLM for knowledge-based VQA.
• We add dynamic routing to the vanilla VQA model to further inspire the LLM.
• We add a pre-answer selection layer to generate more suitable pre-answers.
• We build a small dataset for fine-tuning the LLM.
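The prompt-construction step summarized above (in-context examples plus pre-answers combined into one prompt) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The prompt template, the `Example` record, and the helpers `select_pre_answers`, `format_block`, and `build_prompt` are all hypothetical. The learned pre-answer selection layer of PCPA is replaced here by a plain top-k ranking of the vanilla VQA model's candidate answers by confidence, and the image is assumed to be available as a text caption.

```python
from dataclasses import dataclass

# Hypothetical instruction line; not the template used in the paper.
HEADER = "Please answer the question according to the context and the candidate answers."


@dataclass
class Example:
    """One in-context example (assumed fields, for illustration only)."""
    context: str            # textual description of the image, e.g. a caption
    question: str
    pre_answers: list[str]  # candidate answers shown to the LLM
    answer: str             # ground-truth answer used as the demonstration target


def select_pre_answers(candidates: list[tuple[str, float]], k: int = 3) -> list[str]:
    """Stand-in for the pre-answer selection layer: keep the k most confident candidates."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [answer for answer, _score in ranked[:k]]


def format_block(context: str, question: str, pre_answers: list[str], answer: str = "") -> str:
    """Render one context/question/pre-answer block; an empty answer leaves a slot for the LLM."""
    lines = [
        f"Context: {context}",
        f"Question: {question}",
        "Pre-answers: " + ", ".join(pre_answers),
        f"Answer: {answer}".rstrip(),
    ]
    return "\n".join(lines)


def build_prompt(examples: list[Example], context: str, question: str,
                 candidates: list[tuple[str, float]], k: int = 3) -> str:
    """Concatenate the header, the in-context examples, and the test query."""
    parts = [HEADER]
    for ex in examples:
        parts.append(format_block(ex.context, ex.question, ex.pre_answers, ex.answer))
    parts.append(format_block(context, question, select_pre_answers(candidates, k)))
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = [Example("A man riding a wave on a surfboard.", "What sport is this?",
                    ["surfing", "swimming"], "surfing")]
    prompt = build_prompt(
        demo,
        "A red double-decker bus on a city street.",
        "In which country are these buses common?",
        [("usa", 0.09), ("england", 0.62), ("france", 0.02), ("london", 0.21)],
    )
    print(prompt)
```

Running the script prints the header, one demonstration block, and a query block whose pre-answers are `england, london, usa`, ending with an empty `Answer:` slot. In a setup like the one described, that string would be passed to the (fine-tuned) LLM, which completes the answer.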
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2024.110399