PTE: Prompt tuning with ensemble verbalizers
Published in: Expert Systems with Applications, 2025-03, Vol. 262, Article 125600
Main Authors:
Format: Article
Language: English
Summary: Prompt tuning has achieved remarkable success in improving the performance of Pre-trained Language Models (PLMs) across various downstream NLP tasks, particularly in scenarios with limited downstream data. Reframing tasks as fill-in-the-blank questions is an effective approach within prompt tuning. However, this approach requires mapping labels through a verbalizer consisting of one or more label tokens, constrained by manually crafted prompts. Furthermore, most existing automatic crafting methods either introduce external resources or rely solely on discrete or continuous optimization strategies. To address these issues, we propose a methodology for optimizing discrete verbalizers based on gradient descent, which we refer to as PTE. This method integrates discrete tokens into verbalizers that can be continuously optimized, combining the distinct advantages of both discrete and continuous optimization strategies. In contrast to prior approaches, ours eschews reliance on prompts generated by other models or on prior knowledge, requiring only the addition of a matrix. The approach is remarkably simple and flexible, enabling prompt optimization while preserving the interpretability of output label tokens without the constraints imposed by discrete vocabularies. Finally, applying this method to text classification tasks, we observe that PTE achieves results comparable to, if not surpassing, those of previous methods even under extreme conciseness. This furnishes a simple, intuitive, and efficient solution for automatically constructing verbalizers. Moreover, through quantitative analysis of the optimized verbalizers, we find that language models likely rely not only on semantic information but also on other features for text classification. This finding opens new avenues for future research and model enhancements.
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2024.125600
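
The summary describes optimizing a verbalizer, represented as a matrix over label tokens, by gradient descent while keeping the output label tokens interpretable. The sketch below illustrates a generic soft-verbalizer setup of that kind, not the authors' exact PTE formulation: the backbone model, cloze template, seed label tokens, and training loop are all assumptions made for illustration.

```python
# Hypothetical sketch: a soft verbalizer implemented as a learnable matrix over
# the PLM vocabulary. Class scores are obtained by projecting the MLM logits at
# the [MASK] position through this matrix, which is optimized by gradient descent.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "bert-base-uncased"  # assumed backbone, not specified in the record
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
plm = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

class SoftVerbalizer(nn.Module):
    """Maps [MASK]-position vocabulary logits to class scores via a trainable matrix."""
    def __init__(self, vocab_size: int, num_classes: int, seed_token_ids=None):
        super().__init__()
        # One row of weights over the whole vocabulary per class.
        self.weight = nn.Parameter(torch.zeros(num_classes, vocab_size))
        if seed_token_ids is not None:
            # Optionally initialize each class from hand-picked label tokens
            # (e.g. "great"/"terrible"), then refine the weights continuously.
            for cls, ids in enumerate(seed_token_ids):
                self.weight.data[cls, ids] = 1.0

    def forward(self, mask_logits: torch.Tensor) -> torch.Tensor:
        # mask_logits: (batch, vocab_size) -> class scores: (batch, num_classes)
        return mask_logits @ self.weight.t()

# Usage: wrap an input in a cloze template and score the [MASK] position.
text = "the movie was absolutely wonderful ."
prompt = f"{text} It was {tokenizer.mask_token}."
enc = tokenizer(prompt, return_tensors="pt")
mask_pos = (enc["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)

seed_ids = [tokenizer.convert_tokens_to_ids(["great"]),     # class 0: positive
            tokenizer.convert_tokens_to_ids(["terrible"])]  # class 1: negative
verbalizer = SoftVerbalizer(plm.config.vocab_size, num_classes=2, seed_token_ids=seed_ids)

with torch.no_grad():
    mask_logits = plm(**enc).logits[mask_pos]   # (1, vocab_size) logits at [MASK]
class_scores = verbalizer(mask_logits)          # differentiable w.r.t. verbalizer.weight
loss = nn.functional.cross_entropy(class_scores, torch.tensor([0]))
loss.backward()  # gradients flow only into the verbalizer matrix here
```

In this sketch the PLM is frozen and only the verbalizer matrix receives gradients; because each class row is a weighting over real vocabulary tokens, the highest-weighted tokens per class remain human-readable, which mirrors the interpretability property claimed in the summary.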