SLR: A million-scale comprehensive crossword dataset for simultaneous learning and reasoning
Published in: Neurocomputing (Amsterdam), 2023-10, Vol. 554, p. 126591, Article 126591
Main Authors: , , ,
Format: Article
Language: English
Subjects:
Summary: The progress of the natural language understanding (NLU) community has placed higher demands on models' knowledge reserves and reasoning abilities. However, existing schemes for improving model capabilities often split these into two separate tasks, and enabling models to learn and reason about knowledge simultaneously has not received sufficient attention. In this paper, we propose a novel crossword-based NLU task that imparts knowledge to a model through solving crossword clues while simultaneously training the model to infer new knowledge from existing knowledge. To this end, we construct SLR, a comprehensive crossword dataset containing more than 4 million unique clue-answer pairs. Compared to existing crossword datasets, SLR is more comprehensive, covering linguistic knowledge, expertise in various fields, and commonsense knowledge. Meanwhile, to evaluate the reasoning ability of models, we design the answers carefully: most clues require the solver to reason through two or more pieces of knowledge to arrive at an answer. We analyze the composition of the dataset and the similarities and differences among the various types of clues via sampling, and we consider various data-partitioning methods to better assess generalization ability. Furthermore, we test several advanced models and methods on this dataset and analyze the strengths and weaknesses of each. An interesting conclusion is that even powerful language models perform poorly on tasks that require reasoning.
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2023.126591