SLR: A million-scale comprehensive crossword dataset for simultaneous learning and reasoning
Published in: Neurocomputing (Amsterdam), 2023-10, Vol. 554, p. 126591, Article 126591
Main Authors: , , ,
Format: Article
Language: English
Subjects:
Summary: The progress of the natural language understanding (NLU) community has placed higher demands on models' knowledge reserves and reasoning abilities. However, existing schemes for improving model capabilities often split these into two separate tasks, and enabling models to learn and reason about knowledge simultaneously has not received sufficient attention. In this paper, we propose a novel crossword-based NLU task that imparts knowledge to a model through solving crossword clues while simultaneously training the model to infer new knowledge from existing knowledge. To this end, we construct SLR, a comprehensive crossword dataset containing more than 4 million unique clue-answer pairs. Compared to existing crossword datasets, SLR is more comprehensive, covering linguistic knowledge, expertise in various fields, and commonsense knowledge. Meanwhile, to evaluate the reasoning ability of models, we design the answers carefully: most clues require the solver to reason through two or more pieces of knowledge to arrive at an answer. We analyze the composition of the dataset and the similarities and differences among the various types of clues via sampling, and we consider various data-partitioning methods to better assess generalization ability. Furthermore, we test several advanced models and methods on this dataset and analyze the strengths and weaknesses of each. An interesting conclusion is that even powerful language models perform poorly on tasks that require reasoning.
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2023.126591