Korean football in-game conversation state tracking dataset for dialogue and turn level evaluation
Published in: Engineering Applications of Artificial Intelligence, 2025-01, Vol. 139, Article 109572
Main Authors:
Format: Article
Language: English
Subjects:
Summary: Recent research in dialogue state tracking has made significant progress in tracking user goals through dialogue-level and turn-level approaches, but existing work has primarily focused on predicting dialogue-level belief states. In this study, we present KICK: the Korean football In-game Conversation state tracKing dataset, which introduces a conversation-based approach. This approach leverages the roles of casters and commentators within the self-contained context of sports broadcasting to examine how utterances affect the belief state at both the dialogue level and the turn level. To this end, we propose a task that aims to track the state at a specific turn and to understand the conversation over the entire game. The proposed dataset comprises 228 games and 2463 events over one season, with more tokens per dialogue and per turn than existing datasets, making it more challenging. Experiments revealed that the roles and interactions of casters and commentators are important for improving zero-shot state tracking performance. By better understanding role-based utterances, we identify distinct approaches to the overall game process and to events at specific turns.
Highlights:
• Introducing a new Korean dialogue state tracking dataset in football broadcasts.
• Evaluating current large language models on this dataset with our proposed metric.
• Results show models struggle with long utterances, suggesting areas for improvement.
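To make the turn-level evaluation setting concrete, the sketch below shows one way per-turn belief-state annotations could be scored with an exact-match, joint-goal-style metric. It is a minimal illustration only: the slot names (event, scorer, score, player_out, player_in) and the function joint_goal_accuracy are assumptions for exposition and are not taken from the paper or from this record.

    # Illustrative sketch only: the KICK schema and the paper's proposed metric are
    # not specified in this record, so the slots and the exact-match comparison
    # below are assumptions, not the authors' code.
    from typing import Dict, List

    Turn = Dict[str, str]  # hypothetical per-turn belief state: slot -> value

    def joint_goal_accuracy(predicted: List[Turn], gold: List[Turn]) -> float:
        """Fraction of turns whose predicted belief state matches the gold state exactly."""
        assert len(predicted) == len(gold)
        correct = sum(1 for p, g in zip(predicted, gold) if p == g)
        return correct / len(gold) if gold else 0.0

    # Example: a caster/commentator exchange annotated at the turn level.
    gold_states = [
        {"event": "goal", "scorer": "Son", "score": "1-0"},
        {"event": "substitution", "player_out": "Lee", "player_in": "Kim"},
    ]
    predicted_states = [
        {"event": "goal", "scorer": "Son", "score": "1-0"},
        {"event": "substitution", "player_out": "Lee", "player_in": "Park"},
    ]
    print(joint_goal_accuracy(predicted_states, gold_states))  # 0.5

In this hypothetical setup, a turn counts as correct only if every slot-value pair matches, which is why the second turn (wrong substitute) fails and the score is 0.5.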
ISSN: 0952-1976
DOI: 10.1016/j.engappai.2024.109572