Anytime mining of sequential discriminative patterns in labeled sequences

Bibliographic Details
Published in:	Knowledge and Information Systems, 2021-02, Vol. 63 (2), pp. 439-476
Main Authors:	Mathonat, Romain; Nurbakova, Diana; Boulicaut, Jean-François; Kaytoue, Mehdi
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial Intelligence; Computer Science; Data mining; Data Mining and Knowledge Discovery; Database Management; Datasets; Information Storage and Retrieval; Information Systems and Communication Service; Information Systems Applications (incl. Internet); IT in Business; Multi-armed bandit problems; Performance prediction; Regular Paper; Sampling; Searching; Subgroups

Description
Summary:	It is extremely useful to exploit labeled datasets not only to learn models and perform predictive analytics but also to improve our understanding of a domain and its available target classes. The subgroup discovery task has been studied for more than two decades. It concerns the discovery of patterns covering sets of objects that have interesting properties, e.g., patterns that characterize or discriminate a given target class. Although many subgroup discovery algorithms have been proposed for both transactional and numerical data, discovering subgroups within labeled sequential data has been much less studied. First, we propose SeqScout, an anytime algorithm that discovers interesting subgroups w.r.t. a chosen quality measure. It is a sampling algorithm that mines discriminative sequential patterns using a multi-armed bandit model. For a given budget, it finds a collection of local optima in the search space of descriptions, and thus a collection of subgroups. It requires little configuration and is independent of the quality measure used for pattern scoring. We also introduce a second anytime algorithm, MCTSExtent, which pushes further the search for a better trade-off between exploration and exploitation when sampling the search space. To the best of our knowledge, this is the first time the Monte Carlo Tree Search framework has been exploited in a sequential data mining setting. We have conducted a thorough evaluation of our algorithms on several datasets to illustrate their added value, and we discuss their qualitative and quantitative results.
ISSN:	0219-1377; 0219-3116
DOI:	10.1007/s10115-020-01523-7
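The summary describes SeqScout only at a high level: a budgeted, anytime sampler that scores sequential patterns with a pluggable quality measure and uses a multi-armed bandit to decide where to sample next. The Python sketch below illustrates that general idea with the standard UCB1 bandit rule; it is not the authors' implementation. Several details are assumptions made for illustration: one arm per target-class sequence, a "pull" that randomly drops items from that sequence to produce a more general pattern, weighted relative accuracy (WRAcc) as the quality measure, and all function names (`is_subsequence`, `wracc`, `random_generalization`, `ucb_sample`), which are hypothetical.

```python
import math
import random


def is_subsequence(pattern, sequence):
    """True if each itemset of `pattern` is contained, in order, in a distinct
    itemset of `sequence`. Greedy earliest matching is sufficient here."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def wracc(pattern, data, target):
    """Weighted relative accuracy of `pattern` for class `target` (assumed
    quality measure). `data` is a list of (sequence, label) pairs."""
    n = len(data)
    n_target = sum(1 for _, label in data if label == target)
    covered = [label for seq, label in data if is_subsequence(pattern, seq)]
    if not covered:
        return 0.0
    pos = sum(1 for label in covered if label == target)
    return (len(covered) / n) * (pos / len(covered) - n_target / n)


def random_generalization(sequence, rng):
    """Hypothetical pull: keep each item with probability 0.5 and drop the
    itemsets that become empty. Items are sorted so a seeded run is repeatable."""
    pattern = []
    for itemset in sequence:
        kept = frozenset(item for item in sorted(itemset) if rng.random() < 0.5)
        if kept:
            pattern.append(kept)
    return pattern


def ucb_sample(data, target, quality=wracc, score_range=(-0.25, 0.25),
               budget=200, seed=0):
    """UCB1 over arms = target-class sequences; returns the best (score, pattern)
    seen within `budget` pulls. `score_range` rescales scores to [0, 1] rewards
    (the default is WRAcc's range)."""
    rng = random.Random(seed)
    arms = [seq for seq, label in data if label == target]
    if not arms:
        raise ValueError("no sequences of the target class")
    lo, hi = score_range
    pulls = [0] * len(arms)
    totals = [0.0] * len(arms)
    best_score, best_pattern = float("-inf"), None
    for t in range(1, budget + 1):
        if t <= len(arms):
            arm = t - 1  # play every arm once before using the UCB1 index
        else:
            played = t - 1  # total pulls made so far
            arm = max(range(len(arms)),
                      key=lambda i: totals[i] / pulls[i]
                      + math.sqrt(2 * math.log(played) / pulls[i]))
        pattern = random_generalization(arms[arm], rng)
        score = quality(pattern, data, target)
        pulls[arm] += 1
        totals[arm] += (score - lo) / (hi - lo)  # reward in [0, 1]
        if score > best_score:
            best_score, best_pattern = score, pattern
    return best_score, best_pattern


# Tiny illustrative dataset: each sequence is a list of itemsets.
data = [
    ([frozenset("a"), frozenset("bc")], "+"),
    ([frozenset("ab"), frozenset("c")], "+"),
    ([frozenset("c"), frozenset("a")], "-"),
    ([frozenset("b")], "-"),
]
score, pattern = ucb_sample(data, "+", budget=100)
print(score, pattern)
```

Because the loop can be stopped after any pull and still return the best pattern found so far, it has the anytime property the summary emphasizes, and swapping `quality` for another measure leaves the search unchanged apart from `score_range`. A faithful SeqScout would additionally keep a collection of diverse local optima rather than a single best pattern, and MCTSExtent replaces the flat bandit with a search tree; neither refinement is shown here.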