When Less is More: Mining Infrequent Events from Medium Sized Datasets
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Summary: | Data Science has assembled researchers from multiple fields, such as computer science and marketing, to develop methods, algorithms and techniques to discover knowledge from large amounts of data and solve complex problems. The main focus is to find patterns hidden within the data. In general, these patterns must appear frequently to be learned by machine learning algorithms, and they reflect the standard, even if tacit, knowledge that guides decisions in the world. The uncovered knowledge may also reflect societal prejudice that will be enforced by the machines. Nevertheless, in many domains, especially when the negative impacts of bad decisions have a high cost, a few instances of a pattern within the dataset suffice to warrant further investigation. Considering that decisions will be based on knowledge learned from the data, the challenge lies in determining when fewer appearances of a sequence of events are more important than very frequent patterns. In this context, we have devised an approach that uses a domain ontology to boost these infrequent, but relevant, events. In addition to guiding the search for relevant knowledge, the ontology helps users accept results and further investigate the data. It enables users to create data subsets and views, derive new attributes from existing ones, and guide the data mining process, and it provides a background layer against which even infrequent patterns stand out and become meaningful. In this paper, we present our ontology-based data discovery approach, a system developed according to it, and preliminary results of a real-life application in the oil production domain. |
---|---|
ISSN: | 2577-1655 |
DOI: | 10.1109/SMC.2018.00046 |