Quality Anomaly Detection Using Predictive Techniques: An Extensive Big Data Quality Framework for Reliable Data Analysis

The increasing reliance on Big Data analytics has highlighted the critical role of data quality in ensuring accurate and reliable results. Consequently, organizations aiming to leverage the power of Big Data recognize the crucial role of data quality as an integral component. One notable type of dat...

Bibliographic Details
Published in: IEEE access, 2023, Vol. 11, p. 103306-103318
Main Authors: Widad, Elouataoui; Saida, Elmendili; Gahi, Youssef
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c409t-9bba268be8a43cf87b5f9fc14d261be1566f0bfe16fe8bf2e998aaec8d605ece3
cites cdi_FETCH-LOGICAL-c409t-9bba268be8a43cf87b5f9fc14d261be1566f0bfe16fe8bf2e998aaec8d605ece3
container_end_page 103318
container_issue
container_start_page 103306
container_title IEEE access
container_volume 11
creator Widad, Elouataoui
Saida, Elmendili
Gahi, Youssef
description The increasing reliance on Big Data analytics has highlighted the critical role of data quality in ensuring accurate and reliable results. Consequently, organizations aiming to leverage the power of Big Data recognize data quality as an integral component. One notable type of data quality anomaly observed in big datasets is the presence of outlier values. Detecting and addressing these outliers has become a subject of interest across diverse domains, leading to the development of numerous anomaly detection approaches. Although anomaly detection practices have proliferated in recent years, a significant gap remains in addressing anomalies related to the other aspects of data quality. Indeed, while most approaches focus on identifying anomalies that deviate from the expected patterns, they do not consider irregularities in data quality, such as missing, incorrect, or inconsistent data. Moreover, most approaches are tied to a particular domain and lack the capability to detect anomalies in a generic manner. Thus, we aim through this paper to address this gap in the field and provide a holistic and effective solution for Big Data quality anomaly detection. To achieve this, we suggest a novel approach that allows a comprehensive detection of Big Data quality anomalies related to six quality dimensions: Accuracy, Consistency, Completeness, Conformity, Uniqueness, and Readability. Moreover, the framework allows for sophisticated detection of generic data quality anomalies through the implementation of an intelligent anomaly detection model that is not tied to any specific field. Furthermore, we introduce and measure a new metric called "Quality Anomaly Score," which refers to the degree of anomalousness of the quality anomalies of each quality dimension and of the entire dataset. In our implementation and evaluation, the suggested framework achieved an accuracy score of up to 99.91% and an F1-score of 98.07%.
doi_str_mv 10.1109/ACCESS.2023.3317354
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_ACCESS_2023_3317354</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10256169</ieee_id><doaj_id>oai_doaj_org_article_8c35aef458c643718760bc0928921dec</doaj_id><sourcerecordid>2872459197</sourcerecordid><originalsourceid>FETCH-LOGICAL-c409t-9bba268be8a43cf87b5f9fc14d261be1566f0bfe16fe8bf2e998aaec8d605ece3</originalsourceid><addsrcrecordid>eNpNkV1PHCEYhSemJhrrL9ALkl7vysfAgHfrulYTk9qq1wSYl5V1drAwq91_L9uxjdxADuecN_BU1QnBU0KwOpvN54v7-ynFlE0ZIw3j9V51SIlQE8aZ-PLpfFAd57zCZcki8eaw2v7cmC4MWzTr49p0W3QJA7ghxB495tAv0V2CNhThFdADuKc-_N5APi92tPgzQJ93FxdhiS7NYNC_sqtk1vAW0zPyMaFf0AVjOxg9s76MySF_rfa96TIcf-xH1ePV4mF-Pbn98f1mPruduBqrYaKsNVRIC9LUzHnZWO6Vd6RuqSAWCBfCY-uBCA_SegpKSWPAyVZgDg7YUXUz9rbRrPRLCmuTtjqaoP8KMS21SUNwHWjpGDfgay6dqFlDZCOwdVhRqShpwZWub2PXS4q7fxj0Km5SeVDWVDa05oqoprjY6HIp5pzA_59KsN4h0yMyvUOmP5CV1OmYCgDwKUG5KKzYO4rok-0</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2872459197</pqid></control><display><type>article</type><title>Quality Anomaly Detection Using Predictive Techniques: An Extensive Big Data Quality Framework for Reliable Data Analysis</title><source>IEEE Xplore Open Access Journals</source><creator>Widad, Elouataoui ; Saida, Elmendili ; Gahi, Youssef</creator><creatorcontrib>Widad, Elouataoui ; Saida, Elmendili ; Gahi, Youssef</creatorcontrib><description>The increasing reliance on Big Data analytics has highlighted the critical role of data quality in ensuring accurate and reliable results. Consequently, organizations aiming to leverage the power of Big Data recognize the crucial role of data quality as an integral component. One notable type of data quality anomaly observed in big datasets is the presence of outlier values. 
Detecting and addressing these outliers have become a subject of interest across diverse domains, leading to the development of numerous anomaly detection approaches. Although anomaly detection has witnessed a proliferation of practices in recent years, a significant gap remains in addressing anomalies related to the other aspects of data quality. Indeed, while most approaches focus on identifying anomalies that deviate from the expected patterns, they do not consider irregularities in data quality, such as missing, incorrect, or inconsistent data. Moreover, most of approaches are domain-correlated and lack the capability to detect anomalies in a generic manner. Thus, we aim through this paper to address this gap in the field and provide a holistic and effective solution for Big Data quality anomaly detection. To achieve this, we suggest a novel approach that allows a comprehensive detection of Big Data quality anomalies related to six quality dimensions: Accuracy, Consistency, Completeness, Conformity, Uniqueness, and Readability. Moreover, the framework allows for sophisticated detection of generic data quality anomalies through the implementation of an intelligent anomaly detection model without any correlation to a specific field. Furthermore, we introduce and measure a new metric called "Quality Anomaly Score," which refers to the degree of anomalousness of the quality anomalies of each quality dimension and the entire dataset. 
Through the implementation and evaluation of our framework, the suggested framework has achieved an accuracy score of up to 99.91% and an F1-score of 98.07%.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2023.3317354</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Anomalies ; Anomaly detection ; Big Data ; big data quality ; Data analysis ; Data integrity ; Data models ; data quality dimensions ; Datasets ; Measurement ; Organizations ; Outliers (statistics) ; quality anomaly score ; Reliability</subject><ispartof>IEEE access, 2023, Vol.11, p.103306-103318</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c409t-9bba268be8a43cf87b5f9fc14d261be1566f0bfe16fe8bf2e998aaec8d605ece3</citedby><cites>FETCH-LOGICAL-c409t-9bba268be8a43cf87b5f9fc14d261be1566f0bfe16fe8bf2e998aaec8d605ece3</cites><orcidid>0000-0002-2968-2389 ; 0000-0002-1938-621X ; 0000-0001-8010-9206</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10256169$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,4010,27610,27900,27901,27902,54908</link.rule.ids></links><search><creatorcontrib>Widad, Elouataoui</creatorcontrib><creatorcontrib>Saida, Elmendili</creatorcontrib><creatorcontrib>Gahi, Youssef</creatorcontrib><title>Quality Anomaly Detection Using Predictive Techniques: An Extensive Big Data Quality Framework for Reliable Data Analysis</title><title>IEEE access</title><addtitle>Access</addtitle><description>The increasing reliance on Big Data analytics has highlighted the 
critical role of data quality in ensuring accurate and reliable results. Consequently, organizations aiming to leverage the power of Big Data recognize the crucial role of data quality as an integral component. One notable type of data quality anomaly observed in big datasets is the presence of outlier values. Detecting and addressing these outliers have become a subject of interest across diverse domains, leading to the development of numerous anomaly detection approaches. Although anomaly detection has witnessed a proliferation of practices in recent years, a significant gap remains in addressing anomalies related to the other aspects of data quality. Indeed, while most approaches focus on identifying anomalies that deviate from the expected patterns, they do not consider irregularities in data quality, such as missing, incorrect, or inconsistent data. Moreover, most of approaches are domain-correlated and lack the capability to detect anomalies in a generic manner. Thus, we aim through this paper to address this gap in the field and provide a holistic and effective solution for Big Data quality anomaly detection. To achieve this, we suggest a novel approach that allows a comprehensive detection of Big Data quality anomalies related to six quality dimensions: Accuracy, Consistency, Completeness, Conformity, Uniqueness, and Readability. Moreover, the framework allows for sophisticated detection of generic data quality anomalies through the implementation of an intelligent anomaly detection model without any correlation to a specific field. Furthermore, we introduce and measure a new metric called "Quality Anomaly Score," which refers to the degree of anomalousness of the quality anomalies of each quality dimension and the entire dataset. 
Through the implementation and evaluation of our framework, the suggested framework has achieved an accuracy score of up to 99.91% and an F1-score of 98.07%.</description><subject>Anomalies</subject><subject>Anomaly detection</subject><subject>Big Data</subject><subject>big data quality</subject><subject>Data analysis</subject><subject>Data integrity</subject><subject>Data models</subject><subject>data quality dimensions</subject><subject>Datasets</subject><subject>Measurement</subject><subject>Organizations</subject><subject>Outliers (statistics)</subject><subject>quality anomaly score</subject><subject>Reliability</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>DOA</sourceid><recordid>eNpNkV1PHCEYhSemJhrrL9ALkl7vysfAgHfrulYTk9qq1wSYl5V1drAwq91_L9uxjdxADuecN_BU1QnBU0KwOpvN54v7-ynFlE0ZIw3j9V51SIlQE8aZ-PLpfFAd57zCZcki8eaw2v7cmC4MWzTr49p0W3QJA7ghxB495tAv0V2CNhThFdADuKc-_N5APi92tPgzQJ93FxdhiS7NYNC_sqtk1vAW0zPyMaFf0AVjOxg9s76MySF_rfa96TIcf-xH1ePV4mF-Pbn98f1mPruduBqrYaKsNVRIC9LUzHnZWO6Vd6RuqSAWCBfCY-uBCA_SegpKSWPAyVZgDg7YUXUz9rbRrPRLCmuTtjqaoP8KMS21SUNwHWjpGDfgay6dqFlDZCOwdVhRqShpwZWub2PXS4q7fxj0Km5SeVDWVDa05oqoprjY6HIp5pzA_59KsN4h0yMyvUOmP5CV1OmYCgDwKUG5KKzYO4rok-0</recordid><startdate>2023</startdate><enddate>2023</enddate><creator>Widad, Elouataoui</creator><creator>Saida, Elmendili</creator><creator>Gahi, Youssef</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-2968-2389</orcidid><orcidid>https://orcid.org/0000-0002-1938-621X</orcidid><orcidid>https://orcid.org/0000-0001-8010-9206</orcidid></search><sort><creationdate>2023</creationdate><title>Quality Anomaly Detection Using Predictive Techniques: An Extensive Big Data Quality Framework for Reliable Data Analysis</title><author>Widad, Elouataoui ; Saida, Elmendili ; Gahi, Youssef</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c409t-9bba268be8a43cf87b5f9fc14d261be1566f0bfe16fe8bf2e998aaec8d605ece3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Anomalies</topic><topic>Anomaly detection</topic><topic>Big Data</topic><topic>big data quality</topic><topic>Data analysis</topic><topic>Data integrity</topic><topic>Data models</topic><topic>data quality dimensions</topic><topic>Datasets</topic><topic>Measurement</topic><topic>Organizations</topic><topic>Outliers (statistics)</topic><topic>quality anomaly score</topic><topic>Reliability</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Widad, Elouataoui</creatorcontrib><creatorcontrib>Saida, Elmendili</creatorcontrib><creatorcontrib>Gahi, Youssef</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems 
Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Widad, Elouataoui</au><au>Saida, Elmendili</au><au>Gahi, Youssef</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Quality Anomaly Detection Using Predictive Techniques: An Extensive Big Data Quality Framework for Reliable Data Analysis</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2023</date><risdate>2023</risdate><volume>11</volume><spage>103306</spage><epage>103318</epage><pages>103306-103318</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>The increasing reliance on Big Data analytics has highlighted the critical role of data quality in ensuring accurate and reliable results. Consequently, organizations aiming to leverage the power of Big Data recognize the crucial role of data quality as an integral component. One notable type of data quality anomaly observed in big datasets is the presence of outlier values. Detecting and addressing these outliers have become a subject of interest across diverse domains, leading to the development of numerous anomaly detection approaches. 
Although anomaly detection has witnessed a proliferation of practices in recent years, a significant gap remains in addressing anomalies related to the other aspects of data quality. Indeed, while most approaches focus on identifying anomalies that deviate from the expected patterns, they do not consider irregularities in data quality, such as missing, incorrect, or inconsistent data. Moreover, most of approaches are domain-correlated and lack the capability to detect anomalies in a generic manner. Thus, we aim through this paper to address this gap in the field and provide a holistic and effective solution for Big Data quality anomaly detection. To achieve this, we suggest a novel approach that allows a comprehensive detection of Big Data quality anomalies related to six quality dimensions: Accuracy, Consistency, Completeness, Conformity, Uniqueness, and Readability. Moreover, the framework allows for sophisticated detection of generic data quality anomalies through the implementation of an intelligent anomaly detection model without any correlation to a specific field. Furthermore, we introduce and measure a new metric called "Quality Anomaly Score," which refers to the degree of anomalousness of the quality anomalies of each quality dimension and the entire dataset. Through the implementation and evaluation of our framework, the suggested framework has achieved an accuracy score of up to 99.91% and an F1-score of 98.07%.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2023.3317354</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-2968-2389</orcidid><orcidid>https://orcid.org/0000-0002-1938-621X</orcidid><orcidid>https://orcid.org/0000-0001-8010-9206</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2023, Vol.11, p.103306-103318
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2023_3317354
source IEEE Xplore Open Access Journals
subjects Anomalies
Anomaly detection
Big Data
big data quality
Data analysis
Data integrity
Data models
data quality dimensions
Datasets
Measurement
Organizations
Outliers (statistics)
quality anomaly score
Reliability
title Quality Anomaly Detection Using Predictive Techniques: An Extensive Big Data Quality Framework for Reliable Data Analysis
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T14%3A38%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Quality%20Anomaly%20Detection%20Using%20Predictive%20Techniques:%20An%20Extensive%20Big%20Data%20Quality%20Framework%20for%20Reliable%20Data%20Analysis&rft.jtitle=IEEE%20access&rft.au=Widad,%20Elouataoui&rft.date=2023&rft.volume=11&rft.spage=103306&rft.epage=103318&rft.pages=103306-103318&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3317354&rft_dat=%3Cproquest_cross%3E2872459197%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c409t-9bba268be8a43cf87b5f9fc14d261be1566f0bfe16fe8bf2e998aaec8d605ece3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2872459197&rft_id=info:pmid/&rft_ieee_id=10256169&rfr_iscdi=true