Quality Anomaly Detection Using Predictive Techniques: An Extensive Big Data Quality Framework for Reliable Data Analysis
The increasing reliance on Big Data analytics has highlighted the critical role of data quality in ensuring accurate and reliable results. Consequently, organizations aiming to leverage the power of Big Data recognize the crucial role of data quality as an integral component. One notable type of dat...
Published in: | IEEE Access, 2023, Vol. 11, p. 103306-103318 |
---|---|
Main Authors: | Widad, Elouataoui ; Saida, Elmendili ; Gahi, Youssef |
Format: | Article |
Language: | English |
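Subjects: | Anomalies ; Anomaly detection ; Big Data ; big data quality ; Data analysis ; Data integrity ; Data models ; data quality dimensions ; Datasets ; Measurement ; Organizations ; Outliers (statistics) ; quality anomaly score ; Reliability |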
cited_by | cdi_FETCH-LOGICAL-c409t-9bba268be8a43cf87b5f9fc14d261be1566f0bfe16fe8bf2e998aaec8d605ece3 |
---|---|
cites | cdi_FETCH-LOGICAL-c409t-9bba268be8a43cf87b5f9fc14d261be1566f0bfe16fe8bf2e998aaec8d605ece3 |
container_end_page | 103318 |
container_issue | |
container_start_page | 103306 |
container_title | IEEE Access |
container_volume | 11 |
creator | Widad, Elouataoui ; Saida, Elmendili ; Gahi, Youssef |
description | The increasing reliance on Big Data analytics has highlighted the critical role of data quality in ensuring accurate and reliable results. Consequently, organizations aiming to leverage the power of Big Data recognize data quality as an integral component. One notable type of data quality anomaly observed in big datasets is the presence of outlier values. Detecting and addressing these outliers has become a subject of interest across diverse domains, leading to the development of numerous anomaly detection approaches. Although anomaly detection approaches have proliferated in recent years, a significant gap remains in addressing anomalies related to the other aspects of data quality. Indeed, while most approaches focus on identifying values that deviate from expected patterns, they do not consider irregularities in data quality, such as missing, incorrect, or inconsistent data. Moreover, most approaches are domain-specific and cannot detect anomalies in a generic manner. This paper aims to address this gap and provide a holistic and effective solution for Big Data quality anomaly detection. To achieve this, we propose a novel approach that allows comprehensive detection of Big Data quality anomalies related to six quality dimensions: Accuracy, Consistency, Completeness, Conformity, Uniqueness, and Readability. Moreover, the framework detects generic data quality anomalies through an intelligent anomaly detection model that is not tied to a specific field. Furthermore, we introduce and measure a new metric called "Quality Anomaly Score," which refers to the degree of anomalousness of the quality anomalies of each quality dimension and of the entire dataset. In our implementation and evaluation, the framework achieved an accuracy of up to 99.91% and an F1-score of 98.07%. |
doi_str_mv | 10.1109/ACCESS.2023.3317354 |
format | article |
coden | IAECCG |
eissn | 2169-3536 |
ieee_id | 10256169 |
linktohtml | https://ieeexplore.ieee.org/document/10256169 |
oa | free_for_read |
orcid | https://orcid.org/0000-0002-2968-2389 ; https://orcid.org/0000-0002-1938-621X ; https://orcid.org/0000-0001-8010-9206 |
publisher | Piscataway: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
toplevel | peer_reviewed ; online_resources |
tpages | 13 |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE Access, 2023, Vol. 11, p. 103306-103318 |
issn | 2169-3536 |
language | eng |
recordid | cdi_crossref_primary_10_1109_ACCESS_2023_3317354 |
source | IEEE Xplore Open Access Journals |
subjects | Anomalies ; Anomaly detection ; Big Data ; big data quality ; Data analysis ; Data integrity ; Data models ; data quality dimensions ; Datasets ; Measurement ; Organizations ; Outliers (statistics) ; quality anomaly score ; Reliability |
title | Quality Anomaly Detection Using Predictive Techniques: An Extensive Big Data Quality Framework for Reliable Data Analysis |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T14%3A38%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Quality%20Anomaly%20Detection%20Using%20Predictive%20Techniques:%20An%20Extensive%20Big%20Data%20Quality%20Framework%20for%20Reliable%20Data%20Analysis&rft.jtitle=IEEE%20access&rft.au=Widad,%20Elouataoui&rft.date=2023&rft.volume=11&rft.spage=103306&rft.epage=103318&rft.pages=103306-103318&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3317354&rft_dat=%3Cproquest_cross%3E2872459197%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c409t-9bba268be8a43cf87b5f9fc14d261be1566f0bfe16fe8bf2e998aaec8d605ece3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2872459197&rft_id=info:pmid/&rft_ieee_id=10256169&rfr_iscdi=true |
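
As a rough illustration of the per-dimension measurement mentioned in the description, the Python sketch below scores two of the six listed dimensions (Completeness and Uniqueness) for a single column and folds them into one number. It is not the authors' method: this record does not say how the "Quality Anomaly Score" is computed. The function names `completeness`, `uniqueness`, and `quality_anomaly_score`, the choice to treat `None` and empty strings as missing, and the "one minus the mean score" aggregation are all hypothetical choices made only for illustration.

```python
from typing import Dict, Optional, Sequence


def _is_missing(value: Optional[str]) -> bool:
    # Assumption: None and the empty string both count as missing.
    return value is None or value == ""


def completeness(values: Sequence[Optional[str]]) -> float:
    """Fraction of entries that are not missing (1.0 for an empty column)."""
    if not values:
        return 1.0
    present = sum(1 for v in values if not _is_missing(v))
    return present / len(values)


def uniqueness(values: Sequence[Optional[str]]) -> float:
    """Fraction of distinct values among the non-missing entries."""
    present = [v for v in values if not _is_missing(v)]
    if not present:
        return 1.0
    return len(set(present)) / len(present)


def quality_anomaly_score(dimension_scores: Dict[str, float]) -> float:
    """Hypothetical aggregate: 1 minus the mean of the dimension scores.

    Higher means more anomalous. The real metric may be defined differently.
    """
    if not dimension_scores:
        raise ValueError("at least one dimension score is required")
    mean_quality = sum(dimension_scores.values()) / len(dimension_scores)
    return 1.0 - mean_quality


# Example column with one missing value, one empty string, and one duplicate.
column = ["a", "b", None, "a", ""]
scores = {
    "completeness": completeness(column),
    "uniqueness": uniqueness(column),
}
overall = quality_anomaly_score(scores)
```

Keeping one score per dimension and one aggregate mirrors the description's statement that the score is reported both for each quality dimension and for the entire dataset. The remaining four dimensions would need rules or reference data that this record does not provide.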