
H‐type indices with applications in chemometrics II: h‐outlyingness index

An outlier is generally considered to be a data point that deviates from the “bulk” of all the data points. For outlier diagnosis, two questions can be asked: (1) How far is an object from the bulk? and (2) How many data points does the “bulk” include? To simultaneously deal with the above two question...

Bibliographic Details
Published in: Journal of chemometrics, 2021-11, Vol. 35 (11), p. n/a
Main Authors: Yang, Qin; Xu, Lu; Tian, Guo‐Li; Wu, Ben‐Qing
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c2545-2ab14a8ba76c5947389fc7a76eec147d0a65005c8e0be334dd547cb65231712a3
container_end_page n/a
container_issue 11
container_start_page
container_title Journal of chemometrics
container_volume 35
creator Yang, Qin
Xu, Lu
Tian, Guo‐Li
Wu, Ben‐Qing
description An outlier is generally considered to be a data point that deviates from the “bulk” of all the data points. Outlier diagnosis raises two questions: (1) How far is an object from the bulk? and (2) How many data points does the “bulk” include? To address both questions simultaneously, the h‐outlyingness index (HOI) is defined as follows: for a given data point in a data set of N data points, if at most M% of its (N − 1) one‐to‐rest distances are no less than M% of all the N(N − 1)/2 pairwise distances, the HOI value of that data point is M%. HOI was applied to outlier diagnosis in simulated and real data sets, and the results were compared with those obtained by some robust statistical methods; HOI gave results similar to those of the traditional methods. For high‐dimensional data, it was advisable to compute HOI after dimension reduction by methods such as principal component analysis (PCA). HOI was demonstrated to be a simple, easy‐to‐compute, robust, and effective index for outlier diagnosis. Moreover, HOI is a nonparametric method that makes no assumptions about the data distribution, which makes it useful in chemometrics for multivariate outlier diagnosis.
doi_str_mv 10.1002/cem.3375
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2599262824</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2599262824</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2545-2ab14a8ba76c5947389fc7a76eec147d0a65005c8e0be334dd547cb65231712a3</originalsourceid><addsrcrecordid>eNp10M1KAzEQB_AgCtYq-AgLXrxszecm8Sal2oLFi4K3kM1Obcp-udlS9-Yj-Iw-ian16mkY5jcz8EfokuAJwZjeOKgmjElxhEYEa50Sql6P0QgrlaWaKXaKzkLYYBxnjI_Qcv79-dUPLSS-LryDkOx8v05s25be2d43dYiTxK2hairoO-9CsljcJuu41mz7cvD1Ww1hjwr4OEcnK1sGuPirY_RyP3ueztPHp4fF9O4xdVRwkVKbE25VbmXmhOaSKb1yMnYAjnBZYJsJjIVTgHNgjBeF4NLlmaCMSEItG6Orw922a963EHqzabZdHV8aKrSmGVWUR3V9UK5rQuhgZdrOV7YbDMFmH5aJYZl9WJGmB7rzJQz_OjOdLX_9D4-dbE0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2599262824</pqid></control><display><type>article</type><title>H‐type indices with applications in chemometrics II: h‐outlyingness index</title><source>Wiley-Blackwell Read &amp; Publish Collection</source><creator>Yang, Qin ; Xu, Lu ; Tian, Guo‐Li ; Wu, Ben‐Qing</creator><creatorcontrib>Yang, Qin ; Xu, Lu ; Tian, Guo‐Li ; Wu, Ben‐Qing</creatorcontrib><description>An outlier is generally considered as a data point that deviates from the “bulk” of all the data points. For outlier diagnosis, two questions could be asked: (1) How far is an object from the bulk? and (2) how many data points do the “bulk” include? To simultaneously deal with the above two questions, the h‐outlyingness index (HOI) is defined as suppose a given data point in a data set of N data points, if at most M% of all the (N − 1) one‐to‐rest distances is no less than M% of all the N(N − 1)/2 pairwise distances, the HOI value for the given data point will be M%. For applications, HOI was used for outlier diagnosis in simulated and real data sets, and the results were compared with those obtained by some robust statistical methods. 
Compared with the traditional methods, HOI gained similar results. For high‐dimensional data, it was wise to compute HOI based on dimension reduction methods such as principal component analysis (PCA). HOI was demonstrated to be a simple, easy‐to‐compute, robust and effective index for outlier diagnosis. Moreover, HOI is a nonparametric method that has no underlying assumptions on data distribution, which will be useful in chemometrics for multivariate outlier diagnosis. The h‐outlyingness index (HOI) is described to perform outlier detection. HOI is defined as suppose a given data point in a data set of N data points, if at most M% of all the (N − 1) one‐to‐rest distances is no less than M% of all the N(N − 1)/2 pairwise distances, the HOI value for the given data point will be M%. The investigation results demonstrate that HOI is a simple, nonparametric, robust, and effective index for outlier diagnosis in chemometrics.</description><identifier>ISSN: 0886-9383</identifier><identifier>EISSN: 1099-128X</identifier><identifier>DOI: 10.1002/cem.3375</identifier><language>eng</language><publisher>Chichester: Wiley Subscription Services, Inc</publisher><subject>Chemometrics ; Data points ; Datasets ; Diagnosis ; h‐index ; h‐outlyingness index (HOI) ; outlier diagnosis ; Outliers (statistics) ; Principal components analysis ; Questions ; robust statistics ; Robustness ; Statistical methods</subject><ispartof>Journal of chemometrics, 2021-11, Vol.35 (11), p.n/a</ispartof><rights>2021 John Wiley &amp; Sons, 
Ltd.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c2545-2ab14a8ba76c5947389fc7a76eec147d0a65005c8e0be334dd547cb65231712a3</cites><orcidid>0000-0003-4742-5623</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Yang, Qin</creatorcontrib><creatorcontrib>Xu, Lu</creatorcontrib><creatorcontrib>Tian, Guo‐Li</creatorcontrib><creatorcontrib>Wu, Ben‐Qing</creatorcontrib><title>H‐type indices with applications in chemometrics II: h‐outlyingness index</title><title>Journal of chemometrics</title><description>An outlier is generally considered as a data point that deviates from the “bulk” of all the data points. For outlier diagnosis, two questions could be asked: (1) How far is an object from the bulk? and (2) how many data points do the “bulk” include? To simultaneously deal with the above two questions, the h‐outlyingness index (HOI) is defined as suppose a given data point in a data set of N data points, if at most M% of all the (N − 1) one‐to‐rest distances is no less than M% of all the N(N − 1)/2 pairwise distances, the HOI value for the given data point will be M%. For applications, HOI was used for outlier diagnosis in simulated and real data sets, and the results were compared with those obtained by some robust statistical methods. Compared with the traditional methods, HOI gained similar results. For high‐dimensional data, it was wise to compute HOI based on dimension reduction methods such as principal component analysis (PCA). HOI was demonstrated to be a simple, easy‐to‐compute, robust and effective index for outlier diagnosis. 
Moreover, HOI is a nonparametric method that has no underlying assumptions on data distribution, which will be useful in chemometrics for multivariate outlier diagnosis. The h‐outlyingness index (HOI) is described to perform outlier detection. HOI is defined as suppose a given data point in a data set of N data points, if at most M% of all the (N − 1) one‐to‐rest distances is no less than M% of all the N(N − 1)/2 pairwise distances, the HOI value for the given data point will be M%. The investigation results demonstrate that HOI is a simple, nonparametric, robust, and effective index for outlier diagnosis in chemometrics.</description><subject>Chemometrics</subject><subject>Data points</subject><subject>Datasets</subject><subject>Diagnosis</subject><subject>h‐index</subject><subject>h‐outlyingness index (HOI)</subject><subject>outlier diagnosis</subject><subject>Outliers (statistics)</subject><subject>Principal components analysis</subject><subject>Questions</subject><subject>robust statistics</subject><subject>Robustness</subject><subject>Statistical methods</subject><issn>0886-9383</issn><issn>1099-128X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp10M1KAzEQB_AgCtYq-AgLXrxszecm8Sal2oLFi4K3kM1Obcp-udlS9-Yj-Iw-ian16mkY5jcz8EfokuAJwZjeOKgmjElxhEYEa50Sql6P0QgrlaWaKXaKzkLYYBxnjI_Qcv79-dUPLSS-LryDkOx8v05s25be2d43dYiTxK2hairoO-9CsljcJuu41mz7cvD1Ww1hjwr4OEcnK1sGuPirY_RyP3ueztPHp4fF9O4xdVRwkVKbE25VbmXmhOaSKb1yMnYAjnBZYJsJjIVTgHNgjBeF4NLlmaCMSEItG6Orw922a963EHqzabZdHV8aKrSmGVWUR3V9UK5rQuhgZdrOV7YbDMFmH5aJYZl9WJGmB7rzJQz_OjOdLX_9D4-dbE0</recordid><startdate>202111</startdate><enddate>202111</enddate><creator>Yang, Qin</creator><creator>Xu, Lu</creator><creator>Tian, Guo‐Li</creator><creator>Wu, Ben‐Qing</creator><general>Wiley Subscription Services, 
Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7U5</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-4742-5623</orcidid></search><sort><creationdate>202111</creationdate><title>H‐type indices with applications in chemometrics II: h‐outlyingness index</title><author>Yang, Qin ; Xu, Lu ; Tian, Guo‐Li ; Wu, Ben‐Qing</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2545-2ab14a8ba76c5947389fc7a76eec147d0a65005c8e0be334dd547cb65231712a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Chemometrics</topic><topic>Data points</topic><topic>Datasets</topic><topic>Diagnosis</topic><topic>h‐index</topic><topic>h‐outlyingness index (HOI)</topic><topic>outlier diagnosis</topic><topic>Outliers (statistics)</topic><topic>Principal components analysis</topic><topic>Questions</topic><topic>robust statistics</topic><topic>Robustness</topic><topic>Statistical methods</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yang, Qin</creatorcontrib><creatorcontrib>Xu, Lu</creatorcontrib><creatorcontrib>Tian, Guo‐Li</creatorcontrib><creatorcontrib>Wu, Ben‐Qing</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of chemometrics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yang, Qin</au><au>Xu, Lu</au><au>Tian, Guo‐Li</au><au>Wu, Ben‐Qing</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>H‐type indices with applications in chemometrics II: h‐outlyingness index</atitle><jtitle>Journal of chemometrics</jtitle><date>2021-11</date><risdate>2021</risdate><volume>35</volume><issue>11</issue><epage>n/a</epage><issn>0886-9383</issn><eissn>1099-128X</eissn><abstract>An outlier is generally considered as a data point that deviates from the “bulk” of all the data points. For outlier diagnosis, two questions could be asked: (1) How far is an object from the bulk? and (2) how many data points do the “bulk” include? To simultaneously deal with the above two questions, the h‐outlyingness index (HOI) is defined as suppose a given data point in a data set of N data points, if at most M% of all the (N − 1) one‐to‐rest distances is no less than M% of all the N(N − 1)/2 pairwise distances, the HOI value for the given data point will be M%. For applications, HOI was used for outlier diagnosis in simulated and real data sets, and the results were compared with those obtained by some robust statistical methods. Compared with the traditional methods, HOI gained similar results. For high‐dimensional data, it was wise to compute HOI based on dimension reduction methods such as principal component analysis (PCA). HOI was demonstrated to be a simple, easy‐to‐compute, robust and effective index for outlier diagnosis. Moreover, HOI is a nonparametric method that has no underlying assumptions on data distribution, which will be useful in chemometrics for multivariate outlier diagnosis. The h‐outlyingness index (HOI) is described to perform outlier detection. 
HOI is defined as suppose a given data point in a data set of N data points, if at most M% of all the (N − 1) one‐to‐rest distances is no less than M% of all the N(N − 1)/2 pairwise distances, the HOI value for the given data point will be M%. The investigation results demonstrate that HOI is a simple, nonparametric, robust, and effective index for outlier diagnosis in chemometrics.</abstract><cop>Chichester</cop><pub>Wiley Subscription Services, Inc</pub><doi>10.1002/cem.3375</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0003-4742-5623</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0886-9383
ispartof Journal of chemometrics, 2021-11, Vol.35 (11), p.n/a
issn 0886-9383
1099-128X
language eng
recordid cdi_proquest_journals_2599262824
source Wiley-Blackwell Read & Publish Collection
subjects Chemometrics
Data points
Datasets
Diagnosis
h‐index
h‐outlyingness index (HOI)
outlier diagnosis
Outliers (statistics)
Principal components analysis
Questions
robust statistics
Robustness
Statistical methods
title H‐type indices with applications in chemometrics II: h‐outlyingness index
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T15%3A01%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=H%E2%80%90type%20indices%20with%20applications%20in%20chemometrics%20II:%20h%E2%80%90outlyingness%20index&rft.jtitle=Journal%20of%20chemometrics&rft.au=Yang,%20Qin&rft.date=2021-11&rft.volume=35&rft.issue=11&rft.epage=n/a&rft.issn=0886-9383&rft.eissn=1099-128X&rft_id=info:doi/10.1002/cem.3375&rft_dat=%3Cproquest_cross%3E2599262824%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c2545-2ab14a8ba76c5947389fc7a76eec147d0a65005c8e0be334dd547cb65231712a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2599262824&rft_id=info:pmid/&rfr_iscdi=true
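
Note on the HOI definition quoted in the description: the abstract gives the definition only in words, so the following is a minimal sketch of one possible reading, not the authors' published algorithm. It assumes Euclidean distances, reads "no less than M% of all pairwise distances" as "greater than or equal to at least M% of the N(N − 1)/2 pairwise distances", and takes the largest M that satisfies the condition, by analogy with the h‐index. The function name `h_outlyingness_index` and the toy data are hypothetical.

```python
import numpy as np


def h_outlyingness_index(X):
    """Return an HOI-like value in [0, 1] for every row of X.

    Assumed reading of the abstract (not confirmed by the paper):
    HOI_i is the largest fraction M such that at least a fraction M
    of point i's (N - 1) distances to the other points are >= at
    least a fraction M of all N(N - 1)/2 pairwise distances.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("X must be a 2-D array of shape (N, n_features)")
    n = X.shape[0]
    if n < 3:
        raise ValueError("need at least 3 data points")

    # Full Euclidean distance matrix (an assumption; the abstract
    # does not name the distance measure).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # The N(N - 1)/2 distinct pairwise distances, sorted ascending.
    pairwise = np.sort(dist[np.triu_indices(n, k=1)])

    # For every distance: the fraction of pairwise distances it equals
    # or exceeds (its empirical rank among all pairwise distances).
    rank = np.searchsorted(pairwise, dist, side="right") / pairwise.size

    # Drop the zero self-distances, leaving N - 1 ranks per point,
    # then sort each row in descending order.
    off_diag = ~np.eye(n, dtype=bool)
    rank = rank[off_diag].reshape(n, n - 1)
    rank = -np.sort(-rank, axis=1)

    # h-index style cut: the k-th largest rank supports M up to
    # min(k / (N - 1), rank_k); HOI is the best such M over all k.
    share = np.arange(1, n) / (n - 1)
    return np.minimum(rank, share).max(axis=1)


if __name__ == "__main__":
    # Toy one-dimensional data: four clustered points and one far away.
    toy = np.array([[0.0], [1.0], [2.0], [3.0], [100.0]])
    print(h_outlyingness_index(toy))
```

Under this reading, a point that is far from most of the data receives a larger value than points inside the bulk, so values can be ranked or compared against a chosen cut-off. For high‐dimensional data, the description suggests computing HOI on PCA scores rather than on the raw variables.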