Loading…

IDENTIFICATION OF OUTLIERS: A SIMULATION STUDY

This paper compares two approaches in identifying outliers in multivariate datasets; Mahalanobis distance (MD) and robust distance (RD). MD has been known suffering from masking and swamping effects and RD is an approach that was developed to overcome problems that arise in MD. There are two purpose...

Full description

Saved in:
Bibliographic Details
Published in:ARPN journal of engineering and applied sciences 2015-01, Vol.10 (1), p.326-330
Main Authors: Abd Mutalib, Sharifah Sakinah Syed, Ibrahim, Khlipah
Format: Article
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This paper compares two approaches in identifying outliers in multivariate datasets; Mahalanobis distance (MD) and robust distance (RD). MD has been known suffering from masking and swamping effects and RD is an approach that was developed to overcome problems that arise in MD. There are two purposes of this paper, first is to identify outliers using MD and RD and the second is to show that RD performs better than MD in identifying outliers. An observation is classified as an outlier if MD or RD is larger than a cut-off value. Outlier generating model is used to generate a set of data and MD and RD are computed from this set of data. The results showed that RD can identify outliers better than MD. However, in non-outliers data the performance for both approaches are similar. The results for RD also showed that RD can identify multivariate outliers much better when the number of dimension is large.
ISSN:1819-6608
1819-6608