A REPORT ON REDUCING DIMENSIONS FOR BIG DATA USING KERNEL METHODS
Published in: | Journal of Theoretical and Applied Information Technology 2015-10, Vol.80 (2), p.296-296 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Big data is a popular term for large-scale data processing; it brings many opportunities to academia, industry and society. Big data holds great promise for discovering patterns and heterogeneities that cannot be found in small data. It also poses unique computational and statistical challenges, including scalability and storage, as well as noise accumulation, spurious correlation, incidental endogeneity and measurement errors. Most of these problems stem from the size of the data and its large number of attributes. Irrelevant attributes add noise to the data and increase the size of the model. Moreover, datasets with many attributes may contain groups of correlated attributes that all measure the same underlying feature. One way of dealing with this problem is to eliminate attributes (dimensions) that do not exhibit large variance and hence do not affect the clusters. Several techniques exist for discarding attributes or dimensions, such as principal component analysis (PCA) and singular value decomposition (SVD). We review these techniques in this paper with respect to clustering. We plan to use principal component analysis and kernel methods for dimensionality reduction, which is an essential preprocessing technique for large-scale data sets and can improve both the efficiency and effectiveness of classifiers. |
---|---|
ISSN: | 1817-3195 |
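
As a rough illustration of the two reduction techniques the summary names, the Python sketch below uses NumPy to implement PCA via SVD and a kernel PCA variant. It is not taken from the paper: the RBF kernel, the `gamma` value, the function names `pca` and `rbf_kernel_pca`, and the synthetic data (with one deliberately redundant attribute) are all assumptions chosen for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)  # center each attribute
    # SVD of the centered data: rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = S**2 / (X.shape[0] - 1)  # variance along each direction
    return Xc @ Vt[:n_components].T, explained_var[:n_components]


def rbf_kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA with an RBF kernel (kernel and gamma are assumed choices)."""
    sq = np.sum(X**2, axis=1)
    # Squared pairwise distances, clipped to guard against rounding below zero.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    # Center the kernel matrix, i.e. center the data in feature space.
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Projections of the training points onto the leading components.
    return vecs * np.sqrt(np.maximum(vals, 0.0))


# Synthetic data (hypothetical): 200 samples, 5 attributes,
# where attribute 4 is nearly a copy of attribute 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=200)

Z, var = pca(X, n_components=2)                     # linear reduction to 2-D
Zk = rbf_kernel_pca(X, n_components=2, gamma=0.1)   # nonlinear reduction to 2-D
```

In this sketch, `pca` keeps the directions of largest variance, which matches the summary's idea of dropping low-variance dimensions, while `rbf_kernel_pca` does the same in an implicit feature space. Forming the full kernel matrix costs memory quadratic in the number of samples, so a large-scale setting would likely need an approximation.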