Semantic Weighted Multi-View Clustering for Web Content

Bibliographic Details
Published in: IEEE Access 2019, Vol. 7, p. 128097-128113
Main Authors: Gong, Xiaolong, Huang, Linpeng, Luo, Tiancheng, Ma, Zhiyi
Format: Article
Language: English
Description
Summary: Clustering is a long-standing and important research problem. However, it remains challenging for large-scale web data drawn from different types of information resources, such as user profiles, comments, and user preferences. These aspects can be seen as different views that often admit the same underlying clustering of the data. In this paper, we present a novel Semantic Weighted Non-negative Matrix Factorization (SWNMF) multi-view clustering framework, which provides an efficient weighted matrix factorization framework, handles multi-view web content, and addresses the sparseness problem in the semantic space of the data. Specifically, each view of the dataset forms a huge sparse matrix, which makes the matrix decomposition process non-robust and in turn reduces the accuracy of the clustering results. To address this problem, we use preference information given by users (e.g., rating values) as latent semantic information to handle the features that are unobserved in each data point, so as to resolve the sparseness problem in the matrices of all views. To combine multiple views in our large corpus, the overall objective of the proposed SWNMF is to minimize the loss function of weighted non-negative matrix factorization (NMF) under the l_{2,1}-norm together with a co-regularized constraint under the F-norm. Extensive experiments on our large-scale multi-view web datasets demonstrate the competitive performance of our solution.
ISSN:2169-3536
DOI:10.1109/ACCESS.2019.2939334
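
Illustration: the summary describes weighted non-negative matrix factorization in which unobserved entries of a sparse view matrix are treated differently from observed ones. The sketch below is not the SWNMF algorithm from the paper. It is a deliberately simplified single-view version that swaps in a weighted squared (Frobenius-style) loss with standard multiplicative updates, in place of the l_{2,1}-norm loss and the co-regularization across views. The function names, the weight matrix `W`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def weighted_nmf(X, W, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (n x m) by U @ V with U, V >= 0, minimizing
    sum(W * (X - U @ V)**2) via multiplicative updates.

    W holds non-negative per-entry weights (hypothetical choice:
    1.0 for observed entries, 0.0 or a small value for unobserved ones).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k)) + eps   # non-negative initialization
    V = rng.random((k, m)) + eps
    WX = W * X                     # weighted data, reused in every step
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero
        U *= (WX @ V.T) / ((W * (U @ V)) @ V.T + eps)
        V *= (U.T @ WX) / (U.T @ (W * (U @ V)) + eps)
    return U, V

def weighted_loss(X, W, U, V):
    """Weighted squared reconstruction error."""
    return float(np.sum(W * (X - U @ V) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.random((8, 6))                        # toy non-negative view matrix
    W = (rng.random((8, 6)) > 0.4).astype(float)  # toy mask of observed entries
    U, V = weighted_nmf(X, W, k=2)
    labels = U.argmax(axis=1)  # assign each row to its dominant factor
    print("weighted loss:", weighted_loss(X, W, U, V))
    print("cluster labels:", labels)
```

Reading cluster labels off the largest entry in each row of `U` is one common convention for NMF-based clustering; a multi-view method would additionally couple the factor matrices of the different views.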