Applying Time Series for Background User Identification Based on Their Text Data Analysis

Bibliographic Details
Published in: Programming and Computer Software 2018-09, Vol. 44 (5), p. 353-362
Main Authors: Korolev, V. Yu, Korchagin, A. Yu, Mashechkin, I. V., Petrovskii, M. I., Tsarev, D. V.
Format: Article
Language: English
Description
Summary: An approach to user identification based on deviations in the topic trends of users' work with text information is presented. The approach involves topic analysis of a user's past behavior when working with text content of various categories (including confidential ones) and a forecast of their future behavior. Topic analysis here means determining the principal topics of the user's text content and calculating their respective weights at given instants. Deviations of the user's observed behavior from the forecast are then used to identify the user. Within this approach, our original time series forecasting method based on orthogonal non-negative matrix factorization (ONMF) is proposed. Note that ONMF has not previously been used for time series forecasting. Experiments on real-world corporate email drawn from the Enron data set showed the proposed user identification approach to be applicable.
ISSN: 0361-7688, 1608-3261
DOI: 10.1134/S0361768818050055
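
The summary describes identification as comparing a user's observed topic weights against a forecast of those weights. The sketch below illustrates only that comparison step and is not the article's method. It assumes the topic-weight vectors have already been computed, and it substitutes a plain moving average for the ONMF-based forecast. The function names, the cosine-distance measure, the window size, and the threshold are all hypothetical choices made for illustration.

```python
import math

def forecast_next(history, window=3):
    """Naive stand-in forecast: mean of the last `window` topic-weight vectors.
    (Assumption: the article uses an ONMF-based forecast instead.)"""
    recent = history[-window:]
    n_topics = len(recent[0])
    return [sum(vec[k] for vec in recent) / len(recent) for k in range(n_topics)]

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def deviates(history, observed, threshold=0.2, window=3):
    """True if the observed topic weights differ from the forecast by more
    than `threshold` (a hypothetical cut-off, not taken from the article)."""
    predicted = forecast_next(history, window)
    return cosine_distance(predicted, observed) > threshold

# Made-up topic weights for one user over three time steps (three topics).
history = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.65, 0.25, 0.10],
]

same_user_like = deviates(history, [0.60, 0.30, 0.10])   # close to past trend
other_user_like = deviates(history, [0.05, 0.05, 0.90])  # sharp topic shift
```

In this toy setting, a vector close to the user's past trend stays under the threshold, while a sharp shift toward a different topic exceeds it. A real system would need a forecast model and a threshold calibrated on historical data.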