Online URL Classification for Large-Scale Streaming Environments

Bibliographic Details
Published in:	IEEE Intelligent Systems, 2017-03, Vol. 32 (2), p. 31-36
Main Authors:	Singh, Neetu; Chaudhari, Narendra S.; Singh, Nidhi
Format:	Article
Language:	English
Subjects:	applications; Artificial intelligence; Big data; Classification; Computational modeling; Computing methodologies; Datasets; Distance learning; Expert systems; Filtering software; Filtration; intelligent systems; Internet; Mathematical model; pattern recognition; Predictive models; Training; Uniform resource locators; URLs

Description
Summary:	Large-scale streaming URLs are the norm in many commercial software products that aim to filter URLs based on their sensitivity or risk level. In such problem scenarios, filtering is typically done by classifying a URL using either its webpage content or certain additional contextual information. However, such approaches are slow and computationally expensive, as they require gathering and processing webpage content or other contextual information for each URL. In this work, the authors propose a method for classifying URLs in large-scale streaming environments that doesn't suffer from these drawbacks. The proposed method is based on online ensemble learning, which results in lightweight prediction models that are well-suited for classification of streaming datasets. The authors illustrate the effectiveness of the proposed approach using large-scale datasets from a live, production environment and show that the proposed method results in an increase of 5 to 8 percent in terms of precision and 3 to 5.5 percent in terms of recall.
ISSN:	1541-1672; 1941-1294
DOI:	10.1109/MIS.2017.39
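
The summary above describes classifying streaming URLs with online ensemble learning, without fetching webpage content. This record does not specify the authors' algorithm or features, so the following is only a minimal sketch of the general idea under stated assumptions: it swaps in online bagging (Poisson-weighted updates, in the style of Oza and Russell) over small logistic-regression learners, and uses hashed character trigrams of the URL string as features. All names (`url_features`, `OnlineLogistic`, `OnlineBaggingEnsemble`), the hyperparameters, and the toy URLs are hypothetical.

```python
import math
import random
import zlib


def url_features(url, n=3, dim=2 ** 16):
    """Hash character n-grams of the URL string into a set of feature indices.

    Assumption: only the URL text is used, no page content or context.
    """
    text = url.lower()
    return {
        zlib.crc32(text[i:i + n].encode("utf-8")) % dim
        for i in range(max(len(text) - n + 1, 1))
    }


class OnlineLogistic:
    """Logistic regression updated one example at a time with SGD."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.w = {}   # sparse weights: feature index -> weight
        self.b = 0.0

    def predict_proba(self, feats):
        z = self.b + sum(self.w.get(i, 0.0) for i in feats)
        z = max(min(z, 30.0), -30.0)  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, feats, label):
        err = label - self.predict_proba(feats)
        self.b += self.lr * err
        for i in feats:
            self.w[i] = self.w.get(i, 0.0) + self.lr * err


class OnlineBaggingEnsemble:
    """Online bagging: each learner sees each example k ~ Poisson(1) times."""

    def __init__(self, n_models=5, seed=0):
        self.models = [OnlineLogistic() for _ in range(n_models)]
        self.rng = random.Random(seed)

    def _poisson1(self):
        # Knuth's multiplication method for a Poisson(1) draw.
        limit, k, p = math.exp(-1.0), 0, self.rng.random()
        while p > limit:
            k += 1
            p *= self.rng.random()
        return k

    def learn_one(self, url, label):
        feats = url_features(url)
        for model in self.models:
            for _ in range(self._poisson1()):
                model.update(feats, label)

    def predict_one(self, url):
        feats = url_features(url)
        avg = sum(m.predict_proba(feats) for m in self.models) / len(self.models)
        return 1 if avg >= 0.5 else 0


# Toy labelled stream (1 = should be filtered, 0 = benign); hypothetical data.
stream = [
    ("http://free-prizes.example/claim-now", 1),
    ("http://news.example/weather/today", 0),
    ("http://login-verify.example/account/update", 1),
    ("http://library.example/catalog/search", 0),
    ("http://cheap-pills.example/buy", 1),
    ("http://docs.example/python/tutorial", 0),
]

ensemble = OnlineBaggingEnsemble(n_models=5, seed=0)
for _ in range(30):                 # replay the toy stream a few times
    for url, label in stream:
        ensemble.learn_one(url, label)

predictions = [(url, ensemble.predict_one(url)) for url, _ in stream]
```

Each learner keeps only a sparse weight map and is updated in constant time per feature, which is what makes this style of model lightweight enough for streaming use. A production system would evaluate on held-out or prequential (test-then-train) data rather than on the training stream as done here.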