
A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix

Bibliographic Details
Published in:	Expert Systems, 2023-05, Vol. 40 (4), p. n/a
Main Authors:	Rajiv, S, Navaneethan, C
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial neural networks; Focused crawler; GloVe; Information retrieval; Internet of medical things; Machine learning; Mathematical analysis; Search algorithms; Search engines; Search strategies; Supervised learning; Support vector machines; Word embedding

Description
Summary:	Irrelevant search results for a given topic waste search engine users' time. A learning-focused web crawler uses machine-learning algorithms to download URLs that are relevant to a given topic. Because the web is dynamic, computing relevance is challenging for focused web crawlers. Studies have shown that learning-focused crawlers use term frequency-inverse document frequency (TF-IDF) to compute the relevance between a web page and a given topic. TF-IDF detects similarity from the co-occurrence of the topic's terms on the web page. The need for an efficient mechanism that computes the relevance of URLs both syntactically and semantically motivates this paper, which proposes a word embedding approach to computing web page relevance. The cosine similarity between the global vector (GloVe) representations of the topic and the web page contents is calculated and provided as input to a trained random forest classifier, which predicts whether the web page is relevant. The evaluation results showed that the proposed crawler achieved an average hrate of 0.41 and prate of 0.59, outperforming learning-focused crawlers based on support vector machines, Naive Bayes and artificial neural networks.
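The pipeline described in the summary (embed the topic and the page, take their cosine similarity, classify that score with a random forest) can be sketched in plain Python. This is a minimal, hypothetical sketch, not the authors' implementation. The three-dimensional TOY_VECTORS stand in for pre-trained GloVe vectors. Representing a text by the average of its word vectors is an assumption, since the summary does not say how vectors are aggregated. The "random forest" is a tiny ensemble of depth-1 trees trained on bootstrap samples and written from scratch. With a single input feature, per-split feature sampling has nothing to choose from, so the ensemble reduces to bagged decision stumps. The topic, pages and labels are invented for illustration.

```python
import math
import random

# Hypothetical toy embeddings standing in for pre-trained GloVe vectors
# (real GloVe vectors have 50-300 dimensions and are loaded from a file).
TOY_VECTORS = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.85, 0.15, 0.05],
    "sensor": [0.2, 0.9, 0.1],
    "monitor": [0.3, 0.8, 0.1],
    "football": [0.0, 0.1, 0.95],
    "match": [0.05, 0.2, 0.9],
}


def mean_vector(tokens, vectors):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def relevance_feature(topic, page_text, vectors=TOY_VECTORS):
    """Cosine similarity between averaged topic and page embeddings."""
    t = mean_vector(topic.lower().split(), vectors)
    p = mean_vector(page_text.lower().split(), vectors)
    if t is None or p is None:
        return 0.0
    return cosine(t, p)


def fit_stump(xs, ys):
    """Depth-1 tree on one feature: predict 1 when x >= threshold.

    Candidate thresholds are -inf, midpoints between sorted unique values,
    and +inf; the first threshold with the highest training accuracy wins.
    """
    values = sorted(set(xs))
    candidates = [-math.inf]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(math.inf)
    best_thr, best_correct = candidates[0], -1
    for thr in candidates:
        correct = sum(int(x >= thr) == y for x, y in zip(xs, ys))
        if correct > best_correct:
            best_thr, best_correct = thr, correct
    return best_thr


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict(forest, x):
    """Majority vote of the stumps: 1 = relevant, 0 = irrelevant."""
    votes = sum(int(x >= thr) for thr in forest)
    return int(2 * votes > len(forest))


# Hypothetical training data: pages labelled relevant (1) or not (0) to TOPIC.
TOPIC = "heart monitor"
train_pages = ["heart monitor", "cardiac sensor", "heart sensor",
               "football match", "football", "match"]
train_labels = [1, 1, 1, 0, 0, 0]
features = [relevance_feature(TOPIC, p) for p in train_pages]
forest = fit_forest(features, train_labels)
```

Calling `predict(forest, relevance_feature(TOPIC, "cardiac monitor"))` scores an unseen page. A real crawler would load actual GloVe vectors and would more likely use a library classifier such as scikit-learn's `RandomForestClassifier`; the hand-rolled ensemble only keeps the example dependency-free.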
ISSN:	0266-4720, 1468-0394
DOI:	10.1111/exsy.12993