A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix

Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies...

Bibliographic Details
Published in: Expert systems 2023-05, Vol.40 (4), p.n/a
Main Authors: Rajiv, S, Navaneethan, C
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953
cites cdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953
container_end_page n/a
container_issue 4
container_start_page
container_title Expert systems
container_volume 40
creator Rajiv, S
Navaneethan, C
description Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that the learning focused crawler utilizes term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. The TF‐IDF detects similarity of the given topic to its co‐occurrence on the web page. The necessity of efficient mechanism to compute the relevance of URLs syntactically and semantically has led to the proposal of this paper with a word embedding approach to compute the relevance of the web page. The global vector representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to the trained random forest classifier to predict the relevancy of the web page. The evaluation results proved that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, which outperformed other learning‐focused crawlers on support vector machines, Naive Bayes and artificial neural networks.
doi_str_mv 10.1111/exsy.12993
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2800398928</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2800398928</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</originalsourceid><addsrcrecordid>eNp9kM1OwzAMxyMEEmNw4QkqcUPqSJo2bY7TNGDSEAd2gFOUZO7olDUlWdl24xF4Rp6EdOWMJcuy_fOH_ghdEzwiwe5g7w8jknBOT9CApKyIMeXpKRrghLE4zRN8ji68X2OMSZ6zATLjyLcNuM_KwzIyIF1d1aufr28lu4JsGmelfo9K64LrtivuQEXayZ0J5LExs0-LqPVdujJWSRNpG1ZYrVvnoNYQbeTWVftLdFZK4-HqLw7R4n66mDzG8-eH2WQ8jzXFhMZlQZYqK7NUSs21JBnhBBhAwSTNSxrSnBIOkDMqS5WlS6oopVqpIsWYZ3SIbvq14fePFvxWrG3r6nBRJAUOghQ8KQJ121PaWe8dlKJx1Ua6gyBYdGKKTkxxFDPApId3lYHDP6SYvr689TO_IMt6lg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2800398928</pqid></control><display><type>article</type><title>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix</title><source>Business Source Ultimate【Trial: -2024/12/31】【Remote access available】</source><source>Wiley</source><creator>Rajiv, S ; Navaneethan, C</creator><creatorcontrib>Rajiv, S ; Navaneethan, C</creatorcontrib><description>Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that the learning focused crawler utilizes term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. The TF‐IDF detects similarity of the given topic to its co‐occurrence on the web page. 
The necessity of efficient mechanism to compute the relevance of URLs syntactically and semantically has led to the proposal of this paper with a word embedding approach to compute the relevance of the web page. The global vector representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to the trained random forest classifier to predict the relevancy of the web page. The evaluation results proved that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, which outperformed other learning‐focused crawlers on support vector machines, Naive Bayes and artificial neural networks.</description><identifier>ISSN: 0266-4720</identifier><identifier>EISSN: 1468-0394</identifier><identifier>DOI: 10.1111/exsy.12993</identifier><language>eng</language><publisher>Oxford: Blackwell Publishing Ltd</publisher><subject>Algorithms ; Artificial neural networks ; focused crawler ; GloVe ; Information retrieval ; Internet of medical things ; Machine learning ; Mathematical analysis ; Search algorithms ; Search engines ; Search strategies ; Supervised learning ; Support vector machines ; word embedding</subject><ispartof>Expert systems, 2023-05, Vol.40 (4), p.n/a</ispartof><rights>2022 John Wiley &amp; Sons Ltd.</rights><rights>2023 John Wiley &amp; Sons, Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</citedby><cites>FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</cites><orcidid>0000-0001-6678-8556 ; 0000-0001-5079-058X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Rajiv, 
S</creatorcontrib><creatorcontrib>Navaneethan, C</creatorcontrib><title>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix</title><title>Expert systems</title><description>Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that the learning focused crawler utilizes term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. The TF‐IDF detects similarity of the given topic to its co‐occurrence on the web page. The necessity of efficient mechanism to compute the relevance of URLs syntactically and semantically has led to the proposal of this paper with a word embedding approach to compute the relevance of the web page. The global vector representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to the trained random forest classifier to predict the relevancy of the web page. 
The evaluation results proved that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, which outperformed other learning‐focused crawlers on support vector machines, Naive Bayes and artificial neural networks.</description><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>focused crawler</subject><subject>GloVe</subject><subject>Information retrieval</subject><subject>Internet of medical things</subject><subject>Machine learning</subject><subject>Mathematical analysis</subject><subject>Search algorithms</subject><subject>Search engines</subject><subject>Search strategies</subject><subject>Supervised learning</subject><subject>Support vector machines</subject><subject>word embedding</subject><issn>0266-4720</issn><issn>1468-0394</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kM1OwzAMxyMEEmNw4QkqcUPqSJo2bY7TNGDSEAd2gFOUZO7olDUlWdl24xF4Rp6EdOWMJcuy_fOH_ghdEzwiwe5g7w8jknBOT9CApKyIMeXpKRrghLE4zRN8ji68X2OMSZ6zATLjyLcNuM_KwzIyIF1d1aufr28lu4JsGmelfo9K64LrtivuQEXayZ0J5LExs0-LqPVdujJWSRNpG1ZYrVvnoNYQbeTWVftLdFZK4-HqLw7R4n66mDzG8-eH2WQ8jzXFhMZlQZYqK7NUSs21JBnhBBhAwSTNSxrSnBIOkDMqS5WlS6oopVqpIsWYZ3SIbvq14fePFvxWrG3r6nBRJAUOghQ8KQJ121PaWe8dlKJx1Ua6gyBYdGKKTkxxFDPApId3lYHDP6SYvr689TO_IMt6lg</recordid><startdate>202305</startdate><enddate>202305</enddate><creator>Rajiv, S</creator><creator>Navaneethan, C</creator><general>Blackwell Publishing Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7TB</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-6678-8556</orcidid><orcidid>https://orcid.org/0000-0001-5079-058X</orcidid></search><sort><creationdate>202305</creationdate><title>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence 
matrix</title><author>Rajiv, S ; Navaneethan, C</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>focused crawler</topic><topic>GloVe</topic><topic>Information retrieval</topic><topic>Internet of medical things</topic><topic>Machine learning</topic><topic>Mathematical analysis</topic><topic>Search algorithms</topic><topic>Search engines</topic><topic>Search strategies</topic><topic>Supervised learning</topic><topic>Support vector machines</topic><topic>word embedding</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rajiv, S</creatorcontrib><creatorcontrib>Navaneethan, C</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rajiv, S</au><au>Navaneethan, C</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix</atitle><jtitle>Expert 
systems</jtitle><date>2023-05</date><risdate>2023</risdate><volume>40</volume><issue>4</issue><epage>n/a</epage><issn>0266-4720</issn><eissn>1468-0394</eissn><abstract>Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that the learning focused crawler utilizes term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. The TF‐IDF detects similarity of the given topic to its co‐occurrence on the web page. The necessity of efficient mechanism to compute the relevance of URLs syntactically and semantically has led to the proposal of this paper with a word embedding approach to compute the relevance of the web page. The global vector representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to the trained random forest classifier to predict the relevancy of the web page. The evaluation results proved that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, which outperformed other learning‐focused crawlers on support vector machines, Naive Bayes and artificial neural networks.</abstract><cop>Oxford</cop><pub>Blackwell Publishing Ltd</pub><doi>10.1111/exsy.12993</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0001-6678-8556</orcidid><orcidid>https://orcid.org/0000-0001-5079-058X</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0266-4720
ispartof Expert systems, 2023-05, Vol.40 (4), p.n/a
issn 0266-4720
1468-0394
language eng
recordid cdi_proquest_journals_2800398928
source Business Source Ultimate【Trial: -2024/12/31】【Remote access available】; Wiley
subjects Algorithms
Artificial neural networks
focused crawler
GloVe
Information retrieval
Internet of medical things
Machine learning
Mathematical analysis
Search algorithms
Search engines
Search strategies
Supervised learning
Support vector machines
word embedding
title A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T00%3A09%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20supervised%20learning%E2%80%90based%20approach%20for%20focused%20web%20crawling%20for%20IoMT%20using%20global%20co%E2%80%90occurrence%20matrix&rft.jtitle=Expert%20systems&rft.au=Rajiv,%20S&rft.date=2023-05&rft.volume=40&rft.issue=4&rft.epage=n/a&rft.issn=0266-4720&rft.eissn=1468-0394&rft_id=info:doi/10.1111/exsy.12993&rft_dat=%3Cproquest_cross%3E2800398928%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2800398928&rft_id=info:pmid/&rfr_iscdi=true
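
The relevance step described in the abstract (cosine similarity between word-embedding representations of a topic and of a page's contents) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional vectors in TOY_VECTORS are hypothetical stand-ins for pretrained GloVe embeddings, averaging word vectors to represent a text is an assumption (the abstract does not say how the vectors are combined), and the function names are invented for this example. According to the abstract, the resulting score is passed to a trained random forest classifier; that step is omitted here.

```python
import math

# Hypothetical toy vectors standing in for pretrained GloVe embeddings.
# Real GloVe vectors typically have 50 to 300 dimensions.
TOY_VECTORS = {
    "medical": [0.9, 0.1, 0.0],
    "device": [0.7, 0.3, 0.1],
    "sensor": [0.8, 0.2, 0.1],
    "patient": [0.9, 0.0, 0.2],
    "football": [0.0, 0.9, 0.4],
    "league": [0.1, 0.8, 0.5],
}


def mean_vector(words, vectors):
    """Average the vectors of the known words; unknown words are skipped."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_page_similarity(topic, page_text, vectors=TOY_VECTORS):
    """Score a page against a topic; returns 0.0 if either has no known words."""
    topic_vec = mean_vector(topic.lower().split(), vectors)
    page_vec = mean_vector(page_text.lower().split(), vectors)
    if topic_vec is None or page_vec is None:
        return 0.0
    return cosine_similarity(topic_vec, page_vec)


if __name__ == "__main__":
    topic = "medical device"
    for page in ("patient sensor device", "football league"):
        print(page, round(topic_page_similarity(topic, page), 3))
```

With these toy vectors, the page about patients and sensors scores far closer to the topic "medical device" than the page about football. In the crawler described in the abstract, a score of this kind would serve as the input feature to the classifier that decides whether a page is relevant.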