A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix

Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies...

Bibliographic Details
Published in:	Expert systems 2023-05, Vol.40 (4), p.n/a
Main Authors:	Rajiv, S; Navaneethan, C
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial neural networks; focused crawler; GloVe; Information retrieval; Internet of medical things; Machine learning; Mathematical analysis; Search algorithms; Search engines; Search strategies; Supervised learning; Support vector machines; word embedding

cited_by	cdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953
cites	cdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953
container_end_page	n/a
container_issue	4
container_start_page
container_title	Expert systems
container_volume	40
creator	Rajiv, S; Navaneethan, C
description	Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that the learning focused crawler utilizes term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. The TF‐IDF detects similarity of the given topic to its co‐occurrence on the web page. The necessity of efficient mechanism to compute the relevance of URLs syntactically and semantically has led to the proposal of this paper with a word embedding approach to compute the relevance of the web page. The global vector representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to the trained random forest classifier to predict the relevancy of the web page. The evaluation results proved that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, which outperformed other learning‐focused crawlers on support vector machines, Naive Bayes and artificial neural networks.
doi_str_mv	10.1111/exsy.12993
format	article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2800398928</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2800398928</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</originalsourceid><addsrcrecordid>eNp9kM1OwzAMxyMEEmNw4QkqcUPqSJo2bY7TNGDSEAd2gFOUZO7olDUlWdl24xF4Rp6EdOWMJcuy_fOH_ghdEzwiwe5g7w8jknBOT9CApKyIMeXpKRrghLE4zRN8ji68X2OMSZ6zATLjyLcNuM_KwzIyIF1d1aufr28lu4JsGmelfo9K64LrtivuQEXayZ0J5LExs0-LqPVdujJWSRNpG1ZYrVvnoNYQbeTWVftLdFZK4-HqLw7R4n66mDzG8-eH2WQ8jzXFhMZlQZYqK7NUSs21JBnhBBhAwSTNSxrSnBIOkDMqS5WlS6oopVqpIsWYZ3SIbvq14fePFvxWrG3r6nBRJAUOghQ8KQJ121PaWe8dlKJx1Ua6gyBYdGKKTkxxFDPApId3lYHDP6SYvr689TO_IMt6lg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2800398928</pqid></control><display><type>article</type><title>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix</title><source>Business Source Ultimate【Trial: -2024/12/31】【Remote access available】</source><source>Wiley</source><creator>Rajiv, S ; Navaneethan, C</creator><creatorcontrib>Rajiv, S ; Navaneethan, C</creatorcontrib><description>Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that the learning focused crawler utilizes term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. The TF‐IDF detects similarity of the given topic to its co‐occurrence on the web page. 
The necessity of efficient mechanism to compute the relevance of URLs syntactically and semantically has led to the proposal of this paper with a word embedding approach to compute the relevance of the web page. The global vector representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to the trained random forest classifier to predict the relevancy of the web page. The evaluation results proved that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, which outperformed other learning‐focused crawlers on support vector machines, Naive Bayes and artificial neural networks.</description><identifier>ISSN: 0266-4720</identifier><identifier>EISSN: 1468-0394</identifier><identifier>DOI: 10.1111/exsy.12993</identifier><language>eng</language><publisher>Oxford: Blackwell Publishing Ltd</publisher><subject>Algorithms ; Artificial neural networks ; focused crawler ; GloVe ; Information retrieval ; Internet of medical things ; Machine learning ; Mathematical analysis ; Search algorithms ; Search engines ; Search strategies ; Supervised learning ; Support vector machines ; word embedding</subject><ispartof>Expert systems, 2023-05, Vol.40 (4), p.n/a</ispartof><rights>2022 John Wiley & Sons Ltd.</rights><rights>2023 John Wiley & Sons, Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</citedby><cites>FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</cites><orcidid>0000-0001-6678-8556 ; 0000-0001-5079-058X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Rajiv, 
S</creatorcontrib><creatorcontrib>Navaneethan, C</creatorcontrib><title>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix</title><title>Expert systems</title><description>Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that the learning focused crawler utilizes term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. The TF‐IDF detects similarity of the given topic to its co‐occurrence on the web page. The necessity of efficient mechanism to compute the relevance of URLs syntactically and semantically has led to the proposal of this paper with a word embedding approach to compute the relevance of the web page. The global vector representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to the trained random forest classifier to predict the relevancy of the web page. 
The evaluation results proved that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, which outperformed other learning‐focused crawlers on support vector machines, Naive Bayes and artificial neural networks.</description><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>focused crawler</subject><subject>GloVe</subject><subject>Information retrieval</subject><subject>Internet of medical things</subject><subject>Machine learning</subject><subject>Mathematical analysis</subject><subject>Search algorithms</subject><subject>Search engines</subject><subject>Search strategies</subject><subject>Supervised learning</subject><subject>Support vector machines</subject><subject>word embedding</subject><issn>0266-4720</issn><issn>1468-0394</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kM1OwzAMxyMEEmNw4QkqcUPqSJo2bY7TNGDSEAd2gFOUZO7olDUlWdl24xF4Rp6EdOWMJcuy_fOH_ghdEzwiwe5g7w8jknBOT9CApKyIMeXpKRrghLE4zRN8ji68X2OMSZ6zATLjyLcNuM_KwzIyIF1d1aufr28lu4JsGmelfo9K64LrtivuQEXayZ0J5LExs0-LqPVdujJWSRNpG1ZYrVvnoNYQbeTWVftLdFZK4-HqLw7R4n66mDzG8-eH2WQ8jzXFhMZlQZYqK7NUSs21JBnhBBhAwSTNSxrSnBIOkDMqS5WlS6oopVqpIsWYZ3SIbvq14fePFvxWrG3r6nBRJAUOghQ8KQJ121PaWe8dlKJx1Ua6gyBYdGKKTkxxFDPApId3lYHDP6SYvr689TO_IMt6lg</recordid><startdate>202305</startdate><enddate>202305</enddate><creator>Rajiv, S</creator><creator>Navaneethan, C</creator><general>Blackwell Publishing Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7TB</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-6678-8556</orcidid><orcidid>https://orcid.org/0000-0001-5079-058X</orcidid></search><sort><creationdate>202305</creationdate><title>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence 
matrix</title><author>Rajiv, S ; Navaneethan, C</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>focused crawler</topic><topic>GloVe</topic><topic>Information retrieval</topic><topic>Internet of medical things</topic><topic>Machine learning</topic><topic>Mathematical analysis</topic><topic>Search algorithms</topic><topic>Search engines</topic><topic>Search strategies</topic><topic>Supervised learning</topic><topic>Support vector machines</topic><topic>word embedding</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rajiv, S</creatorcontrib><creatorcontrib>Navaneethan, C</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rajiv, S</au><au>Navaneethan, C</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix</atitle><jtitle>Expert 
systems</jtitle><date>2023-05</date><risdate>2023</risdate><volume>40</volume><issue>4</issue><epage>n/a</epage><issn>0266-4720</issn><eissn>1468-0394</eissn><abstract>Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that the learning focused crawler utilizes term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. The TF‐IDF detects similarity of the given topic to its co‐occurrence on the web page. The necessity of efficient mechanism to compute the relevance of URLs syntactically and semantically has led to the proposal of this paper with a word embedding approach to compute the relevance of the web page. The global vector representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to the trained random forest classifier to predict the relevancy of the web page. The evaluation results proved that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, which outperformed other learning‐focused crawlers on support vector machines, Naive Bayes and artificial neural networks.</abstract><cop>Oxford</cop><pub>Blackwell Publishing Ltd</pub><doi>10.1111/exsy.12993</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0001-6678-8556</orcidid><orcidid>https://orcid.org/0000-0001-5079-058X</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0266-4720
ispartof	Expert systems, 2023-05, Vol.40 (4), p.n/a
issn	0266-4720; 1468-0394
language	eng
recordid	cdi_proquest_journals_2800398928
source	Business Source Ultimate【Trial: -2024/12/31】【Remote access available】; Wiley
subjects	Algorithms; Artificial neural networks; focused crawler; GloVe; Information retrieval; Internet of medical things; Machine learning; Mathematical analysis; Search algorithms; Search engines; Search strategies; Supervised learning; Support vector machines; word embedding
title	A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T00%3A09%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20supervised%20learning%E2%80%90based%20approach%20for%20focused%20web%20crawling%20for%20IoMT%20using%20global%20co%E2%80%90occurrence%20matrix&rft.jtitle=Expert%20systems&rft.au=Rajiv,%20S&rft.date=2023-05&rft.volume=40&rft.issue=4&rft.epage=n/a&rft.issn=0266-4720&rft.eissn=1468-0394&rft_id=info:doi/10.1111/exsy.12993&rft_dat=%3Cproquest_cross%3E2800398928%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2800398928&rft_id=info:pmid/&rfr_iscdi=true