A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix
Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies...
Published in: | Expert systems 2023-05, Vol.40 (4), p.n/a |
---|---|
Main Authors: | Rajiv, S; Navaneethan, C |
Format: | Article |
Language: | English |
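Subjects: | Algorithms; Artificial neural networks; focused crawler; GloVe; Information retrieval; Internet of medical things; Machine learning; Mathematical analysis; Search algorithms; Search engines; Search strategies; Supervised learning; Support vector machines; word embedding |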
cited_by | cdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953 |
---|---|
cites | cdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953 |
container_end_page | n/a |
container_issue | 4 |
container_start_page | |
container_title | Expert systems |
container_volume | 40 |
creator | Rajiv, S; Navaneethan, C |
description | Irrelevant search results for a given topic waste search engine users' time. A learning‐focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that learning‐focused crawlers use term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. TF‐IDF measures the similarity of the given topic to its co‐occurrence on the web page. The need for an efficient mechanism to compute the relevance of URLs both syntactically and semantically has led this paper to propose a word embedding approach for computing the relevance of a web page. The global vector (GloVe) representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to a trained random forest classifier to predict the relevancy of the web page. The evaluation results showed that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, outperforming other learning‐focused crawlers based on support vector machines, Naive Bayes and artificial neural networks. |
doi_str_mv | 10.1111/exsy.12993 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2800398928</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2800398928</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</originalsourceid><addsrcrecordid>eNp9kM1OwzAMxyMEEmNw4QkqcUPqSJo2bY7TNGDSEAd2gFOUZO7olDUlWdl24xF4Rp6EdOWMJcuy_fOH_ghdEzwiwe5g7w8jknBOT9CApKyIMeXpKRrghLE4zRN8ji68X2OMSZ6zATLjyLcNuM_KwzIyIF1d1aufr28lu4JsGmelfo9K64LrtivuQEXayZ0J5LExs0-LqPVdujJWSRNpG1ZYrVvnoNYQbeTWVftLdFZK4-HqLw7R4n66mDzG8-eH2WQ8jzXFhMZlQZYqK7NUSs21JBnhBBhAwSTNSxrSnBIOkDMqS5WlS6oopVqpIsWYZ3SIbvq14fePFvxWrG3r6nBRJAUOghQ8KQJ121PaWe8dlKJx1Ua6gyBYdGKKTkxxFDPApId3lYHDP6SYvr689TO_IMt6lg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2800398928</pqid></control><display><type>article</type><title>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix</title><source>Business Source Ultimate【Trial: -2024/12/31】【Remote access available】</source><source>Wiley</source><creator>Rajiv, S ; Navaneethan, C</creator><creatorcontrib>Rajiv, S ; Navaneethan, C</creatorcontrib><description>Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that the learning focused crawler utilizes term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. The TF‐IDF detects similarity of the given topic to its co‐occurrence on the web page. 
The necessity of efficient mechanism to compute the relevance of URLs syntactically and semantically has led to the proposal of this paper with a word embedding approach to compute the relevance of the web page. The global vector representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to the trained random forest classifier to predict the relevancy of the web page. The evaluation results proved that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, which outperformed other learning‐focused crawlers on support vector machines, Naive Bayes and artificial neural networks.</description><identifier>ISSN: 0266-4720</identifier><identifier>EISSN: 1468-0394</identifier><identifier>DOI: 10.1111/exsy.12993</identifier><language>eng</language><publisher>Oxford: Blackwell Publishing Ltd</publisher><subject>Algorithms ; Artificial neural networks ; focused crawler ; GloVe ; Information retrieval ; Internet of medical things ; Machine learning ; Mathematical analysis ; Search algorithms ; Search engines ; Search strategies ; Supervised learning ; Support vector machines ; word embedding</subject><ispartof>Expert systems, 2023-05, Vol.40 (4), p.n/a</ispartof><rights>2022 John Wiley & Sons Ltd.</rights><rights>2023 John Wiley & Sons, Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</citedby><cites>FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</cites><orcidid>0000-0001-6678-8556 ; 0000-0001-5079-058X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Rajiv, 
S</creatorcontrib><creatorcontrib>Navaneethan, C</creatorcontrib><title>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix</title><title>Expert systems</title><description>Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that the learning focused crawler utilizes term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. The TF‐IDF detects similarity of the given topic to its co‐occurrence on the web page. The necessity of efficient mechanism to compute the relevance of URLs syntactically and semantically has led to the proposal of this paper with a word embedding approach to compute the relevance of the web page. The global vector representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to the trained random forest classifier to predict the relevancy of the web page. 
The evaluation results proved that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, which outperformed other learning‐focused crawlers on support vector machines, Naive Bayes and artificial neural networks.</description><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>focused crawler</subject><subject>GloVe</subject><subject>Information retrieval</subject><subject>Internet of medical things</subject><subject>Machine learning</subject><subject>Mathematical analysis</subject><subject>Search algorithms</subject><subject>Search engines</subject><subject>Search strategies</subject><subject>Supervised learning</subject><subject>Support vector machines</subject><subject>word embedding</subject><issn>0266-4720</issn><issn>1468-0394</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kM1OwzAMxyMEEmNw4QkqcUPqSJo2bY7TNGDSEAd2gFOUZO7olDUlWdl24xF4Rp6EdOWMJcuy_fOH_ghdEzwiwe5g7w8jknBOT9CApKyIMeXpKRrghLE4zRN8ji68X2OMSZ6zATLjyLcNuM_KwzIyIF1d1aufr28lu4JsGmelfo9K64LrtivuQEXayZ0J5LExs0-LqPVdujJWSRNpG1ZYrVvnoNYQbeTWVftLdFZK4-HqLw7R4n66mDzG8-eH2WQ8jzXFhMZlQZYqK7NUSs21JBnhBBhAwSTNSxrSnBIOkDMqS5WlS6oopVqpIsWYZ3SIbvq14fePFvxWrG3r6nBRJAUOghQ8KQJ121PaWe8dlKJx1Ua6gyBYdGKKTkxxFDPApId3lYHDP6SYvr689TO_IMt6lg</recordid><startdate>202305</startdate><enddate>202305</enddate><creator>Rajiv, S</creator><creator>Navaneethan, C</creator><general>Blackwell Publishing Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7TB</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-6678-8556</orcidid><orcidid>https://orcid.org/0000-0001-5079-058X</orcidid></search><sort><creationdate>202305</creationdate><title>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence 
matrix</title><author>Rajiv, S ; Navaneethan, C</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>focused crawler</topic><topic>GloVe</topic><topic>Information retrieval</topic><topic>Internet of medical things</topic><topic>Machine learning</topic><topic>Mathematical analysis</topic><topic>Search algorithms</topic><topic>Search engines</topic><topic>Search strategies</topic><topic>Supervised learning</topic><topic>Support vector machines</topic><topic>word embedding</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rajiv, S</creatorcontrib><creatorcontrib>Navaneethan, C</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rajiv, S</au><au>Navaneethan, C</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix</atitle><jtitle>Expert 
systems</jtitle><date>2023-05</date><risdate>2023</risdate><volume>40</volume><issue>4</issue><epage>n/a</epage><issn>0266-4720</issn><eissn>1468-0394</eissn><abstract>Irrelevant search results for a given topic end up wasting search engine users' time. A learning focused web crawler downloads relevant URLs for a given topic using machine‐learning algorithms. The dynamic nature of the web is a challenge in related computation for focused web crawlers. Studies have shown that the learning focused crawler utilizes term frequency‐inverse document frequency (TF‐IDF) to compute the relevance between a web page and a given topic. The TF‐IDF detects similarity of the given topic to its co‐occurrence on the web page. The necessity of efficient mechanism to compute the relevance of URLs syntactically and semantically has led to the proposal of this paper with a word embedding approach to compute the relevance of the web page. The global vector representation cosine similarity is calculated between a topic and the web page contents. The calculated cosine similarity is provided as input to the trained random forest classifier to predict the relevancy of the web page. The evaluation results proved that the proposed crawler produced an average hrate of 0.41 and prate of 0.59, which outperformed other learning‐focused crawlers on support vector machines, Naive Bayes and artificial neural networks.</abstract><cop>Oxford</cop><pub>Blackwell Publishing Ltd</pub><doi>10.1111/exsy.12993</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0001-6678-8556</orcidid><orcidid>https://orcid.org/0000-0001-5079-058X</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0266-4720 |
ispartof | Expert systems, 2023-05, Vol.40 (4), p.n/a |
issn | 0266-4720; 1468-0394 |
language | eng |
recordid | cdi_proquest_journals_2800398928 |
source | Business Source Ultimate【Trial: -2024/12/31】【Remote access available】; Wiley |
subjects | Algorithms; Artificial neural networks; focused crawler; GloVe; Information retrieval; Internet of medical things; Machine learning; Mathematical analysis; Search algorithms; Search engines; Search strategies; Supervised learning; Support vector machines; word embedding |
title | A supervised learning‐based approach for focused web crawling for IoMT using global co‐occurrence matrix |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T00%3A09%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20supervised%20learning%E2%80%90based%20approach%20for%20focused%20web%20crawling%20for%20IoMT%20using%20global%20co%E2%80%90occurrence%20matrix&rft.jtitle=Expert%20systems&rft.au=Rajiv,%20S&rft.date=2023-05&rft.volume=40&rft.issue=4&rft.epage=n/a&rft.issn=0266-4720&rft.eissn=1468-0394&rft_id=info:doi/10.1111/exsy.12993&rft_dat=%3Cproquest_cross%3E2800398928%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c3013-f81db5f54aac9ca15191e6ee86a37f35197319ee763afb54d3b333cbb8400953%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2800398928&rft_id=info:pmid/&rfr_iscdi=true |
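
The relevance step summarized in the description (cosine similarity between a vector representation of the topic and of the web page contents) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors in `TOY_VECTORS` are made-up stand-ins for pretrained GloVe embeddings, averaging word vectors to represent a text is an assumption of this sketch, and all function names are hypothetical. The record states that the similarity is then passed to a trained random forest classifier; that step is not shown here.

```python
import math

# Hypothetical toy vectors standing in for pretrained GloVe embeddings.
# Real GloVe vectors have far more dimensions and are learned from
# global word co-occurrence statistics; these values are invented.
TOY_VECTORS = {
    "medical": [0.9, 0.1, 0.0],
    "sensor": [0.7, 0.3, 0.1],
    "device": [0.8, 0.2, 0.1],
    "football": [0.0, 0.2, 0.9],
    "league": [0.1, 0.1, 0.8],
}


def mean_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_page_similarity(topic, page_text, vectors=TOY_VECTORS):
    """Similarity between a topic string and page text (assumed averaging)."""
    topic_vec = mean_vector(topic.lower().split(), vectors)
    page_vec = mean_vector(page_text.lower().split(), vectors)
    if topic_vec is None or page_vec is None:
        return 0.0
    return cosine_similarity(topic_vec, page_vec)
```

Under these toy vectors, a page about "sensor device" scores higher against the topic "medical device" than a page about "football league". In the approach described by the record, a score of this kind would serve as the input feature to the classifier rather than being thresholded directly.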