
Clustering Based URL Normalization Technique for Web Mining

URL (Uniform Resource Locator) normalization is an important activity in web mining. Web data can be retrieved more smoothly using an effective URL normalization technique. URL normalization also reduces a lot of computation in web mining activities. A web mining technique for URL normalization is pro...


Bibliographic Details
Main Author: Nagwani, N K
Format: Conference Proceeding
Language: English
cited_by
cites
container_end_page 351
container_issue
container_start_page 349
container_title
container_volume
creator Nagwani, N K
description URL (Uniform Resource Locator) normalization is an important activity in web mining. Web data can be retrieved more smoothly using an effective URL normalization technique. URL normalization also reduces a lot of computation in web mining activities. A web mining technique for URL normalization is proposed in this paper. The proposed technique is based on the content, structure and semantic similarity, and on the web page redirection and forwarding similarity, of the given set of URLs. Web page redirection and forward graphs can be used to measure the similarities between the URLs and can also be used to form URL clusters. The URL clusters can then be used for URL normalization. A data structure is also suggested to store the forward and redirect URL information.
doi_str_mv 10.1109/ACE.2010.47
format conference_proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5532806</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5532806</ieee_id><sourcerecordid>5532806</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-4de7757ca2b96ce6fdc90b0726ba1344984d37d59bfa44dac5c46b9512348dfc3</originalsourceid><addsrcrecordid>eNotjEtLw0AURkdEUGtWLt3MH0idx70zGVzVUB-QVpCIyzKv6EiaaJIu9Ncb0G9zOIvzEXLJ2ZJzZq5X5Xop2Gygj8g508ogMCzMMcmMLjgIAM0R1CnJxvGDzQMUgOqM3JTtYZzikLo3emvHGOjLc0W3_bC3bfqxU-o7Wkf_3qWvQ6RNP9DX6OgmdXNwQU4a244x--eC1HfrunzIq6f7x3JV5cmwKYcQtUbtrXBG-aia4A1zTAvlLJcApoAgdUDjGgsQrEcPyhnkQkIRGi8X5OrvNsUYd59D2tvhe4coRcGU_AUh_Ea8</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Clustering Based URL Normalization Technique for Web Mining</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Nagwani, N K</creator><creatorcontrib>Nagwani, N K</creatorcontrib><description>URL (Uniform Resource Locator) normalization is an important activity in web mining. Web data can be retrieved in smoother way using effective URL normalization technique. URL normalization also reduces lot of calculations in web mining activities. A web mining technique for URL normalization is proposed in this paper. The proposed technique is based on content, structure and semantic similarity and web page redirection and forwarding similarity of the given set of URLs. Web page redirection and forward graphs can be used to measure the similarities between the URL's and can also be used for URL clusters. The URL clusters can be used for URL normalization. 
A data structure is also suggested to store the forward and redirect URL information.</description><identifier>ISBN: 9781424471546</identifier><identifier>ISBN: 1424471540</identifier><identifier>EISBN: 0769540589</identifier><identifier>EISBN: 9781424471553</identifier><identifier>EISBN: 9780769540580</identifier><identifier>EISBN: 1424471559</identifier><identifier>DOI: 10.1109/ACE.2010.47</identifier><language>eng</language><publisher>IEEE</publisher><subject>Access protocols ; Clustering ; Crawlers ; Data structures ; Indexing ; Information retrieval ; Search engines ; Uniform resource locators ; URL Normalization ; Web mining ; Web Page Forward and Redirect Similarity Tree ; Web pages ; World Wide Web</subject><ispartof>2010 International Conference on Advances in Computer Engineering, 2010, p.349-351</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5532806$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27899,54892</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5532806$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Nagwani, N K</creatorcontrib><title>Clustering Based URL Normalization Technique for Web Mining</title><title>2010 International Conference on Advances in Computer Engineering</title><addtitle>ACE</addtitle><description>URL (Uniform Resource Locator) normalization is an important activity in web mining. Web data can be retrieved in smoother way using effective URL normalization technique. URL normalization also reduces lot of calculations in web mining activities. A web mining technique for URL normalization is proposed in this paper. 
The proposed technique is based on content, structure and semantic similarity and web page redirection and forwarding similarity of the given set of URLs. Web page redirection and forward graphs can be used to measure the similarities between the URL's and can also be used for URL clusters. The URL clusters can be used for URL normalization. A data structure is also suggested to store the forward and redirect URL information.</description><subject>Access protocols</subject><subject>Clustering</subject><subject>Crawlers</subject><subject>Data structures</subject><subject>Indexing</subject><subject>Information retrieval</subject><subject>Search engines</subject><subject>Uniform resource locators</subject><subject>URL Normalization</subject><subject>Web mining</subject><subject>Web Page Forward and Redirect Similarity Tree</subject><subject>Web pages</subject><subject>World Wide Web</subject><isbn>9781424471546</isbn><isbn>1424471540</isbn><isbn>0769540589</isbn><isbn>9781424471553</isbn><isbn>9780769540580</isbn><isbn>1424471559</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2010</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotjEtLw0AURkdEUGtWLt3MH0idx70zGVzVUB-QVpCIyzKv6EiaaJIu9Ncb0G9zOIvzEXLJ2ZJzZq5X5Xop2Gygj8g508ogMCzMMcmMLjgIAM0R1CnJxvGDzQMUgOqM3JTtYZzikLo3emvHGOjLc0W3_bC3bfqxU-o7Wkf_3qWvQ6RNP9DX6OgmdXNwQU4a244x--eC1HfrunzIq6f7x3JV5cmwKYcQtUbtrXBG-aia4A1zTAvlLJcApoAgdUDjGgsQrEcPyhnkQkIRGi8X5OrvNsUYd59D2tvhe4coRcGU_AUh_Ea8</recordid><startdate>201006</startdate><enddate>201006</enddate><creator>Nagwani, N K</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201006</creationdate><title>Clustering Based URL Normalization Technique for Web Mining</title><author>Nagwani, N 
K</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-4de7757ca2b96ce6fdc90b0726ba1344984d37d59bfa44dac5c46b9512348dfc3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Access protocols</topic><topic>Clustering</topic><topic>Crawlers</topic><topic>Data structures</topic><topic>Indexing</topic><topic>Information retrieval</topic><topic>Search engines</topic><topic>Uniform resource locators</topic><topic>URL Normalization</topic><topic>Web mining</topic><topic>Web Page Forward and Redirect Similarity Tree</topic><topic>Web pages</topic><topic>World Wide Web</topic><toplevel>online_resources</toplevel><creatorcontrib>Nagwani, N K</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Nagwani, N K</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Clustering Based URL Normalization Technique for Web Mining</atitle><btitle>2010 International Conference on Advances in Computer Engineering</btitle><stitle>ACE</stitle><date>2010-06</date><risdate>2010</risdate><spage>349</spage><epage>351</epage><pages>349-351</pages><isbn>9781424471546</isbn><isbn>1424471540</isbn><eisbn>0769540589</eisbn><eisbn>9781424471553</eisbn><eisbn>9780769540580</eisbn><eisbn>1424471559</eisbn><abstract>URL (Uniform Resource Locator) normalization is an important activity in web mining. Web data can be retrieved in smoother way using effective URL normalization technique. 
URL normalization also reduces lot of calculations in web mining activities. A web mining technique for URL normalization is proposed in this paper. The proposed technique is based on content, structure and semantic similarity and web page redirection and forwarding similarity of the given set of URLs. Web page redirection and forward graphs can be used to measure the similarities between the URL's and can also be used for URL clusters. The URL clusters can be used for URL normalization. A data structure is also suggested to store the forward and redirect URL information.</abstract><pub>IEEE</pub><doi>10.1109/ACE.2010.47</doi><tpages>3</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISBN: 9781424471546
ispartof 2010 International Conference on Advances in Computer Engineering, 2010, p.349-351
issn
language eng
recordid cdi_ieee_primary_5532806
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Access protocols
Clustering
Crawlers
Data structures
Indexing
Information retrieval
Search engines
Uniform resource locators
URL Normalization
Web mining
Web Page Forward and Redirect Similarity Tree
Web pages
World Wide Web
title Clustering Based URL Normalization Technique for Web Mining
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-03-05T02%3A39%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Clustering%20Based%20URL%20Normalization%20Technique%20for%20Web%20Mining&rft.btitle=2010%20International%20Conference%20on%20Advances%20in%20Computer%20Engineering&rft.au=Nagwani,%20N%20K&rft.date=2010-06&rft.spage=349&rft.epage=351&rft.pages=349-351&rft.isbn=9781424471546&rft.isbn_list=1424471540&rft_id=info:doi/10.1109/ACE.2010.47&rft.eisbn=0769540589&rft.eisbn_list=9781424471553&rft.eisbn_list=9780769540580&rft.eisbn_list=1424471559&rft_dat=%3Cieee_6IE%3E5532806%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i90t-4de7757ca2b96ce6fdc90b0726ba1344984d37d59bfa44dac5c46b9512348dfc3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5532806&rfr_iscdi=true