WISE: A Visual Tool for Automatic Extraction of Objects from World Wide Web
Automatic identification and extraction of interesting objects instead of just related Web pages is useful to users of the World Wide Web. For example, an instructor teaching a Data Mining course may be interested in finding all text books, lecture notes, slides, homeworks and term projects from Web...
Main Authors: | Jan, Yi-Wei; Tsay, Jyh-Jong; Wu, Bo-Liang |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 593 |
container_issue | |
container_start_page | 590 |
container_title | |
container_volume | |
creator | Jan, Yi-Wei ; Tsay, Jyh-Jong ; Wu, Bo-Liang |
description | Automatic identification and extraction of interesting objects instead of just related Web pages is useful to users of the World Wide Web. For example, an instructor teaching a Data Mining course may be interested in finding all text books, lecture notes, slides, homeworks and term projects from Web pages of all Data Mining and their related courses. In this paper, we present a tool WISE for extraction of objects from web pages, which are of interest. In WISE, objects in HTML pages are defined as nodes in an extension of DOM trees. Extraction of objects is then formulated as the problem of classification of tree nodes. We develop a mechanism for object specifications, and a specific-to-general (bottom-up) search algorithm for learning object specifications. Experiment shows that WISE works well for extracting objects such as, for example, extracting items in paper lists, and extracting lists of lecture notes in Algorithms courses. |
doi_str_mv | 10.1109/WI.2005.167 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>proquest_6IE</sourceid><recordid>TN_cdi_ieee_primary_1517914</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1517914</ieee_id><sourcerecordid>31695717</sourcerecordid><originalsourceid>FETCH-LOGICAL-a270t-e0fe5499063cb1165eeaf7a732deae0d492f426f77dce8acfe51cff522ce4f8b3</originalsourceid><addsrcrecordid>eNqNj0FLxDAQhQMqqOuePHoVD0LrTNIkzVGWVQsLHlTqLaTpBKqtXZvdg__eLPUH-C5zeB_D-xi7RMgRwdzVVc4BZI5KH7Fz0MpIXqB8P2XLGD8gRRilSzhjx3X1sr5gJ8H1kZZ_d8HeHtavq6ds8_xYre43meMadhlBIFkYA0r4BlFJIhe004K35AjawvBQcBW0bj2VzicafQiSc09FKBuxYDfz3-00fu8p7uzQRU99775o3EcrMA3VqBN4NYMdEdnt1A1u-rEoURssUns7t84PthnHz2gR7EHc1pU9iNskbpupo5Dg63_A4hfX-FXL</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype><pqid>31695717</pqid></control><display><type>conference_proceeding</type><title>WISE: A Visual Tool for Automatic Extraction of Objects from World Wide Web</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Jan, Yi-Wei ; Tsay, Jyh-Jong ; Wu, Bo-Liang</creator><creatorcontrib>Jan, Yi-Wei ; Tsay, Jyh-Jong ; Wu, Bo-Liang</creatorcontrib><description>Automatic identification and extraction of interesting objects instead of just related Web pages is useful to users of the World Wide Web. For example, an instructor teaching a Data Mining course may be interested in finding all text books, lecture notes, slides, homeworks and term projects from Web pages of all Data Mining and their related courses. In this paper, we present a tool WISE for extraction of objects from web pages, which are of interest. In WISE, objects in HTML pages are defined as nodes in an extension of DOM trees. Extraction of objects is then formulated as the problem of classification of tree nodes. We develop a mechanism for object specifications, and a specific-to-general (bottom-up) search algorithm for learning object specifications. 
Experiment shows that WISE works well for extracting objects such as, for example, extracting items in paper lists, and extracting lists of lecture notes in Algorithms courses.</description><identifier>ISBN: 076952415X</identifier><identifier>ISBN: 9780769524153</identifier><identifier>DOI: 10.1109/WI.2005.167</identifier><language>eng</language><publisher>Washington, DC, USA: IEEE Computer Society</publisher><subject>Applied computing -- Document management and text processing ; Classification tree analysis ; Computer science ; Crawlers ; Data mining ; Education ; HTML ; Human-centered computing -- Human computer interaction (HCI) -- Interaction paradigms -- Graphical user interfaces ; Information systems -- Information retrieval ; Information systems -- Information storage systems ; Information systems -- Information systems applications -- Data mining ; Search engines ; Web pages ; Web sites</subject><ispartof>IEEE/WIC/ACM International Conference on web intelligence, 2005, p.590-593</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1517914$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2056,4048,4049,27924,54919</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1517914$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Jan, Yi-Wei</creatorcontrib><creatorcontrib>Tsay, Jyh-Jong</creatorcontrib><creatorcontrib>Wu, Bo-Liang</creatorcontrib><title>WISE: A Visual Tool for Automatic Extraction of Objects from World Wide Web</title><title>IEEE/WIC/ACM International Conference on web intelligence</title><addtitle>WI</addtitle><description>Automatic identification and extraction of interesting objects instead of just related Web pages is useful to users of 
the World Wide Web. For example, an instructor teaching a Data Mining course may be interested in finding all text books, lecture notes, slides, homeworks and term projects from Web pages of all Data Mining and their related courses. In this paper, we present a tool WISE for extraction of objects from web pages, which are of interest. In WISE, objects in HTML pages are defined as nodes in an extension of DOM trees. Extraction of objects is then formulated as the problem of classification of tree nodes. We develop a mechanism for object specifications, and a specific-to-general (bottom-up) search algorithm for learning object specifications. Experiment shows that WISE works well for extracting objects such as, for example, extracting items in paper lists, and extracting lists of lecture notes in Algorithms courses.</description><subject>Applied computing -- Document management and text processing</subject><subject>Classification tree analysis</subject><subject>Computer science</subject><subject>Crawlers</subject><subject>Data mining</subject><subject>Education</subject><subject>HTML</subject><subject>Human-centered computing -- Human computer interaction (HCI) -- Interaction paradigms -- Graphical user interfaces</subject><subject>Information systems -- Information retrieval</subject><subject>Information systems -- Information storage systems</subject><subject>Information systems -- Information systems applications -- Data mining</subject><subject>Search engines</subject><subject>Web pages</subject><subject>Web 
sites</subject><isbn>076952415X</isbn><isbn>9780769524153</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2005</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNqNj0FLxDAQhQMqqOuePHoVD0LrTNIkzVGWVQsLHlTqLaTpBKqtXZvdg__eLPUH-C5zeB_D-xi7RMgRwdzVVc4BZI5KH7Fz0MpIXqB8P2XLGD8gRRilSzhjx3X1sr5gJ8H1kZZ_d8HeHtavq6ds8_xYre43meMadhlBIFkYA0r4BlFJIhe004K35AjawvBQcBW0bj2VzicafQiSc09FKBuxYDfz3-00fu8p7uzQRU99775o3EcrMA3VqBN4NYMdEdnt1A1u-rEoURssUns7t84PthnHz2gR7EHc1pU9iNskbpupo5Dg63_A4hfX-FXL</recordid><startdate>20050919</startdate><enddate>20050919</enddate><creator>Jan, Yi-Wei</creator><creator>Tsay, Jyh-Jong</creator><creator>Wu, Bo-Liang</creator><general>IEEE Computer Society</general><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20050919</creationdate><title>WISE</title><author>Jan, Yi-Wei ; Tsay, Jyh-Jong ; Wu, Bo-Liang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a270t-e0fe5499063cb1165eeaf7a732deae0d492f426f77dce8acfe51cff522ce4f8b3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Applied computing -- Document management and text processing</topic><topic>Classification tree analysis</topic><topic>Computer science</topic><topic>Crawlers</topic><topic>Data mining</topic><topic>Education</topic><topic>HTML</topic><topic>Human-centered computing -- Human computer interaction (HCI) -- Interaction paradigms -- Graphical user interfaces</topic><topic>Information systems -- Information retrieval</topic><topic>Information systems -- Information storage systems</topic><topic>Information systems -- Information systems applications -- Data 
mining</topic><topic>Search engines</topic><topic>Web pages</topic><topic>Web sites</topic><toplevel>online_resources</toplevel><creatorcontrib>Jan, Yi-Wei</creatorcontrib><creatorcontrib>Tsay, Jyh-Jong</creatorcontrib><creatorcontrib>Wu, Bo-Liang</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Jan, Yi-Wei</au><au>Tsay, Jyh-Jong</au><au>Wu, Bo-Liang</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>WISE: A Visual Tool for Automatic Extraction of Objects from World Wide Web</atitle><btitle>IEEE/WIC/ACM International Conference on web intelligence</btitle><stitle>WI</stitle><date>2005-09-19</date><risdate>2005</risdate><spage>590</spage><epage>593</epage><pages>590-593</pages><isbn>076952415X</isbn><isbn>9780769524153</isbn><abstract>Automatic identification and extraction of interesting objects instead of just related Web pages is useful to users of the World Wide Web. For example, an instructor teaching a Data Mining course may be interested in finding all text books, lecture notes, slides, homeworks and term projects from Web pages of all Data Mining and their related courses. 
In this paper, we present a tool WISE for extraction of objects from web pages, which are of interest. In WISE, objects in HTML pages are defined as nodes in an extension of DOM trees. Extraction of objects is then formulated as the problem of classification of tree nodes. We develop a mechanism for object specifications, and a specific-to-general (bottom-up) search algorithm for learning object specifications. Experiment shows that WISE works well for extracting objects such as, for example, extracting items in paper lists, and extracting lists of lecture notes in Algorithms courses.</abstract><cop>Washington, DC, USA</cop><pub>IEEE Computer Society</pub><doi>10.1109/WI.2005.167</doi><tpages>4</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 076952415X |
ispartof | IEEE/WIC/ACM International Conference on web intelligence, 2005, p.590-593 |
issn | |
language | eng |
recordid | cdi_ieee_primary_1517914 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Applied computing -- Document management and text processing ; Classification tree analysis ; Computer science ; Crawlers ; Data mining ; Education ; HTML ; Human-centered computing -- Human computer interaction (HCI) -- Interaction paradigms -- Graphical user interfaces ; Information systems -- Information retrieval ; Information systems -- Information storage systems ; Information systems -- Information systems applications -- Data mining ; Search engines ; Web pages ; Web sites |
title | WISE: A Visual Tool for Automatic Extraction of Objects from World Wide Web |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T17%3A16%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=WISE:%20A%20Visual%20Tool%20for%20Automatic%20Extraction%20of%20Objects%20from%20World%20Wide%20Web&rft.btitle=IEEE/WIC/ACM%20International%20Conference%20on%20web%20intelligence&rft.au=Jan,%20Yi-Wei&rft.date=2005-09-19&rft.spage=590&rft.epage=593&rft.pages=590-593&rft.isbn=076952415X&rft.isbn_list=9780769524153&rft_id=info:doi/10.1109/WI.2005.167&rft_dat=%3Cproquest_6IE%3E31695717%3C/proquest_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a270t-e0fe5499063cb1165eeaf7a732deae0d492f426f77dce8acfe51cff522ce4f8b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=31695717&rft_id=info:pmid/&rft_ieee_id=1517914&rfr_iscdi=true |