WISE: A Visual Tool for Automatic Extraction of Objects from World Wide Web


Bibliographic Details
Main Authors: Jan, Yi-Wei; Tsay, Jyh-Jong; Wu, Bo-Liang
Format: Conference Proceeding
Language: English
container_end_page 593
container_start_page 590
creator Jan, Yi-Wei
Tsay, Jyh-Jong
Wu, Bo-Liang
description Automatic identification and extraction of interesting objects, rather than just related Web pages, is useful to users of the World Wide Web. For example, an instructor teaching a Data Mining course may want to find all textbooks, lecture notes, slides, homework assignments, and term projects on the Web pages of Data Mining and related courses. In this paper, we present WISE, a tool for extracting objects of interest from Web pages. In WISE, objects in HTML pages are defined as nodes in an extension of DOM trees, and object extraction is formulated as the problem of classifying tree nodes. We develop a mechanism for object specifications and a specific-to-general (bottom-up) search algorithm for learning them. Experimental results show that WISE works well for tasks such as extracting items from paper lists and extracting lists of lecture notes from Algorithms courses.
doi_str_mv 10.1109/WI.2005.167
format conference_proceeding
publisher Washington, DC, USA: IEEE Computer Society
date 2005-09-19
isbn 076952415X; 9780769524153
fulltext fulltext_linktorsrc
identifier ISBN: 076952415X
ispartof IEEE/WIC/ACM International Conference on Web Intelligence, 2005, p. 590-593
language eng
recordid cdi_ieee_primary_1517914
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Applied computing -- Document management and text processing
Classification tree analysis
Computer science
Crawlers
Data mining
Education
HTML
Human-centered computing -- Human computer interaction (HCI) -- Interaction paradigms -- Graphical user interfaces
Information systems -- Information retrieval
Information systems -- Information storage systems
Information systems -- Information systems applications -- Data mining
Search engines
Web pages
Web sites
title WISE: A Visual Tool for Automatic Extraction of Objects from World Wide Web
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T17%3A16%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=WISE:%20A%20Visual%20Tool%20for%20Automatic%20Extraction%20of%20Objects%20from%20World%20Wide%20Web&rft.btitle=IEEE/WIC/ACM%20International%20Conference%20on%20web%20intelligence&rft.au=Jan,%20Yi-Wei&rft.date=2005-09-19&rft.spage=590&rft.epage=593&rft.pages=590-593&rft.isbn=076952415X&rft.isbn_list=9780769524153&rft_id=info:doi/10.1109/WI.2005.167&rft_dat=%3Cproquest_6IE%3E31695717%3C/proquest_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a270t-e0fe5499063cb1165eeaf7a732deae0d492f426f77dce8acfe51cff522ce4f8b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=31695717&rft_id=info:pmid/&rft_ieee_id=1517914&rfr_iscdi=true