
A classification method for Web information extraction

Web information extraction is viewed as a classification process and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features and the similarities between fragments are then defined on the bas...

Bibliographic Details
Published in: Wuhan University journal of natural sciences 2004-09, Vol.9 (5), p.823-827
Main Authors: Li, Xiang-Yang, Zhang, Ya-Fei, Lu, Jian-Jiang, Xu, Bao-Wen
Format: Article
Language: English
Subjects: Classification; Fragments; Information processing; Information retrieval; Webs
cited_by cdi_FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3
cites cdi_FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3
container_end_page 827
container_issue 5
container_start_page 823
container_title Wuhan University journal of natural sciences
container_volume 9
creator Li, Xiang-Yang
Zhang, Ya-Fei
Lu, Jian-Jiang
Xu, Bao-Wen
description Web information extraction is viewed as a classification process, and a competing classification method is presented that extracts Web information directly through classification. Web fragments are represented with three general features, and the similarities between fragments are then defined on the basis of these features. Through competition among fragments for the different slots in information templates, the method classifies fragments into slot classes and filters out noise information. Far fewer annotated samples are needed than with rule-based methods, which makes the method highly portable. Experiments show that the method performs well and is superior to the DOM-based method in information extraction.
doi_str_mv 10.1007/BF02831688
format article
fullrecord <record><control><sourceid>wanfang_jour_proqu</sourceid><recordid>TN_cdi_wanfang_journals_whdxxb_e200405052</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><wanfj_id>whdxxb_e200405052</wanfj_id><sourcerecordid>whdxxb_e200405052</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3</originalsourceid><addsrcrecordid>eNpdkN1LwzAUxYMoOKcv_gUFQVCo3qT5aB51OBUGvig-hjS9dR39mEnH5n9vRkXBp3vg_Djncgg5p3BDAdTt_RxYnlGZ5wdkQrXOUq51fhh1dFPKgB2TkxBWAJkWik6IvEtcY0Ooq9rZoe67pMVh2ZdJ1fvkHYuk7qJqRwt3g7duL0_JUWWbgGc_d0re5g-vs6d08fL4PLtbpI5KzdKKIwrFndXKWoRCWycFFUxKDnpPaE6VYNohR5CUltJKl1lKFRZlgS6bkqsxd2u7ynYfZtVvfBcbzXZZ7naFQQbAQYBgkb0c2bXvPzcYBtPWwWHT2A77TTAsV8BzARG8-Af-pjKVS82z-EukrkfK-T4Ej5VZ-7q1_stQMPs5zd_Y2TffG27y</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2786943061</pqid></control><display><type>article</type><title>A classification method for Web information extraction</title><source>Springer Nature - Connect here FIRST to enable access</source><creator>Li, Xiang-Yang ; Zhang, Ya-Fei ; Lu, Jian-Jiang ; Xu, Bao-Wen</creator><creatorcontrib>Li, Xiang-Yang ; Zhang, Ya-Fei ; Lu, Jian-Jiang ; Xu, Bao-Wen</creatorcontrib><description>Web information extraction is viewed as a classification process and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features and the similarities between fragments are then defined on the bases of these features. Through competitions of fragments for different slots in information templates, the method classifies fragments into slot classes and filters out noise information. Far less annotated samples are needed as compared with rule-based methods and therefore it has a strong portability. 
Experiments show that the method has good performance and is superior to DOM-based method in information extraction.</description><identifier>ISSN: 1007-1202</identifier><identifier>EISSN: 1993-4998</identifier><identifier>DOI: 10.1007/BF02831688</identifier><language>eng</language><publisher>Heidelberg: Springer Nature B.V</publisher><subject>Classification ; Fragments ; Information processing ; Information retrieval ; Webs</subject><ispartof>Wuhan University journal of natural sciences, 2004-09, Vol.9 (5), p.823-827</ispartof><rights>Springer 2004.</rights><rights>Copyright © Wanfang Data Co. Ltd. All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3</citedby><cites>FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://www.wanfangdata.com.cn/images/PeriodicalImages/whdxxb-e/whdxxb-e.jpg</thumbnail><link.rule.ids>314,780,784,1644,27924,27925</link.rule.ids></links><search><creatorcontrib>Li, Xiang-Yang</creatorcontrib><creatorcontrib>Zhang, Ya-Fei</creatorcontrib><creatorcontrib>Lu, Jian-Jiang</creatorcontrib><creatorcontrib>Xu, Bao-Wen</creatorcontrib><title>A classification method for Web information extraction</title><title>Wuhan University journal of natural sciences</title><description>Web information extraction is viewed as a classification process and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features and the similarities between fragments are then defined on the bases of these features. 
Through competitions of fragments for different slots in information templates, the method classifies fragments into slot classes and filters out noise information. Far less annotated samples are needed as compared with rule-based methods and therefore it has a strong portability. Experiments show that the method has good performance and is superior to DOM-based method in information extraction.</description><subject>Classification</subject><subject>Fragments</subject><subject>Information processing</subject><subject>Information retrieval</subject><subject>Webs</subject><issn>1007-1202</issn><issn>1993-4998</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2004</creationdate><recordtype>article</recordtype><recordid>eNpdkN1LwzAUxYMoOKcv_gUFQVCo3qT5aB51OBUGvig-hjS9dR39mEnH5n9vRkXBp3vg_Djncgg5p3BDAdTt_RxYnlGZ5wdkQrXOUq51fhh1dFPKgB2TkxBWAJkWik6IvEtcY0Ooq9rZoe67pMVh2ZdJ1fvkHYuk7qJqRwt3g7duL0_JUWWbgGc_d0re5g-vs6d08fL4PLtbpI5KzdKKIwrFndXKWoRCWycFFUxKDnpPaE6VYNohR5CUltJKl1lKFRZlgS6bkqsxd2u7ynYfZtVvfBcbzXZZ7naFQQbAQYBgkb0c2bXvPzcYBtPWwWHT2A77TTAsV8BzARG8-Af-pjKVS82z-EukrkfK-T4Ej5VZ-7q1_stQMPs5zd_Y2TffG27y</recordid><startdate>20040901</startdate><enddate>20040901</enddate><creator>Li, Xiang-Yang</creator><creator>Zhang, Ya-Fei</creator><creator>Lu, Jian-Jiang</creator><creator>Xu, Bao-Wen</creator><general>Springer Nature B.V</general><general>Institute of Communications Engineering, People's Liberation Army University of Science and Techndogy, Nanjing 210007, Jiangsu, China%Institute of Communications Engineering, People's Liberation Army University of Science and Techndogy, Nanjing 210007, Jiangsu, China</general><general>Department of Computer Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, China%Department of Computer Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, 
China</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>2B.</scope><scope>4A8</scope><scope>92I</scope><scope>93N</scope><scope>PSX</scope><scope>TCJ</scope></search><sort><creationdate>20040901</creationdate><title>A classification method for Web information extraction</title><author>Li, Xiang-Yang ; Zhang, Ya-Fei ; Lu, Jian-Jiang ; Xu, Bao-Wen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2004</creationdate><topic>Classification</topic><topic>Fragments</topic><topic>Information processing</topic><topic>Information retrieval</topic><topic>Webs</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Xiang-Yang</creatorcontrib><creatorcontrib>Zhang, Ya-Fei</creatorcontrib><creatorcontrib>Lu, Jian-Jiang</creatorcontrib><creatorcontrib>Xu, Bao-Wen</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Wanfang Data Journals - Hong Kong</collection><collection>WANFANG Data Centre</collection><collection>Wanfang Data Journals</collection><collection>万方数据期刊 - 香港版</collection><collection>China Online Journals (COJ)</collection><collection>China Online Journals (COJ)</collection><jtitle>Wuhan University journal of natural sciences</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Xiang-Yang</au><au>Zhang, Ya-Fei</au><au>Lu, Jian-Jiang</au><au>Xu, Bao-Wen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A classification method for Web information extraction</atitle><jtitle>Wuhan University journal of natural sciences</jtitle><date>2004-09-01</date><risdate>2004</risdate><volume>9</volume><issue>5</issue><spage>823</spage><epage>827</epage><pages>823-827</pages><issn>1007-1202</issn><eissn>1993-4998</eissn><abstract>Web information extraction is viewed as a classification process and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features and the similarities between fragments are then defined on the bases of these features. Through competitions of fragments for different slots in information templates, the method classifies fragments into slot classes and filters out noise information. Far less annotated samples are needed as compared with rule-based methods and therefore it has a strong portability. Experiments show that the method has good performance and is superior to DOM-based method in information extraction.</abstract><cop>Heidelberg</cop><pub>Springer Nature B.V</pub><doi>10.1007/BF02831688</doi><tpages>5</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1007-1202
ispartof Wuhan University journal of natural sciences, 2004-09, Vol.9 (5), p.823-827
issn 1007-1202
1993-4998
language eng
recordid cdi_wanfang_journals_whdxxb_e200405052
source Springer Nature - Connect here FIRST to enable access
subjects Classification
Fragments
Information processing
Information retrieval
Webs
title A classification method for Web information extraction
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T07%3A27%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20classification%20method%20for%20Web%20information%20extraction&rft.jtitle=Wuhan%20University%20journal%20of%20natural%20sciences&rft.au=Li,%20Xiang-Yang&rft.date=2004-09-01&rft.volume=9&rft.issue=5&rft.spage=823&rft.epage=827&rft.pages=823-827&rft.issn=1007-1202&rft.eissn=1993-4998&rft_id=info:doi/10.1007/BF02831688&rft_dat=%3Cwanfang_jour_proqu%3Ewhdxxb_e200405052%3C/wanfang_jour_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2786943061&rft_id=info:pmid/&rft_wanfj_id=whdxxb_e200405052&rfr_iscdi=true
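
The competition step summarized in the description field above can be illustrated with a minimal sketch. This is not the authors' implementation: the record names neither the three features nor the similarity formula, so the features used here (text tokens, tag path, position), the averaging in `similarity`, the greedy assignment in `compete`, and the `threshold` value are all hypothetical choices made only to show the general idea of fragments competing for template slots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    text: str      # hypothetical feature 1: visible text of the fragment
    tag_path: str  # hypothetical feature 2: markup path leading to the fragment
    position: int  # hypothetical feature 3: ordinal position on the page


def similarity(a: Fragment, b: Fragment) -> float:
    """Assumed similarity: the mean of three per-feature scores in [0, 1]."""
    ta, tb = set(a.text.lower().split()), set(b.text.lower().split())
    union = ta | tb
    text_sim = len(ta & tb) / len(union) if union else 0.0
    path_sim = 1.0 if a.tag_path == b.tag_path else 0.0
    pos_sim = 1.0 / (1.0 + abs(a.position - b.position))
    return (text_sim + path_sim + pos_sim) / 3.0


def compete(fragments, samples, threshold=0.5):
    """Let fragments compete for slots; return (slot -> winner, noise list).

    `samples` maps each slot name to a list of annotated example fragments.
    A fragment's score for a slot is its best similarity to that slot's samples.
    """
    scored = []
    for i, frag in enumerate(fragments):
        for slot, examples in samples.items():
            best = max((similarity(frag, ex) for ex in examples), default=0.0)
            scored.append((best, i, slot))
    # Strongest claims are resolved first.
    scored.sort(key=lambda item: item[0], reverse=True)

    winners, used = {}, set()
    for score, i, slot in scored:
        if score < threshold:
            break  # every remaining claim is too weak to win a slot
        if slot not in winners and i not in used:
            winners[slot] = fragments[i]
            used.add(i)
    # Fragments that won no slot are treated as noise.
    noise = [f for i, f in enumerate(fragments) if i not in used]
    return winners, noise
```

Under these assumptions, one annotated sample per slot is enough to run the sketch: call `compete(page_fragments, {"title": [...], "price": [...]})` and read the winning fragment for each slot from the first return value. Fragments that fall below the threshold for every slot end up in the noise list.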