A classification method for Web information extraction
Web information extraction is viewed as a classification process and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features and the similarities between fragments are then defined on the bas...
Published in: | Wuhan University journal of natural sciences 2004-09, Vol.9 (5), p.823-827 |
---|---|
Main Authors: | Li, Xiang-Yang ; Zhang, Ya-Fei ; Lu, Jian-Jiang ; Xu, Bao-Wen |
Format: | Article |
Language: | English |
Subjects: | Classification ; Fragments ; Information processing ; Information retrieval ; Webs |
cited_by | cdi_FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3 |
---|---|
cites | cdi_FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3 |
container_end_page | 827 |
container_issue | 5 |
container_start_page | 823 |
container_title | Wuhan University journal of natural sciences |
container_volume | 9 |
creator | Li, Xiang-Yang ; Zhang, Ya-Fei ; Lu, Jian-Jiang ; Xu, Bao-Wen |
description | Web information extraction is viewed as a classification process, and a competing classification method is presented that extracts Web information directly through classification. Web fragments are represented with three general features, and the similarities between fragments are then defined on the basis of these features. Through competition among fragments for the different slots in information templates, the method classifies fragments into slot classes and filters out noise information. Far fewer annotated samples are needed than with rule-based methods, so the method is highly portable. Experiments show that the method performs well and is superior to the DOM-based method in information extraction. |
doi_str_mv | 10.1007/BF02831688 |
format | article |
fullrecord | <record><control><sourceid>wanfang_jour_proqu</sourceid><recordid>TN_cdi_wanfang_journals_whdxxb_e200405052</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><wanfj_id>whdxxb_e200405052</wanfj_id><sourcerecordid>whdxxb_e200405052</sourcerecordid><originalsourceid>FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3</originalsourceid><addsrcrecordid>eNpdkN1LwzAUxYMoOKcv_gUFQVCo3qT5aB51OBUGvig-hjS9dR39mEnH5n9vRkXBp3vg_Djncgg5p3BDAdTt_RxYnlGZ5wdkQrXOUq51fhh1dFPKgB2TkxBWAJkWik6IvEtcY0Ooq9rZoe67pMVh2ZdJ1fvkHYuk7qJqRwt3g7duL0_JUWWbgGc_d0re5g-vs6d08fL4PLtbpI5KzdKKIwrFndXKWoRCWycFFUxKDnpPaE6VYNohR5CUltJKl1lKFRZlgS6bkqsxd2u7ynYfZtVvfBcbzXZZ7naFQQbAQYBgkb0c2bXvPzcYBtPWwWHT2A77TTAsV8BzARG8-Af-pjKVS82z-EukrkfK-T4Ej5VZ-7q1_stQMPs5zd_Y2TffG27y</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2786943061</pqid></control><display><type>article</type><title>A classification method for Web information extraction</title><source>Springer Nature - Connect here FIRST to enable access</source><creator>Li, Xiang-Yang ; Zhang, Ya-Fei ; Lu, Jian-Jiang ; Xu, Bao-Wen</creator><creatorcontrib>Li, Xiang-Yang ; Zhang, Ya-Fei ; Lu, Jian-Jiang ; Xu, Bao-Wen</creatorcontrib><description>Web information extraction is viewed as a classification process and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features and the similarities between fragments are then defined on the bases of these features. Through competitions of fragments for different slots in information templates, the method classifies fragments into slot classes and filters out noise information. Far less annotated samples are needed as compared with rule-based methods and therefore it has a strong portability. 
Experiments show that the method has good performance and is superior to DOM-based method in information extraction.</description><identifier>ISSN: 1007-1202</identifier><identifier>EISSN: 1993-4998</identifier><identifier>DOI: 10.1007/BF02831688</identifier><language>eng</language><publisher>Heidelberg: Springer Nature B.V</publisher><subject>Classification ; Fragments ; Information processing ; Information retrieval ; Webs</subject><ispartof>Wuhan University journal of natural sciences, 2004-09, Vol.9 (5), p.823-827</ispartof><rights>Springer 2004.</rights><rights>Copyright © Wanfang Data Co. Ltd. All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3</citedby><cites>FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://www.wanfangdata.com.cn/images/PeriodicalImages/whdxxb-e/whdxxb-e.jpg</thumbnail><link.rule.ids>314,780,784,1644,27924,27925</link.rule.ids></links><search><creatorcontrib>Li, Xiang-Yang</creatorcontrib><creatorcontrib>Zhang, Ya-Fei</creatorcontrib><creatorcontrib>Lu, Jian-Jiang</creatorcontrib><creatorcontrib>Xu, Bao-Wen</creatorcontrib><title>A classification method for Web information extraction</title><title>Wuhan University journal of natural sciences</title><description>Web information extraction is viewed as a classification process and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features and the similarities between fragments are then defined on the bases of these features. 
Through competitions of fragments for different slots in information templates, the method classifies fragments into slot classes and filters out noise information. Far less annotated samples are needed as compared with rule-based methods and therefore it has a strong portability. Experiments show that the method has good performance and is superior to DOM-based method in information extraction.</description><subject>Classification</subject><subject>Fragments</subject><subject>Information processing</subject><subject>Information retrieval</subject><subject>Webs</subject><issn>1007-1202</issn><issn>1993-4998</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2004</creationdate><recordtype>article</recordtype><recordid>eNpdkN1LwzAUxYMoOKcv_gUFQVCo3qT5aB51OBUGvig-hjS9dR39mEnH5n9vRkXBp3vg_Djncgg5p3BDAdTt_RxYnlGZ5wdkQrXOUq51fhh1dFPKgB2TkxBWAJkWik6IvEtcY0Ooq9rZoe67pMVh2ZdJ1fvkHYuk7qJqRwt3g7duL0_JUWWbgGc_d0re5g-vs6d08fL4PLtbpI5KzdKKIwrFndXKWoRCWycFFUxKDnpPaE6VYNohR5CUltJKl1lKFRZlgS6bkqsxd2u7ynYfZtVvfBcbzXZZ7naFQQbAQYBgkb0c2bXvPzcYBtPWwWHT2A77TTAsV8BzARG8-Af-pjKVS82z-EukrkfK-T4Ej5VZ-7q1_stQMPs5zd_Y2TffG27y</recordid><startdate>20040901</startdate><enddate>20040901</enddate><creator>Li, Xiang-Yang</creator><creator>Zhang, Ya-Fei</creator><creator>Lu, Jian-Jiang</creator><creator>Xu, Bao-Wen</creator><general>Springer Nature B.V</general><general>Institute of Communications Engineering, People's Liberation Army University of Science and Techndogy, Nanjing 210007, Jiangsu, China%Institute of Communications Engineering, People's Liberation Army University of Science and Techndogy, Nanjing 210007, Jiangsu, China</general><general>Department of Computer Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, China%Department of Computer Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, 
China</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>2B.</scope><scope>4A8</scope><scope>92I</scope><scope>93N</scope><scope>PSX</scope><scope>TCJ</scope></search><sort><creationdate>20040901</creationdate><title>A classification method for Web information extraction</title><author>Li, Xiang-Yang ; Zhang, Ya-Fei ; Lu, Jian-Jiang ; Xu, Bao-Wen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2004</creationdate><topic>Classification</topic><topic>Fragments</topic><topic>Information processing</topic><topic>Information retrieval</topic><topic>Webs</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Xiang-Yang</creatorcontrib><creatorcontrib>Zhang, Ya-Fei</creatorcontrib><creatorcontrib>Lu, Jian-Jiang</creatorcontrib><creatorcontrib>Xu, Bao-Wen</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Wanfang Data Journals - Hong Kong</collection><collection>WANFANG Data Centre</collection><collection>Wanfang Data Journals</collection><collection>万方数据期刊 - 香港版</collection><collection>China Online Journals (COJ)</collection><collection>China Online Journals (COJ)</collection><jtitle>Wuhan University journal of natural sciences</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Xiang-Yang</au><au>Zhang, Ya-Fei</au><au>Lu, Jian-Jiang</au><au>Xu, Bao-Wen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A classification method for Web information extraction</atitle><jtitle>Wuhan University journal of natural sciences</jtitle><date>2004-09-01</date><risdate>2004</risdate><volume>9</volume><issue>5</issue><spage>823</spage><epage>827</epage><pages>823-827</pages><issn>1007-1202</issn><eissn>1993-4998</eissn><abstract>Web information extraction is viewed as a classification process and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features and the similarities between fragments are then defined on the bases of these features. Through competitions of fragments for different slots in information templates, the method classifies fragments into slot classes and filters out noise information. Far less annotated samples are needed as compared with rule-based methods and therefore it has a strong portability. Experiments show that the method has good performance and is superior to DOM-based method in information extraction.</abstract><cop>Heidelberg</cop><pub>Springer Nature B.V</pub><doi>10.1007/BF02831688</doi><tpages>5</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1007-1202 |
ispartof | Wuhan University journal of natural sciences, 2004-09, Vol.9 (5), p.823-827 |
issn | 1007-1202 1993-4998 |
language | eng |
recordid | cdi_wanfang_journals_whdxxb_e200405052 |
source | Springer Nature - Connect here FIRST to enable access |
subjects | Classification ; Fragments ; Information processing ; Information retrieval ; Webs |
title | A classification method for Web information extraction |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T07%3A27%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20classification%20method%20for%20Web%20information%20extraction&rft.jtitle=Wuhan%20University%20journal%20of%20natural%20sciences&rft.au=Li,%20Xiang-Yang&rft.date=2004-09-01&rft.volume=9&rft.issue=5&rft.spage=823&rft.epage=827&rft.pages=823-827&rft.issn=1007-1202&rft.eissn=1993-4998&rft_id=info:doi/10.1007/BF02831688&rft_dat=%3Cwanfang_jour_proqu%3Ewhdxxb_e200405052%3C/wanfang_jour_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c1692-f4ee574ca97aae0b9ac651526640916929417529ce4e0611d6a6c3a117ebdbec3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2786943061&rft_id=info:pmid/&rft_wanfj_id=whdxxb_e200405052&rfr_iscdi=true |
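
The abstract describes fragments competing for template slots on the basis of feature similarity, with losing fragments discarded as noise. The record does not say which three features are used, how similarity is defined, or how the competition is run, so the following is only a minimal sketch under stated assumptions: the three-number feature vector, the distance-based `similarity`, the `threshold` value, and the winner-takes-the-slot rule in `extract` are all hypothetical placeholders, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    text: str
    # Three placeholder feature values; the actual features are not given in this record.
    features: tuple


def similarity(a, b):
    """Hypothetical similarity: 1 / (1 + Euclidean distance between feature vectors)."""
    dist = sum((x - y) ** 2 for x, y in zip(a.features, b.features)) ** 0.5
    return 1.0 / (1.0 + dist)


def extract(fragments, samples, threshold=0.5):
    """Let fragments compete for each slot; return (filled slots, noise fragments).

    samples maps a slot name to a list of annotated example fragments.
    A fragment scores for a slot by its best similarity to that slot's examples;
    the highest-scoring fragment at or above the threshold wins the slot.
    """
    winners = {}  # slot -> (score, fragment)
    for frag in fragments:
        for slot, examples in samples.items():
            score = max(similarity(frag, ex) for ex in examples)
            if score >= threshold and (slot not in winners or score > winners[slot][0]):
                winners[slot] = (score, frag)
    filled = {slot: frag.text for slot, (_, frag) in winners.items()}
    used = set(filled.values())
    # Fragments that won no slot are treated as noise.
    noise = [f.text for f in fragments if f.text not in used]
    return filled, noise


if __name__ == "__main__":
    samples = {
        "title": [Fragment("sample title", (1.0, 0.0, 0.0))],
        "price": [Fragment("sample price", (0.0, 1.0, 0.0))],
    }
    page = [
        Fragment("A", (0.9, 0.1, 0.0)),
        Fragment("B", (0.1, 0.9, 0.0)),
        Fragment("C", (5.0, 5.0, 5.0)),
    ]
    print(extract(page, samples))
```

In this toy setup one annotated example per slot is enough to assign fragments, which loosely mirrors the abstract's point that few annotated samples are needed; a real system would need a rule for a fragment that wins more than one slot.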