Cross-language article linking with deep neural network based paragraph encoding

Bibliographic Details
Published in: Computer speech & language, 2022-03, Vol.72, p.101279, Article 101279
Main Authors: Wang, Yu-Chun; Chuang, Chia-Min; Wu, Chun-Kai; Pan, Chao-Lin; Tsai, Richard Tzong-Han
Format: Article
Language: English
container_start_page 101279
container_title Computer speech & language
container_volume 72
creator Wang, Yu-Chun
Chuang, Chia-Min
Wu, Chun-Kai
Pan, Chao-Lin
Tsai, Richard Tzong-Han
description Cross-language article linking (CLAL), the task of generating links between articles in different languages from different encyclopedias, is critical for facilitating sharing among online knowledge bases. Some previous CLAL research has been done on creating links among Wikipedia wikis, but much of this work depends heavily on simple language patterns and encyclopedia format or metadata. In this paper, we propose a new CLAL method based on deep learning paragraph embeddings to link English Wikipedia articles with articles in Baidu Baike, the most popular online encyclopedia in mainland China. To measure article similarity for link prediction, we employ several neural networks with attention mechanisms, such as CNN and LSTM, to train paragraph encoders that create vector representations of the articles’ semantics based only on article text, rather than link structure, as input data. Using our “Deep CLAL” method, we compile a data set consisting of Baidu Baike entries and corresponding English Wikipedia entries. Our approach does not rely on linguistic or structural features and can be easily applied to other language pairs by using pre-trained word embeddings, regardless of whether the two languages are on the same encyclopedia platform.
• Cross-language article linking helps create a multilingual unified knowledge base.
• An attention-based neural network learns to attend to the vital parts of articles.
• The novel method does not rely on feature engineering and is scalable to large data.
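The idea in the abstract of encoding each article's text into a vector and ranking candidate articles by similarity can be illustrated with a small sketch. This is not the authors' model: their encoders are trained CNN and LSTM networks with attention, whereas the code below swaps in untrained dot-product attention pooling with a fixed query vector. The toy 2-D word vectors, the `query` vector, the candidate names `baike_A` and `baike_B`, and all function names are hypothetical, and the sketch assumes the word vectors of both languages already live in one shared embedding space.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors, query):
    """Encode a paragraph as an attention-weighted average of its word vectors.

    Assumption: the attention score of a word is its dot product with a
    fixed query vector. A trained encoder would learn this scoring instead.
    """
    scores = [sum(w * q for w, q in zip(vec, query)) for vec in word_vectors]
    weights = softmax(scores)
    dim = len(query)
    return [
        sum(wt * vec[i] for wt, vec in zip(weights, word_vectors))
        for i in range(dim)
    ]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(source_vec, candidate_vecs):
    """Return candidate names sorted from most to least similar."""
    return sorted(
        candidate_vecs,
        key=lambda name: cosine(source_vec, candidate_vecs[name]),
        reverse=True,
    )


# Hypothetical toy data: each article is a short list of 2-D word vectors.
query = [1.0, 0.0]
english_article = [[1.0, 0.0], [0.9, 0.1]]
baike_articles = {
    "baike_A": [[1.0, 0.0], [0.8, 0.2]],
    "baike_B": [[0.0, 1.0], [0.1, 0.9]],
}

source_vec = attention_pool(english_article, query)
candidate_vecs = {
    name: attention_pool(words, query) for name, words in baike_articles.items()
}
ranking = rank_candidates(source_vec, candidate_vecs)
print(ranking)
```

In this toy setup the candidate whose words point in the same direction as the English article ranks first. A real system would replace `attention_pool` with a trained encoder and would need aligned or jointly trained embeddings for the two languages.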
doi_str_mv 10.1016/j.csl.2021.101279
format article
orcidid https://orcid.org/0000-0003-0513-107X
publisher Elsevier Ltd
rights 2021
toplevel peer_reviewed
fulltext fulltext
identifier ISSN: 0885-2308
ispartof Computer speech & language, 2022-03, Vol.72, p.101279, Article 101279
issn 0885-2308
1095-8363
language eng
recordid cdi_crossref_primary_10_1016_j_csl_2021_101279
source ScienceDirect Journals
subjects Convolutional neural network
Cross-language article linking
Deep learning
Link discovery
Long short-term memory
Paragraph encoding
title Cross-language article linking with deep neural network based paragraph encoding
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T16%3A55%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cross-language%20article%20linking%20with%20deep%20neural%20network%20based%20paragraph%20encoding&rft.jtitle=Computer%20speech%20&%20language&rft.au=Wang,%20Yu-Chun&rft.date=2022-03&rft.volume=72&rft.spage=101279&rft.pages=101279-&rft.artnum=101279&rft.issn=0885-2308&rft.eissn=1095-8363&rft_id=info:doi/10.1016/j.csl.2021.101279&rft_dat=%3Celsevier_cross%3ES0885230821000826%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c297t-5872c8c12350fcecb854702f91a52592a2a708950f2f57a503c2cc69382467b93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true