Paraphrase detection using LSTM networks and handcrafted features

Paraphrase detection is one of the fundamental tasks in natural language processing. A paraphrase is a sentence or phrase that conveys the same meaning as another but uses different wording. Paraphrase detection has many applications, such as machine translation, text summarization, QA systems, and plagi...

Bibliographic Details
Published in:	Multimedia tools and applications 2021-02, Vol.80 (4), p.6479-6492
Main Authors:	Shahmohammadi, Hassan; Dezfoulian, MirHossein; Mansoorizadeh, Muharram
Format:	Article
Language:	English
Subjects:	Computer Communication Networks; Computer Science; Data Structures and Information Theory; Datasets; Deep learning; Machine translation; Modules; Multimedia; Multimedia Information Systems; Natural language; Natural language processing; Neural networks; Plagiarism; Semantics; Sentences; Special Purpose and Application-Based Systems

cited_by	cdi_FETCH-LOGICAL-c319t-e7b0e09de4256018d43572dbd4badc32991988f80d2815823beaf2c4e28af1303
cites	cdi_FETCH-LOGICAL-c319t-e7b0e09de4256018d43572dbd4badc32991988f80d2815823beaf2c4e28af1303
container_end_page	6492
container_issue	4
container_start_page	6479
container_title	Multimedia tools and applications
container_volume	80
creator	Shahmohammadi, Hassan; Dezfoulian, MirHossein; Mansoorizadeh, Muharram
description	Paraphrase detection is one of the fundamental tasks in natural language processing. A paraphrase is a sentence or phrase that conveys the same meaning as another but uses different wording. Paraphrase detection has many applications, such as machine translation, text summarization, QA systems, and plagiarism detection. In this research, we propose a new deep-learning-based model that generalizes well despite the lack of training data for deep models. After preprocessing, our model is divided into two separate modules. In the first, we train a single Bi-LSTM neural network to encode the whole input, leveraging pretrained GloVe word vectors. In the second, three sets of handcrafted features, some of which are introduced in this research for the first time, are used to measure the similarity between each pair of sentences. Our final model is formed by combining the handcrafted features with the output of the Bi-LSTM network. Evaluation results on the MSRP and Quora datasets show that it outperforms almost all previous work in terms of F-measure and accuracy on MSRP and achieves comparable results on Quora. In the Quora question-pair competition launched by Kaggle, our model ranked among the top 24% of solutions out of more than 3000 teams.
doi_str_mv	10.1007/s11042-020-09996-y
format	article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2484419396</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2484419396</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-e7b0e09de4256018d43572dbd4badc32991988f80d2815823beaf2c4e28af1303</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMouK7-AU8Fz9GZJG2S47L4BSsKrueQNtP9UNs1aZH991YrePMyM4fnfQcexs4RLhFAXyVEUIKDAA7W2oLvD9gEcy251gIPh1sa4DoHPGYnKW0BsMiFmrDZk49-t44-URaoo6rbtE3Wp02zyhbPy4esoe6zja8p803I1sOooq87CllNvusjpVN2VPu3RGe_e8pebq6X8zu-eLy9n88WvJJoO066BAIbSIm8ADRByVyLUAZV-lBJYS1aY2oDQRjMjZAl-VpUioTxNUqQU3Yx9u5i-9FT6ty27WMzvHRCGaXQSlsMlBipKrYpRardLm7efdw7BPetyo2q3KDK_ahy-yEkx1Aa4GZF8a_6n9QXsVJsEg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2484419396</pqid></control><display><type>article</type><title>Paraphrase detection using LSTM networks and handcrafted features</title><source>ABI/INFORM Global</source><source>Springer Nature</source><creator>Shahmohammadi, Hassan ; Dezfoulian, MirHossein ; Mansoorizadeh, Muharram</creator><creatorcontrib>Shahmohammadi, Hassan ; Dezfoulian, MirHossein ; Mansoorizadeh, Muharram</creatorcontrib><description>Paraphrase detection is one of the fundamental tasks in the area of natural language processing. Paraphrase refers to those sentences or phrases that convey the same meaning but use different wording. It has a lot of applications such as machine translation, text summarization, QA systems, and plagiarism detection. In this research, we propose a new deep-learning based model which can generalize well despite the lack of training data for deep models. After preprocessing, our model can be divided into two separate modules. In the first one, we train a single Bi-LSTM neural network to encode the whole input by leveraging its pretrained GloVe word vectors. 
In the second module, three sets of handcrafted features are used to measure the similarity between each pair of sentences, some of which are introduced in this research for the first time. Our final model is formed by incorporating the handcrafted features with the output of the Bi-LSTM network. Evaluation results on MSRP and Quora datasets show that it outperforms almost all the previous works in terms of f-measure and accuracy on MSRP and achieves comparable results on Quora. On the Quora-question pair competition launched by Kaggle, our model ranked among the top 24% solutions between more than 3000 teams.</description><identifier>ISSN: 1380-7501</identifier><identifier>EISSN: 1573-7721</identifier><identifier>DOI: 10.1007/s11042-020-09996-y</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Computer Communication Networks ; Computer Science ; Data Structures and Information Theory ; Datasets ; Deep learning ; Machine translation ; Modules ; Multimedia ; Multimedia Information Systems ; Natural language ; Natural language processing ; Neural networks ; Plagiarism ; Semantics ; Sentences ; Special Purpose and Application-Based Systems</subject><ispartof>Multimedia tools and applications, 2021-02, Vol.80 (4), p.6479-6492</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020</rights><rights>Springer Science+Business Media, LLC, part of Springer Nature 
2020.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-e7b0e09de4256018d43572dbd4badc32991988f80d2815823beaf2c4e28af1303</citedby><cites>FETCH-LOGICAL-c319t-e7b0e09de4256018d43572dbd4badc32991988f80d2815823beaf2c4e28af1303</cites><orcidid>0000-0002-7131-1047</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2484419396/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2484419396?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,776,780,11667,27901,27902,36037,44339,74638</link.rule.ids></links><search><creatorcontrib>Shahmohammadi, Hassan</creatorcontrib><creatorcontrib>Dezfoulian, MirHossein</creatorcontrib><creatorcontrib>Mansoorizadeh, Muharram</creatorcontrib><title>Paraphrase detection using LSTM networks and handcrafted features</title><title>Multimedia tools and applications</title><addtitle>Multimed Tools Appl</addtitle><description>Paraphrase detection is one of the fundamental tasks in the area of natural language processing. Paraphrase refers to those sentences or phrases that convey the same meaning but use different wording. It has a lot of applications such as machine translation, text summarization, QA systems, and plagiarism detection. In this research, we propose a new deep-learning based model which can generalize well despite the lack of training data for deep models. After preprocessing, our model can be divided into two separate modules. In the first one, we train a single Bi-LSTM neural network to encode the whole input by leveraging its pretrained GloVe word vectors. 
In the second module, three sets of handcrafted features are used to measure the similarity between each pair of sentences, some of which are introduced in this research for the first time. Our final model is formed by incorporating the handcrafted features with the output of the Bi-LSTM network. Evaluation results on MSRP and Quora datasets show that it outperforms almost all the previous works in terms of f-measure and accuracy on MSRP and achieves comparable results on Quora. On the Quora-question pair competition launched by Kaggle, our model ranked among the top 24% solutions between more than 3000 teams.</description><subject>Computer Communication Networks</subject><subject>Computer Science</subject><subject>Data Structures and Information Theory</subject><subject>Datasets</subject><subject>Deep learning</subject><subject>Machine translation</subject><subject>Modules</subject><subject>Multimedia</subject><subject>Multimedia Information Systems</subject><subject>Natural language</subject><subject>Natural language processing</subject><subject>Neural networks</subject><subject>Plagiarism</subject><subject>Semantics</subject><subject>Sentences</subject><subject>Special Purpose and Application-Based Systems</subject><issn>1380-7501</issn><issn>1573-7721</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><recordid>eNp9kE1LxDAQhoMouK7-AU8Fz9GZJG2S47L4BSsKrueQNtP9UNs1aZH991YrePMyM4fnfQcexs4RLhFAXyVEUIKDAA7W2oLvD9gEcy251gIPh1sa4DoHPGYnKW0BsMiFmrDZk49-t44-URaoo6rbtE3Wp02zyhbPy4esoe6zja8p803I1sOooq87CllNvusjpVN2VPu3RGe_e8pebq6X8zu-eLy9n88WvJJoO066BAIbSIm8ADRByVyLUAZV-lBJYS1aY2oDQRjMjZAl-VpUioTxNUqQU3Yx9u5i-9FT6ty27WMzvHRCGaXQSlsMlBipKrYpRardLm7efdw7BPetyo2q3KDK_ahy-yEkx1Aa4GZF8a_6n9QXsVJsEg</recordid><startdate>20210201</startdate><enddate>20210201</enddate><creator>Shahmohammadi, Hassan</creator><creator>Dezfoulian, MirHossein</creator><creator>Mansoorizadeh, 
Muharram</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M2O</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0002-7131-1047</orcidid></search><sort><creationdate>20210201</creationdate><title>Paraphrase detection using LSTM networks and handcrafted features</title><author>Shahmohammadi, Hassan ; Dezfoulian, MirHossein ; Mansoorizadeh, Muharram</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-e7b0e09de4256018d43572dbd4badc32991988f80d2815823beaf2c4e28af1303</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Computer Communication Networks</topic><topic>Computer Science</topic><topic>Data Structures and Information Theory</topic><topic>Datasets</topic><topic>Deep learning</topic><topic>Machine translation</topic><topic>Modules</topic><topic>Multimedia</topic><topic>Multimedia Information Systems</topic><topic>Natural language</topic><topic>Natural language processing</topic><topic>Neural 
networks</topic><topic>Plagiarism</topic><topic>Semantics</topic><topic>Sentences</topic><topic>Special Purpose and Application-Based Systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Shahmohammadi, Hassan</creatorcontrib><creatorcontrib>Dezfoulian, MirHossein</creatorcontrib><creatorcontrib>Mansoorizadeh, Muharram</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer 
Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><jtitle>Multimedia tools and applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Shahmohammadi, Hassan</au><au>Dezfoulian, MirHossein</au><au>Mansoorizadeh, Muharram</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Paraphrase detection using LSTM networks and handcrafted features</atitle><jtitle>Multimedia tools and applications</jtitle><stitle>Multimed Tools Appl</stitle><date>2021-02-01</date><risdate>2021</risdate><volume>80</volume><issue>4</issue><spage>6479</spage><epage>6492</epage><pages>6479-6492</pages><issn>1380-7501</issn><eissn>1573-7721</eissn><abstract>Paraphrase detection is one of the fundamental tasks in the area of natural language processing. 
Paraphrase refers to those sentences or phrases that convey the same meaning but use different wording. It has a lot of applications such as machine translation, text summarization, QA systems, and plagiarism detection. In this research, we propose a new deep-learning based model which can generalize well despite the lack of training data for deep models. After preprocessing, our model can be divided into two separate modules. In the first one, we train a single Bi-LSTM neural network to encode the whole input by leveraging its pretrained GloVe word vectors. In the second module, three sets of handcrafted features are used to measure the similarity between each pair of sentences, some of which are introduced in this research for the first time. Our final model is formed by incorporating the handcrafted features with the output of the Bi-LSTM network. Evaluation results on MSRP and Quora datasets show that it outperforms almost all the previous works in terms of f-measure and accuracy on MSRP and achieves comparable results on Quora. On the Quora-question pair competition launched by Kaggle, our model ranked among the top 24% solutions between more than 3000 teams.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11042-020-09996-y</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0002-7131-1047</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1380-7501
ispartof	Multimedia tools and applications, 2021-02, Vol.80 (4), p.6479-6492
issn	1380-7501 1573-7721
language	eng
recordid	cdi_proquest_journals_2484419396
source	ABI/INFORM Global; Springer Nature
subjects	Computer Communication Networks; Computer Science; Data Structures and Information Theory; Datasets; Deep learning; Machine translation; Modules; Multimedia; Multimedia Information Systems; Natural language; Natural language processing; Neural networks; Plagiarism; Semantics; Sentences; Special Purpose and Application-Based Systems
title	Paraphrase detection using LSTM networks and handcrafted features
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T19%3A59%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Paraphrase%20detection%20using%20LSTM%20networks%20and%20handcrafted%20features&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Shahmohammadi,%20Hassan&rft.date=2021-02-01&rft.volume=80&rft.issue=4&rft.spage=6479&rft.epage=6492&rft.pages=6479-6492&rft.issn=1380-7501&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-020-09996-y&rft_dat=%3Cproquest_cross%3E2484419396%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c319t-e7b0e09de4256018d43572dbd4badc32991988f80d2815823beaf2c4e28af1303%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2484419396&rft_id=info:pmid/&rfr_iscdi=true