
Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity

Search is one of the most common platforms used to seek information. However, users mostly get overloaded with results whenever they use such a platform to resolve their queries. Nowadays, direct answers to queries are being provided as a part of the search experience. The question-answer (QA) retri...

Bibliographic Details
Published in: arXiv.org 2021-01
Main Authors: Chopra, Ankush, Agrawal, Shruti, Ghosh, Sohom
Format: Article
Language: English
container_title arXiv.org
creator Chopra, Ankush
Agrawal, Shruti
Ghosh, Sohom
description Search is one of the most common platforms used to seek information. However, users are often overloaded with results when they use such a platform to resolve their queries. Nowadays, direct answers to queries are provided as part of the search experience. The question-answer (QA) retrieval process plays a significant role in enriching the search experience. Most off-the-shelf Semantic Textual Similarity models work well for well-formed search queries, but their performance degrades in a domain-specific setting where incomplete or grammatically ill-formed search queries are prevalent. In this paper, we discuss a framework for calculating similarities between a given input query and a set of predefined questions to retrieve the question that best matches it. We have used it for the financial domain, but the framework is general and can be used with any domain-specific search engine. We use a Siamese network [6] over Long Short-Term Memory (LSTM) [3] models to train a classifier that generates unnormalized and normalized similarity scores for a given pair of questions. Moreover, for each question pair, we calculate three other similarity scores: the cosine similarity between their average word2vec embeddings [15], the cosine similarity between their sentence embeddings [7] generated using RoBERTa [17], and their customized fuzzy-match score. Finally, we develop a metaclassifier using Support Vector Machines [19] that combines these five scores to detect whether a given pair of questions is similar. We benchmark our model's performance against existing state-of-the-art (SOTA) models on the Quora Question Pairs (QQP) dataset as well as a dataset specific to the financial domain.
doi_str_mv 10.48550/arxiv.2101.02351
format article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2476250207</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2476250207</sourcerecordid><originalsourceid>FETCH-LOGICAL-a527-e34dc2a383db87d2b5f4d5a8f22f1dd3ce486bbd1fed49ccf8f559ed812d291e3</originalsourceid><addsrcrecordid>eNotTs1qAjEYDIVCxfoAvQV6Xpt8Sdx4FGtbQShlt2fJJl_aiJtssyr69nVpT_PDMDOEPHA2lVop9mTyOZymwBmfMhCK35ARCMELLQHuyKTvd4wxmJWglBiRdtF1-0uIX7TOJvYeM92gyXFwfMp03XY5nQb1nFoTYlF1aIMPllbXmP2mq3OHOWC0SD_7IfdxxHyhhzSQ_hBSpFVow97kcLjck1tv9j1O_nFM6pdVvXwrNu-v6-ViUxgFZYFCOgtGaOEaXTpolJdOGe0BPHdOWJR61jSOe3Rybq3XXqk5Os3BwZyjGJPHv9rr95_hxXaXjjleF7cgyxkoBqwUv9WsXNI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2476250207</pqid></control><display><type>article</type><title>Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity</title><source>Publicly Available Content Database</source><creator>Chopra, Ankush ; Agrawal, Shruti ; Ghosh, Sohom</creator><creatorcontrib>Chopra, Ankush ; Agrawal, Shruti ; Ghosh, Sohom</creatorcontrib><description>Search is one of the most common platforms used to seek information. However, users mostly get overloaded with results whenever they use such a platform to resolve their queries. Nowadays, direct answers to queries are being provided as a part of the search experience. The question-answer (QA) retrieval process plays a significant role in enriching the search experience. Most off-the-shelf Semantic Textual Similarity models work fine for well-formed search queries, but their performances degrade when applied to a domain-specific setting having incomplete or grammatically ill-formed search queries in prevalence. 
In this paper, we discuss a framework for calculating similarities between a given input query and a set of predefined questions to retrieve the question which matches to it the most. We have used it for the financial domain, but the framework is generalized for any domain-specific search engine and can be used in other domains as well. We use Siamese network [6] over Long Short-Term Memory (LSTM) [3] models to train a classifier which generates unnormalized and normalized similarity scores for a given pair of questions. Moreover, for each of these question pairs, we calculate three other similarity scores: cosine similarity between their average word2vec embeddings [15], cosine similarity between their sentence embeddings [7] generated using RoBERTa [17] and their customized fuzzy-match score. Finally, we develop a metaclassifier using Support Vector Machines [19] for combining these five scores to detect if a given pair of questions is similar. We benchmark our model's performance against existing State Of The Art (SOTA) models on Quora Question Pairs (QQP) dataset as well as a dataset specific to the financial domain.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2101.02351</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Datasets ; Queries ; Questions ; Search engines ; Similarity ; Support vector machines</subject><ispartof>arXiv.org, 2021-01</ispartof><rights>2021. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2476250207?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>780,784,25753,27925,37012,44590</link.rule.ids></links><search><creatorcontrib>Chopra, Ankush</creatorcontrib><creatorcontrib>Agrawal, Shruti</creatorcontrib><creatorcontrib>Ghosh, Sohom</creatorcontrib><title>Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity</title><title>arXiv.org</title><description>Search is one of the most common platforms used to seek information. However, users mostly get overloaded with results whenever they use such a platform to resolve their queries. Nowadays, direct answers to queries are being provided as a part of the search experience. The question-answer (QA) retrieval process plays a significant role in enriching the search experience. Most off-the-shelf Semantic Textual Similarity models work fine for well-formed search queries, but their performances degrade when applied to a domain-specific setting having incomplete or grammatically ill-formed search queries in prevalence. In this paper, we discuss a framework for calculating similarities between a given input query and a set of predefined questions to retrieve the question which matches to it the most. We have used it for the financial domain, but the framework is generalized for any domain-specific search engine and can be used in other domains as well. 
We use Siamese network [6] over Long Short-Term Memory (LSTM) [3] models to train a classifier which generates unnormalized and normalized similarity scores for a given pair of questions. Moreover, for each of these question pairs, we calculate three other similarity scores: cosine similarity between their average word2vec embeddings [15], cosine similarity between their sentence embeddings [7] generated using RoBERTa [17] and their customized fuzzy-match score. Finally, we develop a metaclassifier using Support Vector Machines [19] for combining these five scores to detect if a given pair of questions is similar. We benchmark our model's performance against existing State Of The Art (SOTA) models on Quora Question Pairs (QQP) dataset as well as a dataset specific to the financial domain.</description><subject>Datasets</subject><subject>Queries</subject><subject>Questions</subject><subject>Search engines</subject><subject>Similarity</subject><subject>Support vector machines</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNotTs1qAjEYDIVCxfoAvQV6Xpt8Sdx4FGtbQShlt2fJJl_aiJtssyr69nVpT_PDMDOEPHA2lVop9mTyOZymwBmfMhCK35ARCMELLQHuyKTvd4wxmJWglBiRdtF1-0uIX7TOJvYeM92gyXFwfMp03XY5nQb1nFoTYlF1aIMPllbXmP2mq3OHOWC0SD_7IfdxxHyhhzSQ_hBSpFVow97kcLjck1tv9j1O_nFM6pdVvXwrNu-v6-ViUxgFZYFCOgtGaOEaXTpolJdOGe0BPHdOWJR61jSOe3Rybq3XXqk5Os3BwZyjGJPHv9rr95_hxXaXjjleF7cgyxkoBqwUv9WsXNI</recordid><startdate>20210107</startdate><enddate>20210107</enddate><creator>Chopra, Ankush</creator><creator>Agrawal, Shruti</creator><creator>Ghosh, Sohom</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20210107</creationdate><title>Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity</title><author>Chopra, Ankush ; Agrawal, Shruti ; Ghosh, Sohom</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a527-e34dc2a383db87d2b5f4d5a8f22f1dd3ce486bbd1fed49ccf8f559ed812d291e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Datasets</topic><topic>Queries</topic><topic>Questions</topic><topic>Search engines</topic><topic>Similarity</topic><topic>Support vector machines</topic><toplevel>online_resources</toplevel><creatorcontrib>Chopra, Ankush</creatorcontrib><creatorcontrib>Agrawal, Shruti</creatorcontrib><creatorcontrib>Ghosh, Sohom</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic 
Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chopra, Ankush</au><au>Agrawal, Shruti</au><au>Ghosh, Sohom</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity</atitle><jtitle>arXiv.org</jtitle><date>2021-01-07</date><risdate>2021</risdate><eissn>2331-8422</eissn><abstract>Search is one of the most common platforms used to seek information. However, users mostly get overloaded with results whenever they use such a platform to resolve their queries. Nowadays, direct answers to queries are being provided as a part of the search experience. The question-answer (QA) retrieval process plays a significant role in enriching the search experience. Most off-the-shelf Semantic Textual Similarity models work fine for well-formed search queries, but their performances degrade when applied to a domain-specific setting having incomplete or grammatically ill-formed search queries in prevalence. In this paper, we discuss a framework for calculating similarities between a given input query and a set of predefined questions to retrieve the question which matches to it the most. We have used it for the financial domain, but the framework is generalized for any domain-specific search engine and can be used in other domains as well. We use Siamese network [6] over Long Short-Term Memory (LSTM) [3] models to train a classifier which generates unnormalized and normalized similarity scores for a given pair of questions. 
Moreover, for each of these question pairs, we calculate three other similarity scores: cosine similarity between their average word2vec embeddings [15], cosine similarity between their sentence embeddings [7] generated using RoBERTa [17] and their customized fuzzy-match score. Finally, we develop a metaclassifier using Support Vector Machines [19] for combining these five scores to detect if a given pair of questions is similar. We benchmark our model's performance against existing State Of The Art (SOTA) models on Quora Question Pairs (QQP) dataset as well as a dataset specific to the financial domain.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2101.02351</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2021-01
issn 2331-8422
language eng
recordid cdi_proquest_journals_2476250207
source Publicly Available Content Database
subjects Datasets
Queries
Questions
Search engines
Similarity
Support vector machines
title Applying Transfer Learning for Improving Domain-Specific Search Experience Using Query to Question Similarity
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T02%3A32%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Applying%20Transfer%20Learning%20for%20Improving%20Domain-Specific%20Search%20Experience%20Using%20Query%20to%20Question%20Similarity&rft.jtitle=arXiv.org&rft.au=Chopra,%20Ankush&rft.date=2021-01-07&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2101.02351&rft_dat=%3Cproquest%3E2476250207%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a527-e34dc2a383db87d2b5f4d5a8f22f1dd3ce486bbd1fed49ccf8f559ed812d291e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2476250207&rft_id=info:pmid/&rfr_iscdi=true
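
The description field above names five pairwise similarity scores that feed a metaclassifier. As a rough illustration only, the sketch below computes two scores of that kind for a query and a candidate question: a cosine similarity between averaged word vectors and a fuzzy string-match score. Everything in it is an assumption made for illustration and is not taken from the paper: `TOY_VECTORS` is a hypothetical three-dimensional stand-in for word2vec embeddings, `difflib.SequenceMatcher` from the Python standard library stands in for the authors' customized fuzzy-match score (which the record does not specify), and the function names are invented here.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy word vectors; illustrative stand-ins for word2vec embeddings.
TOY_VECTORS = {
    "open": [0.9, 0.1, 0.0],
    "savings": [0.1, 0.8, 0.2],
    "account": [0.2, 0.7, 0.3],
    "how": [0.0, 0.1, 0.9],
    "to": [0.1, 0.0, 0.8],
}


def average_vector(text, vectors=TOY_VECTORS):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[tok] for tok in text.lower().split() if tok in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is missing or has zero norm."""
    if u is None or v is None:
        return 0.0
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def fuzzy_score(a, b):
    """Character-level match ratio in [0, 1] (assumed stand-in, not the paper's score)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(query, question):
    """Return [embedding cosine, fuzzy score] for one query/question pair."""
    return [
        cosine(average_vector(query), average_vector(question)),
        fuzzy_score(query, question),
    ]
```

In a fuller pipeline of the kind the description outlines, a feature list like the one returned by `pair_features` would be extended with the remaining scores and passed to a trained classifier such as a Support Vector Machine; that step is omitted here because it depends on training data the record does not provide.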