
AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online

Bibliographic Details
Published in: arXiv.org, 2020-12 (Ithaca: Cornell University Library)
Main Authors: Huzhang, Guangda; Pang, Zhen-Jia; Gao, Yongqing; Liu, Yawen; Shen, Weijie; Zhou, Wen-Ji; Da, Qing; Zeng, An-Xiang; Yu, Han; Yu, Yang; Zhou, Zhi-Hua
Format: Article
Language: English
Subjects: Combinatorial analysis; Context; Data points; Distance learning; Electronic commerce; Supervised learning
Online Access: Get full text
description Learning-to-rank (LTR) has become a key technology in E-commerce applications. Most existing LTR approaches follow a supervised learning paradigm over offline labeled data collected from the online system. However, it has been observed that an LTR model can achieve good validation performance on offline validation data yet perform poorly online, and vice versa, which implies a possibly large inconsistency between offline and online evaluation. We investigate and confirm in this paper that such inconsistency exists and can have a significant impact on AliExpress Search. Reasons for the inconsistency include the neglect of item context during learning and the insufficiency of the offline data set for learning that context. This paper therefore proposes an evaluator-generator framework for LTR with item context. The framework consists of an evaluator that generalizes to assess recommendations in context, a generator that maximizes the evaluator score by reinforcement learning, and a discriminator that ensures the generalization of the evaluator. Extensive experiments in simulation environments and in the AliExpress Search online system show that, first, the classic data-based metrics on the offline dataset can be significantly inconsistent with online performance, and can even be misleading; second, the proposed evaluator score is significantly more consistent with online performance than common ranking metrics; and finally, as a consequence, our method achieves a significant improvement (>2%) in terms of Conversion Rate (CR) over an industrial-level fine-tuned model in online A/B tests.
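The evaluator-generator idea described in the abstract can be sketched with a toy example. This is not the paper's actual method: there the evaluator is a learned network and the generator is trained by reinforcement learning, whereas here a hand-written context-aware scorer and a greedy generator stand in, and the item schema (a value/category pair) is entirely hypothetical.

```python
# Toy items: (value, category). The abstract's point is that an item's
# contribution depends on its context (its neighbours in the ranked list).

def evaluate(ranking):
    """Context-aware evaluator (stand-in): position-discounted item value,
    with a penalty when adjacent items share a category (redundancy)."""
    score = 0.0
    for pos, (value, cat) in enumerate(ranking):
        discount = 1.0 / (pos + 1)  # simple position bias
        penalty = 0.5 if pos > 0 and ranking[pos - 1][1] == cat else 0.0
        score += discount * value * (1.0 - penalty)
    return score

def generate(items, k):
    """Greedy generator (stand-in for the RL policy): at each slot, pick
    the remaining item whose addition maximises the evaluator score."""
    pool, ranking = list(items), []
    for _ in range(k):
        best = max(pool, key=lambda it: evaluate(ranking + [it]))
        pool.remove(best)
        ranking.append(best)
    return ranking

items = [(0.9, 0), (0.85, 0), (0.8, 1), (0.4, 2), (0.3, 1)]
context_free = sorted(items, reverse=True)[:3]   # classic point-wise ranking
context_aware = generate(items, 3)               # optimises the evaluator
assert evaluate(context_aware) >= evaluate(context_free)
```

On this data the context-free ranking stacks two same-category items at the top and is penalised, while the generator interleaves categories, illustrating how a list optimised against a context-aware evaluator can differ from one sorted by per-item scores.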
identifier EISSN: 2331-8422
source Publicly Available Content (ProQuest)
subjects Combinatorial analysis
Context
Data points
Distance learning
Electronic commerce
Supervised learning