
Online Learning for Constrained Assortment Optimization Under Markov Chain Choice Model

Bibliographic Details
Published in: Operations Research, 2024-05
Main Authors: Li, Shukai, Luo, Qi, Huang, Zhiyuan, Shi, Cong
Format: Article
Language:English
Description
Summary: Assortment optimization finds many important applications in both brick-and-mortar and online retailing. Decision makers select a subset of products to offer to customers from a universe of substitutable products, under the assumption that customers purchase according to a Markov chain choice model, a very general choice model that encompasses many popular models. The existing literature predominantly assumes that the customer arrival process and the Markov chain choice model parameters are given as input to the stochastic optimization model. In practice, however, decision makers may not have this information and must learn it on the fly while maximizing total expected revenue. In "Online Learning for Constrained Assortment Optimization under the Markov Chain Choice Model," S. Li, Q. Luo, Z. Huang, and C. Shi develop a series of efficient online learning algorithms, with provable performance guarantees, for Markov chain choice-based assortment optimization problems.

We study a dynamic assortment selection problem in which arriving customers make purchase decisions among offered products from a universe of products under a Markov chain choice (MCC) model. The retailer observes only the assortment and the customer's single choice per period. Given limited display capacity, resource constraints, and no a priori knowledge of the problem parameters, the retailer's objective is to sequentially learn the choice model and optimize cumulative revenue over a finite selling horizon. We develop a fast linear-system-based explore-then-commit (FastLinETC) learning algorithm that balances the tradeoff between exploration and exploitation. The algorithm simultaneously estimates the arrival and transition probabilities of the MCC model by solving a linear system of equations and determines a near-optimal assortment based on these estimates.
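The linear-system structure the abstract refers to can be illustrated with a minimal sketch (not the authors' implementation). In the standard MCC model, a customer first considers product i with arrival probability λ_i; if i is not offered, she transitions to product j with probability ρ_ij (the row deficit being the no-purchase probability). The expected visit frequencies of non-offered products, and hence the choice probabilities of offered products, solve a small linear system:

```python
import numpy as np

def mcc_choice_probabilities(lam, rho, assortment):
    """Choice probabilities under a Markov chain choice (MCC) model.

    lam[i]   : probability the arriving customer first considers product i
    rho[i,j] : probability of transitioning from product i to product j
               when i is not offered (each row sums to <= 1)
    assortment : set of offered product indices

    Expected visits v to non-offered products N satisfy
        v_j = lam_j + sum_{k in N} v_k * rho[k, j],
    i.e. (I - rho[N, N]^T) v = lam[N]. The choice probability of an
    offered product i is then lam[i] + sum_{j in N} v_j * rho[j, i].
    """
    n = len(lam)
    offered = np.zeros(n, dtype=bool)
    offered[list(assortment)] = True
    N = ~offered  # products through which the chain keeps transitioning

    # Solve the linear system for expected visits to non-offered products.
    A = np.eye(int(N.sum())) - rho[np.ix_(N, N)].T
    v_N = np.linalg.solve(A, lam[N])

    # Absorption (purchase) probability at each offered product.
    choice = np.zeros(n)
    choice[offered] = lam[offered] + rho[np.ix_(N, offered)].T @ v_N
    return choice
```

For example, with three products and symmetric substitution, offering the full universe returns the arrival probabilities themselves, while offering a single product captures spillover demand from the two unoffered ones. The paper's FastLinETC algorithm inverts this relationship: it uses observed choice frequencies from exploration assortments to recover λ and ρ from an analogous linear system before committing to a near-optimal assortment.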
Furthermore, our consistent estimators offer superior computational times compared with existing heuristic estimation methods, which often suffer from inconsistency or a significant computational burden.

Funding: The research of Q. Luo is partially supported by the National Science Foundation [Grant CMMI-2308750]. The research of Z. Huang is partially supported by the Shanghai Sailing Program [Grant 22YF1451100] and the Fundamental Research Funds for the Central Universities. The research of C. Shi is partially supported by an Amazon Research Award.
ISSN:0030-364X
1526-5463
DOI:10.1287/opre.2022.0693