Loading…
Multi-Task Learning with Calibrated Mixture of Insightful Experts
Multi-task learning has been established as an important machine learning framework for leveraging shared knowledge among multiple different but related tasks, with the generalization performance of models enhanced. As a promising learning paradigm, multi-task learning has been widely adopted by var...
Saved in:
Main Authors: | , , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Multi-task learning has been established as an important machine learning framework for leveraging shared knowledge among multiple different but related tasks, with the generalization performance of models enhanced. As a promising learning paradigm, multi-task learning has been widely adopted by various real-world applications, such as recommendation systems. Multi-gate Mixture-of-Experts (MMoE), a well-received multi-task learning method in industry, based on the classic and inspiring Mixture-of-Experts (MoE) structure, explicitly models task relationships and learns task-specific functionalities, generating significant improvements. However, in our applications, negative transfer, which confuses considerable existing multi-task learning methods, is still observed to happen to MMoE. In this paper, an in-depth empirical investigation into negative transfer is launched. And it reveals that, incompetent experts, which play fundamental roles under the learning framework of MoE, are the key technique bottleneck. To tackle this dilemma, we propose the Calibrated Mixture of Insightful Experts (CMoIE), with three novel modules (Conflict Resolution, Expert Communication, and Mixture Calibration), customed for multi-task learning. Hence a group of insightful experts are constructed with enhanced diversity, communication and specialization. To validate the proposed method CMoIE, experiments are conducted on three public datasets and one real-world click-through-rate prediction dataset we construct based on traffic logs collected from a large-scale online product recommendation system. Our approach yields best performance across all of these benchmarks, demonstrating the superiority of it. |
---|---|
ISSN: | 2375-026X |
DOI: | 10.1109/ICDE53745.2022.00312 |