
Micro video recommendation in multimodality using dual-perception and gated recurrent graph neural network

Bibliographic Details
Published in:Multimedia tools and applications 2024-05, Vol.83 (17), p.51559-51588
Main Authors: Patil, Swati S., Patil, Rupali S., Kotwal, Amina
Format: Article
Language:English
Description
Summary: With the proliferation of mobile Internet devices, faster networks, and reduced data costs, individuals now enjoy the convenience of watching films on their mobile devices at their preferred times. The widespread adoption of micro-videos has led to the emergence of numerous micro-video platforms, and their growing popularity has spurred efforts to enhance user experience through accurate, real-time recommendation algorithms. To remain competitive, platforms now rely on advanced algorithms to recommend micro-videos effectively. While algorithms based on multimodal data have been used to enrich item information, they often overlook user preferences for different information modalities and fail to analyze the inherent connections within multimodal data in depth. Consequently, this article proposes a novel framework, Dual-Perception and Multi-Resolution Graph Neural Networks (DP-MRGNN), for micro-video recommendation. The first step is to jointly identify distinctive fusion patterns for each user, leveraging a user-micro-video bipartite graph and a user co-occurrence graph. Moreover, the sheer volume of created videos renders manual processing of multimedia data impractical, making this approach practical for various applications, particularly those involving large video datasets. The study employs a dual GRU neural network to encapsulate local elements within each graph and to extract features that represent interactions between the matched graphs. A disentangled multi-modal representation learning module is also developed to model user attention across the various modalities and inductively learn multi-modal user preferences. Furthermore, a negative sampling method is implemented to identify modality associations and ensure that each modality contributes effectively. Simulation experiments conducted in Matlab show that the learned features outperform hand-crafted ones on real-world movie datasets, including MovieLens. The model's feasibility and effectiveness are corroborated across multiple datasets, with improved accuracy, nDCG, and recall compared to traditional recommendation methods.
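The abstract describes the framework's main components only at a high level: dual GRUs that read neighborhood information from the user-micro-video bipartite graph and the user co-occurrence graph, user-dependent attention over per-modality item features, and a negative-sampling objective. The sketch below is a minimal, hypothetical PyTorch rendering of those ideas, not the authors' implementation; the layer sizes, the three-modality assumption, the simple additive fusion of graph signals, and the BPR-style loss are all assumptions made purely for illustration.

```python
# Minimal sketch (assumed, not the authors' code) of the components named in the
# abstract: two GRUs over neighbor sequences from two graph views, modality
# attention for item representation, and a BPR-style negative-sampling loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualGRURecommender(nn.Module):
    def __init__(self, n_users, n_items, n_modalities=3, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        # One GRU per graph view: bipartite neighbors and co-occurring users.
        self.gru_bipartite = nn.GRU(dim, dim, batch_first=True)
        self.gru_cooccur = nn.GRU(dim, dim, batch_first=True)
        # Per-user attention over modalities (visual/acoustic/textual, assumed).
        self.modality_attn = nn.Linear(dim, n_modalities)
        self.modality_proj = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(n_modalities)]
        )

    def user_repr(self, users, item_neighbors, user_neighbors):
        # item_neighbors: (B, L1) item ids adjacent to each user in the bipartite graph
        # user_neighbors: (B, L2) user ids adjacent in the co-occurrence graph
        _, h_items = self.gru_bipartite(self.item_emb(item_neighbors))
        _, h_users = self.gru_cooccur(self.user_emb(user_neighbors))
        # Simple additive fusion of the two graph signals (an assumption).
        return self.user_emb(users) + h_items.squeeze(0) + h_users.squeeze(0)

    def item_repr(self, users, modality_feats):
        # modality_feats: (B, M, dim) pre-extracted per-modality item features
        attn = F.softmax(self.modality_attn(self.user_emb(users)), dim=-1)  # (B, M)
        fused = torch.stack(
            [proj(modality_feats[:, m]) for m, proj in enumerate(self.modality_proj)],
            dim=1,
        )  # (B, M, dim)
        return (attn.unsqueeze(-1) * fused).sum(dim=1)

    def bpr_loss(self, u, pos, neg):
        # Negative sampling: push a user's score for an observed item above
        # the score of a randomly sampled unobserved item.
        return -F.logsigmoid((u * pos).sum(-1) - (u * neg).sum(-1)).mean()


# Toy usage with random ids/features (shapes only; no training loop shown).
model = DualGRURecommender(n_users=100, n_items=500)
u = torch.randint(0, 100, (8,))
item_nb = torch.randint(0, 500, (8, 5))
user_nb = torch.randint(0, 100, (8, 5))
feats = torch.randn(8, 3, 64)
loss = model.bpr_loss(
    model.user_repr(u, item_nb, user_nb),
    model.item_repr(u, feats),
    model.item_repr(u, torch.randn(8, 3, 64)),
)
```

The paper presumably combines the GRU outputs, the disentangled modality factors, and the graph structure in a more elaborate way; this sketch only indicates where a dual GRU, modality attention, and negative sampling would sit in such a pipeline.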
ISSN:1380-7501
EISSN:1573-7721
DOI:10.1007/s11042-023-17093-z