TY - JOUR
KW - Aspect-Based Multimodal Sentiment Analysis
KW - Optimal Transport
KW - Social Media Opinion Mining
AU - Linhao Zhang
AU - Li Jin
AU - Guangluan Xu
AU - Xiaoyu Li
AU - Xian Sun
AU - Zequn Zhang
AU - Yanan Zhang
AU - Qi Li
AB - Aspect-based multimodal sentiment analysis under social media scenario aims to identify the sentiment polarities of each aspect term, which are mentioned in a piece of multimodal user-generated content. Previous approaches for this interdisciplinary multimodal task mainly rely on coarse-grained fusion mechanisms from the data-level or decision-level, which have the following three shortcomings:(1) ignoring the category knowledge of the sentiment target mentioned in the text) in visual information. (2) unable to assess the importance of maintaining target interaction during the unimodal encoding process, which results in indiscriminative representations considering various aspect terms. (3) suffering from the semantic gap between multiple modalities. To tackle the above challenging issues, we propose an optimal target-oriented knowledge transportation network (OtarNet) for this task. Firstly, the visual category knowledge is explicitly transported through input space translation and reformulation. Secondly, with the reformulated knowledge containing the target and category information, the target sensitivity is well maintained in the unimodal representations through a multistage target-oriented interaction mechanism. Finally, to eliminate the distributional modality gap by integrating complementary knowledge, the target-sensitive features of multiple modalities are implicitly transported based on the optimal transport interaction module. Our model achieves state-of-theart performance on three benchmark datasets: Twitter-15, Twitter-17 and Yelp, together with the extensive ablation study demonstrating the superiority and effectiveness of OtarNet.
IS - Regular Issue
M1 - 4
N2 - Aspect-based multimodal sentiment analysis under social media scenario aims to identify the sentiment polarities of each aspect term, which are mentioned in a piece of multimodal user-generated content. Previous approaches for this interdisciplinary multimodal task mainly rely on coarse-grained fusion mechanisms from the data-level or decision-level, which have the following three shortcomings:(1) ignoring the category knowledge of the sentiment target mentioned in the text) in visual information. (2) unable to assess the importance of maintaining target interaction during the unimodal encoding process, which results in indiscriminative representations considering various aspect terms. (3) suffering from the semantic gap between multiple modalities. To tackle the above challenging issues, we propose an optimal target-oriented knowledge transportation network (OtarNet) for this task. Firstly, the visual category knowledge is explicitly transported through input space translation and reformulation. Secondly, with the reformulated knowledge containing the target and category information, the target sensitivity is well maintained in the unimodal representations through a multistage target-oriented interaction mechanism. Finally, to eliminate the distributional modality gap by integrating complementary knowledge, the target-sensitive features of multiple modalities are implicitly transported based on the optimal transport interaction module. Our model achieves state-of-theart performance on three benchmark datasets: Twitter-15, Twitter-17 and Yelp, together with the extensive ablation study demonstrating the superiority and effectiveness of OtarNet.
PY - 2025
SE - 59
SP - 59
EP - 69
T2 - International Journal of Interactive Multimedia and Artificial Intelligence
TI - Optimal Target-Oriented Knowledge Transportation For Aspect-Based Multimodal Sentiment Analysis
UR - https://www.ijimai.org/journal/bibcite/reference/3423
VL - 9
SN - 1989-1660
ER -