TY - JOUR
KW - Computer vision
KW - Deep Learning
KW - Gated Graph Neural Network
KW - HOI
KW - Image Classification
AU - Zhan Su
AU - Ruiyun Yu
AU - Shihao Zou
AU - Bingyang Guo
AU - Li Cheng
AB - Human-Object Interaction (HOI) detection focuses on human-centered visual relationship detection, a challenging task due to the complexity and diversity of image content. Unlike most recent HOI detection works, which rely only on paired instance-level information in the union range, our proposed Spatial-aware Multilevel Parsing Network (SMPNet) uses a multi-level information detection strategy that combines instance-level visual features of the detected human-object pair, part-level features of the human body, and scene-level features extracted by a graph neural network. The HOI relationship is predicted after fusing the three levels of features. We validate our method on two public datasets, V-COCO and HICO-DET. Compared with prior works, our method achieves state-of-the-art results on both datasets in terms of mAP_role, which demonstrates the effectiveness of the proposed multi-level information detection strategy.
IS - Regular issue
M1 - 2
N2 - Human-Object Interaction (HOI) detection focuses on human-centered visual relationship detection, a challenging task due to the complexity and diversity of image content. Unlike most recent HOI detection works, which rely only on paired instance-level information in the union range, our proposed Spatial-aware Multilevel Parsing Network (SMPNet) uses a multi-level information detection strategy that combines instance-level visual features of the detected human-object pair, part-level features of the human body, and scene-level features extracted by a graph neural network. The HOI relationship is predicted after fusing the three levels of features. We validate our method on two public datasets, V-COCO and HICO-DET. Compared with prior works, our method achieves state-of-the-art results on both datasets in terms of mAP_role, which demonstrates the effectiveness of the proposed multi-level information detection strategy.
PY - 2025
SE - 39
SP - 39
EP - 48
T2 - International Journal of Interactive Multimedia and Artificial Intelligence
TI - Spatial-Aware Multi-Level Parsing Network for Human-Object Interaction
UR - https://www.ijimai.org/journal/bibcite/reference/3334
VL - 9
SN - 1989-1660
ER -