Spatial-Aware Multi-Level Parsing Network for Human-Object Interaction
| Abstract |
Human-Object Interaction (HOI) detection is a human-centered form of visual relationship detection, and it is challenging because of the complexity and diversity of image content. Most recent HOI detection methods rely only on paired instance-level information within the union region of a human-object pair. In contrast, our proposed Spatial-Aware Multi-Level Parsing Network (SMPNet) adopts a multi-level information detection strategy that combines three kinds of features: instance-level visual features of the detected human-object pair, part-level features of the human body, and scene-level features extracted by a graph neural network. The HOI relationship is predicted after the three levels of features are fused. We validate our method on two public datasets, V-COCO and HICO-DET. Compared with prior work, our method achieves state-of-the-art results on both datasets in terms of mAP_role, which demonstrates the effectiveness of the proposed multi-level information detection strategy.
|
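The abstract describes the fusion only at a high level. The Python sketch below shows one generic way that instance-level, part-level, and scene-level features could be combined before per-class HOI scoring. It is not the SMPNet implementation. Several pieces are assumptions made for illustration: mean pooling over body-part features, a single round of mean-neighbor message passing standing in for the graph neural network, concatenation as the fusion step, and independent sigmoid classifiers. The helper names (`part_level`, `scene_level`, `fuse`, `predict`) and the toy numbers are also hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def message_pass(nodes: Sequence[Vector], edges: Sequence[Tuple[int, int]]) -> List[Vector]:
    """One round of mean-neighbor message passing with a residual update.

    Assumed stand-in for the paper's graph neural network; the abstract
    does not describe its layers.
    """
    neighbors: List[List[int]] = [[] for _ in nodes]
    for i, j in edges:  # undirected edges between detected instances
        neighbors[i].append(j)
        neighbors[j].append(i)
    updated = []
    for i, node in enumerate(nodes):
        # An isolated node uses its own feature as the message.
        msgs = [nodes[j] for j in neighbors[i]] or [node]
        agg = mean_pool(msgs)
        updated.append([x + m for x, m in zip(node, agg)])
    return updated


def scene_level(nodes: Sequence[Vector], edges: Sequence[Tuple[int, int]]) -> Vector:
    """Hypothetical scene-level feature: mean of node features after message passing."""
    return mean_pool(message_pass(nodes, edges))


def part_level(body_parts: Sequence[Vector]) -> Vector:
    """Hypothetical part-level feature: mean over per-body-part features."""
    return mean_pool(body_parts)


def fuse(instance: Vector, part: Vector, scene: Vector) -> Vector:
    """Fuse the three levels by concatenation (one simple choice among many)."""
    return instance + part + scene


def predict(fused: Vector, weights: Sequence[Vector], biases: Sequence[float]) -> List[float]:
    """Independent sigmoid score per interaction class (multi-label)."""
    scores = []
    for w, b in zip(weights, biases):
        z = sum(wi * xi for wi, xi in zip(w, fused)) + b
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


if __name__ == "__main__":
    instance = [0.2, 0.4]  # toy human-object pair feature
    part = part_level([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]])
    scene = scene_level([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (1, 2)])
    fused = fuse(instance, part, scene)
    weights = [[0.1] * len(fused), [-0.1] * len(fused)]  # two toy interaction classes
    print(predict(fused, weights, [0.0, 0.0]))
```

The sketch scores each class with its own sigmoid rather than a shared softmax. This reflects a common assumption in HOI detection: one person can take part in several interactions with the same object at once, such as holding and riding a bicycle. A trained system would replace these plain lists with learned layers in a deep-learning framework.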
| Year of Publication |
2025
|
| Journal |
International Journal of Interactive Multimedia and Artificial Intelligence
|
| Volume |
9
|
| Start Page |
39
|
| Issue |
Regular issue
|
| Number |
2
|
| Pages |
39-48
|
| Date Published |
03/2025
|
| ISSN Number |
1989-1660
|
| Acknowledgment |
This work is supported by the National Natural Science Foundation of China (62072094) and the Liaoning Revitalization Talents Program (XLYC2005001).
|