A Progressive Skip Reasoning Fusion Method for Multi-Modal Classification

Authors: Qian Guo, Xinyan Liang, Yuhua Qian, Zhihua Cui, Jie Wen

Abstract:

In multi-modal classification tasks, a good fusion algorithm can effectively integrate and process multi-modal data, thereby significantly improving performance. Researchers often focus on designing complex fusion operators and have proposed many of them, while paying less attention to how fusion is used, that is, how features should be fused to better serve multi-modal classification tasks. In this article, we propose a progressive skip reasoning fusion network (PSRFN) as a step toward addressing this issue. First, unlike most existing multi-modal fusion methods, which apply a single fusion operator in a single stage to fuse all view features, PSRFN uses a progressive skip reasoning (PSR) block to fuse all views with a fusion operator at each layer. Specifically, each PSR block combines all view features with the fused features from the previous layer to obtain the fused features for the current layer. Second, each PSR block employs a dual-weighted fusion strategy with learnable parameters to adaptively allocate weights during fusion. The first level of weighting assigns a weight to each view feature; the second level weighs the fused features from the previous layer against the fused features produced by the first level of weighting in the current layer. This strategy lets each PSR block dynamically adjust the weights according to the actual contribution of the features. Finally, skip connections are adopted between PSR blocks so that the model can fully exploit feature information from different levels during fusion. Extensive experimental results on six real multi-modal datasets show that a better way of using fusion operators does indeed improve performance.
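
The abstract describes the architecture but includes no implementation, so the following PyTorch sketch is only a guess at how the described pieces could fit together. All names (PSRBlock, PSRFN, view_logits, mix_logits, proj), the choice of softmax-normalized learnable weights, the linear-plus-ReLU fusion operator, and the mean used as the initial fusion are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PSRBlock(nn.Module):
    """One progressive skip reasoning (PSR) block (illustrative sketch).

    Level-1 weighting: learnable per-view weights combine all view features.
    Level-2 weighting: learnable weights mix the level-1 result with the
    fused features passed in from the previous layer.
    """
    def __init__(self, num_views: int, dim: int):
        super().__init__()
        self.view_logits = nn.Parameter(torch.zeros(num_views))  # level-1 weights
        self.mix_logits = nn.Parameter(torch.zeros(2))           # level-2 weights
        self.proj = nn.Linear(dim, dim)  # assumed fusion operator; the paper's may differ

    def forward(self, views: list[torch.Tensor], prev_fused: torch.Tensor) -> torch.Tensor:
        # Level 1: softmax-weighted sum over all view features.
        w = torch.softmax(self.view_logits, dim=0)
        level1 = sum(wi * v for wi, v in zip(w, views))
        # Level 2: weigh the previous layer's fused features against the level-1 output.
        m = torch.softmax(self.mix_logits, dim=0)
        fused = m[0] * prev_fused + m[1] * level1
        return F.relu(self.proj(fused))

class PSRFN(nn.Module):
    """Stack of PSR blocks with skip connections between blocks (sketch)."""
    def __init__(self, num_views: int, dim: int, num_blocks: int = 3, num_classes: int = 10):
        super().__init__()
        self.blocks = nn.ModuleList(PSRBlock(num_views, dim) for _ in range(num_blocks))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, views: list[torch.Tensor]) -> torch.Tensor:
        fused = torch.stack(views).mean(dim=0)  # initial fusion (assumption)
        for block in self.blocks:
            fused = fused + block(views, fused)  # skip connection between PSR blocks
        return self.head(fused)

# Usage: three views of dimension 128, batch of 8, five classes.
views = [torch.randn(8, 128) for _ in range(3)]
model = PSRFN(num_views=3, dim=128, num_blocks=3, num_classes=5)
logits = model(views)  # shape (8, 5)
```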

Keywords:
