Title: Deep Multi-Model Fusion for Human Activity Recognition Using Evolutionary Algorithms
Authors: Kamal Kant Verma; Brij Mohan Singh
Published: 12/2021, Vol. 7, pp. 44-58
ISSN: 1989-1660
URL: https://www.ijimai.org/journal/sites/default/files/2021-11/ijimai7_2_5_0.pdf
Keywords: Human Activity; Activity Recognition; Human Detection Activity; Support Vector Machine; Convolutional Neural Network (CNN); 3D Convolutional Neural Network; Long Short-Term Memory (LSTM); Deep Learning; Genetic Algorithms; Particle Swarm Optimization

Abstract: Machine recognition of human activities is an active research area in computer vision. Previous studies have typically used only one or two modalities to handle this task, yet combining more sources of information improves the recognition accuracy of human activities. This paper therefore proposes an automatic human activity recognition system based on deep fusion of multiple streams, with decision-level score optimization by evolutionary algorithms, over RGB, depth maps, and 3D skeleton joint information. The proposed approach works in three phases: 1) space-time activity learning from RGB, depth, and skeleton joint positions using two 3D Convolutional Neural Networks (3DCNN) and a Long Short-Term Memory (LSTM) network; 2) training an SVM, for each model, on the activity features learned in the previous phase, and score generation using the trained SVMs; 3) score fusion and optimization using two evolutionary algorithms, a genetic algorithm (GA) and particle swarm optimization (PSO). The proposed approach is validated on two challenging 3D datasets, MSRDailyActivity3D and UTKinectAction3D, achieving accuracies of 85.94% and 96.5%, respectively. The experimental results show the usefulness of the proposed representation; moreover, fusing the different modalities yields higher recognition accuracy than using only one or two types of information and obtains state-of-the-art results.
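The decision-level score fusion of phase 3 can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: it assumes each model (3DCNN or LSTM+SVM) emits a per-class score vector per sample, and a small genetic algorithm searches for normalized fusion weights that maximize accuracy on held-out labels. All function names, the crossover/mutation scheme, and the fitness design here are illustrative assumptions.

```python
import random

def fuse(per_model_scores, weights):
    """Weighted sum of the per-model class-score vectors for one sample."""
    n_classes = len(per_model_scores[0])
    return [sum(w * s[c] for w, s in zip(weights, per_model_scores))
            for c in range(n_classes)]

def accuracy(weights, scores, labels):
    """Fraction of samples whose fused argmax matches the true label."""
    correct = 0
    for per_model, y in zip(scores, labels):
        fused = fuse(per_model, weights)
        if fused.index(max(fused)) == y:
            correct += 1
    return correct / len(labels)

def ga_optimize_weights(scores, labels, n_models, pop=20, gens=30, seed=0):
    """Toy GA: evolve normalized fusion-weight vectors via elitist
    selection, averaging crossover, and small Gaussian mutation."""
    rng = random.Random(seed)

    def normalized(w):
        w = [max(x, 0.0) for x in w]          # keep weights non-negative
        total = sum(w) or 1.0
        return [x / total for x in w]         # weights sum to 1

    population = [normalized([rng.random() for _ in range(n_models)])
                  for _ in range(pop)]
    for _ in range(gens):
        # Rank by fitness (fusion accuracy) and keep the top half.
        population.sort(key=lambda w: -accuracy(w, scores, labels))
        survivors = population[: pop // 2]
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)   # two parents
            child = [(x + y) / 2 + rng.gauss(0, 0.05)
                     for x, y in zip(a, b)]   # crossover + mutation
            children.append(normalized(child))
        population = survivors + children
    return max(population, key=lambda w: accuracy(w, scores, labels))
```

On synthetic scores where one model is reliable and another is misleading, the GA learns to weight the reliable model above the misleading one; PSO would play the same role with particles updating the weight vectors instead of crossover and mutation.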