ACTA AERONAUTICA ET ASTRONAUTICA SINICA ›› 2022, Vol. 43 ›› Issue (S1): 727001. doi: 10.7527/S1000-6893.2022.27001

• Swarm Intelligence and Cooperative Control •

Human-UAV swarm multi-modal intelligent interaction methods

SU Lingfei1, HUA Yongzhao2, DONG Xiwang2, REN Zhang1   

  1. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China;
  2. Institute of Artificial Intelligence, Beihang University, Beijing 100191, China
  • Received: 2022-01-27 Revised: 2022-02-17 Published: 2022-03-04
  • Supported by:
    Defense Industrial Technology Development Program (JCKY2019601C106)

Abstract: To address collaborative perception in human-UAV swarm interaction, a deep-learning-based interaction framework for cooperative control of swarm formations is constructed on dual-modal autonomous recognition of speech and gesture. A channel fusion mechanism based on dual-channel switching is proposed to realize multimodal interaction. For speech, the recognition model based on Streaming Multi-Layer Truncated Attention (SMLTA) provided by the Baidu cloud platform is adopted and self-trained on the deep learning platform, raising the accuracy rate from 80.10% to 97.98%. For gestures, the depth information and skeleton information from Kinect V2 are combined to construct and train a Convolutional Neural Network (CNN) recognition model based on feature fusion. The average precision of the model is 98.33%, which is 1.16% higher than that of the decision tree model and 0.33% higher than that of the traditional CNN model. Simulation and physical verification are carried out in a Robot Operating System (ROS)-Gazebo training scenario. The results show that the proposed framework can effectively control UAV swarm formations: the command execution success rates of the voice channel, the gesture channel, and channel switching all exceed 90%, with higher interaction efficiency.

Key words: deep learning, human-computer interaction, UAV swarm, speech recognition, gesture recognition

CLC Number: