To address interactive collaborative perception between humans and UAV swarms, a deep-learning-based interaction framework for collaborative control of swarm formation is constructed, built on dual-modal autonomous recognition of speech and gesture. A channel fusion mechanism based on dual-channel switching is proposed to realize multimodal interaction. For speech, the Streaming Multi-Layer Truncated Attention (SMLTA) speech recognition model provided by the Baidu cloud platform is adopted and self-trained on the deep learning platform, which raises the accuracy from 80.10% to 97.98%. For gestures, the depth information and skeleton information from Kinect V2 are combined to construct and train a Convolutional Neural Network (CNN) gesture recognition model based on feature fusion. The average precision of the model is 98.33%, which is 1.16% higher than that of the decision tree model and 0.33% higher than that of the traditional CNN model. Simulation and physical verification are carried out in a Robot Operating System (ROS)-Gazebo training scenario. The results show that the proposed framework can effectively control UAV swarm formation, that the command execution success rates of the voice channel, the gesture channel, and channel switching all exceed 90%, and that the framework achieves higher interaction efficiency.
SU Lingfei, HUA Yongzhao, DONG Xiwang, REN Zhang. Human-UAV swarm multi-modal intelligent interaction methods[J]. Acta Aeronautica et Astronautica Sinica, 2022, 43(S1): 727001.
DOI: 10.7527/S1000-6893.2022.27001
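
The abstract names a channel fusion mechanism based on dual-channel switching but does not say how a switch is triggered. The sketch below is one possible reading and not the authors' implementation. It assumes that exactly one channel (voice or gesture) is active at a time, that a dedicated switch command is accepted from either channel, and that low-confidence recognitions are discarded. The names `Recognition`, `ChannelSwitcher`, `SWITCH_COMMANDS`, the command strings, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recognition:
    """One recognizer output (hypothetical structure, not from the paper)."""
    channel: str       # "voice" or "gesture"
    command: str       # e.g. "form_line"
    confidence: float  # recognizer score in [0, 1]


# Assumed switch commands: each maps to the channel it activates.
SWITCH_COMMANDS = {
    "switch_to_voice": "voice",
    "switch_to_gesture": "gesture",
}


class ChannelSwitcher:
    """Forwards commands from the active channel only (illustrative sketch)."""

    def __init__(self, active: str = "voice", threshold: float = 0.9):
        self.active = active
        self.threshold = threshold  # assumed confidence cut-off

    def handle(self, rec: Recognition) -> Optional[str]:
        """Return a formation command to execute, or None if nothing is sent."""
        # Discard uncertain recognitions on either channel.
        if rec.confidence < self.threshold:
            return None
        # A switch command changes the active channel and sends nothing.
        if rec.command in SWITCH_COMMANDS:
            self.active = SWITCH_COMMANDS[rec.command]
            return None
        # Ignore ordinary commands that arrive on the inactive channel.
        if rec.channel != self.active:
            return None
        return rec.command
```

With voice active by default, a confident gesture command is ignored until a `switch_to_gesture` command arrives, after which gesture commands are forwarded and voice commands are dropped. Gating on a single active channel is only one design option; a system could instead fuse both channels continuously.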