To address interactive collaborative perception between humans and UAV swarms, an interaction framework is constructed using deep learning, in which separate speech and gesture recognition models drive the collaborative control of swarm formation. A channel fusion mechanism based on dual-channel switching is proposed to realize multimodal interaction. For speech, the Streaming Multi-Layer Truncated Attention (SMLTA) recognition model provided by the Baidu cloud platform is self-trained on a deep learning platform, which raises accuracy in the application scenario from 80.10% to 97.98%. For gestures, the depth information and skeleton information from Kinect V2 are combined to construct and train a Convolutional Neural Network (CNN) recognition model based on feature fusion. The average precision of this model is 98.33%, which is 1.16% higher than that of a traditional decision tree model and 0.33% higher than that of a traditional CNN model. Finally, simulation and physical verification are carried out in a Robot Operating System (ROS)-Gazebo training scenario. The results show that the proposed framework can effectively control UAV swarm formation: the command execution success rates of the voice channel, the gesture channel, and channel switching all exceed 90%, and the framework achieves high interaction efficiency.
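The abstract does not spell out how dual-channel switching works, so the following is only a minimal sketch of one plausible reading: a single channel (voice or gesture) is active at a time, a dedicated switch command hands control to the other channel, and input from the inactive channel is ignored. The class name `ChannelSwitcher`, the `switch_channel` command, and the formation names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum


class Channel(Enum):
    VOICE = "voice"
    GESTURE = "gesture"


# Hypothetical command vocabulary (assumption; the paper's command set is not given here).
SWITCH = "switch_channel"
FORMATION_COMMANDS = {"line", "triangle", "circle"}


@dataclass
class ChannelSwitcher:
    """Forwards formation commands from whichever channel is currently active."""
    active: Channel = Channel.VOICE
    log: list = field(default_factory=list)

    def handle(self, source: Channel, command: str):
        """Return the formation command to execute, or None if nothing is forwarded."""
        if source is not self.active:
            return None  # input from the inactive channel is ignored
        if command == SWITCH:
            # Hand control to the other channel; no formation command is issued.
            self.active = (
                Channel.GESTURE if self.active is Channel.VOICE else Channel.VOICE
            )
            return None
        if command in FORMATION_COMMANDS:
            self.log.append((source, command))
            return command
        return None  # unrecognized commands are dropped


if __name__ == "__main__":
    switcher = ChannelSwitcher()
    print(switcher.handle(Channel.VOICE, "line"))      # forwarded by the active channel
    print(switcher.handle(Channel.VOICE, SWITCH))      # control moves to the gesture channel
    print(switcher.handle(Channel.GESTURE, "circle"))  # forwarded after the switch
```

In this reading, keeping exactly one channel active avoids conflicting commands from the two recognizers. A real system would also need recognition confidence thresholds and a transport to the swarm controller (for example, ROS topics), which are omitted here.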