Acta Aeronautica et Astronautica Sinica ›› 2023, Vol. 44 ›› Issue (22): 628871. doi: 10.7527/S1000-6893.2023.28871
• Special Column •
Yupeng FU1, Xiangyang DENG1,2, Ziqiang ZHU1, Limin ZHANG1
Received: 2023-04-14
Revised: 2023-05-30
Accepted: 2023-06-14
Online: 2023-11-25
Published: 2023-06-27
Contact: Xiangyang DENG, E-mail: skl18@mails.tsinghua.edu.cn
Yupeng FU, Xiangyang DENG, Ziqiang ZHU, Limin ZHANG. Value-filter based air-combat maneuvering optimization[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(22): 628871.
| 1 | WANG Z A, LI H, WU H L, et al. Improving maneuver strategy in air combat by alternate freeze games with a deep reinforcement learning algorithm[J]. Mathematical Problems in Engineering, 2020, 2020: 1-17. |
| 2 | MA W, LI H, WANG Z, et al. Close air combat maneuver decision based on deep stochastic game[J]. Systems Engineering and Electronics, 2021, 43(2): 443-451 (in Chinese). |
| 3 | LI X G, LI Q. Technical analysis of typical intelligent game system and development prospect of intelligent command and control system[J]. Chinese Journal of Intelligent Science and Technology, 2020, 2(1): 36-42 (in Chinese). |
| 4 | POPE A P, IDE J S, MIĆOVIĆ D, et al. Hierarchical reinforcement learning for air-to-air combat[C]∥2021 International Conference on Unmanned Aircraft Systems (ICUAS). Piscataway: IEEE Press, 2021: 275-284. |
| 5 | SUFIYAN D, WIN L T S, WIN S K H, et al. A reinforcement learning approach for control of a nature-inspired aerial vehicle[C]∥2019 International Conference on Robotics and Automation (ICRA). Piscataway: IEEE Press, 2019: 6030-6036. |
| 6 | ZHEN Y, HAO M R, SUN W D. Deep reinforcement learning attitude control of fixed-wing UAVs[C]∥2020 3rd International Conference on Unmanned Systems (ICUS). Piscataway: IEEE Press, 2020: 239-244. |
| 7 | WANG C, YAN C, XIANG X, et al. A continuous actor-critic reinforcement learning approach to flocking with fixed-wing UAVs[C]∥Asian Conference on Machine Learning. Berlin: Springer, 2020: 239-244. |
| 8 | ZHOU P, HUANG J T, ZHANG S, et al. Intelligent air combat decision making and simulation based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(4): 126731 (in Chinese). |
| 9 | WU Y J, LAI J, CHEN X L, et al. Research on the application of reinforcement learning algorithm in decision support of beyond-visual-range air combat[J]. Aero Weaponry, 2021, 28(2): 55-61 (in Chinese). |
| 10 | WANG H, ZHOU X, DENG Y M, et al. A hierarchical decision-making method for multi-aircraft air combat confrontation[J]. Scientia Sinica (Informationis), 2022, 52(12): 2225-2238 (in Chinese). |
| 11 | POMERLEAU D A. ALVINN: An autonomous land vehicle in a neural network[C]∥Conference and Workshop on Neural Information Processing Systems. New York: ACM, 1989: 305-313. |
| 12 | BOJARSKI M, DEL TESTA D, DWORAKOWSKI D, et al. End to end learning for self-driving cars[DB/OL]. arXiv preprint: 1604.07316, 2016. |
| 13 | GIUSTI A, GUZZI J, CIREŞAN D C, et al. A machine learning approach to visual perception of forest trails for mobile robots[J]. IEEE Robotics and Automation Letters, 2016, 1(2): 661-667. |
| 14 | NAKANISHI J, MORIMOTO J, ENDO G, et al. Learning from demonstration and adaptation of biped locomotion[J]. Robotics and Autonomous Systems, 2004, 47(2-3): 79-91. |
| 15 | ROSS S, GORDON G J, BAGNELL J A. A reduction of imitation learning and structured prediction to no-regret online learning[C]∥Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. New York: PMLR, 2011: 627-635. |
| 16 | NG A Y, RUSSELL S J. Algorithms for inverse reinforcement learning[C]∥Proceedings of the Seventeenth International Conference on Machine Learning. New York: ACM, 2000: 663-670. |
| 17 | ZIEBART B D, MAAS A, BAGNELL J A, et al. Maximum entropy inverse reinforcement learning[C]∥Proceedings of the 23rd National Conference on Artificial Intelligence. New York: ACM, 2008: 1433-1438. |
| 18 | FINN C, LEVINE S, ABBEEL P. Guided cost learning: Deep inverse optimal control via policy optimization[C]∥Proceedings of the 33rd International Conference on International Conference on Machine Learning. New York: ACM, 2016: 49-58. |
| 19 | NAIR A, MCGREW B, ANDRYCHOWICZ M, et al. Overcoming exploration in reinforcement learning with demonstrations[C]∥2018 IEEE International Conference on Robotics and Automation (ICRA). Piscataway: IEEE Press, 2018: 6292-6299. |
| 20 | XU H R, ZHAN X Y, YIN H L, et al. Discriminator-weighted offline imitation learning from suboptimal demonstrations[C]∥Proceedings of the 39th International Conference on Machine Learning. New York: ACM, 2022: 24725-24742. |
| 21 | VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354. |
| 22 | WANG P, LIU D P, CHEN J Y, et al. Decision making for autonomous driving via augmented adversarial inverse reinforcement learning[C]∥2021 IEEE International Conference on Robotics and Automation (ICRA). Piscataway: IEEE Press, 2021: 1036-1042. |
| 23 | YU Y, ZHAN D C, ZHOU Z H, et al. Unmanned aerial vehicle flight control method based on imitation learning and reinforcement learning algorithms: CN112162564B[P]. 2021-09-28 (in Chinese). |
| 24 | ZHU Z D, LIN K X, DAI B, et al. Self-adaptive imitation learning: Learning tasks with delayed rewards from sub-optimal demonstrations[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(8): 9269-9277. |
| 25 | SCHULMAN J, LEVINE S, MORITZ P, et al. Trust region policy optimization[C]∥International Conference on Machine Learning. New York: ACM, 2015: 1889-1897. |
| 26 | SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[DB/OL]. arXiv preprint: 1707.06347, 2017. |
| 27 | LI R Y, PENG H M, LI R G, et al. Overview on algorithms and applications for reinforcement learning[J]. Computer Systems and Applications, 2020, 29(12): 13-25 (in Chinese). |
| 28 | OH J, GUO Y, SINGH S, et al. Self-imitation learning[C]∥Proceedings of the 35th International Conference on Machine Learning. New York: ACM, 2018: 3878-3887. |
| 29 | HAARNOJA T, TANG H R, ABBEEL P, et al. Reinforcement learning with deep energy-based policies[C]∥Proceedings of the 34th International Conference on Machine Learning-Volume 70. New York: ACM, 2017: 1352-1361. |
| 30 | LI C, WU F G, ZHAO J S. Accelerating self-imitation learning from demonstrations via policy constraints and Q-ensemble[C]∥2023 International Joint Conference on Neural Networks (IJCNN). Piscataway: IEEE Press, 2023: 1-8. |
| 31 | SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation[DB/OL]. arXiv preprint: 1506.02438, 2015. |
| 32 | KINGMA D P, BA J. Adam: A method for stochastic optimization[C]∥International Conference on Learning Representations (ICLR). San Juan, Puerto Rico, 2015. |
| 33 | MCGREW J S, HOW J P, WILLIAMS B, et al. Air-combat strategy using approximate dynamic programming[J]. Journal of Guidance, Control, and Dynamics, 2010, 33(5): 1641-1654. |
| 34 | FUJIMOTO S, GU S S. A minimalist approach to offline reinforcement learning[C]∥Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS). New York: ACM, 2021: 20132-20145. |