Reinforcement learning is a class of machine learning methods for learning policies: by emulating the human learning process, an agent interacts continuously with its environment to learn an action policy that maximizes the cumulative reward. Taking the designer's incremental modification process in airfoil aerodynamic design as an example, this paper defines the elements of reinforcement learning for aerodynamic optimization design and implements a concrete algorithm. The influence of selecting different demonstrations during pretraining on both the pretraining and the subsequent reinforcement learning results is studied, and the policy model obtained by reinforcement learning is validated by transfer tests in other environments. The results show that reasonable pretraining can effectively improve the efficiency of reinforcement learning and the robustness of the final policy, and that the resulting policy model has good transferability.
Reinforcement learning is a machine learning method for learning policies: in a manner similar to the human learning process, an agent interacts with its environment and learns how to obtain greater rewards. The elements and algorithms of reinforcement learning are defined and adapted in this paper for the supercritical airfoil aerodynamic design process. The results of imitation learning are then studied, and the policies obtained from imitation learning are used to initialize reinforcement learning. The influence of different pretraining processes is studied, and the final policies are tested in other similar environments. The results show that pretraining can improve the efficiency of reinforcement learning and the robustness of the resulting policies, and that the final policies obtained in this study also perform satisfactorily in other similar environments.
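To make the mapping of these elements concrete, the following is a minimal sketch, not the authors' implementation: it assumes a CST-type shape parameterization, a hypothetical gym-style environment interface, and a placeholder evaluation routine standing in for the CFD solver or surrogate model that would supply the lift-to-drag ratio. All names here (AirfoilDesignEnv, evaluate, step_scale) are illustrative assumptions.

```python
# A minimal sketch (assumptions labeled below) of how the RL elements in the
# abstract might map onto a gym-style environment for incremental airfoil
# modification: state = current shape parameters, action = small increments
# to them, reward = gain in lift-to-drag ratio from the modification.
import numpy as np

class AirfoilDesignEnv:
    """Hypothetical environment mimicking a designer's step-by-step refinement."""

    def __init__(self, baseline_cst, max_steps=20, step_scale=0.01):
        self.baseline_cst = np.asarray(baseline_cst, dtype=float)
        self.max_steps = max_steps
        self.step_scale = step_scale  # bounds each incremental modification

    def reset(self):
        self.cst = self.baseline_cst.copy()
        self.steps = 0
        self.prev_ld = self._evaluate(self.cst)
        return self.cst.copy()

    def step(self, action):
        # Clip the action so each move is a small incremental modification,
        # mirroring the designer's gradual reshaping of the airfoil.
        delta = np.clip(action, -1.0, 1.0) * self.step_scale
        self.cst = self.cst + delta
        ld = self._evaluate(self.cst)
        reward = ld - self.prev_ld  # reward = improvement in L/D this step
        self.prev_ld = ld
        self.steps += 1
        done = self.steps >= self.max_steps
        return self.cst.copy(), reward, done, {}

    def _evaluate(self, cst):
        # Placeholder: a real setup would call a CFD solver or a surrogate
        # model here to obtain the lift-to-drag ratio of the current shape.
        return float(-np.sum((cst - 0.5) ** 2))
```

Under this reading, the pretraining described above would amount to fitting the policy network to designer demonstrations (state-action pairs of incremental modifications) by supervised imitation learning, before refining the policy with a policy-gradient reinforcement learning method.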