[1] LI R Z, ZHANG Y F, CHEN H X. Evolution and development of "man-in-loop" in aerodynamic optimization design[J]. Acta Aerodynamica Sinica, 2017, 35(4):529-543 (in Chinese).
[2] CHEN H X, DENG K W, LI R Z. Utilization of machine learning technology in aerodynamic optimization[J]. Acta Aeronautica et Astronautica Sinica, 2019, 40(1):522480 (in Chinese).
[3] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. Cambridge: MIT Press, 2018.
[4] GARNIER P, VIQUERAT J, RABAULT J, et al. A review on deep reinforcement learning for fluid mechanics[DB/OL]. arXiv preprint:1908.04127, 2019.
[5] RABAULT J, KUCHTA M, JENSEN A, et al. Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control[J]. Journal of Fluid Mechanics, 2019, 865:281-302.
[6] NOVATI G, VERMA S, ALEXEEV D, et al. Synchronisation through learning for two self-propelled swimmers[J]. Bioinspiration & Biomimetics, 2017, 12(3):036001.
[7] BUCCI M A, SEMERARO O, ALLAUZEN A, et al. Control of chaotic systems by deep reinforcement learning[DB/OL]. arXiv preprint:1906.07672, 2019.
[8] LAMPTON A, NIKSCH A, VALASEK J. Morphing airfoils with four morphing parameters: AIAA-2008-7282[R]. Reston: AIAA, 2008.
[9] KULFAN B M. Universal parametric geometry representation method[J]. Journal of Aircraft, 2008, 45(1):142-158.
[10] STRAATHOF M H, VAN TOOREN M J, VOSKUIJL M. Aerodynamic shape parameterisation and optimisation of novel configurations[C]//Proceedings of the 2008 Royal Aeronautical Society Annual Applied Aerodynamics Research Conference, 2008.
[11] CASTONGUAY P, NADARAJAH S. Effect of shape parameterization on aerodynamic shape optimization: AIAA-2007-0059[R]. Reston: AIAA, 2007.
[12] Newton's method in optimization[EB/OL]. (2020-06-21)[2021-01-18]. https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization.
[13] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[DB/OL]. arXiv preprint:1707.06347, 2017.
[14] OpenAI five[EB/OL]. (2018-06-25)[2021-01-18]. https://openai.com/blog/openai-five/.
[15] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust region policy optimization[C]//International Conference on Machine Learning, 2015:1889-1897.
[16] WANG Z, BAPST V, HEESS N, et al. Sample efficient actor-critic with experience replay[DB/OL]. arXiv preprint:1611.01224, 2016.
[17] SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation[DB/OL]. arXiv preprint:1506.02438, 2015.
[18] LI R Z, ZHANG Y F, CHEN H X. Strategies and methods for multi-objective aerodynamic optimization design for supercritical wings[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(5):623409 (in Chinese).
[19] DE LA CRUZ G V, DU Y, TAYLOR M E. Pre-training with non-expert human demonstration for deep reinforcement learning[DB/OL]. arXiv preprint:1812.08904, 2018.
[20] ZHANG X, MA H. Pretraining deep actor-critic reinforcement learning algorithms with expert demonstrations[DB/OL]. arXiv preprint:1801.10459, 2018.
[21] ROSS S, GORDON G, BAGNELL D. A reduction of imitation learning and structured prediction to no-regret online learning[C]//Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011:627-635.