Guangli LI¹,², Zhen DU¹,², Jiacheng ZHAO¹,², Ying LIU¹,², Feng YU¹,², Yijin LI¹,², Zhongcheng ZHANG¹,², Huimin CUI¹,²
Received: 2024-04-19
Revised: 2024-06-20
Accepted: 2024-08-13
Published: 2024-10-25
Online: 2024-08-21
Corresponding author: Huimin CUI
E-mail: cuihm@ict.ac.cn
Abstract:
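As emerging applications such as artificial intelligence drive a surge in demand for computing power, compiler technology, the key link between software and hardware, faces unprecedented challenges and opportunities. Starting from the development trends of domain-specific compilers, this paper focuses on compilation techniques for emerging domains. It summarizes and reviews representative domain-specific compilation techniques for emerging application paradigms and new architectures from several perspectives: whole-program operator fusion, dynamic-shape tensor compilation, hardware/software co-design, and computation security. It then analyzes the key roles that domain-specific compilation plays in adapting to diverse computing platforms, improving program execution efficiency, safeguarding software security, and supporting chip design. Finally, it discusses the application prospects of these techniques and directions for future work.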
Guangli LI, Zhen DU, Jiacheng ZHAO, Ying LIU, Feng YU, Yijin LI, Zhongcheng ZHANG, Huimin CUI. Compiler technologies for emerging application paradigms and advanced computer architectures[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(20): 630552.
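Copyright © Editorial Office of Acta Aeronautica et Astronautica Sinica
Copyright © 2011 Acta Aeronautica et Astronautica Sinica
Supervised by: China Association for Science and Technology; Sponsored by: Chinese Society of Aeronautics and Astronautics, Beihang University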

