Acta Aeronautica et Astronautica Sinica, 2024, Vol. 45, Issue 20: 630552. doi: 10.7527/S1000-6893.2024.30552
• Aeronautics Computing and Simulation Technique •
Guangli LI 1,2, Zhen DU 1,2, Jiacheng ZHAO 1,2, Ying LIU 1,2, Feng YU 1,2, Yijin LI 1,2, Zhongcheng ZHANG 1,2, Huimin CUI 1,2
Huimin CUI
Guangli LI, Zhen DU, Jiacheng ZHAO, Ying LIU, Feng YU, Yijin LI, Zhongcheng ZHANG, Huimin CUI. Compiler technologies for emerging application paradigms and advanced computer architectures[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(20): 630552.