Aeronautics Computing and Simulation Technique

Compiler technologies for emerging application paradigms and advanced computer architectures

  • Guangli LI,
  • Zhen DU,
  • Jiacheng ZHAO,
  • Ying LIU,
  • Feng YU,
  • Yijin LI,
  • Zhongcheng ZHANG,
  • Huimin CUI
  • 1. State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
E-mail: cuihm@ict.ac.cn

Received date: 2024-04-19

Revised date: 2024-06-20

Accepted date: 2024-08-13

Online published: 2024-08-21

Supported by

National Science and Technology Major Project (2021ZD0110101); National Natural Science Foundation of China (62232015); China Postdoctoral Science Foundation (2023M733566); Innovation Funding of ICT, CAS (E361010)

Abstract

With the increasing demand for computing power driven by emerging applications such as artificial intelligence, compilation technology, which serves as a crucial bridge between software and hardware, faces unprecedented challenges and opportunities. This article focuses on the development trends of domain-specific compilers and discusses in depth the compilation techniques tailored for emerging domains. By examining aspects including whole-program operator fusion, dynamic-shape tensor compilation, software-hardware co-design, and computational security, it provides a comprehensive summary and evaluation of representative domain-specific compilation technologies for new application paradigms and architectures. The key role of domain-specific compilation technologies in adapting to diverse computing platforms, improving program execution efficiency, ensuring software security, and supporting hardware design is analyzed. Prospects for application and directions for future work are also discussed.

Cite this article

Guangli LI, Zhen DU, Jiacheng ZHAO, Ying LIU, Feng YU, Yijin LI, Zhongcheng ZHANG, Huimin CUI. Compiler technologies for emerging application paradigms and advanced computer architectures[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2024, 45(20): 630552-630552. DOI: 10.7527/S1000-6893.2024.30552
