High-throughput GPU-based LDPC decoder architecture for space communication

doi:10.7527/S1000-6893.2016.0126

Electronics, Electrical Engineering and Control

HOU Yi, LIU Rongke, PENG Hao, ZHAO Ling, XIONG Qingxu

School of Electronics and Information Engineering, Beihang University, Beijing 100083, China

Received: 2016-01-25  Revised: 2016-04-25  Published: 2017-01-15  Released online: 2016-05-05
Corresponding author: LIU Rongke, E-mail: rongke_liu@buaa.edu.cn
Supported by:
National Natural Science Foundation of China (91438116)

Abstract

To meet the current demand in space communications for high-speed, reconfigurable channel decoders, a high-throughput software decoding architecture for low-density parity-check (LDPC) codes is proposed that exploits the parallel computing capability of graphics processing units (GPUs). The execution efficiency of the decoding kernel functions is improved in four ways: optimizing the intra-block and inter-block thread parallelism of the node update operations in the turbo-decoding message passing (TDMP) algorithm, reducing the thread branching caused by irregular row weights, lowering the latency with which threads access the memory holding the node update information, and appropriately quantizing the information stored by the decoder. On this basis, an asynchronous compute unified device architecture (CUDA) stream processing mechanism is introduced. The scheduling between the decoder's input/output data transfers and kernel function executions, and the allocation of decoding thread resources across CUDA streams, are designed to maximize the decoding throughput while reducing the decoding latency. TDMP decoding experiments on the Consultative Committee for Space Data Systems (CCSDS) telemetry standard LDPC codes, run on Nvidia's latest Tesla K20 and GTX980 platforms, show that with 10 decoding iterations the proposed architecture reaches a maximum throughput of about 500 Mbps and an average decoding latency of about 2 ms. Compared with existing results, the proposed architecture balances decoding throughput and latency more effectively while retaining the configuration flexibility of a software architecture.
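To make the TDMP (layered) update order and the quantized message storage mentioned in the abstract more concrete, the following is a minimal CPU-side sketch in Python. It is not the authors' GPU kernel. It assumes a plain min-sum check-node rule, 8-bit saturation of stored values, and a toy (7,4) Hamming parity-check matrix rather than a CCSDS code. The names `saturate`, `layered_min_sum`, `H_ROWS`, and `received` are hypothetical and chosen only for illustration.

```python
def saturate(x, limit=127):
    """Clamp a value to a symmetric fixed-point range (8-bit is an assumption)."""
    return max(-limit, min(limit, x))


def layered_min_sum(h_rows, channel_llr, max_iters=10):
    """Layered (TDMP-style) min-sum decoding, illustrative only.

    h_rows      -- list of check rows; each row lists the variable indices it checks
    channel_llr -- integer LLRs, where a positive value means bit 0 is more likely
    Returns (hard_bits, iterations_used).
    """
    posterior = [saturate(v) for v in channel_llr]
    # One stored check-to-variable message per nonzero entry of H.
    check_msgs = [[0] * len(row) for row in h_rows]
    bits = [1 if llr < 0 else 0 for llr in posterior]

    for iteration in range(1, max_iters + 1):
        for r, row in enumerate(h_rows):  # one layer = one check row
            # Remove this row's previous contribution from the posteriors.
            extrinsic = [posterior[v] - check_msgs[r][k] for k, v in enumerate(row)]
            for k, v in enumerate(row):
                others = extrinsic[:k] + extrinsic[k + 1:]
                sign = -1 if sum(x < 0 for x in others) % 2 else 1
                new_msg = saturate(sign * min(abs(x) for x in others))
                check_msgs[r][k] = new_msg
                # Later layers in the same iteration already see this update.
                posterior[v] = saturate(extrinsic[k] + new_msg)

        bits = [1 if llr < 0 else 0 for llr in posterior]
        # Stop early once every parity check is satisfied.
        if all(sum(bits[v] for v in row) % 2 == 0 for row in h_rows):
            return bits, iteration
    return bits, max_iters


# Toy (7,4) Hamming code; the all-zero codeword was sent and bit 2 was received
# with the wrong sign and low reliability.
H_ROWS = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]
received = [4, 4, -1, 4, 4, 4, 4]
decoded, iters = layered_min_sum(H_ROWS, received)
print(decoded, iters)  # prints [0, 0, 0, 0, 0, 0, 0] 1
```

Because each layer updates the posteriors in place, the second and third check rows already benefit from the correction made by the first, which is why layered schedules typically need fewer iterations than a flooding schedule. A GPU implementation along the lines of the abstract would map independent rows and codewords to threads and thread blocks; that mapping is outside the scope of this sketch.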

Keywords: low-density parity-check codes, graphics processing units, software decoding architecture, turbo-decoding message passing algorithm, high throughput, low latency

CLC number:

HOU Yi, LIU Rongke, PENG Hao, ZHAO Ling, XIONG Qingxu. High-throughput GPU-based LDPC decoder architecture for space communication[J]. Acta Aeronautica et Astronautica Sinica, 2017, 38(1): 320107-320107.

