ACTA AERONAUTICA ET ASTRONAUTICA SINICA
High-throughput GPU-based LDPC decoder architecture for space communication
Received date: 2016-01-25
Revised date: 2016-04-25
Online published: 2016-05-05
Supported by: National Natural Science Foundation of China (91438116)
Current space communication systems require high-speed, reconfigurable channel decoders. To meet this need, a high-throughput low-density parity-check (LDPC) software decoding architecture is proposed that exploits the parallel processing capability of graphics processing units (GPUs). The architecture implements the turbo-decoding message passing (TDMP) algorithm, and the efficiency of its decoding kernel functions is improved in four ways: optimizing inter-block and intra-block thread parallelism for the node-update operations, reducing the thread branch divergence caused by irregular row weights, lowering the latency of thread memory accesses to the update information, and quantizing the stored information appropriately. An asynchronous compute unified device architecture (CUDA) stream processing mechanism is also introduced. It optimizes the execution schedule of the decoder's input/output data transfers and kernel functions and defines a method for allocating thread resources across CUDA streams, so as to maximize decoding throughput while reducing decoding latency. Decoding simulations of the LDPC codes in the Consultative Committee for Space Data Systems (CCSDS) telemetry standard on Nvidia Tesla K20 and GTX980 platforms show that the proposed architecture achieves a maximum throughput of about 500 Mbps and an average latency of about 2 ms using the TDMP algorithm with 10 iterations. Compared with existing results, the proposed architecture improves both decoding throughput and latency while retaining the configuration flexibility of a software architecture.
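The layered update at the heart of TDMP decoding can be illustrated with a small CPU reference model. This is a sketch under stated assumptions, not the authors' GPU kernel. The abstract does not specify the check-node rule or the quantization format, so the sketch assumes plain min-sum check-node updates, one parity-check row per layer, and 8-bit saturating storage of check-to-variable messages. For a quasi-cyclic code, a layer would typically be a block row of circulants instead. The function names `decode_tdmp` and `sat` and the toy (7,4) Hamming parity-check matrix are hypothetical and chosen for illustration only.

```python
def sat(v, bits=8):
    """Saturate an integer message to a signed `bits`-bit range (assumed quantization)."""
    lim = (1 << (bits - 1)) - 1
    return max(-lim, min(lim, v))


def decode_tdmp(rows, llr, max_iter=10, msg_bits=8):
    """Layered (TDMP-style) min-sum decoding; a hypothetical reference sketch.

    rows: parity-check matrix as a list of rows, each a list of variable indices
          (rows may have different lengths, i.e. irregular row weight; each >= 2).
    llr:  integer channel LLRs, positive meaning bit 0.
    Returns (hard_decisions, iterations_used).
    """
    app = list(llr)                          # a-posteriori LLR per variable node
    r = [[0] * len(row) for row in rows]     # stored check-to-variable messages
    hard = [0] * len(llr)
    for it in range(1, max_iter + 1):
        for m, row in enumerate(rows):       # one layer = one parity-check row
            # Variable-to-check messages: remove this row's old contribution.
            q = [app[v] - r[m][k] for k, v in enumerate(row)]
            mags = [abs(x) for x in q]
            idx = min(range(len(q)), key=mags.__getitem__)
            min1 = mags[idx]                          # smallest magnitude
            min2 = min(mags[:idx] + mags[idx + 1:])   # second smallest
            sign = 1
            for x in q:
                if x < 0:
                    sign = -sign                      # product of all signs
            for k, v in enumerate(row):
                mag = min2 if k == idx else min1      # exclude the edge itself
                s = -sign if q[k] < 0 else sign       # product of the other signs
                r[m][k] = sat(s * mag, msg_bits)      # quantized storage
                app[v] = q[k] + r[m][k]               # APP updated immediately
        hard = [1 if a < 0 else 0 for a in app]
        if all(sum(hard[v] for v in row) % 2 == 0 for row in rows):
            return hard, it                           # syndrome satisfied: stop early
    return hard, max_iter


# Toy (7,4) Hamming parity-check matrix, used only for illustration.
H = [[0, 1, 3, 4], [0, 2, 3, 5], [1, 2, 3, 6]]
bits, iters = decode_tdmp(H, [4, 4, 4, -1, 4, 4, 4])   # bit 3 weakly flipped
print(bits, iters)  # [0, 0, 0, 0, 0, 0, 0] 1
```

Each layer's new check-to-variable messages update the a-posteriori LLRs before the next layer reads them. This immediate update is what distinguishes layered (TDMP) scheduling from a flooding schedule and why it typically converges in fewer iterations. Practical min-sum decoders usually also apply a normalization or offset correction, which this sketch omits.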
HOU Yi, LIU Rongke, PENG Hao, ZHAO Ling, XIONG Qingxu. High-throughput GPU-based LDPC decoder architecture for space communication[J]. Acta Aeronautica et Astronautica Sinica, 2017, 38(1): 320107. DOI: 10.7527/S1000-6893.2016.0126