ACTA AERONAUTICA ET ASTRONAUTICA SINICA ›› 2017, Vol. 38 ›› Issue (1): 320107-320107. doi: 10.7527/S1000-6893.2016.0126

• Electronics and Electrical Engineering and Control

High-throughput GPU-based LDPC decoder architecture for space communication

HOU Yi, LIU Rongke, PENG Hao, ZHAO Ling, XIONG Qingxu   

  1. School of Electronics and Information Engineering, Beihang University, Beijing 100083, China
  • Received:2016-01-25 Revised:2016-04-25 Online:2017-01-15 Published:2016-05-05
  • Supported by:

    National Natural Science Foundation of China (91438116)

Abstract:

To meet the demand for high-speed, reconfigurable channel decoders in space communications, a high-throughput low-density parity-check (LDPC) software decoding architecture is proposed that exploits the parallelism of graphics processing units (GPUs). The efficiency of the decoding kernel functions is improved by optimizing the inter-block and intra-block thread parallelism of the node-update operations in the turbo-decoding message passing (TDMP) algorithm, reducing the thread branching caused by irregular row weights, lowering the latency of the threads' memory accesses to the update information, and quantizing the stored information appropriately. An asynchronous compute unified device architecture (CUDA) stream processing mechanism is also introduced to maximize decoding throughput while reducing decoding latency; it comprises an optimized execution schedule between the decoder's input/output data transfers and its kernel functions, and a method for allocating thread resources across CUDA streams. Decoding simulations of the LDPC codes of the Consultative Committee for Space Data Systems (CCSDS) telemetry standard on Nvidia's latest Tesla K20 and GTX980 platforms show that the proposed architecture achieves a maximum throughput of about 500 Mbps and an average latency of about 2 ms using the TDMP algorithm with 10 iterations. Compared with existing results, the proposed architecture improves both decoding throughput and latency while retaining the configuration flexibility of a software architecture.
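For readers unfamiliar with TDMP (layered) decoding, the following is a minimal serial sketch in Python, not the GPU implementation described in the abstract. It assumes a normalized min-sum check-node update with a scaling factor `alpha`, treats each row of the parity-check matrix as one layer, and uses a toy (7,4) Hamming parity-check matrix rather than a CCSDS LDPC code. The names `tdmp_decode`, `H_EXAMPLE`, and `LLR_EXAMPLE` are hypothetical, and the abstract does not state which check-node rule or layer partition the authors use.

```python
import numpy as np

def tdmp_decode(H, llr, max_iter=10, alpha=0.75):
    """Layered (TDMP) decoding with a normalized min-sum update (assumed rule).

    Each row of H is one layer and must contain at least two ones.
    Positive LLR means bit 0 is more likely.
    """
    H = np.asarray(H)
    post = np.asarray(llr, dtype=float).copy()    # posterior LLR of every bit
    rows = [np.flatnonzero(r) for r in H]         # bit indices checked by each row
    c2v = [np.zeros(len(idx)) for idx in rows]    # stored check-to-variable messages
    hard = (post < 0).astype(int)
    for _ in range(max_iter):
        for m, idx in enumerate(rows):
            v2c = post[idx] - c2v[m]              # subtract this check's old message
            sign = np.where(v2c < 0, -1.0, 1.0)
            mag = np.abs(v2c)
            order = np.argsort(mag)
            min1, min2 = mag[order[0]], mag[order[1]]
            out_mag = np.full(len(idx), min1)
            out_mag[order[0]] = min2              # the smallest input gets the second minimum
            new = alpha * np.prod(sign) * sign * out_mag
            post[idx] = v2c + new                 # posteriors are updated within the iteration
            c2v[m] = new
        hard = (post < 0).astype(int)
        if not np.any(H @ hard % 2):              # stop once all parity checks are satisfied
            break
    return hard

# Toy example (assumption): (7,4) Hamming parity-check matrix, all-zero codeword sent.
H_EXAMPLE = np.array([[1, 1, 0, 1, 1, 0, 0],
                      [1, 0, 1, 1, 0, 1, 0],
                      [0, 1, 1, 1, 0, 0, 1]])
# The second bit has a negative LLR, i.e. it was received in error.
LLR_EXAMPLE = [2.0, -0.5, 1.5, 2.5, 1.0, 1.8, 2.2]

decoded = tdmp_decode(H_EXAMPLE, LLR_EXAMPLE)
print(decoded)
```

Because each layer reads posteriors already refined by the previous layer, the erroneous bit in this toy input is corrected within the first iteration. The per-check loop body is the kind of node update that a GPU decoder would distribute across threads.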

Key words: low-density parity-check codes, graphics processing units, software decoding architecture, turbo-decoding message passing algorithm, high throughput, low latency
