A RECONFIGURABLE ARCHITECTURE OF TURBO DECODER FOR MIMO-HIGH SPEED DOWNLINK PACKET ACCESS

A novel channel-based rescheduling scheme for modern turbo convolutional codes is proposed, built on the suboptimal, low-complexity max-log-MAP algorithm. Demand for dedicated custom solutions in mobile communications and related applications motivates a reconfigurable architecture for turbo convolutional codes. This study covers the design and performance evaluation of the proposed reconfigurable architecture for the channel coding scheme in MIMO-High Speed Downlink Packet Access (MIMO-HSDPA). To attain performance close to the Shannon limit in a multichannel system, the flexible reconfigurable architecture is realized on a 28 nm Cyclone V GX 5CGXFC5C6 FPGA. The design achieves a throughput of 13.5 Mb/s, higher than conventional HSDPA implementations, while consuming 53 mW.


INTRODUCTION
To reduce latency and provide higher peak rates, HSDPA was introduced in Release 5 of the Universal Mobile Telecommunications System (Table 1). HSDPA allocates a large fraction of the downlink resources to a specific user through the High-Speed Downlink Shared Channel (HS-DSCH).
The main objective is a high peak data rate for data-centric 3G applications (3GPP TS 25.201, 2001; 3GPP TS 25.308 V5.7.0, 2004; 3GPP TSG RAN WGI, 2002). Moreover, HSDPA is backward compatible with the WCDMA wireless standard, and the convolutional turbo coder ensures reliable communication for smartphones, tablets and other handheld devices. Scheduling is the key aspect of an HSDPA system and defines its general behavior.
Hence, a novel channel-based rescheduling scheme for the turbo coder is developed and employed with the max-log-MAP algorithm. The proposed reconfigurable architecture is designed to support data rates close to the Shannon limit.
The existing rate-1/3 turbo coder comprises two rate-1/2 recursive systematic convolutional encoders (Fig. 2). The encoder works in two parts: the first part encodes the uncoded input data bits in natural order and generates a set of parity bits. The second part encodes a permutation of the data bits, produced by a block interleaver, and generates another set of parity bits. These sets of parity bits are transmitted over the communication channel (Benkeser et al., 2009). Overall, the main challenges in conventional HSDPA systems are throughput and power consumption. In this study, Section 2 introduces a new reconfigurable architecture for the turbo coder, shown in Fig. 3. In Section 3, the low-complexity max-log-MAP algorithm is employed in HSDPA to achieve the desired throughput. Section 4 compares the implementation and simulation results with existing systems. Section 5 concludes the paper.

Encoding Scheme
In wireless communication standards, the data bits are encoded by a rate-1/3 turbo coder built from recursive systematic convolutional encoders, which means that every input bit produces three output bits.
These three outputs are the systematic (original) bit and two redundant parity bits. The recursive systematic convolutional encoders are connected through an interleaver whose block size ranges from 40 to 5114 bits.
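The encoding structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the constituent polynomials (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3) are assumed to be the usual 8-state 3GPP choice, the interleaver is a placeholder random permutation and not the standardized one, trellis termination is omitted, and the names `rsc_encode` and `turbo_encode` are hypothetical.

```python
import random


def rsc_encode(bits):
    """Parity stream of an 8-state recursive systematic convolutional encoder.

    Assumed polynomials: feedback 1 + D^2 + D^3, feedforward 1 + D + D^3.
    Trellis termination is omitted in this sketch.
    """
    s1 = s2 = s3 = 0                    # shift register, s1 holds the newest value
    parity = []
    for u in bits:
        fb = u ^ s2 ^ s3                # feedback taps at D^2 and D^3
        parity.append(fb ^ s1 ^ s3)     # feedforward taps at 1, D and D^3
        s1, s2, s3 = fb, s1, s2         # shift the register
    return parity


def turbo_encode(bits, perm):
    """Return (systematic, parity1, parity2): three output bits per input bit."""
    parity1 = rsc_encode(bits)                        # data in natural order
    parity2 = rsc_encode([bits[i] for i in perm])     # interleaved data
    return list(bits), parity1, parity2


rng = random.Random(0)
k = 40                                  # smallest block size mentioned in the text
data = [rng.randint(0, 1) for _ in range(k)]
perm = list(range(k))
rng.shuffle(perm)                       # placeholder interleaver (assumption)
systematic, p1, p2 = turbo_encode(data, perm)
print(len(systematic) + len(p1) + len(p2), "coded bits for", k, "data bits")
```

Each input bit yields one systematic bit and two parity bits, which gives the rate of 1/3 before any rate matching.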

Decoding Scheme
The proposed architecture (Fig. 1) is designed to process, on average, one trellis step per clock cycle using three state metric recursion units. To overcome the memory access bottleneck, a cache memory architecture similar to that of the reference design is used. The reconfigurable architecture uses a Last Level Cache with a Dynamic Insertion Policy to address the memory access issues and bandwidth demands of the decoding scheme.

Cache Architecture
The HSDPA design proposed in (Benkeser et al., 2009) uses a quadruple-port cache memory architecture (Lin et al., 2006) to overcome the bottleneck. The novel hybrid last level cache architecture (Fig. 4) adapts to different modulation schemes such as QPSK, 16-QAM and 64-QAM. To improve the throughput of the system, a reconfigurable system is developed around this cache technique. In certain memory access cases, memory locations may be free and unused but cannot be used because of incompatible packet sizes. Locality of reference also affects performance: memory access is very fast when the accessed blocks lie in adjacent memory locations. In the critical case, memory blocks belonging to the same resource are scattered across the memory space, far from each other, which degrades performance. Addresses that cause cache misses show repetitive patterns because of the temporal locality inherent in the access stream. When an allocated memory location is not released after use, a memory leak results and the location can never be used again. The hybrid last level cache architecture divides the overall memory access system into a conventional cache with a dual-tag and data block, which a Bloom filter uses in combination with a channeled priority heap to identify and retain the blocks that are recurrently missed in other cache systems.
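The idea of detecting and retaining recurrently missed blocks can be illustrated with a small software model. The sketch below is only an assumption of how a Bloom filter and a heap might cooperate; the text does not specify the hardware organization or the "channeled" heap, so `heapq.nlargest` stands in for it, and the class names, filter size, hash count and SHA-256-based hashing are all hypothetical choices.

```python
import hashlib
import heapq


class BloomFilter:
    """Approximate set membership: no false negatives, rare false positives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)       # one byte per bit, for clarity

    def _positions(self, key):
        # Derive n_hashes bit positions from salted SHA-256 digests.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def __contains__(self, key):
        return all(self.bits[p] for p in self._positions(key))


class RecurrentMissTracker:
    """Counts repeat misses and reports the most frequently missed addresses."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.seen = BloomFilter()
        self.counts = {}                    # address -> number of repeat misses

    def record_miss(self, addr):
        if addr in self.seen:               # probably missed before
            self.counts[addr] = self.counts.get(addr, 0) + 1
        else:
            self.seen.add(addr)             # first miss: remember it only

    def retained(self):
        # Heap-based selection of the addresses with the most repeat misses.
        top = heapq.nlargest(self.capacity, self.counts.items(),
                             key=lambda item: item[1])
        return [addr for addr, _ in top]


tracker = RecurrentMissTracker(capacity=2)
for addr in [0x10, 0x20, 0x10, 0x30, 0x10, 0x20]:
    tracker.record_miss(addr)
print([hex(a) for a in tracker.retained()])
```

The Bloom filter keeps the first-miss bookkeeping compact, so exact counters are spent only on addresses that miss more than once.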

MAX-LOG-MAP DECODER
The Maximum A Posteriori (MAP) algorithm is used for the Soft-Input Soft-Output components of the turbo decoder. The computational complexity of the MAP algorithm is very high, which makes hardware realization complex. Hence, a low-complexity max-log-MAP algorithm (Benkeser et al., 2009) is used. As discussed in Section 2.2, the state metric units carry out the forward, backward and dummy-backward recursions in parallel.
In contrast to the quadruple-port cache memory, the reconfigurable architecture uses a Last Level Cache technique to overcome the memory access bottleneck. With a decision bit $u_k = i$, the forward state metrics $\alpha_k$ and backward state metrics $\beta_k$ are calculated by the recursive Equations 1 and 2, where $s$ is the trellis state that is reached from state $s'$. A dummy-backward recursion is performed some trellis steps in advance to generate a reliable set of state metrics as starting points.
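For reference, the max-log-MAP state metric recursions are commonly written in the textbook form below, with forward metric $\alpha_k(s)$ and backward metric $\beta_k(s)$. The branch metric symbol $\gamma_k(s', s)$ for the transition from state $s'$ to state $s$ is notation assumed here, and the authors' own equations may differ in form and numbering.

```latex
\begin{align}
\alpha_k(s)     &= \max_{s'} \bigl[ \alpha_{k-1}(s') + \gamma_k(s', s) \bigr] \\
\beta_{k-1}(s') &= \max_{s}  \bigl[ \beta_k(s) + \gamma_k(s', s) \bigr]
\end{align}
```

Replacing the log-sum of the exact log-MAP algorithm with a plain maximum is what makes the algorithm suboptimal but cheap to realize in hardware.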
The max-log-MAP algorithm computes the a posteriori probability of a bit being a zero or a one. Hence, it is a symbol-by-symbol decoding algorithm that minimizes the probability of bit error.
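A minimal software sketch of this symbol-by-symbol decision is given below. It is not the proposed hardware: it assumes the 8-state constituent code with feedback 1 + D^2 + D^3 and feedforward 1 + D + D^3, processes a whole block at once (so no sliding window or dummy-backward recursion), ignores a priori input, and defines the log-likelihood ratio as log P(u=1)/P(u=0). The names `step`, `encode` and `max_log_map` are hypothetical.

```python
NEG_INF = float("-inf")
N_STATES = 8  # assumed 8-state constituent code


def step(state, u):
    """Next state and parity bit for input u (assumed polynomials)."""
    s1, s2, s3 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    fb = u ^ s2 ^ s3                    # feedback 1 + D^2 + D^3
    parity = fb ^ s1 ^ s3               # feedforward 1 + D + D^3
    return (fb << 2) | (s1 << 1) | s2, parity


def encode(bits):
    """Parity stream, starting from the all-zero state."""
    state, out = 0, []
    for u in bits:
        state, p = step(state, u)
        out.append(p)
    return out


def max_log_map(llr_sys, llr_par):
    """A posteriori LLRs, log P(u=1)/P(u=0), from channel LLRs."""
    n = len(llr_sys)

    def gamma(k, s, u):
        # Branch metric: correlation of the branch bits with the channel LLRs.
        return u * llr_sys[k] + step(s, u)[1] * llr_par[k]

    # Forward recursion; the encoder is known to start in state 0.
    alpha = [[NEG_INF] * N_STATES for _ in range(n + 1)]
    alpha[0][0] = 0.0
    for k in range(n):
        for s in range(N_STATES):
            for u in (0, 1):
                nxt = step(s, u)[0]
                alpha[k + 1][nxt] = max(alpha[k + 1][nxt],
                                        alpha[k][s] + gamma(k, s, u))

    # Backward recursion; unterminated trellis, so all end states start equal.
    beta = [[0.0] * N_STATES for _ in range(n + 1)]
    for k in range(n - 1, -1, -1):
        for s in range(N_STATES):
            beta[k][s] = max(gamma(k, s, u) + beta[k + 1][step(s, u)[0]]
                             for u in (0, 1))

    # Decision: best path metric with u_k = 1 minus best with u_k = 0.
    llr_out = []
    for k in range(n):
        best = [NEG_INF, NEG_INF]
        for s in range(N_STATES):
            for u in (0, 1):
                metric = alpha[k][s] + gamma(k, s, u) + beta[k + 1][step(s, u)[0]]
                best[u] = max(best[u], metric)
        llr_out.append(best[1] - best[0])
    return llr_out


data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
llr_sys = [4.0 if b else -4.0 for b in data]
llr_par = [4.0 if b else -4.0 for b in encode(data)]
llr_sys[3] = -1.0                       # one weakly wrong systematic observation
decoded = [1 if llr > 0 else 0 for llr in max_log_map(llr_sys, llr_par)]
print(decoded == data)
```

In a full turbo decoder, two such units would exchange extrinsic information through the interleaver over several iterations; that loop is left out here.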

IMPLEMENTATION RESULTS
The introduction of MIMO and multi-carrier architectures calls for time-multiplexed and multichannel implementations. For a turbo decoder with the max-log-MAP algorithm, higher gains are possible when the link is not limited by a high error rate.
Similarly, Incremental Redundancy dynamically changes the FEC pattern for independently retransmitted data frames, which shows that the FEC scheme plays a major role in the resulting throughput (Frigon et al., 2007). The throughput of the turbo decoder is estimated as 21.37 Mb/s, which is 5.7% higher than the reference. The throughput of the overall MIMO-HSDPA system also improves markedly, to 13.5 Mb/s, whereas (Benkeser et al., 2009) achieves 10.8 Mb/s (Table 3).
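The relative gains implied by these figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the helper name `relative_gain` is hypothetical, and the reference decoder throughput is back-calculated from the 5.7% figure, not taken from a measurement.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, as a fraction."""
    return (new - old) / old


# Overall MIMO-HSDPA throughput: 13.5 Mb/s versus 10.8 Mb/s.
system_gain = relative_gain(13.5, 10.8)

# Decoder throughput of the reference implied by "5.7% higher" at 21.37 Mb/s.
implied_reference = 21.37 / (1 + 0.057)

print(f"system throughput gain: {system_gain:.1%}")
print(f"implied reference decoder throughput: {implied_reference:.2f} Mb/s")
```

The system-level gain works out to about 25%, and the implied reference decoder throughput is roughly 20.2 Mb/s.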
The channel coding scheme is developed and its area utilization is tested on the Cyclone V FPGA series. Table 2 lists the logic elements required for each CC scheme, with a detailed breakdown by individual block. The memory requirement is tested under the same criteria and is also listed in Table 2.
The sequence number must be encoded separately from the data and must be very reliable to overcome whatever errors the channel conditions have induced in the data. Power is estimated with the Quartus PowerPlay power analyzer tool provided by Altera and is calculated as 53 mW.
In future, implementing the design on an ASIC platform could improve the throughput and make the area utilization more efficient.

CONCLUSION
In this study, we detailed the design of a common reconfigurable MIMO-HSDPA downlink architecture with independent turbo coding. The proposed VLSI downlink architecture for HSDPA achieves a peak throughput of 13.5 Mb/s in operating band II (UL: 1850-1910 MHz and DL: 1930-1990 MHz). The design targets a Cyclone V series FPGA built on a 28 nm process. Compared with recent implementations, the proposed architecture is capable of lower delay. The power of the turbo decoder with the hybrid cache architecture is estimated as 53 mW. In addition to setting a record in turbo decoding throughput (21.37 Mb/s), the presented implementation concept demonstrates ultra-low-power operation.