On-chip Implementation of High Resolution High Speed Floating Point Adder/Subtractor with Reducing Mean Latency for OFDM

Recently, Orthogonal Frequency Division Multiplexing (OFDM) [1] techniques have received immense attention in high-speed data communication systems such as Wireless Local Area Networks (WLAN), digital audio/video broadcasting and beyond-3G research [2].
OFDM is a multi-carrier transmission technique that divides the available spectrum into many carriers, each modulated by a low-rate data stream. The Inverse Fast Fourier Transform (IFFT) and the Fast Fourier Transform (FFT) are then used to obtain the signal in the time domain and in the frequency domain, respectively. The FFT reduces the complexity of OFDM, so the performance of the FFT strongly affects the performance of the overall OFDM system.
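The role of the two transforms can be illustrated with a minimal sketch. It assumes four subcarriers carrying QPSK symbols and uses a direct O(N²) discrete Fourier transform for brevity; an FFT computes the same result with fewer operations. The function name `dft` and the symbol values are illustrative assumptions, not taken from the design described here.

```python
import cmath

def dft(x, inverse=False):
    """Direct discrete Fourier transform (same result as an FFT, but O(N^2))."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    # The inverse transform carries the 1/N scaling.
    return [v / n for v in out] if inverse else out

# Assumed example: one QPSK symbol per subcarrier.
symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]

time_samples = dft(symbols, inverse=True)  # transmitter: frequency -> time
recovered = dft(time_samples)              # receiver: time -> frequency
```

Up to floating point rounding, `recovered` equals `symbols`, which is why the accuracy of the additions inside the transform matters for the system as a whole.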
A high-resolution, high-speed, low-latency FFT cannot be achieved without efficient building blocks, in particular the butterfly component. The performance of the butterfly depends greatly on its adders and subtractors. Hence this research, with an emphasis on simple structures, focuses on the design and implementation of an efficient, high-performance floating point adder/subtractor for the FFT algorithm.
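To make the dependence on adders and subtractors concrete, the following is a minimal sketch of a radix-2 decimation-in-time butterfly. The radix and the function name `butterfly` are assumptions made for illustration; the butterfly variant used in the target FFT is not specified in this paper.

```python
def butterfly(a, b, w):
    """Radix-2 butterfly: one complex multiply, one complex add, one complex subtract."""
    t = w * b            # twiddle-factor multiplication
    return a + t, a - t  # the add and the subtract share the same two operands

# A 2-point transform is a single butterfly with twiddle factor 1.
upper, lower = butterfly(3 + 0j, 1 + 0j, 1 + 0j)
```

Each complex addition or subtraction expands into two real floating point additions or subtractions, so a combined adder/subtractor is used many times in every FFT stage.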
In 1993, Narasimhan [3] proposed an 8-stage floating point adder for FPGA. The specification for the adder stated that it should work at 62.5 MHz.
In 2004, Thompson [4] presented a 5-stage decimal floating-point adder that complied with the then-current draft revision of the IEEE-754 Standard. The adder supported operations on 64-bit (16-digit) decimal floating-point operands. Initial synthesis, testing and evaluation were performed using Synopsys Design Compiler and LSI Logic's Gflxp 0.11 micron CMOS standard cell library.
In 2005, Huang et al. [5] presented a high-speed double precision floating point adder using custom-designed macro modules and other advanced optimization techniques. Based on the SMIC six-layer-metal CMOS process, they achieved a 4-stage pipelined double precision floating point adder that completes a floating point addition in 7.72 ns. The total gate count of this system is 37977 gates.
All these results show that effort has been made to achieve low latency and area together with high resolution and speed by introducing new algorithms. In this study, an advanced floating point adder with a pipelined structure is presented.
Similar work was also reported between 2005 and 2009 [9][10][11], showing engineers' efforts to increase the capability of floating point calculation. However, there are still not enough resources available on floating point arithmetic.
Design and implementation: Based on the IEEE-754 standard for floating point arithmetic [6], a 32-bit data register is used to hold the mantissa, exponent and sign in fields of 23, 8 and 1 bits respectively. One advantage of this adder is that it can easily be switched to arithmetic with a 12-bit mantissa, an 8-bit exponent and a 1-bit sign. Additionally, the floating-point adder unit performs addition and subtraction using substantially the same hardware as is used for floating-point operations. This saves core area by minimizing the number of elements. Fig. 1 shows the new structure of the floating point adder, divided into four separate blocks. The purpose is to split the total critical path delay into three equal blocks. Each of these blocks computes its arithmetic function within one clock cycle. However, propagation delay can be associated with continuous assignments [7], which increases the overall critical path delay and slows down the throughput. Hence, in this study, an effort is made to reduce the worst-case effect of delay on the arithmetic calculations.
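The 1/8/23-bit field layout can be illustrated with a short sketch that splits a 32-bit word into its three fields. The helper name `unpack_fields` is an assumption for illustration; only the field widths come from the text.

```python
import struct

def unpack_fields(x):
    """Split an IEEE-754 single-precision value into (sign, exponent, mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF       # 23 bits, hidden leading 1 not stored
    return sign, exponent, mantissa

# -6.5 = -1.625 * 2**2, so the biased exponent is 127 + 2 = 129
# and the stored fraction is 0.625 * 2**23 = 0x500000.
fields = unpack_fields(-6.5)
```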
Following combinational circuit design, the output of each stage depends only on its input values at that time. As shown in Fig. 1, each block produces its output within one clock cycle. The structure of this adder makes it possible to feed each block's output into a pipeline register after every clock cycle. Hence, a sequential structure is applied to the overall pipelined adder/subtractor to combine the stages.
The pipelined floating point adder structure consists of a compare stage, an aligned mantissa stage, an add/subtract stage and finally a normalized stage.
Once the effective operation is determined, the compare stage and aligned mantissa blocks align the mantissas and adjust the exponents so that the two operands have the same exponent.
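The interaction between combinational stages and pipeline registers can be sketched with a small cycle-by-cycle software model. This is an illustrative assumption, not the Verilog implementation: `run_pipeline` is a hypothetical helper, each stage is modelled as a plain function, and `None` marks an empty register.

```python
def run_pipeline(stages, inputs):
    """Clock a list of combinational stages separated by pipeline registers.

    Returns (cycle, value) pairs, where cycle counts completed clock cycles.
    """
    regs = [None] * len(stages)                   # one register after each stage
    stream = list(inputs) + [None] * len(stages)  # extra cycles to drain the pipe
    outputs = []
    for cycle, item in enumerate(stream):
        # Update from the last register backwards so that every stage
        # sees the value its predecessor held during the previous cycle.
        for i in range(len(stages) - 1, 0, -1):
            regs[i] = stages[i](regs[i - 1]) if regs[i - 1] is not None else None
        regs[0] = stages[0](item) if item is not None else None
        if regs[-1] is not None:
            outputs.append((cycle + 1, regs[-1]))
    return outputs

# Four dummy stages that each add 1, standing in for the four blocks.
results = run_pipeline([lambda v: v + 1] * 4, [10, 20, 30])
```

In this model the first result appears after four clock cycles and a new result follows on every subsequent cycle, which is the behaviour the pipeline registers are meant to provide.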
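The four stages can be sketched as a behavioural model. This is a simplified illustration under stated assumptions, not the implemented Verilog design: inputs are normalized numbers or zero; infinities, NaNs, subnormals, overflow and underflow are ignored; and results are truncated rather than rounded. All function names are hypothetical.

```python
import struct

def f2b(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def b2f(b):
    return struct.unpack(">f", struct.pack(">I", b))[0]

def split(w):
    """Unpack a 32-bit word and restore the hidden leading 1 for non-zero values."""
    e, m = (w >> 23) & 0xFF, w & 0x7FFFFF
    return w >> 31, e, (m | 0x800000) if e else m

def compare_stage(a, b, subtract):
    """Stage 1: fold subtraction into b's sign and put the larger magnitude first."""
    if subtract:
        b ^= 0x80000000
    if (a & 0x7FFFFFFF) < (b & 0x7FFFFFFF):
        a, b = b, a
    return split(a) + split(b)

def align_stage(sa, ea, ma, sb, eb, mb):
    """Stage 2: shift the smaller significand right by the exponent difference."""
    return sa, ea, ma, sb, mb >> (ea - eb)

def addsub_stage(sa, ea, ma, sb, mb):
    """Stage 3: one shared adder; magnitudes are subtracted when the signs differ."""
    return sa, ea, (ma + mb if sa == sb else ma - mb)

def normalize_stage(s, e, m):
    """Stage 4: bring the leading 1 back to bit 23 and repack the word."""
    if m == 0:
        return 0
    if m >> 24:              # carry out of the addition
        m, e = m >> 1, e + 1
    while not (m >> 23):     # cancellation; hardware would use a leading-zero count
        m, e = m << 1, e - 1
    return (s << 31) | (e << 23) | (m & 0x7FFFFF)

def fp_addsub(x, y, subtract=False):
    stage1 = compare_stage(f2b(x), f2b(y), subtract)
    stage2 = align_stage(*stage1)
    stage3 = addsub_stage(*stage2)
    return b2f(normalize_stage(*stage3))
```

For example, `fp_addsub(1.5, 2.25)` aligns 1.5 to the exponent of 2.25 before adding, and `fp_addsub(5.0, 4.75, subtract=True)` needs a four-position left shift in the last stage to renormalize the result.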

Fig. 1: Block diagram of the proposed adder
The key point is to design each function so that all of its calculation is performed within one clock cycle. The basic operation of the aligned mantissa and normalized blocks is shifting. Each shift needs one clock cycle, which causes a huge delay when aligning a 32-bit operand. Hence, an advanced algorithm is applied to avoid many delays in the aligning stage. This is the third advantage of the adder. The proposed architecture also offers power savings due to the simplification of the data paths. Eliminating the shifting blocks and combining the adder and subtractor play a significant role in reducing power consumption in the overall system.
The total gate count of the system is 6691, as given by the synthesis report of the Xilinx ISE synthesis software. This gate count indicates the low area and low power consumption of the proposed floating point adder/subtractor, whereas a broadly similar project implements its adder with an equivalent gate count of 37977 [5].
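The algorithm used to avoid clocked shifting is not detailed here. One common way to shift by an arbitrary amount in a single combinational pass is a logarithmic barrel shifter built from multiplexer stages, sketched below as an assumption about how such a block might look. The names `mux` and `barrel_shift_right` and the 24-bit width are illustrative.

```python
def mux(sel, when0, when1):
    """Two-input multiplexer."""
    return when1 if sel else when0

def barrel_shift_right(value, amount, width=24):
    """Right shift using five multiplexer stages (shifts of 1, 2, 4, 8 and 16)."""
    value &= (1 << width) - 1
    if amount >= width:
        return 0  # every bit is shifted out; hardware would select zero here
    for stage in range(5):
        # Stage k either passes the value through or shifts it by 2**k,
        # selected by bit k of the shift amount.
        value = mux((amount >> stage) & 1, value, value >> (1 << stage))
    return value

shifted = barrel_shift_right(0xC00000, 5)
```

The number of multiplexer stages grows with the logarithm of the maximum shift distance, so the whole alignment fits in one pass instead of one clock cycle per bit position.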

MATERIALS AND METHODS
The floating point adder was designed according to the IEEE 754 standard. The design process was to create a software model of an efficient pipelined floating point adder algorithm using the Verilog HDL. The system was simulated with Modelsim and its functionality was fully covered.
The design emulated the following characteristics: floating point arithmetic computation, pipelined addition and subtraction, and data and factor precision. The proposed adder was implemented on a Virtex-II FPGA chip and the core layout was submitted.

RESULTS AND DISCUSSION
Implementation result: The implementation was modeled in Verilog HDL and simulated with Modelsim. The design was synthesized with Xilinx ISE and downloaded to a Virtex-II FPGA.
Furthermore, the minimum clock period increased sharply once the add/sub stage was applied. This can be explained by the use of the FPGA adder to compute the fixed point arithmetic. However, the resulting speed is high enough for the purpose of this work; in future work, the maximum frequency can be raised by using a high-speed prefix adder [8] for the fixed point arithmetic. The chip layout on the Virtex-II FPGA board is shown in Fig. 7.

CONCLUSION
An efficient algorithm for a high-resolution, high-speed, low-area floating point adder/subtractor with reduced mean latency for OFDM applications was designed and investigated. The adder consists of four parts: the compare stage, aligned stage, add/sub stage and normalized stage. Each block performs its arithmetic operation within one clock cycle. Each result is fed into a pipeline register, which is the reason for the reduced latency. The structure of this adder eliminates all shifter cells and replaces them with multiplexers to reduce the propagation delay through each cell. The evaluation indicates that the proposed pipelined adder is attractive due to its high resolution (32-bit floating point), low area (6691 gate count) and low latency of 4 clock cycles. The maximum frequency of this adder is 278.428 MHz. To increase the speed, the FPGA structural adder can be replaced with a prefix adder for the fixed point calculation.
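The reported figures can be converted into time units with a short calculation. It assumes that the pipeline is clocked at the reported maximum frequency and that every stage takes exactly one cycle; the constant and function names are illustrative.

```python
F_MAX_HZ = 278.428e6   # reported maximum frequency
PIPELINE_STAGES = 4    # reported latency in clock cycles

def clock_period_ns(f_hz):
    """Clock period in nanoseconds for a given frequency in hertz."""
    return 1e9 / f_hz

def latency_ns(f_hz, stages):
    """Time from operand input to result output for one operation."""
    return stages * clock_period_ns(f_hz)

period = clock_period_ns(F_MAX_HZ)               # about 3.59 ns
latency = latency_ns(F_MAX_HZ, PIPELINE_STAGES)  # about 14.37 ns
```

Under these assumptions a single operation takes about 14.37 ns from input to output, while a full pipeline delivers one result per clock period of about 3.59 ns.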

Table 1: Compare stage specification (HDL synthesis report)

Table 2: Align stage specification (HDL synthesis report)

Table 3: Add/sub stage specification (HDL synthesis report)

Table 4: Normalized stage specification (HDL synthesis report)