Field Programmable Gate Arrays based Design, Implementation and Delay Study of Braun’s Multipliers

Problem statement: Parallel array multipliers are required to achieve high execution speed in Digital Signal Processing (DSP) applications. Approach: This article investigates Field Programmable Gate Array (FPGA) implementations of standard Braun’s multipliers on Spartan-3AN, Virtex-2, Virtex-4 and Virtex-5 FPGAs using the Very high speed integrated circuit Hardware Description Language (VHDL). The delays were analyzed with the Analysis Of Variance (ANOVA) method in the Statistical Package for Social Science (SPSS) software at the 0.05 significance level to compare the FPGA devices. Results: The FPGA resource utilization of Virtex-5 is the lowest for 4×4, 6×6, 8×8 and 12×12-bit Braun’s multipliers compared with the Spartan-3AN, Virtex-2 and Virtex-4 FPGAs. The average connection delays in Virtex-2 are consistent and increase gradually as the size of the multiplier increases. The Virtex-2 FPGA demonstrates lower average connection delays than the Spartan-3AN, Virtex-4 and Virtex-5 FPGAs, and the same observations hold for its maximum pin delay. Anomalies in maximum pin delay and average connection delay are observed in the Virtex-5, Virtex-4 and Spartan-3AN FPGAs. The FPGA devices also demonstrate that mean latency increases as the size of the multiplier increases. Conclusion: The FPGA resource utilization of Virtex-5 is the lowest for 4×4, 6×6, 8×8 and 12×12-bit Braun’s multipliers compared with the Spartan-3AN, Virtex-2 and Virtex-4 FPGAs. Even the numbers of occupied slices and Look-Up Tables (LUTs) obtained on the Virtex-5 FPGA for the 4×4-bit standard Braun’s multiplier are lower than those reported in the literature.


INTRODUCTION
Computational complexities of algorithms require fast and efficient parallel multipliers for Digital Signal Processing (DSP) applications. The Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT) and Inverse Discrete Cosine Transform (IDCT) are multiplication-intensive algorithms commonly implemented in DSP hardware. Multiplication has always been a hardware-, time- and power-consuming arithmetic operation. In many cases, implementing a DSP algorithm demands Application Specific Integrated Circuits (ASICs), especially in image processing applications. Since development costs for ASICs are high, algorithms should be verified and optimized before implementation. Field Programmable Gate Arrays (FPGAs) can provide such verification and optimization and implement the algorithms in real time.
The objective of this study is to present a study of standard Braun's multipliers (Yeo and Roy, 2005) implemented on Spartan-3AN, Virtex-2, Virtex-4 and Virtex-5 FPGA devices, and to evaluate statistically the delays of the Braun's multipliers on these devices using Analysis Of Variance (ANOVA) and the post hoc Tukey's test in the Statistical Package for Social Science (SPSS).

MATERIALS AND METHODS
Architecture platform: FPGAs find applications especially in algorithms that can exploit the massive parallelism offered by their architecture. Significant speedup in computation time can be achieved by assigning computation-intensive tasks to hardware and by exploiting the parallelism in algorithms. FPGAs enable a high degree of parallelism and can achieve speedups of orders of magnitude over General Purpose Processors (GPPs), as a result of the increasing embedded resources available on FPGAs. The inherent parallelism of the logic resources on an FPGA allows considerable computational throughput even at low MHz clock rates. The flexibility of the FPGA allows even higher performance by trading off precision and range in the number format for an increased number of parallel arithmetic units. This has driven a new type of processing called reconfigurable computing, in which time-intensive tasks are offloaded from software to FPGAs.

Spartan-3 FPGAs:
The Spartan-3 FPGA (Xilinx, 2009) is specifically designed to meet the needs of high-volume, low-unit-cost electronic systems. The family consists of eight members offering densities ranging from 50,000 to five million system gates. The Spartan-3 FPGA consists of five fundamental programmable functional elements: Configurable Logic Blocks (CLBs), Input/Output Blocks (IOBs), Block RAMs, dedicated 18×18 multipliers and Digital Clock Managers (DCMs). The Spartan-3 generation also includes the Spartan-3L, Spartan-3E, Spartan-3A, Spartan-3A DSP, Spartan-3AN and extended Spartan-3A FPGAs. In particular, the Spartan-3AN is used as the target technology in this study.

Virtex-2 FPGAs:
The Virtex-2 FPGA family (Xilinx, 2007) is a platform developed for high-performance designs, from low to high density, based on IP cores and customized modules. The family delivers complete solutions for telecommunication, wireless, networking, video and DSP applications, including PCI, LVDS and Double-Data-Rate (DDR) interfaces. The leading-edge 0.15/0.12 µm CMOS 8-layer metal process and the Virtex-2 architecture are optimized for high speed with low power consumption. Combining a wide variety of flexible features and a large range of densities up to 10 million system gates, the Virtex-2 family enhances programmable logic design capabilities and is a powerful alternative to mask-programmed gate arrays. The Virtex-2 family comprises 11 members, ranging from 40K to 8M system gates. The Virtex-2 architecture is optimized for high-density and high-performance logic designs.
The Virtex-2 FPGA family consists of four major elements: Configurable Logic Blocks (CLBs), Block SelectRAM, 18×18-bit dedicated multipliers and Digital Clock Managers (DCMs).

Virtex-4 FPGAs:
The Virtex-4 FPGA family (Xilinx, 2007) consists of three platform families: LX, SX and FX. Virtex-4 devices consume approximately 50% of the power of the respective Virtex-2 Pro devices, owing to static power reduction enabled by triple-oxide technology and dynamic power reduction enabled by reduced core voltage and capacitance. The Virtex-4 FPGA family comprises CLBs, Block RAMs, XtremeDSP slices and DCMs.

Virtex-5 FPGAs:
The Virtex-5 FPGA devices (Xilinx, 2007) are a programmable alternative to custom ASIC technology. The Virtex-5 family provides power-optimized high-speed serial transceiver blocks for enhanced serial connectivity, tri-mode Ethernet MACs and high-performance PowerPC 440 microprocessor embedded blocks. Virtex-5 devices also use triple-oxide technology to reduce static power consumption. Their 1.0 V core voltage and 65 nm process also reduce dynamic power consumption compared with Virtex-4 devices. The advanced DSP48E slices available in Virtex-5 FPGAs help accelerate computation-intensive DSP and image processing algorithms.
Braun's multipliers: Braun's multiplier is an m×n-bit parallel multiplier, generally known as a carry-save multiplier, and is constructed with m×(n-1) adders and m×n AND gates. Braun's multiplier has a glitching problem, which is due to the ripple-carry adder in the last stage of the multiplier.
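The carry-save structure described above can be illustrated with a small bit-level behavioral model. The following Python sketch is written for illustration only and is not the VHDL design used in this study; the names `full_adder` and `braun_multiply` are hypothetical. It assumes unsigned operands, forms the m×n partial-product bits with AND operations, passes them through m-1 rows of n-1 carry-save adders, and merges the remaining sum and carry vectors with an (n-1)-stage ripple-carry adder, so it uses m×n AND gates and m×(n-1) adders. Adders in the first carry-save row receive a zero carry input and would be half adders in hardware; the model still counts them as adders.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def braun_multiply(y: int, x: int, m: int, n: int) -> tuple[int, int, int]:
    """Bit-level model of an unsigned m x n Braun (carry-save array) multiplier.

    y is the m-bit multiplicand, x the n-bit multiplier.
    Returns (product, number_of_and_gates, number_of_adders).
    """
    ybits = [(y >> i) & 1 for i in range(m)]
    xbits = [(x >> j) & 1 for j in range(n)]
    and_gates = 0
    adders = 0
    p = [0] * (m + n)  # product bits P_0 .. P_{m+n-1}

    def pp_row(i: int) -> list[int]:
        """Partial-product row Y_i AND X_j for j = 0..n-1 (n AND gates)."""
        nonlocal and_gates
        and_gates += n
        return [ybits[i] & xbits[j] for j in range(n)]

    # Row 0: partial products only, no adders.
    s = pp_row(0)        # after row i, s[j] has weight 2^(i+j)
    c = [0] * (n - 1)    # after row i, c[j] has weight 2^(i+j+1)
    p[0] = s[0]

    # Rows 1..m-1: one carry-save row of n-1 adders per multiplicand bit.
    for i in range(1, m):
        row = pp_row(i)
        new_s = [0] * n
        new_c = [0] * (n - 1)
        for j in range(n - 1):
            new_s[j], new_c[j] = full_adder(row[j], s[j + 1], c[j])
            adders += 1
        new_s[n - 1] = row[n - 1]  # top partial-product bit passes straight down
        s, c = new_s, new_c
        p[i] = s[0]                # least significant sum bit is a finished product bit

    # Last stage: (n-1)-bit ripple-carry adder merges the sum and carry vectors.
    carry = 0
    for j in range(n - 1):
        p[m + j], carry = full_adder(s[j + 1], c[j], carry)
        adders += 1
    p[m + n - 1] = carry

    product = sum(bit << k for k, bit in enumerate(p))
    return product, and_gates, adders


print(braun_multiply(13, 11, 4, 4))  # (143, 16, 12)
```

Exhaustively comparing `braun_multiply(y, x, m, n)` with `y * x` over all operand pairs is a cheap way to check the array wiring before writing the equivalent HDL.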
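Mathematical basis of Braun's multiplier: Consider a generic m×n multiplication of an unsigned m-bit number $Y = Y_{m-1} \dots Y_1 Y_0$ by an unsigned n-bit number $X = X_{n-1} \dots X_1 X_0$. The product $P = P_{m+n-1} \dots P_1 P_0$, which results from multiplying the multiplicand Y by the multiplier X, can be written as follows:

$$P = \left(\sum_{i=0}^{m-1} Y_i\, 2^{i}\right)\left(\sum_{j=0}^{n-1} X_j\, 2^{j}\right) = \sum_{i=0}^{m-1}\sum_{j=0}^{n-1} \left(Y_i X_j\right) 2^{i+j}$$

Each term $Y_i X_j$ is one partial-product bit, produced by an AND gate, with weight $2^{i+j}$.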
The Spartan-3AN, Virtex-2, Virtex-4 and Virtex-5 FPGA devices demonstrate that mean latency increases as the size of the multiplier increases. This agrees with Al Mijalli (2011a and 2011b) and Rais and Al Mijalli (2011c), who reported that the mean delay time increases as the size of the multiplier increases.
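To illustrate why latency grows with operand width, the following Python sketch estimates the critical path of an unsigned m×n carry-save array with a final ripple-carry adder under an idealized unit-delay assumption: every adder costs one delay unit, while AND gates and interconnect cost nothing. The function name `braun_critical_path` is hypothetical. Because the model ignores routing, which accounts for a large share of delay on an FPGA, it shows only the trend and cannot predict the measured latencies.

```python
def braun_critical_path(m: int, n: int) -> int:
    """Longest input-to-output path, in adder delays, of an unsigned m x n Braun array.

    Unit-delay assumption: each adder adds one delay to both its sum and carry
    outputs; AND gates and interconnect are treated as zero delay.
    """
    if m < 2 or n < 2:
        raise ValueError("this sketch assumes m >= 2 and n >= 2")
    # Arrival time of every signal; partial-product bits from AND gates arrive at 0.
    s = [0] * n            # sum vector leaving the current carry-save row
    c = [0] * (n - 1)      # carry vector leaving the current carry-save row
    worst = 0
    for _ in range(1, m):                  # m - 1 carry-save rows
        new_s = [0] * n                    # new_s[n - 1] is a raw partial product (time 0)
        new_c = [0] * (n - 1)
        for j in range(n - 1):
            t = max(s[j + 1], c[j]) + 1    # the partial-product input arrives at 0
            new_s[j] = new_c[j] = t
        s, c = new_s, new_c
        worst = max(worst, s[0])           # s[0] leaves the array as a product bit
    carry = 0
    for j in range(n - 1):                 # final ripple-carry adder
        carry = max(s[j + 1], c[j], carry) + 1
        worst = max(worst, carry)          # sum and carry share the same arrival time
    return worst


for k in (4, 6, 8, 12):
    print(f"{k}x{k}: {braun_critical_path(k, k)} adder delays")
```

Under this model the longest path is m + n - 2 adder delays (m - 1 carry-save rows followed by an (n - 1)-stage ripple-carry chain), that is 6, 10, 14 and 22 adder delays for the 4×4, 6×6, 8×8 and 12×12 multipliers studied here.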
The results of the one-way ANOVA on the Spartan-3AN, Virtex-2, Virtex-4 and Virtex-5 FPGAs are shown in Table 6. There is a statistically significant difference at the 0.05 level in delay time among the four devices [F(3, 76) = 19.546, p < 0.001], as determined by ANOVA and post-hoc Tukey HSD multiple comparison tests at the 0.05 significance level. The test indicates that the mean delay time for Virtex-5 (Mean = 9.03, Standard Deviation = 2.36) is significantly different from those of the other three devices: Spartan-3AN (Mean = 15.65, Standard Deviation = 3.26), Virtex-2 (Mean = 11.77, Standard Deviation = 3.26) and Virtex-4 (Mean = 11.92, Standard Deviation = 1.84). However, there is no statistically significant difference in mean delay time between Virtex-2 and Virtex-4.
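As a consistency check on the reported statistics, the one-way ANOVA F statistic and the Tukey HSD comparisons can be approximately recomputed from the summary values above. The following Python sketch is not the SPSS procedure used in this study. It assumes 20 observations per device (implied by 3 and 76 degrees of freedom with four groups) and an approximate tabulated studentized range critical value q ≈ 3.72 for k = 4 groups and 76 error degrees of freedom; the names `one_way_anova` and `tukey_hsd` are hypothetical.

```python
import math
from itertools import combinations

# Mean and standard deviation of delay time per device, as reported in the text.
GROUPS = {
    "Spartan-3AN": (15.65, 3.26),
    "Virtex-2": (11.77, 3.26),
    "Virtex-4": (11.92, 1.84),
    "Virtex-5": (9.03, 2.36),
}
N_PER_GROUP = 20  # assumption: 80 observations split evenly, implied by df = (3, 76)
Q_CRIT = 3.72     # assumption: approximate tabulated q(0.05; k = 4, df = 76)


def one_way_anova(groups, n):
    """Return (F, df_between, df_within, ms_within) from group means/SDs with equal n."""
    k = len(groups)
    means = [mean for mean, _ in groups.values()]
    grand_mean = sum(means) / k  # equal group sizes, so the unweighted mean is exact
    ss_between = n * sum((mean - grand_mean) ** 2 for mean in means)
    ss_within = (n - 1) * sum(sd ** 2 for _, sd in groups.values())
    df_between, df_within = k - 1, k * (n - 1)
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def tukey_hsd(groups, n, ms_within, q_crit):
    """Return (HSD threshold, {pair: True if the mean difference exceeds HSD})."""
    hsd = q_crit * math.sqrt(ms_within / n)
    results = {}
    for a, b in combinations(groups, 2):
        diff = abs(groups[a][0] - groups[b][0])
        results[(a, b)] = diff > hsd
    return hsd, results


f_stat, df1, df2, msw = one_way_anova(GROUPS, N_PER_GROUP)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
hsd, pairs = tukey_hsd(GROUPS, N_PER_GROUP, msw, Q_CRIT)
print(f"HSD = {hsd:.2f}")
for (a, b), significant in pairs.items():
    print(f"{a} vs {b}: {'significant' if significant else 'not significant'}")
```

With these inputs the sketch gives F(3, 76) ≈ 19.57, close to the reported 19.546 (the small gap comes from rounding of the published means and standard deviations), and an HSD threshold of about 2.29 that separates every pair of devices except Virtex-2 and Virtex-4, in line with the Tukey result above.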
Average connection delays are much lower in the Virtex-2 FPGA than in the Spartan-3AN, Virtex-4 and Virtex-5 FPGAs, and the Virtex-2 average connection delay increases consistently and gradually. The same observation holds for the maximum pin delay of the Virtex-2 FPGA, whose values increase gradually as the size of the multiplier increases. Anomalies in maximum pin delay and average connection delay are observed in the Virtex-5, Virtex-4 and Spartan-3AN FPGAs. Figure 1 shows the average value of the mean delay time for the Spartan-3AN, Virtex-2, Virtex-4 and Virtex-5 FPGA devices for 4×4, 6×6, 8×8 and 12×12 Braun's multipliers; it clearly indicates that the average mean delay time for Virtex-5 is much lower than for the Spartan-3AN, Virtex-4 and Virtex-2 FPGAs.

CONCLUSION
Standard 4×4, 6×6, 8×8 and 12×12-bit Braun's multipliers were implemented on the Spartan-3AN XC3S700AN, Virtex-2 XC2V40, Virtex-4 XC4VLX40 and Virtex-5 XC5VLX50 FPGA devices using the ISE 9.2i design tool. The FPGA resource utilization of Virtex-5 is the lowest for 4×4, 6×6, 8×8 and 12×12-bit Braun's multipliers compared with the Spartan-3AN, Virtex-2 and Virtex-4 FPGAs. Even the numbers of occupied slices and Look-Up Tables (LUTs) obtained on the Virtex-5 FPGA for the 4×4-bit standard Braun's multiplier are lower than those reported in the literature. Average connection delays are much lower in the Virtex-2 FPGA than in the Spartan-3AN, Virtex-4 and Virtex-5 FPGAs, and the Virtex-2 average connection delay increases consistently and gradually. The same observation holds for the maximum pin delay of the Virtex-2 FPGA, whose values increase gradually as the size of the multiplier increases. Anomalies in maximum pin delay and average connection delay are observed in the Virtex-5, Virtex-4 and Spartan-3AN FPGAs.
There is a statistically significant difference at the 0.05 level in delay time among the four devices, as determined by ANOVA and post-hoc Tukey HSD multiple comparison tests at the 0.05 significance level. The test indicates that the mean delay times for Virtex-5 and Spartan-3AN are significantly different from those of the other two devices, Virtex-2 and Virtex-4. However, there is no statistically significant difference in mean delay time between Virtex-2 and Virtex-4.
The Spartan-3AN, Virtex-2, Virtex-4 and Virtex-5 FPGA devices also demonstrate that mean latency increases as the size of the multiplier increases.