Field Programmable Gate Arrays Based Realization of Truncated Multipliers

Problem statement: Application Specific Integrated Circuits (ASICs) are costly and cannot be reconfigured. This is a drawback in image processing applications such as MPEG video compression of CT scan frames, where real-time constraints must be met and the algorithms must be verified and optimized before implementation. Approach: Standard and truncated multipliers were described in VHDL and implemented on Field Programmable Gate Arrays (FPGAs), which allow a design to be both reconfigured and implemented in hardware on the same platform. Results: The truncated multipliers implemented on the Spartan-3AN FPGA showed lower average connection and maximum pin delays than those implemented on the Virtex and Virtex-E FPGA devices. Conclusion: Truncated multipliers can be used in medical imaging technology such as CT scanning.


INTRODUCTION
For large operands, multiplication has always been a hardware-, time- and power-consuming arithmetic operation. This is more pronounced in digital signal processing (DSP) applications, which involve a large number of multiplications. In DSP, the computational complexity of algorithms has increased to such an extent that they require fast and efficient parallel multipliers (Agostini et al., 2007; Gierenz et al., 2010; Kong et al., 2008; Zemva and Verderber, 2007; Rais, 2009a; 2009b; Rais, 2010; Rais et al., 2010).
Realizing DSP algorithms in hardware requires that they be verified and optimized before implementation. For this purpose, Field Programmable Gate Arrays (FPGAs) have emerged as a platform of choice for efficient hardware implementation and an attractive alternative to Application Specific Integrated Circuits (ASICs) (Rais, 2009a; 2009b; Rais, 2010; Rais et al., 2010).
Truncated multipliers do not form all of the least-significant columns of the partial-product matrix. The delay, area and power consumption of the arithmetic unit are significantly reduced as more columns are eliminated. The basic idea of this technique is to discard some of the less significant partial products. In place of the removed partial products, a compensation circuit is introduced that partially compensates for the dropped terms, thereby reducing the approximation error.
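The truncation idea can be modeled bit by bit in software. The Python sketch below is a behavioral model under stated assumptions, not the VHDL design described in this paper: it assumes unsigned n-bit operands, drops the k least-significant partial-product columns and uses a constant correction (one common compensation scheme, assumed here) whose value approximates the expected sum of the dropped bits when the operand bits are uniformly random. The names `column_height`, `correction_constant` and `truncated_product` are hypothetical.

```python
def column_height(n: int, c: int) -> int:
    """Number of partial-product bits a_i & b_j with i + j == c for n-bit operands."""
    if c < 0 or c > 2 * n - 2:
        return 0
    return min(c, 2 * n - 2 - c) + 1


def correction_constant(n: int, k: int) -> int:
    """Hypothetical constant compensating for the dropped columns 0..k-1.

    Assumes uniformly random operand bits, so each dropped partial-product bit
    is 1 with probability 1/4. The expected dropped value is rounded (half up)
    to the nearest multiple of 2**k so the constant only touches kept columns.
    """
    expected_x4 = sum(column_height(n, c) << c for c in range(k))  # 4 * E[dropped]
    step = 1 << k
    return ((expected_x4 + 2 * step) // (4 * step)) * step


def truncated_product(a: int, b: int, n: int, k: int, correct: bool = True) -> int:
    """Approximate a * b by forming only the partial-product columns c >= k."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= k:  # columns below k are never generated
                total += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    if correct:
        total += correction_constant(n, k)
    return total


if __name__ == "__main__":
    n, k = 8, 4
    a, b = 200, 173
    print("exact    :", a * b)                                         # 34600
    print("truncated:", truncated_product(a, b, n, k, correct=False))  # 34592
    print("corrected:", truncated_product(a, b, n, k))                 # 34608
```

For n = 8 and k = 4, only 10 of the 64 partial-product bits (those in columns 0 to 3) are omitted, and the constant is the expected dropped value of 12.25 rounded to the nearest multiple of 16, i.e. 16.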
Several research efforts aimed at a hardware-efficient realization of truncated multipliers have been presented in the literature (Rais, 2009a; 2009b; Rais, 2010).

MATERIALS AND METHODS
Architecture platform: FPGAs are well suited to implementing computationally intensive, massively parallel architectures because they are inherently parallel and can operate at high clock frequencies. Brief introductions to the Spartan-3, Virtex and Virtex-E FPGAs from Xilinx are presented below.

Spartan-3 FPGAs:
The Spartan-3 FPGA belongs to the fifth generation of the Xilinx family and is designed specifically to meet the needs of high-volume, low-unit-cost electronic systems. The family includes eight members offering densities ranging from 50,000 to five million system gates (Xilinx, 2009). The Spartan-3 FPGA consists of five fundamental programmable functional elements: configurable logic blocks (CLBs), input/output blocks (IOBs), block RAMs, dedicated 18 × 18 multipliers and Digital Clock Managers (DCMs).

Virtex FPGAs:
Virtex devices feature a flexible, regular architecture that comprises an array of CLBs surrounded by programmable IOBs, all interconnected by a rich hierarchy of fast, versatile routing resources.
The Virtex family comprises nine members offering densities ranging from 57,906 to 1,124,022 system gates (Xilinx, 2001). The abundance of routing resources allows the Virtex family to accommodate even the largest and most complex designs.
Virtex FPGAs are SRAM-based and are customized by loading configuration data into internal memory cells. In some modes, such as master serial mode, the FPGA reads its own configuration data from an external PROM. Virtex devices provide better performance than previous FPGA generations; designs can achieve synchronous system clock rates of up to 200 MHz, including I/O.

Virtex-E FPGAs:
The Virtex-E FPGA family delivers high-performance, high-capacity programmable logic solutions. It offers up to 43,200 logic cells in devices up to 30% faster than the Virtex family, and its eleven members offer densities ranging from 71,693 to 4,074,387 system gates (Xilinx, 2002).
Virtex-E devices have up to 640 Kb of faster (250 MHz) block SelectRAM, but the individual RAMs have the same size and structure as in the Virtex family. They also have eight DLLs instead of the four in Virtex devices, and each DLL is slightly improved, with easier clock mirroring and 4× frequency multiplication. Virtex-E devices are built on an aggressive 6-layer-metal 0.18 µm CMOS process.
Virtex-E devices feature a flexible, regular architecture that comprises an array of CLBs surrounded by programmable IOBs, all interconnected by a rich hierarchy of fast, versatile routing resources. Virtex-E FPGAs are SRAM-based and are customized by loading configuration data into internal memory cells. Designs can achieve synchronous system clock rates up to 240 MHz including I/O or 622 Mb/s using Source Synchronous data transmission architectures.

FPGA design and implementation:
The standard and truncated multipliers were designed in VHDL and implemented on Xilinx Spartan-3AN XC3S700AN (package: fgg484, speed grade: -5), Virtex XCV50 (package: fg256, speed grade: -6) and Virtex-E XCV50E (package: fg256, speed grade: -8) FPGAs using the Xilinx ISE 9.2i design tool (Xilinx, 2007). Figures 1 and 2 compare the average connection delay and the maximum pin delay of the three FPGA devices. The reductions in pin delay and in the number of occupied slices obtained with the truncated multiplier indicate that it is a feasible solution for medical image processing applications such as CT scanning, where most of the redundant information can be removed. Tables 1-3 summarize the FPGA device resource utilization for the standard and truncated multipliers. Table 4 presents the percentage change from the standard to the truncated multipliers; the change in occupied slices ranges from 145 to 170% across the Spartan-3AN, Virtex and Virtex-E FPGA devices.
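The hardware savings of truncation come at the cost of approximation error, which grows as more columns are eliminated. Assuming unsigned n-bit operands and that the k least-significant partial-product columns are eliminated (k is a free parameter here), this error can be characterized exhaustively in software before a design is committed to VHDL. The Python sketch below is a hypothetical reference model, not the authors' code: `error_stats` reports the minimum, maximum and mean of the exact product minus the approximate product, optionally after adding a compensation constant. Because every operand bit is 1 in exactly half of all n-bit values, the uncorrected mean error equals the expected value of the dropped bits.

```python
from __future__ import annotations


def truncated_product(a: int, b: int, n: int, k: int) -> int:
    """Row-wise model of a truncated multiplier without compensation.

    Row i is b shifted left by i; the bits of that row that would fall into
    columns 0..k-1 are cleared, so only partial products with i + j >= k remain.
    """
    total = 0
    for i in range(n):
        if (a >> i) & 1:
            drop = max(0, k - i)  # low bits of b that would land below column k
            total += ((b >> drop) << drop) << i
    return total


def error_stats(n: int, k: int, correction: int = 0) -> tuple[int, int, float]:
    """Exhaustive (min, max, mean) of exact - (truncated + correction) over all n-bit pairs."""
    errors = [
        a * b - (truncated_product(a, b, n, k) + correction)
        for a in range(1 << n)
        for b in range(1 << n)
    ]
    return min(errors), max(errors), sum(errors) / len(errors)


if __name__ == "__main__":
    n = 8
    for k in (2, 4, 6):
        lo, hi, mean = error_stats(n, k)
        step = 1 << k
        # Hypothetical compensation: the uncorrected mean error rounded to a multiple of 2**k.
        c = round(mean / step) * step
        print(f"k={k}: uncorrected {(lo, hi, mean)}, constant {c}, corrected {error_stats(n, k, c)}")
```

For n = 8 and k = 4, the uncorrected error lies between 0 and 49 with a mean of 12.25; adding a constant of 16 shifts it to between -16 and 33 with a mean of -3.75, showing how a compensation term turns a one-sided error into a roughly centered one.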

CONCLUSION
We have presented the hardware design and implementation of an FPGA-based parallel architecture for standard and truncated multipliers using VHDL. The design was implemented on Xilinx Spartan-3AN XC3S700AN, Virtex XCV50 and Virtex-E XCV50E FPGA devices using the ISE 9.2i design tool. The three devices used almost the same number of occupied slices, but their average connection and maximum pin delays differ, indicating that the Spartan-3AN, with its lower delays, is the better device of the three for this design. Because truncated multipliers require fewer FPGA resources and can therefore help meet real-time constraints, they can be used in medical imaging technology such as CT scanning.

ACKNOWLEDGEMENT
The researchers acknowledge the assistance and the financial support provided by the Cornea Research Chair, College of Applied Medical Sciences, King Saud University.