Design of 8-4 and 9-4 Compressors for High Speed Multiplication

This study presents higher order compressors that can be used effectively for high speed multiplication. The proposed compressors offer lower delay and area, although their Energy Delay Product (EDP) is slightly higher than that of lower order compressors. The performance of 8×8, 16×16 and 24×24 multipliers built with the proposed higher order compressors has been compared with that of the same multipliers built with lower order compressors, and the new structures were found to be suitable for high speed multiplication. The compressors were simulated with Cadence RTL Compiler at a temperature of 25°C and a supply voltage of 1.2 V.


INTRODUCTION
Multiplication is a fundamental operation in most signal processing algorithms. Multipliers occupy a large area, have long latency and consume considerable power, so the design of good multipliers is always a challenge for VLSI system designers. A good multiplier should be physically compact, fast and low in power consumption. Multiplication consists of three steps: (i) partial product generation, (ii) partial product reduction and (iii) final product computation. The partial product reduction stage affects the multiplier performance in terms of speed and power dissipation in VLSI circuits, and it has high latency because of its long vertical path. Normally, adders are used to reduce the vertical critical path. However, adders create problems such as glitches and uneven signal transitions, and they need a larger number of stages to reduce the partial products. To avoid these problems, compressors need to be used in the multiplier design (Oklobdzija et al., 1996; Dandapat et al., 2007). The advantage of compressors is that they give the partial product reduction stage a regular structure.
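The three steps can be illustrated with a small bit-level model in Python. This is only a sketch under stated assumptions: it reduces the columns with plain 3-2 counters (full adders) rather than with the compressors proposed in this study, and the function names `partial_product_columns`, `reduce_columns` and `multiply` are hypothetical.

```python
def partial_product_columns(a, b, n):
    """Step (i): AND every pair of operand bits; column k holds bits of weight 2**k."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols


def reduce_columns(cols):
    """Step (ii): apply 3-2 counters until every column holds at most two bits."""
    cols = [list(c) for c in cols]
    while max(len(c) for c in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            while len(col) >= 3:
                x, y, z = col.pop(), col.pop(), col.pop()
                nxt[k].append(x ^ y ^ z)                          # sum stays in column k
                nxt[k + 1].append((x & y) | (x & z) | (y & z))    # carry moves to column k+1
            nxt[k].extend(col)                                    # uncovered bits pass through
        cols = nxt
    return cols


def multiply(a, b, n):
    """Step (iii): add the two remaining rows to obtain the final product."""
    cols = reduce_columns(partial_product_columns(a, b, n))
    row0 = sum(col[0] << k for k, col in enumerate(cols) if len(col) >= 1)
    row1 = sum(col[1] << k for k, col in enumerate(cols) if len(col) >= 2)
    return row0 + row1
```

Each pass of the `while` loop in `reduce_columns` corresponds to one reduction stage, so replacing the 3-2 counter with a wider compressor would lower the number of passes.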
Lower order compressors such as 3-2, 4-2 and 5-2 have been studied by many researchers (Dandapat et al., 2010; Veeramachaneni et al., 2007). In high speed multipliers, 4-2 compressors have been widely used to lower the latency of the partial product reduction stages, and most commercial processor designs use the 4-2 compressor. Even so, lower order compressors cannot reduce the number of partial product reduction stages by much, and hence the delay of the multiplier is not reduced by much either. Higher order compressors (5-3, 6-3 and 7-3) were used earlier to improve the performance of multipliers (Dandapat et al., 2010; Dadda, 1976). These designs merge the binary counter property into the high order compressors, which further reduces the number of partial product stages and the power consumption. In this study, we have used a 7-3 compressor built from four full adders (Dadda, 1976) to improve the performance of the multiplier. We have also proposed 8-4 and 9-4 compressors, which further reduce the number of partial product stages compared with existing compressors. Moreover, the proposed compressors use fewer gates, so the overall design area decreases. This technique offers lower delay, but the Energy Delay Product (EDP) is slightly higher than that of lower order compressors.
The 4-2 compressor has five inputs and produces two outputs and one carry-out. It uses two stages of full adders connected in series. This straightforward implementation has four XOR gate delays (Oklobdzija et al., 1996). Various approaches have been proposed to improve its speed. For example, 4-2 compressors have been implemented with three XOR delays (Hsiao et al., 1998; Gu and Chang, 2003; Chang et al., 2004; Ma and Li, 2008).
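The straightforward two-full-adder structure can be modelled at bit level as below. This is a minimal sketch: the names `full_adder` and `compressor_4_2`, and the choice of which inputs feed the first adder, are illustrative assumptions rather than details of the cited designs.

```python
def full_adder(x, y, z):
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Two full adders in series: five inputs, outputs (sum, carry, cout)."""
    s1, cout = full_adder(x1, x2, x3)      # first stage: two XOR delays
    total, carry = full_adder(s1, x4, cin)  # second stage: two more XOR delays
    return total, carry, cout
```

The sum output has weight 1, while `carry` and `cout` both have weight 2, so the five input bits always satisfy `x1 + x2 + x3 + x4 + cin == total + 2 * (carry + cout)`. The path from `x1` to `total` passes through four XOR gates, which matches the four XOR delays quoted above.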

Limitations of Lower Order Compressor Design
The disadvantages of the existing lower order compressors should be noted:
• They require more adders to compute properly binary weighted output results
• A half adder must be added to a 4-2 compressor, and a full adder to a 5-2 compressor, to obtain properly binary weighted results
• Uneven signal propagation into the adders leads to unwanted transitions, which increase dynamic power consumption
• The third stage of full adders needs some extra time (say ∆) to compute the final sum and carry-out; ∆ is larger for 6-2 and 7-2 compressors
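The second point can be made concrete with a short Python sketch. It rests on an interpretation that the text does not spell out: the 4-2 compressor delivers two outputs of equal weight (carry and carry-out), and a half adder merges them into bits of weight 2 and 4. All function names are hypothetical.

```python
def half_adder(x, y):
    """Return (sum, carry) of two input bits."""
    return x ^ y, x & y


def full_adder(x, y, z):
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Plain 4-2 compressor: outputs (sum, carry, cout); carry and cout share weight 2."""
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def binary_weighted_count(x1, x2, x3, x4, cin):
    """Append a half adder so the result is a binary number (weights 1, 2, 4)."""
    total, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
    bit2, bit4 = half_adder(carry, cout)  # merge the two weight-2 outputs
    return total, bit2, bit4
```

With the extra half adder the three output bits give the number of 1's among the five inputs directly, at the cost of one more adder stage.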

Proposed Compressor Design
The proposed higher order compressors, 8-4 and 9-4, give better performance than the lower order compressors in terms of speed and area. Some of the limitations mentioned above have been minimized in the 7-3 compressor (Dadda, 1976), but the delay can be reduced further by using 8-4 and 9-4 compressors. We have developed 8-4 and 9-4 compressors for multipliers, choosing a suitable combination of adders to make both compressors efficient.

Structure of 8-4 and 9-4 Compressors Using Full and Half Adder
Figure 1 shows that the 8-4 compressor has eight inputs (I0-I7) and four outputs (X1-X4). This compressor uses the counter property, so its output gives the number of 1's at its input. For example, if all input bits are 1, the output of the compressor is "1000". In this design, the compressor takes four stages of adders to compress the input bits into four output bits. In the first stage, two full adders and one half adder are used in parallel. Two full adders are used in the second stage: all "Sum" outputs from the first stage are fed to one full adder and all "Carry" outputs are fed to the other. One half adder is used in each of the third and fourth stages to produce the result. In total, four full adders and three half adders are used.
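The four-stage structure can be checked with a bit-level Python model. This is a sketch reconstructed from the description, not the authors' netlist: the assignment of I0-I1 to the half adder, the internal signal names, the assumption that X1 is the least significant output, and the function names are all illustrative.

```python
def half_adder(x, y):
    """Return (sum, carry) of two input bits."""
    return x ^ y, x & y


def full_adder(x, y, z):
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def compressor_8_4(bits):
    """Compress eight input bits into (x1, x2, x3, x4), x1 being the LSB."""
    i0, i1, i2, i3, i4, i5, i6, i7 = bits
    # Stage 1: one half adder and two full adders in parallel
    a, b = half_adder(i0, i1)
    c, d = full_adder(i2, i3, i4)
    e, f = full_adder(i5, i6, i7)
    # Stage 2: one full adder for the sums, one for the carries
    x1, g = full_adder(a, c, e)   # x1 has weight 1, g has weight 2
    h, k = full_adder(b, d, f)    # h has weight 2, k has weight 4
    # Stage 3: half adder on the two weight-2 signals
    x2, m = half_adder(g, h)      # m has weight 4
    # Stage 4: half adder on the two weight-4 signals
    x3, x4 = half_adder(k, m)
    return x1, x2, x3, x4


def to_int(outputs):
    """Interpret an LSB-first tuple of bits as an integer."""
    return sum(bit << position for position, bit in enumerate(outputs))
```

For an all-ones input the model returns `(0, 0, 0, 1)`, that is "1000" when read from X4 down to X1. The model uses four full adders and three half adders, and the longest path crosses two full adders and two half adders.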
By comparison, a design based on 4-2 compressors takes four stages and six full adders to compress 8 bits into 4 bits. The proposed compressor uses more half adders, and a half adder generally uses fewer gates and occupies less area than a full adder.
The critical path delay of the proposed implementation is six XOR gate delays. The outputs of the proposed 8-4 architecture are governed by Equations (1) to (4), where:
$$a = I_0 \oplus I_1, \qquad b = I_0 I_1$$
$$c = I_2 \oplus I_3 \oplus I_4, \qquad d = I_2 I_3 + I_2 I_4 + I_3 I_4$$
$$e = I_5 \oplus I_6 \oplus I_7, \qquad f = I_5 I_6 + I_5 I_7 + I_6 I_7$$
Figure 2 shows that the 9-4 compressor has nine inputs (I0-I8) and four outputs (X1-X4). If all input bits are 1, the maximum output of this compressor is 9 ("1001"). Five full adders and two half adders are connected to form the 9-4 compressor. Three full adders are used in the first stage and two full adders in the second stage. Only half adders are used in the last two stages. The proposed 9-4 compressor takes only four stages of adders to compress the input bits into four output bits, whereas with lower order compressors the number of stages varies with the number of input bits. For example, a 9-4 compressor built from 4-2 compressors takes six stages of full adders, which increases the delay, power and area.
The proposed compressor has two half adders in its critical path, which gives a lower delay than using full adders there. The proposed compressor also occupies less area than a low order compressor. For these reasons, the adders must be paired carefully when designing a high order compressor. The proposed compressor reduces the vertical critical path more rapidly than a conventional compressor (Oklobdzija et al., 1993).
The critical path delay of the proposed 9-4 compressor is six XOR gate delays and the number of reduction stages is four. The main advantages of the proposed compressors over low order compressors are: (1) uniform XOR gate delay regardless of the input bits, (2) fewer reduction stages and (3) fewer gates.
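The 9-4 structure described above (three full adders, then two full adders, then one half adder in each of the last two stages) can be modelled in the same way. The sketch below is a reconstruction under assumptions: the grouping of inputs, the internal signal names, the choice of X1 as the least significant output and the function names are not taken from Figure 2.

```python
def half_adder(x, y):
    """Return (sum, carry) of two input bits."""
    return x ^ y, x & y


def full_adder(x, y, z):
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def compressor_9_4(bits):
    """Compress nine input bits into (x1, x2, x3, x4), x1 being the LSB."""
    i0, i1, i2, i3, i4, i5, i6, i7, i8 = bits
    # Stage 1: three full adders in parallel
    s0, c0 = full_adder(i0, i1, i2)
    s1, c1 = full_adder(i3, i4, i5)
    s2, c2 = full_adder(i6, i7, i8)
    # Stage 2: one full adder for the sums, one for the carries
    x1, g = full_adder(s0, s1, s2)   # g has weight 2
    h, k = full_adder(c0, c1, c2)    # h has weight 2, k has weight 4
    # Stages 3 and 4: half adders only
    x2, m = half_adder(g, h)         # m has weight 4
    x3, x4 = half_adder(k, m)
    return x1, x2, x3, x4


def to_int(outputs):
    """Interpret an LSB-first tuple of bits as an integer."""
    return sum(bit << position for position, bit in enumerate(outputs))
```

An all-ones input yields `(1, 0, 0, 1)`, which reads as "1001" from X4 down to X1. Counting XOR gates along the longest path (two per full adder, one per half adder) gives 2 + 2 + 1 + 1 = 6, in line with the stated six XOR gate delays.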

Structure of 8-4 and 9-4 Compressors Using Multiplexer
This structure is realized with multiplexers in order to obtain better results in terms of power dissipation and energy delay product. Figures 3 and 4 and Table 1 show the implementation of the 8-4 and 9-4 compressors.
In a multiplexer, the selection lines keep only part of the structure active and leave the rest idle. This saves a substantial amount of power and therefore reduces the energy delay product many fold.

Multiplier Architecture
We have designed three different multipliers (8×8, 16×16 and 24×24) using the Wallace tree architecture (Law et al., 1999). These multipliers use higher order compressors. Figure 5 shows the architecture of a 16×16 multiplier. Different types of compressors are used to reduce the partial products, which are added in five stages. Suitable pairs of compressors/adders have been used in order to reduce the vertical critical path. Consider column fifteen of Fig. 5, which has fifteen dot products. We could apply one 7-3 compressor, one 6-3 compressor and a half adder to that column; this combination produces eight outputs. One 8-4 compressor and one 7-3 compressor is a better option, because this combination produces only seven outputs. By choosing a proper combination of compressors/adders, we can minimize the critical path.
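The output counts for column fifteen can be reproduced with a few lines of Python. This is a bookkeeping sketch only: it counts output bits the way the text does, without tracking which column each output bit is routed to, and the names used are hypothetical.

```python
# Each unit is described as (number of inputs, number of outputs)
COMPRESSOR_8_4 = (8, 4)
COMPRESSOR_7_3 = (7, 3)
COMPRESSOR_6_3 = (6, 3)
HALF_ADDER = (2, 2)


def outputs_after_reduction(column_height, units):
    """Count the bits left after applying the given units to one column."""
    used = sum(inputs for inputs, _ in units)
    if used > column_height:
        raise ValueError("units consume more bits than the column holds")
    produced = sum(outputs for _, outputs in units)
    return produced + (column_height - used)  # uncovered bits pass through


option_a = outputs_after_reduction(15, [COMPRESSOR_7_3, COMPRESSOR_6_3, HALF_ADDER])
option_b = outputs_after_reduction(15, [COMPRESSOR_8_4, COMPRESSOR_7_3])
print(option_a, option_b)  # prints "8 7"
```

Both options consume all fifteen dots, but the second leaves one bit fewer for the following stage.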
In Fig. 5, a vertical box indicates a compressor or adder. Any products in a column that are not covered by a box are passed to the next stage. A horizontal box indicates that a parallel adder has been used; the parallel adders are ripple carry adders. All three multipliers are designed to use as few adders/compressors as possible. For example, consider column 31, which has two vertical dots, and column 32, which has one dot. Instead of using a half adder in column 31, we propagate those two dots directly to the next stage. This minimizes the number of adders in the multiplier.

Multiplier Performance and Comparison
We have used 3-2 (Hsiao et al., 1998) and 4-2 (Chang et al., 2004; Ng and Lau, 1999; Prasad and Parhi, 2001; Baran et al., 2010; Ma and Li, 2008) compressors in our 8-bit, 16-bit and 24-bit multipliers and found that the proposed higher order compressors give higher speed and smaller area (Table 2). Figures 6, 7 and 8 show, respectively, the speed, area and power comparison of multipliers using low and high order compressors. The speed advantage of the higher order compressor multiplier grows as the operand width increases. For 8-bit multiplication, the speed improvement over the low order compressor design is 4%. Similarly, the speed improvement is 9.04% for the 16-bit multiplier and 9.3% for the 24-bit multiplier. High order compressors have a lower gate count and occupy less area than conventional compressors, but they consume more power than low order compressors. For the 8-bit, 16-bit and 24-bit multipliers, the power consumption increases by 10, 22.6 and 26.7% respectively.
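The effect of these figures on the energy delay product can be estimated with a rough calculation. The sketch below relies on two assumptions that the text does not state: that EDP is taken as power × delay² (energy being power × delay), and that a speed improvement of x% corresponds to a delay reduction of x%. The function name `relative_edp` is hypothetical.

```python
# (multiplier, delay reduction in %, power increase in %) as quoted in the text
REPORTED = [
    ("8-bit", 4.0, 10.0),
    ("16-bit", 9.04, 22.6),
    ("24-bit", 9.3, 26.7),
]


def relative_edp(delay_reduction_pct, power_increase_pct):
    """EDP of the high order design divided by EDP of the low order design."""
    delay = 1.0 - delay_reduction_pct / 100.0
    power = 1.0 + power_increase_pct / 100.0
    return power * delay ** 2  # assumes EDP = power * delay^2


for label, delay_pct, power_pct in REPORTED:
    print(f"{label}: relative EDP = {relative_edp(delay_pct, power_pct):.3f}")
```

Under these assumptions the ratio comes out a little above 1 for all three multipliers, which is consistent with the statement that the EDP of the higher order compressors is slightly higher.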

CONCLUSION
Conventional multipliers use low order compressors in the partial product reduction stage, which causes uneven signal transitions in the multiplier. Higher order compressors have been introduced to reduce the vertical critical path and the number of stages. The proposed compressors are designed with fewer gates and give better results in terms of speed and area. The higher order compressors consume more power than low order compressors, and their EDP is slightly higher. Using the proposed structures, multiplications with larger operand widths can be made faster.