Low Power Multiplier Design Using Latches and Flip-Flops

Problem statement: Power dissipation is a critical parameter in modern VLSI design. Low-power design is necessary to keep pace with Moore's law and to produce consumer electronics with longer battery backup and less weight. A good way to save significant power in a VLSI design is to reduce its dynamic power, which is the major part of power dissipation. Multiplication occurs frequently in finite impulse response filters, fast Fourier transforms, discrete cosine transforms and other important DSP and multimedia kernels. Because multipliers are functional components of many digital systems, their power dissipation should be reduced as much as possible. Approach: In this study, a low-power structure called Bypass Zero Feed A Directly (BZFAD) for shift-and-add multipliers was proposed to reduce switching activity. Results: Conventional and proposed BZFAD 8-bit multipliers were simulated and synthesized, and their power and area reports were compared. Conclusion: From these results, BZFAD can attain considerable power reduction and area saving compared with the conventional shift-and-add multiplier.


INTRODUCTION
The golden formula for calculating dynamic power dissipation is $P = C_L V^2 f$. Power can therefore be reduced in several ways: by reducing the output load capacitance $C_L$, the power supply voltage $V$ or the clock frequency $f$. Many research efforts have been devoted to reducing the power dissipation of different multipliers (Chandrakasan et al., 1992; Shen and Chen, 2002; Chen et al., 2005). Among multipliers, tree multipliers are used in high speed applications such as filters, but they require a large area (Chen and Chu, 2007; Huang and Ercegovac, 2005). The Carry-Select-Adder (CSA) based radix multipliers, which have lower area overhead, employ a greater number of active transistors for the multiplication operation and hence consume more power. Among other multipliers, shift-and-add multipliers have been used in many applications for their simplicity and relatively small area requirement (Nelson and Nagle, 1995; Wang et al., 2004).
Steps leading to a low power multiplier: The architecture of a conventional shift-and-add multiplier with multiplicand A and multiplier B is shown in Fig. 1. In the conventional shift-and-add multiplier, the multiplier B must be shifted in every cycle. The LSB obtained by shifting B is connected to the select pin of the multiplexer mux_A. It decides whether the multiplicand A or 0 is added to the partial product obtained in the previous cycle to form the new partial product. This operation shows that the conventional architecture contains many sources of switching activity. There are six major sources, which are marked with dashed ovals in Fig. 1: (a) shifting of the B register, (b) activity in the counter, (c) activity in the adder, (d) switching between '0' and A in the multiplexer, (e) activity in the multiplexer select, (f) shifting of the partial product register.
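The operation described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions, not the circuit of Fig. 1: the function name `conventional_multiply` and the `activity` counters are hypothetical, and the counters only tally how often a block is exercised, not actual bit toggles.

```python
def conventional_multiply(a, b, k=8):
    """Behavioral model of a K-bit conventional shift-and-add multiplier.

    Returns (product, activity); activity counts how often the B register,
    the adder and the partial product register are exercised.
    """
    mask = (1 << k) - 1
    a &= mask
    b_reg = b & mask          # multiplier register B
    p = 0                     # partial product register (2K bits plus carry)
    activity = {"b_shifts": 0, "adder_ops": 0, "p_shifts": 0}

    for _ in range(k):        # the binary counter steps through K cycles
        addend = a if (b_reg & 1) else 0   # mux_A: B(0) selects A or 0
        p += addend << k      # the adder works on the upper half every cycle
        activity["adder_ops"] += 1
        p >>= 1               # shift the partial product right by one bit
        activity["p_shifts"] += 1
        b_reg >>= 1           # shift B right to expose the next bit as B(0)
        activity["b_shifts"] += 1
    return p, activity


product, activity = conventional_multiply(13, 10)
print(product, activity)
```

In this model the adder is exercised in all K cycles, even in the cycles where B(0) is zero and the addition has no effect on the result.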
Power consumption can be lowered by minimizing or removing any of the above sources of switching activity. A larger power reduction is achieved by reducing the switching activity of nodes with higher capacitance. For example, B(0) is the select line of the multiplexer, which is connected to K gates in a K-bit multiplier. If this node can be eliminated, a noticeable power saving is achieved.
The BZFAD architecture: A low power multiplier architecture can be derived by eliminating or reducing the sources of switching activity described above, in particular those in the registers and the counter. The proposed BZFAD architecture with multiplicand 'A' and multiplier 'B' is shown in Fig. 2. In the BZFAD architecture (Mottaghi et al., 2009), power reduction concentrates on the sources of switching activity marked in Fig. 1.

Shifting of B register (multiplier) using hot block ring counter:
In the conventional architecture, register B is shifted to the right in every cycle. Its rightmost bit appears as B(0), which is used to select between A (the multiplicand) and 0. If B(0) is one, then A is added to the previous partial product; if B(0) is zero, then '0' is added to the previous partial product.
The right shifting of B in each cycle gives rise to switching activity. To avoid this, a low power ring counter is used to select the required bit of B in each cycle without shifting. The BZFAD architecture uses a multiplexer with a one-hot encoded bus selector that chooses the hot bit of B in each cycle.
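A minimal sketch of this selection scheme follows. It assumes the ring counter state is modeled as an integer with a single bit set, rotated left once per cycle; the name `hot_bits` is hypothetical and the model says nothing about the gate-level bus selector.

```python
def hot_bits(b, k=8):
    """Yield B(0), B(1), ..., B(k-1) without ever shifting B.

    A one-hot ring counter state selects one bit of B per cycle.
    """
    mask = (1 << k) - 1
    ring = 1                      # one-hot state: bit n is set in cycle n
    for _ in range(k):
        # One-hot bus selector: only the selected bit of B passes through.
        yield 1 if (b & ring) else 0
        # Rotate the single "1" one position to the left (with wrap-around).
        ring = ((ring << 1) | (ring >> (k - 1))) & mask


print(list(hot_bits(0b1010, k=4)))
```

Here the register holding B stays constant for the whole multiplication; only the single "1" in the ring counter moves.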

Operation in the adder with feeder and bypass registers:
In the conventional architecture, if the LSB of B equals zero then zero is added to the current partial product, and if the LSB of B equals one then A is added to it. Adding zero causes unnecessary transitions in the adder; that is, the adder can be bypassed when 0 is to be added, and the partial product only needs to be shifted right by one bit. In the BZFAD architecture this modification is made using Feeder and Bypass registers, which optimize the operation of the adder.
Feeder and Bypass registers are used to bypass the adder in cycles in which the LSB of B is zero. In each cycle the hot bit of the next cycle is checked and one of the following operations is performed:
• Feeder is clocked if the LSB of B in the next cycle equals "1"
• Bypass is clocked if the LSB of B in the next cycle equals "0"
Thus the current partial product is stored either in the Feeder or in the Bypass register. NAND and NOR gates are used to clock the Feeder and Bypass registers. Since these are inverting gates, an inverted clock is fed to them; it is shown as ~Clock in Fig. 2. The reduction of switching activity in the adder is mainly due to two reasons. First, the right input of the adder is A, which is constant during multiplication. This allows the multiplexer to be removed and A to be fed directly to the adder, resulting in a noticeable power saving. Second, in each cycle in which the next hot bit is zero, the Feeder is not clocked and the current partial product is stored in the Bypass register, so there is no transition at the adder input, which also saves power.
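The effect of bypassing can be illustrated with a behavioral sketch. It is a simplification under stated assumptions: it does not model the Feeder and Bypass registers or their next-cycle clocking, only the outcome that the adder sees a new operand solely in cycles whose bit of B is one. The name `bzfad_multiply` and the `adder_uses` counter are hypothetical.

```python
def bzfad_multiply(a, b, k=8):
    """Shift-and-add multiply that skips the adder when the bit of B is 0.

    Returns (product, adder_uses).
    """
    mask = (1 << k) - 1
    a &= mask
    b &= mask
    p = 0
    adder_uses = 0
    for n in range(k):
        if (b >> n) & 1:
            # Feeder path: A is fed directly to the adder, then shift right.
            p = (p + (a << k)) >> 1
            adder_uses += 1
        else:
            # Bypass path: the partial product is only shifted right.
            p >>= 1
    return p, adder_uses


print(bzfad_multiply(13, 10))
```

For the multiplier value 10 (binary 1010) the adder is used twice in this model, whereas a conventional shift-and-add multiplier exercises it in all eight cycles.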
Shifting of the partial product register using $P_{Low}$ latch: The computation process of a multiplier manipulates two input data to generate many partial products for subsequent addition operations, which requires a lot of switching activity. Thus switching activity within the functional units of a multiplier accounts for the majority of its power dissipation, as described in the following equation:
$$P = \alpha \, C \, V_{dd}^{2} \, f_{clk}$$
Where:
$\alpha$ = The switching activity parameter
$C$ = The loading capacitance
$V_{dd}$ = The operating voltage
$f_{clk}$ = The operating frequency
Minimizing switching activity can efficiently lower power dissipation without affecting circuit performance (Chen et al., 2003).
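As a worked illustration, assume the usual dynamic power relation $P = \alpha C V_{dd}^{2} f_{clk}$ built from the symbols defined above, and hold the capacitance, voltage and frequency fixed. The subscripts "old" and "new" are labels introduced here for the switching activity before and after an optimization.

```latex
\[
\frac{P_{\text{new}}}{P_{\text{old}}}
  = \frac{\alpha_{\text{new}}\, C\, V_{dd}^{2}\, f_{clk}}
         {\alpha_{\text{old}}\, C\, V_{dd}^{2}\, f_{clk}}
  = \frac{\alpha_{\text{new}}}{\alpha_{\text{old}}}
\]
```

Under these assumptions, a node that toggles in only half as many cycles as before contributes half the dynamic power, with no change to the supply voltage or the clock frequency.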
The least significant bits of the partial product need not be shifted to complete the multiplication. We take advantage of this observation: the multiplication can be completed by processing only the most significant bits of the partial product. The BZFAD architecture uses the $P_{Low}$ latch to store the lower half of the partial product. It has K latches for a K-bit multiplier.
In the first cycle, the LSB of the partial product becomes final and is stored in the rightmost latch of $P_{Low}$. In the subsequent cycles the next LSBs become final and are stored in the corresponding latches. The ring counter output is used to open the proper latch. With this method the lower half of the partial product is never shifted; shifting is required only for the higher half of the partial product.
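The idea can be sketched as follows. This is a behavioral illustration under stated assumptions, with hypothetical names (`multiply_with_plow`, `p_high`, `p_low`): the upper half is modeled as an integer that shifts, and the lower half as a list of K latches, each written exactly once.

```python
def multiply_with_plow(a, b, k=8):
    """Shift-and-add multiply in which only the upper half is shifted.

    The bit leaving the upper half in cycle n is final, so it is stored
    in latch n of the lower half and never moved again.
    """
    mask = (1 << k) - 1
    a &= mask
    b &= mask
    p_high = 0             # upper half of the partial product (shifted)
    p_low = [0] * k        # K latches for the lower half (never shifted)
    for n in range(k):
        if (b >> n) & 1:
            p_high += a    # add A only when the hot bit of B is 1
        p_low[n] = p_high & 1   # ring counter opens latch n in cycle n
        p_high >>= 1            # only the upper half shifts right
    low = sum(bit << n for n, bit in enumerate(p_low))
    return (p_high << k) | low


print(multiply_with_plow(13, 10))
```

Each latch is written once over the whole multiplication, whereas a shifted lower half would move every stored bit in every remaining cycle.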
Hot block ring counter: The ring counter used in the BZFAD multiplier is noticeably wider than the binary counter used in the conventional architecture. Therefore an ordinary ring counter, if used in BZFAD, would cause more transitions than its binary counterpart in the conventional architecture. To minimize the switching activity of the counter, a low power ring counter is used.
Steps towards hot block ring counter: In a ring counter most flip-flops hold "0" from one cycle to the next, so some flip-flops can be clock gated, leading to less switching activity. A flip-flop in a ring counter must be clocked if and only if either its input or its output is "1" immediately before the triggering clock edge arrives. Therefore only two flip-flops must be clocked in each cycle.
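The clocking rule above can be checked with a short sketch. It assumes the ring counter state is given as a list of flip-flop outputs in which flip-flop i takes its input from flip-flop i - 1, with wrap-around; the name `flops_needing_clock` is hypothetical.

```python
def flops_needing_clock(state):
    """Return indices of flip-flops whose input or output is 1.

    state[i] is the output of flip-flop i; its input is state[i - 1],
    and index -1 wraps to the last flip-flop, closing the ring.
    """
    return [i for i in range(len(state))
            if state[i] == 1 or state[i - 1] == 1]


print(flops_needing_clock([0, 0, 0, 1, 0, 0, 0, 0]))
```

For any one-hot state the list has exactly two entries: the flip-flop that currently holds the "1" and the one that receives it on the next clock edge.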
The clock gating logic in Fig. 4 ORs the values of the flip-flop's input and output on positive clock edges and stores the result in a latch. The output of the latch determines whether or not the clock signal is gated. This clock gator is positive edge triggered.
To avoid all the unnecessary transitions caused by the clock signal, each flip-flop could be provided with the clock gating circuitry of Fig. 4. This solution, however, ends up with a large area overhead and, because of the transitions in the clock gators themselves, the resulting ring counter would not have less switching activity. A better solution is used in the BZFAD architecture.
One of the important properties of the ring counter is that its output is one-hot encoded, as shown in Fig. 5. This property makes the output of the ring counter wide, especially as the counter size increases. As an example, consider a 5-bit binary counter, which counts from 0 to 31. A ring counter with the same counting range is 32 bits wide. To reduce the switching activity of the counter, it is partitioned into a number of blocks that are clock gated with a special clock gating structure whose power and area overheads are independent of the block size. The clock gating structure is shown in Fig. 6.
In the partitioned ring counter exactly one block should be clocked at a time: the block that currently holds the "1" (except when the "1" leaves one block and enters another). The block that is clocked is called the hot block. Therefore, for each block, the Clock Gating structure (CG) only needs to know whether the "1" has entered the block (from the right) and has not yet left it (from the left). The CG starts passing clock pulses to the block once the "1" appears at the input of the first flip-flop of the block. It shuts off the clock pulses after the "1" leaves the leftmost flip-flop of the block. In Fig. 6 the entrance and exit signals come from the neighboring right and left blocks. They have the following meanings. When the entrance signal is "1", the "1" is about to enter the block in the next cycle. This line is connected to the input of the leftmost flip-flop in the right-hand block. The exit signal indicates that the "1" has left the block and hence the block should no longer be clocked.
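The benefit of partitioning can be estimated with a small counting sketch. It is an abstraction under stated assumptions: the counter width is a multiple of the block size, a block is clocked whenever it holds the "1" or is about to receive it, and the overhead of the clock gating structures themselves is ignored. The names `clocked_blocks` and `clocked_flops_per_rotation` and the block size of 8 are hypothetical.

```python
def clocked_blocks(pos, width, block):
    """Blocks that must see the clock edge moving the "1" from pos to pos + 1."""
    hot = pos // block                        # block currently holding the "1"
    entering = ((pos + 1) % width) // block   # block the "1" lands in next
    return sorted({hot, entering})


def clocked_flops_per_rotation(width, block):
    """Total flip-flop clock events over one full rotation of the "1"."""
    return sum(len(clocked_blocks(pos, width, block)) * block
               for pos in range(width))


gated = clocked_flops_per_rotation(32, 8)   # 32-bit counter in four blocks
ungated = 32 * 32                           # every flip-flop clocked each cycle
print(gated, ungated)
```

In this model only the hot block is clocked in most cycles, and two blocks are clocked in the cycles where the "1" crosses a block boundary.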
In Fig. 2, M1 and M2 are controlled by the same hot block ring counter. The low power ring counter therefore plays a central role in the BZFAD architecture.

MATERIALS AND METHODS
In this study, we propose a low-power, low-area multiplier using the BZFAD architecture. The BZFAD architecture avoids unwanted additions and thus minimizes switching power dissipation. The functionality of the conventional (Chen et al., 2005; 2006) and proposed 8-bit multiplier designs is verified using Model-Sim software, and the designs are synthesized with the Xilinx back-end tool to obtain power and delay reports.

RESULTS
The simulation results of the conventional and proposed BZFAD 8-bit multipliers are shown in Fig. 7 and 8. Figures 9 and 10 show the power reports of both multipliers.

Area report for conventional shift and add multiplier is given below:
Design Information
Command Line: Map -p xc2s100-fg456-5 -cm area -k 4 -c 100 -tx off eight_conventional.ngd
Target

Area report for BZFAD multiplier is given below:

DISCUSSION
Based on the reports generated during synthesis, the two multipliers are compared in Table 1.

CONCLUSION
The proposed BZFAD architecture lowers power dissipation and area compared with a conventional shift-and-add multiplier. A multiplexer with a one-hot encoded bus selector is used to avoid the switching activity caused by shifting the multiplier register. Feeder and Bypass registers are used to avoid unnecessary additions. The BZFAD architecture also makes use of a low power ring counter. The design is verified using Model-Sim with VHDL code and synthesized with the Xilinx tool. From these results, BZFAD can attain considerable power reduction and area saving compared with the conventional shift-and-add multiplier.