FPGA Implementation of Power Aware FIR Filter Using Reduced Transition Pipelined Variable Precision Gating

With the emergence of portable computing and communication systems, power awareness has become one of the major objectives of VLSI design. Power awareness is a system's ability to scale its power consumption with the time-varying nature of its inputs. Even systems that are not designed to be power aware display variations in power consumption as conditions change. This implies, by the definition above, that all systems are naturally power aware to some extent. However, one would expect some systems to be more power aware than others. Equivalently, it should be possible to redesign systems to increase their power awareness. This research proposes a pipelined variable precision gating scheme to improve the power awareness of a system and illustrates the technique by applying it to FPGA implementations of multipliers and digital FIR filters. The proposed technique applies clock gating to registers both along the data flow direction and, within each individual pipeline stage, perpendicular to the data flow direction, based on the input data precision. For signed multipliers using 2's complement representation, sign extension, which wastes power and causes longer delay, can be avoided by implementing this technique. Very little additional area is needed. The designed circuit is simulated, synthesized and implemented on a Xilinx Spartan 3E FPGA. Power analysis of the designed circuit shows a power saving of 18% for the proposed FIR filter, with a 3% increase in area, compared to the existing pipeline gating design.


INTRODUCTION
The growth of battery-operated multimedia devices requires energy-efficient circuits, particularly digital multipliers, which are building blocks of digital signal processors. Many efforts have focused on improving multiplier designs [1,3,4,6,7], but the challenge of high-speed circuits is their high power consumption, which is not a tolerable price to pay in recent mobile technologies. Digital multipliers are the most critical arithmetic functional units in many DSP applications, e.g., Fourier transform, DCT and filtering. Array and parallel multipliers are widely used because of their high execution speed and throughput. The power consumption of CMOS logic is proportional to the number of transitions, i.e., $P = C_{total} V_0^2 f$, where $C_{total}$ is the load capacitance, $V_0$ denotes the voltage swing and $f$ is the switching frequency. Many prior techniques aimed to reduce power dissipation by reducing transitions or switching. A 2-dimensional signal gating method for low power array multiplier design [1] provides gating lines for both the multiplicand and the multiplier operands. By deactivating different regions of the multiplier, it reduces power dissipation because unwanted transitions are eliminated. This approach targets non-pipelined array multipliers and cannot be extended to pipelined designs because it cannot reduce the switching activity in registers. Another technique introduces a selective method [2] to design a power-aware multiplier. This method is also for non-pipelined designs and has a high area cost. A polarity-inversion technique [3] has been used for the adders in signed multipliers. This method does not address the power dissipated by unwanted transitions caused by sign extension, so multiplicands of lower precision still cannot be processed directly. A clock gating method [6] is used to design a reconfigurable multiplier. In this method, pipeline stages are selectively disabled by gating their clocks, and the correct results are selected by multiplexers.
This method requires little additional hardware. It gives better power reduction and latency reduction owing to reduced switching activity in the pipeline stages. There is still scope for reducing the power consumption of a pipelined system by applying gating signals within each pipeline stage, perpendicular to the data flow direction, while keeping the latency fixed.
This study proposes a novel technique called pipelined variable precision gating. The technique gates the clock to the registers both along the data flow direction and perpendicular to it. The additional area needed to apply this technique to a pipelined multiplier is very small and the overhead is hardly noticeable. The effectiveness of the technique increases with the multiplication length. In variable precision gating, the current input precision is provided through gating signals from a precision detection circuit. These signals are combined with the system clock to generate sub-clocks, which are connected to the corresponding registers in all pipeline stages. The proposed technique is suitable for both signed and unsigned multipliers. The designed circuit is simulated and implemented on a Xilinx Spartan 3E FPGA, and its power is analyzed using the Xilinx ISE XPower Analyzer tool.

Pipeline gating technique:
In the pipeline gating technique [6,7], clock gating is applied along the data path direction of the pipeline. The system clock is distributed to the different sections as sub-clocks based on the precision of the input data. Each sub-clock is connected to one pipeline stage and drives all registers in that stage. Depending on the input data, some register stages are not needed to compute the result; such a stage is disabled and the result is diverted to the output through a MUX, as shown in Fig. 1, which reduces the pipeline latency. As there are no transitions in the masked stage, the power consumption is also reduced. In general, the above methods provide power awareness in digital systems.
Problem description: The 2-D gating technique [1,8] and the selective method give power awareness to non-pipelined digital systems. The polarity inversion technique [3] does not address the sign extension problem in multipliers. The pipeline gating technique disables stages only in the direction of data flow [6]; it also cannot be applied directly to a fine-grain pipelined system because its latency is not fixed. To implement pipelining, the latency must be known and fixed. This study proposes variable precision pipeline gating, which gates registers both along the data flow direction of the pipeline and, within each pipeline stage, perpendicular to the direction of data flow. Variable precision pipeline gating needs the same additional hardware as the pipeline gating technique and has the same latency reduction. The additional area needed to apply this technique to a pipeline gating multiplier is very small and the overhead is hardly noticeable. The effectiveness increases with the multiplication length.
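The stage-level gating of the pipeline gating technique can be sketched as a small behavioural model in Python. This is only an illustration, not the circuit of Fig. 1: the function names, the four-stage depth, the sampled clock waveform and the choice of two active stages are all assumptions made for this example.

```python
from typing import List


def stage_enables(active_stages: int, n_stages: int) -> List[bool]:
    """Enable flag per pipeline stage; stages past `active_stages` are gated off."""
    if not 1 <= active_stages <= n_stages:
        raise ValueError("active_stages must be between 1 and n_stages")
    return [i < active_stages for i in range(n_stages)]


def sub_clock(system_clock: List[int], enable: bool) -> List[int]:
    """Gated clock waveform: the system clock ANDed with the stage enable."""
    return [bit & int(enable) for bit in system_clock]


def rising_edges(clock: List[int]) -> int:
    """Count 0 -> 1 transitions in a sampled clock waveform."""
    return sum(1 for a, b in zip(clock, clock[1:]) if a == 0 and b == 1)


def output_tap(enables: List[bool]) -> int:
    """Stage index selected by the output MUX: the last enabled stage."""
    return max(i for i, en in enumerate(enables) if en)


# Hypothetical 4-stage pipeline in which the input needs only two stages.
system_clock = [0, 1] * 8                       # eight clock periods, two samples each
enables = stage_enables(active_stages=2, n_stages=4)
edges = [rising_edges(sub_clock(system_clock, en)) for en in enables]
latency = output_tap(enables) + 1               # cycles from input to the muxed output
```

In this toy model the gated stages see no clock edges at all, so their registers cannot toggle, and the output is taken from the second stage instead of the fourth.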
In the variable precision pipeline gating scheme shown in Fig. 2, when pipeline stage 4 can be disabled, some of the registers in the previous stages (the first two registers in stages 1, 2 and 3) can also be disabled if the data passing through them would be processed only in stage 4 and is therefore no longer useful. These registers can be disabled by using Clock 3 as their clock input.
For the same reason, if stage 3 needs to be disabled, the third and fourth registers in stages 1 and 2 can also be disabled. The total number of transitions is further reduced compared to the pipeline gating system. As the number of registers in each stage and the total number of stages in the pipeline (the pipeline depth) increase, this additional benefit becomes more and more significant. The sub-clocks and bypass signals of the different stages are pipelined through registers clocked by the system clock; this maintains a fixed latency so that the design can be used directly in any fine-grain pipelined system [10]. A multiplier cell using this scheme is designed in this study and is shown in Fig. 3.
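The difference between the two gating schemes can be made concrete by counting the registers that still receive a clock. The sketch below assumes a hypothetical register layout, chosen only to match the description above: a four-stage pipeline in which every stage holds one group of two registers for each stage that later consumes the data (including itself). The function name and parameters are illustrative and not taken from Fig. 2.

```python
def clocked_registers(active_stages: int,
                      n_stages: int = 4,
                      regs_per_group: int = 2,
                      fine_grained: bool = False) -> int:
    """Number of registers that still receive a clock.

    fine_grained=False models pipeline gating: a whole stage is either
    clocked or gated.
    fine_grained=True models variable precision pipeline gating: inside an
    active stage, register groups that only feed a gated stage are gated too.
    """
    if not 1 <= active_stages <= n_stages:
        raise ValueError("active_stages must be between 1 and n_stages")
    total = 0
    for stage in range(1, active_stages + 1):   # gated stages contribute nothing
        # Stage `stage` holds one register group per consumer stage >= stage.
        last_consumer = active_stages if fine_grained else n_stages
        total += regs_per_group * (last_consumer - stage + 1)
    return total


# Stage 4 disabled: three active stages.
stage_level = clocked_registers(3)                      # pipeline gating
register_level = clocked_registers(3, fine_grained=True)  # variable precision gating
```

Under this assumed layout, disabling stage 4 removes two further registers from each of stages 1, 2 and 3, and the gap between the two schemes widens as fewer stages remain active.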
In Fig. 3, X and Y are the inputs and S is the output. When the input precision is 4, for example when calculating 1111×1111, S is generated from all inner partial products.
If the input precision is 3, for example when calculating 0111×0111, the partial products containing X3 or Y3 are all zero and S has only six digits instead of eight. Starting from a reset-to-zero state, there is no need to let the registers propagate these zeros because the reset state of a register is already zero. So clock 3, which is connected to these registers, can be disabled.
Under a given input precision, one or more sub-clocks (c0, c1, c2, c3) may be disabled. The registers connected to these sub-clocks do not operate during the calculation. The multiplexers select the correct outputs from the corresponding stages. For example, when performing 0001×0001, only S0 has a useful value. This value is selected from the stage immediately after the AND matrix. Apart from this register and the two registers in the first stage for X0 and Y0, no other registers operate because their clocks have been disabled. The power dissipation is reduced significantly. The output S0 is taken from the first stage after the AND matrix instead of the eighth, so the latency is also reduced by a factor of eight when the multiplier is not embedded in a fine-grain pipelined structure. To use this multiplier in a fine-grain pipelined FIR filter, the sub-clocks and the bypass signals to the multiplexers are passed through flip-flop registers driven by the system clock.
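The two numerical examples above can be checked with a short sketch of the AND matrix of an unsigned 4-bit array multiplier. The mapping from precision to sub-clocks used in `needed_subclocks` is an assumption made for illustration (sub-clock c_k is taken to be needed only when bit k of an operand can be non-zero); the actual wiring of Fig. 3 may differ.

```python
from typing import List


def partial_products(x: int, y: int, width: int = 4) -> List[List[int]]:
    """AND matrix: pp[j][i] = X_i AND Y_j for unsigned `width`-bit operands."""
    return [[((x >> i) & 1) & ((y >> j) & 1) for i in range(width)]
            for j in range(width)]


def product(pp: List[List[int]]) -> int:
    """Sum the partial products with their binary weights 2**(i + j)."""
    return sum(bit << (i + j)
               for j, row in enumerate(pp)
               for i, bit in enumerate(row))


def needed_subclocks(x: int, y: int, width: int = 4) -> List[bool]:
    """Assumed mapping: c_k is enabled only if k is below the input precision."""
    precision = max(x.bit_length(), y.bit_length(), 1)
    return [k < precision for k in range(width)]


pp = partial_products(0b0111, 0b0111)
# Every partial product that involves X3 or Y3 is zero for precision 3.
top_bit_terms = [pp[j][i] for j in range(4) for i in range(4) if i == 3 or j == 3]
s = product(pp)
```

For 0111×0111 the result fits in six bits, and for 0001×0001 the assumed mapping leaves only c0 enabled, which agrees with the behaviour described in the text.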
The detection of the current input precision follows a typical interrupt-response scheme for a CPU. For example, when the user of a digital camera pushes the button to reduce the resolution, an interrupt is sent to the CPU. The CPU then reads the corresponding register and sets up the clock gating signals based on the register value. The additional area cost is therefore very low: only a few AND gates and some multiplexers are needed. The clock gating signals are also used as the control signals of these multiplexers. Based on the discussion above, a set of 8-bit variable precision pipeline multipliers was designed (Fig. 4). Both the pipeline gating and the variable precision pipeline gating techniques were applied to each multiplier. The multipliers were synthesized with the Xilinx ISE tool and their power was analyzed using Xilinx XPower Analyzer. During the analysis, the multipliers were given data of different input precisions. The power dissipation was recorded and compared.
Sign extension detection for 2's complement multiplier: In many multimedia and signal processing applications, the bit precision used for data representation varies with the accuracy sought. The number of bits used to represent data translates directly into the number of quantization levels into which the dynamic range can be divided, and determines the quantization error. The number of bits therefore also determines the variance of the quantization noise in uniform quantization. The higher the bit precision, the greater the accuracy. However, when algorithms are implemented on standard DSPs or general-purpose processors, the computational precision is fixed. When such an algorithm is implemented with fixed precision in an ASIC, this is wasteful in terms of power because unnecessary sign-extension bits toggle.
Consider the two cases shown in Fig. 5. The four registers hold 16-bit two's complement data [5]. The sign-extension bits are shaded in every register.
Notice that for small numbers the MSBs are 0 for positive quantities and 1 for negative quantities. No accuracy is lost if the sign-extension bits are rejected. Determining the number of sign-extension bits is relatively simple in hardware and can be done by the circuit shown in Fig. 6. The 1 outputs of the mask mark the sign-extension bits in each register, as shown in Fig. 7. The number of sign-extension bits that can be rejected is
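As an illustration of sign-extension detection, the following Python sketch builds a mask by comparing adjacent bits from the MSB downward. It is one plausible software analogue and is not taken from the circuit of Fig. 6: the function names, the convention that one sign bit is always kept, and the example values are assumptions.

```python
def sign_extension_mask(value: int, width: int = 16) -> int:
    """Mask with a 1 at every leading bit that only repeats the bit below it."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    word = value & ((1 << width) - 1)           # two's complement bit pattern
    mask = 0
    for pos in range(width - 1, 0, -1):
        upper = (word >> pos) & 1
        lower = (word >> (pos - 1)) & 1
        if upper != lower:                       # first bit that carries information
            break
        mask |= 1 << pos                         # bit `pos` is redundant sign extension
    return mask


def redundant_bits(value: int, width: int = 16) -> int:
    """Number of sign-extension bits that could be dropped (assumed convention)."""
    return bin(sign_extension_mask(value, width)).count("1")


def restore(kept_bits: int, kept_width: int) -> int:
    """Sign-extend a truncated two's complement field back to a Python int."""
    sign = 1 << (kept_width - 1)
    return (kept_bits ^ sign) - sign


# Hypothetical example: -3 stored in a 16-bit register.
dropped = redundant_bits(-3)
kept_width = 16 - dropped
kept = -3 & ((1 << kept_width) - 1)
recovered = restore(kept, kept_width)
```

Dropping the masked bits and sign-extending the remaining field recovers the original value, which is the sense in which no accuracy is lost.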

APPLICATION OF VARIABLE PRECISION PIPELINED GATING TECHNIQUE FOR FIR FILTER
Pipelining for low power: Pipelining can be used to reduce the power consumption of an FIR filter. High speed and low power are the two main advantages of pipelined processing. Consider using pipelining to lower the power consumption where the sample speed does not need to be increased. The propagation delay $T_{pd}$ is associated with the charging and discharging of the various gate and stray capacitances in the critical path. For a CMOS circuit, the propagation delay is given by Eq. 1:

$$T_{pd} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2} \qquad (1)$$

where $C_{charge}$ is the charge/discharge capacitance, $V_0$ is the supply voltage and $V_t$ is the threshold voltage. The parameter $k$ is a function of the technology parameters $\mu$, $W/L$ and $C_{ox}$. The power consumption of a CMOS circuit can be estimated using Eq. 2:

$$P = C_{total} V_0^2 f \qquad (2)$$

where $C_{total}$ is the load capacitance, $V_0$ is the supply voltage and $f$ is the clock frequency. Note that Eq. 1 and 2 are based on simple approximations and are appropriate only for a first-order analysis. Let:

$$P_{seq} = C_{total} V_0^2 f \qquad (3)$$

represent the power consumption of the original filter. It should be noted that:

$$f = \frac{1}{T_{seq}} \qquad (4)$$

where $T_{seq}$ is the clock period of the original filter. Now consider an M-level pipelined system, where the critical path is reduced to 1/M of its original length and the capacitance to be charged/discharged in a single clock cycle is reduced to $C_{charge}/M$. Notice that the total capacitance does not change. If the same clock frequency $f$ is maintained, only a fraction of the original capacitance, $C_{charge}/M$, is charged and discharged in the time that was previously needed to charge/discharge the capacitance $C_{charge}$, as shown in Fig. 8. This implies that the supply voltage can be reduced to $\beta V_0$, where $\beta$ is a positive constant less than 1. Hence, the power consumption of the pipelined filter is:

$$P_{pip} = C_{total} (\beta V_0)^2 f = \beta^2 P_{seq} \qquad (5)$$

Therefore, the power consumption of the pipelined system is reduced by a factor of $\beta^2$ compared with the original system.
The factor $\beta$, and with it the power reduction factor $\beta^2$, can be determined by examining the relationship between the propagation delays of the original filter and the pipelined filter.
The propagation delay of the original filter is given by:

$$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2} \qquad (6)$$

The propagation delay of the pipelined filter is given by:

$$T_{pip} = \frac{(C_{charge}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2} \qquad (7)$$

It should be noted that the clock period, $T_{clk}$, is usually set equal to the maximum propagation delay $T_{pd}$ in a circuit. Because both filters run at the same clock speed, Eq. 6 and 7 can be equated and solved for $\beta$. Once $\beta$ is obtained, the reduced power consumption of the pipelined filter can be computed using Eq. 5.
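To illustrate how $\beta$ can be obtained, the sketch below assumes the first-order delay model $T_{pd} = C_{charge} V_0 / (k (V_0 - V_t)^2)$ and equates the delay of the original filter with that of the M-level pipelined filter running at $\beta V_0$. Under that assumption the condition reduces to the quadratic $M(\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$. The function name and the supply voltage, threshold voltage and pipelining level in the example are hypothetical values, not figures from this study.

```python
import math


def voltage_scale_factor(v0: float, vt: float, m: int) -> float:
    """Solve M*(beta*V0 - Vt)**2 = beta*(V0 - Vt)**2 for the usable root.

    The larger root is returned because it is the one that keeps the scaled
    supply beta*V0 above the threshold voltage Vt.
    """
    if m < 1 or not 0 < vt < v0:
        raise ValueError("require m >= 1 and 0 < vt < v0")
    d = (v0 - vt) ** 2
    # Expanded form: a*beta**2 + b*beta + c = 0
    a = m * v0 ** 2
    b = -(2 * m * v0 * vt + d)
    c = m * vt ** 2
    disc = b * b - 4 * a * c
    return (-b + math.sqrt(disc)) / (2 * a)


# Hypothetical operating point: 5 V supply, 0.6 V threshold, 3-level pipelining.
beta = voltage_scale_factor(v0=5.0, vt=0.6, m=3)
power_ratio = beta ** 2          # P_pip / P_seq under the first-order model
```

With M = 1 the same function returns $\beta = 1$, i.e. no voltage scaling is possible without pipelining, which is a quick sanity check on the model.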

Pipelined FIR filter design:
To improve the throughput of the FIR filter, one commonly used method is to pipeline the multipliers and adders. Since the multiplication time TM is usually much larger than the addition time TA, a much shorter critical path can be achieved by carefully balancing the pipeline stages. Figure 9 shows the pipelining scheme for the FIR filter. In Fig. 9, each multiplier is divided into two pipeline stages. A series of registers is added between the two sub-multipliers. The time taken by each stage of the multiplier is denoted by TM1 and TM2, respectively, and the delay of the added registers is denoted by TDR [10].
Pipelining the multipliers alone achieves only a limited throughput improvement. Assume that the number of pipeline stages into which the multipliers can be divided approaches infinity; the slowest stage will then contain the adder and a very small part of the multiplier.
The length of the critical path therefore approaches TA. When the word length of the input data and coefficients is short, TA is small enough for the FIR filter to operate at a high sampling frequency. In recent years, the word length of FIR filters has grown from 8 and 16 bits up to 32 and 64 bits. With long word lengths, addition also takes significant time. Under such conditions the adders, rather than the pipelined multipliers, become the bottleneck in these FIR filters.
To further improve the throughput of FIR filters, the critical path in the addition process needs to be shortened too. So the adders, as well as the multipliers, need to be pipelined. Pipelining one adder changes the timing relationship between the two inputs of the next adder. Unlike pipelining multipliers, which does not change the relative timing between adder inputs, pipelining adders is like adding delay elements to the paths between adders. So additional delay elements need to be added between the next adder and its corresponding multiplier.
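The statement that pipelining the multipliers does not disturb the relative timing between adder inputs can be illustrated with a small behavioural model: if every multiplier gains the same number of register levels, the filter output is simply a delayed copy of the unpipelined output. The function names, the coefficients and the single register level below are assumptions for illustration and do not reproduce the structure of Fig. 9.

```python
from collections import deque
from typing import List


def fir_direct(x: List[int], h: List[int]) -> List[int]:
    """Reference FIR: y[n] = sum_k h[k] * x[n - k], with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_pipelined_multipliers(x: List[int], h: List[int],
                              register_levels: int = 1) -> List[int]:
    """Same FIR, with `register_levels` pipeline registers in every multiplier."""
    taps = deque([0] * len(h), maxlen=len(h))            # input delay line
    pipes = [deque([0] * register_levels) for _ in h]    # one register chain per multiplier
    y = []
    for sample in x:
        taps.appendleft(sample)                          # taps[k] now holds x[n - k]
        products = []
        for k, coeff in enumerate(h):
            pipes[k].append(coeff * taps[k])             # new product enters the chain
            products.append(pipes[k].popleft())          # oldest product leaves it
        y.append(sum(products))                          # adder inputs stay aligned
    return y


# Hypothetical 3-tap example with one register level per multiplier.
x = [1, 2, 3, 4, 5, 0, 0]
h = [1, 2, 3]
reference = fir_direct(x, h)
pipelined = fir_pipelined_multipliers(x, h, register_levels=1)
```

Because all products are delayed equally, no compensating delay elements are needed in this case; pipelining an individual adder would delay only one of the next adder's inputs, which is why the text calls for extra delay elements there.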
A set of 8-tap FIR filters is designed in this study. As in the multiplier designs, the average power dissipation is significantly reduced by applying the variable precision pipeline gating technique. The power reduction of the variable precision pipeline gating design is much better than that of the pipeline gating design. In reducing pipeline latency, the pipeline gating and variable precision gating techniques offer the same advantage. By selecting the correct outputs from the corresponding stages, the total pipeline latency is significantly reduced. Based on the discussion above, a set of 8-tap FIR filters using array, pipelined and variable precision pipeline multipliers was designed. The designed circuits were implemented on a Xilinx Spartan 3E FPGA and their power was analyzed using Xilinx XPower Analyzer.

Fig. 9: FIR filter by pipelining multipliers

XPOWER OVERVIEW
XPower is the power-analysis software available for programmable logic design. It enables interactive and automatic analysis of power consumption for Xilinx FPGA and CPLD devices. XPower includes both interactive (xpower) and batch (xpwr) applications. Total device power and per-net power can be analyzed earlier in the design flow than before, for routed, partially routed or unrouted designs, all driven from a comprehensive graphical interface or from command-line batch mode. XPower also reads VCD simulation data from the ModelSim family of HDL simulators, as well as from the additional simulators listed in its simulator support, to set the estimation stimulus, which reduces setup time. The XPower tool flow is shown in Fig. 10.
Physical constraints file: A Physical Constraints File (PCF) is a text file containing two separate sections: a section for the physical constraints created by the mapper and a section for the physical constraints entered by the user. Temperature, voltage, maximum delay and time graphs are read from the PCF. Therefore, for accurate power estimation, a proper PCF must be used.
Settings file: Specifies a settings file (*_xpwr.xml) to be used by XPower. A settings file is an XML-based file that represents the current state of the power data, constrained by the reporting options that have already been specified. This file is generated by XPower when settings are saved and is used to restore the settings.
Simulation file: Specifies a simulation file (*.vcd) to be used by XPower. This file is the output of a simulation run on the design. IEEE standard VCD files are accepted for input of simulation data.
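To show what kind of information a VCD file carries, the sketch below counts bit toggles per signal in a small VCD text. It is not a description of how XPower processes the file: the function name and the sample waveform are hypothetical, and the parser is deliberately minimal (it assumes only 0/1 values, ignores real-valued changes and scope hierarchy, and pads shortened vectors with zeros).

```python
from typing import Dict


def count_toggles(vcd_text: str) -> Dict[str, int]:
    """Count bit transitions per signal in a simplified VCD dump."""
    names: Dict[str, str] = {}     # identifier code -> signal name
    last: Dict[str, str] = {}      # identifier code -> previous value string
    toggles: Dict[str, int] = {}
    in_header = True
    for line in vcd_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if in_header:
            if tokens[0] == "$var":            # $var <type> <size> <code> <name> $end
                names[tokens[3]] = tokens[4]
                toggles[tokens[4]] = 0
            elif tokens[0] == "$enddefinitions":
                in_header = False
            continue
        first = tokens[0]
        if first[0] in "#$rR":                 # timestamps, $dumpvars/$end, real values
            continue
        if first[0] in "bB":                   # vector change: b<bits> <code>
            value, code = first[1:], tokens[1]
        else:                                  # scalar change: <bit><code>
            value, code = first[0], first[1:]
        if code not in names:
            continue
        if code in last:                       # the initial value is not a toggle
            width = max(len(value), len(last[code]))
            old, new = last[code].zfill(width), value.zfill(width)
            toggles[names[code]] += sum(a != b for a, b in zip(old, new))
        last[code] = value
    return toggles


# Hypothetical waveform: a clock and a 4-bit data bus.
SAMPLE_VCD = '''$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 4 " data $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
b0000 "
$end
#5
1!
b0111 "
#10
0!
#15
1!
b1 "
'''
activity = count_toggles(SAMPLE_VCD)
```

Toggle counts of this kind are the switching activity that a power estimate based on $P = C_{total} V_0^2 f$ depends on, which is why supplying realistic simulation stimulus matters.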

RESULTS AND DISCUSSION
The performance and implementation details of the proposed technique on the Xilinx Spartan 3E FPGA are discussed in this section.
The 8-bit array multiplier (Array), pipeline gating multiplier (PM) and variable precision pipeline gating multiplier (VPM) were designed, and the pipelined multipliers were then used in an 8-tap FIR filter with fine-grain pipelining. Figure 11 shows the power consumption of the different multipliers for different input data precisions. The proposed multiplier consumes less power, and the power saving grows as the precision approaches 1. Figure 12 gives the power consumption of the 8-tap FIR filter using the array multiplier (FIR array), the pipeline gating multiplier (FIR PM) and the variable precision pipeline gating multiplier (FIR-VPM); all filters were tested with the same input data sequence. The results show that the proposed filter consumes less power than the other designs under different input data rates. Figures 13 and 14 show the implementation details of the existing and proposed multipliers and FIR filters. The proposed technique improves the system speed with a small increase in area, along with the lower power consumption discussed above.
High speed and low power consumption are the major concerns in digital system design. Hence the important issues in a variable precision gating pipelined system are the maximum achievable throughput and the minimization of power consumption. Power consumption was analyzed for the three multiplier implementations with the same data samples at different precisions. The power saving is greater when the input data precision is low; as the precision approaches 1, the power saving is 27% relative to the existing pipeline gating technique. The same technique is applied to an 8-tap FIR filter with fine-grain pipelining, and the existing and proposed techniques are analyzed in the Xilinx development environment. The analysis shows that the proposed method gives better power reduction for real-time data samples containing data of all precisions. In terms of speed, the proposed method reduces the critical path delay through pipelining. The concept has been verified using an FPGA implementation, where speed is limited by the FPGA interconnect; if it is implemented in an ASIC, the speed advantage becomes appreciable.

CONCLUSION
The pipelined variable precision gating technique is proposed in this study. The proposed method consumes less power than the existing methods. The technique is explained by designing a multiplier and is then applied to an 8-tap FIR filter. For signed multipliers, it also avoids the sign extension problem when processing multiplicands of low input precision. Compared with existing techniques, the proposed method gives a power reduction of 18% with 3% additional area in the FIR filter, without any loss in output quality.