A Quasi Delay Insensitive Reduced Stack Pre-Charged Half Buffer based High Speed Adder using pipeline templates for Asynchronous Circuits

Abstract: Problem statement: Recent research in asynchronous design techniques is making asynchronous circuits an increasingly practical alternative to synchronous design. The challenges facing synchronous design include the increasing pressure for low power, the growing difficulty of predicting the impact of wire load and delay, and the performance penalty associated with supporting communication between different clock domains. Asynchronous circuits use binary signals, but there is no common, discrete time. Instead, the circuits use handshaking between their components to perform the needed synchronization, communication and sequencing of operations. This difference gives asynchronous circuits inherent advantages: lower power consumption, higher operating speed, robustness to variations in supply voltage, temperature and fabrication process parameters, lower emission of electromagnetic noise, better modularity, and no clock distribution or clock skew problems. Low power consumption seems to be one of the most promising directions, and the design reported in this study is one such example. Approach: This study provides solutions to these problems drawn from the spectrum of existing asynchronous design techniques. It focuses on techniques for fine-grain two-dimensional pipelining that yield ultra-high speed at normal power supplies and very low energy at reduced power supplies. New templates are presented that provide significant performance improvements while remaining quasi delay insensitive. The key idea is to reduce the complexity of the internal circuitry by intelligently reducing concurrency and using an additional wire for communication between pipeline stages. Results: A Quasi Delay Insensitive RSPCHB template is proposed to enhance performance; it is faster than the previously designed system, with a maximum throughput of 970MHz. Conclusion: The proposed model has been tested using HSPICE.
The authors believe that the proposed design will provide a platform for designing high-speed, low-power digital circuits, such as pipelined multipliers, for digital signal processor applications.


INTRODUCTION
Asynchronous circuits have a number of performance advantages over their synchronous equivalents, yet these advantages are rarely realized due to overhead in the asynchronous design method (Singh and Nowick, 2007). Asynchronous logic is a hot topic due to its attractive features of high speed, power savings, low noise and robustness to parameter variations, and the handshake protocol strongly influences the performance of pipelined architectures (Yahya and Renaudin, 2006). One such approach includes a novel, highly concurrent handshake protocol with fewer synchronization points between neighboring pipeline stages than almost all existing asynchronous dynamic pipelining approaches.
This study presents a methodology for improving the speed of high-speed adders. As a starting point, a previously proposed method called "speculative completion" is used, in which fast-terminating additions are automatically detected. Unlike the previous design, the method proposed in this study is able to adapt dynamically to (1) application-specific behavior and (2) [...]. The latches used in AMULET1 are level sensitive, so two-to-four-phase converters are necessary in each latch controller. To avoid this overhead, an investigation has been carried out into four-phase micropipeline control circuits; this has raised several design issues relating to cost, performance and safety and forms a useful illustration of asynchronous design techniques (Furber and Day, 1996). A robust asynchronous full adder design corresponding to early output logic, synthesized using the elements of a standard cell library, has also been presented. As the name suggests, the adder ensures gate orphan freedom and fits neatly into a self-timed system architecture. It is compared with many of the indicating full adder designs that can be embedded in a self-timed system; the design estimates correspond to simulation results of a 32-bit carry-ripple adder circuit targeting a high-speed 130 nm bulk CMOS process technology. The full adder also facilitates a faster reset, and the return-to-zero for the fundamental carry-propagate topology is achieved with only two full adder delays (Balasubramania, 2011).
One important class of asynchronous circuits, which we consider in this study, is Quasi Delay-Insensitive (QDI) circuits. They are an interconnection of logic gates without any clock signal for sequencing, and they operate correctly regardless of gate delays (Josephs and Nowick, 1999). Due to the absence of a global clock reference, robust asynchronous circuits tend to have better noise and electromagnetic compatibility properties than their synchronous counterparts (Folco et al., 2007). A QDI system is constructed as a collection of concurrent hardware modules (called processes) that communicate with each other through message-passing channels. These messages consist of atomic data items called tokens, which are usually multi-rail encoded. Each process can send and receive tokens to and from other processes through one-to-one communication by means of handshake protocols. Due to the lack of a global clock and the multi-rail encoded data communication, QDI circuits have the potential to be self-checking and to halt in the presence of failures (LaFrieda and Manohar, 2009).
Increasing the packing density of VLSI systems creates problems with clock distribution and clock skew. Using self-timed operands in VLSI design can reduce this problem. However, it is necessary to determine whether the additional logic used in asynchronous operands significantly increases the system computation time (Luderman and Albicki, 1992). With higher clock frequencies, decreased feature sizes and increased transistor counts, clock distribution and wire delays present a growing challenge to the designers of single-clocked, globally synchronous systems. It is becoming more and more difficult and expensive to distribute a global clock signal with low skew throughout a processor die. Asynchronous circuits, on the other hand, do not suffer from this problem because they have no global clock. This makes it attractive for researchers to eventually abandon single-clocked, globally synchronous systems in favor of asynchrony, making asynchronous circuits an important topic in future digital VLSI design.
Adders are one of the key components in arithmetic circuits. Enhancing their performance can significantly improve the quality of arithmetic designs. This is why the theoretical lower bounds on the delay and area of an adder have been analyzed, and circuits with performance close to these bounds have been designed. In this study, we present a novel adder design that is exponentially faster than traditional adders.

Asynchronous channels:
An asynchronous communication channel is a bundle of wires and a protocol for communicating data between a sender and a receiver. The encoding scheme in which one wire per bit is used to transmit the data, with an associated request line indicating when the data is valid, is called single-rail encoding and is shown in Fig. 1. The associated channel is called a bundled-data channel. A channel in which the sender initiates the transfer is called a push channel; the opposite, in which the receiver asks for new data, is called a pull channel. In a pull channel the directions of the request and acknowledge signals are reversed, and the validity of the data is indicated by the acknowledge signal from the sender to the receiver.
Alternatively, if the data is sent using two wires for each bit of information, the encoding is called dual-rail. Extensions to 1-of-N encoding also exist. Both single-rail and dual-rail encoding schemes are commonly used, and there are tradeoffs between them. Dual-rail and 1-of-N encodings allow data validity to be indicated by the data itself and are often used in QDI designs. Single-rail, in contrast, requires the associated request line, driven by a matched delay line, to always be slower than the computation. This latter approach requires careful timing analysis but allows the reuse of synchronous single-rail logic.
Fig. 2 shows the Weak-Conditioned Half Buffer (WCHB). L0 and L1 are the false and true dual-rail inputs, and R0 and R1 are the false and true dual-rail outputs. Lack and Rack are active-low acknowledgment signals. The staticizers required to hold state at the output of all C-elements are not shown. The buffer operates as follows. After the buffer is reset, all data lines are low and the acknowledgment lines, Lack and Rack, are high. When data arrives, one of the input rails goes high, the corresponding C-element output goes low, and the left-side acknowledgment Lack is lowered. Once the data has propagated through the inverters to the outputs, the right environment lowers Rack, acknowledging that the data has been received. Once the input is reset, the template raises Lack and resets the output. Since the L and R channels cannot simultaneously hold two distinct tokens, the circuit is said to be a half buffer, or to have slack ½. The WCHB has a cycle time of 10 transitions, which is significantly faster than other QDI pipeline templates. Another feature of the WCHB is that the validity and neutrality of the output (R) imply the validity and neutrality of the input (L). This kind of logic is called weak-conditioned logic, and the same logic is used in previously proposed fine-grain pipelines. Its advantages and disadvantages are discussed after a detailed study of non-linear pipeline templates.
Fig. 3 shows the template for the Pre-Charged Half Buffer (PCHB). Here, validity and neutrality are tested by completion detectors rather than by weak-conditioned logic as in the WCHB: the input completion detector is denoted LCD and the output completion detector is denoted RCD.
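The WCHB handshake described above can be sketched behaviourally. The following Python model is a minimal illustration under stated assumptions, not the authors' circuit: it assumes each output rail is driven by a C-element of the matching input rail and Rack, folds the inverters into the model so that every signal is shown at its externally visible polarity, and derives the active-low Lack as the NOR of the two output rails. The names `CElement`, `WCHB`, `encode` and `send_bit` are hypothetical.

```python
class CElement:
    """Two-input Muller C-element: the output follows the inputs when they
    agree and holds its previous value when they differ."""

    def __init__(self, init=0):
        self.out = init

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


def encode(bit):
    """Dual-rail encoding: returns (L0, L1), the false and true rails."""
    return (0, 1) if bit else (1, 0)


class WCHB:
    """Behavioural sketch of a one-bit weak-conditioned half buffer.
    Assumption: inverters are folded in, so R0/R1 are active high and
    Lack/Rack are active low (1 means 'not acknowledging')."""

    def __init__(self):
        self.c0 = CElement()
        self.c1 = CElement()

    def step(self, l0, l1, rack):
        r0 = self.c0.update(l0, rack)
        r1 = self.c1.update(l1, rack)
        lack = 0 if (r0 or r1) else 1  # active-low acknowledgment to the left
        return r0, r1, lack


def send_bit(buf, bit):
    """Drive one token through the buffer; returns (R0, R1, Lack) per phase."""
    l0, l1 = encode(bit)
    return [
        buf.step(0, 0, 1),    # reset: data lines low, Lack and Rack high
        buf.step(l0, l1, 1),  # data arrives: output becomes valid, Lack falls
        buf.step(l0, l1, 0),  # right environment lowers Rack
        buf.step(0, 0, 0),    # input resets: output resets, Lack rises
        buf.step(0, 0, 1),    # right environment raises Rack again
    ]


if __name__ == "__main__":
    for phase in send_bit(WCHB(), 1):
        print(phase)
```

In this sketch the output stays valid between the third and fourth phases even though Rack has fallen, because the C-element holds its state until the input also returns to neutral; this is the weak-conditioned behaviour in which neutrality of the output implies neutrality of the input.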

Fig. 2: WCHB
Because the function block need not be weak-conditioned, it can evaluate before all the inputs have arrived. The template, however, generates the acknowledgment signal Lack only after all the inputs have arrived and the output has been evaluated. The LCD and RCD outputs are therefore combined using a C-element to generate the acknowledgment signal. Two aspects of this template are worth pointing out. First, because the C-element is inverting, the acknowledgment signal is active low. Second, the Lack signal is often buffered using inverters before being sent out; usually, two further inverters are added to buffer the internal signal en that controls the function block. Fig. 4 shows the template for the Pre-Charged Full Buffer (PCFB). The PCFB is more concurrent than the PCHB because its L and R handshakes reset in parallel, at the cost of requiring an additional state variable.
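The way the PCHB acknowledgment is formed from the LCD and RCD can be illustrated with a small behavioural sketch. It is only an illustration under assumptions: the LCD is modelled as a C-element over the per-channel OR of the dual-rail wires, the RCD as the OR of the output rails, and the en signal and pre-charge transistors are not modelled. The names `CElement`, `PCHBAck` and `join_scenario` are hypothetical.

```python
class CElement:
    """n-input Muller C-element: output goes to 1 when all inputs are 1,
    to 0 when all inputs are 0, and otherwise holds its previous value."""

    def __init__(self, init=0):
        self.out = init

    def update(self, *inputs):
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out


class PCHBAck:
    """Acknowledgment logic of a PCHB stage (assumed structure)."""

    def __init__(self):
        self.lcd = CElement()  # input completion detector across channels
        self.ack = CElement()  # C-element combining LCD and RCD

    def update(self, inputs, output):
        # A dual-rail channel is valid when one of its rails is high.
        lcd = self.lcd.update(*[r0 | r1 for (r0, r1) in inputs])
        rcd = output[0] | output[1]
        # The C-element is inverting, so Lack is active low.
        return 1 - self.ack.update(lcd, rcd)


def join_scenario():
    """Two-input join whose function block produces its output early."""
    stage = PCHBAck()
    neutral = (0, 0)
    steps = [
        ([neutral, neutral], neutral),  # idle
        ([(1, 0), neutral], (1, 0)),    # one input arrives, output evaluates early
        ([(1, 0), (0, 1)], (1, 0)),     # second input arrives
        ([neutral, neutral], (1, 0)),   # inputs reset, output still valid
        ([neutral, neutral], neutral),  # output returns to neutral
    ]
    return [stage.update(i, o) for i, o in steps]


if __name__ == "__main__":
    print(join_scenario())
```

In the second step the output is already valid, yet Lack stays high because the LCD has not seen every input; the acknowledgment is asserted only in the third step, which is the behaviour that lets the function block be non-weak-conditioned.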
We propose a new pipeline template that eliminates the internal en signal of the PCHB template and thereby reduces the transistor stack sizes in the function block. This new QDI pipeline template is referred to as the Reduced Stack Pre-Charged Half Buffer (RSPCHB) and is shown in Fig. 5. Note that the RCD block is optimized by tapping its inputs before the output inverters, so that a NAND gate is used instead of an OR gate. The RSPCHB template makes it possible to remove the internal enable signal by reducing concurrency that does not improve performance. In the PCHB template, the outputs of the LCD and RCD are combined using a C-element to generate the acknowledgment signal Lack. As a result, the handshake protocol is integrated with the validity and neutrality of both the input and output data, which removes the need for the function block to be weak-conditioned. However, this arrangement requires the en signal and introduces more concurrency than is necessary. In the case of a join, the non-weak-conditioned function block may generate an output as soon as one of the input channels provides data, so the RCD of the join asserts its output. A subsequent stage can then receive this data, evaluate, assert both its LCD and RCD outputs and assert its acknowledgment signal. Although the join can thus receive an acknowledgment, it will not pre-charge until after en is asserted: the en signal delays the pre-charge of the circuit until after the acknowledgment to the input stages has been asserted. This delay is needed because the pre-charge would otherwise cause the RCD to deassert, preventing the C-element from ever generating the acknowledgment. The en signal could be safely removed if the acknowledgment signals from the stages after the join were delayed until all input data to the join had arrived and been acknowledged. Because the join is the performance bottleneck for the subsequent stages, delaying the acknowledgment would not impact performance.
The advantage of the RSPCHB is the absence of an LCD and the reduced stack size of the function block, which reduce capacitive load and yield significantly faster overall performance. The cost of this increase in performance is one extra communication wire between stages.
Asynchronous pipeline adder: RSPCHB adder stage: The sum and carry circuits are designed according to the RSPCHB template. Fig. 6 and 7 show the full adder circuit with dual-rail logic, together with the precharge and enable signals required to pipeline the different stages. Inputs and outputs are individually detected by an acknowledgment signal from the buffer. The latency of the pipelined asynchronous adder is 162ns.

RESULTS
In the previous study (Jayanthi and Rajaram, 2012), different asynchronous pipeline controllers were selected and applied in order to reduce the power consumption caused by introducing latches in pipelines. The inherent properties of low power consumption, higher operating speed and freedom from clock skew were identified as the major parameters for evolving asynchronous techniques and were analyzed across various asynchronous pipeline templates. Fig. 8 and 9 show the simulation and waveforms of the asynchronous pipeline adder using the RSPCHB template.
Simulation and analysis of the pipeline adder using the PCHB and RSPCHB asynchronous templates reveal that the RSPCHB has 10 transitions per cycle, fewer than the PCHB template, which has 12. The PCHB achieves a maximum throughput of 742MHz with a dynamic slack of 6.2. The RSPCHB is faster, with a maximum throughput of 890MHz and a dynamic slack of 7.15. The throughput improvement is approximately 15%. For the full buffers, the PCFB achieves a maximum throughput of 687MHz and a dynamic slack of 2.6. The full-buffer variant of the proposed template (RSPCFB) is faster, with a maximum throughput of 970MHz and a dynamic slack of 4.8. The speed improvement is approximately 30%; however, due to the C-elements in the forward path of the RSPCFB, its forward latency is about 10% slower. Additionally, the RSPCHB consumes less power, 0.082 microwatts, compared with 0.132 microwatts for the PCHB template.
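As a quick arithmetic check on the throughput and power figures quoted above, the short sketch below converts each throughput to a cycle time and reports the relative change in two ways: as a gain in throughput and as a reduction in cycle time. The two measures differ, and the percentages quoted in the text appear to be closer to the cycle-time reduction. The helper names are hypothetical, and the only inputs are the numbers already given in this section.

```python
def cycle_time_ns(throughput_mhz):
    """Cycle time in nanoseconds for a throughput given in MHz."""
    return 1e3 / throughput_mhz


def throughput_gain(base_mhz, new_mhz):
    """Relative increase in throughput over the baseline."""
    return (new_mhz - base_mhz) / base_mhz


def cycle_time_reduction(base_mhz, new_mhz):
    """Relative decrease in cycle time compared with the baseline."""
    return 1.0 - base_mhz / new_mhz


def power_reduction(base_uw, new_uw):
    """Relative decrease in power compared with the baseline."""
    return 1.0 - new_uw / base_uw


if __name__ == "__main__":
    # Throughputs in MHz and powers in microwatts, as quoted in the text.
    for label, base, new in [("half buffer", 742.0, 890.0),
                             ("full buffer", 687.0, 970.0)]:
        print(f"{label}: cycle {cycle_time_ns(base):.3f} ns -> "
              f"{cycle_time_ns(new):.3f} ns, "
              f"throughput gain {throughput_gain(base, new):.1%}, "
              f"cycle-time reduction {cycle_time_reduction(base, new):.1%}")
    print(f"power reduction {power_reduction(0.132, 0.082):.1%}")
```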

DISCUSSION
Power consumption is considerably lower for the RSPCHB template than for the PCHB asynchronous pipeline adder template. Using the RSPCHB pipelining protocol also reduces computation time. With the Reduced Stack Pre-Charged Half Buffer, we achieved a higher dynamic slack, which means that our template supports more system-level concurrency and higher stage utilization. Accordingly, Fig. 10 shows that the throughput of the asynchronous pipeline adder is higher with the RSPCHB template than with the PCHB template. Fig. 11 shows the layout of the full adder with dual-rail logic using the RSPCHB template. This study is based on simulated performance. The physical layout was designed using the Microwind simulator, which shows how the diffusion layer intersects with polysilicon to create the transistors according to the layout design rules.
The simulation results are for a 0.12 micrometer, 6-metal process (1.20 V) at a timescale of 478.65ns. The proposed full adder reduces latency by 20.7%, occupies 15.4% less area and reduces average power dissipation by 8.6%. The voltage source inverter is adopted for the generation of the injected voltage.