The Optimal Implementation of a Generator of Sinusoid

Abstract: Embedded systems occupy an increasingly important place in daily life, and in recent years much research has focused on the miniaturization of electronic devices. In this study we present the architecture of a digital frequency generator based on a trigonometric algorithm, with the aim of obtaining high sampling frequencies. We compare four CORDIC processors: iterative, pipeline, pipeline without phase accumulator and parallel. The objective is to find the architecture with the best sampling frequency and throughput for the implementation of a sinusoid generator. The results show that the parallel CORDIC performs best. This algorithm improves the efficiency of the digital modulator and thereby helps to optimize high-speed data transmission systems in telecommunications.


Introduction
New technologies allow several hundred million transistors to be integrated on the same die. This advance in integration makes it possible to build increasingly complex circuits. Today a single circuit can hold a whole set of components, namely processors, memories, hardware accelerators and peripherals, all interconnected by a communication network.
The 1980s saw the birth of the first Field Programmable Gate Arrays (FPGA). This programmable logic component gathers a fixed number of basic logic blocks, which may or may not be used, in order to form a specific circuit.
Using FPGA circuits for the digital modulation of signals has many advantages, in particular unequalled ease of adaptation when a protocol changes and practically immediate validation of features.
Implementation on this circuit depends on the clock frequency and on the number of cells available. These limitations can be compensated by heavily parallelizing the algorithms and by using simple arithmetic functions that take account of the architectural characteristics of these components.
In general, a digital modulator is composed of four principal parts: the phase accumulator, the coder, the sinusoid generator and the digital-to-analog converter with its low-pass filter.
The purpose of this study is the optimized implementation of a sinusoid generator based on the CORDIC algorithm. The implementation is carried out with the Integrated Software Environment (ISE 12.2), a software suite for the description, simulation and programming of digital circuits and systems on programmable components, together with the hardware description language VHDL (Very High Speed Integrated Circuit Hardware Description Language), which is intended to represent both the behavior and the architecture of a digital electronic system. The goal is the architecture that generates the highest sampling frequency.
For comparison, we test four CORDIC processor techniques: iterative, pipeline, pipeline without phase accumulator and parallel. We present the results for the different techniques and conclude that the parallel CORDIC is optimal because it yields the best sampling frequency and throughput.
In the following sections, a bibliographical study of this field is presented first. Second, the stages in the creation of a sinusoid generator based on a CORDIC processor are detailed. The third section describes the methodology and the various techniques used. The fourth section contains the results and discussion. Finally, we close with the conclusion and proposals for future work.

State of the Art
Several studies have been conducted on the sinusoid generator. They measured the following parameters: maximum frequency, sampling frequency, cells and throughput, for different CORDIC processors (iterative, pipeline, pipeline without phase accumulator and parallel), in order to find the optimal sampling frequency and the best throughput.
Mandal and Mishra (2012) studied the reconfigurable design of a pipeline CORDIC processor for a digital frequency generator. The implementation used the Xilinx Spartan3-XC3S50pq208-5 FPGA, the ISE 10.1 software and ModelSim for test bench simulation. This algorithm did not achieve good performance: at 16 bits it dissipates a minimum power of 0.096 W and consumes 3% of the registers, 2% of the look-up tables (LUT) and 35% of the input/output blocks (IOB). These results show that this processor is not satisfactory.
Arnould et al. (2005) carried out a comparative study of ROM memory with compression, pipeline CORDIC and pipeline CORDIC with the phase accumulator stage removed. They used an Altera FLEX10K200E-2 FPGA, for which the maximum frequency announced by the manufacturer is 150 MHz, together with the ModelSim and Quartus software. These techniques do not achieve good performance, because the sampling frequencies they generate are lower than, equal to or higher than the clock frequency depending on the technique. For example, at 16 bits:
• ROM memory with compression: 41 Msample/s and 5543 cells
• Pipeline CORDIC processor: 101 Msample/s and 1430 cells
• Pipeline CORDIC processor without phase accumulator: 110 Msample/s and 980 cells
Boudjema and Kaddour (2012) carried out a comparative study of iterative CORDIC and pipeline CORDIC without phase accumulator (with recoding of the angle).
Their implementation used the ISE 12.2 software, the VHDL language, MATLAB and the Virtex 5 XC5LX110T FPGA.
The maximum frequency announced by the manufacturer is 200 MHz. The 16-bit results for the iterative CORDIC are poor: a sampling frequency of 22.129 Msample/s and 208 cells. The sampling frequency is lower than the clock frequency, which makes this the slower algorithm. The second technique, pipeline CORDIC without phase accumulator, generates a sampling frequency higher than the clock frequency; this architecture is fast and powerful. Its 16-bit results are a sampling frequency of 542.977 Msample/s and 528 cells.
Mehra and Kamboj (2010) studied the implementation of a pipeline CORDIC for the sinusoid generator, using the Spartan3-XC3S200-5ft56 FPGA, the ISE 10.1 software and ModelSim 6.3XE for test bench simulation. Compared with the work of Mandal and Mishra (2012), this algorithm does not achieve good results: at 16 bits it dissipates a minimum power of 0.03942 W and consumes 10% of the slices, 9% of the look-up tables (LUT), 21% of the input/output blocks (IOB), 8% of the slice flip-flops and 12% of the GCLK resources. These results show that this processor is not sufficient.
All of the previous work treated sinusoid generators based on ROM memory with compression, iterative CORDIC, pipeline CORDIC and pipeline CORDIC without phase accumulator. These approaches share the same limitation: they do not generate an acceptable sampling frequency on the FPGA. Their sampling frequency is mostly lower than or equal to the clock frequency, which is not sufficient for an effective modern modulator. We therefore developed these architectures into one that generates several samples simultaneously: the parallel CORDIC. This algorithm is faster and obtains better results because its sampling frequency is higher than twice the clock frequency.

Creation of a Generator of Sinusoid
A digital modulator is composed of four principal parts: the phase accumulator, the coder, the sinusoid generator and the digital-to-analog converter with its low-pass filter.

The Accumulator of Phase
Its role is to produce the phase term (ωt + Φ). This term corresponds to the instantaneous phase of the sinusoid sample produced by the modulator.
The general architecture of the phase accumulator is presented in Fig. 1. It is made up of two adders and a register. The phase accumulator is driven by a frequency control word F_cw, which is the phase increment applied to the system at each clock period. F_cw determines the effective frequency of the modulated signal as a function of the clock frequency f_clk of the host architecture, according to Equation 1. The phase accumulator also applies, if necessary, a phase shift Φ_n when the modulation in question uses this parameter to transmit information.
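As an illustration of the accumulator described above, the following minimal Python sketch models its behavior. It assumes an N-bit register that wraps modulo 2^N, which gives the usual direct digital synthesis relation f_out = F_cw · f_clk / 2^N. The width N, the function names and the example values are assumptions made for this illustration and are not taken from the design in this paper.

```python
def phase_accumulator(f_cw, n_bits, n_samples, phase_offset=0):
    """Yield successive phase words of an n_bits-wide accumulator.

    f_cw         -- frequency control word added at every clock period
    phase_offset -- optional phase shift added to every output word
    (all names are hypothetical, chosen for this sketch)
    """
    modulus = 1 << n_bits                      # register wraps modulo 2**n_bits
    acc = 0
    for _ in range(n_samples):
        yield (acc + phase_offset) % modulus   # second adder: phase shift
        acc = (acc + f_cw) % modulus           # first adder feeding the register


def output_frequency(f_cw, n_bits, f_clk):
    """Frequency of the generated signal for a given control word (assumed DDS relation)."""
    return f_cw * f_clk / (1 << n_bits)


# Example: 4-bit accumulator, control word 3, 200 MHz clock
words = list(phase_accumulator(f_cw=3, n_bits=4, n_samples=8))
f_out = output_frequency(f_cw=3, n_bits=4, f_clk=200e6)
```

A larger control word makes the phase wrap around sooner and therefore raises the output frequency, while the register width fixes the frequency resolution.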
The kth phase word produced by this stage is denoted Φ_k.

The Coder
The coder converts each series of M bits composing a symbol so that it is compatible with the selected type of modulation. For an amplitude modulation, for example, a series of M bits is replaced by the corresponding amplitude of the sinusoidal signal to be produced at the modulator output.

The Converter (Digital/Analogical) with the Low-Pass Filter
These are the last elements of the data processing chain. They convert the samples into signals suited to the transmission medium, which may be radio or optical, for example. This is the only part of the architecture that operates in analog mode.

Generator of Sinusoids
The output of the phase accumulator is connected to the most critical part of the digital modulation architecture, namely the sinusoid generator, which must calculate the cosine of the phase term produced by the accumulator stage (Vankka, 1997). At the kth clock period of the host system, the value cos(Φ_k) is available at the generator output, where Φ_k is the phase word calculated previously by the phase accumulator. This element is the focus of this article: several techniques are used to obtain the best sampling frequency of a sinusoid generator. It produces digital samples from the parameters supplied by the coder and the phase accumulator.
Among the most important concepts for the creation of a sinusoid generator is the CORDIC (COordinate Rotation DIgital Computer) algorithm, an algorithm for calculating trigonometric and hyperbolic functions that is used in particular in computers. It was first described in 1959 by Jack Volder (Volder, 1959; 2000). The design is described in VHDL (Very-High-Speed Integrated Circuit Hardware Description Language), a language originally intended to document in a formal way the structure and behavior of integrated circuits. VHDL was standardized as IEEE 1076 in 1987 (Polavarapu, 2013) and then revised successively in 1993 and 2000. The design is implemented on the Virtex 5-XC5LX110T FPGA (field programmable gate array), which belongs to the family of electronically programmable components (Xilinx, 2010). Originally based on an elementary programmable AND/OR matrix, the Programmable Array Logic (PAL), these programmable circuits became more complex in the mid-1990s thanks to the integration of specific resources, associated memory and flexible inputs/outputs. The ISE 12.2 software is used for the implementation. It is a software suite for the description, simulation and programming of digital circuits and systems on programmable components.
The ISE suite allows:
• The description of digital circuits in the form of logic diagrams, finite state machines or a hardware description language (VHDL, Verilog), together with compilation and behavioral simulation
• Synthesis, placement and routing, and implementation
• Timing simulation and timing analysis, and programming of Xilinx programmable circuits (CPLD and FPGA)
Several parameters are reported by the ISE software; here we consider the throughput of a digital signal. The bit rate, or throughput, is the number of bits transmitted per second; it depends on the physical characteristics of the transmission medium and on the techniques used.

Methodology
Currently the best processor is the parallel CORDIC. This technique is implemented on the Xilinx XC5LX110T FPGA. The samples are produced in a shifted manner to reduce the execution time, which increases the operating speed and the throughput; for example, at 8 bits the sampling frequency is 2410 Msample/s and the throughput is 19.280 Gbit/s (8 bits × 2410 Msample/s). In this part we present several techniques for the digital sinusoid generator. The architectures are implemented on the FPGA using the VHDL language and the ISE 12.2 software. The techniques are defined as follows.

Iterative CORDIC
This architecture is composed of three necessary elements: a read-only memory containing the values of arctan 2^(−k) and two combinational blocks, one responsible for calculating the successive values of the coordinates of a sample (X_k and Y_k) and the other responsible for evaluating the angle (Hu, 1992; Duprat and Muller, 1993; Boudjema and Kaddour, 2012).
The coordinates (x_(k+1), y_(k+1)) are obtained from (x_k, y_k) by one micro-rotation of angle arctan 2^(−k), whose direction is given by the sign of Z (Equation 1).
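For reference, the standard CORDIC recurrence in rotation mode (Volder, 1959) can be written as follows. This is the textbook form, given here on the assumption that it is the system the text refers to; d_k is a symbol introduced for this note and denotes the direction of the kth micro-rotation.

```latex
\begin{aligned}
x_{k+1} &= x_k - d_k\, y_k\, 2^{-k},\\
y_{k+1} &= y_k + d_k\, x_k\, 2^{-k},\\
z_{k+1} &= z_k - d_k \arctan 2^{-k},
\qquad
d_k = \begin{cases} +1 & \text{if } z_k \ge 0,\\ -1 & \text{if } z_k < 0. \end{cases}
\end{aligned}
```

Each step needs only shifts, additions and one table look-up, which is why the algorithm maps well onto FPGA logic.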

T = 0: No iteration
Block X is loaded with A·C_N, block Y with 0 and Z with the angle θ. Here C_N accounts for the finite error on the norm of the resulting vector V′.

T = n iteration
The values of X and Y are calculated with Equation 1, with the direction of rotation given by the sign of Z. The value of Z is updated to reflect each rotation, and the calculation finishes when Z = 0. Approximately (n+1) iterations are necessary to obtain a sinusoid sample with a precision of n bits. We measured the parameters as follows:
• The sampling frequency is the maximum frequency divided by the number of clock cycles.
• The cells are the number of logic blocks occupied on the FPGA.
• The throughput is the number of bits in parallel multiplied by the sampling frequency.
This architecture is slow because the sampling frequency is lower than the clock frequency.
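The iterative procedure described in this section can be sketched in Python as follows. This is a floating-point model written for illustration only, not the fixed-point VHDL design of this study. It assumes that the constant loaded into X is the inverse of the accumulated CORDIC gain, and the function names and the iteration count are hypothetical.

```python
import math


def cordic_gain(n_iter):
    """Product of the stretch factors sqrt(1 + 2**-2k) of the micro-rotations."""
    gain = 1.0
    for k in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * k))
    return gain


def cordic_iterative(theta, n_iter=16, amplitude=1.0):
    """Return (amplitude*cos(theta), amplitude*sin(theta)) for |theta| <= pi/2."""
    # ROM contents: elementary angles arctan(2**-k)
    atan_rom = [math.atan(2.0 ** -k) for k in range(n_iter)]
    # T = 0: X is pre-scaled (assumed role of the constant), Y = 0, Z = angle
    x = amplitude / cordic_gain(n_iter)
    y = 0.0
    z = theta
    for k in range(n_iter):              # one micro-rotation per clock cycle
        d = 1.0 if z >= 0.0 else -1.0    # direction given by the sign of Z
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
        z -= d * atan_rom[k]             # Z converges towards 0
    return x, y


c, s = cordic_iterative(math.pi / 3)
```

Because the loop body is reused for every micro-rotation, one sample costs as many clock cycles as there are iterations, which is the reason the sampling frequency stays below the clock frequency.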

Pipeline CORDIC
This architecture is composed of a memory containing the values of arctan 2^(−k), (n+1) adder-subtracters for each of the three computation paths (X, Y, Z) and 2(n+1) shifters (Kang and Swartzlander Jr, 2003; 2006; Mandal and Mishra, 2012).
For n bits there are (n+1) stages. Each stage of the pipeline carries out only one micro-rotation, always the same one, of which only the direction can change; that is, the kth stage implements the kth iteration of the CORDIC system of equations.
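The behavior of such a pipeline can be modeled with the short Python sketch below, in which every stage holds one sample in flight and all stages advance on the same clock. It is a behavioral illustration in floating point under assumed parameters (the stage count, the names `stage` and `run_pipeline`), not the VHDL architecture of this study.

```python
import math

N_STAGES = 12  # assumed number of pipeline stages for this illustration
ATAN = [math.atan(2.0 ** -k) for k in range(N_STAGES)]
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * k)) for k in range(N_STAGES))


def stage(k, xyz):
    """Stage k always performs the same micro-rotation; only its sign varies."""
    x, y, z = xyz
    d = 1.0 if z >= 0.0 else -1.0
    return (x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k, z - d * ATAN[k])


def run_pipeline(phases):
    """Clock the pipeline once per input phase, then flush it.

    Returns the cosine samples in the order the phases were supplied.
    """
    regs = [None] * N_STAGES             # register at the output of each stage
    outputs = []
    # N_STAGES extra idle cycles flush the samples still in flight
    for phi in list(phases) + [None] * N_STAGES:
        if regs[-1] is not None:
            outputs.append(regs[-1][0])  # X output of the last stage
        # all stages work in the same clock cycle, each on a different sample
        for k in range(N_STAGES - 1, 0, -1):
            regs[k] = stage(k, regs[k - 1]) if regs[k - 1] is not None else None
        regs[0] = stage(0, (1.0 / GAIN, 0.0, phi)) if phi is not None else None
    return outputs


samples = run_pipeline([0.0, math.pi / 6, math.pi / 4, math.pi / 3])
```

After an initial latency equal to the number of stages, one sample leaves the pipeline at every clock cycle, which corresponds to the case f_sample = f_clk.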

Pipeline CORDIC Without Accumulator of Phase
This technique uses (n+1) adder-subtracters for each of the three computation axes (X, Y, Z) and 2(n+1) shifters, and requires no combinational logic element (Boudjema and Kaddour, 2012; Arnould et al., 2005). The kth stage corresponds to the same system of equations, simplified by noting that, for a sufficiently small value of x, arctan x ≈ x. The principle is the same as for the pipeline CORDIC, with the advantage that the phase accumulator stage is removed. Implemented on the FPGA, this algorithm gives a sampling frequency higher than the clock frequency. The 8-bit results are satisfactory: a sampling frequency of 602.518 Msample/s and a throughput of 4.820 Gbit/s.
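The small-angle approximation arctan x ≈ x invoked above can be checked numerically. The sketch below only quantifies the approximation for the elementary angles 2^(−k); it does not reproduce how the architecture exploits it. The one-LSB criterion and the function name are assumptions chosen for this illustration.

```python
import math


def first_linear_stage(n_bits):
    """Smallest k for which |arctan(2**-k) - 2**-k| is below one LSB (2**-n_bits)."""
    lsb = 2.0 ** -n_bits
    k = 0
    while abs(math.atan(2.0 ** -k) - 2.0 ** -k) >= lsb:
        k += 1
    return k


k8 = first_linear_stage(8)    # 8-bit resolution
k16 = first_linear_stage(16)  # 16-bit resolution
```

Since the error of the approximation shrinks roughly as x³/3, only the first few elementary angles need to be stored exactly; for the later stages the angle can be replaced by 2^(−k) itself.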

Parallel CORDIC
This technique uses (n+1)/n adder-subtracters for each of the three computation axes (X, Y, Z) and 2(n+1)/n shifters, and requires no combinational logic element.
Here n is the number of samples sent in parallel. The principle of the parallel CORDIC, the architecture on which this study focuses, is to produce the samples in a shifted manner, which requires the phase term supplied to each generator to be modified accordingly. Implemented on the FPGA, this algorithm gives a sampling frequency higher than the clock frequency. The 8-bit results are a sampling frequency of 2410.072 Msample/s and a throughput of 19.280 Gbit/s, which is satisfactory and sufficient.
The parallel CORDIC is the best technique compared with the other architectures because it reduces the execution time of the sinusoid generator and thus increases the sampling frequency and the throughput, which is the purpose of this work.
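One way to picture the shifted production of samples is sketched below in Python. The exact form of the modified phase term is not given in the text, so the scheme used here is an assumption: generator i of P receives the accumulator value plus i times the control word, and the accumulator advances by P control words per clock. All names are hypothetical, and an ideal cosine stands in for each CORDIC core.

```python
import math


def parallel_phases(f_cw, n_bits, n_parallel, n_clocks):
    """Phase words handed to each of n_parallel generators at every clock.

    Generator i receives the phase of sample number (clock * n_parallel + i),
    so the accumulator advances by n_parallel * f_cw per clock period.
    """
    modulus = 1 << n_bits
    acc = 0
    for _ in range(n_clocks):
        yield [(acc + i * f_cw) % modulus for i in range(n_parallel)]
        acc = (acc + n_parallel * f_cw) % modulus


def to_cosine(word, n_bits):
    """Ideal cosine of a phase word (stands in for one CORDIC core)."""
    return math.cos(2.0 * math.pi * word / (1 << n_bits))


blocks = list(parallel_phases(f_cw=3, n_bits=8, n_parallel=4, n_clocks=2))
samples = [to_cosine(w, 8) for block in blocks for w in block]
```

Read in order, the interleaved phase words are the same sequence a single accumulator would produce, but P of them become available in each clock period, so the sampling frequency scales with the number of parallel generators.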

The Maximum Sampling Frequency
The Occupied Logic Cells

The Sampling Frequency Depending on the Number of Logic Cells Occupied
Table 5 summarizes all the methods, taking 8 bits as an example.
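The throughput definition used in this study (number of bits in parallel multiplied by the sampling frequency) can be applied directly to the 8-bit sampling frequencies reported in the text. The short Python sketch below does this; the function name and the dictionary layout are hypothetical, while the sampling frequencies are the values quoted in this paper.

```python
def throughput_mbit_s(n_bits, f_sample_msps):
    """Throughput in Mbit/s = number of bits in parallel x sampling frequency in Msample/s."""
    return n_bits * f_sample_msps


# 8-bit sampling frequencies reported in this paper, in Msample/s
reported = {
    "iterative": 47.307,
    "pipeline": 104.45,
    "pipeline without phase accumulator": 602.518,
    "parallel": 2410.072,
}

throughput = {name: throughput_mbit_s(8, fs) for name, fs in reported.items()}
```

The products are about 378, 836, 4820 and 19281 Mbit/s, whose leading digits agree with the throughput figures quoted in the text when these are read in Gbit/s.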

Discussion
We implemented the various techniques on a Xilinx Virtex5-XC5LX110T target FPGA, for which the maximum frequency announced by the manufacturer is 200 MHz, using the ISE 12.2 software and the VHDL language. The curves of sampling frequency, cells and throughput were then plotted with MATLAB and are shown in the following figures. The problem treated in this study is to find an architecture that generates a high sampling frequency, higher than twice the clock frequency, on this FPGA.
First, we implemented the sinusoid generator based on the iterative CORDIC. This slower architecture gives poor results for the sampling frequency at the various bit widths (Fig. 1), the cells (Fig. 2) and the throughput (Fig. 3), because the algorithm uses several cycles to generate one sample. The sampling frequency is therefore lower than the clock frequency (f_sample < f_clk). The complete results are given in Table 1. At 8 bits, for example: f_sample, the maximum frequency divided by the number of bits, is 47.307 Msample/s; the cells, the number of logic blocks occupied in the FPGA, amount to 106; and the throughput, the sampling frequency multiplied by the number of bits in parallel, is 0.378 Gbit/s.
Next, we used the pipeline CORDIC processor to increase the sampling frequency and the throughput, because this algorithm produces one sample at each clock cycle (f_sample = f_clk). The results at the various bit widths are given in Table 2; they remain insufficient. For example, at 8 bits: f_sample = 104.45 Msample/s, cells = 130 and throughput = 0.835 Gbit/s.
We then removed the phase accumulator stage from the pipeline CORDIC. This algorithm is better than the previous architectures because its sampling frequency is higher than the clock frequency, which also increases the throughput (Table 3).
Finally, the essential contribution of this study is the implementation of a parallel CORDIC that generates the samples in a shifted manner. This raises the sampling frequency above the clock frequency (Table 4). We are interested in this architecture because it is the only one that reaches an acceptable sampling frequency on the FPGA and obtains the best results at the various bit widths compared with the other architectures. At 8 bits: f_sample = 2410.072 Msample/s and throughput = 19.280 Gbit/s. The sinusoid generator based on a CORDIC processor was implemented on an FPGA of type XC5LX110T, and the synthesis results are presented in Fig. 2-5. To provide a basis for comparison, the other sinusoid generator architectures were implemented on the same type of FPGA: iterative CORDIC, pipeline and pipeline without phase accumulator, with the focus on the parallel CORDIC.
The software presented in this work is used to generate suitable VHDL descriptions automatically. Fig. 2 presents the sampling frequency reached by the various systems; the simple structure of the algorithm, close to that of the logic elements composing an FPGA, explains the good performance obtained. For all architectures, the sampling frequency is inversely related to the number of bits:
• When the number of bits increases, the sampling frequency decreases
• When the number of bits decreases, the sampling frequency increases
The curves show that the parallel CORDIC obtains the best sampling frequency compared with the other algorithms.
Finally, Fig. 5 shows the throughput reached as a function of the binary resolution of the sinusoid. The parallel CORDIC architecture shows its value by reaching the best throughput at every binary resolution of the sinusoid.
The results of the four architectures presented in this part are summarized in the table above. The most powerful at 8 bits is the parallel CORDIC, because its sampling frequency is higher than twice the clock frequency and its throughput is optimal, which was the purpose of this work.

Conclusion and Prospects
In this study we presented an architecture for sinusoid generation using a CORDIC processor that fulfills the requirements of an optimal implementation on a digital FPGA circuit. The resulting architecture makes effective use of a multi-stage parallel CORDIC. This architecture multiplies the number of samples actually produced in a single clock cycle. The strengths of this technique are a sampling frequency of more than twice the clock frequency and an optimal throughput, which increase the effectiveness of the modulator.
In future work, we will treat multi-stage parallel architectures for a digital frequency and phase demodulator based on a linear feedback shift register (LFSR), also implemented on the same type of circuit, and we will work on the complete digital architecture of a modulator and demodulator.