Power Consumption and Performance’s Library on DSPs: Case Study MPEG2

The growing demand for portable electronic devices has led to an increased emphasis on power consumption within the semiconductor industry. As a result, designers are now encouraged to consider the impact of their decisions not only on speed and area, but also on power throughout the design space exploration. This article presents a high-level design space exploration methodology for characterising a software application running on TI DSPs. The proposed approach exploits parametric models representing the consumption behaviour of both the DSP architecture and the algorithm. It consists in deriving the laws of consumption at a high level, which makes it possible to deduce the power and energy consumption of code written in a high-level language for a given target. The feasibility and the interest of the approach are demonstrated using MPEG2. The approach is based on a functional level power analysis, whose advantage is that consumption and performance can be estimated at a high level. Moreover, it gives a detailed function-level characterization of the energy behaviour of the application, enabling estimation of software energy consumption.


INTRODUCTION
The design of embedded systems on chip is a complex process, involving different steps at different abstraction levels. Design steps can be grouped into two major tasks: architecture design space exploration, and selection of the architectural platform, component parameters and architecture design. The overall design process must consider strict requirements, such as time to market, system performance, power consumption and cost. Moreover, applications such as laptops and wireless telecommunications are increasingly common in electronics. This type of application integrates complex functionality, which requires powerful computation while adding strong constraints on system consumption [1][2][3].
The availability of high performance System on Chip (SoC) devices is an important factor in the electronics market and has attracted significant research interest. Moreover, in order to maximize the operating time provided by the battery and to satisfy the real-time constraint, we need a high-level model, so that performance, low-power, battery and real-time constraints can all be met.
The objective of this work is to define an approach for estimating and modeling MPEG2 consumption at the system level using FLPA (Functional Level Power Analysis). We first review the related work and the FLPA methodology. We then introduce its transposition to MPEG2 in order to take the algorithmic specifications into account, followed by the models proposed at different levels of abstraction for this application. Finally, we discuss the results obtained, compare them with the physical measurements carried out on the development board, and consider their benefits for the design.

Related work:
Hardware/software techniques to reduce energy consumption have become an essential part of current system designs. Extensive research on power optimization, from the circuit level to the system level, has been conducted in recent years. Such techniques have targeted the memory system because of the prevalence of signal and video applications, and focus on exploiting the cache to reduce power consumption. The work in [4] presented an architecture-oriented power minimization approach, in which a power and performance simulation tool is used to carry out architecture-level optimizations. A framework for describing the power behavior of system-level designs was proposed in [5].
The availability of high performance application cores for the System on Chip (SoC) devices that make up these systems is an important part of the electronics market and has attracted significant research interest. Moreover, in order to maximise the operating time provided by the battery and to satisfy the real-time constraint, we need high-level IP modeling, so that performance, low-power, battery and real-time constraints can all be met. Several energy estimation tools and monitoring techniques exist. Energy simulators such as Wattch [6] and SimplePower [7] estimate the energy consumption in "reasonable" time [8].
At the RTL level, we can mention DSP-PP [9], a simulation tool for estimating the power dissipated by DSPs. It is composed of two components: the cycle-level performance simulator (CPS) and the power dissipation estimator (PDE). It is written in C++, which makes it possible to consider abstract models. The components of the DSP are modeled as objects that integrate the consumption model. DSP-PP performs a cycle-level simulation of all the DSP's components, including the data paths and the interconnect, and estimates the dynamic and short-circuit power of each component of the DSP.
Representative work on measurement-based estimation techniques includes SES [10] and PowerScope [11]. SES is an energy-monitoring tool that collects energy consumption data at cycle-by-cycle resolution and maps the collected data to the program structure. PowerScope [11] is based on hardware instrumentation, using a digital multimeter with support from the embedded operating system. PowerScope is therefore applicable to ordinary embedded systems. ePRO [8] employs the measurement-based estimation techniques used in SES and PowerScope. However, ePRO is distinct from SES because it does not need any extra hardware module, such as the profile acquisition module in SES.
At the algorithmic level, we can mention SoftExplorer [12], a tool that can estimate the power and energy consumption of an algorithm directly from the C program or from the assembly code. Estimation is based on a power model of the targeted processor, obtained through the FLPA methodology for some Texas Instruments DSPs. The FLPA methodology makes it possible to establish a high-level consumption model of a given processor. In this approach, the architecture of the processor is decomposed into various independent functional blocks. Each of these blocks is stimulated separately by a scenario of assembly instructions so as to obtain the consumption model.
In addition, the increasing importance of the software/hardware part in these embedded systems requires the analysis of consumption at a high level of the design. Researchers have started investigating system-level tradeoffs and optimizations whose effects transcend individual component boundaries. Techniques for the synthesis of multiprocessor system architectures and heterogeneous distributed HW/SW architectures for real-time specifications were presented in [13]. These approaches either assume that all tasks are pre-characterized with respect to all possible implementations for delay and power consumption, or assume a significantly simplified power dissipation model. In [14], separate executions of an instruction set simulator (ISS) based software power estimator and a gate-level hardware power estimator were used to drive the exploration of tradeoffs in an embedded processor with a memory hierarchy and to study HW/SW partitioning tradeoffs.
The objective of this work is to define an approach for consumption estimation at the system level. For this reason, we do not propose consumption models of the target architecture but a model of the algorithms themselves. Parametric models linking consumption (power and energy) to the architectural aspects of the target (DSPs) have been used. Here, we do not apply this method to the architecture but directly to the algorithm in order to characterize its consumption.

MATERIALS AND METHODS
The estimation methodology (Fig. 1) is based on functional analysis: the FLPA methodology allows us to develop a parametric model that represents the consumption behaviour of a target. This functional analysis is composed of three steps:
* Functional analysis [12] determines the relevant parameters to take into account in the power model.
* Characterisation quantifies the output variations caused by each parameter, either by measurements on the board or by low-level simulation. Each parameter that does not significantly affect the characteristics is then discarded.
* The general model is established according to the remaining parameters.
The methodology transposed to the application is composed of the same three steps (Fig. 2b). We can thus take the algorithmic specifications into account in order to appraise the consumption at the algorithmic level as the application parameters vary. This work uses the models of three Texas Instruments DSPs (C6201, C5510 and C6701) integrated in the SoftExplorer tool [12]. These models were built with the FLPA functional analysis methodology.
This methodology starts with the extraction of the algorithmic, architectural and technological parameters that have a direct influence on the consumption of the application (image size, resolution, number of images per second, computing precision, target, frequency). The following stage consists in extracting the variation of consumption with each extracted parameter, through estimates or measurements driven by scenarios. Finally, the consumption laws are formulated mathematically as functions of these parameters. The established models can then be compared with measurements on a DSP board in order to assess their precision as a function of the application parameters.
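The last step, formulating a consumption law from characterisation points, can be illustrated with a minimal sketch. It assumes, purely for illustration, that the law is linear in a single parameter and fits it by ordinary least squares. The sample values and the function names `fit_linear_law` and `max_relative_error` are hypothetical; they are not taken from SoftExplorer or from the measurements.

```python
def fit_linear_law(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def max_relative_error(xs, ys, a, b):
    """Largest relative gap between the fitted law and the samples."""
    return max(abs((a * x + b) - y) / abs(y) for x, y in zip(xs, ys))


# Hypothetical characterisation points: clock frequency (MHz) vs. power (W).
freqs_mhz = [30, 60, 90, 120]
power_w = [0.5, 0.8, 1.1, 1.4]

a, b = fit_linear_law(freqs_mhz, power_w)
error = max_relative_error(freqs_mhz, power_w, a, b)
print(f"P(f) = {a:.4f} * f + {b:.4f}, max relative error = {error:.2%}")
```

The same relative-error measure can be used to compare a fitted law with board measurements, provided the measurement points are available.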
This transposed methodology is applied to the study and modeling of the various MPEG2 blocks. The MPEG2 encoder is based on several tasks:
* The computation of the motion vector,
* The displacement of the data of image N using the motion vector to build the predicted image,
* The comparison between the current image and the predicted one, and the emission of the vector and the prediction error.
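To make these three tasks concrete, here is a minimal sketch of motion estimation by full-search block matching with a sum of absolute differences (SAD) criterion, followed by the prediction error. This is a generic textbook technique chosen for illustration; the search strategy, the function names and the tiny test frames are assumptions and do not describe the C encoder characterised in this work.

```python
def get_block(frame, top, left, size):
    """Extract a size x size block from a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_motion_vector(ref, cur, top, left, size, search):
    """Full search: return ((dy, dx), cost) minimising SAD within +/- search."""
    height, width = len(ref), len(ref[0])
    target = get_block(cur, top, left, size)
    best = (0, 0)
    best_cost = sad(get_block(ref, top, left, size), target)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the reference image
            cost = sad(get_block(ref, y, x, size), target)
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best, best_cost


def prediction_error(ref, cur, top, left, size, vector):
    """Difference between the current block and its motion-compensated prediction."""
    dy, dx = vector
    predicted = get_block(ref, top + dy, left + dx, size)
    actual = get_block(cur, top, left, size)
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]


# Hypothetical 8x8 frames: the current image is the reference shifted right by one pixel.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 1] if x > 0 else 0 for x in range(8)] for y in range(8)]

vector, cost = best_motion_vector(ref, cur, top=2, left=2, size=4, search=2)
residual = prediction_error(ref, cur, 2, 2, 4, vector)
print(vector, cost)
```

The nested search loops are the reason motion estimation tends to dominate execution time in encoders of this kind.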
In this work, we appraise the models proposed for the MPEG2 modules (Quantization, DCT, IDCT, Motion Estimation, Prediction…), which are written in C, as functions of the parameters that influence consumption. Through this study and thanks to SoftExplorer, a consumption model is established according to these parameters (Fig. 3). The application parameters considered are algorithmic (the frame size and the chrominance) and architectural (the clock frequency, from 30 MHz up to Fmax of the DSP, and the type of DSP used: C6201, C6701 or C5510). The model is deduced by varying these parameters and by exploiting the estimates given by SoftExplorer for various frequencies and targets. On the basis of these estimates and of measurements on boards, the variations of consumption and execution time are deduced for many frequency values. We expect that this information will provide a basis for program optimisation for low power, which becomes increasingly important in embedded systems design.

MODELS
The consumption models of the MPEG2 modules (time, power and energy) are established for the three processors; they are given in Table 1 for the C6701 processor running in mapped mode. These models are given for a frame of 8*8 pixels. The modules are: DCT, Inverse DCT, Intra and Non-Intra Quantization, Inverse Intra and Non-Intra Quantization, and VLC (Variable Length Coder). Other important modules, such as motion estimation and prediction, are also modeled, but as functions of the image size, chrominance and frequency. The error of the models relative to SoftExplorer is less than 5.5%. Moreover, the maximum error between the measurements and the model does not exceed 15% in and 7.5% in power. The execution time is inversely proportional to the frequency. The energy model is quasi-invariant for a fixed frame size even when the frequency changes, in particular at high frequencies. The motion and prediction models are established using the height, the width, the chrominance and the frequency. For motion estimation, the execution time model varies linearly with the size.
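The two observations above (execution time inversely proportional to frequency, energy nearly constant at high frequency) can be reproduced with a short sketch. It assumes a fixed cycle count for the module and, as a simplifying assumption that is not one of the Table 1 models, a power that is linear in frequency with a constant term. All numerical values and function names are hypothetical.

```python
def execution_time_s(cycles, freq_hz):
    """Time for a fixed number of cycles: inversely proportional to frequency."""
    return cycles / freq_hz


def power_w(freq_hz, slope_w_per_hz, static_w):
    """Assumed power law: frequency-proportional part plus a constant part."""
    return slope_w_per_hz * freq_hz + static_w


def energy_j(cycles, freq_hz, slope_w_per_hz, static_w):
    """Energy = power * time = cycles * (slope + static / f)."""
    return power_w(freq_hz, slope_w_per_hz, static_w) * execution_time_s(cycles, freq_hz)


# Hypothetical values for one 8*8 block.
CYCLES = 1000
SLOPE_W_PER_HZ = 1e-8   # 1 W per 100 MHz
STATIC_W = 0.1

for freq_mhz in (30, 60, 100, 150):
    f = freq_mhz * 1e6
    t = execution_time_s(CYCLES, f)
    e = energy_j(CYCLES, f, SLOPE_W_PER_HZ, STATIC_W)
    print(f"{freq_mhz:4d} MHz: time = {t * 1e6:.2f} us, energy = {e * 1e6:.2f} uJ")
```

Under this assumption the energy is `cycles * (slope + static / f)`, so the frequency-dependent term shrinks as the clock rises, which is consistent with an energy that is quasi-invariant at high frequency.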
The chrominance has an important impact on the consumption. The designer should therefore minimise the chrominance resolution, which influences the size of the exchanged data, the image quality and the energy, especially since human vision is less sensitive to chrominance than to luminance.
Figure 4 illustrates the execution time of the different MPEG2 modules running on the C6701 DSP for a frequency of 100 MHz, a frame size of 128*128 and 4:2:2 chrominance. In this figure, motion estimation, VLC and quantization, which use 74%, 6% and 7% of the time respectively, are the three primary computationally intensive components.
Physical measurements on development boards (C6701) are made in order to check the validity of the model established with SoftExplorer as a function of the parameters of the MPEG2 application. These measurements are obtained with:
* The TI evaluation platform Code Composer,
* The digital oscilloscope, which provides the instantaneous current in the DSP core.
To compare consumption between DSPs, Figure 5 illustrates the energy consumption of the quantization module on the C67, C55 and C62. Based on this figure, the C62, unlike the C55, is not well suited to low-power operation.

High level models
MPEG2 models:
Once all the necessary functions are modeled, the MPEG2 temporal models can be obtained by studying the task graph and accumulating the temporal models of each function. The power model is the weighted average power.
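The composition rule can be sketched as follows, assuming the functions of the task graph run sequentially and that the average power is weighted by execution time. The function name `aggregate` and the numerical values are hypothetical.

```python
def aggregate(functions):
    """Combine per-function (time_s, power_w) models.

    Returns (total_time_s, average_power_w, energy_j), where the average
    power is weighted by the execution time of each function.
    """
    total_time = sum(t for t, _ in functions)
    energy = sum(t * p for t, p in functions)
    return total_time, energy / total_time, energy


# Hypothetical per-function models: (execution time in s, power in W).
modules = [(2.0, 1.0), (1.0, 4.0)]
total_time, average_power, energy = aggregate(modules)
print(total_time, average_power, energy)  # 3.0 2.0 6.0
```

With this weighting, the total energy equals the average power multiplied by the total time, so the three aggregated quantities stay mutually consistent.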
Let us take the case of the processing of a GOP (Group Of Pictures) with 4:2:2 images of size 128*128. Each image contains 256 blocks of size 8*8 and, since the chrominance is generally of type 4:2:2, an additional 128*128 matrix (for Cr and Cb together) must be added to the processing. Table 2 presents the performance and consumption model of MPEG2 running on the C6701 DSP.
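The block count in this example can be checked with a few lines of arithmetic. The sketch below assumes image dimensions that are multiples of the block size and uses the standard chroma subsampling ratios; the function name `blocks_per_frame` is hypothetical.

```python
# Chroma samples (Cb + Cr together) relative to luminance samples, in halves:
# 4:2:0 -> 1/2, 4:2:2 -> 2/2, 4:4:4 -> 4/2.
CHROMA_HALVES = {"4:2:0": 1, "4:2:2": 2, "4:4:4": 4}


def blocks_per_frame(width, height, chroma_format="4:2:2", block=8):
    """Return (luminance blocks, chrominance blocks) for one image."""
    luma = (width // block) * (height // block)
    chroma = luma * CHROMA_HALVES[chroma_format] // 2
    return luma, chroma


luma, chroma = blocks_per_frame(128, 128, "4:2:2")
print(luma, chroma)  # 256 256
```

For 4:2:2, Cb and Cr each hold half as many samples as the luminance, so together they add the equivalent of one more 128*128 matrix, that is, another 256 blocks.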
The MPEG2 temporal models obtained are functions of the image size, the number of GOPs (Groups Of Pictures), the target and the frequency, whereas the power models are functions of the DSP frequency. With this high-level model, we can determine the maximum image size or the required frequency when the system is subject to constraints. Moreover, target architectures and high-level parameters are considered in order to limit the solution space and to select effective, real-time solutions. Furthermore, the availability of such a model is essential in developing a systematic optimization framework that considers different optimization criteria, such as execution time and energy consumption, at the same time.
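As a minimal sketch of how such a model can be used under a real-time constraint, the code below assumes that the processing cost is expressed as a cycle count, so that execution time is the cycle count divided by the clock frequency. The function names and numerical values are hypothetical and do not come from Table 2.

```python
def min_frequency_hz(cycles_per_frame, frames_per_second):
    """Lowest clock frequency that processes every frame within its period."""
    return cycles_per_frame * frames_per_second


def max_blocks_per_frame(freq_hz, cycles_per_block, frames_per_second):
    """Largest number of 8*8 blocks per frame that still meets the frame rate."""
    return freq_hz // (cycles_per_block * frames_per_second)


# Hypothetical figures: 4 million cycles per frame at 25 frames per second.
print(min_frequency_hz(4_000_000, 25))            # 100000000 (100 MHz)

# Hypothetical figures: 100 MHz clock, 2000 cycles per block, 25 frames per second.
print(max_blocks_per_frame(100_000_000, 2000, 25))  # 2000
```

If the required frequency exceeds the Fmax of a candidate DSP, either the image size must be reduced or another target chosen.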

Video standards models:
The MPEG2 model is established according to the image size, the number of GOPs, the target and the frequency. A general high-level consumption model will be established by changing the granularity of the modeling (from blocks of 8*8 pixels to the PAL, SECAM and NTSC standards). These models will be based on the MPEG2 models (Table 3).
Through this methodology, we will be able to estimate the performance and consumption of a video standard at a high level of abstraction and to choose the standard that respects the system constraints (real time, consumption, cost…). This will also allow us to choose the best frequency for the application. With a higher level of abstraction, the number of objects in the design decreases. This allows the designer and the tools to focus on the critical aspects and to explore a larger part of the design space without being overwhelmed by unnecessary details.
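The change of granularity can be sketched by counting how many 8*8 blocks per second each standard implies. The frame sizes and frame rates below are the usual digital standard-definition values and are an assumption, since the text does not state the formats used; the energy per block is a hypothetical figure, and the sketch covers only block-based modules, not the image-level motion estimation and prediction models.

```python
from fractions import Fraction

# Assumed digital formats: (width, height, frames per second).
STANDARDS = {
    "PAL": (720, 576, Fraction(25)),
    "SECAM": (720, 576, Fraction(25)),
    "NTSC": (720, 480, Fraction(30000, 1001)),
}


def luma_blocks_per_frame(width, height, block=8):
    """Number of luminance blocks in one frame."""
    return (width // block) * (height // block)


def blocks_per_second(standard, chroma_blocks_per_luma_block=1):
    """Blocks to process per second; 4:2:2 adds one chroma block per luma block."""
    width, height, fps = STANDARDS[standard]
    luma = luma_blocks_per_frame(width, height)
    return luma * (1 + chroma_blocks_per_luma_block) * fps


def energy_per_second_j(standard, energy_per_block_j):
    """Scale a per-block energy model up to one second of video."""
    return float(blocks_per_second(standard)) * energy_per_block_j


for name in STANDARDS:
    # 10 uJ per block is a placeholder, not a measured value.
    print(name, float(blocks_per_second(name)), energy_per_second_j(name, 10e-6))
```

Under these assumptions the PAL and NTSC block rates are nearly equal, because the smaller NTSC frame is offset by its higher frame rate.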

CONCLUSION
Through this work, we have shown the interest and the feasibility of modeling MPEG2 consumption at a high level. The proposed models and environment make it possible to estimate the consumption of the application at the system level as a function of the frequency and the video parameters. A reliable high-level estimate is necessary: it allows the designer not only to choose the processor best suited to the application but also to tune its parameters according to the constraints.
Such models can be used, for example, by an operating system, which could choose the parameters of the algorithm so as to respect the consumption constraints according to the context. We would thus have an approach to power and energy management at the algorithmic level that enables adaptive control. The estimates were fitted to curves with higher-level parameters. Using these curves, we can then estimate, for example, power as a function of image size. This higher-level modeling could be used in design exploration. A vendor of IP (intellectual property) could provide these higher-level models to the customer.
This work opens new possibilities for taking consumption into consideration in the application design flow. New dimensions can thus be added to solution selection, namely the guarantee of QoS (Quality of Service). Design space exploration makes it possible to analyze the possible solutions and to deduce the optimal one according to a cost function: performance, area and power. The main design parameters, such as the application specification and its constraints, will thus be respected.

Fig. 3: MPEG2 Encoder

Table 2: High level consumption models of the MPEG2 running on