A CASE FOR HYBRID INSTRUCTION ENCODING FOR REDUCING CODE SIZE IN EMBEDDED SYSTEM-ON-CHIPS BASED ON RISC PROCESSOR CORES

Embedded computing differs from general purpose computing in several aspects. In most embedded systems, size, cost and power consumption are more important than performance. In embedded System-on-Chips (SoC), memory is a scarce resource and it poses constraints on chip space, cost and power consumption. Whereas the fixed instruction length of the RISC architecture simplifies instruction decoding and pipeline implementation, its undesirable side effect is an increase in code size caused by a large number of unused bits. Code size reduction minimizes memory size, chip space and power consumption, all of which are significant for low power portable embedded systems. Though code size reduction has drawn the attention of architects and developers, the solutions currently used are more of a cure than a prevention. Considering the huge number of embedded applications, there is a need for a dedicated processor optimized for low power and portable embedded systems. In this study, we propose a variation of Hybrid Instruction Encoding (HIE) for embedded processors. Our scheme uses a fixed number of instruction lengths with provision for hybrid sizes for the offset and immediate fields, thereby reducing the number of unused bits. We simulated the HIE for MIPS32 processors and measured the code sizes of various embedded applications from the MiBench and MediaBench benchmarks using a newly developed offline tool. We observed code size reductions of up to 27% for large and medium-sized embedded applications. This corresponds to a reduction in on-chip memory capacity of up to 1 megabyte, which is very significant for SoC based embedded applications. Considering the large market share of embedded systems, it is worth investing in a new architecture and in the development of dedicated HIE-RISC processor cores for portable embedded systems based on SoCs.


INTRODUCTION
An embedded system is not a general purpose computer. Instead, it is a preprogrammed system that performs one or more dedicated functions. In most embedded systems, size, cost and power consumption are more critical than performance (Hennessy and Patterson, 2012). A large number of embedded systems, such as cellular phones, cameras and toys, are portable and battery operated, and their design is based on a System-on-a-Chip (SoC). As applications become increasingly complex, code memory consumes a large portion of the area in SoC architectures. Apart from increased chip space and cost, power consumption also increases due to larger code memories. Hence minimizing code size is an essential requirement in Battery Operated Portable Embedded Systems (BOPES). In this study, we deal with the reduction of code size at the processor Instruction Set Architecture (ISA) level so that the code generated by the compiler is shorter.

JCS
RISC processors such as ARM and MIPS are widely used in embedded SoCs due to the high performance offered by the RISC architecture. The Fixed Instruction Encoding (FIE) of RISC processors simplifies instruction decoding and pipeline design (Hennessy and Patterson, 2012). But the FIE increases the code size, as some fields are either unused or underutilized in several instructions. In embedded SoCs, the code memory is integrated with the processor and the other system hardware on a single chip (Fisher et al., 2005). This limits the space available for the application memory in SoC architectures. Although embedded systems typically cost far less than desktop computers, several billion embedded SoCs are sold annually compared to a few hundred million desktop processors (Vahid and Givargis, 2006). Our paper proposes replacing the 'uniform instruction size' feature with a 'hybrid instruction size' feature in the embedded RISC cores used in BOPES so as to reduce the code memory space for embedded programs.
The main contributions of this work can be summarized as follows. The study proposes replacing FIE with Hybrid Instruction Encoding (HIE) through two modifications to the RISC architecture: multiple instruction sizes and hybrid lengths for the offset and immediate fields. To evaluate this proposal, we designed an HIE-ISA for the MIPS processor as a modification to the MIPS32 ISA and developed an offline tool that converts object codes from the MIPS ISA to the HIE-ISA. This tool measures the code size savings for embedded applications in the MiBench and MediaBench benchmark suites.

RISC Instructions and Code Density
RISC processors generally have three types of instructions: ALU, load/store, and branch/jump (Hennessy and Patterson, 2012). Figure 1 summarizes the basic formats of MIPS32 integer instructions (other than floating-point instructions) with examples. All instructions are 32 bits long and the most significant 6 bits contain the opcode. In the I-type and J-type instructions, the opcode itself indicates the exact operation. In the R-type instructions, the op field identifies the instruction type and the fn field (the least significant bits, 0-5) indicates the exact operation. The R-type is for register-to-register operations. The I-type is for data transfers, branches and immediate operations. In load/store instructions, the offset field is added to the contents of the rs register, usually an address, to form the effective address of one of the operands, either the source or the destination. The major drawbacks of RISC instruction formats that cause increased code size are listed below, using MIPS32 as an example.
Five bits are unused in most R-type instructions as illustrated in Fig. 2 for the and instruction.
In most immediate type instructions, 8 bits are sufficient for the operand and the remaining 8 bits are redundant. Figures 3-6 illustrate the four different cases of immediate field patterns; in only one of these cases are both bytes of the immediate field non-zero.
In branch instructions such as beq, the offset field is underutilized in those cases where the offset required can be specified in 8 bits.
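The three drawbacks listed above can be checked directly on instruction words. The following is a minimal sketch, not part of the original tool: the function names are hypothetical, and it only relies on the standard MIPS32 field positions (op in bits 31-26, rs, rt, rd and sa in 5-bit fields, fn in bits 5-0, immediate/offset in bits 15-0).

```python
def decode_fields(word):
    """Split a 32-bit MIPS32 instruction word into its standard fields."""
    return {
        "op": (word >> 26) & 0x3F,   # opcode, bits 31-26
        "rs": (word >> 21) & 0x1F,   # source register
        "rt": (word >> 16) & 0x1F,   # target register
        "rd": (word >> 11) & 0x1F,   # destination register (R-type)
        "sa": (word >> 6) & 0x1F,    # shift amount (R-type)
        "fn": word & 0x3F,           # function code (R-type)
        "imm": word & 0xFFFF,        # immediate/offset (I-type)
    }


def immediate_bytes_used(word):
    """Count the non-zero bytes in the 16-bit immediate/offset field."""
    imm = word & 0xFFFF
    return int((imm >> 8) != 0) + int((imm & 0xFF) != 0)


# and $t0, $t1, $t2: the 5-bit sa field carries no information (all zero).
AND_T0_T1_T2 = 0x012A4024
# addiu $sp, $sp, 8: only the low byte of the immediate is non-zero.
ADDIU_SP_8 = 0x27BD0008

if __name__ == "__main__":
    print(decode_fields(AND_T0_T1_T2))
    print(immediate_bytes_used(ADDIU_SP_8))
```

For the and instruction the sa field decodes to zero, and for the addiu example only one of the two immediate bytes is used, which is the kind of redundancy the list describes.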

Related Work
There have been significant efforts at the system design level to compensate for the code size increase caused by the FIE. Several techniques (Heikkinen et al., 2009) have been implemented to minimize the object code. These are classified into three types (Xie et al., 2006): offline code compression, compiler techniques and ISA modification. The first two techniques retain the original ISA but require software/hardware additions by the system developers, whereas the third technique involves supporting a new instruction set that is a subset of the original ISA.
In ISA modification cases, such as ARM Thumb and MIPS16, the original ISA is modified with shorter instructions, a limited instruction set, smaller operand fields and fewer GPRs. This reduces code size by 30 to 40%, but reduces performance by 15 to 20% (Bonny and Henkel, 2008) and also requires a decoder and de-compressor inside the processor to support both ISAs. The other drawback of this approach (Benini et al., 2004) is the performance penalty caused by the lack of several instructions in the dense instruction set. A variation of this approach is used by microMIPS (ITGPLC, 2009; 2010), a recent addition to the MIPS architecture. It offers a new ISA that supports both 16-bit and 32-bit instructions in a single program. However, its new instructions have restrictions on using certain registers. Some of the 16-bit microMIPS instructions can access only 8 of the 32 GPRs. The RISC-V project at the University of California, Berkeley is somewhat similar to the microMIPS architecture, permitting 32-bit base instructions and 16-bit extensions of compressed instructions. It hopes to achieve up to 30% savings in static and dynamic memory space. Though the researchers term it variable instruction decoding, it offers a two instruction length feature similar to microMIPS.

Fig. 6. Format of the addiu instruction with both bytes of the immediate field containing non-zero values

A mixed approach is also followed (Bonny and Henkel, 2008) by re-encoding unused bits in the instruction format for a specific application, using the Huffman coding algorithm. The compressed code and the decoding table are stored in the code memory. During execution of the program, a hardware decoder external to the processor decodes the compressed instructions.
This study presents an architectural solution that is application independent and recommends fixing the length of each instruction at 1, 2, 3 or 4 bytes instead of a uniform size of 4 bytes. Though compiler and processor modifications are required to existing RISC architectures, these are one-time efforts by the processor manufacturers and compiler developers, and there is no burden on embedded system developers as in the other approaches. Also, it is a program independent solution for embedded applications. However, this strategy does not prevent the inclusion of other methods for achieving additional code size reduction for specific applications.
The organization of the study is as follows. Section 2 discusses the behavior of RISC processors for embedded applications and describes the HIE-ISA proposed by us for BOPES as a modification to existing MIPS32 ISA. Section 3 describes the architecture of the offline tool developed by us for static simulation of HIE-ISA and details the experiments carried out with this tool using MiBench and MediaBench applications for comparing the object code sizes for MIPS32 ISA and the proposed HIE-ISA. Section 4 discusses the results. Section 5 presents conclusions.

Behavior of Embedded Applications on RISC Processors
In order to estimate the extent of wastage in RISC object codes, we analyzed the MIPS32 object codes (Patterson and Hennessy, 2008) for the embedded benchmark suites MiBench and MediaBench. MiBench (Guthaus et al., 2001) is a set of benchmark programs in C for six categories of embedded applications: Automotive and Industrial Control, Consumer Devices, Office Automation, Networking, Security and Telecommunications. Table 1 lists the MiBench programs used by us for evaluating the HIE for MIPS32. Typical applications of Automotive and Industrial Control are air bag controllers, engine performance monitors and sensor systems. These benchmarks perform mathematical calculations, bit counting, sorting and image recognition. Typical examples of consumer devices are scanners, digital cameras and Personal Digital Assistants (PDAs).
The consumer benchmarks mainly consist of multimedia applications with representative algorithms for jpeg encoding/decoding, image color format conversion, image dithering, color palette reduction, MP3 encoding/decoding and HTML typesetting. Typical examples of network devices are switches and routers. The work done by the embedded processors in these devices involves shortest path calculations, tree and table backups and data input/output.
The algorithms used in these benchmarks are finding a shortest path in a graph and creating and searching a Patricia trie data structure. There are some benchmarks common to network, security and telecommunication classes. The Telecommunications benchmarks have algorithms for voice encoding/decoding, frequency analysis and checksum calculation. The Office applications are primarily text manipulation algorithms.
The typical examples of office automation are printers, fax machines and word processors. The security benchmarks have algorithms for data encryption, decryption and hashing.
The MediaBench suite (Lee et al., 1997) is composed of multimedia applications. MediaBench 1.0 contains 19 applications collected from image processing, communications and DSP applications. Certain applications such as jpeg and gsm are common to MiBench and MediaBench suites. A short note on the selected applications in MediaBench suite is given in Table 2.
We cross-compiled the MiBench and MediaBench programs on an Intel PC and analyzed the compiler output using our tool MIDACC, an offline code analyzer. Given a MIPS object code, this tool produces the instruction count for each instruction type. It also analyzes the utilization of the offset and immediate fields in the instructions and lists the extent of wastage as a percentage of the total program size. Analysis of MIPS object codes using this tool reveals two interesting behaviors.
Four instructions, addu, addiu, lw and sw, dominate the embedded programs, consuming as much as 60% of the code, as shown in Fig. 7. Applying the 80-20 rule, any technique that improves the density of these four instructions will reduce the code size.
The extent of wastage due to underutilization of the offset and immediate fields varies from 10 to 20% of the code size (Table 3) for the embedded applications.
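The first of these two observations can be reproduced with a few lines of code. The sketch below is an illustration only, not the MIDACC implementation: the function names are hypothetical, the input is assumed to be a list of 32-bit instruction words already extracted from the object code, and only the four dominant instructions are recognized through their standard MIPS32 opcodes (addu: op 0x00 with fn 0x21, addiu: op 0x09, lw: op 0x23, sw: op 0x2B).

```python
from collections import Counter

MAJOR = ("addu", "addiu", "lw", "sw")


def classify(word):
    """Name the instruction if it is one of the four dominant ones."""
    op = (word >> 26) & 0x3F
    fn = word & 0x3F
    if op == 0x00 and fn == 0x21:
        return "addu"
    if op == 0x09:
        return "addiu"
    if op == 0x23:
        return "lw"
    if op == 0x2B:
        return "sw"
    return "other"


def major_share(words):
    """Percentage of instructions that are addu, addiu, lw or sw."""
    if not words:
        return 0.0
    counts = Counter(classify(w) for w in words)
    return 100.0 * sum(counts[m] for m in MAJOR) / len(words)


# A typical function prologue/epilogue fragment.
SAMPLE = [
    0x27BDFFE0,  # addiu $sp, $sp, -32
    0xAFBF001C,  # sw    $ra, 28($sp)
    0x00851021,  # addu  $v0, $a0, $a1
    0x8FBF001C,  # lw    $ra, 28($sp)
    0x03E00008,  # jr    $ra
]

if __name__ == "__main__":
    print(major_share(SAMPLE))
```

Because all MIPS32 instructions have the same length, the instruction share computed here is also the share of code bytes.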
The largest program among the Automotive applications of the MiBench suite is susan, which occupies 51,000 bytes of memory. It is an image recognition package used for a vision based quality assurance application. Figure 8 shows that in susan, the immediate/offset field is fully used in only 80% of the cases. This amounts to a wastage of 10,200 bytes, i.e., 20% of the code memory. Our proposed HIE-ISA for the MIPS processor reduces the overall code size of susan and mpeg2 by 27 and 21% respectively, due to the hybrid instruction length feature and the hybrid length provision for the offset and immediate fields. Though the HIE-ISA does not eliminate the wastage totally, it minimizes the wastage to a major extent. The extent of code size reduction achieved with HIE-ISA is also indicated in Table 3.

Table 2. Selected applications in the MediaBench suite
…: A program for public key encryption and authentication
EPIC: An experimental image compression utility. The compression algorithms are based on a bi-orthogonal critically sampled dyadic wavelet decomposition and a combined run-length/Huffman entropy coder. The filters have been designed to allow extremely fast decoding without floating-point hardware
ADPCM: Adaptive differential pulse code modulation is one of the simplest and oldest forms of audio coding

HIE-Methodology For MIPS32
Our goal for the HIE-ISA is to minimize unused fields within instructions and to improve the utilization of the offset and immediate fields. Based on our analysis of all 66 integer instructions of the MIPS32 ISA, we settled on 8 different types of instructions for the HIE-MIPS ISA.

HIE-MIPS Instructions
For the HIE-ISA, we decided on four sizes for the integer instructions: three 8-bit, seven 16-bit, twenty-one 24-bit and three 32-bit instructions, plus thirty-two instructions with three length options (16/24/32 bits). Figure 9 shows the proposed instruction formats for HIE-MIPS. To evaluate the effectiveness of our proposed HIE-ISA, we modeled it for the MIPS32 ISA. Basically, for every integer instruction of the MIPS32 ISA, we provide an equivalent HIE-ISA instruction. Out of the 66 integer instructions, j, jal and break are retained as 32 bits as in the MIPS32 ISA. The remaining instructions are translated into one of the 8 types. In several ALU instructions, there are five unused zero bits. If three more bits are made free, these instructions can be reduced to 24 bits. Hence we reduced the register fields by 1 bit each. This restricts the number of GPRs to 16; however, it will not strain the compiler, as the graph coloring technique for register allocation works satisfactorily with 16 GPRs (Hennessy and Patterson, 2012). Popular RISC processors such as ARM and SH4 have only 16 registers. In addition to reducing the length of the register fields, the shift amount (sa) field (used in the shift instructions) is reduced by 1 bit. The nop, rfe and syscall instructions are 8-bit instructions with a common opcode and a 2-bit iid field to identify the instruction. The 16-bit instructions are jr, mfhi, mflo, mthi, mtlo, mfcz and mtcz. In mfcz and mtcz, the rd field is retained as 5 bits since it refers to coprocessor registers. The iid bit differentiates between mfcz and mtcz. The mfhi, mflo, mthi and mtlo instructions have a common format in which the register field is shared between rd and rs: the rd/rs field denotes rd for mfhi and mflo, and rs for mthi and mtlo.
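The bit budget behind the 24-bit format works out as 32 - 5 (unused bits) - 3 (one bit from each of the three register fields) = 24. The sketch below packs such an instruction into three bytes. It is an illustration under stated assumptions: the field order (op, rs, rt, rd, fn) and the retention of 6-bit op and fn fields are hypothetical, since the actual layout is defined by Fig. 9, and the function name is not from the original work.

```python
def pack_rtype24(op, rs, rt, rd, fn):
    """Pack a register-to-register instruction into 24 bits.

    Assumed layout (hypothetical): op(6) | rs(4) | rt(4) | rd(4) | fn(6).
    Register numbers must fit in 4 bits, i.e. only 16 GPRs are addressable.
    """
    for name, reg in (("rs", rs), ("rt", rt), ("rd", rd)):
        if not 0 <= reg < 16:
            raise ValueError(f"{name}={reg} does not fit in a 4-bit field")
    word = (((op & 0x3F) << 18) | (rs << 14) | (rt << 10)
            | (rd << 6) | (fn & 0x3F))
    return word.to_bytes(3, "big")


if __name__ == "__main__":
    # and $8, $9, $10 (op = 0, fn = 0x24) now occupies 3 bytes instead of 4.
    encoded = pack_rtype24(0x00, 9, 10, 8, 0x24)
    print(encoded.hex(), len(encoded))
```

A register number of 16 or above is rejected, which mirrors the restriction to 16 GPRs that the compiler's register allocator would have to respect.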
The 24-bit instructions, which form three different R-types, are add, addu, and, div, divu, mult, multu, nor, or, sll, sllv, sra, srav, srl, srlv, sub, subu, xor, slt, sltu and jalr. In type1, there is no sa field. In type2, there is no rs field. In type3, there are four zeroes to maintain byte alignment. The remaining 32 instructions have three length options: 16, 24 or 32 bits. The offset and immediate fields are encoded in a unique way in our proposal. If the value of the offset/immediate is zero, the field is omitted. When one of the bytes in the offset/immediate is zero, that byte is omitted and the hybrid length identifier hl is formed. Table 4 shows a typical example using hexadecimal notation. All four cases have a common opcode.
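The hybrid-length rule for the offset and immediate fields can be sketched as follows. This is an illustration only: the numeric hl values (0 to 3) and the function names are hypothetical, since the actual encoding is given in Table 4, and the 16-bit base length is inferred from the three length options stated in the text.

```python
def encode_hybrid_immediate(imm16):
    """Return (hl, payload) for a 16-bit immediate/offset value.

    hl is a hypothetical identifier for the four cases:
      0: both bytes zero, field omitted
      1: high byte zero, only the low byte is stored
      2: low byte zero, only the high byte is stored
      3: both bytes non-zero, both are stored
    """
    hi = (imm16 >> 8) & 0xFF
    lo = imm16 & 0xFF
    if hi == 0 and lo == 0:
        return 0, b""
    if hi == 0:
        return 1, bytes([lo])
    if lo == 0:
        return 2, bytes([hi])
    return 3, bytes([hi, lo])


def hybrid_length_bits(imm16):
    """Instruction length: a 16-bit base plus 8 bits per stored byte."""
    _, payload = encode_hybrid_immediate(imm16)
    return 16 + 8 * len(payload)


if __name__ == "__main__":
    for value in (0x0000, 0x0008, 0x1200, 0x1234):
        print(hex(value), hybrid_length_bits(value))
```

With this rule, only immediates with two non-zero bytes keep the full 32-bit length; every other case saves one or two bytes per instruction.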

Mapping MIPS32 ISA to HIE-MIPS
MIPS instructions are converted into new HIE instructions of 8 different types and the conversion depends on the opcode and the immediate/offset fields. Table 5 indicates the length of each converted instruction. All unconverted instructions are retained as 32 bits.

HIE-Simulator Tool-MIDACC
We developed a standalone software tool for simulating the HIE for MIPS32 and measuring the code size reduction. Since we need to simulate a new ISA, using any existing simulator for the HIE-MIPS would be a complex process. Our objective is not to execute any program, but to measure the static code sizes of HIE-MIPS for various embedded applications and compare them with the static code sizes of MIPS32 for the same applications. Hence we decided to develop a simple offline tool that converts MIPS32 object codes into HIE-MIPS object codes. We built the tool, MIPS Instruction Distribution Analyser And Code Converter (MIDACC), in C#, with twin functions: code analysis and code compression. The first module analyzes a given MIPS32 object code and identifies the distribution of the 66 integer instructions in it. This module also provides details on the utilization of the immediate and offset fields by different instructions in the application programs (Table 3 and Fig. 8). The second module is a code converter that converts each instruction in the object code from the MIPS32 ISA to the HIE-MIPS ISA, as per the HIE-MIPS methodology. The integer instructions of MIPS32 are converted into nine groups in the HIE-ISA (Table 5). The software tool was developed under Windows XP on an Intel PC, occupies 2 MB of memory and runs on .NET Framework 3.5.
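The static size comparison performed by the converter can be approximated as below. This is a simplified sketch, not the MIDACC code (which was written in C#): it works on already disassembled (mnemonic, immediate) pairs, the names FIXED_BITS, hie_bits and reduction_percent are hypothetical, and every mnemonic not in the fixed-length lists from the HIE-MIPS Instructions section is assumed to belong to the hybrid 16/24/32-bit group.

```python
# Fixed instruction lengths in bits, taken from the lists in the text.
FIXED_BITS = {}
FIXED_BITS.update(dict.fromkeys(("nop", "rfe", "syscall"), 8))
FIXED_BITS.update(dict.fromkeys(
    ("jr", "mfhi", "mflo", "mthi", "mtlo", "mfcz", "mtcz"), 16))
FIXED_BITS.update(dict.fromkeys(
    ("add", "addu", "and", "div", "divu", "mult", "multu", "nor", "or",
     "sll", "sllv", "sra", "srav", "srl", "srlv", "sub", "subu", "xor",
     "slt", "sltu", "jalr"), 24))
FIXED_BITS.update(dict.fromkeys(("j", "jal", "break"), 32))


def hie_bits(mnemonic, imm16=0):
    """Length in bits of one instruction after conversion to HIE-MIPS."""
    if mnemonic in FIXED_BITS:
        return FIXED_BITS[mnemonic]
    # Hybrid group (assumption): 16-bit base plus one byte per
    # non-zero byte of the immediate/offset.
    nonzero = int((imm16 >> 8) & 0xFF != 0) + int(imm16 & 0xFF != 0)
    return 16 + 8 * nonzero


def reduction_percent(program):
    """Static code size reduction of HIE-MIPS relative to MIPS32."""
    mips_bits = 32 * len(program)
    hie_total = sum(hie_bits(m, imm) for m, imm in program)
    return 100.0 * (mips_bits - hie_total) / mips_bits


PROGRAM = [
    ("addiu", 0xFFE0), ("sw", 0x001C), ("addu", 0),
    ("lw", 0x001C), ("jr", 0), ("nop", 0),
]

if __name__ == "__main__":
    print(round(reduction_percent(PROGRAM), 1))
```

The six-instruction fragment shrinks from 192 to 128 bits in this sketch; real programs contain a different instruction mix, so this figure says nothing about the reductions reported for the benchmarks.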

Estimating WASTIO Percentage
WASTIO refers to the wastage due to unused (underutilized) bits in the immediate and offset fields of the MIPS32 code. The wastage is classified into four types, A, B, C and D, based on the number of immediate/offset bytes that are redundant in the code: type A, 2 bytes wasted; type B, 1 byte wasted (the least significant byte); type C, 1 byte wasted (the most significant byte); and type D, no wastage. The WASTIO percentage is calculated using the formula below:

WASTIO percentage = 100 × (WASTIO/code size)

We observed a varying extent of reduction for the embedded programs, as shown in Fig. 10. Since certain applications contain multiple benchmarks, the figures use geometric means for the reduction percentages.
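The WASTIO classification and formula can be illustrated with a short sketch. The function names are hypothetical, and the input is assumed to be the list of 16-bit immediate/offset values found in a program together with its total code size in bytes.

```python
# Bytes wasted per instruction for each WASTIO type.
WASTED_BYTES = {"A": 2, "B": 1, "C": 1, "D": 0}


def wastio_type(imm16):
    """Classify a 16-bit immediate/offset into WASTIO type A, B, C or D."""
    hi = (imm16 >> 8) & 0xFF
    lo = imm16 & 0xFF
    if hi == 0 and lo == 0:
        return "A"  # both bytes redundant
    if lo == 0:
        return "B"  # least significant byte redundant
    if hi == 0:
        return "C"  # most significant byte redundant
    return "D"      # both bytes carry information


def wastio_percentage(immediates, code_size_bytes):
    """WASTIO percentage = 100 x (WASTIO / code size)."""
    wasted = sum(WASTED_BYTES[wastio_type(v)] for v in immediates)
    return 100.0 * wasted / code_size_bytes


if __name__ == "__main__":
    # Four I-type instructions (16 bytes of code), one of each type.
    print(wastio_percentage([0x0000, 0x1200, 0x0008, 0x1234], 16))
```

In this four-instruction example, 4 of the 16 code bytes are redundant, giving a WASTIO percentage of 25.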

DISCUSSION
The Automotive and Consumer applications gain the most from the HIE-ISA; mpeg2 and the office applications gain the least. The other applications get a medium reduction. The Automotive and Industrial Control benchmarks show reductions varying from 21 to 27%. The image recognition program, susan, gets the best reduction and the basicmath program gets the least. Though 65% of the susan code consists of the four major instructions, the poorer result within the Automotive group is due to basicmath, in which the four major instructions form only 33% of the code. The consumer benchmarks get a reasonably good reduction. The jpeg program gets the maximum reduction whereas lame gets the least. The network benchmarks, dijkstra and patricia, get an equal amount of reduction. In both benchmarks, 59% of the code is made up of the four common instructions. All the Telecommunications benchmarks undergo an almost equal extent of reduction. In the office automation benchmarks, the rsynth and stringsearch programs get the maximum reduction and ispell the least. The security benchmarks get a medium reduction. The MediaBench programs also get a medium reduction.
There is a wide variation in the sizes of the benchmark programs. Out of the 23 embedded applications, four are small (≤ 8KB), one is medium (8KB-32KB) and eighteen are large (≥ 32KB). Table 6 summarizes the extent of code size reduction by HIE for the 23 benchmark programs classified according to their sizes. HIE clearly offers a satisfactory extent of code reduction for the majority of the embedded applications. A relationship is found between the code size reduction in HIE-MIPS and two properties of the MIPS32 object codes: one is the proportion of the four major instructions and the other is the percentage underutilization of the immediate and offset fields. This is visible in Fig. 11.
Figure 11 shows that the code size reduction is higher for those programs that have a higher proportion of the four major instructions and a higher underutilization of the immediate and offset fields. This behavior forms the backbone of our HIE methodology. However, some programs show marginally exceptional behavior. For instance, sha has only 42% of the four major instructions and only 18% of its code is wasted due to underutilization of the immediate and offset fields. In spite of this, there is a 23% code size reduction with HIE for sha. This could be due to the increased number of R-type instructions in the MIPS32 code for sha. These instructions have been reduced to 24 bits in the HIE-MIPS code.
The instruction fetch and decode logic needs to manage hybrid instruction lengths and multiple sizes of the offset and immediate fields. These hardware changes do not need much additional space in the processor. Moreover, the reduced number of registers in HIE-RISC saves chip space. The processor itself occupies less area than the on-chip memory in embedded SoCs and hence the HIE reduces the overall chip area for SoCs. The study has estimated the static code size reduction for the HIE based ISA; dynamic simulation remains to be done for evaluating performance and power consumption. A marginal performance reduction can be tolerated for BOPES in view of the savings in chip space and power consumption.

CONCLUSION
In this study, we have proposed a modified Hybrid Instruction Encoding in place of Fixed Instruction Encoding so as to reduce the code memory size in SoCs. We have established that four major instructions dominate the embedded applications, occupying up to 65% of the code, and that up to 20% of the code size is wasted due to underutilization of the offset and immediate fields. This is in addition to the wastage due to unused bits in other fields of the instructions.
An HIE-ISA has been proposed for RISC processors, supporting multiple instruction sizes and four options for the immediate and offset fields. We simulated HIE with four instruction sizes for the MIPS32 processor and the results show a code size reduction of up to 27%. We experimented with twenty-three benchmark programs collected from the MiBench and MediaBench suites, using an offline static simulator developed by us. We noticed that all but two programs were reduced by more than 20%. Among the large programs, two were reduced by more than 25%, two by less than 20% and the remaining 14 by between 20 and 25% in HIE code. Considering the significant extent of savings in code memory and chip space in SoCs, we recommend the development of dedicated HIE-RISC processor cores for the embedded market.