Single Core Hardware Module to Implement Partial Encryption of Compressed Image

Problem statement: Real-time secure image and video communication is challenging because of the processing time and computational requirements of encryption and decryption. To cope with these concerns, innovative image compression and encryption techniques are required. Approach: In this research, we introduce a partial encryption technique for compressed images and implement the algorithm on an Altera FLEX10K FPGA device, which allows for efficient hardware implementation. The compression algorithm decomposes an image into several different parts. A secure encryption algorithm is used to encrypt only the crucial parts, which are considerably smaller than the original image, resulting in a significant reduction in processing time and computational requirements for encryption and decryption. The breadth-first traversal linear lossless quadtree decomposition method is used for the compression and RSA is used for the encryption. Results: Functional simulations were carried out on four different images to verify the functionality of the individual modules and of the system. We have validated the advantage of the proposed approach through comparison, verification and analysis. The design utilized 2928 logic cells (LC) with a system frequency of 13.42 MHz. Conclusion: In this research, the FPGA prototyping of partial encryption of compressed images using lossless quadtree compression and RSA encryption has been successfully implemented with minimum logic cells. It is found that the compression process is faster than the decompression process in the linear quadtree approach. Moreover, the RSA simulations show that the encryption process is faster than the decryption process for all four images tested.


INTRODUCTION
The rapid growth of image and video communication is powered by ever-faster systems demanding greater speed and security. Real-time secure image and video communication is challenging because of the processing time and computational requirements of encryption and decryption. To cope with these concerns, innovative image compression and encryption techniques are required.
Although a vast number of compression and encryption algorithms exist, they have traditionally been developed independently of each other. A partial encryption scheme for images that takes advantage of the image compression algorithm has been proposed by Liu et al. (2011), Cheng and Li (1996) and Cheng (1998). The scheme makes use of a compression algorithm that decomposes an image into several different parts. A secure encryption algorithm is then used to encrypt only the crucial parts, which are considerably smaller than the original image. This results in a significant reduction in processing time and computational requirements for encryption and decryption.
Other researchers have also proposed partial encryption, or combined compression and encryption methods (Liu et al., 2011; Ahmed, 2010; Akter et al., 2008a; 2008b; Reaz et al., 2006a; 2007a; Tho et al., 2004). Dang and Chau (2000) proposed a joint image compression and encryption scheme using the Discrete Wavelet Transform (DWT) and the Data Encryption Standard (DES). Jakobsson et al. (1999) developed a "Scramble All, Encrypt Small" technique that encrypts only a small block of an arbitrarily long message. However, the former is less efficient than a partial encryption scheme and utilizes an encryption algorithm (DES) that is no longer secure. The latter requires an ideal hash function that is hard to realize and may not be suitable for images, as it was designed for data encryption. In another work, Prasad and Kurupati (2010) proposed a combination of Arnold scrambling and DWT for secured image compression. However, Arnold scrambling alone is not sufficient to provide significant security with the implementation of RSA (Wei et al., 2009).
Traditionally, image compression and encryption algorithms have been restricted to the software realm and developed separately. Although software offers ease of update, flexibility and portability, hardware implementation is faster and more physically secure, especially where the security of secret key storage is concerned.
The Field-Programmable Gate Array (FPGA) offers a potential alternative to speed up the hardware realization (Marufuzzaman et al., 2010; Reaz et al., 2007b). From the perspective of computer-aided design, the FPGA comes with the merits of lower cost, higher density and shorter design cycle (Choong et al., 2005). It comprises a wide variety of building blocks. Each block consists of a programmable look-up table and storage registers, and the interconnections among these blocks are programmed through a hardware description language (Reaz et al., 2004a; Reaz et al., 2003). This programmability and simplicity make the FPGA favorable for prototyping digital systems. The FPGA allows users to realize their own logic networks in hardware easily and inexpensively. It also allows the algorithm to be modified easily, and it shortens the design time frame for the hardware (Choong et al., 2006; Ibrahimy et al., 2006).
This study aims to investigate the hardware feasibility and performance of a novel partial encryption scheme for compressed images on an FPGA, using the standard hardware description language VHDL. The use of VHDL for modeling is especially appealing since it provides a formal description of the system and allows the use of specific description styles to cover the different abstraction levels (architectural, register transfer and logic level) employed in the design (Pang et al., 2006; Reaz et al., 2006b). In this method, the problem is first divided into small pieces, each of which can be seen as a submodule in VHDL. Following the software verification of each submodule, the synthesis is then activated. It translates the hardware description language code into an equivalent netlist of digital cells. The synthesis helps integrate the design work and makes it more feasible to explore a far wider range of architectural alternatives 2004b).
The FPGA implementation combines compression and a secure encryption algorithm that encrypts only the crucial parts of the compressed image. The algorithms chosen for implementation are lossless quadtree compression and the RSA algorithm. The hardware implementation was done using an Altera FLEX10KE device.

MATERIALS AND METHODS
The partial encryption scheme depends on a compression algorithm that decomposes the input image into a number of different logical parts. The output consists of parts that provide a significant amount of information about the original image, referred to as the important parts. The remaining parts have little meaning without the important parts and are hence known as the unimportant parts. In this partial encryption approach, only the important parts need to be encrypted by a secure encryption algorithm. When the important parts are considerably smaller than the total output of the compression, the encryption and decryption time can be reduced significantly.
Quadtree compression: The quadtree decomposition method converts an image into a quadtree structure with intensity values attached to the leaf nodes of the tree. The quadtree structure reveals the outline of objects in the original image (Cheng and Li, 2000). Since the quadtree indicates the location and size of each homogeneous block in the image while the intensity values do not reveal much information, partial encryption is possible by encrypting only the quadtree structure. Here, the quadtree structure is the important part whereas the intensity values form the unimportant part. In the case of lossless compression on a b-bit image, the total size of the leaf values is b(3k + 1) bits, where k is the number of internal nodes, which is equivalent to multiplying the size of each leaf value by the number of leaf nodes in the quadtree. An approximate upper bound on the relative quadtree size, which is the ratio of the size of the quadtree to the total size of the compressed image, is given in Eq. 1:

$$\frac{4k+1}{4k+1+b(3k+1)} = \frac{4k+1}{(3b+4)k+b+1} \approx \frac{4}{3b+4} \qquad (1)$$

where size of quadtree = number of nodes = 4k + 1 and size of compressed image = size of quadtree + size of leaf values = 4k + 1 + b(3k + 1).
For 8-bit images, b = 8, so the size of the quadtree relative to the lossless quadtree compression output is at most 14.3%. The approximation is valid for large values of k, which is typically at least 1000 for 256×256 images and greater for larger images. For lossy compression, this calculation is not applicable because a variable number of bits is used to represent the leaf values. Results collected from experiments performed by Cheng (1998) on test images show that, for typical images, the relative quadtree size is between 13 and 27%. Therefore, only 13-27% of the output of the lossy quadtree algorithm is encrypted for typical images.
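The bound quoted above can be checked numerically. The following is a minimal sketch, assuming one bit per quadtree node and b bits per leaf value as stated in the text; the function names are illustrative only and do not come from the original work.

```python
def relative_quadtree_size(b: int, k: int) -> float:
    """Exact ratio of quadtree size to total lossless output (both in bits)."""
    tree_bits = 4 * k + 1          # one bit per node, 4k + 1 nodes
    leaf_bits = b * (3 * k + 1)    # b bits for each of the 3k + 1 leaves
    return tree_bits / (tree_bits + leaf_bits)


def relative_quadtree_bound(b: int) -> float:
    """Limit of the ratio for large k: 4 / (3b + 4)."""
    return 4 / (3 * b + 4)


if __name__ == "__main__":
    for k in (10, 1000, 100000):
        print(k, relative_quadtree_size(8, k))
    print("bound", relative_quadtree_bound(8))
```

For b = 8 the exact ratio stays below the bound for every k and approaches 4/28 as k grows, which is where the 14.3% figure comes from.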
The lossless quadtree compression algorithm with Leaf ordering II has been used in this research, as it is computationally simpler and secure.
Linear lossless quadtree: Representing quadtree in a tree structure requires the use of pointers. However, the amount of space required for pointers from a node to its children is not trivial. Samet (1985) suggested that each node in a quadtree is stored as a record containing six fields. The first five fields contain pointers to the node's parent and its four children labeled as NW, NE, SW and SE; whereas the sixth field describes the intensity value (color) of the image block that the node represents. The pointers would occupy nearly 90% of the memory space required to store the quadtree (Dang and Chau, 2000). As a result, several pointerless quadtree representations have been proposed by researchers such as Lin (1996) and Gargantini (1982).
This research is based on the breadth-first traversal of linear quadtree proposed by Chan and Chang (2001) and Chang et al. (2008). It consists of two lists, i.e., a tree list and a color list. The tree list stores the quadtree structure, where '0' denotes a leaf node and '1' denotes an internal node. The color list simply stores the pixel values of the image in a sequence defined by the tree structure.
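The two-list representation can be illustrated in software. The sketch below is an assumption-laden illustration, not the hardware algorithm of this work: it keeps the root and the single-pixel leaves in the tree list, it appends colors in the order in which leaves are met during the breadth-first traversal, and the function name is hypothetical.

```python
from collections import deque


def encode_linear_quadtree(image):
    """Breadth-first linear quadtree of a square image (side a power of 2).

    Returns (tree_list, color_list): tree_list holds 0 for a leaf and 1 for
    an internal node; color_list holds one value per leaf, in the order the
    leaves are visited (an assumed ordering).
    """
    side = len(image)
    tree_list, color_list = [], []
    queue = deque([(0, 0, side)])              # (top row, left column, block side)
    while queue:
        row, col, size = queue.popleft()
        first = image[row][col]
        homogeneous = all(
            image[i][j] == first
            for i in range(row, row + size)
            for j in range(col, col + size)
        )
        if homogeneous:
            tree_list.append(0)                # leaf node
            color_list.append(first)
        else:
            tree_list.append(1)                # internal node
            half = size // 2
            # Children are queued in NW, NE, SW, SE order.
            queue.extend([
                (row, col, half),
                (row, col + half, half),
                (row + half, col, half),
                (row + half, col + half, half),
            ])
    return tree_list, color_list
```

With k internal nodes, this version produces 4k + 1 tree-list entries and 3k + 1 color-list entries, matching the node counts used in Eq. 1.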

RSA encryption: Since the encrypted part of the proposed partial encryption scheme is preferably small, a public key algorithm has been applied directly to it.
In RSA, a plaintext block M is encrypted to a cipher-text block C by:

$$C = M^{e} \bmod n \qquad (2)$$

and the plaintext block is recovered by:

$$M = C^{d} \bmod n \qquad (3)$$

where (e, n) is the public key and (d, n) is the private key. RSA encryption and decryption are mutual inverses and commutative, due to symmetry in modular arithmetic. Also, Eq. 2-3 show that both encryption and decryption are based on the same operation, which is modular exponentiation. Therefore, hardware implementation of RSA allows the encryption and decryption to share the same architecture, which helps reduce the hardware size.
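The shared modular exponentiation can be sketched in software as follows. This is a minimal illustration using the common binary (square-and-multiply) method, which is not necessarily the method used inside RSA_CORE; the small key (n = 3233, e = 17, d = 2753) is a textbook-sized assumption chosen for readability and offers no security.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Compute (base ** exponent) % modulus by square-and-multiply."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                       # current exponent bit is 1
            result = (result * base) % modulus
        base = (base * base) % modulus         # square for the next bit
        exponent >>= 1
    return result


# Toy key pair (assumed for illustration): n = 61 * 53, e * d = 1 mod 3120.
N, E, D = 3233, 17, 2753

if __name__ == "__main__":
    message = 65
    cipher = mod_exp(message, E, N)            # encryption: C = M^e mod n
    recovered = mod_exp(cipher, D, N)          # decryption: M = C^d mod n
    print(message, cipher, recovered)
```

The same `mod_exp` routine serves both directions; only the exponent changes, which is the property that lets the hardware share one datapath for encryption and decryption.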
VHDL modeling: The VHDL model for the proposed work consists of four sub-modules. The overall implementation is known as the PARTIAL_ENCRYPT chip and it consists of the functional sub-modules Compression, Encryption, Decryption and Decompression.
Linear quadtree compression/decompression: Linear quadtree compression and decompression are implemented in two separate blocks, QT_ENCODER and QT_DECODER respectively. The combination of these two functional blocks is named QT_CODEC. The linear quadtree codec connects both QT_ENCODER and QT_DECODER in parallel and to the memory block RAM256X8. There are four input control signals, i.e., CLK, RESET, GO and E_D. The architectures of QT_CODEC, QT_ENCODER and QT_DECODER are implemented using Moore state machines with asynchronous reset. The reset signal (RESET) is used to set the state machine to its initial idle state, while a high GO signal switches it from idle state to the next state. A low E_D signal activates the QT_ENCODER while a high E_D activates the QT_DECODER. The READY signal is high when the compression operation is completed.
For compression, the input image is scanned in an order where each quadrant is visited in the NW, NE, SW and SE sequence. The input image is stored in the RAM from addresses 00 to 3F (hex) in raster scan order, i.e., from left to right and from top to bottom. For a pixel in an 8×8 image indexed by row I and column J, where I, J = 0, 1, 2, …, 7, its corresponding address in the RAM is expressed by:

Address = (8 × I) + J

For 8×8 input images, the sequence of RAM addresses in the appropriate scan order is expressed in terms of K5, K4, K3, K2, K1 and K0, the individual bit (0 or 1) values of a 6-bit counter that counts from 0 to (111111)₂.
The output of the compression is a tree list that describes the quadtree structure ('0' for a leaf node and '1' for an internal node) and a color list that contains the intensity values of the quadtree. The linear quadtree decompression performed by the QT_DECODER block is simply the reverse of the compression. In linear quadtree compression, if a 2×2 block of an image is homogeneous, it is reduced to one block containing the pixel value; otherwise it is reduced to an 'I' block. The intensity values in an 'I' block are stored in a list. This continues recursively until the 8×8 image is reduced to only one block. The tree list and color list are stored at RAM addresses beginning with 40 (hex) and 80 (hex) respectively.
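One way to obtain a quadrant-by-quadrant scan from a 6-bit counter is to de-interleave its bits into a row and a column index. The sketch below assumes that K5, K3, K1 form the row I and K4, K2, K0 form the column J; this mapping is an assumption made for illustration and is not confirmed by the surviving text.

```python
def scan_address(counter: int) -> int:
    """Map a 6-bit counter value to a raster RAM address for an 8x8 image.

    Assumed bit mapping: row I = (K5 K3 K1), column J = (K4 K2 K0).
    """
    k = [(counter >> bit) & 1 for bit in range(6)]   # k[0] is K0 ... k[5] is K5
    row = (k[5] << 2) | (k[3] << 1) | k[1]
    col = (k[4] << 2) | (k[2] << 1) | k[0]
    return 8 * row + col                             # Address = (8 x I) + J


if __name__ == "__main__":
    order = [scan_address(c) for c in range(64)]
    print([format(a, "02X") for a in order[:16]])
```

Under this assumption, counter values 0 to 15 cover the NW 4×4 quadrant, 16 to 31 the NE quadrant, 32 to 47 the SW quadrant and 48 to 63 the SE quadrant, and the same NW, NE, SW, SE order repeats inside each 2×2 block.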

RSA encryption implementation: The RSA module consists of three sub-modules: RSA_LOAD, RSA_CORE and RSA_OUTPUT. RSA_CORE performs encryption and decryption, RSA_LOAD serially captures the incoming message to be encrypted or decrypted and RSA_OUTPUT serially outputs the decrypted or encrypted message. A RAM of 2048 bits was designed to provide the storage element for the RSA encryption modules.
The Arithmetic Logic Unit accepts 32-bit data as input and produces a 32-bit output. The input data is stored temporarily in a larger register (34 bits). Arithmetic operations are performed on the temporary register. The working result is then moved to the output port when the operation is done. The design uses four large registers (34 bits) to hold the working results and two small registers (5 bits) to hold the loop variables (i, j). The extra 2 bits in the four large registers are used to prevent overflow during addition operations.
After consideration of the trade-off between security and speed, the sizes of the parameters and signals of the RSA_CORE module for the VHDL model are chosen as follows:
• M is the 32-bit plaintext for encryption, or the 32-bit cipher-text for decryption
• E and N_C are the 32-bit public key (e, n) used for encryption, or the 32-bit private key (d, n) used for decryption
• CLK is the clock input signal
• RST sets the state machine implemented in the RSA_CORE architecture to the initial idle state
• GO switches the state machine from the idle state to the next state
• C is the 32-bit cipher-text produced by encryption, or the 32-bit plaintext recovered by decryption
• DONE is high when the encryption or decryption operation is completed, otherwise it is always low

Top level design: The overall design incorporates the RSA_CORE module into the linear quadtree codec. The top level entity is named PARTIAL_ENCRYPT, where a low E_D signal activates the QT_ENCODER block to perform linear quadtree compression on the input image stored in the RAM256X8 block. When compression is completed, the RSA_CORE is activated to encrypt the tree list stored at RAM addresses 80 to 82 (hex). The encrypted tree list is then stored at addresses 88 to 8B. On the other hand, a high E_D signal starts the decryption operation of RSA_CORE on the encrypted tree list to recover the tree list. Decompression is then performed by the QT_DECODER to reconstruct the original image.

Four test images are used as inputs to verify the correctness of the design using functional simulation. All of the test images are grayscale with dimensions 8×8. For clarity, each image is arranged in an 8×8 table, in which the cells correspond to the pixel intensity values (grayscale level or color). The size of each pixel is 8 bits and its value is expressed in 2 hexadecimal digits.

Theoretical results for test image 1 and its quadtree:
The output of linear lossless quadtree compression is a tree list that contains the quadtree nodes and a color list that contains the pixel values of the image. In the tree list, binary '0' denotes a leaf node and '1' denotes an internal node. The results of linear lossless quadtree compression are:
Tree list = 100111000001 (binary) = 9C1 (hex)
Color list = 00 FF 00 FF 00 00 FF 00 00 FF FF 00 FF 00 FF FF FF 00 00
Size of image = 64 × 8 bits = 512 bits
Size of tree list = 12 bits
Size of color list = 152 bits
Compression ratio = Size of image / (Size of tree list + Size of color list)
Fig. 3.
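The sizes listed above can be cross-checked from the tree list alone. The following sketch assumes, as noted in the Discussion, that the tree list omits the root node and the bottommost (single-pixel) leaf nodes, and that the nodes are listed level by level; the function name is hypothetical and the resulting ratio is derived here from the listed sizes, not quoted from the original results.

```python
def expected_color_count(tree_bits: str, image_side: int = 8) -> int:
    """Number of color values implied by a breadth-first tree list.

    Assumes the root (taken to be internal) and the single-pixel leaves
    are not stored in tree_bits.
    """
    position = 0
    nodes_at_level = 4                  # the four children of the omitted root
    block_side = image_side // 2
    colors = 0
    while block_side >= 2:
        level = tree_bits[position:position + nodes_at_level]
        if len(level) != nodes_at_level:
            raise ValueError("tree list is too short")
        position += nodes_at_level
        internal = level.count("1")
        colors += level.count("0")      # each stored leaf carries one color
        if block_side == 2:
            colors += 4 * internal      # a mixed 2x2 block carries 4 pixel values
        nodes_at_level = 4 * internal
        block_side //= 2
    if position != len(tree_bits):
        raise ValueError("tree list has unused bits")
    return colors


if __name__ == "__main__":
    tree = "100111000001"
    n_colors = expected_color_count(tree)
    image_bits = 64 * 8
    ratio = image_bits / (len(tree) + 8 * n_colors)
    print(hex(int(tree, 2)), n_colors, round(ratio, 2))
```

For the tree list 9C1 (hex), this gives 19 color values (152 bits), in agreement with the listed color list, and a compression ratio of 512 / 164, or about 3.12.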

DISCUSSION
During the formulation of theoretical results for test image 1, we have omitted the root node and the bottommost leaf nodes from the tree list in order to achieve a better compression ratio, as the decompression algorithm does not need them. Since decompression is simply the reverse process of compression, its results can be deduced from those of the compression.
Functional simulation of the linear quadtree codec (QT_CODEC) is performed on the four test images with a 20 ns simulation clock period (50 MHz). The time interval between the high GO signal and the high READY signal is divided by the simulation clock period to calculate the processing time for compression or decompression. From the results of functional simulation for linear quadtree compression and decompression shown in Table 1, it is observed that the processing time is longer with a smaller compression ratio and that decompression is faster than compression.
In the functional simulation for partial encryption, the time interval between the high GO signal and the high READY signal is divided by the simulation clock period to calculate the processing time for combined compression and partial encryption, or for partial decryption and decompression; the results are compared in Table 2. It is concluded that partial decryption and decompression is much slower because the decryption time of the RSA_CORE module is twice as long as the encryption time.
Several points from the synthesis results are worth discussing. First, the RSA Core module utilized around 20% of the chosen FLEX 10KE device. Nevertheless, the clock frequency report showed that the critical frequency is only 34.7 MHz. This limits the frequency of the RSA module, even though the serial-to-parallel and parallel-to-serial converters could achieve 133.9 MHz and 89.6 MHz respectively. For the RSA Core module, although 34.7 MHz is acceptable, it is not fast compared with today's FPGA technology. However, the critical frequency can possibly be increased further by optimizing the circuit through place and route of the internal probes. The synthesis of the whole RSA encryption, which included the RAM implementation, took up 554 logic cells (LC). This is about 35% utilization of the chosen device. Lastly, the top level design, which is the PARTIAL_ENCRYPT entity, was synthesized. A total of 2928 LC were used, which is about 58% utilization of the device (Altera EPF10K100EQC208-1). The frequency achieved was 13.42 MHz.

CONCLUSION
In this research project, the FPGA prototyping of a partial encryption algorithm for compressed images that allows for efficient hardware implementation has been carried out. The lossless quadtree compression and RSA encryption algorithms were chosen for implementation because of their computational simplicity in hardware. It is found from the simulation results that, in the linear quadtree approach, the compression process is faster than the decompression process. Moreover, the RSA simulations show that the encryption process is faster than the decryption process for all four images tested.