CARRY SAVE COMMON MULTIPLICAND MONTGOMERY FOR RSA CRYPTOSYSTEM

The RSA public key cryptosystem provides encryption and digital signatures. As key sizes grow, an RSA design that is efficient in terms of area, frequency, throughput and power consumption becomes hard to achieve. In addition, the range of possible attacks has created a need for an attack-resistant RSA cryptosystem. This study presents an RSA design on FPGAs based on the Montgomery powering ladder and a proposed carry save common multiplicand Montgomery modular multiplication. Because the modular exponentiation is based on the Montgomery powering ladder, it is resistant to power attacks. Common multiplicand Montgomery modular multiplication reduces complexity by computing only once the operations that modular squaring and modular multiplication have in common. The proposed carry save common multiplicand Montgomery modular multiplication keeps intermediate results in carry save form and uses DSP slices to convert the redundant results into binary at the end of the modular multiplication. The proposed RSA design implemented on FPGAs is efficient in terms of area, frequency and power consumption and is resistant to power attacks.


INTRODUCTION
RSA is a popular public key cryptosystem (Rivest et al., 1978). The security of RSA lies in its large operands, which are 1024 bits or more. RSA encryption and decryption are modular exponentiation functions. The classical binary exponentiation methods, left-to-right and right-to-left, perform a modular squaring in every iteration but a modular multiplication only when the exponent bit is one. The Montgomery powering ladder has a regular structure with parallel modular squaring and modular multiplication, and this regular behavior prevents implementation attacks (Joye and Yen, 2002). Common multiplicand multiplication takes advantage of the parallel modular squaring and multiplication and reduces complexity by computing the reductions on the common multiplicand only once. A common multiplicand Montgomery design suitable for hardware implementation is proposed in (Wu et al., 2013), together with word-based radix-2 and radix-4 architectures. Various architectures for Montgomery modular multiplication (Montgomery, 1985), including systolic arrays and carry save designs (McIvor et al., 2004; Fournaris, 2010), appear in the literature. Carry save designs offer high frequency at the cost of large area when implemented on FPGAs, because the carry and sum bits are mapped to different LUTs. A high-performance modular exponentiation with carry save Montgomery modular multiplication that resists fault attacks and simple power attacks is proposed in (Fournaris, 2010). It employs carry save logic in all its inputs, outputs, intermediate values and computations. It is optimized in terms of area, frequency and throughput and is attack resistant. The work in this study aims at an efficient, power attack resistant RSA design with low power consumption, so that the design is energy efficient. To achieve this, the RSA design is based on the Montgomery powering ladder and carry save common multiplicand Montgomery modular multiplication.
It uses two DSP slices for redundant-to-binary conversion at the end of each common multiplicand Montgomery modular multiplication. Section 2 gives a brief introduction to common multiplicand Montgomery modular multiplication. Section 3 presents the proposed carry save common multiplicand Montgomery modular multiplication for RSA, and Section 4 presents its architecture. Section 5 presents the modular exponentiation for RSA based on the Montgomery powering ladder and carry save common multiplicand Montgomery. Section 6 gives the implementation results and a comparison with related carry save designs in the literature. Section 7 concludes the paper.
Science Publications AJAS

COMMON MULTIPLICAND MONTGOMERY MODULAR MULTIPLICATION
Common multiplicand Montgomery modular multiplication takes advantage of the multiplicand shared by the modular squaring and the modular multiplication and divides the work into two parallel processes (Wu et al., 2013). Let R and P be k-bit numbers, let n be the k-bit modulus, and let MMM denote Montgomery modular multiplication.
Algorithm 1 is the common multiplicand Montgomery modular multiplication proposed in (Wu et al., 2013). Algorithm 2 is the proposed carry save method for common multiplicand Montgomery modular multiplication. All intermediate additions of large numbers are done with carry save adders. The input operands to the algorithm are in binary form. Converting the results from redundant to binary form requires a few extra cycles. This conversion at the end is essential so that, in each successive common multiplicand Montgomery modular multiplication of the exponentiation, the accumulation of partial products can start from the most significant bit of the multiplier.
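The idea of sharing the reduction on the common multiplicand can be illustrated with a minimal Python sketch. This is a simplified radix-2 illustration, not Algorithm 1 of (Wu et al., 2013): it uses the factor 2^-k rather than 2^-(k+2g), applies a final `% n` for clarity, and the function names `halve_mod` and `common_multiplicand_mont` are hypothetical.

```python
def halve_mod(t, n):
    """Return t * 2^-1 mod n for odd n, using only an add and a shift."""
    return (t + n) >> 1 if t & 1 else t >> 1


def common_multiplicand_mont(p, r, n, k):
    """Return (p*p*2^-k mod n, p*r*2^-k mod n) for odd n and p, r < 2^k.

    The chain t = p * 2^-1, p * 2^-2, ... is the reduction on the common
    multiplicand p. It is computed once and feeds both accumulators.
    """
    t = p
    x = 0  # accumulates the squaring p * p
    y = 0  # accumulates the multiplication p * r
    for i in range(k - 1, -1, -1):  # scan multiplier bits MSB first
        t = halve_mod(t, n)         # shared step, done once per iteration
        if (p >> i) & 1:
            x += t
        if (r >> i) & 1:
            y += t
    # Simplification: a plain modulo instead of a bounded final reduction.
    return x % n, y % n
```

In hardware the two accumulations run in parallel, while the halving chain on the common multiplicand is computed only once per iteration.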

PROPOSED CARRY SAVE COMMON MULTIPLICAND MONTGOMERY MODULAR MULTIPLICATION
Algorithm 2 takes inputs P, R and n and computes the modular reduction on T, which is initialized to the common multiplicand P. To reduce the iteration time, the steps are made independent of one another so that they can be computed in parallel.
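The carry save form used for intermediate results can be sketched as follows. This is a generic 3:2 compressor on Python integers, shown only to illustrate the redundant (sum, carry) representation; it is not Algorithm 2 itself, and the names `carry_save_add` and `accumulate_carry_save` are hypothetical.

```python
def carry_save_add(s, c, a):
    """3:2 compressor: fold addend a into the pair (s, c).

    The value s + c + a is preserved, but no carry propagates across
    bit positions, which is what keeps the critical path short.
    """
    new_s = s ^ c ^ a                           # bitwise sum without carries
    new_c = ((s & c) | (s & a) | (c & a)) << 1  # carries, shifted one place left
    return new_s, new_c


def accumulate_carry_save(values):
    """Accumulate values in redundant form; the true total is s + c."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s, c
```

The single carry-propagating addition `s + c` is deferred to the end, which corresponds to the redundant-to-binary conversion step.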

ARCHITECTURE OF CARRY SAVE COMMON MULTIPLICAND MONTGOMERY MODULAR MULTIPLICATION
• Common reduction unit
• X, Y accumulation units
• Adders
The I/O interface takes three inputs, P, R and n, and gives two outputs, X and Y, in binary. The control unit controls the sequence of computations that carry out the modular multiplication. The common reduction unit computes the quotient and the reduction on the common multiplicand. The common reduction unit and the X and Y accumulation units are pipelined so that common reduction and accumulation are computed in parallel. The counter keeps track of the computations. The adders convert the result from redundant to binary form. The number of cycles needed for this conversion depends on the adder and its implementation.

RSA MODULAR EXPONENTIATION
Algorithm 3 is the modular exponentiation based on the Montgomery powering ladder and common multiplicand Montgomery modular multiplication to compute $M^e \bmod n$ (Wu et al., 2013). M is converted to the Montgomery domain and R is reassigned the pre-computed value Z. This is done so that the exponentiation needs only one modular multiplication unit. If the exponent bit is set, then $R = P \cdot R \cdot 2^{-(k+2g)} \bmod n$ and $P = P^2 \cdot 2^{-(k+2g)} \bmod n$.
At steps 8 and 9, respectively, the result is converted to the integer domain and stored in C.
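The regular structure of the ladder can be sketched in Python with ordinary modular arithmetic. This is the standard Montgomery powering ladder (Joye and Yen, 2002) without the Montgomery-domain factor $2^{-(k+2g)}$, not a transcription of Algorithm 3. The branch for a cleared exponent bit follows the standard ladder and is an assumption here, and the name `ladder_pow` is hypothetical.

```python
def ladder_pow(m, e, n):
    """Compute m^e mod n with the Montgomery powering ladder.

    Invariant: p == r * m (mod n). Every iteration performs exactly one
    squaring and one multiplication that share a common operand.
    """
    r, p = 1, m % n
    for i in range(e.bit_length() - 1, -1, -1):  # exponent bits, MSB first
        if (e >> i) & 1:
            # Common operand is p: R = P * R, P = P^2.
            r, p = (r * p) % n, (p * p) % n
        else:
            # Common operand is r: P = R * P, R = R^2 (standard ladder).
            r, p = (r * r) % n, (r * p) % n
    return r
```

Because both branches do one squaring and one multiplication on a shared operand, each iteration maps onto a single common multiplicand multiplication regardless of the exponent bit.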

IMPLEMENTATION RESULTS
The RSA modular exponentiation with Montgomery powering ladder and carry save common multiplicand Montgomery modular multiplication is coded in VHDL and synthesized in Xilinx ISE Design Suite 12.4. The target device is xc5vlx50t (package ff65, target speed -3). The operand size is 1024 bits and the encryption exponent is $e = 2^{16}+1$. Figure 2 shows the DSP48E chosen for addition. To convert the results from redundant to binary form (Algorithm 2), two DSP48Es work in parallel to compute X = X1+X2 and Y = Y1+Y2. In each cycle, 48-bit operands and a carry bit are added to give a 48-bit result and a 1-bit carry out, which becomes the carry in for the next cycle. For 1024-bit RSA, the operand size in the common multiplicand multiplication is 1036 (1024+12) bits, which requires approximately 22 cycles for addition using the DSP48E.
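The 48-bit-per-cycle addition can be modeled in software to check the cycle count. The sketch below is a behavioral model only, assuming one 48-bit chunk is added per cycle with the carry out fed back as carry in. It does not model the DSP48E primitive itself, and `chunked_add` is a hypothetical name.

```python
def chunked_add(a, b, width, chunk=48):
    """Add two width-bit operands chunk bits at a time.

    Returns (a + b, number of chunk additions), where each chunk addition
    stands for one cycle of the adder.
    """
    mask = (1 << chunk) - 1
    carry = 0
    result = 0
    cycles = 0
    for shift in range(0, width, chunk):
        total = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (total & mask) << shift  # chunk-wide partial result
        carry = total >> chunk             # 1-bit carry out, next carry in
        cycles += 1
    result |= carry << (cycles * chunk)    # final carry, if any
    return result, cycles
```

For 1036-bit operands this model takes 22 chunk additions, since 21 chunks cover only 1008 bits.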
Using the IP core and architecture wizard, the DSP48E is selected, the instruction is given, and CARRYOUT is selected, as shown in Figure 2. Table 1 gives the number of cycles for 1024-bit RSA modular exponentiation with encryption exponent $e = 2^{16}+1$. For 1024 bits, g has the value 12 (Wu et al., 2013). For the 17-bit exponent, the total number of calls to common multiplicand Montgomery is 17 + 1 (integer to Montgomery domain) + 1 (Montgomery to integer domain) = 19. The total cycle count for the RSA exponentiation is 20368. Table 2 gives the area results in terms of slice registers, LUTs and DSP48Es. These results are obtained from the place and route report generated in Xilinx ISE 12.4.
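As a quick check of these figures, assuming for illustration that every call to common multiplicand Montgomery costs the same number of cycles:

$$17 + 1 + 1 = 19 \text{ calls}, \qquad \frac{20368}{19} = 1072 \text{ cycles per call on average}.$$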

RSA exponentiation based on carry save Montgomery modular multiplication was proposed in (McIvor et al., 2004), with implementation results on Virtex 2 FPGAs. That design used carry save adders to add operands during modular multiplication and did not convert the results from carry save to binary form at the end of the modular multiplication. The carry save common multiplicand Montgomery proposed in this work requires a format conversion from carry save to binary at the end, because each successive modular multiplication in the exponentiation starts the accumulation of partial products from the most significant bit of the multiplier. Our proposed modular multiplication uses two DSP48Es for addition. Each adds 48 bits per cycle and requires 22 cycles to add 1036 bits. The proposed RSA is implemented on Virtex 5 FPGAs. Its implementation on Virtex 2 FPGAs is not possible because Virtex 2 FPGAs lack DSP slices. Compared to the RSA design in (Fournaris, 2010), which is based on carry save Montgomery modular multiplication and is attack resistant, our RSA design is efficient in terms of area, frequency and throughput. In addition, implementing RSA with the Montgomery powering ladder naturally protects it from many implementation attacks (Joye and Yen, 2002). The power consumption of our RSA design is much lower than that of the 1024-bit modular multiplication in (Ye et al., 2013). The addition cycles for redundant-to-binary conversion in our work can be reduced further by using the fast adders presented in (Zicari and Perri, 2010). The use of reversible logic in Montgomery modular multiplication to prevent power attacks was presented in (Nayeem et al., 2009). As future work, the performance of common multiplicand Montgomery modular multiplication with the reversible adder proposed in (Haghparast and Navi, 2008) can be analyzed.

CONCLUSION
In this study, RSA modular exponentiation based on the Montgomery powering ladder and carry save common multiplicand Montgomery modular multiplication uses DSP48E slices to convert the result from carry save to binary form at the end of each modular multiplication. The design is efficient in terms of area, throughput and power, and it is resistant to power attacks. The throughput is inversely proportional to the cycle count. The number of cycles of carry save common multiplicand Montgomery modular multiplication can be reduced by using more efficient adders for the redundant-to-binary conversion. The area results can also be improved by mapping the carry save design onto FPGAs more efficiently.