FEED FORWARD BACK PROPAGATION NEURAL NETWORK BASED CHARACTER RECOGNITION SYSTEM FOR TAMIL PALM LEAF MANUSCRIPTS

Optical character recognition refers to the process of translating segmented handwritten or typewritten images into machine-editable text. In this study, we propose a character recognition system for Tamil palm leaf manuscripts using a Feed Forward Back Propagation Neural Network (FFBNN). First, the characters of the palm leaf manuscripts are segmented by using a sliding window and adaptive histogram calculation. Afterwards, the segmented characters are assembled and stored in a database. To accomplish the character recognition process, the characters stored in the database are learned by the FFBNN. Ten sets of Tamil palm leaf manuscripts are used to evaluate the performance of the proposed FFBNN based character recognition system, and the performance is evaluated by changing the number of hidden layers. The results show that the proposed system is effective in recognizing palm leaf manuscript characters and achieves a recognition accuracy of 90%.


INTRODUCTION
Character recognition is also known as optical character recognition because it deals with the recognition of optically processed characters rather than magnetically processed ones. The objective of character recognition is to interpret the input as a sequence of characters from an already existing set of characters. Character recognition is one of the most fundamental topics in pattern recognition, and it includes the recognition of handwritten characters and digits. Palkovic (1999) stated that Optical Character Recognition (OCR) is a technique used to locate and identify the text stored in a JPEG or GIF image and to translate that text into a computer-recognized form such as ASCII or Unicode. OCR converts the pixel representation of a letter into its corresponding character representation. Recently, in most government agencies and companies, proof data and documents that have passed a certain period of time are converted into electronic form under office management regulations. Park et al. (2008) noted that significant documents are digitized by scanning them or by entering them manually on a computer, and government agencies and companies are now trying to reduce this inconvenience. OCR is the most important part of Electronic Document Analysis Systems. The solution lies at the intersection of image processing, pattern recognition and natural language processing. Though a lot of research has been carried out on recognition, OCR has only been achieved in recent years (Mahender and Kale, 2011). Its aim is to use computers to process data that is usually processed only by humans.
Handwritten character recognition classifies written characters into appropriate types based on the features extracted from each character. It can be performed either online or offline. Recognizing handwriting is a difficult task because of the irregularity of writing and the variation within the handwriting of the same writer (Thein and Yee, 2010).
OCR is playing an increasingly important role in various practical applications such as data acquisition from bank checks, automated postal address and ZIP code reading, processing of archived institutional records, reading aids for the blind, automatic text entry into the computer for desktop publication, library cataloging, ledgering, language processing and more (Kannan and Prabhakar, 2008a). Kannan and Prabhakar (2009) present a comparison study of the recognition of normal Tamil characters. That comparison does not extend to characters written on palm leaves, since the style of those characters is different. Singh et al. (2010) developed a scheme for a complete OCR system for different fonts and sizes of Devnagari characters, which can be used in the banking and corporate sectors.
Several different kinds of character recognition techniques are available. Automatic character recognition improves the interaction between man and machine in various applications such as office automation, cheque verification, mail sorting and a large variety of banking tasks. Character recognition techniques include statistical, semantic, back propagation neural network and pattern recognition approaches (Phyu et al., 2011). The goal is to use computers to process data that is normally processed only by humans. One of the advantages of computer processing is coping with large amounts of information at high speed (Almohri et al., 2008). Sitamahalakshmi et al. (2010) developed a system to recognize offline handwritten Telugu characters and numerals using five different methods. The results of these methods are combined with the Dempster-Shafer method to arrive at a single precise result. Dongre and Mankar (2010) discuss the various techniques available for preprocessing, segmentation, feature extraction, recognition and post processing in Devnagari optical character recognition. Rahiman and Rajasree (2011) used an intensity variation based character recognition system for handwritten Malayalam characters. This method recognizes isolated and combinational handwritten characters in a noiseless environment.
The outline of the study is as follows. A brief discussion of recent research related to palm leaf character segmentation is given in section 2. The proposed character recognition process is explained in section 3. The experimental results and the conclusion of this study are given in sections 4 and 5.
Gandhi and Iyakutti (2010) observed that applying the horizontal projection concept to segment an entire document into individual lines does not work well. Moreover, segmenting overlapping lines is a tedious problem for unclear Tamil scripts. To solve this problem, they formulated an algorithm for segmenting overlapping lines of uniformly sized Tamil text and analyzed it on distorted Tamil scripts using a test set of reasonable size. Their results showed the practicability of the proposed algorithm. Priyanka et al. (2010) proposed a technique for segmenting individual text lines based on a modified histogram obtained from run length based smearing. They presented a complete line and word segmentation system for some well-known Indian printed languages. Both foreground and background information is used for exact line segmentation. Touching or overlapping characters may be present between two successive text lines, and most line segmentation errors arise from these occurrences. Interline space and noise can also make line segmentation difficult. Their method handles this situation accurately. They also discussed word segmentation from individual lines. The technique was tested on documents in the Bangla, Devnagari, Kannada and Telugu scripts as well as on some multi-script documents, and it produced promising results. Nikolaou et al. (2009) proposed a technique that introduced the following features: (1) an Adaptive Run Length Smoothing Algorithm (ARLSA) to handle complex and dense document layouts, (2) identification of noisy areas and punctuation marks, which are common in historical machine-printed documents, (3) identification of possible obstacles formed from background areas in order to separate neighboring text columns or text lines and (4) skeleton segmentation paths to isolate possibly connected characters. The effectiveness of the technique was demonstrated by experiments on several historical machine-printed documents. Surinta (2009) proposed two techniques for comparing Thai characters and for sorting and distinguishing them. These two techniques were used together with recognized techniques based on the projection profile (including the horizontal projection profile and stripes). From the outcome of the research, the author suggested that the proposed technique is more applicable to sorting and distinguishing single-column Thai documents; it achieved an accuracy of 97.11%. Venkatesh and Sureshkumar (2009) proposed an approach for the recognition and reproduction of handwritten documents in South Indian languages. Here, handwritten Tamil character recognition means converting handwritten Tamil characters into printed Tamil characters. The scanned image was segmented into paragraphs by using a spatial space detection method, paragraphs into lines using a vertical histogram, lines into words using a horizontal histogram and words into character image glyphs using a horizontal histogram. Each image glyph was passed to the feature extraction phase, which extracted features of the glyph such as character height, character width, number of horizontal lines (long and short), number of vertical lines (long and short), horizontally oriented curves, vertically oriented curves, number of circles, number of slope lines, image centroid and special dots. Kumar et al. (2005) proposed a hierarchical framework for document segmentation. Unlike traditional document segmentation algorithms, their model incorporates the dependencies between the various levels of the hierarchy. The parameters of the document segmentation algorithm are learned within this framework using optimization techniques such as gradient descent and Q-learning. A notable ability of their approach is that it learns the segmentation parameters in the absence of ground truth.

RELATED WORK
Boveiri (2010) proposed a system for recognizing Persian printed numeral characters, with emphasis on the representation and recognition stages. The system obtained 99.16% correct recognition, which shows that geometrical central moments and a fuzzy min-max neural network are adequate, but only for Persian printed numeral character recognition. Laroum et al. (2011) presented the HYBrid REpresentation of Documents (HYBRED) approach, which combines different features in a single relevant representation. This approach is used to represent complex data in the classification process. Sastry et al. (2010) described a novel method to recognize and classify Telugu (a south Indian language) characters written on palm leaves. The overall accuracy obtained with this method is 93.10%, and it is applicable to basic Telugu characters. Kannan and Prabhakar (2008b) proposed a technique for a handwritten Tamil character recognition system using an octal graph. The overall efficiency of the system was found to be 82%. It is suitable for handwritten characters but not for characters written on palm leaves. Surinta and Chamchong (2008) presented an image segmentation method for historical handwriting from palm leaf manuscripts. Their process is performed in three steps: (a) background elimination, which separates text from background by using Otsu's algorithm, (b) line segmentation and (c) character segmentation via the histogram of the image. The final result is the image of each character. Kader and Deb (2012) proposed a system to recognize only numeric digits from 0 to 9, letters from A to Z and alphanumeric characters (0 to 9, A to Z). They trained and tested only on printed characters, not handwritten characters. The printed characters must all be at the same angle; if there is any skew in a character, the system is not able to recognize it. Yeremia et al. (2013) discuss a back propagation neural network algorithm combined with a genetic algorithm to obtain good accuracy in recognizing individual printed alphabetical characters.

FFBNN BASED CHARACTER RECOGNITION SYSTEM
In this study, we propose a method for recognizing characters in Tamil palm leaf manuscript images. The proposed method uses a Feed Forward Back Propagation Neural Network (FFBNN) for the character recognition process. The method comprises four stages, namely (i) Preprocessing, (ii) Line Segmentation, (iii) Character Segmentation and (iv) Character Recognition. These four stages are performed consecutively on the input Tamil palm leaf manuscript images to obtain more accurate recognition results. The basic structure of the proposed FFBNN based character recognition method for Tamil palm leaf manuscripts is illustrated in Fig. 1.

Preprocessing, Line Segmentation and Character Segmentation
To obtain more accurate segmentation and recognition results, four preprocessing steps are first performed on the input Tamil palm leaf manuscripts: de-skewing the palm leaf digital image, RGB to gray scale conversion, binarization and cropping of the non-textual areas of the palm leaf document image (Ramya and Parvathavarthini, 2012). Afterwards, the lines of the preprocessed palm leaf images are segmented by using histogram equalization and the sliding window process. Initially, the adaptive histogram equalization is calculated separately for the red, green and blue values. After the conversion to a binary image, the mean histogram value is calculated and the threshold value to be used for line segmentation is identified. For line segmentation, the histogram is calculated horizontally. Based on the sliding window process and the threshold value, the lines are segmented from the palm leaf manuscripts.
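The gray scale conversion, binarization and cropping steps can be illustrated with a minimal sketch. The source does not state which gray scale weights or binarization threshold are used, so the BT.601 luminance weights, the global-mean threshold and the function names `to_gray`, `binarize` and `crop_to_ink` are assumptions made for illustration; de-skewing and adaptive histogram equalization are not reproduced here.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]


def to_gray(rgb: List[List[Pixel]]) -> List[List[float]]:
    """Convert an RGB image to gray scale (assumed BT.601 luminance weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def binarize(gray: List[List[float]], threshold: Optional[float] = None) -> List[List[int]]:
    """Mark dark pixels as ink (1) and light pixels as background (0).

    When no threshold is given, the global mean intensity is used (an assumption).
    """
    if threshold is None:
        values = [v for row in gray for v in row]
        threshold = sum(values) / len(values)
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def crop_to_ink(binary: List[List[int]]) -> List[List[int]]:
    """Remove blank border rows and columns around the inked area."""
    rows = [i for i, row in enumerate(binary) if any(row)]
    if not rows:
        return []
    cols = [j for j in range(len(binary[0])) if any(row[j] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


if __name__ == "__main__":
    light, dark = (200, 200, 200), (10, 10, 10)
    image = [[light] * 4 for _ in range(4)]
    image[1][2] = dark
    image[2][2] = dark
    print(crop_to_ink(binarize(to_gray(image))))
```

A global threshold is the simplest choice; on unevenly faded leaves a locally adaptive threshold would likely behave better, but that choice is outside what the text specifies.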
Finally, the extracted lines are segmented into characters using the vertical histogram of each line. As in line segmentation, the adaptive histogram equalization and binarization of the red, green and blue values are performed, this time vertically, for character segmentation. The optimum threshold value that separates the characters in a line is found, and the characters in the line are then segmented using this threshold value (Ramya and Parvathavarthini, 2012).
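The histogram-based splitting described above can be sketched as a projection profile that is cut wherever the ink count falls to the threshold or below. This is only an illustration under stated assumptions: the image is already binary (1 = ink), the thresholds are fixed numbers rather than values derived from the mean histogram, and the sliding window refinement is not reproduced. The names `projection`, `split_runs` and `segment_characters` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]


def projection(binary: Image, axis: str) -> List[int]:
    """Count ink pixels per row (axis='rows') or per column (axis='cols')."""
    if axis == "rows":
        return [sum(row) for row in binary]
    return [sum(col) for col in zip(*binary)]


def split_runs(profile: List[int], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, where the profile exceeds the threshold."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(binary: Image, line_thresh: float = 0.0,
                       char_thresh: float = 0.0) -> List[Image]:
    """Split a page into lines (horizontal profile), then lines into characters."""
    characters: List[Image] = []
    for top, bottom in split_runs(projection(binary, "rows"), line_thresh):
        line = binary[top:bottom]
        for left, right in split_runs(projection(line, "cols"), char_thresh):
            characters.append([row[left:right] for row in line])
    return characters


if __name__ == "__main__":
    page = [
        [1, 1, 0, 1],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
    ]
    for glyph in segment_characters(page):
        print(glyph)
```

Raising `line_thresh` or `char_thresh` above zero lets the split tolerate a few stray ink pixels in the gaps, which is the role the mean-histogram threshold plays in the text.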

Character Recognition
After the character segmentation process, the recognition process is performed by using an AI technique. To accomplish the character recognition process, the characters of the palm leaf manuscript images in the database are segmented and stored in the character database. Let us consider the Tamil palm leaf manuscripts database

FFBNN Training
To perform the character recognition process, the characters from the database $C_d$ are given to the Neural Network (NN). In our proposed work, we use a Feed Forward Back Propagation Neural Network (FFBNN) for character recognition. In the training phase, the segmented characters of each document image are given as input to the FFBNN (Venkatesh and Sureshkumar, 2009). In our proposed method, a number q of FFBNNs (q-FFBNN) is used to accomplish the recognition process. The q-FFBNN has $H_d$ hidden layers and one output layer, which indicates whether the corresponding input character is recognized or unrecognized. The q-FFBNN is trained on the segmented characters so that it provides an accurate result for the corresponding input. The structure of the proposed q-FFBNN character recognition is shown in Fig. 2.
The q-FFBNN contains q input units (the character q is recognized by the q-FFBNN and other characters are unrecognized), one output unit and $H_d$ hidden units. Initially, the input value is transmitted to the hidden layer and then to the output layer. This process is called the forward pass of the back propagation algorithm. Each node in the hidden layer receives inputs from the input layer, which are multiplied by suitable weights and summed. The function that calculates the hidden layer input value is called the bias function and is given as Equation (1).
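The forward pass can be sketched as follows. Because the original equations are not reproduced in this text, the sketch assumes one hidden layer, a sigmoid non-linearity and an additive bias per node; these are common textbook choices and may differ from the authors' formulation. The names `sigmoid` and `forward` are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Logistic non-linearity (an assumed choice of activation)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs: List[float],
            hidden_weights: List[List[float]],
            hidden_bias: List[float],
            output_weights: List[float],
            output_bias: float) -> Tuple[List[float], float]:
    """Propagate one input vector through the hidden layer to the single output.

    hidden_weights[j][i] is the weight from input i to hidden node j.
    """
    # Each hidden node sums its weighted inputs, adds a bias and applies the non-linearity.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(hidden_weights, hidden_bias)]
    # The output node repeats the same computation on the hidden activations.
    output = sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)
    return hidden, output


if __name__ == "__main__":
    hidden, out = forward([1.0, 0.0],
                          [[0.4, -0.2], [0.1, 0.3]], [0.0, 0.0],
                          [0.5, -0.5], 0.0)
    print(hidden, out)
```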

Fig. 2. Structure of the q-FFBNN for Tamil palm leaf manuscript character recognition
In Equation (1), $c^{q}_{z^{a}_{ij}}$ is the input segmented character from the document $z^{a}_{ij}$. The output of the hidden node is the non-linear transformation of the resulting sum. The same process is followed in the output layer, and Equation (2) denotes the activation function of the output layer. The output values from the output layer are compared with the target values and the learning error rate of the neural network is computed, which is given in Equation (3). In Equation (3), δ is the learning error rate of the q-FFBNN, $D^{h}_{z^{a}_{ij}}$ is the desired output and $A^{h}_{z^{a}_{ij}}$ is the actual output. The error between the nodes is transmitted back to the hidden layer, and this process is called the backward pass of the back propagation algorithm. The reduction of the error by the back propagation algorithm is described in the subsequent steps.
Initially, weights are assigned to the hidden layer neurons. The input layer has a constant weight, whereas the weights for the output layer neurons are chosen arbitrarily. Subsequently, the bias function and the output layer activation function are computed by using Equations (1) and (2).
Next, the back propagation error is computed for each node and the weights are updated by using Equation (4). The change in the weights is given as Equation (5), where δ is the learning rate, which normally ranges from 0.2 to 0.5, and E (φ) is the BP error. The bias function, activation function and BP error calculations are repeated until the BP error is sufficiently reduced, i.e., E (φ) < 0.1. Once the BP error reaches this minimum value, the q-FFBNN is considered well trained on the segmented characters for performing character recognition. The well trained q-FFBNN provides an accurate recognition result for the respective input characters.
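The training steps can be put together in a short sketch of one network trained by back propagation until the error drops below 0.1, with a learning rate inside the 0.2 to 0.5 range mentioned in the text. It assumes a single hidden layer of sigmoid units, a sum of squared errors as the BP error and small random initial weights; the authors' exact Equations (1) to (5) are not available here, so this should be read as a generic illustration rather than their implementation. The names `train` and `predict` and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Weights = Tuple[List[List[float]], List[float], List[float], float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights: Weights, x: Sequence[float]) -> float:
    """Forward pass through one hidden layer to a single output in (0, 1)."""
    w_h, b_h, w_o, b_o = weights
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in zip(w_h, b_h)]
    return sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)


def train(samples: List[Tuple[Sequence[float], float]], n_hidden: int = 10,
          learning_rate: float = 0.5, tol: float = 0.1,
          max_epochs: int = 10000, seed: int = 0) -> Tuple[Weights, float]:
    """Train until the summed squared error of an epoch falls below tol."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [0.0] * n_hidden
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = 0.0
    error = float("inf")
    for _ in range(max_epochs):
        error = 0.0
        for x, target in samples:
            # Forward pass.
            h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                 for ws, b in zip(w_h, b_h)]
            y = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
            error += 0.5 * (target - y) ** 2
            # Backward pass: error terms for the output node and each hidden node.
            d_o = (target - y) * y * (1.0 - y)
            d_h = [d_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Weight updates scaled by the learning rate.
            for j in range(n_hidden):
                w_o[j] += learning_rate * d_o * h[j]
                b_h[j] += learning_rate * d_h[j]
                for i in range(n_in):
                    w_h[j][i] += learning_rate * d_h[j] * x[i]
            b_o += learning_rate * d_o
        if error < tol:
            break
    return (w_h, b_h, w_o, b_o), error


if __name__ == "__main__":
    # Toy data: the target equals the first feature (1 = "recognized").
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
    weights, err = train(data)
    print(err, predict(weights, [1.0, 0.0]))
```

In practice the input vector would be the pixels or features of a segmented character rather than two toy values, and one such network would be trained per character class.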

FFBNN Testing
During testing, unknown Tamil palm leaf manuscript images are used to analyze the performance of the trained FFBNN. The unknown palm leaf images are passed through the preprocessing, line segmentation and character segmentation processes already described in section 3.1. The segmented characters of the unknown document $u_{ij}$ are given to the well trained q-FFBNN to check whether the input characters are recognized or unrecognized, and the output of the q-FFBNN is represented as $D_{u_{ij}}$. Based on this output value, the recognition decision is made according to Equation (6). In Equation (6), the q-FFBNN output value $D_{u_{ij}}$ is compared with the threshold value t: if $D_{u_{ij}}$ is greater than t, the segmented input character is recognized; otherwise it is unrecognized. The same procedure is followed for all Tamil palm leaf manuscript images to recognize the palm leaf manuscript characters.
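The threshold test can be written as a small sketch. The value of t is not given in the text, so the default of 0.5 is an assumption, as is the idea of picking the highest-scoring network when several of the q networks exceed the threshold; `is_recognized` and `label_character` are hypothetical names.

```python
from typing import Dict, Optional


def is_recognized(output: float, t: float = 0.5) -> bool:
    """A character is recognized when the network output exceeds the threshold t."""
    return output > t


def label_character(outputs: Dict[str, float], t: float = 0.5) -> Optional[str]:
    """Pick a label from the outputs of the per-character networks.

    outputs maps each character label to the output of that character's network.
    Returns None when no network output exceeds the threshold (unrecognized).
    """
    label, score = max(outputs.items(), key=lambda item: item[1])
    return label if is_recognized(score, t) else None


if __name__ == "__main__":
    print(label_character({"ka": 0.91, "pa": 0.12}))  # recognized as "ka"
    print(label_character({"ka": 0.30, "pa": 0.12}))  # unrecognized
```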

DISCUSSION
The proposed FFBNN based character recognition method for palm leaf manuscripts is implemented in MATLAB (version 7.12). A number of palm leaf manuscript images were used to analyze the recognition performance. Sample palm leaf manuscript images are given in Fig. 3. The statistical performance measures of the proposed method, obtained for ten different palm leaf manuscript images and for different numbers of hidden layers, are shown in Tables 7-12.
As can be seen from Fig. 4, the proposed FFBNN gives high accuracy in detecting recognized and unrecognized characters when the number of hidden layers is 10. When the number of hidden layers is changed to 5, 20, 30, 40 or 50, the accuracy of the FFBNN decreases, and the sensitivity measure likewise decreases as the number of hidden layers is increased. As with accuracy, the specificity of the proposed FFBNN also drops when the number of hidden layers is increased. The specificity of the FFBNN is 100% when the number of hidden layers is 10. Overall, the proposed FFBNN based palm leaf manuscript character recognition technique attains a recognition accuracy of 90%.
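For reference, accuracy, sensitivity and specificity follow from the counts of true and false positives and negatives using their standard definitions. The sketch below treats "recognized" as the positive class; the function names and the small example labels are hypothetical and are not the paper's data.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count outcomes, treating 1 as 'recognized' and 0 as 'unrecognized'."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
    }


def performance(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard accuracy, sensitivity (true positive rate) and specificity."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    actual = [1, 1, 1, 0, 0]
    predicted = [1, 1, 0, 0, 0]
    print(performance(**confusion_counts(actual, predicted)))
```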

CONCLUSION
In this study, we proposed a character recognition method to recognize the characters in palm leaf manuscripts. The proposed method was implemented, and ten sets of palm leaf manuscript images were used to analyze its results. The performance analysis reported the accuracy, sensitivity and specificity of the FFBNN based character recognition method for different numbers of hidden layers. Increasing the number of hidden layers lowered the accuracy, sensitivity and specificity of the proposed FFBNN, whereas the FFBNN with 10 hidden layers achieved high accuracy, sensitivity and specificity values. The performance analysis results show that the FFBNN based character recognition method recognizes palm leaf characters with an overall accuracy of 90%.