Classical Flexible Lip Model Based Relative Weight Finder for Better Lip Reading Utilizing Multi Aspect Lip Geometry

Problem statement: People with hearing and speech impairments need assistance from a device that reads lip movements to identify words. This article presents an implementation of a flexible lip model for a better visual lip-reading system. Approach: From the frame sequence of a word, an Active Shape Model (ASM) based lip model provided local tracking and extraction of geometric lip features. Two geometric criteria defined the required geometric features and their variations over the sequence. Results: The features drove machine classification using the Analytic Hierarchy Process (AHP), a relative weight finder. AHP presented a weight vector to a fuzzy classifier, which decided the word to which the video frame sequence belongs. Conclusion: The suggested model, tested on a total of 5 different sample databases, achieved an accuracy of 83.2%, better than the other combinational algorithms.


INTRODUCTION
This study attempts to identify a word from a video frame sequence with the help of a statistical classifier and a snapshot database. Compared with speech signals, visual input is much less affected by environmental noise. This reduces the preprocessing burden and lets the model run fast. In a sequence of frames, lip tracking and segmentation can be done in every frame using traditional image-based techniques, but lip tracking then becomes a stand-alone issue. In addition, any image-based technique has difficulty finding the geometry in a glossy environment. For local tracking and segmentation of lips, a model-based technique called the active shape model (ASM) solves both issues in a single step, using a minimal set of features that remain consistent across frames (Mattews et al., 2002). ASM has been used in lip reading, but the feature extraction method in this study differs from the techniques proposed in (Faruquie et al., 2000; Mok et al., 2004; Sum et al., 2001). ASM is widely used in facial recognition, and most researchers use the model only to extract the lip region before branching towards other areas, without focusing on lip reading. The geometry of the outer and inner lips is obtained from ASM with an optimal-point lip model; the inner lip geometry is a subset that provides additional information. The usual training and testing of classifiers such as HMMs and other network models becomes time consuming and more erratic as the database grows with more words. During training, these classifiers are heavily loaded by the sequence comparison of individual frame features belonging to a word. AHP, a relative weight finder, is introduced to minimize erratic decisions by the classifier: the relationship of each feature to a shape is characterized on a numerical scale and its weight is defined.
Once AHP shares the burden, a simple fuzzy classifier decides which word the test frame sequence belongs to.
The objective of this study is to deploy the above techniques in lip reading to classify the words from 'one' to 'nine' in a customized database created for this purpose. The article is organized as follows. The active shape model subsection describes local tracking and segmentation of lip images. The feature selection subsection describes feature extraction based on length and area information. The relative weight finder subsection comprises the AHP classifier algorithm and the fuzzy decision maker. Finally, the results and discussion are followed by the conclusion.

MATERIALS AND METHODS
Active shape model: ASM is a shape-constrained iterative fitting algorithm (Koschan et al., 2003; Cootes, 2000; Cootes et al., 1993; 1994; Fieguth and Terzopoulos, 1997) in which the Point Distribution Model (PDM), also called the statistical shape model, characterizes the shape constraint. ASM represents the shape as:

$$X = (X_1, Y_1, \ldots, X_{16}, Y_{16}, a_1, b_1, \ldots, a_4, b_4)^T \quad (1)$$

Where:
$X_i, Y_i$ ($1 \le i \le 16$) = The coordinates of the i-th point of the outer lip contour
$a_j, b_j$ ($1 \le j \le 4$) = The coordinates of the j-th point of the inner lip contour

The initial model points $x_k$ in the shape are represented by the transformation of $(\bar{X} + P\lambda)$ by $[x, y, s, \theta]$:

$$X' = T_{x, y, s, \theta}(\bar{X} + P\lambda) \quad (2)$$

When the initial shape is obtained by training on a sample shape of the same person, tracking and deforming the shape to the frame lip shape does not need a multi-resolution strategy (Cootes et al., 1993) and achieves fast convergence. Figure 1a shows the histogram-equalized gray-scale test frame for the word 'seven'. The ASM segmentation of that lip image, from a video frame of the word, is displayed in Fig. 1b. Figure 1c shows the outer and inner lip contours used to measure the dimensions. The outer contour is represented by a texture model with edge constraints and 16 reference points, which gives a better lip contour. The inner lip contour gets only 4 reference points, which reduces the effort spent on it. This point model is a compact representation of the lip shape and can be obtained from a sample lip image, called the training image, of a person belonging to a particular database.
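The role of the pose parameters in Eq. 2 can be illustrated with a minimal sketch. It assumes that Eq. 2 has the usual ASM form, a mean shape plus a weighted sum of shape modes followed by a 2-D similarity transform (translation, scale, rotation). The function names `deform` and `apply_pose`, the single shape mode and the 4-point sample contour are hypothetical and are not taken from the study.

```python
import math

def deform(mean_shape, modes, lam):
    """Return mean_shape + sum_k lam[k] * modes[k] (flat coordinate lists)."""
    out = list(mean_shape)
    for weight, mode in zip(lam, modes):
        out = [o + weight * m for o, m in zip(out, mode)]
    return out

def apply_pose(points, tx, ty, s, theta):
    """Apply scale s, rotation theta and shift (tx, ty) to (x, y) points."""
    c, sn = math.cos(theta), math.sin(theta)
    return [(s * (c * x - sn * y) + tx, s * (sn * x + c * y) + ty)
            for x, y in points]

# Hypothetical 4-point inner-lip contour, stored as x1, y1, x2, y2, ...
mean_shape = [-2.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, -1.0]
# One made-up mode that moves the upper and lower points apart.
modes = [[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, -1.0]]

flat = deform(mean_shape, modes, lam=[0.5])
points = list(zip(flat[0::2], flat[1::2]))
posed = apply_pose(points, tx=10.0, ty=5.0, s=2.0, theta=math.pi / 2)
```

In an actual fit, the pose and the mode weights would be updated iteratively until the model points settle on the lip edges; the sketch only shows one forward evaluation.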
The Constraint-Based Model Matching technique is used for the image search, to match the frame lip shape. A Gaussian, normalized profile provides the guideline for gradient matching. The degree of reaching the frame lip shape from the sample is given by:

$$f(m_s) = (m_s - m')^T S_m^{-1} (m_s - m')$$

Where:
$m'$ = The mean of the normalized first derivative of the gradient profiles $\{m_i\}$ ($1 \le i \le 16$)
$S_m$ = The covariance

Convergence minimizes $f(m_s)$, whose value is related to the probability of $m_s$ in the distribution. The strong edge of the lip shape is identified by the least probability of $m_s$; this may be the best choice but need not be the optimal choice. The edge constraint keeps all 16 points on the strong gradient edge of the outer lip. An improved Mahalanobis distance function with Sobel edge intensity can increase the probability of finding the strong edge of the lip shape. For the inner lip features, finding the 4-point inner lip shape is integrated into these sample models.
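The profile-matching step can be sketched as follows, assuming that the matching cost is the Mahalanobis distance between a sampled profile and the mean profile, as the reference to an improved Mahalanobis distance function suggests. The helper names `solve`, `profile_cost` and `best_candidate` are hypothetical, and the Sobel-based improvement is not included.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def profile_cost(m_s, m_mean, s_m):
    """Mahalanobis cost (m_s - m')^T S_m^{-1} (m_s - m')."""
    d = [a - b for a, b in zip(m_s, m_mean)]
    y = solve(s_m, d)  # y = S_m^{-1} d without forming the inverse
    return sum(di * yi for di, yi in zip(d, y))

def best_candidate(candidates, m_mean, s_m):
    """Index of the candidate profile with the smallest cost."""
    costs = [profile_cost(c, m_mean, s_m) for c in candidates]
    return min(range(len(costs)), key=costs.__getitem__)
```

For each model point, several candidate profiles sampled along the contour normal would be scored this way and the point moved to the lowest-cost position.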
Feature selection: The features of the two-dimensional lip shape, with its outer and inner contours, are selected using length-based and area-based criteria. The length-based features are the outer lip length-width ratio, the vertical lip distance ratio and the inner lip length-width ratio. The Outer Lip Length-Width Ratio (OLWR) is obtained from an expression in which the $X_{min}$ and $X_{max}$ pixel positions are the key points. For both $X_{min}$ and $X_{max}$, a change in row position does not affect the analysis; for the considered frame the OLWR is calculated as 1.3065. OLWR varies widely while a word is pronounced, which makes it a vital feature in the feature domain.
The Vertical Lip Distance Ratio (VLDR) is the second length-based feature. From the centre pixel position of the $X_{min}$ to $X_{max}$ line, the distance to the top edge of the outer upper lip contour, $L_1$, is 133 pixels and the distance to the bottom edge of the outer lower lip contour, $L_2$, is 129 pixels. The ratio of $L_1$ to $L_2$ is the vertical lip distance ratio; its value is 1.0310. This geometry is one length-based representation of openness.
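The VLDR arithmetic can be checked with a short sketch. It assumes image coordinates with y growing downwards, takes the lip corners as the leftmost and rightmost contour points, and measures $L_1$ and $L_2$ vertically to the topmost and bottommost contour points. The function name and the four contour points are hypothetical; the points were chosen only so that the distances match the 133 and 129 pixels quoted in the text.

```python
def vertical_lip_distance_ratio(contour):
    """Return (L1, L2, L1 / L2) for an outer-lip contour of (x, y) pixels."""
    left = min(contour, key=lambda p: p[0])    # X_min corner
    right = max(contour, key=lambda p: p[0])   # X_max corner
    centre_y = (left[1] + right[1]) / 2.0      # centre of the corner line
    top_y = min(y for _, y in contour)         # top edge of upper lip
    bottom_y = max(y for _, y in contour)      # bottom edge of lower lip
    l1 = centre_y - top_y
    l2 = bottom_y - centre_y
    return l1, l2, l1 / l2

# Hypothetical contour: two corners, one top point, one bottom point.
contour = [(0, 200), (171, 67), (342, 200), (171, 329)]
l1, l2, vldr = vertical_lip_distance_ratio(contour)
print(l1, l2, round(vldr, 4))
```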
The third length-based feature is the Inner Lip Width-Length Ratio (IWLR), which is defined similarly to the outer lip length-width ratio. For the considered frame IWLR = 0.1731, where $Y_{imax}$ is the bottom edge of the inner lip contour and $Y_{imin}$ is its top edge.
The area-based feature is the lip-openness area ratio. It requires the area enclosed by the outer lip contour as a whole ($A_1$) and the area enclosed by the inner lip contour ($A_2$), both obtained by applying a seed-filling algorithm. The Lip-Openness Area Ratio (LOAR) is the first area-based feature and needs both $A_1$ and $A_2$:

$$\mathrm{LOAR} = A_2 / A_1 = 0.1676$$

LOAR is a pure indication of openness. It accounts for the inner lip length and width indirectly.
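The area computation can be sketched with a simple seed fill. The sketch assumes that each contour is available as a binary mask (1 on the contour, 0 elsewhere) and that a seed pixel inside the contour is known; it counts the 4-connected interior pixels and ignores the contour pixels themselves. The names `seed_fill_area` and `ring_mask` and the square toy contours are hypothetical.

```python
from collections import deque

def seed_fill_area(mask, seed):
    """Count pixels 4-connected to seed that are not contour pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and mask[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)

def ring_mask(size, lo, hi):
    """Toy closed square contour with corners (lo, lo) and (hi, hi)."""
    def on_ring(r, c):
        return ((r in (lo, hi) and lo <= c <= hi) or
                (c in (lo, hi) and lo <= r <= hi))
    return [[1 if on_ring(r, c) else 0 for c in range(size)]
            for r in range(size)]

outer = ring_mask(9, 0, 8)   # stand-in for the outer lip contour
inner = ring_mask(9, 2, 6)   # stand-in for the inner lip contour
a1 = seed_fill_area(outer, (4, 4))
a2 = seed_fill_area(inner, (4, 4))
loar = a2 / a1
```

With real lip masks the same two calls would give $A_1$ and $A_2$ in pixels, and their ratio the LOAR of the frame.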
Analysis of these features shows that a proportional decrease in length with an increase in width may keep $A_2$ approximately constant while the ILWR increases rapidly. These two parameters play a major role in distinguishing different kinds of openness. The feature matrix of the considered frame collects these feature values. Table 1 lists the essential attribute values of normalized frames 1, 6, 11 and 16 of the sequences belonging to the words 'Three', 'Five' and 'Seven'. The comparison of individual features for the three words is shown in Fig. 2. From Fig. 2, the ILWR values are high for all these words, and the ILWR for 'Five' is higher than for 'Three' and 'Seven'.

Relative weight finder:
When a database is submitted to feature extraction, after normalization and selection of p test frames, its feature matrices are stored. The input video frames undergo the same normalization and the selection of the same number p of test frames, and the input video's feature matrix is obtained. The four lip-shape features are used to form individual feature matrices, where f is a particular feature collected from the p test frames of a word. The individual feature matrices are ordered according to priorities that we assigned on the basis of geometric importance. The structure of the AHP is shown in Fig. 3.
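The construction of per-feature sequences can be sketched as below. The sketch assumes that the p test frames are spread evenly over a normalized sequence, which reproduces frames 1, 6, 11 and 16 for a 16-frame sequence with p = 4; the selection rule used in the study is not stated. The function names, the feature labels and the placeholder frame values (each value is just the frame index) are hypothetical.

```python
FEATURES = ("OLWR", "VLDR", "IWLR", "LOAR")

def select_frames(n_frames, p):
    """1-based indices of p frames spread evenly over n_frames."""
    if p == 1:
        return [1]
    step = (n_frames - 1) / (p - 1)
    return [1 + round(i * step) for i in range(p)]

def feature_vectors(frames, p):
    """Return {feature: [value in each of the p selected frames]}."""
    picked = select_frames(len(frames), p)
    return {name: [frames[i - 1][name] for i in picked]
            for name in FEATURES}

# Placeholder data: 16 frames whose feature values equal the frame index.
frames = [{name: float(k) for name in FEATURES} for k in range(1, 17)]
vectors = feature_vectors(frames, p=4)
```

Each entry of `vectors` then plays the role of one individual feature matrix for a word, holding a single feature f across the p selected frames.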

AHP classifier:
Step 1: Calculating relative weights: The relative weights at the feature level are expressed as pair-wise comparisons (Saaty, 1980; 2008), that is, as ratios of relative importance between pairs of the same feature from the database and the test input. The weight of every feature is represented as an integer between 1 and 9 and the relative weights of features are expressed as ratios. Because these values define the weights only through their ratios, it is always possible to normalize the vector w by imposing:

$$\sum_{i=1}^{n} w_i = 1$$

The ratio $w_i / w_j$ is then the relative weight of the i-th to the j-th element. All possible pair-wise relative weights are combined into a preference matrix A as follows:

$$A = [a_{ij}] = \begin{pmatrix} w_1/w_1 & w_1/w_2 & \cdots & w_1/w_n \\ w_2/w_1 & w_2/w_2 & \cdots & w_2/w_n \\ \vdots & \vdots & \ddots & \vdots \\ w_n/w_1 & w_n/w_2 & \cdots & w_n/w_n \end{pmatrix}$$

The elements of matrix A have the special property $a_{ij} = w_i / w_j = 1/(w_j / w_i) = 1/a_{ji}$ for all i and j.
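Step 1 can be illustrated with a minimal sketch that builds a preference matrix from feature weights on the 1 to 9 scale, checks the reciprocal property and recovers a normalized weight vector. Recovering the weights with the row geometric mean is a common AHP approximation and is an assumption here, not necessarily the method used in the study; the function names and the sample weights are hypothetical.

```python
import math

def preference_matrix(w):
    """Pair-wise matrix with entries a_ij = w_i / w_j."""
    return [[wi / wj for wj in w] for wi in w]

def priority_vector(a):
    """Normalized weights from the geometric mean of each row."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in a]
    total = sum(gm)
    return [g / total for g in gm]

weights = [2, 4, 2]                 # hypothetical 1-9 scale weights
a = preference_matrix(weights)
w_norm = priority_vector(a)         # sums to 1

# Reciprocal property: a_ij * a_ji = 1 for every pair.
reciprocal = all(abs(a[i][j] * a[j][i] - 1.0) < 1e-12
                 for i in range(3) for j in range(3))
```

For a perfectly consistent matrix such as this one, the recovered vector equals the original weights divided by their sum.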
Step 2: Calculating global weights: The resultant vector with a heavier global weight is the module's classification decision. The ten AHP modules for the pronunciation of the numerals gave us ten global weights. AHP normalizes the weight of the i-th row, and the weight is then calculated from top to bottom. Let $w^{(k-1)} = (w_1^{(k-1)}, w_2^{(k-1)}, \ldots, w_p^{(k-1)})$ be the weight for the given number of elements in the (k-1)-th level and let $q_j^{(k)} = (q_{1j}^{(k)}, q_{2j}^{(k)}, \ldots, q_{pj}^{(k)})^T$ be the weight for the next level of p elements in the k-th level. The resulting weight is called the global weight of every AHP module. The consistency of the preference matrix (Forman and Peniwati, 1998; Barzilai and Golany, 1990; Peniwati, 1996) is verified by the consistency index.
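Step 2 can be sketched under the assumption that the standard AHP forms apply: the weights of level k are obtained by multiplying the matrix of local weights $q_{ij}^{(k)}$ with the weight vector of level k-1, and the consistency index is $CI = (\lambda_{max} - n)/(n - 1)$, where $\lambda_{max}$ is the principal eigenvalue of an n by n preference matrix. The function names and the sample numbers are hypothetical.

```python
def compose(q, w_prev):
    """Level-k weights: w_i = sum_j q[i][j] * w_prev[j]."""
    return [sum(qij * wj for qij, wj in zip(row, w_prev)) for row in q]

def lambda_max(a, iters=100):
    """Principal eigenvalue of a positive matrix by power iteration."""
    n = len(a)
    v = [1.0 / n] * n
    lam = float(n)
    for _ in range(iters):
        av = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
        lam = sum(av)             # v sums to 1, so sum(A v) estimates lambda
        v = [x / lam for x in av]
    return lam

def consistency_index(a):
    """CI = (lambda_max - n) / (n - 1); zero for a consistent matrix."""
    n = len(a)
    return (lambda_max(a) - n) / (n - 1)

# Hypothetical two-level example.
global_w = compose([[0.5, 0.2], [0.5, 0.8]], [0.6, 0.4])

consistent = [[1.0, 0.5, 1.0], [2.0, 1.0, 2.0], [1.0, 0.5, 1.0]]
inconsistent = [[1.0, 2.0, 8.0], [0.5, 1.0, 2.0], [0.125, 0.5, 1.0]]
ci_ok = consistency_index(consistent)
ci_bad = consistency_index(inconsistent)
```

A consistency index near zero indicates that the pair-wise judgements agree with each other; larger values signal contradictory comparisons.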

Fuzzy decision maker:
The study focuses on classification rather than recognition. Although AHP converges well in terms of the weight vector, each AHP module can only distinguish one word's lip movement from another word's lip movement. Moreover, a lip movement, as it happens, need not belong definitively to one word or another in the database.
Fuzzy membership theory (Mikhailov and Tsvetinov, 2004; Pedrycz and Gomide, 1998) is premised on the observation that many lip movements cannot be discretely categorized as members of one word or another; rather, they share features with other words, so that they may be said to belong to one or more words, and only to some degree. These factors make a fuzzy final decision maker necessary.
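One possible form of the final decision can be sketched as follows. The membership function used in the study is not specified, so the sketch makes two assumptions: the global weight of each AHP module is normalized by the sum of all global weights to give a degree of membership, and the word with the highest membership is chosen. The function names and the global weights are hypothetical.

```python
def memberships(global_weights):
    """Degree of membership of the test sequence in each word (sums to 1)."""
    total = sum(global_weights.values())
    return {word: w / total for word, w in global_weights.items()}

def decide(global_weights):
    """Return the word with the highest membership and all memberships."""
    mu = memberships(global_weights)
    best = max(mu, key=mu.get)
    return best, mu

# Hypothetical global weights from three AHP modules.
word, mu = decide({"three": 0.2, "five": 0.5, "seven": 0.3})
```

Keeping the full membership dictionary, rather than only the winning word, preserves the information that a sequence partly resembles other words.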

RESULTS AND DISCUSSION
Well-identified features and their variations lead to a better classification process. OLWR is identified as the primary feature, and its values for the words 'Five' and 'Seven' are compared in Fig. 4 using the values in Table 1. This comparison shows that feature definition is vital in the lip-reading process.
A multi-feature lip model increases consistency as well as accuracy. A database consisting of five different speakers, with video frame sequences for all 10 digits, was created. The images used for the experiment are of person 3; they were recorded as Audio Video Interleave (AVI) files and converted to frames for further processing.