Motion Detection and Projection Based Block Motion Estimation Using the Radon Transform for Video Coding

Problem statement: Motion estimation and compensation is the most computationally complex module of a video coder. In this study, an innovative algorithm is proposed to further reduce the complexity of the Motion Estimation (ME) module of a video coder by employing motion detection prior to motion compensation. Approach: A Motion Detection (MD) module can be added to the video coder to decide whether the current block contains motion or has zero motion. This study proposes an MD module that depends on motion activity. Generally, the correlation of two consecutive frames is a good criterion for measuring motion activity, so we applied the correlation as a threshold to detect motion activity. To assure the correct motion vector for the motion blocks, and thus better video quality, this study also proposed a new block matching motion estimation criterion based on the Radon transform using a projection-matching method. Although computationally complex, the method can be implemented in real time by using a pipeline architecture. Results: A comparison showed that the MD module reduced the number of search points for motion estimation and that, compared with some well-known algorithms that use the minimum absolute difference criterion, the new criterion can provide much higher performance. Conclusion: The results showed that the proposed scheme can reduce the encoder complexity while maintaining good video quality.


INTRODUCTION
The wide range of multimedia applications based on video compression (video telephony, video surveillance, digital television) leads to different kinds of requirements for a video coding standard (image quality, compression efficiency). Several multimedia application areas require high power efficiency, especially in the video encoder, in order to work on embedded systems and mobile terminals (Paul et al., 2005; Richardson, 2002). This requirement implies the need to dramatically reduce the complexity of the video encoder. Algorithmic analysis shows that motion estimation is the most complex module in the video encoder, mainly because of the large number of calculations it involves. With this in mind, the present study proposes motion detection for fast and efficient video coding (Bailo et al., 2006). Many algorithms have been suggested in the literature to reduce the number of calculations needed for motion estimation based on the spatial and temporal correlation of motion vectors (Nakaya and Harashima, 1994; Korada and Krishna, 2001; Nam et al., 2000a).
We suggest a motion detection method to separate an active region from an inactive region within a frame. A novel reduced-complexity motion detection technique is presented in this study. It is based on segmenting the current frame into blocks containing moving objects (an active region) and blocks of stationary background (an inactive region). The inactive region is assumed to have zero motion vectors. Experiments indicate that about 75% of the blocks within a frame correspond to stationary background and can be assigned a zero motion vector. Early detection of blocks with a zero motion vector allows a significant amount of redundant computation to be skipped and thus speeds up the coding of the video sequence. In this study a more precise threshold value is proposed to decide the motion activity of a block. The threshold value is derived from the concept of correlation and no assumption is adopted, ensuring that video quality is not degraded. For motion vector estimation in the active region, a new feature-based block matching criterion is presented that uses projections built by the Radon transform.
Various matching criteria have been presented in the literature for motion vector estimation. To reduce the computational effort, a subsampling scheme was used in (Bierling, 1988): only every second pixel in both the horizontal and vertical directions is taken into account when estimating the distortion, which reduces the computational burden by a factor of four. Aliasing effects can be avoided by lowpass filtering. In (Zaccarin and Liu, 1992; Liu and Zaccarin, 1993), periodic alternation of four subsampling patterns was adopted at different search positions to solve the aliasing problem without filtering. An adaptive pixel-decimation scheme was further proposed in (Wang et al., 2000). It does not require an initial division of a block and selects pixels only when they have features important in determining a match.
Pixel Difference Classification (PDC) (Gharavi and Mills, 1990) thresholds every pixel absolute difference and classifies it as a match or a mismatch. The best candidate block has the highest number of matching pixels. However, the threshold value strongly affects the quality and is not easy to determine automatically. The minimax criterion (Chen et al., 1995) finds the maximum error among all pixels in a candidate block and then chooses the final motion vector (MV) by minimizing the maximum errors of all candidate blocks. Boundary matching (Chen, 1997) is also a simplified matching criterion and is often adopted for error concealment when motion vectors are lost. The concept of integral projection was introduced in (Kim and Park, 1992; Sauer and Schwartz, 1996). For simple translational motion, the information on the axes in the Fourier transform domain is sufficient to estimate motion between two images. Computing the horizontal and vertical frequency information is equivalent to a discrete approximation of integral projections at these two orientations. The simple equivalence between shifts in integral projection measurements and shifts in the corresponding segments of images suggests comparing integral projections as a computationally efficient technique for block matching estimation, since the pair of horizontal and vertical projections contains fewer data than the pixels in a candidate block.
The sum of absolute differences and mean square error matching criteria involve all pixels in the current block and the candidate block but do not consider the orientation of pixels in different directions; relying only on pixel differences, they can fail to find the best match. Structural information plays an important role in image quality. In this study, structural information is modeled as the energy of directional projections using the Radon transform. Block distortion can then be modeled as the difference between the directional projections of the reference and current blocks.

Motion detection:
Motion detection is carried out prior to motion estimation to avoid heavy computational overhead. This study recommends a simple method for motion detection based on correlation. The correlation of the frame is calculated as:

$$\mathrm{cor} = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N}\left(A_{mn}-\bar{A}\right)\left(B_{mn}-\bar{B}\right)}{\sqrt{\left(\sum_{m=1}^{M}\sum_{n=1}^{N}\left(A_{mn}-\bar{A}\right)^{2}\right)\left(\sum_{m=1}^{M}\sum_{n=1}^{N}\left(B_{mn}-\bar{B}\right)^{2}\right)}}$$

Where:
A = The reference frame
B = The current frame
M×N = The frame size
$\bar{A}$, $\bar{B}$ = The means of the reference and current frames, respectively
cor = The threshold value

An input frame is partitioned into blocks of p×q pixels and the local correlation of each p×q block is then computed by:

$$\mathrm{corb} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{q}\left(A_{ij}-\bar{A}_{b}\right)\left(B_{ij}-\bar{B}_{b}\right)}{\sqrt{\left(\sum_{i=1}^{p}\sum_{j=1}^{q}\left(A_{ij}-\bar{A}_{b}\right)^{2}\right)\left(\sum_{i=1}^{p}\sum_{j=1}^{q}\left(B_{ij}-\bar{B}_{b}\right)^{2}\right)}}$$

where $\bar{A}_{b}$ and $\bar{B}_{b}$ denote the means of the block in the reference and current frames, respectively. 'corb' is compared with the threshold 'cor' to classify the block under consideration as part of the moving objects (active region) or as part of the stationary background (inactive region). If corb<cor, the block is part of the moving objects; if corb>cor, the block is part of the stationary background and is assigned a zero motion vector. Motion activity is not constant across an image sequence, so the correlation threshold is not fixed to a constant value but is calculated locally.
Figure 1a and b show two successive frames of the News sequence, Fig. 1d and e show two successive frames of the Silent sequence and Fig. 1g and h show two successive frames of the Claire sequence. Fig. 1c, f and i depict the motion-detected regions obtained by the motion detection algorithm. Approximately 25% of the frame area is detected as an active region and the remaining 75% as an inactive region. The inactive region is assigned zero motion vectors, which saves the intense computations involved in finding motion vectors. Only the active region is considered for motion estimation.

The proposed criterion:
In general, an image may be segmented into two kinds of areas, structure and texture, of which structural information plays the more important role.
Natural images are highly structured, and this structure carries important information about the objects in the visual scene (Wang et al., 2004). Pixels, edges and shapes with directional characteristics contribute a great deal to the structural information. The distortion of the structural information can be modeled as the alteration of the directional characteristics. In this study we use a directional projection vector, built by the Radon transform (Deans, 1983), to represent the directional characteristics.
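The block classification described in the motion detection section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the correlation is the standard 2D correlation coefficient, that frames are plain lists of pixel rows, and that a block with zero variance counts as correlated only when the two blocks are identical. The function names `correlation`, `get_block` and `detect_motion` are hypothetical.

```python
from math import sqrt


def correlation(a, b):
    """Correlation coefficient of two equally sized 2-D lists (assumed form)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs)
               * sum((y - mean_y) ** 2 for y in ys))
    if den == 0:
        # Assumption (not covered by the source): with a flat block,
        # identical blocks count as fully correlated, anything else as 0.
        return 1.0 if xs == ys else 0.0
    return num / den


def get_block(frame, top, left, p, q):
    """Cut the p x q block whose top-left pixel is (top, left)."""
    return [row[left:left + q] for row in frame[top:top + p]]


def detect_motion(ref, cur, p=8, q=8):
    """Return a 2-D list of booleans, True where a block is active."""
    cor = correlation(ref, cur)  # frame correlation used as the threshold
    active = []
    for top in range(0, len(cur) - p + 1, p):
        line = []
        for left in range(0, len(cur[0]) - q + 1, q):
            corb = correlation(get_block(ref, top, left, p, q),
                               get_block(cur, top, left, p, q))
            line.append(corb < cor)  # below the threshold: moving object
        active.append(line)
    return active
```

Blocks flagged `False` would be assigned a zero motion vector and skipped by the motion estimation stage.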

Fig. 2: The radon transform computation
The Radon transform:
The Radon transform is introduced first, before the discussion of the use of projections in motion estimation. The 2D Radon transform is the projection of the image intensity along a radial line oriented at a specific angle (Deans, 1983). Consider a 2D function f(x, y) as shown in Fig. 2. Integrating along the line whose normal vector is in the θ direction gives the function g(s, θ), which is the projection of f(x, y) onto the axis s in the θ direction. When s is zero, the function has the value g(0, θ), obtained by integrating along the line passing through the origin of the (x, y) coordinates. The points on the line whose normal vector is in the θ direction and which passes through the origin satisfy the equation:

$$x\cos\theta + y\sin\theta = 0$$

Integrating along this line means integrating f(x, y) only at the points satisfying the previous equation. With the help of the Dirac function δ, which is zero for every argument except 0 and whose integral is one, g(0, θ) is expressed as:

$$g(0,\theta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\delta(x\cos\theta + y\sin\theta)\,dx\,dy$$

Similarly, the line with normal vector in the θ direction and at distance s from the origin satisfies the following equation:

$$(x - s\cos\theta)\cos\theta + (y - s\sin\theta)\sin\theta = 0 \;\Rightarrow\; x\cos\theta + y\sin\theta - s = 0$$

So the general equation of the Radon transform is obtained (Deans, 1983):

$$g(s,\theta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\delta(x\cos\theta + y\sin\theta - s)\,dx\,dy$$

Note that while the above definition is the model for the Radon transform of a continuous image, in practice we use a discrete version of the Radon transform. Very early works such as (Alliney and Morandi, 1986) register translated images using a relative phase approach, with the projections used to estimate translational motion confined to 0° and 90°. Similarly, in (Cain et al., 2001) the authors use correlation between pairs of image projections at 0° and 90° to register translated images.
The above methods have not addressed the performance issues concerning the application of projections in estimating motion vectors. The present study justifies the use of projections based on the Radon transform to estimate the motion vectors.
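For intuition, a discrete version of these projections at 0°, 45° and 90° can be sketched as simple pixel sums. This is an illustrative assumption about the discretization, not the formulation used in the study: x is taken as the column index and y as the row index, so the 0° projection collects column sums, the 90° projection collects row sums and the 45° projection collects sums along lines where x + y is constant. The function name `radon_projections` is hypothetical.

```python
def radon_projections(block):
    """Discrete projections of a 2-D list at 0, 45 and 90 degrees.

    Assumed convention: x is the column index and y is the row index,
    so the line x*cos(theta) + y*sin(theta) = s is a column for 0 degrees,
    a row for 90 degrees and an anti-diagonal for 45 degrees.
    """
    rows, cols = len(block), len(block[0])
    p0 = [sum(block[r][c] for r in range(rows)) for c in range(cols)]
    p90 = [sum(block[r]) for r in range(rows)]
    p45 = [0] * (rows + cols - 1)
    for r in range(rows):
        for c in range(cols):
            p45[r + c] += block[r][c]  # pixels with the same x + y share a bin
    return {0: p0, 45: p45, 90: p90}


projections = radon_projections([[1, 2],
                                 [3, 4]])
print(projections[0])   # column sums
print(projections[45])  # anti-diagonal sums
print(projections[90])  # row sums
```

An 8×8 block yields 8 + 15 + 8 = 31 projection samples under this convention, compared with 64 pixels.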

Radon Projection Matching (RPM) criterion for block motion estimation:
To understand how to estimate motion parameters indirectly using projections, we must first explore the relationship between motion in the original image sequence and the "induced" motion or transformation in the projections. We begin our analysis with the simple case of translational motion, which is completely characterized by the shift vector $\vec{v}_0$. The simple relationship known as the shift property of the Radon transform (Gharavi and Mills, 1990) relates motion in images to motion in projections by:

$$g_2(s,\theta) = g_1\!\left(s - \vec{v}_0\cdot\vec{w}(\theta),\,\theta\right)$$

where $g_1$ and $g_2$ are the projections of the image before and after the shift and $\vec{w}(\theta) = [\cos\theta, \sin\theta]^{T}$ is the unit vector in the θ direction. The current and reference images are divided into small blocks of size M×N.
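The shift property can be checked numerically on a small example. The sketch below assumes integer pixel shifts, zero fill for uncovered pixels and the 0° and 90° projections only (column and row sums); the helper names `shift`, `column_sums`, `row_sums` and `best_shift` are hypothetical. At 0° the projection should move by the horizontal component of the shift, and at 90° by the vertical component.

```python
def shift(image, dx, dy):
    """Translate a 2-D list by (dx, dy) pixels, filling uncovered pixels with 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if 0 <= nr < rows and 0 <= nc < cols:
                out[nr][nc] = image[r][c]
    return out


def column_sums(image):
    """Projection at 0 degrees (x taken as the column index)."""
    return [sum(col) for col in zip(*image)]


def row_sums(image):
    """Projection at 90 degrees (y taken as the row index)."""
    return [sum(row) for row in image]


def best_shift(p_ref, p_cur, max_shift):
    """Integer d that minimises sum |p_cur[i] - p_ref[i - d]|."""
    n = len(p_ref)
    best_d, best_cost = 0, None
    for d in range(-max_shift, max_shift + 1):
        cost = sum(abs(p_cur[i] - (p_ref[i - d] if 0 <= i - d < n else 0))
                   for i in range(n))
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


image = [[0] * 8 for _ in range(8)]
image[2][1], image[2][2], image[3][1], image[3][2] = 5, 7, 3, 9
moved = shift(image, dx=3, dy=2)
print(best_shift(column_sums(image), column_sums(moved), 4))  # horizontal shift
print(best_shift(row_sums(image), row_sums(moved), 4))        # vertical shift
```

Each one-dimensional search recovers one component of the shift vector, which is why matching projections can stand in for matching full blocks.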
The proposed criterion is computed in the following steps.
Step 1: Compute the projections of the n-th current block (B_C) and reference block (B_R) using the Radon transform, giving P_nC and P_nR, the directional projection vectors corresponding to θ.
Step 2: Calculate the distortion as the difference between the projection vectors P_nC and P_nR.
Step 3: The reference block with the minimum distortion is considered the best-matched block; it specifies the motion vector for the current block.
In this study we build the directional projection vectors of the blocks using the Radon transform. The difference between the projection vectors represents the distortion: the lower the distortion, the better the match. In Eq. 5 there are two parameters, s and θ; θ is varied to scale efficiency against computational cost.
Because projections must be computed at several angles, the RPM criterion is computationally complex. However, Sanz (1988) has shown that the Radon transform is suitable for implementation in a powerful and flexible pipeline architecture called a Parallel Pipeline Projection Engine. Moreover, because the projections consider the pixel orientation within a block in different directions, RPM assures an accurate motion vector.
The proposed criterion can be used with any conventional block-matching algorithm to find the motion vector.
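The three steps of the criterion can be sketched as follows. The exact distortion formula is not reproduced here, so this sketch assumes the distortion is the sum of absolute differences between the projection vectors at 0°, 45° and 90°, with projections taken as column, anti-diagonal and row sums. The names `projections`, `rpm_distortion` and `best_match` are hypothetical.

```python
def projections(block):
    """Projection vectors of a 2-D list at 0, 45 and 90 degrees (assumed discretization)."""
    rows, cols = len(block), len(block[0])
    p0 = [sum(block[r][c] for r in range(rows)) for c in range(cols)]
    p45 = [0] * (rows + cols - 1)
    for r in range(rows):
        for c in range(cols):
            p45[r + c] += block[r][c]
    p90 = [sum(row) for row in block]
    return [p0, p45, p90]


def rpm_distortion(cur_block, ref_block):
    """Assumed distortion: L1 difference of the projection vectors, summed over angles."""
    total = 0
    for p_cur, p_ref in zip(projections(cur_block), projections(ref_block)):
        total += sum(abs(a - b) for a, b in zip(p_cur, p_ref))
    return total


def best_match(cur_block, candidates):
    """Index of the candidate reference block with the minimum distortion."""
    costs = [rpm_distortion(cur_block, cand) for cand in candidates]
    return costs.index(min(costs))


cur = [[1, 0],
       [0, 1]]
flipped = [[0, 1],
           [1, 0]]
print(rpm_distortion(cur, cur))      # identical blocks
print(rpm_distortion(cur, flipped))  # same row and column sums, different diagonals
```

The two blocks in the usage example have identical row and column sums, so only the 45° projection separates them. This illustrates why adding angles beyond 0° and 90° can matter, at the price of more computation per candidate.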

Motion estimation:
The proposed motion detection module is used to find the active region of motion. Motion detection decides whether a block is subjected to motion vector estimation or is assigned a zero motion vector. The complexity of the video coder is strongly influenced by the number of calculations required to find the motion vectors. Consequently, reducing the number of blocks passed to motion estimation can save measurable time in the encoding process with minor effects on the quality of the produced video sequence. Block matching algorithms are the most popular motion estimation methods and have been adopted by various video coding standards such as MPEG-1 and MPEG-2 because of their low computational complexity (Zhu and Ma, 2000; ISO/IEC 13818-4, 1995). One of the first algorithms used for block-based motion estimation is the Full Search Algorithm (FSA), or Exhaustive Search Algorithm (ESA), which evaluates the Block Distortion Measure (BDM) function at every possible pixel location in the search area (Jain and Jain, 1981). Although this algorithm is the best in terms of predicted frame quality and simplicity, it is computationally intensive. In the past two decades, several fast search methods for motion estimation have been introduced to reduce the computational complexity of block matching, for example the two-dimensional Logarithmic Search (LOGS) (Nam et al., 2000b), the Three-Step Search (3SS) (Koga et al., 1981), the Four-Step Search (4SS) (Lai-Man and Wing-Chung, 1996) and the Modified Orthogonal Search Algorithm (MOSA) (Metkar and Talbar, 2009). Among these algorithms, 3SS has become the most popular for low bit rate applications owing to its simplicity and effectiveness. The Three-Step Search algorithm (3SS) was proposed by Koga et al. (1981). It is based on a coarse-to-fine approach in which the step size decreases logarithmically, as shown in Fig. 3. The initial step size is half of the maximum motion displacement d.
At each step, nine checking points are matched and the point with the minimum BDM is chosen as the starting center of the next step. After each step the step size is halved, and the procedure continues until the step size becomes one. The number of checking points required equals [1 + 8 log2(d+1)], which gives 25 points for d = 7. This study uses the three-step search algorithm with the Radon Projection Matching (RPM) criterion for motion vector estimation of active region blocks.
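The search procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: frames are lists of pixel rows, d = 7 with the initial step rounded up to 4, candidates that fall outside the reference frame are skipped, and the matching criterion is passed in as a function so that a projection-based cost could replace the mean absolute difference used here. The names `mad` and `three_step_search` and the cost signature are hypothetical.

```python
def mad(cur, ref, top, left, dy, dx, n):
    """Mean absolute difference between a current block and a displaced reference block."""
    total = 0
    for r in range(n):
        for c in range(n):
            total += abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
    return total / (n * n)


def three_step_search(cur, ref, top, left, n=8, d=7, cost=mad):
    """Return the displacement (dy, dx) found by a coarse-to-fine search."""
    rows, cols = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = cost(cur, ref, top, left, 0, 0, n)
    step = (d + 1) // 2  # half of the maximum displacement, rounded up
    while step >= 1:
        center_y, center_x = best  # fixed for the whole stage
        for off_y in (-step, 0, step):
            for off_x in (-step, 0, step):
                if off_y == 0 and off_x == 0:
                    continue  # the center was already evaluated
                dy, dx = center_y + off_y, center_x + off_x
                inside = (0 <= top + dy <= rows - n
                          and 0 <= left + dx <= cols - n)
                if not inside:
                    continue  # assumption: skip candidates outside the frame
                value = cost(cur, ref, top, left, dy, dx, n)
                if value < best_cost:
                    best, best_cost = (dy, dx), value
        step //= 2
    return best
```

With d = 7 the steps are 4, 2 and 1, so a block away from the frame border costs 1 + 8 + 8 + 8 = 25 evaluations of the criterion. Like any fast search, this can settle on a local minimum when the cost surface is not smooth.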

RESULTS
The test sequences used in the experiments are seven standard QCIF (176×144) videos of 100 frames each: Foreman, Carphone, Silent, News, Grandma, Miss America and Salesman, which represent different motion content. We selected a block size of 8×8 as a tradeoff between computational complexity and image quality. Table 1 lists the average number of inactive blocks detected by the motion detection module out of a total of 396 blocks per frame. Motion estimation and compensation need not be executed for these blocks, which are assigned zero motion vectors. As a result, the numerous calculations otherwise required for motion estimation of these blocks are saved. To find the motion vectors of the blocks in the active region, the proposed method was tested with the standard Three-Step Search (3SS) algorithm. The computational complexity of the different techniques in terms of the average number of search points is shown in Table 2. The MD technique requires measurably fewer search points than the full search technique. The Signal-to-Noise Ratio (SNR) of the reconstructed image relative to the original image is used as a quality measure to compare the performance of the Radon projection-matching criterion with that of the corresponding mean absolute difference criterion. Table 3 shows the prediction performance of the conventional block-matching algorithms and the proposed Motion Detection+3SS (MD+3SS) algorithm using the MAD and RPM criteria.
A set of experiments was carried out to compare the performance of the proposed Radon projection matching criterion with the well-known mean absolute difference criterion for motion vector calculation. In our experiments for RPM we calculate the distortion for θ = 0°, θ = 45° and θ = 90°. The proposed RPM criterion has better SNR performance than the MAD criterion, as indicated in Table 3. However, RPM has a higher computational complexity than MAD. To compare the computational complexity of the proposed RPM approach across the different algorithms, the average time in seconds to process an entire sequence is reported in Fig. 4.

DISCUSSION
For the MAD criterion, Table 3 indicates that the performance of the proposed MD+3SS method degrades for sequences with high motion content such as Foreman, News and Salesman, whereas the proposed technique performs well for sequences with moderate to slow motion. The quality of the presented technique is close to that of the standard FSA and 3SS methods, with much reduced complexity.
As seen in Fig. 4, the processing time of the FSA is extremely high for all benchmark sequences. MD+3SS requires less time than 3SS for the Silent and Grandma sequences: these are slow motion sequences in which most of the blocks are categorized as Type 1 blocks, so the intense computations required for motion vector estimation are saved for these blocks. In contrast, the proposed algorithm requires more time than 3SS for the Carphone, News and Foreman sequences: these are high motion sequences in which most of the blocks are categorized as active by the MD module, and processing active blocks with the RPM block matching criterion takes more time.

CONCLUSION
To conclude, this study presented a motion detection technique and tested it on a set of sequences. MD is an innovative method for reducing motion estimation complexity based on active motion detection. A classifier based on the frame correlation was employed to detect active and inactive blocks. The threshold value in the algorithm is calculated automatically, which avoids user interaction. Experiments showed that the majority of the blocks within a frame are inactive and can be assigned zero motion vectors. As a result, measurable computational time required for motion vector estimation of these blocks was saved. This study also introduced an efficient approach to block motion estimation using the Radon transform. The RPM criterion compares favorably with the existing MAD criterion at the cost of increased computational complexity. The advantage of RPM can be retained by combining it with the motion detection approach, which compensates for the added computational complexity. The proposed criterion performs well for most of the video sequences and can be used with any conventional block-matching algorithm.