Single step optimal block matched motion estimation with motion vectors having arbitrary pixel precisions

This paper proposes a non-linear block matched motion model with motion vectors having arbitrary pixel precisions. The optimal motion vector which minimizes the mean square error is solved analytically in a single step. The proposed algorithm can be regarded as a generalization of conventional half pixel and quarter pixel search algorithms because it can achieve motion vectors with arbitrary pixel precisions. Also, its computational effort is lower than that of conventional quarter pixel search algorithms because it obtains the motion vectors in a single step.


I. INTRODUCTION
Motion estimation plays an important role in motion tracking applications, such as respiratory motion tracking [1] and facial motion tracking [2]. The most common motion estimation algorithm is the block matched motion estimation algorithm [3]. The current frame is usually partitioned into a number of macro blocks with fixed or variable sizes. Each macro block in the current frame is compared with a number of macro blocks in the reference frame translated within a search window. Block matching errors are calculated based on a predefined cost function. The macro block in the reference frame that gives the minimum block matching error is considered the best approximation of the macro block in the current frame. Each macro block in the current frame is then represented by the best macro block in the reference frame, the motion vector (the vector representing the translation of the macro block in the reference frame) and the residue (the difference between the macro block in the current frame and the best translated macro block in the reference frame).
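The block matching procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes frames stored as lists of rows of floats, the mean square error as the cost function, and a search window of plus or minus `radius` integer pixels. The names `block_mse` and `full_integer_search` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]


def block_mse(cur: Frame, ref: Frame, top: int, left: int,
              dy: int, dx: int, size: int) -> float:
    """MSE between the block of `cur` at (top, left) and the block of
    `ref` displaced by the integer motion vector (dy, dx)."""
    total = 0.0
    for r in range(size):
        for c in range(size):
            diff = cur[top + r][left + c] - ref[top + dy + r][left + dx + c]
            total += diff * diff
    return total / (size * size)


def full_integer_search(cur: Frame, ref: Frame, top: int, left: int,
                        size: int, radius: int) -> Tuple[Tuple[int, int], float]:
    """Examine every integer displacement within +/- radius and return
    the motion vector with the smallest block matching error."""
    height, width = len(ref), len(ref[0])
    best_mv, best_err = (0, 0), block_mse(cur, ref, top, left, 0, 0, size)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r0, c0 = top + dy, left + dx
            # Skip candidates whose block would fall outside the reference frame.
            if r0 < 0 or c0 < 0 or r0 + size > height or c0 + size > width:
                continue
            err = block_mse(cur, ref, top, left, dy, dx, size)
            if err < best_err:
                best_mv, best_err = (dy, dx), err
    return best_mv, best_err
```

The residue of a block is then the element-wise difference between the current block and the reference block displaced by the returned motion vector.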
978-1-86135-369-6/10/$25.00 ©2010 IEEE
The most common block matched motion estimation algorithm is the full integer pixel search algorithm. It is a centre based algorithm in which all integer pixel locations in the search window are examined. However, motion vectors are not necessarily of integer pixel precision, and a large portion of the macro blocks in the current frame are best approximated by macro blocks in the reference frame translated within a range of plus or minus one pixel around integer pixel locations. Hence, block matching errors could be further reduced if motion vectors are represented with non-integer pixel precisions. Conventional non-integer pixel search algorithms start searching at half pixel locations. Half pixels are interpolated from nearby pixels at integer pixel locations. Block matching errors at some or all half pixel locations are evaluated, and the half pixel location with the minimum block matching error is chosen. Similarly, quarter pixels are interpolated from nearby pixels at half pixel and integer pixel locations, and the quarter pixel location with the minimum block matching error is chosen. Finer pixel locations can be evaluated successively in the same way. Since the block matching errors at finer pixel locations are evaluated via interpolations from the coarser pixel locations, many pixel locations have to be evaluated if motion vectors with very fine pixel precisions are required. Hence, the computational effort of these algorithms is very heavy and they are very inefficient. Also, existing pixel search algorithms can only achieve motion vectors with rational pixel precisions. If the true motion vector has an irrational pixel precision, then an infinite number of pixel locations would have to be evaluated.
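The successive coarse-to-fine refinement described above can be sketched as follows. This is only an illustration under stated assumptions: the text says that fractional pixels are "interpolated by nearby pixels" without naming the filter, so bilinear interpolation is assumed here, and the names `sample_bilinear`, `fractional_mse` and `refine` are hypothetical. The caller is assumed to keep every displaced block inside the reference frame.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]


def sample_bilinear(ref: Frame, y: float, x: float) -> float:
    """Interpolate `ref` at a non-integer location from its four
    integer neighbours (assumed interpolation filter)."""
    y0, x0 = math.floor(y), math.floor(x)
    y1 = min(y0 + 1, len(ref) - 1)
    x1 = min(x0 + 1, len(ref[0]) - 1)
    fy, fx = y - y0, x - x0
    upper = (1 - fx) * ref[y0][x0] + fx * ref[y0][x1]
    lower = (1 - fx) * ref[y1][x0] + fx * ref[y1][x1]
    return (1 - fy) * upper + fy * lower


def fractional_mse(cur: Frame, ref: Frame, top: int, left: int,
                   dy: float, dx: float, size: int) -> float:
    """MSE for a motion vector (dy, dx) with fractional components."""
    total = 0.0
    for r in range(size):
        for c in range(size):
            predicted = sample_bilinear(ref, top + r + dy, left + c + dx)
            diff = cur[top + r][left + c] - predicted
            total += diff * diff
    return total / (size * size)


def refine(cur: Frame, ref: Frame, top: int, left: int, size: int,
           centre: Tuple[float, float], step: float) -> Tuple[Tuple[float, float], float]:
    """Evaluate the centre and its eight neighbours at spacing `step`
    and return the location with the smallest block matching error."""
    best_mv = centre
    best_err = fractional_mse(cur, ref, top, left, centre[0], centre[1], size)
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            mv = (centre[0] + i * step, centre[1] + j * step)
            err = fractional_mse(cur, ref, top, left, mv[0], mv[1], size)
            if err < best_err:
                best_mv, best_err = mv, err
    return best_mv, best_err
```

A half pixel search followed by a quarter pixel search would call `refine` with `step=0.5` around the best integer location and then with `step=0.25` around the result. Each finer precision adds another round of interpolations and error evaluations.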
Interpolations are implemented via predefined functions, such as a real valued quadratic function of two variables [4], a paraboloid function [5] and a straight line [6]. As the block matching error is a highly non-linear and non-convex function of the motion vector, it is very difficult to solve for the motion vector that globally minimizes the block matching error. Hence, many pixel locations still have to be evaluated, and the pixel location with the lowest block matching error is chosen. As with conventional quarter pixel search algorithms, the computational effort of these algorithms is still very heavy and they are still very inefficient. Also, if the true motion vector has an irrational pixel precision, then an infinite number of pixel locations would still have to be evaluated.
In this paper, we propose a non-linear block matched motion model with motion vectors having arbitrary pixel precisions. The optimal motion vector which minimizes the mean square error is solved analytically in a single step. Our proposed algorithm has the following salient features. 1) The block matching error is evaluated in a single step which globally minimizes the mean square error. As the calculation of the mean square error at a fine pixel location is not derived from the coarser pixel locations, the computational effort of our proposed algorithm is much lower than that of conventional quarter pixel search algorithms. 2) Our proposed algorithm can achieve the true motion vector even when the true motion vector has an irrational pixel precision. Computer numerical simulations show that the mean square errors of various video sequences based on our proposed algorithm are lower than those based on conventional half pixel and quarter pixel search algorithms. The rest of this paper is organized as follows. Section II describes our proposed non-linear block matched motion model. Section III analytically derives the optimal motion vector which minimizes the mean square error. Computer numerical simulations are presented in Section IV. Finally, a conclusion is drawn in Section V.

[...] $= 0$, then we do not consider that the global minimum is on the boundary $q_k = 1$ $\forall p_k \in [0,1]$. For these [...] Similarly, $\forall k \in \mathbb{Z}^+$, denote the sets of motion vectors corresponding to the stationary points of $MSE_k^{UR}(p_k, q_k)$, $MSE_k^{LL}(p_k, q_k)$ and $MSE_k^{LR}(p_k, q_k)$ (including the point $(0,0)$) as $F_k^{UR}$, $F_k^{LL}$ and $F_k^{LR}$, respectively. The algorithm for finding the globally optimal motion vector can be summarized as follows:

Algorithm
Step 1: Implement an existing full integer pixel search algorithm so that $(p_{0,k}, q_{0,k})$ is obtained $\forall k \in \mathbb{Z}^+$.
Step 2: $\forall k \in \mathbb{Z}^+$, evaluate $F_k^{UL}$, $F_k^{UR}$, $F_k^{LL}$ and $F_k^{LR}$.
Step 3: $\forall k \in \mathbb{Z}^+$, evaluate

$$(p_k^*, q_k^*) = \arg\min \left\{ \min_{(p_k,q_k) \in F_k^{UL}} MSE_k^{UL}(p_k,q_k),\ \min_{(p_k,q_k) \in F_k^{UR}} MSE_k^{UR}(p_k,q_k),\ \min_{(p_k,q_k) \in F_k^{LL}} MSE_k^{LL}(p_k,q_k),\ \min_{(p_k,q_k) \in F_k^{LR}} MSE_k^{LR}(p_k,q_k) \right\}$$

and take $(p_k^*, q_k^*)$ as the globally optimal motion vector of $B_k$.

The global minimum of the mean square error is not necessarily located at a rational pixel location, whereas the full integer pixel search, full half pixel search and full quarter pixel search algorithms only evaluate rational pixel locations. The mean square errors obtained by these conventional methods can therefore be considerably larger than the global minimum, and these methods are correspondingly less effective. On the other hand, our proposed method is guaranteed to find the motion vector that globally minimizes the mean square error, regardless of whether the motion vector is located at a rational or an irrational pixel location. Hence, our proposed method is more effective than conventional methods. Besides, as integer pixel locations, half pixel locations and quarter pixel locations are particular locations represented by our proposed model, the mean square error based on our proposed method is guaranteed to be no higher than that based on these conventional methods.
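Step 3 of the algorithm above can be sketched as follows. This is a hedged illustration only: the quadrant mean square error functions and their stationary points come from equations that are not reproduced here, so the sketch takes them as inputs. The name `select_global_optimum`, the dictionary layout and the quadrant keys "UL", "UR", "LL" and "LR" are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple

Vector = Tuple[float, float]
MseFunction = Callable[[float, float], float]

QUADRANTS = ("UL", "UR", "LL", "LR")


def select_global_optimum(candidates: Dict[str, Iterable[Vector]],
                          mse: Dict[str, MseFunction]) -> Tuple[str, Vector, float]:
    """Return the quadrant, motion vector and error of the candidate
    with the smallest mean square error over all four candidate sets.

    candidates[X] plays the role of the set F_k^X (stationary points
    plus the point (0, 0)); mse[X] plays the role of MSE_k^X.
    """
    best_quadrant, best_vector, best_error = "", (0.0, 0.0), float("inf")
    for quadrant in QUADRANTS:
        for p, q in candidates[quadrant]:
            error = mse[quadrant](p, q)
            if error < best_error:
                best_quadrant, best_vector, best_error = quadrant, (p, q), error
    return best_quadrant, best_vector, best_error
```

Because each candidate set is small, this step costs only a handful of error evaluations per macro block, independent of the precision of the resulting motion vector.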
The computational effort of our proposed algorithm can be analyzed as follows. As the orders of the polynomials in (1), (2) and (3) are 5, 4 and 2, respectively, $0 \le M_k^{UL} \le 5$ $\forall k \in \mathbb{Z}^+$. Hence, $\forall k \in \mathbb{Z}^+$, if $M_k^{UL} \ge 1$, then the number of evaluation points of our proposed method is at most 21. $\forall k \in \mathbb{Z}^+$, if $M_k^{UL} = 0$, as the maximum number of points in $F_k^{UL}$ is 5, the number of evaluation points of our proposed method is at most 17. For full quarter pixel search algorithms, there are 25 evaluation points. Hence, the total number of evaluation points of our proposed method is lower than that of full quarter pixel search algorithms. As conventional block matched motion estimation algorithms evaluate block matching errors from coarse pixel locations to fine pixel locations, their computational effort grows exponentially as the pixel precision gets finer. From this point of view, the conventional methods are very inefficient. On the other hand, our proposed method does not require searching from the coarse pixel locations to the fine pixel locations. Our proposed method is more efficient than the conventional methods, particularly when the required pixel precision is quarter pixel or finer.

IV. SIMULATION RESULTS
In order to have a complete investigation, video sequences with fast motion, medium motion and slow motion are studied. The video sequences Foreman, Coastguard and Container [7] are, respectively, the most common fast motion, medium motion and slow motion video sequences. Hence, motion estimation is performed on these video sequences. The mean square errors of all the frames of these video sequences, except the first frame, are evaluated. Each current frame takes its immediate predecessor as the reference frame. The sizes of the macro blocks are chosen as 8 × 8 and 16 × 16 and the sizes of the search windows are chosen as 32 and 40, which are the most common block sizes and window sizes used in international standards. The comparisons are made with the full integer pixel search algorithm, the full half pixel search algorithm and the full quarter pixel search algorithm.
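The comparison against the full integer pixel search baseline can be sketched as follows. This is an illustrative assumption, not the authors' evaluation code: it takes per-frame mean square errors (for all frames except the first) as plain lists and reports the improvement of the average mean square error, both as an absolute difference and as a percentage of the baseline average. How the reported percentages are actually computed is not stated in the text, so the relative figure here is only one plausible definition. The name `improvement_over_baseline` is hypothetical.

```python
from typing import List, Tuple


def improvement_over_baseline(baseline_mse: List[float],
                              method_mse: List[float]) -> Tuple[float, float]:
    """Return (absolute, relative_percent) improvement of the average
    per-frame MSE of a method over a baseline.

    Both lists hold one MSE value per frame, first frame excluded.
    The relative figure is the absolute improvement divided by the
    baseline average (an assumed definition).
    """
    if not baseline_mse or len(baseline_mse) != len(method_mse):
        raise ValueError("need two non-empty lists of equal length")
    baseline_avg = sum(baseline_mse) / len(baseline_mse)
    method_avg = sum(method_mse) / len(method_mse)
    absolute = baseline_avg - method_avg
    relative_percent = 100.0 * absolute / baseline_avg
    return absolute, relative_percent
```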
The mean square error performances of our proposed method, the full integer pixel search algorithm, the full half pixel search algorithm and the full quarter pixel search algorithm, with macro blocks of size 8 × 8 and search windows of size 32, applied to the video sequences Coastguard, Container and Foreman are shown in Figure 1. It can be seen from Figure 1 that the improvements on the average mean square errors of the full half pixel search algorithm, the full quarter pixel search algorithm and our proposed method over the full integer pixel search algorithm for the video sequence Coastguard are $1.4894 \times 10^{-4}$, $2.2242 \times 10^{-4}$ and $2.7294 \times 10^{-4}$, respectively, which correspond to 17.8531%, 28.8039% and 37.5835%, respectively; those for the video sequence Container are $1.4406 \times 10^{-6}$, $3.6476 \times 10^{-6}$ and $2.0374 \times 10^{-5}$, respectively, which correspond to 1.0115%, 4.4170% and 32.3070%, respectively; and those for the video sequence Foreman are $1.5788 \times 10^{-4}$, $2.2863 \times 10^{-4}$ and $2.5897 \times 10^{-4}$, respectively, which correspond to 24.7674%, 39.1977% and 46.4394%, respectively. Similar results are obtained for different sizes of macro blocks and search windows. Figure 2 shows the improvements on the average mean square errors of the various algorithms with macro blocks of size 16 × 16 and search windows of size 40 applied to the same set of video sequences. The improvements on the average mean square errors of the full half pixel search algorithm, the full quarter pixel search algorithm and our proposed method over the full integer pixel search algorithm for the video sequence Coastguard are $1.7838 \times 10^{-4}$, $2.5650 \times 10^{-4}$ and $3.0888 \times 10^{-4}$, respectively, which correspond to 18.4666%, 27.6579% and 34.6995%, respectively; those for the video sequence Container are $1.8757 \times 10^{-6}$, $2.5444 \times 10^{-6}$ and $1.8031 \times 10^{-5}$, respectively, which correspond to 0.7710%, 1.5106% and 26.9046%, respectively; and those for the video sequence Foreman are $2.1073 \times 10^{-4}$, $2.9528 \times 10^{-4}$ and $3.3051 \times 10^{-4}$, respectively, which correspond to 21.6021%, 34.2148% and 40.4725%, respectively. From the above computer numerical simulations, it can be concluded that the mean square error performance of our proposed method is better than those of the full integer pixel search algorithm, the full half pixel search algorithm and the full quarter pixel search algorithm for all three video sequences. In particular, for slow motion video sequences, such as the video sequence Container, our proposed