Pan Correction through Overlap Estimation in a Multi-Camera Environment

Problem statement: Multiple cameras are employed for surveillance of a larger environment. In such a case, the overlap between adjacent cameras must be maintained for correct object registration. The overlap may be disturbed by natural or manual factors. This study proposed automatic camera pan correction by determining the area of overlap from multi-view images. Approach: A closed-loop system was implemented to restore the camera position. It uses feature extraction with SIFT and feature matching with the descriptor ratio method and Mean Absolute Error (MAE) over the Gaussian scale space, followed by overlap estimation. Results: The proposed method was tested on datasets acquired in an environment where surveillance involves two cameras. The matched points of the two images were used to calculate the overlap percentage. The overlap percentage estimated by the surveillance server was communicated to the pan controller to re-orient the camera to its original position. Conclusion: The proposed algorithm identified robust and distinctive features that are invariant to translation, rotation and scaling. These features help in the accurate estimation of the overlap percentage, which is then used to automatically correct the pan of the camera.


INTRODUCTION
In multiple images, the locations of common objects must be correctly registered, as this is the first step in many computer vision systems. Overlap in the Field Of View (FOV) of multiple cameras is needed for correct registration and labeling of an object in an environment monitored by a multi-camera system. A change in the FOV overlap caused by camera pan affects the correct working of such systems, and automatic pan correction remains one of the major open problems in the field. The information available in the initial overlap forms the basis for estimating the camera pan and correcting it. A common approach to the problem is to first detect a number of distinguishable features independently in each image and then determine which features originate from the same region in the scene. Shape-dependent features obtained by edge detection can be used, but discarding poor edges is not easy. The image can also be segmented into regions, and the segmented regions used as features for matching (Mittal and Davis, 2001). Invariant moments and point-based approaches are also used, but earlier work shows that the points extracted by the SIFT algorithm give better features. The Scale Invariant Feature Transform (SIFT) detects features over varied scales of an image (Lowe, 1999; 2004). RANSAC can be used to reduce the iterative processing needed to find corresponding points (Chen et al., 1998). Wu et al. (2006) estimate the Pan, Tilt and Zoom (PTZ) parameters of a PTZ camera from Frame-2-Frame (F2F) correspondences at different sampling rates for a real-time video surveillance and automatic object tracking system; the PTZ estimation accuracy is maintained as long as the error remains small. Javed et al. (2003) presented a wide-area surveillance system that detects, tracks and classifies moving objects across multiple cameras.
Song et al. (2006) use image variance density to optimally estimate camera pan and tilt values by incrementally refining the image registration using overlapping images from prior frames; the performance of this algorithm degrades as noise and the number of fast-moving objects increase. Roth (1999) discusses a method for automatically registering two range images that have significant overlap. In Trajkovic (2002), the camera position and orientation are determined by pointing the camera at several points in the area at the height of the camera and applying a linear algorithm; this estimation is possible only if the camera height and the initial points are known. In the proposed study, feature matching is done in two steps: the SIFT descriptors are first matched using the descriptor ratio method, and the matches are then refined using correlation in scale space. This study addresses the correction of camera movement when one of the cameras is moved by an unknown cause. Such a movement changes the FOV of that camera, as depicted in Fig. 1.

MATERIALS AND METHODS
This study proposes a method to automatically detect camera pan caused by external disturbances and to restore the camera to its original position. The proposed method has four stages, viz. feature extraction, feature matching, overlap estimation and, finally, camera pan correction. The overall flow of the proposed method is depicted in Fig. 2.
Feature extraction: Feature points are extracted from each image using the SIFT algorithm proposed by Lowe (2004). SIFT consists of four steps, namely building the Gaussian scale space, local extrema detection, orientation assignment and keypoint descriptor formation, which are explained below.
Building Gaussian scale space: The Gaussian scale space L(x, y, σ) is constructed by convolving an initial image I(x, y) with Gaussians G(x, y, σ) of different variance, as expressed in Eq. 1:
$$L(x, y, \sigma) = G(x, y, \sigma) \otimes I(x, y) \tag{1}$$
where ⊗ is the convolution operation in x and y and:
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \, e^{-(x^{2} + y^{2})/2\sigma^{2}} \tag{2}$$
This set of blurred images forms one octave. The next octave of the Gaussian scale space is built from the initial image by downsampling it by a factor of 2, and the process is repeated. Adjacent Gaussian images are subtracted to produce the Difference-of-Gaussian (DoG) images:
$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{3}$$
where k is the constant multiplicative factor between adjacent scales.
Local extrema detection: Keypoints are identified as local maxima or minima of the DoG images across scales. Each pixel in the DoG images is compared to its 26 neighbors in the 3×3 regions at the current and adjacent scales. Low-contrast keypoints and keypoints lying along edges are excluded.
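The scale-space construction and the 26-neighbor extremum test can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the number of octaves and scales, σ0 = 1.6 and k = √2 are assumed values, the Gaussian is truncated at 3σ, and the rejection of low-contrast and edge keypoints is omitted. As described above, each new octave downsamples the initial image by 2.

```python
import numpy as np


def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Eq. 1, L = G (x) I, computed as a separable convolution (rows, then columns)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")  # replicate borders so the size is kept
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def gaussian_scale_space(img, n_octaves=3, n_scales=5, sigma0=1.6, k=np.sqrt(2)):
    """One stack of blurred images per octave; each new octave downsamples by 2."""
    octaves, base = [], img.astype(float)
    for _ in range(n_octaves):
        octaves.append(np.stack([gaussian_blur(base, sigma0 * k ** s)
                                 for s in range(n_scales)]))
        base = base[::2, ::2]
    return octaves


def difference_of_gaussians(octave):
    """Eq. 3, D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma), adjacent scales."""
    return octave[1:] - octave[:-1]


def local_extrema(dog):
    """(scale, y, x) of pixels strictly above or below all 26 neighbours."""
    points = []
    S, H, W = dog.shape
    for s in range(1, S - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                centre, neighbours = cube[13], np.delete(cube, 13)
                if centre > neighbours.max() or centre < neighbours.min():
                    points.append((s, y, x))
    return points
```

For a 32×32 image, the three octaves have shapes (5, 32, 32), (5, 16, 16) and (5, 8, 8). A Gaussian blob with variance of about 7.24 centered at (16, 16) then appears as a DoG minimum at scale index 1.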
Orientation assignment: For each image sample L(x, y), the gradient magnitude m(x, y) and orientation θ(x, y) are computed using pixel differences, as shown in Eq. 4 and 5:
$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^{2} + \big(L(x, y+1) - L(x, y-1)\big)^{2}} \tag{4}$$
$$\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right) \tag{5}$$
The keypoint orientation is determined from a gradient orientation histogram. Each neighboring pixel contributes to this histogram with a weight given by its gradient magnitude and by a Gaussian window whose σ is 1.5 times the scale of the keypoint. The histogram peaks that are greater than 80% of the maximum peak value are retained as keypoint orientations for feature extraction.
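The orientation step can be sketched as below. This is a simplified NumPy illustration under stated assumptions: 36 histogram bins of 10° each (as in Lowe, 2004), a square sampling window of radius 8 pixels (a hypothetical choice), and `arctan2` used as the quadrant-aware form of the inverse tangent in Eq. 5. Images are indexed `L[y, x]`.

```python
import numpy as np


def gradients(L):
    """Eq. 4-5 by pixel differences; L is indexed L[y, x]. Border pixels are dropped."""
    dx = L[1:-1, 2:] - L[1:-1, :-2]   # L(x+1, y) - L(x-1, y)
    dy = L[2:, 1:-1] - L[:-2, 1:-1]   # L(x, y+1) - L(x, y-1)
    m = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.arctan2(dy, dx)        # quadrant-aware tan^-1(dy / dx), in (-pi, pi]
    return m, theta


def dominant_orientations(L, y, x, scale, radius=8, n_bins=36, peak_ratio=0.8):
    """Orientation histogram around keypoint (y, x), weighted by gradient magnitude
    and a Gaussian window with sigma = 1.5 * scale; returns peak angles (radians)."""
    m, theta = gradients(L)
    cy, cx = y - 1, x - 1             # gradients() output is shifted by the dropped border
    sigma_w = 1.5 * scale
    hist = np.zeros(n_bins)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = cy + dy, cx + dx
            if 0 <= yy < m.shape[0] and 0 <= xx < m.shape[1]:
                w = np.exp(-(dx * dx + dy * dy) / (2 * sigma_w ** 2))
                b = int((theta[yy, xx] + np.pi) / (2 * np.pi) * n_bins) % n_bins
                hist[b] += w * m[yy, xx]
    peak = hist.max()
    angles = []
    for b in range(n_bins):
        left, right = hist[(b - 1) % n_bins], hist[(b + 1) % n_bins]
        is_local_peak = hist[b] >= left and hist[b] >= right
        # keep local peaks that reach the 80% threshold of the highest peak
        if peak > 0 and is_local_peak and hist[b] >= peak_ratio * peak:
            angles.append(-np.pi + (b + 0.5) * 2 * np.pi / n_bins)  # bin centre
    return angles
```

On a horizontal intensity ramp every gradient points along +x. All votes therefore fall into one bin, and a single orientation at that bin's center (5°) is returned.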

Keypoint descriptor:
A keypoint descriptor is formed from the gradient magnitude and orientation at each image sample point in a region around the keypoint location. The 128-element feature vector of a keypoint is obtained from a 4×4 array of location histograms with 8 orientation bins each (4 × 4 × 8 = 128). The descriptors are normalized to unit length, which reduces the effect of brightness and contrast changes and makes them largely illumination invariant. The feature points extracted through SIFT are robust, invariant to translation, rotation and scale, and partially invariant to affine transformations.
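The 4 × 4 × 8 layout and the normalization can be illustrated with the simplified sketch below. It is an assumption-laden illustration rather than full SIFT. It takes a 16×16 patch of gradients already rotated to the keypoint orientation and omits the Gaussian weighting and trilinear interpolation. The clamp at 0.2 before renormalizing follows Lowe (2004) and is not mentioned in the text above.

```python
import numpy as np


def sift_like_descriptor(mag, ori, clamp=0.2):
    """Simplified 128-D descriptor from a 16x16 patch of gradient magnitudes `mag`
    and orientations `ori` (radians, already relative to the keypoint orientation).
    Omits the Gaussian weighting and trilinear interpolation of full SIFT."""
    assert mag.shape == ori.shape == (16, 16)
    hist = np.zeros((4, 4, 8))                      # 4x4 cells x 8 orientation bins
    bins = ((ori + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    for y in range(16):
        for x in range(16):
            hist[y // 4, x // 4, bins[y, x]] += mag[y, x]
    v = hist.ravel()                                # 4 * 4 * 8 = 128 elements
    v = v / (np.linalg.norm(v) + 1e-12)             # unit length: contrast invariance
    v = np.minimum(v, clamp)                        # damp large gradients (Lowe, 2004)
    return v / (np.linalg.norm(v) + 1e-12)
```

Scaling every gradient magnitude by a constant, as a global contrast change would, leaves the normalized descriptor unchanged.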
Feature matching: The feature matching phase comprises the descriptor ratio method using the SIFT feature descriptors, followed by correlation-based matching over the varied scales of the image, as shown in Fig. 3.
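Descriptor ratio method: The descriptors obtained from SIFT are used for feature matching by the descriptor ratio method. The dot product of the descriptor of each keypoint in the Camera 1 (Left) image with the descriptors of all keypoints in the Camera 2 (Right) image is computed, and the inverse cosine is then taken:
$$\theta_{ij} = \cos^{-1}(D_i \cdot D_j), \quad j = 1, 2, \ldots, n \tag{6}$$
where · is the dot operator, D_i and D_j are the keypoint descriptors in the Camera 1 and 2 images, each of size 1×128, and n is the number of keypoints in the Camera 2 image. This approach is computationally cheap and effective: for unit-length descriptors a dot product is cheaper to compute than a Euclidean distance, and for small angles the ratio of angles closely approximates the ratio of Euclidean distances. The resulting angle values are sorted into a vector. The ratio of the least value 'c' of the angle vector to its successor 'd' is compared with a threshold 'Th' (ranging from 0.1 to 0.9), and the keypoint is matched when:
$$\frac{c}{d} < Th \tag{7}$$
This procedure is repeated for every keypoint in the Camera 1 image.
Correlation based matching: The matched points are then verified by correlation over the Gaussian scale space using the Mean Absolute Error (MAE) between the neighborhoods of each matched point pair. An error 'e' is calculated for every matched point i at every pair of scales k (Camera 1 image) and l (Camera 2 image), where i = 1, 2, …, M, k = 1, 2, …, N and l = 1, 2, …, N, with M the number of matched points and N the number of scales. For each matched point, the scale pair (k, l) producing the minimum error is detected.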
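The descriptor ratio test of Eq. 6 and 7 can be sketched in NumPy as below. The descriptors are assumed to be unit length, the Camera 2 image is assumed to have at least two keypoints, and the default threshold Th = 0.6 is a hypothetical choice within the 0.1 to 0.9 range given above. Clipping the dot products to [-1, 1] guards `arccos` against rounding errors.

```python
import numpy as np


def descriptor_ratio_match(D1, D2, th=0.6):
    """Match rows of D1 (Camera 1, shape (M1, 128)) to rows of D2 (Camera 2, shape
    (n, 128), n >= 2). Descriptors are assumed unit length. Returns (i, j) pairs."""
    angles = np.arccos(np.clip(D1 @ D2.T, -1.0, 1.0))    # Eq. 6 for all i, j at once
    matches = []
    for i in range(D1.shape[0]):
        order = np.argsort(angles[i])                    # sort the angle vector
        c, d = angles[i, order[0]], angles[i, order[1]]  # least value and successor
        if c < th * d:                                   # Eq. 7: c / d < Th
            matches.append((i, int(order[0])))
    return matches
```

Comparing `c < th * d` instead of `c / d < th` gives the same decision and avoids a division when `d` is zero. In the usage below, two slightly perturbed copies of Camera 2 descriptors are matched to their originals, and an unrelated random descriptor is rejected.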
For M feature points there are M scale pairs that each produce a minimum error. The scale pair that occurs most often among these M pairs is taken as the closest pair of scales between the Camera 1 and 2 images. Let C_sL and C_sR be the closest scales of the reference image and the sensed image. After the closest pair of scales is found, the mean error M_e of all feature point pairs at that scale pair is computed as given in Eq. 9:
$$M_e = \frac{1}{M} \sum_{i=1}^{M} e_i(C_{sL}, C_{sR}) \tag{9}$$
The point pairs at the closest scale pair whose error is less than the mean error form the set of final matched points (m):
$$m = \{\, i : e_i(C_{sL}, C_{sR}) < M_e \,\}$$
Only true feature points located in the common FOV of both cameras get matched. Points located outside the overlapping region are classified as mismatches and rejected from further processing.
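The scale-space verification can be sketched as follows. The window size (7×7, `half=3`) and the use of same-sized blurred images from a single octave are assumptions. The function names are hypothetical helpers, not taken from the paper. `E[i, k, l]` holds the MAE of matched point i at scale k of Camera 1 and scale l of Camera 2. The closest scale pair is the one that wins the per-point minimum most often, and the final matches are the points whose error at that pair is below the mean M_e.

```python
import numpy as np


def mae_errors(scales1, scales2, pts1, pts2, half=3):
    """E[i, k, l] = MAE between (2*half+1)^2 windows around matched point i in
    scale k of the Camera 1 stack and scale l of the Camera 2 stack.
    Points are (y, x) and assumed to lie at least `half` pixels inside the border."""
    M, N = len(pts1), len(scales1)
    E = np.empty((M, N, N))
    for i, ((y1, x1), (y2, x2)) in enumerate(zip(pts1, pts2)):
        for k in range(N):
            w1 = scales1[k][y1 - half:y1 + half + 1, x1 - half:x1 + half + 1]
            for l in range(N):
                w2 = scales2[l][y2 - half:y2 + half + 1, x2 - half:x2 + half + 1]
                E[i, k, l] = np.mean(np.abs(w1 - w2))
    return E


def closest_scale_pair(E):
    """Scale pair (CsL, CsR) that gives the minimum error for the most points."""
    M, N, _ = E.shape
    best_per_point = E.reshape(M, N * N).argmin(axis=1)   # flat index k * N + l
    votes = np.bincount(best_per_point, minlength=N * N)
    return divmod(int(votes.argmax()), N)


def final_matches(E):
    """Keep the points whose error at the closest scale pair is below the mean M_e."""
    csl, csr = closest_scale_pair(E)
    e = E[:, csl, csr]
    mean_error = e.mean()                                  # Eq. 9
    return (csl, csr), np.flatnonzero(e < mean_error)
```

If all four points reach their minimum at scale pair (1, 2), with errors 0.1, 0.5, 0.2 and 0.9, the mean is 0.425. Only points 0 and 2 are then kept.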

Overlap estimation:
The points identified as matched feature points are used to determine the percentage of overlap. The region where the feature points are matched in both the Camera 1 (Left, L) and Camera 2 (Right, R) images is the overlap region (L∩R), as shown in Fig. 5. The overlap percentage over both camera views is the percentage that (L∩R) occupies in the two images considered together.
Pan correction control: The camera pan (unwanted movement due to external forces) is corrected using the change in the percentage of overlap, as shown in Fig. 6. An electrical signal generated from the change in overlap percentage is sent to the external hardware through the parallel port. The external hardware senses this signal and runs a motor to correct the position of the camera.
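The overlap percentage can be illustrated with a hypothetical estimator, since the exact formula is not reproduced here. The sketch assumes the overlap in the Camera 1 (left) image runs from the leftmost matched column to the right border, and the overlap in the Camera 2 (right) image runs from the left border to the rightmost matched column. The percentage is then the share of both image widths, taken together, that lies inside L∩R. Both the column-span model and the function name are assumptions for illustration.

```python
def overlap_percentage(x_left, x_right, width_left, width_right):
    """Hypothetical estimator of the overlap percentage from matched x-coordinates.
    x_left: columns of matched points in the Camera 1 (left) image.
    x_right: columns of the same matches in the Camera 2 (right) image."""
    w_left = width_left - min(x_left)      # columns min(x_left) .. width_left - 1
    w_right = max(x_right) + 1             # columns 0 .. max(x_right)
    return 100.0 * (w_left + w_right) / (width_left + width_right)
```

For two 320-pixel-wide images, matches at columns 200 to 300 in the left image and 10 to 119 in the right image give overlap widths of 120 pixels each. This yields 240 / 640, i.e. 37.5%.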

Change in overlap:
The initial overlap percentage between the Camera 1 and 2 images is IOP. If a camera undergoes a pan, the two cameras have a new overlap percentage NOP. The deviation of the overlap percentage is the error ξ in overlap:
$$\xi = IOP - NOP$$
Let R_mat(X) and P_mat(X) be the x-coordinates of any matched feature point in the reference image and the panned image. The direction of the camera pan can be deduced from the sign of the difference R_mat(X) − P_mat(X). A Look-Up Table (LUT, Table 1) has been formulated for this purpose. It maps each possible overlap error to the time period needed to return the camera to its original position so that the error is nullified. Once the error is nullified, the two cameras again cover the region with the predefined overlap.
Signal transmission through parallel port: A DB25 parallel port cable is connected to the parallel port of the controller. The control signal that drives the external hardware circuitry is sent through the parallel port of the computer.
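The overlap error, the pan direction and the LUT lookup can be sketched as follows. The LUT entries are placeholder values invented for illustration; they are not the contents of the paper's Table 1. Averaging R_mat(X) − P_mat(X) over all matched points, rather than using a single point, is an assumed robustness choice. Which sign corresponds to a left or right pan depends on the camera layout.

```python
# Hypothetical LUT: upper bound of |xi| (percentage points) -> motor run time (ms).
# Placeholder values for illustration only; they are not the paper's Table 1.
PAN_LUT = [(2.0, 0), (5.0, 100), (10.0, 250), (20.0, 500), (float("inf"), 1000)]


def overlap_error(iop, nop):
    """xi = IOP - NOP: positive when the overlap has shrunk."""
    return iop - nop


def pan_direction(r_x, p_x):
    """Sign (+1, -1 or 0) of the mean of R_mat(X) - P_mat(X) over matched points.
    Which sign means 'left' depends on the camera layout (an assumption here)."""
    diff = sum(r - p for r, p in zip(r_x, p_x)) / len(r_x)
    return (diff > 0) - (diff < 0)


def correction_time_ms(xi):
    """Look up the motor run time that should nullify the overlap error."""
    for upper, run_ms in PAN_LUT:
        if abs(xi) <= upper:
            return run_ms
```

With these placeholder entries, a drop from 37.5% to 30% overlap gives ξ = 7.5, which falls in the 250 ms row.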
External hardware setup: The external hardware setup consists of the stepper motor and a parallel port data cable that connects the parallel port of the computer to the hardware. The camera is mounted on the stepper motor, which drives the panning mechanism. The error signal is sent through the parallel port to the hardware setup, which buffers and amplifies it to drive the stepper motor and correct the camera pan. The signal is sent through pins 2, 3, 4 and 5 of the computer's parallel port. The hardware setup includes a 74HC244 buffer, which receives the data from the computer at its pins 2, 4, 6 and 8 and outputs the buffered data at pins 13, 15, 17 and 19. The buffered data is then fed as input to the SLA4061, which amplifies the signal, and the amplified signal controls the stepper motor. The hardware setup is shown in Fig. 7.
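The data that drives the stepper motor through pins 2 to 5 (data bits D0 to D3) can be sketched as a sequence of bytes. This is a hypothetical illustration. It assumes a two-phase-on full-step pattern, a particular coil-to-bit wiring and a 10 ms step period, none of which are specified above. The actual write to the parallel port data register is platform specific and is not shown.

```python
# Two-phase-on full-step pattern written to data bits D0-D3 (parallel port pins 2-5).
# Which coil sits on which bit is an assumption about the wiring.
FULL_STEP = [0b0011, 0b0110, 0b1100, 0b1001]


def step_bytes(direction, run_ms, step_period_ms=10):
    """Bytes to write to the parallel port data register, one per step period.
    direction: +1 or -1 (reversing the pattern reverses the rotation)."""
    pattern = FULL_STEP if direction > 0 else FULL_STEP[::-1]
    n_steps = run_ms // step_period_ms
    return [pattern[i % len(pattern)] for i in range(n_steps)]
```

Only the low four bits are ever set, so the remaining data lines (pins 6 to 9) stay low. A correction time from the LUT translates directly into a number of steps, for example 250 ms gives 25 steps at the assumed 10 ms period.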

RESULTS
The proposed method was tested in an environment where two cameras (Camera 1 and 2) are employed to acquire a scene. The images obtained from these cameras have a constant overlap percentage in their FOV. The scenes captured by Cameras 1 and 2 are shown in Fig. 8a and b. The overlap percentage between these two images is estimated with the proposed algorithm. For the experiment, Camera 2 was deliberately panned. This disturbance changes the FOV of Camera 2, so its overlap percentage deviates from the initial value. The FOV of the panned Camera 2 is shown in Fig. 8c.

DISCUSSION
Feature points are extracted separately from the Camera 1, Camera 2 and panned Camera 2 images using the SIFT algorithm. The extracted points are matched with one another using the descriptor ratio method followed by correlation-based matching. Matching is done for three image pairs: the reference Camera 1 and Camera 2 images (pair1), the Camera 1 and panned Camera 2 images (pair2), and the reference Camera 2 and panned Camera 2 images (pair3). The final matched points are shown in Fig. 9a-f as white '*' marks with their corresponding red labels. The region of the image where the feature points are matched is considered the overlap region, and the percentage of overlap is computed separately for each image pair. As Camera 2 has panned away from its original position, NOP is less than IOP. The overlap regions between the images are shown in Fig. 10a-f and their values are given in Table 1. The coordinate values of the matched features between the reference Camera 2 image and the panned Camera 2 image are used to detect the direction of pan. The reduction in the percentage of overlap is calculated and used to generate the electrical signal that controls the stepper motor. In Fig. 11, the region of overlap that was lost is indicated in red and the new overlap region in blue. The area common to the reference Camera 2 image and the panned image is marked by white vertical lines. The pan correction is then estimated from the overlap percentage, and finally the electrical signal is transmitted to the external hardware setup to correct the camera pan.
The proposed algorithm was also tested on video data captured simultaneously by two web cameras in the Digital Signal Processing Lab of the Electronics and Communication Department. The acquired video was converted to AVI format for simulation. The video resolution is 320×240 and the frame rate is 15 frames/s. Figure 12a and b show the first frames captured by Cameras 1 and 2. The initial overlapping region in the camera views is determined by the proposed algorithm, as shown in Fig. 12e and f. This overlapping region in the initial frame of a video can be used to label an object correctly across the views of multiple cameras for video tracking.

CONCLUSION
The proposed work aims at automatically correcting camera pan by estimating the percentage of overlap between adjacent camera images. The algorithm implemented in this study has the advantage of identifying robust and distinctive features that are invariant to translation, rotation and scaling. The descriptor ratio method used for feature matching reduces the computational complexity, and differences in camera zoom are handled by correlation over the Gaussian scale space. The overlap error is used to correct the orientation of the panned camera. Extensive simulations were carried out on a variety of images, and the proposed algorithm was found to perform accurately.