Radiometric Invariant Dense Disparity Estimation for Real Time Stereo Correspondence

Computer stereo vision tries to mimic human vision by capturing multiple views of the same scene and interpreting them. Stereo correspondence finds the matching pixels between the two views under the Lambertian assumption, and the offset between matched pixels is the disparity. The distance of objects from the camera can be calculated from this disparity. In real-world scenarios, however, the Lambertian assumption does not always hold because of radiometric variations between the image pair, and conventional approaches then produce erroneous disparity. In this work, the simple local binary pattern is used to make the stereo matching radiometric invariant. The correspondence is computed with the semi-global block matching method, which handles the depth changes of curved and slanted surfaces by adding suitable penalty terms. The performance evaluation of the proposed method shows a low error rate in the range of 0.14%-0.3883% and a run time of only 0.20 milliseconds. This radiometric invariant stereo correspondence attains accuracy comparable to that of global methods at the run-time speed of local methods and is suitable for most real-time stereo vision applications.


Introduction
In recent years, binocular stereo vision has been playing a major role in computer vision applications such as robotic vision, medical imaging, autonomous vehicles and augmented reality. Stereo vision tries to achieve the abilities of human vision by electronically capturing and interpreting images (Szeliski, 2010). By measuring the difference in relative position between matching pixels, stereo correspondence can infer the depth of objects from two views of a scene. The matching is based on a cost function that measures the similarity between conjugate pairs. Aggregating this cost results in the Disparity Space Image (DSI). Using the DSI, the depth of objects from the camera can be calculated by triangulation. Correspondence methods are either local or global. Real-time computer vision applications often face a trade-off between speed and accuracy, so they usually rely on local, area-based correspondence.
The crux of stereo correspondence lies in finding the similarity between two images under challenging conditions such as high dynamic range imaging (Park et al., 2017), radiometric variations, occlusions, textureless regions and repetitive patterns. Radiometric variations can arise from changes in illumination conditions or in camera exposure time. In such real-world scenarios, traditional approaches to stereo correspondence give increased error. The common non-parametric approaches for tackling these radiometric variations are the Census and rank transforms (Ramin and Woodfill, 1994), followed by a window-based cost aggregation. Most of the approaches suggested by stereo vision researchers to handle these challenges come at high computational expense. In single-view still image applications, a number of binary operators are used to handle the non-uniform texture regions that arise from sudden illumination changes. Of these, the Local Binary Pattern (LBP) operator is particularly effective at transforming images captured under radiometric variations into illumination invariant ones (Ojala et al., 2001). This work uses the simple and accurate LBP method for the radiometric invariant conversion of stereo images.
The cost aggregation strategies of local stereo correspondence are faster than those of global approaches. Hence, local approaches are used for real-time applications where the speed of the algorithm is the main concern. Common cost functions used for the correspondence search are the Sum of Absolute Differences (SAD) (Hamzah and Hamzah, 2016), the Sum of Squared Differences (SSD) (Marghany et al., 2011) and Normalized Cross Correlation (NCC) (Shen, 2011), or their adaptive versions (Yoon and Kweon, 2006), on both gray scale and RGB images. The correlation window approach, also known as the area-based approach, assigns the same disparity value to all the pixels within a window. This approach rests on the planar surface assumption, i.e., all the pixels within a window are assumed to be at the same distance from the camera. Such methods can provide a dense disparity map but fail to handle small depth changes such as those on slanted and curved surfaces. Real-world scenes often contain curved and slanted surfaces, such as the edge of a bottle or the roof of a building, so the planar surface assumption used by the local correspondence search does not always hold.
Semi-Global Matching (SGM) combines the advantages of global and local correspondence (Hirschmuller, 2005): it uses pixel-wise matching as in global methods and provides fast results as in local methods. Compared to global methods, semi-global matching gives faster results that can meet real-time requirements, and it also outperforms local methods in accuracy. By incorporating the window-based approach into SGM, the speed of the algorithm can be increased further. This work uses this faster window-based variant of semi-global matching, termed Semi-Global Block Matching (SGBM). The algorithm can attain the accuracy of global methods without sacrificing the run-time speed of local methods. To handle the small and large depth discontinuities of curved and slanted surfaces, smaller and larger penalty terms are used to obtain accurate depth results. In this system, LBP transforms the input stereo pair to a radiometric invariant form, and SGBM handles curved and slanted surfaces and provides an accurate dense disparity map at a higher frame rate. Thus a new LBP-SGBM method is presented here for accurate radiometric invariant stereo correspondence.
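As a minimal illustration of the window-based cost functions named in this section, the following sketch computes SAD, SSD and NCC on two flattened windows. The function names and the sample values are illustrative assumptions, not part of the proposed method. The second window is the first one with doubled intensities, which mimics a simple exposure (gain) change.

```python
import math


def sad(a, b):
    """Sum of Absolute Differences between two flattened windows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a, b):
    """Sum of Squared Differences between two flattened windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ncc(a, b):
    """Normalized Cross Correlation (1.0 means a perfect match up to gain)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


# Hypothetical window and the same window seen with doubled exposure.
left_window = [10, 20, 30, 40]
right_window = [2 * v for v in left_window]

print(sad(left_window, right_window))   # large although the content matches
print(ssd(left_window, right_window))
print(ncc(left_window, right_window))   # unaffected by the gain change
```

SAD and SSD report a large cost for windows that show the same surface under different exposure, which is the kind of radiometric sensitivity that motivates an illumination invariant pre-transform.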
This work aims to formulate a stereo vision system for autonomous robots that is suitable for various real-time indoor and outdoor applications. This passive range finding system can handle curved surfaces such as a bottle and can be used in pick-and-place robots. It can also serve as the sensor for the navigation of autonomous robots in unknown environments.

Related Work
Based on application requirements, a large number of stereo correspondence approaches have been developed in recent years. Most local stereo correspondence methods assume that the pair of input images is taken under the same radiometric conditions. In real-world scenarios this may not hold: there can be radiometric variations due to illumination changes and camera exposure time variations. Under such conditions, conventional stereo correspondence approaches give inaccurate disparity results. To handle these challenges, several pre-processing steps have been proposed. Two non-parametric transforms are proposed for visual correspondence in (Ramin and Woodfill, 1994). Mutual information has been used for radiometric invariant visual correspondence in (Mustafa and Kalkan, 2015) and (Guanying et al., 2018). A contrast invariant local stereo correspondence that employs different spatial frequency channels is described in (Ogale and Aloimonos, 2005). Binary descriptor-based line scan stereo correspondence is presented in (Valentín et al., 2017). ANCC has been proposed for radiometric invariant stereo matching (Heo et al., 2011). A combined census and adaptive window method for handling radiometric changes is presented in (San and War, 2017). Census filtering with Hamming distance based cost aggregation is proposed in (Sarika et al., 2015). An evaluation of stereo matching costs on images with radiometric differences is given in (Hirschmuller and Scharstein, 2009). Stolc et al. (2016) propose a stochastic binary local descriptor for stereo correspondence. LBP is used for vehicle detection in (Neumann et al., 2017).
Real-time applications often rely on dense local correspondence methods, as they can provide decent disparity results at a higher frame rate. However, they fail to handle small depth discontinuities such as those on curved and slanted surfaces. Semi-global methods use penalty terms to deal with these depth discontinuities (Hirschmüller et al., 2012). Khomutenko et al. (2016) used direct fish-eye stereo matching with a semi-global approach. Disparity step changes are handled with penalty terms in (Taimoor and Afanasyev, 2018). The iterative SGM method uses neighbourhood disparity values to obtain the distance maps (Hermann and Klette, 2012). SGM with surface orientation priors is used for stereo matching in (Daniel et al., Scharstein, 2017). Eunah et al. (2017) propose a hierarchical stereo matching approach for low resolution images. Andreas et al. (2010) use triangulation on a set of support points for robust stereo matching. Qingxiong (2012) presents an adaptive non-local cost aggregation approach for stereo matching. Mei et al. (2013) propose tree-based cost aggregation with a segment tree construction algorithm (ST-1) and aggregation with an enhanced segment tree algorithm (ST-2). Shenyong et al. (2017) present a non-local stereo matching method with an initial cost and multiple weights.
Most of these approaches are memory intensive, and their computational complexity depends on the complexity of the scene. The main objective of the proposed system is to develop an accurate, radiometric invariant stereo correspondence that gives precise disparity maps at depth discontinuities such as curved and slanted surfaces without sacrificing the run-time speed of local methods.

Features and Contributions
This is the first stereo matching approach that exploits the advantages of LBP, local correspondence and the global method together. The contributions of this work include:
• The method incorporates the computationally simple yet efficient local binary pattern to obtain radiometric invariant stereo matching. This highly discriminative operator converts the input image to an illumination invariant one.
• To obtain both accuracy and run-time speed, the framework uses semi-global block matching stereo correspondence. Unlike conventional window-based approaches, this SGBM approach is not based on the planar surface assumption. It uses smaller and larger penalty terms to handle the depth discontinuities of curved and slanted surfaces.
• On error evaluation with different input images taken under different radiometric conditions, this fast and accurate stereo matching approach shows a low error rate in the range of 0.14%-0.3883% and a run time of only 0.20 milliseconds.
• The performance analysis shows that the algorithm gives stable and accurate results. It is insensitive to parameters such as the selection of window size, the choice of penalty functions and radiometric changes. The method is suitable for most real-time range finding applications.

Radiometric Invariant Stereo Matching
The proposed radiometric invariant stereo matching consists of a transformation of the input image pair to LBP followed by the SGBM method. Figure 1 shows the system overview.

Local Binary Patterns
The common form of the local binary pattern assigns values to the image pixels within a 3×3 window, based on a threshold level. The operator can be extended to different sizes (Ojala et al., 2002) by using a circular neighbourhood of any radius. The LBP operator can be derived as follows.
Assume a gray scale image $I_G(x, y)$ and let $g_p$ represent the intensity value of the centre pixel at position $(x, y)$. If $g_s$ represents the intensity value of the $s$-th of $S$ sample points at equal radius $r$ from the centre point, then:

$$g_s = I_G(x_s, y_s), \quad s = 0, \ldots, S-1 \tag{1}$$

$$x_s = x + r\cos(2\pi s / S) \tag{2}$$

$$y_s = y - r\sin(2\pi s / S) \tag{3}$$

Let us assume that the texture of the image $I_G(x, y)$ within an area is described by the joint intensity distribution of $S+1$ ($S > 0$) pixels:

$$T = t(g_p, g_0, g_1, \ldots, g_{S-1}) \tag{4}$$

If the differences between the gray level of the centre pixel and those of the neighbouring pixels are taken, the gradient information is retained:

$$T = t(g_p, g_0 - g_p, g_1 - g_p, \ldots, g_{S-1} - g_p) \tag{5}$$

If the centre pixel is assumed to be independent of these differences, Equation (5) becomes:

$$T \approx t(g_p)\, t(g_0 - g_p, g_1 - g_p, \ldots, g_{S-1} - g_p) \tag{6}$$

The term $t(g_p)$ is the intensity distribution at the centre point of $I_G(x, y)$, and it does not contain any information about texture variations. The second factor in Equation (6) carries the texture variations in that area, but extracting these texture details from the image is a difficult task. To address this, vector quantization was used by (Li et al., 2012), in which the feature points are quantized to reduce the dimensionality. With this type of quantization, although the differences $g_s - g_p$ are invariant to changes in the mean gray level of the image, they are not invariant to other gray-level changes. To achieve this invariance, only the signs of the differences are taken:

$$T \approx t\big(f(g_0 - g_p), f(g_1 - g_p), \ldots, f(g_{S-1} - g_p)\big) \tag{7}$$

where $f$ is the thresholding function:

$$f(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases} \tag{8}$$

Taking the weighted sum of the thresholded differences, the modified LBP operator can be written as:

$$\mathrm{LBP}_{S,r}(x, y) = \sum_{s=0}^{S-1} f(g_s - g_p)\, 2^{s} \tag{9}$$

The local binary pattern operator transforms an image into an image of integer labels that describe the small-scale appearance of the image. Figure 2 shows the formation of LBP codes centred on a pixel for a 3×3 fragment of an image. Here the intensity level of the centre pixel is compared with those of its eight neighbours, in clockwise direction from the top-left corner of the block. The encoding is done based on Equation (8). The resulting pattern is shown in Fig. 2b, in which values smaller than the centre pixel are encoded as 0 and values that are higher are encoded as 1. The weight assigned to each pixel, of the form $2^s$, is shown in Fig. 2c. This block-level operation is completed for the entire image using a sliding window.
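The thresholding and weighting steps described above can be sketched in a few lines of Python. This is a minimal illustration for the 3×3 case, not the authors' implementation. The neighbours are visited clockwise from the top-left corner as stated in the text, but which neighbour receives the weight 2^0 is an assumption here, as are the function names and the sample block.

```python
# Neighbour offsets (row, col) inside a 3x3 block, clockwise from top-left.
# Assumption: the top-left neighbour carries weight 2**0.
CLOCKWISE = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(block):
    """Return the 8-bit LBP code of a 3x3 block given as three rows."""
    centre = block[1][1]
    code = 0
    for s, (r, c) in enumerate(CLOCKWISE):
        bit = 1 if block[r][c] >= centre else 0  # sign of the difference
        code += bit << s                         # weight 2**s
    return code


def lbp_image(img):
    """Slide the 3x3 operator over an image; border pixels are left as 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            block = [row[x - 1:x + 2] for row in img[y - 1:y + 2]]
            out[y][x] = lbp_code(block)
    return out


# Hypothetical 3x3 fragment and a brighter, higher-contrast version of it.
block = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
brighter = [[2 * v + 10 for v in row] for row in block]

print(lbp_code(block), lbp_code(brighter))
```

Because only the sign of each difference is kept, any intensity change that preserves the ordering of pixel values (here a gain of 2 and an offset of 10) leaves the code unchanged, which is the property that makes the transformed pair suitable for matching under radiometric variation.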

Semi-Global Block Matching Algorithm
SGBM is the faster, window-based variant of SGM. It takes the mean data from each block as the pixel energy. The cost arrays are formed by arranging, for every pixel, the cost values from the minimum disparity to the maximum disparity. The matching cost is based on (Hernandez-Juarez, 2016; Birchfield and Tomasi, 1999). A search for the minimum path cost is carried out in five directions based on the pixel energy. Penalties are added for disparity differences with the neighbouring pixels along the path. The cost minimum corresponds to the best matching pixels, and the disparity value is computed for this minimum cost.
The algorithm is insensitive to parameters such as the selection of window size, the choice of penalty terms and radiometric changes. The streaking effect that occurs in global methods is eliminated in SGBM by symmetrically computing the matching cost along multiple paths instead of a single scan line, as shown in Fig. 3. The path cost accumulated along direction $s$ for pixel $o$ at disparity $d$ is given by $L_s(o, d)$:

$$L_s(o, d) = C(o, d) + \min\Big(L_s(o - s, d),\; L_s(o - s, d - 1) + P_1,\; L_s(o - s, d + 1) + P_1,\; \min_i L_s(o - s, i) + P_2\Big) - \min_k L_s(o - s, k) \tag{10}$$

The first term, $C(o, d)$, is the cost function for the similarity measure between the left and right input images $I_L$ and $I_R$ (Equation (11)). The difference in intensity level between two neighbouring pixels of the left image is given by Equation (12). The second term in Equation (10) enforces the regularity of the disparity field. Adding the cost of the current pixel, $C(o, d)$, to the cost of the previous pixel along direction $s$ gives the final cost for finding the disparity value. The following cost values are used for this:
• The cost at the previous pixel with disparity values $d - 1$ and $d + 1$ is added together with the smaller penalty term $P_1$ (for handling smaller disparity steps)
• The cost at the previous pixel with disparity values less than $d - 1$ and greater than $d + 1$ is added together with the higher penalty term $P_2$ (for handling larger disparity steps)
• The minimum cost of the previous pixel is subtracted in order to limit the growth of $L_s(o, d)$ along the path
The penalty term $P_1$ is added to deal with small disparity changes such as those on curved and slanted surfaces. The term $P_2$ is added for larger disparity changes such as depth discontinuities at object boundaries. Aggregating the cost values over all five directions gives the final disparity map. The aggregated cost for the pixel $o$ with disparity $d$, summed over the directions $s$, is given as:

$$S(o, d) = \sum_{s} L_s(o, d) \tag{13}$$

The semi-global method aims to minimize the global energy function $E_G$ for the disparity space image $D$:

$$E_G(D) = \sum_{o} \Big( C(o, D_o) + \sum_{q \in N_o} P_1\, T\big[\,|D_o - D_q| = 1\,\big] + \sum_{q \in N_o} P_2\, T\big[\,|D_o - D_q| > 1\,\big] \Big) \tag{14}$$

Here $o$ and $q$ are pixel indices in the image, $N_o$ is the set of pixels in the neighbourhood of the pixel $o$, and $T[\cdot]$ equals 1 when its argument is true and 0 otherwise. $C(o, D_o)$ is the pixel-wise cost over the entire image with disparity $D_o$. $P_1$ is the penalty for small changes in disparity, where the disparity differs from that of the neighbouring pixel by no more than one pixel. $P_2$ is the penalty term for larger disparity changes between neighbouring pixels.
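The path-cost accumulation with the penalties P1 and P2 described above can be illustrated on a single scan line. The sketch below is a simplified illustration rather than the SGBM implementation used in this work: it aggregates along only two opposite directions instead of five, and the cost table, penalty values and function names are assumptions chosen for the example.

```python
def aggregate_path(cost, p1, p2):
    """Accumulate path costs along one direction.

    cost[i][d] is the matching cost of the i-th pixel on the path at
    disparity d; the result has the same shape.
    """
    n_disp = len(cost[0])
    prev = list(cost[0])              # first pixel has no predecessor
    out = [prev]
    for c in cost[1:]:
        prev_min = min(prev)
        cur = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]      # same d, or big jump
            if d > 0:
                candidates.append(prev[d - 1] + p1)    # step of one level
            if d < n_disp - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the values from growing along the path.
            cur.append(c[d] + min(candidates) - prev_min)
        out.append(cur)
        prev = cur
    return out


def disparity_from_paths(paths):
    """Sum the path costs of all directions and pick the minimum per pixel."""
    n_pix, n_disp = len(paths[0]), len(paths[0][0])
    result = []
    for i in range(n_pix):
        total = [sum(path[i][d] for path in paths) for d in range(n_disp)]
        result.append(total.index(min(total)))
    return result


# Hypothetical costs for 3 pixels and 3 disparity levels (a slanted surface).
cost = [[0, 5, 5],
        [5, 0, 5],
        [5, 4, 0]]
forward = aggregate_path(cost, p1=1, p2=4)
backward = aggregate_path(cost[::-1], p1=1, p2=4)[::-1]
print(disparity_from_paths([forward, backward]))
```

In this toy example the small penalty makes a one-level disparity step between neighbours cheap, so the recovered disparities follow the gradual change of a slanted surface instead of being forced to a single value as under the planar surface assumption.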
In block-based stereo matching, a window of the desired size centred at a pixel of interest is taken from the reference image. The similarity check is done with this window to find its conjugate pair in the target image. Small correlation windows give faster results and preserve sharp object boundaries, but they lack accuracy. On the other hand

Parameters used for SGBM Implementation
(a) minDisparity - The minimum value assigned to the disparity levels. Normally it is taken as zero. If any pixel shift occurs after rectification, a minimum disparity is added to the obtained disparity to compensate for this shift
(b) numDisparity - The number of disparity levels in the obtained disparity map. It is the difference between the maximum and minimum disparity values
(c) Block Size - The size of the window used for stereo matching. It should be an odd value: 3, 5, 7, …, 15
(d) P1 - The small penalty term used to handle small depth discontinuities such as slanted and curved surfaces
(e) P2 - The higher penalty term used to handle larger depth discontinuities such as object boundaries. The value of P2 is always larger than the value of P1
(f) disp12MaxDiff - The maximum permissible difference in the left-right disparity check. To disable the check, this value is set to a non-positive number
(g) preFilterCap - The clipping level of the prefiltered image. The algorithm first takes the x-derivative at each pixel and clips its value to the interval [-preFilterCap, +preFilterCap]. The cost computation is done on these filtered values
(h) uniquenessRatio - The percentage margin by which the cost minimum must beat the next best cost. The cost minimum is selected with a "winner-takes-all" optimisation strategy for a block of the preferred size. The margin is normally set between 5 and 15
(i) speckleWindowSize - The maximum size of smooth disparity regions to be treated as speckle noise. To disable speckle filtering, set it to 0. Otherwise, set the size in the range 50-200
(j) speckleRange - The maximum variation of disparity levels within each connected component
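The parameters listed above map onto the keyword arguments of OpenCV's SGBM matcher. The sketch below is a hedged example, not the configuration used in this work: the concrete values are illustrative assumptions, the helper names are hypothetical, and it assumes a recent OpenCV Python build in which the matcher is created with `cv2.StereoSGBM_create` and the parameters are spelled `numDisparities` and `blockSize`. The penalty values follow the rule of thumb given in the OpenCV documentation (8 and 32 times the number of channels times the squared block size).

```python
def sgbm_params(block_size=5, num_disparities=64, channels=1):
    """Build keyword arguments for cv2.StereoSGBM_create (illustrative values)."""
    if block_size % 2 == 0 or not 3 <= block_size <= 15:
        raise ValueError("block size should be odd and between 3 and 15")
    if num_disparities <= 0 or num_disparities % 16 != 0:
        raise ValueError("numDisparities must be a positive multiple of 16")
    return dict(
        minDisparity=0,
        numDisparities=num_disparities,
        blockSize=block_size,
        P1=8 * channels * block_size ** 2,    # small disparity steps
        P2=32 * channels * block_size ** 2,   # large disparity steps, P2 > P1
        disp12MaxDiff=1,
        preFilterCap=63,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )


def compute_disparity(left_gray, right_gray, params):
    """Run SGBM on a rectified pair of 8-bit images (requires OpenCV)."""
    import cv2  # imported here so the parameter helper works without OpenCV

    matcher = cv2.StereoSGBM_create(**params)
    # compute() returns fixed-point disparities scaled by 16.
    return matcher.compute(left_gray, right_gray).astype("float32") / 16.0


params = sgbm_params(block_size=5, num_disparities=64)
print(params["P1"], params["P2"])
```

In the proposed pipeline the two inputs to `compute_disparity` would be the LBP-transformed left and right images rather than the raw intensities.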

Result and Analysis
The algorithm is implemented and tested in OpenCV 2.1.12 on an Intel Core i3 processor with a 2.10 GHz clock frequency and 4 GB of RAM. Rectified stereo images from the Middlebury data sets (http://vision.middlebury.edu/stereo/data/) are used as input. The algorithm has been tested on different stereo input image pairs, with window sizes varying from 3, 5, 7 up to 15.

Performance Evaluation
The performance of the proposed method has been evaluated based on its accuracy, the amount of bad matching pixels and the execution time. The analysis is done against the ground truth disparity map. The Root Mean Square Error (RMSE) in the obtained disparity can be calculated using the ground truth disparity map (Scharstein and Szeliski, 2003) based on Equation (15):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(x, y)} \big| d_C(x, y) - d_T(x, y) \big|^2} \tag{15}$$

Here $d_C$ is the computed disparity map, $d_T$ is the ground truth disparity map and $N$ is the number of pixels in the disparity maps. The fraction of bad matching pixels in the computed disparity result can be calculated using Equation (16):

$$B = \frac{1}{N} \sum_{(x, y)} \big( | d_C(x, y) - d_T(x, y) | > \delta_d \big) \tag{16}$$

where $\delta_d$ is the threshold level for bad matching pixels. Here it is taken as one.
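The two quality measures described above, the RMS error and the fraction of pixels whose disparity error exceeds the threshold, can be computed as in the following sketch. The function names and the small disparity maps are illustrative assumptions; occluded or unknown ground truth pixels, which a full evaluation would normally mask out, are not handled here.

```python
import math


def rms_error(computed, truth):
    """Root mean square difference between two disparity maps (lists of rows)."""
    sq = [(c - t) ** 2
          for row_c, row_t in zip(computed, truth)
          for c, t in zip(row_c, row_t)]
    return math.sqrt(sum(sq) / len(sq))


def bad_pixel_rate(computed, truth, delta=1.0):
    """Fraction of pixels whose absolute disparity error exceeds delta."""
    bad = [abs(c - t) > delta
           for row_c, row_t in zip(computed, truth)
           for c, t in zip(row_c, row_t)]
    return sum(bad) / len(bad)


# Hypothetical 2x2 disparity maps.
computed = [[10, 12],
            [20, 25]]
truth = [[10, 11],
         [20, 21]]

print(rms_error(computed, truth))
print(bad_pixel_rate(computed, truth, delta=1.0))
```

With a threshold of one, the pixel that is off by exactly one level is not counted as bad, while the pixel that is off by four levels is; multiplying the rate by 100 gives a percentage.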
The proposed method is evaluated on different stereo image pairs taken under varying illumination conditions and exposure times. Figure 4a-d shows the input image, the ground truth disparity, the LBP image and the disparity map obtained with LBP-SGBM. Figure 5 shows the results of LBP-SGBM stereo correspondence on the image pair Art with varying (left illumination/right illumination). The illumination combinations used for the correspondence search are (1/1), (1/2), (1/3), (2/2), (2/3) and (3/3). The error evaluation shows a minimum error of 0.175167% for the image Art taken under illumination 1 and a maximum error of 0.3083% for the image Reindeer taken under illumination 3. Table 1 shows the percentage of bad matching pixels. Figure 7 shows the RMS error in the obtained disparity maps of the input images Aloe, Baby, Cones, Bowling, Wood and Tsukuba with varying window sizes. Tables 2 and 3 show the performance evaluation based on RMS error and bad matching pixels for image pairs taken with different exposure times. Here the image Art shows a minimum error of 0.198067% under illumination 2 and the image Reindeer shows a maximum error of 0.3883% under illumination 3. Table 4 shows the amount of bad matching pixels in the image pairs Aloe, Art, Moebius and Dolls with varying (left illumination/right illumination). The illumination combinations used are (1/1), (1/2), (1/3), (2/2), (2/3) and (3/3).
Ramin and Woodfill (1994) used a non-parametric approach for stereo matching. Eunah et al. (2017) propose a hierarchical stereo matching approach in two-scale space for low resolution images. Andreas et al. (2010) use triangulation on a set of support points for robust stereo matching. Qingxiong (2012) presents an adaptive non-local cost aggregation approach on a tree structure for stereo matching. Mei et al. (2013) propose tree-based cost aggregation with a segment tree construction algorithm (ST-1) and aggregation with an enhanced segment tree algorithm (ST-2). Shenyong et al. (2017) present a non-local stereo matching method with an initial cost and multiple weights. The proposed method uses the local binary pattern for radiometric invariance and semi-global block matching for handling curved surfaces and slanted regions at real-time speed. From the table, it can be seen that this approach gives accurate results compared to the other ones.

Conclusion and Future Scope
In this work we combined the run-time efficiency of local stereo correspondence and the accuracy of global correspondence with the radiometric invariance of the local binary pattern. Unlike existing window-based methods that rely on the planar surface assumption, this method can handle curved and slanted surfaces. Smaller and higher penalty terms are added to handle small and large disparity changes. The performance evaluation shows a low error rate in the range of 0.14%-0.3883% and a run time of only 0.20 milliseconds. This work aims to formulate a stereo vision system for autonomous robots that is suitable for various real-time indoor and outdoor applications. This passive range finding system can handle curved surfaces such as a bottle and can be used in pick-and-place robots. It can also serve as the sensor for the navigation of autonomous robots in unknown environments.
The system can also be used as a robotic vision system for precise assembly in the automotive industry. This binocular stereo vision can be extended to multi-view stereo for view synthesis and panoramic vision. Another application of our stereo matching system is in aerial imaging, for finding the range of buildings from an air vehicle.
Most real-time applications in stereo vision demand algorithms with a higher frame rate without compromising accuracy. As robots take over hard and risky tasks in many aspects of modern life, incorporating the latest breakthroughs in neuroscience into computer vision will help to solve open issues in intelligent robot vision. Future work will incorporate deep learning into our system to approach the cognitive capabilities of human vision in dealing with occluded areas and high dynamic range views. Such an intelligent vision system will be of great use in indoor and outdoor applications such as rescue robots and industrial robots.