Metric for the Fusion of Synthetic and Real Imagery from Multimodal Sensors

We described a method to improve night vision for vehicle navigation by combining images from a thermal infrared camera with images from a public database of stored images. Such an approach allows a night scene to appear as if it were daytime in automotive applications, thereby increasing safety. We described a new metric to evaluate image fusion in such an augmented reality system and compared leading fusion algorithms to determine the efficacy of our approach.


Introduction
Darkness and poor visibility at night limit what a vehicle operator can detect, creating unsafe driving conditions. A system that could make low-visibility situations, such as driving at night, appear as if they occurred in daytime would provide a particularly effective aid to vehicle navigation in darkness.
A variety of systems already exist to aid vehicle navigation in low-visibility situations. Thermal imagers have been reported to identify pedestrians or animals at night (Li et al., 2010; Geronimo et al., 2010; George et al., 2012). Warm objects can be displayed on a supplemental screen and even segmented and identified. Augmented systems can also help identify lane markings where visibility is limited. Although these systems are useful, they focus on specific situations; a more general approach that works under a variety of conditions is needed. Infrared or thermal cameras are often used in night vision systems, but they provide limited information about a scene, and combining images from additional sensors can improve results. A range of night vision systems has been developed to improve the ability to see at night (Bishop, 2000; Vu et al., 2012; Bhatnagar et al., 2011). However, problems remain. Thermal infrared cameras do not capture some elements of a scene well, such as trees, leaves and grass in a natural landscape. These elements are also not visible at night to visible-light sensors, so adding further sensors has been one traditional approach to this problem.
In contrast to these previous approaches, we used stored daytime images of a scene to augment the same scene acquired at night in real time. The public database Google Earth is a readily available asset that is continuously updated by multiple sources and contains daytime images of a large number of scenes from around the world. Rather than use additional sensors, our approach combines data acquired from a thermal camera with the stored daytime image from Google Earth, augmenting the real-time scene with the database image of that scene. A similar idea has been proposed previously, but it relied on color mapping rather than image fusion (Qadir et al., 2014). Because the database does not provide real-time information, we used a metric to control the relative priority of the real-time and database data in the fused image. In the next sections, we describe our system and then our metric. We then compare several fusion algorithms on two scenes, each imaged with and without a pedestrian.

System
As stated above, we created a system that uses a stored database of visible daytime images to enhance thermal imagery captured at night. Information from the stored images is used where there is little or no thermal signal. This could improve safety by revealing objects or features that are not visible to the naked eye or to the thermal camera.
The system inputs come from the thermal camera and a GPS sensor, and the output is a visible display, as illustrated in the block diagram of the system in Fig. 1. The camera captures the night scene, and the GPS sensor provides the vehicle coordinates; these constitute two of the three inputs to the computer. Based on the GPS coordinates, a wireless request that also includes camera-angle information is sent to a public database, which returns a visible daytime image of the scene corresponding to the thermal camera image (Salmen et al., 2012). The output of the system is the fusion of the registered thermal and database images. The process is then repeated according to how far the vehicle has moved from the location of the previously requested image.
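The decision to issue a new database request can be sketched as follows. This is a minimal sketch that assumes the trigger is the great-circle (haversine) distance between the current GPS fix and the location of the last requested image; the function names and the 10 m threshold are hypothetical, and camera heading is ignored here even though the actual request also carries camera-angle information.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def needs_new_image(current, last_requested, min_distance_m=10.0):
    """Hypothetical trigger: request a new database image once the vehicle has moved
    at least min_distance_m from where the previous image was requested."""
    if last_requested is None:  # no image has been requested yet
        return True
    return haversine_m(*current, *last_requested) >= min_distance_m


# Example: a fix about 111 m north of the last request triggers a new request.
print(needs_new_image((28.001, -80.6), (28.0, -80.6)))  # True
print(needs_new_image((28.0, -80.6), (28.0, -80.6)))    # False
```

A distance threshold keeps the number of wireless requests bounded while the vehicle is stopped; in practice it would likely be tuned to the spacing of the stored images.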

Metric Selection
Most image fusion metrics were designed for source images of the same scene taken at the same time. Our application requires that one image have priority over the other, because the thermal image contains live data whereas the database image serves only to enhance the thermal imagery. Although image fusion metrics can be adjusted to weight one image relative to the other, we instead introduce a ratio that sets the priority of the thermal image relative to the database image. We show that this ratio is more definitive, in terms of information, than a weighting factor and illustrate the significance of the metric with two examples.
We first computed the mutual information (MI) between each source image and the fused image, where $\mathrm{MI}_X$ and $\mathrm{MI}_Y$ denote the MI of the fused image with the thermal and database images, respectively. Our metric can be written as

$$M = \mathrm{MI}_X + \mathrm{MI}_Y \quad \text{subject to} \quad \frac{\mathrm{MI}_X}{\mathrm{MI}_Y} = \beta \tag{1}$$

where β is a fixed value. The metric is thus the sum of the two MI values, formed while their ratio is held constant. We used a simple fusion algorithm to examine the effect of the ratio β. Here the thermal image is denoted X and the database image Y. The algorithm takes each pixel from one image or the other depending on a threshold, so the fused image F is given, on a pixel-by-pixel basis, by

$$F(i,j) = \begin{cases} X(i,j), & X(i,j) > T \\ Y(i,j), & \text{otherwise} \end{cases} \tag{2}$$

where T is a threshold.
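The metric and the simple threshold fusion rule (Equation 2) can be sketched with NumPy as below. This is a minimal sketch under stated assumptions: MI is estimated from a 64-bin joint grey-level histogram (a common plug-in estimator; the estimator used in this work is not stated), and the names `mutual_information`, `simple_fusion`, `fusion_metric`, `sweep_threshold` and `threshold_for_beta` are hypothetical. The sweep mirrors the procedure of varying T until a desired β is reached.

```python
import numpy as np


def mutual_information(a, b, bins=64):
    """Plug-in MI estimate (bits) from the joint grey-level histogram of two 8-bit images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()               # joint probability
    px = pxy.sum(axis=1, keepdims=True)     # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)     # marginal of b, shape (1, bins)
    nz = pxy > 0                            # empty cells contribute nothing
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])))


def simple_fusion(x, y, t):
    """Equation 2: thermal pixel where X > T, database pixel elsewhere."""
    return np.where(x > t, x, y)


def fusion_metric(x, y, f, bins=64):
    """Return (M, beta, MI_X, MI_Y) with MI_X = MI(X, F), MI_Y = MI(Y, F), M = MI_X + MI_Y."""
    mi_x = mutual_information(x, f, bins)
    mi_y = mutual_information(y, f, bins)
    beta = mi_x / mi_y if mi_y > 0 else float("inf")
    return mi_x + mi_y, beta, mi_x, mi_y


def sweep_threshold(x, y, thresholds, bins=64):
    """Fuse with each threshold T and record (T, M, beta) to see how T moves beta."""
    return [(t, *fusion_metric(x, y, simple_fusion(x, y, t), bins)[:2]) for t in thresholds]


def threshold_for_beta(rows, target_beta):
    """Pick the swept T whose beta is closest to the desired ratio (e.g. 0.2, 0.9 or 1.5)."""
    return min(rows, key=lambda row: abs(row[2] - target_beta))[0]


# Demo on random stand-in images (real use: registered thermal X and database Y).
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(128, 128))
y = rng.integers(0, 256, size=(128, 128))
rows = sweep_threshold(x, y, range(-1, 256, 8))
for t, m, beta in rows[::8]:
    print(f"T={t:4d}  M={m:.3f}  beta={beta:.3f}")
print("T for beta~0.9:", threshold_for_beta(rows, 0.9))
```

At the extremes the sketch behaves as the text implies: a very low T passes the whole thermal image through (β well above 1), while a T above every thermal value yields the database image (β well below 1).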

Results
We used two scenes and acquired thermal imagery of each with and then without a pedestrian present. The database images were obtained from the Google Street View database. We compared three fusion algorithms in addition to the simple algorithm described in Equation 2: methods based on a Laplacian pyramid (LAP), the shift-invariant discrete wavelet transform (SIDWT) and principal component analysis (PCA). For these methods, the two images were first combined into an intermediate result R using the fusion algorithm. The final result was then given, pixel by pixel, by

$$F(i,j) = \begin{cases} X(i,j), & X(i,j) > T_2 \\ R(i,j), & T_1 \le X(i,j) \le T_2 \\ Y(i,j), & X(i,j) < T_1 \end{cases} \tag{3}$$

where $T_1$ and $T_2$ are thresholds. The idea is that the data are combined where the data from both sensors are similar; the fused result R is therefore applied only where the input pixel values lie between the two thresholds. The thermal and database images are shown in Fig. 2. We examined results for β ~0.2, 0.9 and 1.5, corresponding respectively to the database image having priority, the two source images having about equal priority and the thermal image having priority. To obtain each value of β, the threshold T was adjusted until the desired value was reached.
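The two-threshold combination with an intermediate fused image R can be sketched as follows, using PCA fusion to produce R. Both pieces are assumptions: the PCA weighting shown (normalized principal eigenvector of the 2×2 pixel covariance) is a common formulation, not necessarily the one used in this work, and applying $T_1$ and $T_2$ to the thermal image X is one reading of Equation 3. The function names and the thresholds 80 and 180 are hypothetical.

```python
import numpy as np


def pca_fusion(x, y):
    """Common PCA fusion: weight each image by the normalised principal eigenvector
    of the 2x2 covariance of their pixel values."""
    data = np.stack([x.ravel(), y.ravel()]).astype(float)
    cov = np.cov(data)                      # 2x2 covariance matrix
    _, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    v = np.abs(vecs[:, -1])                 # principal component; abs removes sign ambiguity
    w = v / v.sum()                         # weights sum to 1
    return w[0] * x.astype(float) + w[1] * y.astype(float)


def combine_eq3(x, y, r, t1, t2):
    """Assumed reading of Equation 3: thermal pixel above T2, database pixel below T1,
    intermediate fused result R where T1 <= X <= T2."""
    return np.where(x > t2, x, np.where(x < t1, y, r))


# Demo on random stand-in images with hypothetical thresholds T1 = 80, T2 = 180.
rng = np.random.default_rng(1)
x = rng.integers(0, 256, size=(4, 4))
y = rng.integers(0, 256, size=(4, 4))
r = pca_fusion(x, y)
print(combine_eq3(x, y, r, t1=80, t2=180))
```

Any of the LAP, SIDWT or PCA results could be passed as `r`; only the intermediate fusion changes, while the thresholding step stays the same.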

Database Image Dominant
A value of β ~0.2 means that the MI between the fused image and the database image is five times greater than the MI between the thermal and fused images. For this value of β, the fused image will most likely be dominated by the visible image except in relatively warm areas of the thermal image. Figures 3 and 4 show the results of the four algorithms for the two scenes. Tables 1 and 3 list, for these images, the total MI and its two terms for the thermal and database images, written as $\mathrm{MI}_X$ and $\mathrm{MI}_Y$, respectively.
Figures 3 and 4 show similar results: the pedestrian is clearly visible, and most of the remaining image comes from the database image. Each MI term, as well as the total, was highest for the simple algorithm. Tables 2 and 4 show the MI values for the same scenes without the pedestrian.
The results are similar to those obtained with the pedestrian present. However, the MI between the thermal and fused images is generally lower in the scenes without the pedestrian, which is expected because the pedestrian comes entirely from the thermal image.

Images with Similar Priority
A value of β ~0.9 means that the MI between the fused image and the database image is similar to the MI between the thermal and fused images. For this ratio, the visible and thermal images contribute a similar amount of information to the fused image. Figures 5 and 6 show the results of the four algorithms for the two scenes. Tables 5 and 7 show the MI values for the two scenes with the pedestrian, and Tables 6 and 8 show the results without the pedestrian. In Fig. 5, the simple and PCA results look similar to each other but differ from the LAP and SIDWT results, which also look similar to each other. The simple and PCA results visually contain more data from the thermal image.
The MI values were also higher for these two algorithms than for the other two, and the simple algorithm gave higher MI values than PCA. The scene in Fig. 6 gave MI results similar to those of Fig. 5, but the PCA result appeared to draw more from the visible image in areas of vegetation and less in the road region. When the pedestrian was not present, the MI of the simple algorithm remained similar, but the MI values of the PCA result dropped significantly.

Thermal Image Dominant
A value of β ~1.5 indicates that the MI between the fused image and the thermal image is 1.5 times greater than the MI between the database and fused images. For this value of β, the thermal image should contribute significantly to the final image. Figures 7 and 8 show the results of the four algorithms for the two scenes. Tables 9 and 11 show the MI values for the images containing a pedestrian, and Tables 10 and 12 show the results without the pedestrian.
Figures 7 and 8 show results similar to those obtained when the images had similar priority: the simple and PCA results appear similar to each other, as do the LAP and SIDWT results. The MI values were again higher for the simple and PCA algorithms than for the other two, and, as in the previous case, the simple algorithm gave higher MI values than PCA.

Conclusion
We described an image fusion system that can improve safety for nighttime vehicle operation. The use of a public database gives results not possible with other approaches. We introduced a new metric for prioritizing images that is suited to our approach. We found that using the maximum of the sum of the MI values while maintaining a constant ratio allowed us to compare fusion algorithms. When the database image was given priority over the thermal image, the fused image appeared the most daytime-like and clearly showed a pedestrian. As the priority of the thermal image increased, the background of the fused image was formed from both the thermal and database images. A simple background-replacement algorithm gave the best results in our experiments, suggesting that one image or the other dominated each region. Developing improved fusion algorithms to combine such images properly could further improve the results for nighttime navigation.

Author's Contributions
R. Nahas: Participated in collecting and analyzing data and contributed to the writing of the manuscript.
S. Kozaitis: Organized the study, participated in analyzing data and contributed to the writing of the manuscript.

Ethics
Parts of this work are from R. Nahas's Ph.D. thesis and have not been published elsewhere.