Enhancement of GPS Position Accuracy Using Machine Vision and Deep Learning Techniques

Email: ashwani.ist@sliet.ac.in

Abstract: The accuracy of GPS position estimation in urban areas is a problem that needs to be addressed, and machine vision and deep learning techniques offer a way to do so. GPS accuracy is better in the horizontal direction than in the vertical direction. Although horizontal positioning accuracy is vital for most navigation applications in intelligent transportation systems, vertical positioning accuracy gives an indication of road slope. Several statistical methods, such as median filtering, homomorphic filtering and k-means clustering, can be used to improve the position accuracy of GPS signals. Such methods are useful for offline applications in which many GPS measurements are taken at a single point and filtering is afterwards applied to the batch of measurements. In this study, GPS positioning errors caused by sensor noise, ionospheric effects, occlusion by building facades, etc., are considered for online improvement of position estimation using computer vision and deep learning methods with empirically chosen hyper-parameters.


Introduction
GPS is a very common method for estimating the position of a vehicle in outdoor environments. GPS is a constellation of satellites that transmit radio waves carrying positioning data. GPS receivers decode the information embedded in the received radio waves to calculate the position of the receiver with respect to the center of the earth. This position is then converted into a more useful coordinate system. The most commonly used reference coordinate system in GPS positioning is WGS84, an earth-centered, earth-fixed terrestrial reference system (Kumar et al., 2013). The most commonly used coordinates for a GPS position are latitude and longitude. The latitude of a point on earth varies from -90° to +90°: negative values lie south of the equator, positive values lie north of it, and the equator itself has a latitude of 0°. The longitude varies from -180° to +180°: negative values lie west of the prime meridian, positive values lie east of it, and the prime meridian has a longitude of 0°. The latitude and longitude give the horizontal position of the GPS receiver. Altitude is also obtained from a GPS receiver that receives signals from a sufficient number of satellites at a given position. The accuracy of horizontal and vertical positioning depends on the signals received from the satellites orbiting the earth. A larger number of satellites does not always mean better positioning accuracy, because the accuracy also depends on the geometry of the satellites as seen from the point of observation. A few parameters describe this geometry. One that is important for position accuracy is the dilution of precision: the lower the dilution of precision, the better the GPS position accuracy.
Several GPS refinement algorithms use dilution of precision information while refining the GPS accuracy at a given point. Figure 1 shows the latitude and longitude coordinate system used for GPS positioning. Positioning in outdoor environments does not rely on GPS satellites alone: other constellations such as GLONASS from Russia, Galileo from the European Union and BeiDou from China are also used, and together these are commonly known as Global Navigation Satellite Systems (GNSS).
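The relationship between the earth-centered position and the latitude, longitude and altitude described above can be illustrated with a minimal sketch. It uses the standard WGS84 ellipsoid constants and the usual closed-form conversion from geodetic coordinates to earth-centered, earth-fixed (ECEF) coordinates; the function name `geodetic_to_ecef` is chosen here for illustration only.

```python
import math

# WGS84 ellipsoid constants (standard published values).
WGS84_A = 6378137.0                    # semi-major axis in metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert WGS84 latitude and longitude (degrees) and ellipsoidal
    height (metres) to ECEF x, y, z coordinates (metres)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z
```

A point on the equator at the prime meridian with zero height maps to a point on the x axis at a distance equal to the semi-major axis, which is a convenient sanity check.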
GPS consists of three segments, known as the space segment, the control segment and the user segment. Figure 2 shows the three segments of a GPS system. The space segment comprises the set of satellites that transmit radio waves. The control segment is ground based and communicates with the space segment to enhance position accuracy and to monitor the status of the satellites. The user segment is the GPS receiver, which receives GPS signals and calculates its position using trilateration.
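The idea of trilateration can be sketched in a simplified two-dimensional setting. The example below assumes known transmitter positions and error-free ranges, linearizes the range equations by subtracting the first one from the others, and solves the resulting least-squares problem. A real GPS receiver works in three dimensions and also solves for its own clock bias, which is why at least four satellites are needed; the function name `trilaterate_2d` is hypothetical.

```python
def trilaterate_2d(anchors, ranges):
    """Estimate (x, y) from three or more known anchor positions and the
    measured distances to them (simplified 2D illustration)."""
    if len(anchors) < 3 or len(anchors) != len(ranges):
        raise ValueError("need at least three anchors with one range each")
    (x0, y0), r0 = anchors[0], ranges[0]
    # Accumulate the 2x2 normal equations of the linearized system
    #   2(xi - x0) x + 2(yi - y0) y = r0^2 - ri^2 + xi^2 - x0^2 + yi^2 - y0^2
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for (xi, yi), ri in zip(anchors[1:], ranges[1:]):
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = r0 ** 2 - ri ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ac += a * c
        s_bc += b * c
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    # Solve the 2x2 system with Cramer's rule.
    x = (s_ac * s_bb - s_ab * s_bc) / det
    y = (s_aa * s_bc - s_ab * s_ac) / det
    return x, y
```

The collinearity check reflects the remark in the introduction that satellite geometry matters: when the transmitters are badly arranged, the system becomes ill-conditioned regardless of how many there are.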
GPS position estimates are subject to errors arising from a number of factors, among them atmospheric disturbances, satellite and receiver clock mismatch, sensor noise, multipath errors and non-line-of-sight errors (Aggarwal, 2015a). Equation (1) gives the delay due to the propagation of radio waves through the ionosphere layer of the atmosphere. The ionospheric delay is estimated and the necessary corrections are applied to the obtained GPS position estimates to improve the positioning accuracy. In Equation (1), v is the delay due to propagation of radio waves in the ionosphere, c is the speed of light, f is the frequency of the GPS signal and Ne is the number of free electrons per m². The GPS position estimate also becomes erroneous because of the mismatch between the GPS satellite clocks and the GPS receiver clock. The clocks used in GPS satellites are atomic clocks, which are highly accurate. The clocks used in GPS receivers, however, are less accurate, which leads to position errors because the position is calculated from the time of travel of radio waves from the satellites to the receiver. The clock error is given in Equation (2), where t is the satellite clock offset, t0 is calculated from the navigation messages received by the GPS receiver and ϵ is the clock correction term applied to compensate for errors due to orbital eccentricity. Researchers have identified many other sources of error in position estimation and have applied several techniques to remove them. Some of these techniques are based on computer vision, while others are based on deep learning (Song et al., 2011).
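To give a sense of the magnitude of the ionospheric error, the sketch below uses the widely quoted first-order approximation in which the extra path length is 40.3 · TEC / f² metres, with TEC the total electron content along the signal path in electrons per m². This is offered as an assumption for illustration and may differ in form from Equation (1); the function names are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0   # metres per second
GPS_L1_HZ = 1575.42e6            # GPS L1 carrier frequency
GPS_L2_HZ = 1227.60e6            # GPS L2 carrier frequency


def ionospheric_delay_m(tec_el_per_m2, freq_hz=GPS_L1_HZ):
    """First-order ionospheric range error in metres (assumed model:
    40.3 * TEC / f^2)."""
    return 40.3 * tec_el_per_m2 / freq_hz ** 2


def ionospheric_delay_s(tec_el_per_m2, freq_hz=GPS_L1_HZ):
    """The same error expressed as a time delay in seconds."""
    return ionospheric_delay_m(tec_el_per_m2, freq_hz) / SPEED_OF_LIGHT
```

Because the delay scales with 1/f², it is larger on L2 than on L1, which is the property dual-frequency receivers exploit to estimate and remove it.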
Computer vision-based methods make use of images captured in the environment where the GPS receiver is used to estimate its position. Feature points are extracted from the captured images and used to combine the image sequence into a 3D model of the environment. The 3D model is used to identify non-line-of-sight satellite signals. Such signals are removed from the position calculation, which improves the position accuracy. A 3D camera is sometimes used to obtain a point cloud, which is then used with surface models to identify signals reflected from surfaces. There are several challenges in applying computer vision-based methods to improve the accuracy of position estimates, including occlusion, non-uniform illumination and errors in the estimation of camera intrinsic parameters.
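One simple way to picture the exclusion of non-line-of-sight signals is a skyline test, sketched below under stated assumptions. It supposes that the 3D model or camera images have already been reduced to a skyline, that is, the elevation angle of the highest obstruction in each azimuth bin around the receiver; a satellite whose elevation lies below the skyline at its azimuth is treated as blocked. The data layout and the function name `classify_satellites` are hypothetical and are not taken from the cited works.

```python
def classify_satellites(satellites, skyline_deg, bin_width_deg=10.0):
    """Split satellites into line-of-sight and non-line-of-sight lists.

    satellites  : dict mapping satellite id -> (azimuth_deg, elevation_deg)
    skyline_deg : elevation of the highest obstruction in each azimuth bin,
                  starting at azimuth 0 and covering the full 360 degrees
    """
    n_bins = len(skyline_deg)
    if abs(n_bins * bin_width_deg - 360.0) > 1e-9:
        raise ValueError("skyline bins must cover exactly 360 degrees")
    los, nlos = [], []
    for sat_id, (azimuth, elevation) in satellites.items():
        # Find the azimuth bin in which the satellite appears.
        idx = int((azimuth % 360.0) // bin_width_deg) % n_bins
        if elevation > skyline_deg[idx]:
            los.append(sat_id)
        else:
            nlos.append(sat_id)
    return los, nlos
```

Only the satellites in the first list would then be passed to the position calculation.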
Deep learning methods have been used for GPS position accuracy enhancement in recent years. These methods work by collecting large amounts of image data to train a network for use in improving GPS accuracy. The convolutional neural network architecture is particularly useful for enhancing the position accuracy of GPS estimates. When sufficient data is lacking, transfer learning is sometimes employed so that deep learning methods work better. A convolutional network consists of several hidden layers, as shown in Fig. 3. The network has several parameters and hyper-parameters, some of which are chosen empirically so that the trained network performs optimally, giving predicted values close to the ground truth with the least computational time. The computational time is reduced by using several 1×1 filter masks. A combination of various filter sizes and various types of activation functions is used throughout the deep neural network to obtain the desired performance from the network.
The convolution operator is used at various layers so that features ranging from low complexity, such as straight lines, to higher complexity, such as triangles and hexagons, can be learned during the training phase of the deep neural network.
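The saving obtained from 1×1 filters can be illustrated by counting multiply-accumulate operations. The sketch below compares a direct k×k convolution with the same convolution preceded by a 1×1 layer that first reduces the number of channels. The layer sizes in the usage note are illustrative assumptions and are not taken from the network in Fig. 3.

```python
def conv_macs(height, width, c_in, c_out, k):
    """Multiply-accumulate count for a k x k convolution that produces a
    height x width output map (stride 1, same padding, bias ignored)."""
    return height * width * c_in * c_out * k * k


def direct_vs_bottleneck(height, width, c_in, c_out, k, c_mid):
    """Return (direct, bottleneck) operation counts, where the bottleneck
    variant applies a 1x1 convolution down to c_mid channels first."""
    direct = conv_macs(height, width, c_in, c_out, k)
    bottleneck = (conv_macs(height, width, c_in, c_mid, 1)
                  + conv_macs(height, width, c_mid, c_out, k))
    return direct, bottleneck
```

For a 28×28 feature map with 192 input channels, 32 output channels, 5×5 filters and a 16-channel 1×1 reduction, `direct_vs_bottleneck(28, 28, 192, 32, 5, 16)` gives about 120.4 million operations for the direct convolution against about 12.4 million with the 1×1 reduction.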

Related Work
Indoor positioning using active RFID trilateration is reported in (Retscher and Fu, 2008). The authors use active radio frequency identification in situations where GPS signals cannot reach the mobile robot because of the many obstacles surrounding it. Support vector regression has been used for indoor positioning in (Xu et al., 2018). The authors overcome the poor reception of GPS signals with an RFID positioning technique based on a weighted path length method. Unmanned aerial vehicles have been used in computer vision applications in (Zhang et al., 2014). Color-based features have been extracted from images and used for retrieval of images from a large database in (Arora and Aggarwal, 2018). A prediction model for GPS receivers is discussed in (Yang et al., 2019). The localization of vehicles in outdoor environments has been improved using collaborative driving in (Demetriou et al., 2018). Hybrid tracking in urban environments has been achieved for augmented reality using computer vision techniques (Fong et al., 2009). A 3D model in the world coordinate system has been reconstructed using a laser range scanner in the presence of poor GPS signals in (Ashwani et al., 2013). Image-based and 3D camera-based 3D models have been used for GPS position accuracy enhancement in (Kumar et al., 2014a). A deep learning-based approach has been used to mitigate the effect of multipath errors and enhance the accuracy of GPS position estimates (Quan et al., 2018). Computer vision techniques have been used in transportation systems for the detection of license plates (Ashwani, 2019).
A 3D model reconstructed with the help of a GPS receiver and a laser range sensor for the preservation of cultural heritage is discussed in (Kumar Aggarwal, 2017). An ensemble of wavelets has been used for the removal of noise in EMG signals in (Thukral et al., 2019). The extrinsic parameters of a camera used for estimating the position of a GPS receiver have been estimated from a sequence of images in (Kume et al., 2010). Multispectral images, which cover not only the visible region but also the near-infrared and far-infrared regions of the electromagnetic spectrum, are sometimes useful for augmenting the information obtained from a conventional camera. The fusion of multispectral images is discussed in (Aggarwal, 2020). Image matching methods based on feature point extraction are useful for identifying reflections of GPS signals from the surrounding buildings. The Hough transform is a useful technique for detecting straight lines in an image, which can be used to detect the presence of building facades in the vicinity of the GPS receiver. The detection of parallel lines for the detection of building facades has been done in (Sharma and Aggarwal, 2004). The role of computer vision techniques in improving outdoor positioning accuracy is discussed in (Steinhoff et al., 2007). The error due to multiple reflections may or may not be accompanied by reception of the direct signal. When the direct signal is received along with reflected GPS signals, a choke-ring antenna is useful. When the direct signal is not received, the reflected GPS signals have been identified and the necessary corrections applied in (Kumar et al., 2014b). The navigation of automated vehicles in intelligent transport systems in the presence of reflections of GPS signals has been improved using image-based methods (Kumar, 2015a).
Many papers discuss positioning for various tasks in computer vision and intelligent transport systems. A summary of related papers with their salient features is given in Table 1. The issues and challenges faced by intelligent transport systems in urban areas, where the GPS signal undergoes multipath propagation due to reflections from building facades, trees and dynamic obstacles, are discussed in (Kumar, 2019). A convolutional neural network architecture trained for precise localization in a large MIMO-OFDM system is discussed in (Wu et al., 2019). Mobile robots can perform their tasks efficiently only if their current position is estimated with sufficient accuracy. Machine vision-based techniques for estimating the current position of mobile robots are discussed in (Aggarwal, 2015b). The 3D images obtained using a 3D camera are very useful for estimating the position of a GPS receiver in outdoor environments. The point cloud obtained by a 3D camera is used to obtain depth images. Several depth images taken from different viewpoints with considerable overlap have been stitched together to enhance the GPS position accuracy in (Kumar and Ikeuchi, 2012). A fish-eye camera has been used to detect non-line-of-sight signals received by a GPS receiver in an attempt to mitigate the effect of reflected GPS signals on position estimation in outdoor environments (Horide et al., 2019). The application of GPS position estimation to predicting the behavior of animals with the help of deep learning techniques is discussed in (Browning et al., 2018). The use of an inertial navigation system along with a differential GPS system for lane-level detection in real-time navigation applications is discussed in (Vu et al., 2012).
Image data captured by a conventional camera is two-dimensional data in which each pixel gives an intensity, usually in the range 0-255. Image data has been converted into audio signals as an assistive tool for the rehabilitation of blind people in (Aggarwal, 2014). For static positioning applications, GPS position accuracy can be improved by collecting several measurements at a single point and then analyzing the pattern of position estimates using statistical techniques. A statistical approach to improving GPS position accuracy using median filtering and k-means clustering has been attempted in (Kumar, 2015b). The GPS signal suffers from various types of errors while propagating through the layers of the atmosphere. The various uncertainties in GPS positioning are discussed in (Oxley, 2017). Driver assistance services, which are widely used in intelligent transportation systems, rely on the accuracy of GPS position estimates. A deep learning-based technique for the detection of lane change behavior in intelligent transport systems is discussed in (Wei et al., 2019). Images captured using an unmanned aerial vehicle have been used for localization with deep learning methods in (Zhang et al., 2020).

Machine Vision Based GPS Positioning Enhancement
Machine vision techniques make use of either a conventional camera or a depth camera to obtain images of the environment surrounding the GPS receiver. The GPS receiver may be a standalone receiver or may be mounted on top of an autonomous vehicle. The GPS receiver is mounted at a fixed coordinate offset from the camera, so it records the GPS position at one point whereas the image captured by the camera is referred to another reference point. As the offset between the two coordinate systems is known, an appropriate transformation is applied to the image obtained by the camera so that it is mapped to the reference point of the GPS receiver. Figure 4 shows the coordinate offset between the GPS receiver and the camera. The transformation of coordinates is done using Equation (3), where [Uc(t), Vc(t), Wc(t)] and [Ug(t), Vg(t), Wg(t)] are the coordinates of the camera and the GPS receiver in their respective coordinate systems at any time t, R is the rotation between the two coordinate systems, and Tx, Ty and Tz are the translations between the two coordinate systems in the x, y and z directions respectively. The role of computer vision in GPS positioning is to find occlusions using feature points in the images captured by the camera. Deep learning methods are then applied to enhance the GPS positioning after the occlusions have been detected.
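A transformation of this kind is commonly written as a rotation followed by a translation. The sketch below assumes that form, mapping a point from the camera frame to the GPS receiver frame as p_g = R · p_c + T, with R taken to be a 3×3 rotation matrix about the vertical axis; the direction of the mapping, the choice of axis and the function names are assumptions made for illustration and may differ in convention from Equation (3).

```python
import math


def rotation_z(yaw_rad):
    """3x3 rotation matrix for a rotation of yaw_rad about the z axis."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def camera_to_gps(point_c, rotation, translation):
    """Map a point (Uc, Vc, Wc) in the camera frame to (Ug, Vg, Wg) in the
    GPS receiver frame using p_g = R * p_c + (Tx, Ty, Tz)."""
    return tuple(
        sum(rotation[i][j] * point_c[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
```

With no rotation and a translation of (0.5, 0.0, -1.2) metres, the camera-frame point (1, 2, 3) maps to approximately (1.5, 2.0, 1.8) in the receiver frame.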

Deep Learning Based GPS Positioning Enhancement
Deep learning-based methods for improving the accuracy of GPS position estimates rely on a large set of images, which are used to train the neural network. Data augmentation is achieved by applying several transformations to the images: the images are zoomed to different scales and flipped horizontally and vertically. These transformations make the deep learning methods more robust and more accurate than when image data is used without any augmentation. Of the several deep learning architectures, convolutional neural networks are preferred for training with images. The number of hidden layers in a convolutional neural network is large so that many features can be learned from the training images. The deeper a layer is in the network, the more complex the features it learns. The hidden layers are followed by fully connected layers. The most commonly used activation functions in convolutional neural networks are softmax, sigmoid, tanh, Rectified Linear Units (ReLU) and leaky ReLU. The softmax activation function is most commonly used in multiclass classification, for example, classifying buildings, pedestrians and trees in the images captured by a camera mounted on an autonomous vehicle along with the GPS receiver. The tanh activation function is similar to the sigmoid function, with the difference that tanh is symmetric about the origin whereas the sigmoid function is not. The tanh activation function is given in Equation (4):

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (4)$$

The range of tanh is between -1 and +1. The gradient of the tanh activation function is steeper than that of the sigmoid function. The ReLU activation function is linear for all positive values of the input and is zero for all negative values of the input. The ReLU activation function is given by Equation (5):

$$y = \max(0, x) \qquad (5)$$

where x is the input given to the ReLU activation function and y is the output of the ReLU function.
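The activation functions named above can be written out in a few lines. The sketch below uses their standard definitions; the leaky ReLU slope of 0.01 is a commonly used default and is an assumption here, and the functions are not hardened against overflow for very large inputs.

```python
import math


def sigmoid(x):
    """Logistic sigmoid with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh(x):
    """Hyperbolic tangent with outputs in (-1, 1), symmetric about 0."""
    return math.tanh(x)


def tanh_grad(x):
    """Derivative of tanh; its maximum is 1.0 at x = 0."""
    return 1.0 - math.tanh(x) ** 2


def relu(x):
    """Rectified linear unit: x for positive inputs, 0 otherwise."""
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    """ReLU variant that keeps a small slope for negative inputs."""
    return x if x > 0 else negative_slope * x


def softmax(scores):
    """Convert a list of class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Comparing `tanh_grad(0.0)` with `sigmoid_grad(0.0)` shows the difference in gradient noted above: the slope of tanh at the origin is four times that of the sigmoid.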
Another version of the ReLU activation function is the leaky ReLU function, in which a small non-zero slope is used for negative input values, extending the range of ReLU to negative outputs. The input may be the image flattened into a column vector at the input layer, or the output from any of the previous hidden layers. The convolutional neural network architectures used in GPS positioning include LeNet, AlexNet, VGGNet and ResNet. The frameworks used for implementing deep learning-based GPS positioning enhancement algorithms include PyTorch, TensorFlow, Keras, MXNet, DL4J and Theano. The TensorFlow framework is highly optimized to run on Graphics Processing Units (GPUs) and operates on tensors, a generalization of vectors and matrices to higher dimensions. The Keras framework works as a wrapper around TensorFlow for deep learning applications.

GPS Data Collection
The structure of the system depicting its components is shown in Fig. 5. GPS data is collected using a GPS receiver. The data is pruned using the first and third quartiles. The pruned data is used for refinement of positions by removing occlusions using machine vision techniques. Thereafter, convolutional neural network-based position enhancement is carried out to obtain the improved GPS position estimates.
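The quartile-based pruning step can be sketched as follows. The exact rule is not specified in the text, so the example assumes the common interquartile-range fence: a fix is kept only if both its latitude and its longitude lie within [Q1 - k·IQR, Q3 + k·IQR], with k = 1.5 as an assumed default. The function names are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper fences computed from the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def prune_fixes(fixes, k=1.5):
    """Keep only the (latitude, longitude) fixes inside both fences."""
    lat_lo, lat_hi = iqr_bounds([lat for lat, _ in fixes], k)
    lon_lo, lon_hi = iqr_bounds([lon for _, lon in fixes], k)
    return [
        (lat, lon)
        for lat, lon in fixes
        if lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi
    ]
```

Setting k to 0 would keep only the fixes lying between the first and third quartiles, which is a stricter reading of the same step.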

Conclusion
GPS position estimation finds applications in diverse domains such as autonomous vehicles, mobile robots, augmented reality and the gaming industry. However, the use of precise GPS position information is challenging because of errors caused by sensor noise, atmospheric conditions, dynamic obstacles and urban canyons formed by buildings. The use of computer vision and deep learning-based techniques for the improvement of GPS position accuracy has been discussed in this paper. The drawback of statistical methods of GPS position accuracy enhancement, namely the need to collect several measurements over a period of time, can be overcome by using techniques based on computer vision and deep learning.

Future Scope of Work
GPS position estimation can be improved using a hybrid technique in which outliers are first removed based on measurements collected from a subset of positions and the refined position estimates are then improved using computer vision and deep learning techniques. Wireless networks could be employed at locations where signals from fewer than four GPS satellites are received. Inertial navigation systems and Kalman filtering could also be used to enhance the position accuracy. The algorithms for improvement of GPS position estimation can be evaluated using the many benchmarks available in the literature.