An Improved Object Detection Technique for Hazard Avoidance Systems

Hazard detection and avoidance at construction sites, where workers operate alongside heavy equipment and moving vehicles, is one of the biggest issues in modern surveillance. Background subtraction using a Gaussian Mixture Model (GMM) is widely used to identify moving objects; most existing methods build on it but still lack object detection accuracy. This paper aims to improve the accuracy and processing time of object detection. The proposed algorithm uses a correlation coefficient to reduce the existing geometric error and to detect moving objects more accurately by comparing foreground and background pixels in every frame. A Kalman filter is used to track each detected object. The results demonstrate that the proposed algorithm outperforms existing applications in terms of object detection accuracy. On this basis, object detection using a correlation coefficient between the background and foreground pixels of objects is recommended for hazard detection in real-time monitoring systems such as traffic monitoring and human detection and tracking.


Introduction
Workplace health and safety is a high-priority issue from ethical, legal and economic perspectives. Hazards at construction sites generally arise from the high-pressure environments these sites represent. The constant movement of heavy equipment and vehicles reduces workers' awareness of their surroundings and can result in accidents (DHHS, 2001). With the increased moral and legal responsibility of businesses towards their employees, implementing safe work practices and installing safety equipment also makes sound economic sense (WHS, 2018).
Current practices at construction sites generally involve safety training and the issuance of protective equipment, Remote Sensing (RS) technology, 3D CAD models to measure geometric accuracy, and ground-based photogrammetry to detect hazards and assess the safety level (Zollmann et al., 2014). However, these measures do not provide comprehensive information about hazardous incidents in terms of location or type and thus do not enable immediate control measures (Grabowski et al., 2007). To overcome such limitations, real-time image processing and object detection algorithms are used. Object detection is the process of identifying real-world instances such as faces, vehicles and buildings in images and videos, which improves the identification of accidents and of non-motorized traffic such as pedestrians and cyclists (Samdurkar et al., 2018).
A range of technologies exists for recording real-world information. Among these are CCTV cameras, which have been installed at many construction sites. However, they remain a subject of research because of the performance limitations of the motion-based object tracking algorithm (Kalman filter) (Mathworks, 2016). Motion-based object tracking algorithms aim to detect a moving object and keep track of it over time. Detection is based on background subtraction, in which the previous frame is subtracted from the current frame (Mathworks, 2016), while blob analysis determines which detected regions are moving objects. The Kalman filter then tracks the detected object based on its velocity or acceleration (Welch and Bishop, 1995; Kalman, 1960).
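The detection step described above (frame subtraction followed by blob analysis) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the MathWorks example: frames are grayscale images stored as lists of rows, blob analysis is approximated by 4-connected component labelling, and the threshold `thresh` and minimum area `min_area` are hypothetical parameters.

```python
from collections import deque


def frame_difference(prev, curr, thresh=25):
    """Binary motion mask: 1 where |curr - prev| exceeds thresh (hypothetical value)."""
    return [[1 if abs(c - p) > thresh else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def blob_analysis(mask, min_area=2):
    """Label 4-connected foreground regions; report area, centroid (x, y) and bbox."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                queue, pixels = deque([(y, x)]), []
                seen[y][x] = True
                while queue:  # breadth-first flood fill of one region
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(pixels) >= min_area:  # discard tiny regions as noise
                    ys = [p[0] for p in pixels]
                    xs = [p[1] for p in pixels]
                    blobs.append({
                        "area": len(pixels),
                        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                        "bbox": (min(xs), min(ys), max(xs), max(ys)),
                    })
    return blobs
```

The centroids returned here are what a tracker such as the Kalman filter would consume in the next stage.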
Current studies of object detection at construction sites use different algorithms to detect objects accurately. The best reported result for systems using a motion-based object detection algorithm under occlusion (Mathworks, 2016; Samdurkar et al., 2018) was an accuracy of 95.7% and a precision of 92.04%, demonstrating that there is still scope for improvement. The purpose of this paper is to increase the accuracy of object detection while also improving its processing time. This study proposes an improved motion-based object detection algorithm that uses correlation coefficients for segmentation.

Related Work
To achieve realistic object detection, several algorithms have been introduced that treat accuracy and processing time as the major factors, with some researchers using markers to classify objects. Tatić and Tešić (2017) based their system on AR technologies. A mobile device executes the application by accessing a database. A prerequisite of the system is that each worker is assigned a unique ID linked to the worker's professional skill level. After the user logs in, the relevant AR module is loaded; the system then scans the markers and displays audio-visual instructions to the worker. The system proved effective, eliminating workplace injuries in the test application. Zhou et al. (2017) also used marker-based tracking in an AR-based system to test the accuracy and precision requirements for segment displacement. Here an AutoCAD tool was used to generate Building Information Models (BIM) of the items to be inspected, which were then linked to markers attached at the designated locations onsite. AR glasses or a mobile device are used to overlay these BIMs onto the real workplace to inspect the displacement between segments. The results are obtained through an image-matching process that compares the baseline model with the actual video. One drawback of this system is that it can neither cover a large area nor reduce the number of markers attached onsite.
Beyond artificial markers, Hebborn et al. (2017) presented a solution for realistic occlusion handling in static and dynamic scenes. In this technique, foreground and background are separated using color and depth information acquired through sensors and 3D rendering; after alpha value estimation, the results are composited into a single image. This solution is highly effective, but it cannot distinguish between foreground and background regions of the same color.
There is also a body of research that uses natural markers to detect objects with a cascade classifier based on Haar-like features (Gomes Jr et al., 2017). The classifier is trained with positive and negative images of different pieces of equipment onsite. 'Positive' denotes an image of the object to be detected and 'negative' refers to arbitrary images from the work location. Since the system operates regardless of the user's viewpoint, images of each side of the equipment are taken for reliable identification. The system is also able to identify equipment within the user's Field of View (FOV) and sends data requests to the SCADA/EMS system (Gomes Jr et al., 2017). The main problem with this solution is that training the cascade classifier is time-consuming.
Chen and Wang (2017) focused on archiving raw construction videos by removing redundant frames, reducing each video to a concise and structured set of key frames for better data storage, retrieval and analysis. Image features were identified from color and gradient information. The main idea behind their work was frame differencing using a scaled Euclidean distance measurement on the color features. The results also revealed that color features outperformed gradient features, whereas the Gabor filter used for texture analysis was computationally demanding and sensitive to minor changes in the image.
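Chen and Wang's (2017) key-frame selection can be illustrated with a small sketch. Their exact scaled Euclidean distance and feature set are not given here, so this version is an assumption: it compares normalized joint RGB histograms with a plain Euclidean distance and starts a new key frame when the distance from the last key frame exceeds a hypothetical `threshold`. Frames are taken to be non-empty flat lists of (r, g, b) pixels.

```python
import math


def color_histogram(frame, bins=4):
    """Normalized joint RGB histogram of a frame given as (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in frame:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    n = len(frame)
    return [v / n for v in hist]


def select_key_frames(frames, threshold=0.3, bins=4):
    """Return indices of frames whose histogram differs enough from the last key frame."""
    keys = [0]
    last = color_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        h = color_histogram(frames[i], bins)
        if math.dist(h, last) > threshold:  # Euclidean distance between histograms
            keys.append(i)
            last = h
    return keys
```

Comparing against the last key frame, rather than the immediately preceding frame, keeps slow gradual drift from being discarded indefinitely.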
Ramya and Rajeshwari (2016) used a modified frame difference method in which a correlation coefficient was used to segment foreground and background pixels. Their results improved on the frame difference method in terms of speed and detection accuracy. Samdurkar et al. (2018) presented a novel solution for finding the motion vectors of moving objects, combining a diamond search technique with a cross-diamond search technique. After background subtraction and frame differencing, the pixels in the current frame were divided into macroblocks, and the movement of each macroblock was recorded as a motion vector measuring its displacement. The resulting motion vectors were helpful in tracking objects, but the solution could only do so for a single object per frame.
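Samdurkar et al. (2018) combine diamond and cross-diamond search to find each macroblock's motion vector. The sketch below swaps in the simpler exhaustive (full) search with a sum-of-absolute-differences (SAD) cost, which evaluates every candidate in the window that the diamond patterns sample selectively. The block size, search range and list-of-rows frame layout are assumptions.

```python
def sad(cur, ref, by, bx, ry, rx, size):
    """Sum of absolute differences between cur's block at (by, bx) and ref's block at (ry, rx)."""
    return sum(abs(cur[by + i][bx + j] - ref[ry + i][rx + j])
               for i in range(size) for j in range(size))


def motion_vector(cur, ref, by, bx, size=4, search=3):
    """Exhaustive-search block matching.

    Finds the reference-frame block that best matches (minimum SAD) the
    current block at (by, bx) within +/- search pixels and returns the
    block's displacement (dy, dx) from the reference frame to the current frame.
    """
    h, w = len(ref), len(ref[0])
    best_vec, best_cost = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = by + dy, bx + dx
            if 0 <= ry <= h - size and 0 <= rx <= w - size:  # candidate inside frame
                cost = sad(cur, ref, by, bx, ry, rx, size)
                if cost < best_cost:
                    best_vec, best_cost = (by - ry, bx - rx), cost
    return best_vec
```

Diamond-style searches trade this exhaustive evaluation for far fewer SAD computations, at the risk of settling on a local minimum.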
Yang et al. (2013) presented one form of 3D image reconstruction, employing SIFT to extract feature points that capture the fundamental attributes of the experimental images. The researchers established correspondences between those points, and the best image pair was used for reconstruction through Multi-View Stereo. In contrast to the reconstruction by Yang et al. (2013), Zollmann et al. (2014) presented an aerial 3D reconstruction to automatically capture work progress information. Images captured by the aerial client using viewpoint sampling serve as input for a SIFT-based 3D reconstruction. The reconstruction allowed comprehensive data analysis, including object edges; however, the major drawbacks of this solution were the limited battery life of the aerial client and the high computational cost of the method.
Zhong et al. (2018) tested the feasibility of using an 8-rotor unmanned aerial vehicle to detect concrete cracks in buildings. The system uses a range finder to measure the object distance, from which the pixel resolution is obtained. The images are processed to count the pixels across each crack, which determines the physical crack width. The quality of the airborne images supports their use in future engineering practice. The obstacle detection system developed by Yankun et al. (2011) can detect various static and moving obstacles behind cars using a rear-view camera installed in the car. Their solution used a frame differencing method and missed detections when the obstacle had a uniform texture or no edge information, whereas the edge-guided image object detection of Hu et al. (2016) focused on an optimal image partition in which each region is represented as a corresponding geo-object through the relationship between edges and regions.
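The crack-width computation described for Zhong et al. (2018) follows from the camera geometry. A minimal sketch, assuming a pinhole camera whose optical axis is perpendicular to the cracked surface: one pixel then covers distance × pixel pitch / focal length on the surface, and the crack width is that footprint times the number of pixels across the crack. The function names and the example values are hypothetical.

```python
def pixel_footprint_mm(distance_mm, pixel_pitch_mm, focal_length_mm):
    """Size of one pixel projected onto a surface at distance_mm (pinhole camera model)."""
    return distance_mm * pixel_pitch_mm / focal_length_mm


def crack_width_mm(crack_pixels, distance_mm, pixel_pitch_mm, focal_length_mm):
    """Physical crack width from the number of pixels spanning the crack."""
    return crack_pixels * pixel_footprint_mm(distance_mm, pixel_pitch_mm, focal_length_mm)
```

For example, at a range-finder distance of 2,000 mm with a 0.005 mm pixel pitch and a 50 mm lens, each pixel covers 0.2 mm, so a crack spanning three pixels is about 0.6 mm wide.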
The real-time path planning system developed by Kuenzel et al. (2016) for an asphalt road construction project can react to changing environmental, material-related and process-related disturbances. The system uses GPS and network connections for real-time tracking and data transfer, which facilitates communication between the machines. The biggest disadvantage of this system is that a loss of GPS signal causes the machines to deviate from the compaction path generated by the system, which reduces overall efficiency.
The moving object detection method of Ali et al. (2017) uses a standard GMM (Gaussian Mixture Model) but models the intensity values of a block instead of a single pixel (Li et al., 2017). They employed a dynamic learning rate to overcome the trade-off affecting detection accuracy. Their method was reported to be four times more effective than the self-adaptive GMM.
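The block-level GMM idea can be sketched by running one adaptive mixture per block on that block's mean intensity. The sketch below follows the classic Stauffer-Grimson update with a fixed learning rate `alpha`; it does not reproduce the dynamic learning rate of Ali et al. (2017), and all parameter values are assumptions.

```python
import math


class ScalarGMM:
    """Stauffer-Grimson style adaptive mixture for one scalar (e.g. a block's mean intensity)."""

    def __init__(self, first_value, k=3, alpha=0.05, init_var=225.0,
                 min_var=4.0, match_sigmas=2.5, bg_ratio=0.7):
        self.k, self.alpha = k, alpha
        self.init_var, self.min_var = init_var, min_var
        self.match_sigmas, self.bg_ratio = match_sigmas, bg_ratio
        self.comps = [[1.0, float(first_value), init_var]]  # [weight, mean, variance]

    def _rank(self, c):
        return c[0] / math.sqrt(c[2])  # high weight and low spread suggest background

    def _background(self):
        """Highest-ranked components whose cumulative weight first exceeds bg_ratio."""
        bg, total = [], 0.0
        for c in sorted(self.comps, key=self._rank, reverse=True):
            bg.append(c)
            total += c[0]
            if total > self.bg_ratio:
                break
        return bg

    def update(self, x):
        """Classify x against the current model, then adapt; returns True for foreground."""
        matched = next((c for c in self.comps
                        if abs(x - c[1]) < self.match_sigmas * math.sqrt(c[2])), None)
        is_foreground = matched is None or all(matched is not c for c in self._background())
        for c in self.comps:
            c[0] *= 1.0 - self.alpha  # decay every weight
        if matched is not None:
            matched[0] += self.alpha
            rho = self.alpha / matched[0]  # common simplification of alpha * N(x | mean, var)
            matched[1] += rho * (x - matched[1])
            matched[2] = max(self.min_var,
                             matched[2] + rho * ((x - matched[1]) ** 2 - matched[2]))
        else:
            new = [self.alpha, float(x), self.init_var]
            if len(self.comps) < self.k:
                self.comps.append(new)
            else:  # replace the least probable component
                weakest = min(range(len(self.comps)), key=lambda i: self._rank(self.comps[i]))
                self.comps[weakest] = new
        total = sum(c[0] for c in self.comps)
        for c in self.comps:
            c[0] /= total  # renormalize weights
        return is_foreground
```

Because matched components gain weight while the others decay, a value that persists (for example 200 after a long run of 100) is first flagged as foreground and then absorbed into the background after several frames.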
Other research investigated statistical machine learning techniques such as Support Vector Machines (SVMs) and neural networks. According to Kotsiantis (2007), SVMs and neural networks require a large sample size to reach maximum prediction accuracy and perform well with multi-dimensional and continuous features, whereas logic-based systems tend to perform better with discrete or categorical features.

Current Best Solutions
The current best solution used at construction sites utilizes a motion-based hazard avoidance system based on an object tracking algorithm.

Current Best Solution Components
The best current solution is the work of Kim et al. (2017). The proposed system has three modules:

1. A vision-based site monitoring module that uses image capture devices such as CCTV and wearable devices to identify site hazards. This module uses a background subtraction algorithm.
2. A safety assessment module that uses captured image data and fuzzy-based reasoning to evaluate the safety level of each object.
3. A visualization module that provides actionable information such as hazard orientation, distance and safety level. The safety information provided by the proposed system can mitigate hazards and improve construction site safety.

Current Best Solution Process
Moving objects are detected in the first module using a GMM-based background subtraction algorithm. GMM is used because of its robustness to background variations (Ali et al., 2017). The process then applies segmentation and morphological operations, and the location of each moving object in the next frame is predicted with a Kalman filter from the object's velocity or acceleration (Kalman, 1960). The main feature of this solution is that it captures images from both global and user perspectives, identifies workers of interest and delivers safety information such as the distance between a worker and approaching equipment.
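The Kalman filter step can be sketched with a constant-velocity model. This is a minimal sketch under assumptions not stated in the source: each image axis is filtered independently with state [position, velocity], the time step is one frame, and the noise parameters `q`, `r` and `p0` are hypothetical. `predict()` gives the expected centroid in the next frame; `update()` corrects it with the detected centroid.

```python
class ConstantVelocityKalman1D:
    """Kalman filter on one image axis with state [position, velocity]."""

    def __init__(self, position, dt=1.0, q=1.0, r=4.0, p0=100.0):
        self.x = [float(position), 0.0]
        self.P = [[p0, 0.0], [0.0, p0]]
        self.dt, self.q, self.r = dt, q, r

    def predict(self):
        """x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = q * I."""
        dt = self.dt
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        (a, b), (c, d) = self.P
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.x[0]

    def update(self, z):
        """Correct the state with a measured position z (H = [1, 0])."""
        (a, b), (c, d) = self.P
        s = a + self.r            # innovation covariance
        k0, k1 = a / s, c / s     # Kalman gain
        y = z - self.x[0]         # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


class CentroidTracker:
    """Tracks an (x, y) centroid with two independent 1-D filters."""

    def __init__(self, centroid, **kw):
        self.fx = ConstantVelocityKalman1D(centroid[0], **kw)
        self.fy = ConstantVelocityKalman1D(centroid[1], **kw)

    def predict(self):
        return (self.fx.predict(), self.fy.predict())

    def update(self, centroid):
        self.fx.update(centroid[0])
        self.fy.update(centroid[1])
```

Calling `predict()` before `update()` in every frame yields the predicted location that can be compared with the detected one; any detection error passes directly into the velocity estimate and hence into later predictions.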

Current Best Solution Component Features
Pre-Detection: In the pre-operative environment, workers of interest are detected in the first CCTV frames using Histogram of Oriented Gradients (HOG) features and a Support Vector Machine (SVM) classifier. The HOG feature captures information such as geometric shape and orientation, whereas the SVM classifier labels connected pixel regions as human or non-human. The worker equipped with a wearable device is identified by calculating the Euclidean distance.
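The Euclidean-distance step can be sketched as a nearest-neighbour match. This assumes, hypothetically, that the wearable device's position is available in the same image coordinates as the HOG/SVM detections, which are given as a dictionary mapping worker ids to centroids.

```python
import math


def identify_equipped_worker(detections, device_position):
    """Return the id of the detected worker whose centroid is nearest to the device."""
    if not detections:
        return None
    return min(detections, key=lambda wid: math.dist(detections[wid], device_position))
```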
Intra-Detection: The safety level is visualized from the user's perspective rather than from a global view. It consists of the worker's safety level, measured as the distance between the worker and the equipment. The module takes the centroids obtained in the monitoring module and calculates the motion vector to identify equipment moving towards the worker.
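The direction test can be sketched with two vectors: the equipment's motion between consecutive centroids and the vector from the equipment to the worker. A positive dot product means the motion has a component toward the worker. The function name and the use of only two consecutive centroids are assumptions, not necessarily the exact computation used by Kim et al. (2017).

```python
import math


def approach_info(worker, equip_prev, equip_curr):
    """Distance from equipment to worker and whether the equipment is moving toward the worker."""
    vx, vy = equip_curr[0] - equip_prev[0], equip_curr[1] - equip_prev[1]  # equipment motion
    tx, ty = worker[0] - equip_curr[0], worker[1] - equip_curr[1]          # equipment-to-worker
    distance = math.hypot(tx, ty)
    approaching = (vx * tx + vy * ty) > 0  # positive dot product: moving toward the worker
    return distance, approaching
```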
The safety information is visualized on the interface of AR-based wearable glasses. Hazard information is visualized with a color-coded arrow whose color is taken from a red-green spectrum. The red and green components are mixed so that the resulting color clearly shows the intended safety level.
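One simple way to mix the red and green components is linear interpolation on a normalized safety level. The mapping below (0 = most hazardous, shown red; 1 = safe, shown green) and the function name are assumptions for illustration, not the published color scale.

```python
def safety_color(level):
    """Map a safety level in [0, 1] (0 = most hazardous, 1 = safe) to an (R, G, B) tuple."""
    level = min(max(level, 0.0), 1.0)  # clamp out-of-range values
    return (round(255 * (1.0 - level)), round(255 * level), 0)
```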
The experimental validation revealed that the device noticeably improved workers' response times when the equipment was located at the rear, the side not visible to the worker. The system achieved 100% accuracy in the safety assessment and visualization modules. Figure 1 shows the complete process of the current best solution with its limitations.

Limitation of the Current Best Solution
Limitations arise from errors caused by the object tracking algorithm, which comprises a Kalman filter and Gaussian Mixture Models. Although GMMs are robust to background variations such as sudden illumination changes, they fail to recover quickly from failures caused by such changes and sometimes classify objects as background. After an object is detected, the Kalman filter keeps track of the moving object over time based on its past trajectory. Because object detection contains errors, the Kalman filter fails to give accurate predictions. This limitation can be overcome with a more accurate object detection algorithm, and the proposed solution does so to provide more accurate moving object detection.
Table 1 shows the pseudocode of the current best solution for object detection, consisting of a Kalman filter and a background subtraction algorithm with Gaussian Mixture Models. Figure 2 illustrates the flow of information for the proposed model. A range of existing object detection methods has been reviewed for this article, analyzing the advantages and disadvantages of each method in depth. The main issues are accuracy, processing time and occlusion handling. The proposed solution uses the modified frame difference method of Ramya and Rajeshwari (2016). One advantage of their work is that it accelerates background subtraction through block-wise comparison and increases the accuracy of foreground and background detection through correlation coefficient comparison. Therefore, the proposed solution incorporates correlation coefficient analysis for object detection into the current best solution.

Proposed Solution
The correlation coefficient is a statistical measure of the relationship between two variables. Its value ranges from -1 to 1, where a value of 0 means that no linear relationship exists between the two variables (ICRA, 2013). Correlation can be positive or negative: the sign indicates the direction of the association, whereas the magnitude indicates its strength (PPMC, 2018). In the proposed approach, object detection in the motion-based object tracking algorithm (Mathworks, 2016) is carried out using the correlation coefficient, while the rest of the procedure remains the same. The objects detected using the correlation coefficient are tracked with a Kalman filter, and feature extraction is carried out using blob analysis.
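To make the sign and magnitude interpretation concrete, the sketch below computes Pearson's coefficient for three small, made-up sequences: a perfect positive relationship, a perfect negative one, and one with no linear relationship. The function name is illustrative; both inputs must vary, otherwise the coefficient is undefined.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Here `pearson([1, 2, 3, 4], [2, 4, 6, 8])` gives 1.0, reversing the second sequence gives -1.0, and `[1, -1, -1, 1]` gives 0.0.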

Proposed Algorithm Process
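For the current investigation, the correlation coefficient defined by Rodgers and Nicewander (1988) and used by Ramya and Rajeshwari (2016) is adopted:

$$r = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n}\left(A_{ij}-\bar{A}\right)\left(B_{ij}-\bar{B}\right)}{\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}\left(A_{ij}-\bar{A}\right)^{2}\,\sum_{i=1}^{m}\sum_{j=1}^{n}\left(B_{ij}-\bar{B}\right)^{2}}} \qquad (1)$$

where $m$ and $n$ represent the width and height of the block, $A_{ij}$ and $B_{ij}$ are the pixel intensities of the block in the current frame and in the background image, and $\bar{A}$ and $\bar{B}$ are their respective mean intensities. The frame size used for the current validation is 360×640. The first frame in the video sequence is set as the background image. The current frame is divided into non-overlapping 5×5 blocks, and the algorithm compares each block of every frame with the corresponding block in the background frame. The threshold for classifying an entire block in this experiment is set to T = 0.85.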
The correlation coefficient between a block in the current frame and the corresponding block in the background image is computed using Equation 1. If the value of the correlation coefficient is greater than T, the entire block is classified as background. For the remaining blocks, the absolute intensity difference between each pixel in the current frame and the corresponding pixel in the background image is computed. If this value is greater than the frame-difference threshold λ, the pixel is classified as foreground; otherwise it is classified as background. Figures 3a and 3b show the objects detected using the proposed algorithm and the current best algorithm, respectively.
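The two-stage test can be sketched as follows, assuming grayscale frames stored as lists of rows whose dimensions are multiples of the block size (true for 360×640 with 5×5 blocks). Two choices are assumptions rather than details from the source: the frame-difference threshold lambda (λ, `lam` below) is set to a hypothetical 30, and a block with zero intensity variance, for which Equation 1 is undefined, is sent to the pixel-wise test.

```python
import math


def block_corr(cur, bg, y0, x0, size):
    """Pearson correlation (Equation 1) between co-located size x size blocks."""
    a = [cur[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    b = [bg[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    if saa == 0 or sbb == 0:
        return None  # undefined for a flat block; caller falls back to the pixel test
    return sab / math.sqrt(saa * sbb)


def foreground_mask(cur, bg, size=5, T=0.85, lam=30):
    """Block-wise correlation test, then pixel-wise frame difference for uncertain blocks."""
    h, w = len(cur), len(cur[0])
    mask = [[0] * w for _ in range(h)]
    for y0 in range(0, h - size + 1, size):
        for x0 in range(0, w - size + 1, size):
            r = block_corr(cur, bg, y0, x0, size)
            if r is not None and r > T:
                continue  # the whole block is background
            for y in range(y0, y0 + size):
                for x in range(x0, x0 + size):
                    if abs(cur[y][x] - bg[y][x]) > lam:
                        mask[y][x] = 1  # foreground pixel
    return mask
```

In a quick synthetic check, a block whose brightness is uniformly raised still correlates almost perfectly with the background and is kept as background, whereas a flat bright object falls through to the pixel test and is marked as foreground.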
Following object detection with the correlation coefficient, the centroid of each detected object in the current frame is predicted using the Kalman filter. In Fig. 3a, the object was detected and its position predicted more accurately with the proposed algorithm than with background subtraction using Gaussian Mixture Models. In terms of object detection accuracy, the proposed algorithm outperforms the current best one. Figures 4a and 4b show some of the objects detected accurately by the proposed algorithm.
In terms of occlusion, where an object is covered by other objects in front of it such that they are mistaken for one, the proposed algorithm correctly detected the object shown in Fig. 5a, compared with the object detected by the current best algorithm shown in Fig. 5b. The difference is significant in terms of the detected positions, shown by the bounding boxes in Figs. 5a and 5b.

Table 2. Pseudocode of the proposed solution
Step 1: Get input from the CCTV camera.
Step 2: Perform segmentation using the background subtraction algorithm with correlation coefficient.
Step 3: Track the worker using the Kalman filter.
Step 4: If a worker is detected, calculate the distance of the worker from the equipment. If Distance < Threshold, go to Step 5; else, the worker is not a candidate.
Step 5: Perform the safety assessment.
Step 6: Display the safety data as AR using the wearable device.
END

Figure 6 shows the complete process of the proposed solution and Table 2 gives its pseudocode. Both show the changes made to the existing solution.
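Step 4 can be sketched as a distance filter over the tracked workers. The dictionary layout, the image-plane Euclidean distance and the threshold value are assumptions; the fuzzy safety assessment of Step 5 (Kim et al., 2017) is not reproduced.

```python
import math


def candidate_workers(workers, equipment, threshold):
    """Step 4: (worker_id, distance) pairs closer to the equipment than threshold, nearest first."""
    pairs = [(wid, math.dist(pos, equipment)) for wid, pos in workers.items()]
    return sorted((p for p in pairs if p[1] < threshold), key=lambda p: p[1])
```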

Experimental Validation
The system was implemented in MATLAB. The Computer Vision and Image Processing toolboxes in MATLAB were used for the site monitoring module. To experimentally verify the proposed solution, the distance between the centroids of the predicted and detected objects was calculated for both the current best solution and the proposed solution, and the optimal results were recorded for T = 0.85. The video contained 600 frames of size 360×640. The first object detection occurred at frame 53, shown in Fig. 7. The smaller the distance between prediction and detection, the higher the accuracy of object detection. The experimental results show that the proposed solution with correlation coefficient outperforms the current best solution. Figure 8 shows the difference between prediction and detection across the frames. In Fig. 9, the green curve shows the difference between the detected and predicted locations of the object using the current best solution, whereas the red curve shows the same difference using the proposed algorithm. The results are significant after frame 100, where the distance between detection and prediction calculated using the current best solution was higher than that calculated using the proposed algorithm.
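The evaluation metric can be sketched as the per-frame Euclidean distance between predicted and detected centroids, averaged over the frames in which both exist. The function names are hypothetical and the example values in the check are made up; a lower mean distance indicates more accurate detection.

```python
import math


def centroid_errors(predicted, detected):
    """Per-frame Euclidean distance between predicted and detected centroids."""
    return [math.dist(p, d) for p, d in zip(predicted, detected)]


def mean_error(predicted, detected):
    """Average centroid distance over a non-empty sequence of frames."""
    errors = centroid_errors(predicted, detected)
    return sum(errors) / len(errors)
```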
The prediction depends entirely on detection, so accurate detection, obtained by comparing the correlation coefficient between every block of the background and the current frame, yields better results. Between frames 350 and 400, the prediction was more accurate because the two bounding boxes are classified separately and accurately by the proposed algorithm. Figure 10 shows correct object detection and prediction at frame 347, and Fig. 11 shows correct object detection and prediction at frame 413.
The noticeable difference at frame 195 is depicted in Figure 12a below, which shows an incorrect prediction by the current best solution: bounding box 3 is incorrectly labelled, as there is no object at that location. The proposed algorithm corrects this error, as shown in Fig. 12b.
The proposed algorithm was evaluated by computing the accuracy of detection as shown in Equation 2, which is also visualized above in Fig. 3:

$$\text{Accuracy} = \text{Distance}\left(\text{Predicted Location} - \text{Detected Location}\right) \qquad (2)$$

where a smaller distance indicates a more accurate detection. All the samples revealed that the proposed algorithm yields higher accuracy than the current best solution.

Conclusion
This paper proposed an improved object detection algorithm for hazard avoidance at construction sites. In the proposed algorithm, objects are detected using a background subtraction algorithm with a correlation coefficient. Pixel segmentation classifies the pixels of each block of the current frame as background or foreground by comparing the block with the corresponding block of the background image. Blob analysis then distinguishes moving from stationary objects, and the Kalman filter keeps track of each object after detection. The proposed algorithm uses 8x8 image blocks as the first step. The proposed work focuses on detecting moving objects more accurately than the existing solution. The experimental results show that the proposed solution outperforms the current best solution in terms of object detection accuracy. Detection and tracking became more accurate, but the pixel comparison between blocks increased the processing time. Hence, this research concludes that the proposed solution is better in terms of object detection accuracy. Future research could investigate using the color information of moving objects to further improve detection accuracy.