An Effective History-based Background Extraction System

Problem statement: In many vision-based surveillance systems, the first step is to detect moving objects by subtracting an extracted background from the current captured frame. The results of these systems therefore depend mainly on the accuracy of the background image. Approach: This study presents a background extraction system that models the background with a simple method, initializes the model, extracts the moving objects and constructs the final background. The model saves the history of each pixel separately and uses the saved information to extract the background with a probability-based method. It then updates the history of each pixel according to the value of that pixel in the current captured image. Results: The experiments indicate that the quality of the final extracted background is better than that of four recent methods re-implemented for comparison, and that the time cost of the extraction is acceptable. Conclusion: Because history-based methods use temporal information extracted from several previous frames, they are less sensitive to noise and sudden changes when extracting the background image.


INTRODUCTION
Many vision-based surveillance systems, such as Intelligent Transportation Systems, start by detecting changes in an image sequence. To identify changes, a first group of methods uses algorithms based on inter-frame differences. These methods tend to detect moving-object areas that are larger than the real ones, and they cannot detect immobile objects correctly. In addition, they are susceptible to noise and have low detection precision and low reliability (Xiaofei, 2009). A second group of methods subtracts a reference image from the input frame, where the reference image represents the background of the scene. The efficiency of this group depends mainly on the accuracy of the extracted background image and on how well that image adapts to varying situations. This problem has prompted researchers to look for better-performing approaches to model, initialize, adapt and subtract a more accurate background, especially in real-time and outdoor environments.
The background objects are a combination of stationary objects, such as traffic lights, and non-stationary objects, such as waving bushes. Correspondingly, the pixels of the background image are a combination of static pixels and dynamic pixels. The static pixels represent the stationary objects and the dynamic pixels belong to the non-stationary objects (Li et al., 2004). A proper background extraction algorithm should be able to handle both static and dynamic pixels.
Ideally, the gray level of a stationary pixel is constant across frames; this constant is the intensity of that pixel. In practice, the gray values of the pixels in a motionless background image are not always the same, because of disturbances in lighting, the camera and the atmosphere, and because of moving objects that are present for a short or a long period. The value of each pixel may therefore change in one or a few time slices. However, over a sufficiently long duration, the value of each pixel in the background image of a video sequence is likely to equal the value that appears most frequently at that pixel in the corresponding scenes. History-based methods rely on this observation. They save the history of each pixel for a specific period of time, in the form of values assigned to one gray level or to a range of gray levels.
In recent years, several studies have modelled the history of each pixel, or of a group of pixels (a block or region), as the background model (Xiaofei, 2009; Kong et al., 2007; Chiu et al., 2010; Cui et al., 2009). In these methods, some information about each unit (pixel, block or region) is saved separately. The goal of this line of research has been to find the information that models the background most accurately and to design an update procedure that refreshes the saved information in minimum time and with the highest performance.
In this study, we introduce a complete history-based background extraction system. This system combines a simple method to model the background, a new algorithm to initialize the background, a fast algorithm to update the system and adapt the background, a fast technique to extract the binary objects, an enhanced median filter to reduce the existing noise and an effective algorithm to construct the final background.
The proposed method can be categorized as a probability-based method. It saves the history of each pixel separately and uses that history to estimate the current value of the corresponding pixel in the background image. Once the current frame has been captured by the system, the history is updated.
The rest of this study is organized as follows: The first section reviews a number of more significant related works. The second section gives an overview of the proposed system. The following section provides the experimental results and the study is concluded in the last section.
Related works: Several studies have been undertaken over the last three decades to model, extract and adapt the background image. Speed, adaptability to periodic changes such as changes in sunlight, management of fast changes caused by moving objects entering the scene or leaving the scene are the most critical problems for these studies. Previous methods have used nonparametric and parametric methods to model the changing behavior of the scene, moving objects, background or some sections within them. In the following, some of the significant samples of these studies and their properties are highlighted.
As a simple nonparametric model, researchers estimate a new background by combining the current background with one, two or more previously extracted backgrounds. The technique used to combine the frames, the number of combined frames, the composition of these frames and the combination rates have been investigated several times.
The common goal of these previous studies is to improve the adaptability of the final extracted background. For example, fuzzy logic and cellular automata have been used to decide how to combine frames more efficiently (Shakeri et al., 2008). Xiaofei (2009) introduced a background extraction system with a self-adaptive update algorithm. He also presented an extended method based on histogram statistics combined with multi-frame averages and claimed that this approach is faster and suitable for platforms based on a Digital Signal Processor (DSP). Barnich and Droogenbroeck (2009) presented the ViBe program, which combines frames randomly. Stauffer and Grimson (1999) supposed that the values of a specific pixel follow a Gaussian distribution. In the Gaussian distribution model, the background of a scene is assumed to be a set of pixels with a Gaussian distribution of between 2 and 2.5 standard deviations for each pixel (Peng and Horng, 2009). Subsequently, a Mixture of Gaussians (MoG) was introduced. According to MoG, an image is modelled as a combination of k separate Gaussian models (k is usually between 3 and 5). Researchers set parameters for each Gaussian (mean, variance, weight and learning rate) and have presented various combinations of different numbers of Gaussian models to adapt to gradual background changes (Sheng and Cui, 2008).
Also, a Local Binary Pattern (LBP) is widely used as a texture descriptor. It has acceptable performance in fabric defect detection, texture classification, face recognition and moving region detection. In this method, local features which are considered as neighboring pixels are assigned to the texture features (Armanfard et al., 2009a). LBP has been used by Heikkila and Pietaikinen (2006) to model the history of each pixel in any block. Armanfard et al. (2009b) tried to reduce the size of the model.
In our own previous work, a combination of blocking and multi-scale methods was used. Block-based techniques are appropriate for controlling the movements of non-stationary objects because they are less sensitive to local movements, especially in outdoor applications. They can be useful for reducing the effect of these objects on the extracted background. We also used a blocking method to select the regions in which temporal filtering has to be applied. This method is appropriate for extracting moving objects when motion is not predictable. However, its results are not stable, because they are sensitive to any sudden motion.

Proposed model:
The proposed method is a combination of the following steps:
• A probability-based method to model the background
• An effective algorithm to initialize the background model
• An accurate method to extract moving objects based on the adaptive model resulting from the previous phases
• A post-processing technique to reduce the noise in the binary image showing moving objects
• A simple method to extract the final background
Each of these methods is described in detail below. The various parts of the proposed model are illustrated in Fig. 1.
Before deciding on the model of the system, it is necessary to decide how the visual data are transformed for input into the system. Gray levels are commonly used by researchers and are a key factor in judging the extracted background and moving objects. Furthermore, to reduce the complexity of the algorithm, we convert each color image to gray scale using a well-known equation, Eq. 1, where f(x, y)_R, f(x, y)_G and f(x, y)_B refer to the R, G and B components of pixel (x, y) of the current frame, respectively.
Using a histogram method, we save the history of each pixel separately. To do so, the range of grayscale values is divided into W slices, each of size Clu_size. The range of gray levels in each slice is given by Eq. 2. In the proposed model, each slice is described by an ordered pair (µ_i, C_i), where µ_i is the mean value of the i-th slice and C_i is the number of frames whose corresponding pixel has a gray level located in the i-th slice. Whenever a new frame has a pixel in a specific slice, these two parameters of that slice are updated. The updating process is explained in the background adaptation section. In the proposed method, the number of slices, W, is 64.
First, for each pixel, µ_i of the slice that contains the value of that pixel in the first frame is initialized with that value, and its C_i is set to 1. C_i of the other slices is set to zero. The proposed initialization algorithm uses a distance classifier to classify each pixel of the input image until the Fn-th frame. Then, the probability of each gray-level slice of each pixel is calculated, and a convergent value is used to extract the gray level of the background pixel: the slice whose probability is maximal and greater than the convergent value is selected. The steps of the background initialization algorithm are explained below. The pseudo code of the method is illustrated in Fig. 2:
Step 1: Capture the first input frame and initialize the model as explained before.
Step 2: While any pixel in the background (Temp_BG in the pseudo code) has not yet been assigned a value, repeat steps 3-12.
Step 3: Capture the next input frame (Counter is the number of grabbed frames).
Step 4: If there is any unassigned pixel in the captured frame, do steps 5-7.
Step 5: Calculate the slice of the gray level of the fetched pixel according to Eq. 3, where i and j are the coordinates of the pixel, Cur_Img refers to the most recently captured image and Clu_size is the size of each slice of grayscale values.

Fig. 2: Pseudo code of the initialization algorithm
Step 6: Assign the input pixel to the Clu_No group and update the parameters of this group according to Eq. 4 and 5.
Step 7: If the number of frames (Counter) is less than Fn, go to the next step (Step 8); otherwise, calculate the probability value of the maximum cluster, Pmax(i, j), for the pixel using Eq. 6 and 7. If Pmax(i, j) > θ, pixel (i, j) of the background image (Temp_BG) is assigned the mean value (µ) of the gray-level slice that has the highest probability value, Pmax(i, j), for this pixel (Eq. 8). θ is the convergent value.
Step 8: To speed up the extraction of background pixels, the convergent value, θ, is updated as defined by Eq. 9. This is done whenever Counter is equal to or greater than Fn:
$\theta = \eta \times \omega^{(Counter - F_n)}, \quad Counter > F_n$
Where:
η = The initial convergent value
ω = The weighting value of the initial convergent value
Here, as in Chiu et al. (2010), we set η and ω to 0.7 and 0.95, respectively.
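The initialization steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the slice width is assumed to be 256 / W for 8-bit gray levels, the slice index is assumed to be the integer quotient of the gray level by Clu_size, the mean is assumed to be a running average, and the probability of a slice is assumed to be its count divided by the total count. The names PixelHistory, convergent_value and initialize_background are hypothetical.

```python
W = 64                   # number of gray-level slices (value given in the text)
CLU_SIZE = 256 // W      # ASSUMED slice width for 8-bit gray levels
ETA, OMEGA = 0.7, 0.95   # initial convergent value and its weight (from the text)


class PixelHistory:
    """History of one pixel: a mean and a hit count per gray-level slice."""

    def __init__(self):
        self.mean = [0.0] * W
        self.count = [0] * W

    def update(self, gray):
        k = gray // CLU_SIZE                                   # assumed slice index
        self.count[k] += 1                                     # assumed count update
        self.mean[k] += (gray - self.mean[k]) / self.count[k]  # assumed running mean

    def best(self):
        """Return (probability, mean) of the most visited slice."""
        k = max(range(W), key=self.count.__getitem__)
        return self.count[k] / sum(self.count), self.mean[k]


def convergent_value(counter, fn):
    """Decaying threshold of Eq. 9; equals ETA until more than fn frames are seen."""
    return ETA * OMEGA ** max(counter - fn, 0)


def initialize_background(frames, fn):
    """frames: iterable of 2-D lists of gray levels.

    Returns Temp_BG as a 2-D list; entries stay None if the frames run out
    before the pixel converges.
    """
    frames = iter(frames)
    frame = next(frames)
    h, w = len(frame), len(frame[0])
    hist = [[PixelHistory() for _ in range(w)] for _ in range(h)]
    temp_bg = [[None] * w for _ in range(h)]
    counter = 0
    while frame is not None:
        counter += 1
        theta = convergent_value(counter, fn)
        pending = False
        for i in range(h):
            for j in range(w):
                if temp_bg[i][j] is not None:
                    continue                      # pixel already assigned
                hist[i][j].update(frame[i][j])
                if counter >= fn:
                    p_max, mu = hist[i][j].best()
                    if p_max > theta:
                        temp_bg[i][j] = mu        # background value found
                        continue
                pending = True
        if not pending:
            break
        frame = next(frames, None)
    return temp_bg
```

With η = 0.7 and ω = 0.95, the threshold falls to about 0.42 ten frames after Fn, so pixels that are often covered by moving objects can still be assigned a background value eventually.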
After the system is initialized, the background corresponding to each newly captured frame must be extracted. To adapt the background to the current frame, the algorithm updates the history of the pixels with the captured frame and then identifies the moving pixels that make up the moving objects. To update the history, steps 5 and 6 of the initialization algorithm are repeated: for each pixel of the input frame, the cluster of that pixel is calculated and its history is updated accordingly. The corresponding pixel of the extracted background is then assigned the mean value of the gray-level cluster that has been visited most often so far. The moving pixels are identified, in the form of a binary image, by subtracting each pixel of the background image extracted in the previous step from the corresponding pixel in the current image. Equation 10 calculates the distance between pixel (i, j) of the current frame and its corresponding pixel in the temporal background.
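As a rough illustration of this adaptation step for a single pixel, the following Python sketch updates the per-slice history, takes the mean of the most visited slice as the background value, and returns the distance to the current gray level. The slice width (256 / W), the running-mean update and the use of an absolute difference as the distance are assumptions made for the sketch; adapt_pixel is a hypothetical name.

```python
W = 64               # number of gray-level slices (value given in the text)
CLU_SIZE = 256 // W  # ASSUMED slice width for 8-bit gray levels


def adapt_pixel(mean, count, gray):
    """Update one pixel's history in place and return (background, distance).

    mean, count: lists of length W holding the per-slice mean and hit count.
    gray: gray level of the pixel in the current frame (0..255).
    """
    k = gray // CLU_SIZE                        # assumed slice index
    count[k] += 1
    mean[k] += (gray - mean[k]) / count[k]      # assumed running mean
    top = max(range(W), key=count.__getitem__)  # most visited slice so far
    background = mean[top]
    distance = abs(gray - background)           # ASSUMED form of the distance
    return background, distance


# Usage: a pixel that shows gray level 100 for ten frames, then 220 once.
mean, count = [0.0] * W, [0] * W
for _ in range(10):
    adapt_pixel(mean, count, 100)
background, distance = adapt_pixel(mean, count, 220)
```

Because the background value comes from the most visited slice, a single outlier such as the value 220 above changes the history but not the extracted background.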
In the next step, according to Eq. 11, if the computed difference value does not exceed the two thresholds, ∂ and ϕ, the corresponding pixel belongs to the static image; otherwise, it is a pixel of a moving object. ∂ and ϕ are calculated from the gray-level difference histograms, using a previously published method.
In both analog and digital signals, noise is an unwanted perturbation of a wanted signal. Here, after the binary image is constructed, some scattered pixels remain that the algorithm has incorrectly detected as moving-object pixels. Applying a simple noise reduction technique improves the quality of the binary image.
We use a binary median filter to reduce the effect of noise in the binary image. In this filter, the number of pixels with value 1 in a local area around each pixel of the binary image is counted. If this count is less than a static threshold, δ, the pixel is treated as a noise pixel and its value is changed to zero.
In this research, the value of δ is set to 2. This value was chosen experimentally after examining different values. The pixels of the 3×3 square area around each pixel are taken as its local neighbors.
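The filter described above can be sketched as follows. This is a minimal Python illustration, assuming that the 3×3 count includes the centre pixel itself and that windows are simply clipped at the image border; neither detail is stated in the text, and denoise_binary is a hypothetical name.

```python
def denoise_binary(img, delta=2):
    """Clear foreground pixels with fewer than `delta` ones in their 3x3 window.

    img: 2-D list of 0/1 values. Returns a new image; the input is not modified.
    ASSUMPTIONS: the count includes the centre pixel, and the window is
    clipped at the image border.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            if img[i][j] != 1:
                continue
            ones = sum(
                img[y][x]
                for y in range(max(i - 1, 0), min(i + 2, h))
                for x in range(max(j - 1, 0), min(j + 2, w))
            )
            if ones < delta:
                out[i][j] = 0  # isolated pixel: treat as noise
    return out


noisy = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
cleaned = denoise_binary(noisy)
```

Under these assumptions and with δ = 2, the isolated pixel in the top-left corner is removed, while the two adjacent pixels in the third row support each other and are kept.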
Finally, the final background image is obtained by combining the current frame with the temporal extracted background. This combination is controlled by the values of the binary image pixels, where the binary image is the output of the previous stage and marks the moving pixels. If the value of the binary pixel is one, the final background pixel is assigned the value of the corresponding pixel in the adapted background. Otherwise, it is replaced with a weighted combination of the corresponding pixel of the current image and the value of the pixel in the temporal extracted background (Eq. 12). In this formula, Bin_BG, Temp_BG, Fin_BG and initial_BG refer to the value of pixel (i, j) in the binary image, the extracted temporal background image, the final background image and the initial background image extracted in the initialization phase, respectively. The value of w is set to 0.1.
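One possible reading of this combination rule is sketched below in Python. The exact form of Eq. 12 is not recoverable from the text, so the sketch assumes that w weights the current frame and (1 - w) weights the temporal background, and that the adapted background used for moving pixels is Temp_BG; final_background is a hypothetical name.

```python
def final_background(bin_bg, temp_bg, cur_img, w=0.1):
    """Combine the current frame and the temporal background pixel by pixel.

    bin_bg:  2-D list of 0/1 values, 1 marking a moving pixel.
    temp_bg: 2-D list with the temporal (adapted) background.
    cur_img: 2-D list with the current gray-level frame.
    ASSUMPTION: static pixels use w * current + (1 - w) * temporal background.
    """
    return [
        [
            t if b == 1 else w * c + (1 - w) * t
            for b, t, c in zip(row_b, row_t, row_c)
        ]
        for row_b, row_t, row_c in zip(bin_bg, temp_bg, cur_img)
    ]


fin_bg = final_background(
    bin_bg=[[1, 0]],
    temp_bg=[[100.0, 100.0]],
    cur_img=[[200, 110]],
)
```

With a small weight such as w = 0.1, static pixels drift slowly toward the current frame, while pixels covered by a moving object keep the background value and are not contaminated by the object.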

RESULTS
To evaluate the effectiveness of the proposed system and compare its efficiency with related work, we implemented our algorithm and four algorithms from recent studies, namely Xiaofei (2009); Kong et al. (2007); Chiu et al. (2010) and Cui et al. (2009). All the algorithms were implemented using VC++ 2008 and were run on a Pentium 3 GHz PC under the Windows XP operating system. For a fair experiment, we tested all five systems under the same conditions. We used the standard datasets provided by PETS (IEEE International Workshop on Performance Evaluation of Tracking and Surveillance). Specifically, PETS2000 (P0), PETS2001 during the day (PD) and PETS2001 at night (PN) were used. These datasets can be found at http://ftp.pets.rdg.ac.uk/. Some other datasets were used as well: we examined the systems using Campus (CM), Fountain (FT), Shopping Mall (SM), Bootstrap (BS), Escalator (ES), Hall Monitor (HM) and Lobby (LB). These data are accessible at http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html.
In this research, the experiments were repeated several times using different numbers of frames. The experimental results were investigated both quantitatively and qualitatively. Figure 3 illustrates the results of the proposed method on various datasets and under different conditions, in comparison with the other methods. It also shows the binary images extracted by these methods. A binary image shows the moving objects as detected by the corresponding method. The figure illustrates that the performance of the proposed method is acceptable.
To evaluate the efficiency of our algorithm quantitatively and compare it with the other algorithms, four different scales were utilized:
• The time consumption of each algorithm was measured to compare time complexity. The average time costs of the methods are shown in the first vertical section of the results table.
The time cost of the method of Chiu et al. (2010) is lower than that of the others. The time complexity of their main algorithm (the Extraction Background Algorithm) is O(N²), whereas the time complexity of the others is O(N³), where N is the size of each dimension of the images. This difference results from the fact that, in their method, only the previously extracted background is used to update the value of each pixel in the newly extracted background image, while the others use N previous background images. Despite this advantage, the quality of their extracted backgrounds is not comparable to our results. As is clear in Fig. 3, our extracted backgrounds on the various datasets are more acceptable. They adapt better to the changes caused by moving objects, especially in crowded situations. In addition, all the quantitative factors used to evaluate the output quality of the methods indicate that our results are at least 20% better than the others.

Future works:
In this research, we save the history of each pixel separately. Using a blocking technique would reduce the time complexity of the algorithm and make it more robust against noise caused by small motions. It also seems unnecessary to adapt the background in every frame; skipping this processing for some frames would reduce the execution time of the algorithm.
As part of our future work, we intend to investigate these problems further.

CONCLUSION
In this study, we have introduced a hybrid background extraction system. In this system, after the background is initialized using the proposed method, the history of each pixel is saved separately. A probability-based method that uses the saved history then decides whether each pixel is a background pixel or a foreground pixel. Various quantitative and qualitative experiments show that the accuracy of the extracted background is better than that of the other methods re-implemented here, and that the time consumption of the proposed method is acceptable.