Low-Sampling Trajectory Reconstruction using Criteria-Based Routing over a Graph

Location-based services mainly provide geo-location data. However, the detailed route of a moving object's trajectory is lost when these location data are sampled at a low rate. Previous works have tried to find the possible trajectories by using the location history logged by users. These methods can be considered as reconstruction or imputation processes. In this study, we reconstruct trajectories using the personalization features of routing theory, based on evaluation criteria over a graph. This trajectory reconstruction is considered only in a confined environment, i.e., a road network.


Introduction
The fast development of technologies and mobile applications has given rise to the need to analyze the huge amount of geo-location data recorded about Moving Objects (MO). For example, users in mobile social networks such as Foursquare and Flickr use check-in and geo-tagged photo sharing features to indicate their location.
However, it is usually difficult or impossible to get detailed data about the movement of a user due to privacy issues (Chow and Mokbel, 2011), energy saving or simply because people do not check in at every place they visit. As a consequence, source (raw) trajectory data are not very accurate since there are missing data during the silent durations, i.e., the time durations of a trajectory when no data are available to describe the movement of an object (Hung et al., 2011). Thus, the trajectory between two consecutive data records is unknown. This raises questions such as the following: How does an object move during a silent duration? How well do the current methods describe the actual trajectory followed by a MO? Does an object move according to a certain criterion, e.g., trying to avoid traffic jams or slopes?
Previous works have focused on historical trajectory datasets of the same MO (Chang et al., 2011) or of similar MOs (Chen et al., 2011) as a way of inferring the routes or the movement patterns of a MO. For trajectory reconstruction (i.e., the imputation process for silent durations) some authors (Liu et al., 2011;Wei et al., 2012) use an uncertainty reinforcement approach (i.e., uncertain + uncertain → certain). However, these approaches may be inadequate if the silent durations in the trajectories followed by the same MO are long (they exceed an application threshold) and recurrent (i.e., there are recurrent trajectory segments where no trajectory data are available).
The problem of finding a route from one place to another, i.e., the Route Finding Problem (RFP) (Da Silva et al., 2008; Schultes, 2008), is akin to that of finding a trajectory between two consecutive low-sampled points. In recent years, several approaches (Hochmair, 2005; Da Silva et al., 2008; Schultes, 2008) have incorporated metrics other than distance (e.g., time) and user criteria (e.g., preference for the path with most touristic attractions) to the RFP in order to provide customized solutions. Hochmair (2005) offers a brief taxonomy to build the "best" route based on criteria such as speed, safeness, attractiveness and simplicity for traversing a Road Network (RN). This same need is addressed by the route planning theory, i.e., the integration of user criteria to get "better routes" (Hochmair, 2005). A novel and relevant task is the reconstruction of low-sampling trajectories based on the movement patterns and the geographical space where it occurs, e.g., the RN of a city (i.e., the possible locations of the MO are constrained by the geometry of the RN (De Almeida and Güting, 2005; Trajcevski, 2011)).
Table 1.
Reference | Personalization | RN | Trajectory reconstruction | Description
Orlando et al. (2007) | No | No | Yes | A simple interpolation is done for reconstructing trajectories as a previous step in a TDW proposal.
Marketos and Theodoridis (2009) | No | No | Yes | The trajectory reconstruction is included in a module of a TDW using parameters such as temporal and spatial gap between trajectories, maximum speed and tolerance distance.
Yuan et al. (2010); Chen et al. (2011); Zheng et al. (2012) | No | Yes | No | Uses a pattern-based approach for the offline preprocessing of historical trajectory data for discovering mining patterns to infer routing information. However, a route is inferred, not a trajectory.
Chen et al. (2010) | No | Yes | Yes | The K best connected trajectories are given when a set of locations (queried points) is the input.
Chang et al. (2011) | Yes | Yes | No | A familiar RN followed by a specific user is built using historical data. Routes are inferred from this familiar RN for the user. Chang et al. (2011) infers routes, not trajectories.
Hsieh and Li (2013) | Yes | Yes | No | Uses greedy search approaches, i.e., optimal local choices at every decision stage, providing an online recommendation based on the best immediate location to be visited for constructing the route. However, a route is inferred, not a trajectory.
The route between the check-in data of a low-sampling trajectory is built (filled in) with additional georeferenced data points and time-stamps. To help in this task, a graph that represents the RN is built, where the vertices store geo-related information (longitude and latitude) and the edges describe the cost of travelling between two vertices (Speičys et al., 2003). The routing algorithms rely on this representation to build the trajectory between two location points (Dijkstra, 1959; Hart et al., 1968).
Low-sampling data uncertainty management is a hard task to tackle. To facilitate this task, the trajectory reconstruction can rely on user preferences (a criterion), such as (minimize) distance or (visit) tourist attractions to try to fill in those silent durations. To the best of our knowledge, user preferences have not been considered in the low-sampling trajectory reconstruction problem. Our claim is that the movement of an object based on user preferences would generate some clues which may help in the trajectory reconstruction (Hung et al., 2011). Moreover, this may help to analyze the movement from different perspectives, i.e., depending on the criterion used for the reconstruction process.
To clarify the contribution of our paper, Table 1 lists some research works on route finding (or reconstruction of trajectories) using personalization and a RN. Chang et al. (2011) and Hsieh and Li (2013) are the only ones who find routes based on both the RN and the user preferences. However, they only infer the route; the trajectory is not reconstructed.

Representation of Trajectories
Several models for representing trajectories have been proposed in the literature (Orlando et al., 2007;Spaccapietra et al., 2008;Chang et al., 2011). Most of them, except Spaccapietra et al. (2008), represent a trajectory as a sequence of geo-referenced points temporally ordered.
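According to Orlando et al. (2007), a trajectory is a pair $T_i = (ID_i, L_i)$, where $ID_i$ is the unique identification of the MO and $L_i = \langle L_i^1, \ldots, L_i^M \rangle$ is a sequence of $M$ observations. Each observation $L_i^j = (x_i^j, y_i^j, t_i^j)$ comprises a location and a time-stamp $t_i^j \in \mathbb{T}$, where $\mathbb{T}$ is a set of time-points. $L_i \in 2^L$, where $L$ is the set of all possible observations. $L_i$ is temporally ordered, i.e., $t_i^j < t_i^{j+1}$ for $1 \le j < M$.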
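The representation above can be sketched in a few lines of Python. This is only an illustration: the names `Observation`, `Trajectory`, `mo_id` and `is_temporally_ordered` are hypothetical, and the time-stamps are assumed to be plain numbers (e.g., seconds).

```python
from dataclasses import dataclass
from typing import List, Tuple

# An observation is assumed to be (x, y, t): longitude, latitude, numeric time-stamp.
Observation = Tuple[float, float, float]


@dataclass
class Trajectory:
    mo_id: str                        # ID_i, identifier of the moving object
    observations: List[Observation]   # L_i, the sequence of M observations

    def is_temporally_ordered(self) -> bool:
        """True when every observation is strictly later than the previous one."""
        times = [t for _, _, t in self.observations]
        return all(a < b for a, b in zip(times, times[1:]))


# Hypothetical two-observation trajectory (coordinates are illustrative only).
example = Trajectory("mo-1", [(-75.59, 6.25, 0.0), (-75.58, 6.26, 600.0)])
```

A trajectory whose observations arrive out of order fails the check, which makes it a cheap guard to run before any reconstruction step.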

Reconstruction of Low-Sampled Trajectories
Given a trajectory $T_i$ of a MO in which some pairs of consecutive observations $L_i^j$ and $L_i^{j+1}$, $1 \le j < M$, are separated spatially and temporally by more than a spatial user threshold β and a temporal user threshold τ, i.e., they are considered low-sampled, our goal is to fill in each of these pairs with imputed observations so that the β and τ thresholds are met. Our reconstruction process is based on a set of criteria Cset from the personalized route planning theory (Hochmair, 2005; Da Silva et al., 2008; Nadi and Delavar, 2011), such as time and distance.
We consider the network-constrained trajectories (TS, Ga), where TS is a set of trajectories and Ga is a directed and labeled graph representing the underlying RN to which the trajectories in TS are constrained. The graph Ga is a two-tuple Ga = (V, E), where V is a set of vertices {$v_i$} (representing the intersections of the streets) and E is a set of edges {$e_k$} (representing the segments of the streets). An edge $e_k$ has a source vertex (the initial part of an edge) denoted by $v_{k,s}$, a target vertex (the end part of an edge) denoted by $v_{k,t}$ (the edge $e_k$ is traversed from $v_{k,s}$ to $v_{k,t}$, but not the other way around) and an associated cost for traversing it denoted by $c_k \in \mathbb{R}$, i.e., an edge is a tuple $e_k = (v_{k,s}, v_{k,t}, c_k)$. Each vertex $v \in V$ can be described by a location (x, y) (longitude, latitude). We consider the graph Ga, which is derived from a RN, to be fully connected and without any isolated network segments. We consider the following functions (see also the accompanying figure):
• get_vertex_source: E → V. Function applied to an edge to get its source vertex
• get_vertex_target: E → V. Function applied to an edge to get its target vertex
• get_cost: E → ℝ. Function applied to an edge to get the cost of traversing the edge
• get_x: V → X. Function applied to a vertex to get its longitude
• get_y: V → Y. Function applied to a vertex to get its latitude
The function road_distance: L × L × Cset → ℝ receives a pair of consecutive observations and a criterion of movement c and generates the road distance between them according to the given criterion. The road distance refers to the distance of a particular path followed by a MO between the two observations. It depends on the underlying RN and on the criterion c. Figure 2 shows three possible roads (depicted in solid lines) between consecutive observations A and B according to some criteria.
The criterion $c_1$ (distance), used in the road drawn in green, has the shortest road distance, followed by the road distance of the road drawn in blue using the criterion $c_2$ (time). Finally, the road distance is the longest when the criterion $c_3$ (tourist attraction) is used, i.e., road_distance(A, B, $c_1$) ≤ road_distance(A, B, $c_2$) ≤ road_distance(A, B, $c_3$). Note how the distance between these observations changes according to the criterion of movement and the RN that were used. Note also that the Euclidean distance, depicted as a dashed line, does not correspond to the road distance in any of the three cases.
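The dependence of the road distance on the criterion can be illustrated with a minimal sketch. It makes several assumptions that are not in the text: the observations are already mapped to vertices, each edge stores one cost per criterion (the definition above has a single cost $c_k$), and the toy graph `EDGES` with its vertex labels and values is invented. The path is chosen with Dijkstra's algorithm under the given criterion, and the road distance is the length of that path.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

# Hypothetical toy graph Ga: directed edge (source, target) -> cost per criterion.
EDGES: Dict[Edge, Dict[str, float]] = {
    ("A", "P"): {"distance": 2.0, "time": 4.0},
    ("P", "B"): {"distance": 2.0, "time": 4.0},
    ("A", "Q"): {"distance": 3.0, "time": 1.0},
    ("Q", "B"): {"distance": 3.0, "time": 1.0},
}


def shortest_path(edges, source: str, target: str, criterion: str) -> List[str]:
    """Dijkstra's algorithm: cheapest vertex sequence under the given criterion."""
    adjacency: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), costs in edges.items():
        adjacency.setdefault(u, []).append((v, costs[criterion]))
    best = {source: 0.0}
    previous: Dict[str, str] = {}
    queue = [(0.0, source)]
    while queue:
        cost, u = heapq.heappop(queue)
        if u == target:
            break
        if cost > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, weight in adjacency.get(u, []):
            candidate = cost + weight
            if candidate < best.get(v, float("inf")):
                best[v] = candidate
                previous[v] = u
                heapq.heappush(queue, (candidate, v))
    if target not in best:
        raise ValueError("target is not reachable from source")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


def road_distance(edges, source: str, target: str, criterion: str) -> float:
    """Length of the path that is cheapest under the given criterion."""
    path = shortest_path(edges, source, target, criterion)
    return sum(edges[(u, v)]["distance"] for u, v in zip(path, path[1:]))
```

In this toy graph the fastest path is longer than the shortest one, so `road_distance(EDGES, "A", "B", "distance")` does not exceed `road_distance(EDGES, "A", "B", "time")`, mirroring the ordering discussed above.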
We regard the trajectory $T_i$ as low-sampled when it has a pair of consecutive observations for which $road\_distance(L_i^j, L_i^{j+1}, c) > \beta$ and $t_i^{j+1} - t_i^j > \tau$, i.e., the road distance according to a criterion c and a RN between two consecutive observations is longer than β (a user distance threshold) and their time difference is longer than τ (a user time threshold). Each observation is mapped onto the RN, see Fig. 3. That is, when we consider raw trajectories with a RN, each point is mapped over a road segment by searching for its closest road segment. Because of this and following the approach of Zhixian (2011), the minimum distance between $L_i^j$ and a road segment $e_k$ is computed by Equation 1.
According to the RN mapping defined by Speičys et al. (2003), the end vertex of an edge $e_k$ is the initial vertex of the edge $e_{k+1}$, see Fig. 5. Therefore, get_x(get_vertex_target($e_k$)) = get_x(get_vertex_source($e_{k+1}$)) and get_y(get_vertex_target($e_k$)) = get_y(get_vertex_source($e_{k+1}$)).
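The mapping of an observation onto its closest road segment, described above, can be sketched as follows. This uses the standard point-to-segment distance in planar coordinates, which is an assumption and not necessarily the exact form of Equation 1. The `get_edge` signature here is a simplified, hypothetical one that takes a dictionary of segments instead of the graph Ga.

```python
import math
from typing import Dict, Tuple

Segment = Tuple[float, float, float, float]  # (ax, ay, bx, by): source and target vertex


def point_segment_distance(px, py, ax, ay, bx, by) -> float:
    """Distance from point (px, py) to the segment from (ax, ay) to (bx, by)."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: a single point
    # Projection parameter of the point onto the segment, clamped to [0, 1].
    u = ((px - ax) * dx + (py - ay) * dy) / length_sq
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def get_edge(point: Tuple[float, float], segments: Dict[str, Segment]) -> str:
    """Key of the segment closest to the point (simplified stand-in for get_edge)."""
    return min(segments, key=lambda k: point_segment_distance(*point, *segments[k]))
```

For real longitude and latitude values the planar distance is only an approximation; projecting the coordinates first would be a reasonable refinement.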
In the time-stamp assignment example, the resulting time is 2 pm + 3/4 h = 2:45 pm.
Note that, after the reconstruction, it is possible that the imputed data points do not meet the β and τ thresholds. This happens when the length of a street segment is longer than the β threshold, because this imputation stage only gets location points based on the edges of the graph Ga that represents the segments of the RN where a MO moves. Additional imputed data points can be obtained using interpolation methods between the imputed points, i.e., the start and the end vertex of an edge. Additional data points over a segment $e_k$ are found based on the line equation, see Equation 5.
In Fig. 7 we show an example of finding additional data points for a segment $e_k$, where get_x(get_vertex_source($e_k$)) = 3, get_y(get_vertex_source($e_k$)) = 1, get_x(get_vertex_target($e_k$)) = 6 and get_y(get_vertex_target($e_k$)) = 5. Let β = 1.25 and road_distance($L_i^j$, $L_i^{j+1}$, c) = 5; then we choose A = 1.25 and N = 4:
• $d_1$ = 1.25, then $x_1$ = 3.75, $y_1$ = 2
• $d_2$ = 2.5, then $x_2$ = 4.5, $y_2$ = 3
• $d_3$ = 3.75, then $x_3$ = 5.25, $y_3$ = 4
Thus, the set of additional data points between (3, 1) and (6, 5) is {(3.75, 2), (4.5, 3), (5.25, 4)}. The time-stamps for each of these points can be found by the proportional assignment of the time difference between observations. The results are also shown in Fig. 8, where we suppose that set_time(get_vertex_source($e_k$)) = 12 pm and set_time(get_vertex_target($e_k$)) = 4 pm.
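The worked example above can be reproduced with a short sketch of linear interpolation along one edge. The function name `interpolate_edge`, the rule for choosing N (the smallest number of equal pieces no longer than β) and the representation of time as decimal hours are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def interpolate_edge(
    source: Tuple[float, float],
    target: Tuple[float, float],
    beta: float,
    t_source: float,
    t_target: float,
) -> List[Tuple[float, float, float]]:
    """Interior points (x, y, t) that split the edge into pieces no longer than beta."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    (xs, ys), (xt, yt) = source, target
    length = math.hypot(xt - xs, yt - ys)
    if length == 0.0:
        return []
    n = max(1, math.ceil(length / beta))  # N: number of equal pieces (assumed rule)
    points = []
    for i in range(1, n):
        ratio = i / n  # fraction of the edge travelled, d_i / length
        points.append((
            xs + ratio * (xt - xs),
            ys + ratio * (yt - ys),
            t_source + ratio * (t_target - t_source),  # proportional time-stamp
        ))
    return points


# Edge from (3, 1) to (6, 5), beta = 1.25, from 12:00 to 16:00 (decimal hours).
example_points = interpolate_edge((3, 1), (6, 5), 1.25, 12.0, 16.0)
```

With these inputs the edge length is 5, so N = 4 and the three interior points fall at (3.75, 2), (4.5, 3) and (5.25, 4), with time-stamps at 13:00, 14:00 and 15:00.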

Implementation of the traj Function
Given (a) users' check-in records describing a set of trajectories TS = {$T_i$} from a certain location-based service and (b) a user criterion c, we claim that a "good" route should (a) meet the user criterion and (b) return a more detailed trajectory $T'_i$ (as long as $T_i$ has at least one pair of low-sampled observations). Algorithm 1 calls Function 1 (traj) for each pair of consecutive low-sampled observations of a trajectory $T_i$.

Function 1 (traj):
FOR EACH e k
    // Use the set_time function to set the time of each vertex resulting from the routing algorithm
    // Interpolate between O k and O k+1
    Use Equations 6 and 7
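The loop above can be sketched in Python under stated assumptions: the routing algorithm has already returned the route as a list of (x, y) vertices, time is assigned in proportion to the distance travelled along the route, and each edge is split into equal pieces no longer than β. The signature of `traj` and the inner helper are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Tuple


def traj(
    route: List[Tuple[float, float]],
    t_start: float,
    t_end: float,
    beta: float,
) -> List[Tuple[float, float, float]]:
    """Reconstructed points (x, y, t) along a route between two observations."""
    if not route:
        raise ValueError("route must contain at least one vertex")
    if beta <= 0:
        raise ValueError("beta must be positive")
    edges = list(zip(route, route[1:]))
    lengths = [math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in edges]
    total = sum(lengths)
    result = [(route[0][0], route[0][1], t_start)]
    if total == 0.0:
        return result

    def set_time(travelled: float) -> float:
        # Proportional assignment of the time difference between the observations.
        return t_start + (travelled / total) * (t_end - t_start)

    travelled = 0.0
    for ((x1, y1), (x2, y2)), length in zip(edges, lengths):
        if length == 0.0:
            continue  # skip repeated vertices
        n = max(1, math.ceil(length / beta))
        for i in range(1, n + 1):
            ratio = i / n
            result.append((
                x1 + ratio * (x2 - x1),
                y1 + ratio * (y2 - y1),
                set_time(travelled + ratio * length),
            ))
        travelled += length
    return result
```

The last returned point coincides with the final vertex of the route and carries the time-stamp of the second observation, so consecutive calls for adjacent pairs of observations can be concatenated.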

How the Traj Function Works: An Example
To explain how the traj function works, let us consider a set of check-in data describing a trajectory of a particular user as shown in Table 2 and the RN of the city of Medellín, Colombia.
We get the nearest edges get_edge(Check-in A, Ga), get_edge(Check-in B, Ga) and get_edge(Check-in C, Ga) for each check-in observation. Next, we show how the imputed data of the reconstructed trajectories change as the criterion c changes. Let β be less than the actual road distance between each pair of check-ins and τ be less than the time difference between the check-ins. The distance (Fig. 8), time (Fig. 9) and tourist attraction (Fig. 10) criteria were used. We also show the original trajectory, see Fig. 11.

Measuring and Comparing the Resulting Reconstructed Trajectories using Different Criteria with the real Ones
There are many approaches for measuring the similarity between trajectories in the literature (Zhao et al., 2009; Tiakas et al., 2009; Hung et al., 2011). We follow an approach similar to the one proposed by Zhao et al. (2009): two trajectories $T_1$ and $T_2$ are spatio-temporally similar iff (a) they have the same temporal granularity and (b) they are spatially similar, i.e., $SIM_{POI}(T_1, T_2, \theta) < \theta$, where $SIM_{POI}(T_1, T_2, \theta)$ is a spatial similarity measure (see Equation 8), θ is a threshold for considering two trajectories spatially similar and a Point Of Interest (POI) represents an interesting place.
The reconstructed trajectories have the same temporal granularity in the sense of Zhao et al. (2009) because their time-stamps are assigned in the same way by the method proposed here, i.e., proportionally. We consider the POIs to be the road segments that a trajectory traverses. Next, we compute the $SIM_{POI}$ measure for 80 high-sampled trajectories in Medellín. Check-in data were simulated for those trajectories (time and location data were deleted) to get low-sampled trajectories, and the (sub)trajectories between the simulated check-ins were computed with the traj function based on some criteria, see Fig. 12.
Table 2 (recovered row): Check-in C | Shop | (-75.591672, 6.257514, 20140809173745)
Note how the average $SIM_{POI}$ is higher when the distance criterion was used, followed by the tourist attraction criterion, i.e., the best imputation for these 80 trajectories could be achieved when some of these criteria were used. However, remember that the purpose of the trajectory reconstruction proposed here is to discover new possibilities of reconstruction as an imputation process of the actual trajectories. The trajectory reconstruction procedure transforms low-sampled location data into trajectories with a better sampling so that we can acquire some useful knowledge.
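To illustrate the idea of comparing a reconstructed trajectory with the real one through the road segments they traverse, the sketch below uses the Jaccard index of the two segment sets. This is a stand-in chosen for illustration only: it is not the $SIM_{POI}$ measure of Zhao et al. (2009), and the function name and segment identifiers are hypothetical.

```python
from typing import Iterable


def segment_overlap(reconstructed: Iterable[str], actual: Iterable[str]) -> float:
    """Jaccard index of the road segments traversed by two trajectories (0 to 1)."""
    a, b = set(reconstructed), set(actual)
    if not a and not b:
        return 1.0  # two empty trajectories are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical segment identifiers for a reconstructed and a real trajectory.
score = segment_overlap(["e1", "e2", "e3"], ["e2", "e3", "e4"])
```

Averaging such a score over all trajectories, once per criterion, gives one number per criterion that can be compared in the same spirit as the averages discussed above.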

Technical Details
These technical details are intended to offer a more comprehensive understanding of the solution and to serve as a reference for future implementations of the system. They also aim to provide what is needed to replicate the experiments described above. The software tools we used are named below alongside the data sources. Next, we detail our sources. The source data can be extracted from multiple location-based devices and applications. For this technical proposal, JSON files were generated using the Foursquare API and then read using Pentaho Data Integration. The Foursquare API was accessed using Apigee. We got details of the users from Foursquare (https://developer.foursquare.com/docs/users/users).
Data of the venues (POIs) registered in Foursquare in the city of Medellín, Colombia were also collected. For the points where people check in, data of 80 random active users living in the city of Medellín, Colombia were collected. Figure 13 shows an instance of the file obtained from this response.
A list of the check-ins of the users described above was gathered during one week. A list of touristic points of the city of Medellín, Colombia was defined. These were extracted from OpenStreetMap, where people can tag places as touristic. See an example of this file in Fig. 14. The location of each one was also included. The idea behind this definition is to assign a lower cost to the street segments near those touristic points. The graph map was obtained using osm2po-4.8.8. The traj function was implemented and carries out the reconstruction task proposed in this study. The implementation of the traj function, additional documents and all the software can be found at https://www.dropbox.com/sh/3mlfrveicpwjrgp/AADzZQ8jneo9jpBFlofFkGSba?dl=0.
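The idea of assigning a lower cost to street segments near touristic points can be sketched as follows. The function name `tourist_cost`, the use of the segment midpoint, the planar distance and the `radius` and `discount` parameters are all assumptions made for illustration; the text does not state how the cost reduction was computed.

```python
import math
from typing import Iterable, Tuple


def tourist_cost(
    segment: Tuple[float, float, float, float],
    length: float,
    tourist_points: Iterable[Tuple[float, float]],
    radius: float,
    discount: float = 0.5,
) -> float:
    """Edge cost under the tourist criterion: cheaper when a tourist point is nearby."""
    ax, ay, bx, by = segment
    mx, my = (ax + bx) / 2.0, (ay + by) / 2.0  # midpoint of the street segment
    near = any(math.hypot(mx - tx, my - ty) <= radius for tx, ty in tourist_points)
    return length * discount if near else length
```

A routing algorithm that minimizes this cost is drawn toward streets close to touristic points, at the price of a possibly longer road distance.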

Conclusion
Valuable information can be extracted from trajectories. It can be useful for location-based service applications including trip planning, personalized navigation routing services, mobile commerce and location-based recommendation services. In this study, we reconstructed low-sampling trajectories using the personalization features of the routing theory based on a criterion evaluation over a graph. The traj function with different criteria can be used as an input for different mining algorithms over trajectories as a way to deal with analytics using uncertain trajectories. Here, we claim that analytics over reconstructed trajectories can change depending on the criterion used for their reconstruction. Moreover, this criteria-based reconstruction can be used to perform analytical tasks and to offer the possibility of formulating questions based on user criteria, such as:
• How do regions of interest (Cao et al., 2005) change according to a chosen criterion of reconstruction during a determined time?
• What are the main bottlenecks in the city in a determined period according to a certain movement reconstruction criterion?
• What would be the fuel consumption if the vehicles moved according to a certain criterion in a determined period?

Author's Contributions
All the authors contributed equally to the writing of the manuscript. All the authors discussed and conceptualized the idea, contributed to analyses and interpretation of the results and to the preparation of the final manuscript.

Ethics
All the authors believe that there are no ethical issues that may arise after the publication of this manuscript.