3D Mesh Streaming based on Predictive Modeling

The complexity of 3D virtual environments on the web is growing rapidly. A 3D virtual environment comprises a set of structured scenes, and each scene contains multiple 3D objects (meshes); the object is therefore the basic building block of a virtual environment. The user must be able to interact with every 3D object, and at any point in time it is sufficient for the system to stream only the visible portion of the object from the server to the client, which makes better use of the limited network bandwidth and the limited client memory. Such streaming reduces the time needed to present the rendered object to the requesting client. To reduce this time further and to use bandwidth and memory more effectively, this study exploits the user's interaction with the 3D object to build a predictive agent that minimizes the latency in rendering the streamed 3D mesh. The experimental results show that rendering time and cache miss rates are significantly reduced with the predictive agent.


INTRODUCTION
In recent times, 3D modeling and rendering have gained attention on the internet, and most multiuser virtual environments render the world only after it has been fully downloaded from the server. To get the first response from the server, the client therefore has to wait until the entire model is downloaded and rendered. Because of the increased complexity of 3D models, the user has to wait a long time to view the model even with high bandwidth. 3D streaming techniques reduce this waiting time: based on the position and orientation of the user's camera (eye), the visible portion of the model is made available to the user and the invisible portions are culled. In this study, an attempt is made to reduce the waiting time further by predicting the operation that the user will perform next. A Predictive Agent (PA) is built from an offline analysis of profiles collected from 50 different users (aged 18-22, from engineering institutions, with good visual and computer skills). The analysis considers the speed of key presses and the pattern of keys pressed across various 3D models. For experimentation, 3D models of various shapes and sizes, ranging from a few KB to a few MB, are considered. The PA contains the typical key press speeds and patterns of all users, which are then used to predict their navigation. This in turn helps to optimize 3D streaming and rendering over the web by reducing the delay between user request and response.

Related works:
Progressive Mesh (PM): Progressive Mesh (PM), a method proposed by Hoppe (1996) and Cheng et al. (2011), stores an arbitrary mesh as a much coarser mesh together with a sequence of N detail records that indicate how to refine it incrementally back into exactly the original mesh. Each of these records stores the information associated with a vertex split, an elementary mesh transformation that adds one vertex to the mesh. The PM representation thus defines a continuous sequence of meshes of increasing accuracy, from which Level of Detail (LOD) approximations of any desired complexity can be efficiently retrieved.

Decimation of triangular meshes:
The goal of the decimation algorithm (Schroeder et al., 1992) is to reduce the total number of triangles in a triangle (polygon) mesh, preserving the original topology and a good approximation to the original geometry.
Streaming of 3D progressive meshes: Streaming of progressive meshes (Cheng, 2008) enables users to view 3D meshes with increasing levels of detail by sending a coarse version of a mesh initially, followed by a sequence of refinements that incrementally improve the quality. That study concentrates on how to send refinements so that the quality improves quickly. It develops an analytical model to investigate the effects of dependency when progressive meshes are sent over a lossy network, and it proposes a receiver-driven protocol that streams progressive meshes based on the user viewpoint in a scalable way.
Efficient and feature preserving: Triangular mesh decimation: The method proposed by Hussain et al. (2004) is a new automatic method for the decimation of triangular meshes that, at low levels of detail, preserves the visually important parts of the mesh and thus keeps the semantic or high-level meaning of the model. The algorithm is greedy and uses a new measure of geometric error based on a form of vertex visual importance, which retains visually important vertices even at low levels of detail and removes other vertices that do not profoundly influence the overall shape of the model. In the method of Lin et al. (2007), the mesh data of a 3D model are first converted into a JPEG 2000 image and then transmitted over the Internet as a mesh stream using the JPEG 2000 streaming technique.
View dependent mesh streaming with minimal latency: The study by Kim et al. (2004) presents a framework for view-dependent streaming of multi-resolution meshes. Here, the server dynamically adjusts the transmission order of the detail data with respect to the client's current viewpoint. By extending the truly selective refinement scheme for progressive meshes to a client-server architecture, it achieves an efficient view-dependent streaming framework that minimizes network communication overhead and thereby the latency of mesh updates for varying viewpoints.
Design of geometric streaming systems: A system was designed to stream large graphics environments from a central server to multiple clients. The streaming is transparent to the user who can treat remote models just like local ones. The streaming system automatically adapts to the rendering capabilities, network bandwidth and latency of the client and transmits an optimized model (Deb and Narayanan, 2004).

General and Automated Polygon Simplification (GAPS):
The method uses an adaptive distance threshold and surface area preservation (Erikson and Manocha, 1998) along with a quadric error metric to join unconnected regions of an object. Its name comes from this ability to "fill in the gaps" of an object. The algorithm combines approximations of geometric and surface attribute error to produce a unified object space error metric.

Quadric based polygon surface simplification:
This method (Garland, 1999) automatically simplifies highly detailed polygonal surface models into faithful approximations containing fewer polygons. It also examines the hierarchical structure that is induced on the surface as a result of simplification. The resulting hierarchy can be used as a multi-resolution model, a surface representation that supports the reconstruction of a wide range of approximations to the original surface model.

Proposed Method: Predictive Model (PrM):
The proposed predictive model is based on understanding user navigation in the virtual world. It is built based on the current camera position and orientation. Therefore only the visible vertices and faces of the selected triangular meshes are brought to the client. Simultaneously, based on the previous history collated from various user inputs, the next set of predicted vertices and faces are also pushed to the client with the help of the Predictive Agent (PA). This would reduce the time delay between the user request and response.

Analytical model:
The main objective of the proposed study is to develop an analytical model based on the user interaction while viewing 3D models over the network. The central idea is to predict the user navigation and to construct an analytical model for every 3D object (3D mesh) using the PA. This predictive model is then used to bring in the necessary surfaces during streaming so that rendering and response time can be reduced. To construct the predictive model (Predictive Agent: PA), the following notation is used. Let S_v be the set of mesh vertices and S_f the set of corresponding mesh faces at the server for the selected 3D mesh. Let C_v be the set of mesh vertices at the client, where C_v ⊆ S_v, and C_f the set of corresponding mesh faces at the client, where C_f ⊆ S_f.
On an operation O_i, which can be an arbitrary rotation (θ_x, θ_y, θ_z), C_v and C_f can undergo the changes ±∆{V_i} and ±∆{F_i}, where:
+∆{V_i} = the set of vertices chosen from S_v
-∆{V_i} = the set of vertices chosen out from C_v
+∆{F_i} = the set of faces chosen from S_f
-∆{F_i} = the set of faces chosen out from C_f
Table 1 summarizes the notation used in our model.
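The set notation above can be read as plain set arithmetic. The following is a minimal sketch, assuming vertices and faces are identified by integer ids and that the visible set for the new viewpoint is already known; the names `ClientMesh`, `apply_operation` and `delta_for_view` are illustrative and do not come from the study.

```python
from dataclasses import dataclass, field


@dataclass
class ClientMesh:
    """Client-side subsets C_v and C_f of the server sets S_v and S_f."""
    vertices: set = field(default_factory=set)  # C_v, as vertex ids
    faces: set = field(default_factory=set)     # C_f, as face ids

    def apply_operation(self, add_v, drop_v, add_f, drop_f):
        """Apply +∆{V_i}, -∆{V_i}, +∆{F_i} and -∆{F_i} for one operation O_i."""
        self.vertices = (self.vertices - set(drop_v)) | set(add_v)
        self.faces = (self.faces - set(drop_f)) | set(add_f)


def delta_for_view(visible_ids, client_ids):
    """Return (+∆, -∆): ids to bring from the server, ids chosen out of the client."""
    plus = set(visible_ids) - set(client_ids)
    minus = set(client_ids) - set(visible_ids)
    return plus, minus
```

For example, a client holding vertices {0, 1, 2} whose new view needs {1, 2, 3} gets +∆ = {3} and -∆ = {0}; after `apply_operation` the client set is still a subset of the server set.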

Operation profiling:
To profile the interaction performed by the user, the rotation operation R_θ in any one of the directions +θ_x/-θ_x, +θ_y/-θ_y, +θ_z/-θ_z is considered. For every key press during the rotation, a fixed angle of rotation is applied to the 3D object, and the rotation yields an updated eye position and eye orientation (the eye refers to the camera, which is the viewpoint of the user in the 3D world). From this operation, the speed of rotation is estimated; it depends directly on the number of keys pressed per second, since the key presses determine the angle rotated per second.
Based on the rotation output, the change in the vertices and faces (+∆{V_i} and +∆{F_i}) that ought to be transmitted to the client is predicted. Only the predicted faces and vertices are transmitted to the client. The prediction therefore reduces the response time taken to render the visible portion of the 3D mesh for the client input.
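The relation between key presses and rotation speed can be sketched as follows. This is an illustration only: it assumes key presses are logged as timestamps in seconds, and the 5-degree step in the example is a placeholder, since the fixed angle per key press is not stated here. The function names are hypothetical.

```python
def rotation_speed(key_timestamps, step_deg):
    """Estimate degrees rotated per second from key-press timestamps (seconds).

    Each key press rotates the object by the fixed angle step_deg, so the
    speed is (key presses per second) * step_deg.
    """
    if len(key_timestamps) < 2:
        return 0.0
    duration = key_timestamps[-1] - key_timestamps[0]
    if duration <= 0:
        return 0.0
    presses_per_second = (len(key_timestamps) - 1) / duration
    return presses_per_second * step_deg


def mean_gap(key_timestamps):
    """Average time between consecutive key presses, in seconds."""
    if len(key_timestamps) < 2:
        return None
    gaps = [b - a for a, b in zip(key_timestamps, key_timestamps[1:])]
    return sum(gaps) / len(gaps)


# Five presses spread evenly over two seconds, with an assumed 5-degree step.
stamps = [0.0, 0.5, 1.0, 1.5, 2.0]
speed = rotation_speed(stamps, step_deg=5.0)
gap = mean_gap(stamps)
```

The mean gap is the time the server has to push predicted data before the next request arrives.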
User profiling: To construct the predictive agent, an offline analysis was carried out on 50 user profiles, ranging from novices to professionals in interacting with 3D virtual worlds. Each user profile includes the rate at which keys are pressed and the actual keys pressed per user session on the various 3D meshes considered for analysis. From the collated user profiles, operation patterns are determined and the predictive model is built. This process is considered to be a training session for the users before they actually navigate the virtual world. Once trained, the users obtain the rendered 3D models with a better response time across the network while interacting with the 3D web.
Representation of 3D streaming system: The proposed predictive model is used to stream the 3D data from the server to the requesting clients in an effective manner. The implementation details of the streaming system are discussed here. Figures 1 and 2 show the 3D streaming system and its components as schematic diagrams.
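One simple way to turn collated key sequences into a predictor is a first-order transition table: count which key most often follows each key and predict that one. The sketch below assumes this particular model, which is not confirmed by the study, and uses hypothetical key names in place of the actual key bindings.

```python
from collections import Counter, defaultdict


def build_transition_table(sessions):
    """Count, for every key, how often each other key follows it.

    sessions is a list of key sequences, one per recorded user session.
    """
    table = defaultdict(Counter)
    for keys in sessions:
        for prev_key, next_key in zip(keys, keys[1:]):
            table[prev_key][next_key] += 1
    return table


def predict_next_key(table, last_key):
    """Return the key that most often followed last_key, or None if unseen."""
    followers = table.get(last_key)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical key names; the real bindings are defined elsewhere in the paper.
sessions = [
    ["LEFT", "LEFT", "LEFT", "UP"],
    ["LEFT", "LEFT", "DOWN", "DOWN"],
]
table = build_transition_table(sessions)
```

With these two sessions, `predict_next_key(table, "LEFT")` returns `"LEFT"`, because continued rotation in the same direction is the most frequent pattern in the sample.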

Client module:
The client module comprises three components, namely the Client Cache, the Renderer and the Visibility Culler. The module receives a key press and the name of the 3D object to be viewed as the Client Input. The keys and the operations performed for each key press are specified in Table 2.
The operations listed in Table 2 update the client's eye position and eye orientation for every key press during navigation in the virtual world. Client cache: Inspired by the cache memory model (Hennessy and Patterson, 2007), a cache is built on the client side during rendering. Initially, the client module receives the 3D mesh data from the server based on the client's current eye position and navigation. Once the 3D mesh data has been brought to the client, it is marked as referred at the server end. On the client side, the data are stored in the Client Cache. Thereafter, based on the client input, 3D mesh data is received from the server only when it is referred to for the first time.
Otherwise, the data is fetched from the Client Cache. In this case, retrieval is local, and thus transmission time and bandwidth are saved.
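The cache behaviour described above can be sketched as a dictionary with hit and miss counters. This is a minimal illustration assuming data is keyed by face id and that a callable stands in for the network fetch; the `ClientCache` interface, including `push` for data sent ahead of a request, is hypothetical.

```python
class ClientCache:
    """Store mesh data locally; fetch from the server only on first reference."""

    def __init__(self, fetch_from_server):
        self._store = {}
        self._fetch = fetch_from_server  # stands in for a network request
        self.hits = 0
        self.misses = 0

    def get(self, face_id):
        """Return face data, counting a hit if it is already cached."""
        if face_id in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[face_id] = self._fetch(face_id)
        return self._store[face_id]

    def push(self, face_id, data):
        """Accept data sent ahead of a request (for example, predicted data)."""
        self._store[face_id] = data

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


# A dictionary plays the role of the server-side mesh database.
server_faces = {1: (0, 1, 2), 2: (1, 2, 3), 3: (2, 3, 4)}
cache = ClientCache(server_faces.__getitem__)
```

Data that arrives through `push` before it is requested turns what would have been a miss into a hit, which is the effect the predictive agent aims for.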

Visibility culler (for client):
The Visibility Culler is implemented using the back face culling algorithm (Moller et al., 2008). First, the eye position and eye orientation are calculated from the user's key press. With the updated eye position and orientation, the visible portion of the object is determined using the back face culling (hidden surface elimination) algorithm. The client-side visibility culler is activated when the required vertices and faces have already been brought to the client.
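Back face culling keeps a triangle only if its normal points towards the eye. The following is a minimal sketch of that test, assuming triangles are wound counter-clockwise when seen from outside the object and that only the eye position is needed; the helper names are illustrative, and a production culler would also handle view-frustum and occlusion tests.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def visible_faces(vertices, faces, eye):
    """Return indices of faces whose outward normal points towards the eye."""
    visible = []
    for index, (i, j, k) in enumerate(faces):
        p0, p1, p2 = vertices[i], vertices[j], vertices[k]
        normal = cross(sub(p1, p0), sub(p2, p0))  # outward for CCW winding
        to_eye = sub(eye, p0)
        if dot(normal, to_eye) > 0:
            visible.append(index)
    return visible


# One triangle facing +z and the same triangle with reversed winding.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 1)]
front = visible_faces(vertices, faces, eye=(0.0, 0.0, 5.0))
```

With the eye on the +z side only the first triangle survives; moving the eye to the -z side keeps only the second.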

Renderer:
The Renderer (Moller et al., 2008) renders the 3D data that is visible to the user at the current eye position and orientation. The rendering speed is maintained with the help of the predictive agent that resides on the server.

Server module:
The server module comprises three components, namely the Server Agent, the Predictive Agent and the Visibility Culler. The module retrieves the 3D mesh data from the underlying 3D mesh database based on the client input. The server-side Visibility Culler (Moller et al., 2008) determines the set of vertices and faces that have to be sent to the client whenever user navigation generates a request for those faces and vertices.

Server agent:
The Server Agent receives the client input and the output of the visibility culler and stores them in a dynamic data structure, setting the reference bit against the vertices and faces that have to be sent to the client. The server agent also keeps track of the number of times each vertex and face has been referred to.
Predictive agent: The Predictive Agent, in parallel with the server agent, also receives the client input and the output of the visibility culler. Based on the user profiling analysis carried out offline by collating the interactions of 50 users across various models, the next key press is predicted and the corresponding 3D data are retrieved. The reference bits of these 3D data are also set, and the data are sent to the client along with the requested data. This prediction minimizes the rendering latency and increases the cache hits. The 3D meshes used are shown in Fig. 3, which also shows the rotated or zoomed (in/out) 3D meshes. In each row, the top image shows the actual 3D mesh and the bottom image shows a screen shot of the mesh during user interaction. The attributes of the 3D objects are given in Table 3.
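The bookkeeping done by the Server Agent can be sketched with a set for the reference bits and a counter for the reference counts. This is an assumption about the data structure, which the text describes only as dynamic; the class and method names are hypothetical.

```python
from collections import Counter


class ServerAgent:
    """Track which faces a client already holds and how often each is referred."""

    def __init__(self):
        self.referred = set()       # ids whose reference bit is set
        self.ref_count = Counter()  # number of times each id was referred

    def handle_request(self, visible_ids):
        """Return only the ids that still have to be sent to the client."""
        to_send = []
        for face_id in visible_ids:
            self.ref_count[face_id] += 1
            if face_id not in self.referred:
                self.referred.add(face_id)  # set the reference bit
                to_send.append(face_id)
        return to_send


agent = ServerAgent()
first = agent.handle_request([1, 2, 3])   # nothing sent yet: all three go out
second = agent.handle_request([2, 3, 4])  # only face 4 is new to the client
```

Because the reference bit is checked before sending, a face that is visible from several viewpoints crosses the network once, while its reference count keeps growing.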

Experimental Results and Discussions:
To confirm that the predictive model for 3D mesh streaming and rendering lowers the rendering response time and the cache miss rate, experiments were conducted on 16 standard 3D mesh models with varying numbers of vertices and faces, ranging from simple to complex. Tables 4-10 and Fig. 4-12 present the experimental results, which indicate that streaming with the predictive agent is more advantageous than downloading the entire 3D model to the client. The results show that, by exploiting the viewpoint of the client, only the visible portion of each 3D mesh is streamed and rendered instead of downloading the entire object. This avoids the initial waiting time: the client can view the first response received from the server without much delay. In addition, before the client requests the next chunk of data by changing the viewpoint, the predictive agent determines the probable move the client might make, and the client cache is updated so that the data is already present if it is demanded. This prediction reduces both the cache miss rate and the rendering time, because the future data is available in the cache before it is requested.
Table 4 and Fig. 4 and 5 give the average number of vertices and faces brought to the client after multiple accesses across various models. In none of the instances are all the vertices and faces referred to by the client. We therefore conclude that mesh savings and bandwidth savings can be achieved through streaming. Table 5 and Fig. 6 and 7 show the results for multiple user interactions across various models after multiple simultaneous accesses. Once again, the results indicate that mesh savings and bandwidth reduction are possible through streaming. Table 6 and Fig. 8 show the number of meshes brought to the client initially; on average, about 40% of the meshes are retained at the server end and are not brought to the client. Table 8 and Fig. 10 show the average user speed across various models, in seconds, for multiple degrees of rotation or key presses. This result is used to determine the time gap between user interactions, which is the window available for pushing the predicted 3D data to the client before it is requested.
Table 9 and Fig. 11 show the client cache hits and misses without the predictive agent. They indicate that when a complex model is accessed from multiple viewpoints, not all of its vertices and faces have been referred to even after a large number of accesses. Table 10 and Fig. 12 show the client cache hits and misses with the predictive agent. Since the next move is predicted and the corresponding faces and vertices are brought to the client well before the client requests them, such a request is counted as a cache hit. The results indicate that, in comparison with the non-predictive approach, the predictive agent brings in the vertices and faces that are likely to be referred to by the client.

CONCLUSION
The proposed study addresses the need for streaming with a predictive agent. The system streams the 3D data from the server to the client based on the viewpoint of the client and predicts the user's next move. The results indicate that the predictive model reduces the waiting time of the client, who sees the first response quickly compared with a full download of the model from the server. Once the initial model has been streamed and rendered on the client side, the 3D data referred to by the client's further interactions are transmitted from the server. If the required data is already at the client, rendering is carried out without streaming. In addition, the working model predicts the probable move of the client across models by profiling multiple user interactions. A predictive agent is constructed, and the results show that the rendering time and cache miss rates are significantly reduced. The study can be extended further to an entire scene.