Framework for Enhancing the Performance of Data Intensive MPI based HPC applications on Cloud
Ashwini Janagal Padmanabha and Sanjay Harogolige Adimurthy
Journal of Computer Science
Cloud computing is a new technology that is revolutionizing the current business model with its pay-per-use method of resource provisioning. This model proves more profitable than the traditional model of procuring and maintaining resources. Data-intensive high-performance computing (HPC) applications handle large-scale data sets in cluster/grid environments for enhanced performance. Most of these applications belong to the MPI category, in which the work is assigned to multiple processes that communicate with one another to accomplish the task. These applications prefer cluster/grid environments because of their homogeneity and the availability of high-end resources. The cloud can be a better platform for these applications, as it offers a large quantity of resources. However, the HPC user community avoids this technology because of the performance degradation caused by the virtualization layer and the sharing of resources. Static cluster instances, offered as a resource by many cloud vendors such as Amazon and CDAC, provide good performance at the cost of resource utilization. The work proposed here provides a framework for enabling data-intensive MPI-based HPC applications on the cloud with dynamic cluster formation. The placement of the virtual machines (VMs) hosting the individual processes, and their distance to the data, plays an important role in application performance, because data transfer delay strongly affects the speed of execution. The framework provides two VM scheduling strategies for improving the performance of data-intensive HPC applications. The strategies, which prioritize shared-memory-based communication of data to the process, are implemented and tested on a private cloud. The work considers the two most widely used data distribution models: distributed volume and striped volume.
The first VM scheduling strategy applies to the distributed volume, where a complete data file is hosted on a single data server; the results show an improvement of around 88% in the best case. The second VM placement strategy can be used with the more fine-grained striped distribution, where the stripes of a single data file are distributed across different data servers. Here we observed an improvement of around 70% in application performance compared to normal VM placement methods.
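The idea of placing VMs close to their data can be illustrated with a minimal sketch. This is not the authors' scheduler: it is a simple locality-greedy heuristic written under stated assumptions, and the function names (`place_distributed`, `place_striped`), the host names, and the slot-based capacity model are all hypothetical. For a distributed volume it tries to co-locate each process's VM with the server holding its file; for a striped volume it picks the host that stores the most stripes of the file.

```python
from collections import Counter


def place_distributed(requests, data_host, free_slots):
    """Distributed volume: each file lives on exactly one data server.

    requests:   {process_id: file_name} (hypothetical input format)
    data_host:  {file_name: host that stores the whole file}
    free_slots: {host: number of VMs it can still accept}
    Returns {process_id: chosen host}.
    """
    slots = dict(free_slots)  # work on a copy, leave the caller's dict intact
    placement = {}
    for proc, file_name in requests.items():
        preferred = data_host[file_name]
        if slots.get(preferred, 0) > 0:
            # Co-locate the VM with its data so a local (shared-memory)
            # transfer path could be used instead of the network.
            host = preferred
        else:
            # Assumed fallback: the host with the most free capacity.
            host = max(slots, key=lambda h: slots[h])
            if slots[host] <= 0:
                raise RuntimeError("no free VM slot on any host")
        slots[host] -= 1
        placement[proc] = host
    return placement


def place_striped(stripe_hosts, free_slots):
    """Striped volume: stripes of one file are spread over several servers.

    stripe_hosts: list with one entry per stripe, naming the host storing it
    free_slots:   {host: number of VMs it can still accept}
    Returns the host with free capacity that holds the most stripes.
    """
    stripes_per_host = Counter(stripe_hosts)
    candidates = [h for h, n in free_slots.items() if n > 0]
    if not candidates:
        raise RuntimeError("no free VM slot on any host")
    # Prefer the largest share of local stripes; break ties by free capacity.
    return max(candidates, key=lambda h: (stripes_per_host[h], free_slots[h]))
```

In this sketch a process falls back to a remote host only when the data-holding host is full, which is the situation in which data transfer delay would be expected to dominate.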
© 0000 Ashwini Janagal Padmanabha and Sanjay Harogolige Adimurthy. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.