Research Article Open Access

Framework for Enhancing the Performance of Data Intensive MPI based HPC applications on Cloud

Ashwini Janagal Padmanabha1 and Sanjay Harogolige Adimurthy1
  • 1 Nitte Meenakshi Institute of Technology, India

Abstract

Cloud computing is revolutionizing the current business model with its pay-per-usage resource provisioning method. This model proves more profitable than the traditional model of procuring and maintaining resources. Data intensive high performance computing (HPC) applications handle large-scale data sets in cluster/grid environments for enhanced performance. Most of these applications belong to the MPI category, where the work is assigned to multiple processes that communicate with each other to complete the task. These applications prefer cluster/grid environments because of their homogeneity and the availability of high-end resources. The cloud can be a better platform for these applications, as it offers a large quantity of resources. However, the HPC user community avoids this technology because of the performance degradation caused by the virtualization layer and the sharing of resources. Static cluster instances, offered as a resource by many cloud vendors such as Amazon and CDAC, provide good performance at the cost of resource utilization. The work proposed here provides a framework for enabling data intensive MPI based HPC applications on the cloud with dynamic cluster formation. The placement of the virtual machines (VMs) hosting the individual processes, and their distance to the data, plays an important role in application performance, because data transfer delay largely determines the speed of execution. The framework provides two VM scheduling strategies for improving the performance of data intensive HPC applications. The strategies, which prioritize shared memory based communication of data to the process, are implemented and tested on a private cloud. The work considers the two most widely used data distribution models: distributed volume and striped volume. The first VM scheduling strategy applies to the distributed volume, where a complete data file is hosted on a single data server; the results show an improvement of around 88% in the best case. The second VM placement strategy can be used with the more fine-grained striped volume, where stripes of a single data file are distributed across different data servers. Here we observed around 70% improvement in application performance compared to normal VM placement methods.
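
The idea of placing VMs close to their data can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it is a simple greedy heuristic written under stated assumptions. The host names, the `RACK` map, the `hop_distance` cost (0 for the same host, where shared memory communication would be possible, 1 for the same rack, 2 otherwise) and the function `place_vms` are all hypothetical. A distributed volume is modelled as a single data server, a striped volume as several.

```python
from typing import Callable, Dict, List

# Hypothetical topology: which rack each physical host sits in.
RACK: Dict[str, str] = {"h1": "r1", "h2": "r1", "h3": "r2", "h4": "r2"}


def hop_distance(host: str, server: str) -> int:
    """Assumed cost model: 0 = same host, 1 = same rack, 2 = different rack."""
    if host == server:
        return 0
    if RACK[host] == RACK[server]:
        return 1
    return 2


def place_vms(num_vms: int,
              data_servers: List[str],
              free_slots: Dict[str, int],
              distance: Callable[[str, str], int]) -> List[str]:
    """Greedily place each VM on the free host with the lowest total
    distance to every server that holds part of the data file."""
    slots = dict(free_slots)  # work on a copy so the caller's map is untouched
    placement: List[str] = []
    for _ in range(num_vms):
        candidates = [h for h, n in slots.items() if n > 0]
        if not candidates:
            raise RuntimeError("not enough free VM slots")
        # Ties are broken by host name so the result is deterministic.
        best = min(candidates,
                   key=lambda h: (sum(distance(h, s) for s in data_servers), h))
        slots[best] -= 1
        placement.append(best)
    return placement


SLOTS = {"h1": 1, "h2": 2, "h3": 2, "h4": 2}

# Distributed volume: the whole file lives on one data server (h1).
distributed = place_vms(3, ["h1"], SLOTS, hop_distance)

# Striped volume: stripes of the file live on two data servers (h1 and h3).
striped = place_vms(3, ["h1", "h3"], SLOTS, hop_distance)

print(distributed)
print(striped)
```

In the distributed case the sketch fills the data server's own host first and then its rack neighbours; in the striped case it favours hosts that are co-located with at least one stripe. A real scheduler would also have to weigh CPU, memory and network load, which this sketch ignores.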

Journal of Computer Science
Volume 13 No. 8, 2017, 320-328

DOI: https://doi.org/10.3844/jcssp.2017.320.328

Submitted On: 5 January 2017 Published On: 7 August 2017

How to Cite: Padmanabha, A. J. & Adimurthy, S. H. (2017). Framework for Enhancing the Performance of Data Intensive MPI based HPC applications on Cloud. Journal of Computer Science, 13(8), 320-328. https://doi.org/10.3844/jcssp.2017.320.328

Keywords

  • Cloud
  • Data Intensive HPC
  • VM Placement
  • Data Distribution