The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering

Khalid Jaber; Nur&rsquo;Aini Abdul Rashid; Rosni Abdullah

doi:10.3844/ajassp.2009.1368.1372

Research Article Open Access

The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering

Khalid Jaber, Nur’Aini Abdul Rashid and Rosni Abdullah

Abstract

Problem statement: Protein sequence clustering is a method used to discover relations between proteins. This method groups the proteins based on their common features. It is a core process in protein sequence classification. Graph theory has been used in protein sequence clustering as a means of partitioning the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced a maximal cliques algorithm for protein clustering. Approach: In this study we adapted the maximal cliques algorithm of Mohseni-Zadeh to find cliques in protein sequences and we then parallelized the algorithm to improve computation times and allowed large protein databases to be processed. We used the N-Gram Hirschberg approach proposed by Abdul Rashid to calculate the distance between protein sequences. The task farming parallel program model was used to parallelize the enhanced cliques algorithm. Results: Our parallel maximal cliques algorithm was implemented on the stealth cluster using the C programming language and a hybrid approach that includes both the Message Passing Interface (MPI) library and POSIX threads (PThread) to accelerate protein sequence clustering. Conclusion: Our results showed a good speedup over sequential algorithms for cliques in protein sequences.

American Journal of Applied Sciences

Volume 6 No. 7, 2009, 1368-1372

DOI: https://doi.org/10.3844/ajassp.2009.1368.1372

Submitted On: 3 April 2009 Published On: 31 July 2009

How to Cite: Jaber, K., Rashid, N. A. & Abdullah, R. (2009). The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering. American Journal of Applied Sciences, 6(7), 1368-1372. https://doi.org/10.3844/ajassp.2009.1368.1372

Copyright: © 2009 Khalid Jaber, Nur’Aini Abdul Rashid and Rosni Abdullah. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

4,313 Views
3,317 Downloads
11 Citations

Download

Keywords

Maximal cliques algorithm
parallel processing
protein sequence clustering
MPI
pthread