TY  - JOUR
AU  - Jaber, Khalid 
AU  - Rashid, Nur&rsquo;Aini Abdul 
AU  - Abdullah, Rosni 
PY  - 2009
TI  - The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering
JF  - American Journal of Applied Sciences
VL  - 6
IS  - 7
DO  - 10.3844/ajassp.2009.1368.1372
UR  - https://thescipub.com/abstract/ajassp.2009.1368.1372
AB  - Problem statement: Protein sequence clustering is a method used to discover relations between proteins. This method groups the proteins based on their common features. It is a core process in protein sequence classification. Graph theory has been used in protein sequence clustering as a means of partitioning the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced a maximal cliques algorithm for protein clustering. Approach: In this study we adapted the maximal cliques algorithm of Mohseni-Zadeh to find cliques in protein sequences and we then parallelized the algorithm to improve computation times and allowed large protein databases to be processed. We used the N-Gram Hirschberg approach proposed by Abdul Rashid to calculate the distance between protein sequences. The task farming parallel program model was used to parallelize the enhanced cliques algorithm. Results: Our parallel maximal cliques algorithm was implemented on the stealth cluster using the C programming language and a hybrid approach that includes both the Message Passing Interface (MPI) library and POSIX threads (PThread) to accelerate protein sequence clustering. Conclusion: Our results showed a good speedup over sequential algorithms for cliques in protein sequences.