TY  - JOUR
AU  - Jaber, Khalid
AU  - Rashid, Nur’Aini Abdul
AU  - Abdullah, Rosni
PY  - 2009
TI  - The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering
JF  - American Journal of Applied Sciences
VL  - 6
IS  - 7
DO  - 10.3844/ajassp.2009.1368.1372
UR  - https://thescipub.com/abstract/ajassp.2009.1368.1372
AB  - Problem statement: Protein sequence clustering is a method used to discover relations between proteins. It groups proteins based on their common features and is a core process in protein sequence classification. Graph theory has been used in protein sequence clustering as a means of partitioning the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced a maximal cliques algorithm for protein clustering. Approach: In this study, we adapted the maximal cliques algorithm of Mohseni-Zadeh to find cliques in protein sequences. We then parallelized the algorithm to reduce computation time and allow large protein databases to be processed. We used the N-Gram Hirschberg approach proposed by Abdul Rashid to calculate the distance between protein sequences. The task farming parallel programming model was used to parallelize the enhanced cliques algorithm. Results: Our parallel maximal cliques algorithm was implemented on the stealth cluster in the C programming language. It uses a hybrid approach that combines the Message Passing Interface (MPI) library with POSIX threads (Pthreads) to accelerate protein sequence clustering. Conclusion: Our results showed a good speedup over sequential algorithms for finding cliques in protein sequences.