Genome Sequence Analysis: A Survey

Hassan Mathkour; Muneer Ahmad

doi:10.3844/jcssp.2009.651.660

Research Article Open Access

Abstract

Problem statement: Sequence analysis problems are NP-hard and require optimal solutions. Interesting problems include duplicate sequence detection, sequence matching by relevance, sequence analysis using approximate comparison, either in general or with tools such as Matlab, and multi-lingual sequence analysis. The usefulness of these operations is highlighted and future expectations are described. Approach: This study describes the concepts, tools, methodologies and algorithms used for sequence analysis. Sequences contain valuable information that needs to be mined for useful purposes, and modeling an optimal solution demands close attention. The concepts of similarity and alignment cannot be addressed directly with a single technique or algorithm; better performance was achieved by combining different concepts. Results: We compared different approaches using exemplary data and found that ClustalW2 is a fairly good tool for analysis. We assigned different weight values to the relevant features and obtained a score of 95 for comparison and 45 for alignment. Conclusion: Different techniques and approaches were evaluated and compared.

Journal of Computer Science

Volume 5 No. 9, 2009, 651-660

DOI: https://doi.org/10.3844/jcssp.2009.651.660

Submitted On: 29 June 2009
Published On: 30 September 2009

How to Cite: Mathkour, H. & Ahmad, M. (2009). Genome Sequence Analysis: A Survey. Journal of Computer Science, 5(9), 651-660. https://doi.org/10.3844/jcssp.2009.651.660

Copyright: © 2009 Hassan Mathkour and Muneer Ahmad. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Genome
multi-lingual
approximate matching
nucleotide base pair
corpora
duplicate sequences