TY  - JOUR
AU  - Parvathavarthini, B. 
AU  - Kanna, B. Rajesh 
AU  - Rajeswaridevi, L. 
PY  - 2008
TI  - Development of Deduced Protein Database Using Variable Bit Binary Encoding
JF  - Journal of Computer Science
VL  - 4
IS  - 6
DO  - 10.3844/jcssp.2008.467.473
UR  - https://thescipub.com/abstract/jcssp.2008.467.473
AB  - A large amount of biological data is semi-structured and stored in one of several file formats, such as flat, XML and relational files. These databases must be integrated with the structured data available in relational or object-oriented databases. Sequence matching is difficult in such file formats because string comparison is costly in both computation and time. To reduce the storage size of amino acid sequences in protein databases, a novel probability-based variable-bit-length encoding technique is introduced. The probability of each amino acid is evaluated from the number of triplet codons that map to it. A binary tree is then constructed to assign a unique binary code to each amino acid. This derived bit pattern replaces the existing fixed-byte representation of each amino acid. A proof of the reduced protein database space is discussed, and the reduction is found to lie between 42.86% and 87.17%. To validate the method, a few amino acid sequences of major organisms such as sheep and lambda phage were collected from NCBI and represented using the proposed method. The comparison shows minimum and maximum reductions in storage space of 43.30% and 72.86%, respectively. In future work, the biological data could be reduced further by applying lossless compression to this deduced data.