Journal of Computer Science

Arabic Short Text Compression

Iman Omer and Khalaf Khatatneh

DOI: 10.3844/jcssp.2010.24.28

Volume 6, Issue 1

Pages 24-28

Abstract

Problem statement: Text compression allows a document to be represented in less space. This is useful not only for saving disk space but, more importantly, for reducing disk transfer and network transmission time. With the continuous increase in the number of Arabic short text messages sent by mobile phones, a suitable compression scheme would allow users to send more characters per message than the default limit set by the provider. Developing an efficient compression scheme for short Arabic texts is not a straightforward task. Approach: This study combined the benefits of pre-processing, entropy reduction through file splitting, and hybrid dynamic coding, a new technique proposed in this study that exploits the fact that Arabic text has single-case letters. Experimental tests were performed on short Arabic texts, and the proposed scheme was compared with the well-known plain Huffman compression to measure its performance on Arabic short text. Results: The proposed scheme can achieve a compression ratio of around 4.6 bits per byte for very short Arabic text sequences of 15 bytes and around 4 bits per byte for 50-byte text sequences, using only 8 Kbytes of memory overhead. Conclusion: Furthermore, a reasonable compression ratio can be achieved using less than 0.4 KB of memory overhead. We recommend the proposed scheme for compressing short Arabic text when resources are limited.
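
To make the "bits per byte" measure and the plain Huffman baseline concrete, the following minimal Python sketch builds a static Huffman code over the bytes of a short message and reports the average number of coded bits per input byte. It is not the proposed hybrid dynamic coding scheme, only an illustration of the baseline under stated assumptions: the function names (huffman_code_lengths, bits_per_byte) are hypothetical, the sample message is invented, the text is assumed to be stored in a single-byte Arabic encoding (Windows-1256 here), and the cost of transmitting the code table is ignored.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Return the Huffman code length in bits for each byte value in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def bits_per_byte(data: bytes) -> float:
    """Average coded bits per input byte, excluding code-table overhead."""
    if not data:
        raise ValueError("data must not be empty")
    lengths = huffman_code_lengths(data)
    freq = Counter(data)
    total_bits = sum(freq[s] * lengths[s] for s in freq)
    return total_bits / len(data)


# Invented sample message; cp1256 is an assumed single-byte Arabic encoding.
sample = "مرحبا بالعالم".encode("cp1256")
print(f"{len(sample)} bytes, {bits_per_byte(sample):.2f} bits per byte")
```

An uncompressed single-byte encoding costs 8 bits per byte, so any value below 8 is a saving. In practice a static Huffman coder must also ship or pre-agree its code table, which is a significant cost for very short messages and is not counted in this sketch.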

Copyright

© 2010 Iman Omer and Khalaf Khatatneh. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.