Arabic Short Text Compression

Iman Omer; Khalaf Khatatneh

doi:10.3844/jcssp.2010.24.28

Research Article Open Access

Abstract

Problem statement: Text compression allows a document to be represented in less space. This is useful not only for saving disk space but, more importantly, for reducing disk transfer and network transmission time. With the continuous increase in the number of Arabic short text messages sent from mobile phones, a suitable compression scheme would allow users to send more characters than the default limit set by the provider. Developing an efficient compression scheme for short Arabic texts is not a straightforward task. Approach: This study combined the benefits of pre-processing, entropy reduction through file splitting and hybrid dynamic coding, a new technique proposed in this study that exploits the fact that Arabic text has single-case letters. Experimental tests were performed on short Arabic texts, and the results were compared with the well-known plain Huffman compression to measure the performance of the proposed scheme on Arabic short text. Results: The proposed scheme can achieve a compression ratio of around 4.6 bits per byte for very short Arabic text sequences of 15 bytes and around 4 bits per byte for 50-byte text sequences, using only 8 KB of memory overhead. Conclusion: Furthermore, a reasonable compression ratio can be achieved using less than 0.4 KB of memory overhead. We recommend the proposed scheme for compressing short Arabic texts when resources are limited.
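To make the "bits per byte" measure and the plain Huffman baseline concrete, the following is a minimal Python sketch. It is not the authors' proposed scheme (no pre-processing, file splitting or hybrid dynamic coding), only an illustration of how a static Huffman code built from a single short message could be scored. The function names `huffman_code_lengths` and `bits_per_byte`, the sample message and the choice of ISO-8859-6 as a one-byte-per-character Arabic encoding are all assumptions made for illustration; the abstract does not state which encoding was used. The figure it computes also ignores the cost of transmitting the code table, which matters a great deal for very short texts.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Return the Huffman code length (in bits) for each distinct byte value."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}

    # Heap entries: (weight, tie_breaker, symbols contained in this subtree).
    heap = [(weight, i, [sym]) for i, (sym, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie_breaker = len(heap)

    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, tie_breaker, syms1 + syms2))
        tie_breaker += 1
    return lengths


def bits_per_byte(data: bytes) -> float:
    """Average number of coded bits per input byte, excluding code-table overhead."""
    if not data:
        raise ValueError("data must not be empty")
    freq = Counter(data)
    lengths = huffman_code_lengths(data)
    total_bits = sum(freq[sym] * lengths[sym] for sym in freq)
    return total_bits / len(data)


if __name__ == "__main__":
    # Hypothetical sample message; ISO-8859-6 stores each Arabic letter in one byte.
    message = "مرحبا بكم في الاردن".encode("iso8859_6")
    print(f"{len(message)} bytes, {bits_per_byte(message):.2f} bits per byte")
```

A lower value means better compression: uncompressed single-byte text costs 8 bits per byte, so the reported 4.6 and 4 bits per byte correspond to roughly 58% and 50% of the original size.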

Journal of Computer Science

Volume 6 No. 1, 2010, 24-28

DOI: https://doi.org/10.3844/jcssp.2010.24.28

Submitted On: 21 July 2009 Published On: 31 January 2010

How to Cite: Omer, I. & Khatatneh, K. (2010). Arabic Short Text Compression. Journal of Computer Science, 6(1), 24-28. https://doi.org/10.3844/jcssp.2010.24.28

Copyright: © 2010 Iman Omer and Khalaf Khatatneh. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Short text compression
Huffman coding
Arabic language
Dynamic hybrid coding