Information Hiding: A Generic Approach



INTRODUCTION
Steganography is one of the information hiding techniques, defined as covered writing [1]. It is the process of hiding data inside other data. For example, a text file could be hidden within an image or a sound file [2]. For the purpose of our research (text steganography), we consider steganography as a method of hiding a secret message in another message [3]. Hence, steganography is about concealing the existence of the message. In contrast, cryptography is about concealing the contents of the message [4]. The resulting product of steganography is called stego-text, while the resulting product of cryptography is called cipher text. Despite the covert and malicious uses of both, they also allow legitimate uses such as privacy and security over communication channels.
Text steganography proceeds according to the following scheme:
• A secret message (embedded, hidden data) is concealed in a cover-text using an embedding algorithm to produce a stego-text
• The stego-text is then transmitted over a communication channel (Internet)
• Upon its delivery, the secret message is recovered using an extracting algorithm
• The embedding and the extracting algorithms are augmented by a so-called stego-key to encrypt and decrypt the hidden data respectively
The secret message is concealed using the following methods:
• Modification of the cover-text, such as insertion of spaces, misspelling, or modifying the features (name, shape, position, color, size) of the individual characters [5].
• Substitution, such as replacement of insignificant data within the cover text by hidden ones [6].
• Generation, such as creation of a fake cover [7].
The most recent efforts, techniques and tools are based on the presented scheme and make use of one or more of the above-mentioned concealing methods. The ones related to our study are as follows:
• Por and Delina [6] suggested an approach based on inter-word and inter-paragraph spacing to generate dynamic stego-text
• Bender et al. [2] suggested a technique based on combining the following methods: open space, syntactic encoding (punctuation) and semantic encoding (synonym words)
• Kwan [8] developed a tool, called SNOW, based on the open space concealing method combined with compression and encryption
• Chapman and Davida [9] suggested a technique based on natural language processing that uses the sentence structures as a place for concealing data
• Bergmair [10] investigated the different stegosystems that are based on natural language processing and proposed a linguistic coding scheme
Our proposed approach follows the presented scheme and defines a generic model for information hiding as the 7-tuple:
GINM = (D1, D2, CO, SO, SD, CON, UCON)
Where:
• CO = CO_1…CO_n and SO = SO_1…SO_n represent the cover object and the secret object respectively, such that CO_i and SO_i are elements from a given domain D1
• CON(E(SO), CO)→SD is a concealing function, such that SD is the stego-domain that results from embedding the encoded form of SO in CO
• E(SO): SO→Sm is a mapping function that encodes SO into an object Sm from the encoding domain D2
• UCON(SD)→SO is an un-concealing function that extracts the secret object from the respective stego-domain
Based on the GINM model, the construction of a steganography system is reduced to instantiating the generic functions from which the GINM is composed. For example, considering the cover and secret domains (D1 and D2) as alphabets from a given natural language, the secret and cover objects (SO and CO) are instantiated as a secret message and a cover text respectively, where CO = CO_1…CO_n and SO = SO_1…SO_n are defined as characters from the language alphabet. A concealing function can then be defined based on different encoding and embedding methods. We borrow an example from [5], where the features of the individual characters (shape, position) are defined in the form of so-called codewords and are represented in a codebook used by both an encoder and a decoder. Given a secret message SO, the concealing function CON(E(SO), CO) is then defined to substitute, for each SO_i∈SO, the respective CO_i∈CO by a watermarked one (Sm), where Sm is produced by the function E(SO) as a mapping (codeword(codebook, CO)) from the codebook.
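The idea that a system is obtained by instantiating the generic functions can be illustrated with a minimal Python sketch. The class name HidingModel, its method names and the toy instantiation are assumptions introduced here for illustration only; the toy functions are not a real hiding method and are not the codebook method of [5].

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HidingModel:
    """Hypothetical container for the generic functions E, CON and UCON."""
    encode: Callable[[str], str]          # E:    SO -> Sm
    conceal: Callable[[str, str], str]    # CON:  (E(SO), CO) -> SD
    unconceal: Callable[[str], str]       # UCON: SD -> SO

    def hide(self, secret: str, cover: str) -> str:
        # Compose CON with E, as in CON(E(SO), CO).
        return self.conceal(self.encode(secret), cover)

    def reveal(self, stego: str) -> str:
        return self.unconceal(stego)


# Toy instantiation (an assumption, chosen only because it is easy to invert):
# E reverses the secret, CON appends it to the cover after a tab character,
# UCON takes the text after the last tab and reverses it back.
# It assumes the secret itself contains no tab character.
def toy_encode(secret: str) -> str:
    return secret[::-1]


def toy_conceal(encoded: str, cover: str) -> str:
    return cover + "\t" + encoded


def toy_unconceal(stego: str) -> str:
    return stego.rpartition("\t")[2][::-1]


toy_model = HidingModel(toy_encode, toy_conceal, toy_unconceal)
```

Replacing the three toy functions with other encoding and embedding functions yields a different system without changing the surrounding structure, which is the sense in which the model is generic.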
In this study, we have implemented a text steganography system using the proposed approach, as described in the following sections. In addition to its efficiency and generality, the proposed system is distinguished from similar ones by the following:
• The system can also be used as an encryption system
• The system is based on a generic approach and a generic implementation framework. Hence, it combines different encoding and embedding techniques
• The system is multilingual. In addition, it accepts dynamic and static secret messages and generates dynamic and static stego-texts respectively

MATERIALS AND METHODS
The main objective of this research is to develop an efficient and generalized information hiding approach that contributes to the privacy and security of messages over communication channels.
Based on this approach, a generic steganography system is defined by instantiating the proposed GINM model as the 7-tuple:
GSTS = (L1, L2, CT, SM, ST, CON, UCON)
Where:
• CT and SM are a cover text and a secret message respectively, represented by characters from a given natural language L1
• L2 is a binary {0, 1} encoding language
The implementation of the steganography system GSTS is then reduced to the implementation of an interaction context and the functions CON, E and UCON. They have been implemented according to the algorithms given below using C#.NET 2005 as a programming tool. As a result, a steganography system has been constructed with a data flow diagram as shown in Fig. 1 and the following functionality:
• The interaction context involves two users (a sender and a receiver). The sender interaction context facilitates user authentication, browsing of the Secret Message (SM) and the Cover Text (CT) from their respective text files, and initiation of the concealing process. In addition to authentication, the receiver interaction context facilitates browsing of the stego-text from its respective text file and initiation of the un-concealing process
The secret message SM is then decomposed based on a generic matching criterion as follows. For a given text SM, a generic matching predicate at position i of SM is defined as MP_i(SM) ∈ {0, 1}. The text matching criterion MC is defined over the positions i of SM such that MP_i = 1. The encoding function is then defined as:
$$E(SM) = (MC_i,\, T_i(SM) \times rws_i) \cup (MC_j,\, T_j(SM) \times rws_j) \cup \dots \cup (MC_n,\, T_n(SM) \times rws_n) \rightarrow (P_i \cup \dots \cup P_n) \tag{1}$$
Where:
• T_i(SM)∈SM = a string of characters from SM up to position i
• rws_i [11] = term rewriting rules, defined based on the encoding strategy
• P_i = (MC_i, T_i(SM)×rws_i) = a pattern, obtained as a result of rewriting T_i(SM) according to rws_i
Thus, the encoding function E(SM) as defined by display (1) can be implemented using any of the linguistic encoding methods, either syntactic or semantic. Furthermore, E(SM) can be used as a stand-alone encrypting function. For example, the rewriting rules (rws_i) can be defined as substitution rules that replace characters, words and paragraphs from the secret text (SM) by respective synonyms from the same language or from a different language. In addition, patterns can be transmitted by the sender in an agreed-upon order and are then assembled by the receiver according to the same order.
For our research purpose, the function E(SM) is implemented as summarized in Algorithm 1 and as discussed below:
• The decomposition of the secret text (SM) is performed based on syntactic methods, where SM is considered as composed of multiple lines. Each line is decomposed into subsequences T_i(SM) based on the number of white spaces within each subsequence. Hence, the matching predicate at position i of SM is defined as MP_i(SM) = 1 if the character at that position is blank (white space). The text matching criterion MC(SM) is then defined as the number of blanks up to a given position within SM. Subsequently, SM is decomposed into the subsequences T_i(SM), T_j(SM), …, T_n(SM) based on this criterion. To simplify the implementation of the function E(SM), the number of subsequences is determined based on the context of the secret message. In our implementation, we have assumed a maximum of three subsequences per line. Hence, a line i of SM is represented as T_i1(SM) T_i2(SM) T_i3(SM). Therefore, the text is represented as ∪_i T_i1(SM) T_i2(SM) T_i3(SM), where i ∈ [1, 2, …, n] represents the line number
• The individual subsequences T_i(SM) obtained from step 1 are then processed by the respective rewriting rules (rws_i), defined as the composite function:
rws_i: Compress(Encrypt(Binary(T_i(SM)))) → P_i
Where:
  • P_i is a pattern representing the subsequence T_i(SM) in encoded form
  • Binary is a function that converts T_i(SM) into binary according to one of two methods. The first method uses the UTF-8 encoding to facilitate dynamic secret messages and subsequently dynamic stego-text. The second method uses Huffman code [12] with squeezing to facilitate static, but efficient, encoding
  • Encrypt is a function that encodes the stream of bytes generated by the function Binary using the built-in C#.NET encryption tools, where the stream is "exclusive-ored" with a random key
For the concealing function CON, the embedding method is represented, in the definition given by display (2), by embedding criteria EC_i, …, EC_n and respective rewriting rules er_i, …, er_n. Based on this definition, the implementation of the concealing function is reduced to its instantiation by a particular embedding method. We adopt a method that is similar to the one suggested by Por and Delina [6], but with appropriate modifications. The modified method is a combination of the open space method and the syntactic method. Its implementation is given in Algorithm 2.
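The decomposition step and the composite rewriting rule Compress(Encrypt(Binary(T_i(SM)))) can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' C#.NET implementation: the function names are hypothetical, the split assigns roughly equal numbers of blank-separated words to each subsequence, a repeating-key XOR stands in for the built-in encryption tools, zlib stands in for the unspecified compression method, and only the UTF-8 variant of Binary is shown.

```python
import zlib

MAX_PARTS = 3  # the paper assumes at most three subsequences per line


def split_line(line: str, max_parts: int = MAX_PARTS) -> list:
    """Cut a line at blanks into at most max_parts subsequences (assumed rule)."""
    words = line.split(" ")
    per_part = -(-len(words) // max_parts)  # ceiling division
    return [" ".join(words[k:k + per_part]) for k in range(0, len(words), per_part)]


def binary(text: str) -> bytes:
    """Binary: UTF-8 variant only; the Huffman variant is not sketched here."""
    return text.encode("utf-8")


def encrypt(data: bytes, key: bytes) -> bytes:
    """Encrypt: XOR with a repeating key (a stand-in); applying it twice decrypts."""
    return bytes(b ^ key[k % len(key)] for k, b in enumerate(data))


def to_bits(data: bytes) -> str:
    """Render bytes as a string of '0'/'1' digits, eight per byte."""
    return "".join(f"{b:08b}" for b in data)


def rewrite(part: str, key: bytes) -> str:
    """rws_i: Compress(Encrypt(Binary(part))) -> pattern P_i as a bit string."""
    return to_bits(zlib.compress(encrypt(binary(part), key)))


def restore(pattern: str, key: bytes) -> str:
    """Inverse of rewrite: bits -> bytes -> decompress -> decrypt -> text."""
    data = bytes(int(pattern[k:k + 8], 2) for k in range(0, len(pattern), 8))
    return encrypt(zlib.decompress(data), key).decode("utf-8")
```

For instance, split_line("meet me at the old bridge") yields three subsequences of two words each, and restore(rewrite(part, key), key) returns each part unchanged for any non-empty key. The sketch keeps the order of operations stated in the text; compressing data that has already been XOR-ed with a random key is unlikely to reduce its size.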
The method works as follows:
• The compressed patterns P_i, …, P_n are rewritten as respective sequences of white spaces, such that the digit "1" is rewritten as two spaces and the digit "0" is rewritten as one space
• The embedding criteria and the rewriting rules are defined based on the white spaces and the punctuation marks occurring within the cover text to meet the following objectives:
  • To select the covers (CT|i, …, CT|n)∈CT that are suitable for embedding the corresponding patterns P_i, …, P_n. Hence, the embedding criterion is reduced to a degree of suitability in terms of the number of white spaces needed by the individual patterns. Based on this criterion, a function (split) is defined to decompose the cover text into individual covers (CT|i), each consisting of one or more cover lines
  • To rewrite each cover CT|i∈CT by inserting the white spaces respective to its corresponding pattern P_i∈T(SM)
  • To distinguish between the white spaces as they occur within the text CT and the ones used for rewriting the individual patterns. Hence, punctuation marks are used as end markers for the individual patterns
  • To contribute to the quality of information hiding in terms of its security and robustness. Hence, the embedding criteria and, subsequently, the function split are extended by the requirement for a random allocation of the individual covers rather than a uniform one
Algorithm 2: Embedding algorithm
Input: The cover text CT and the set of patterns {P_1, …, P_n}
Output: The stego text represented by the set {ST_1, …, ST_n}
Method:
For each pattern P_i {
    CT_i = Split(CT);
    ST_i = "";
    For j = 1 to P_i.length {
        If (P_i[j] = "1") { ST_i = ST_i + CT_i[j] + "  " }    // digit "1": two spaces
        Else { ST_i = ST_i + CT_i[j] + " " }                  // digit "0": one space
    }
    ST_i = ST_i + "end marker";
    ST_i = ST_i + Remaining(CT_i);
}
Return {ST_1, …, ST_n}
Un-concealing function: A generic implementation of the un-concealing function UCON is defined by display (3), where the decoding method is represented by respective criteria DC_i, …, DC_n and rewriting rules dr_i, …, dr_n.
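The white-space rewriting rule of the embedding method, together with its inverse, can be sketched in Python as follows. This is a simplified sketch under stated assumptions: a single cover is used for a single pattern, the end marker is assumed to be a stand-alone "." token that does not otherwise occur in the cover, and the random allocation of covers performed by the function split is not modelled. The names embed, extract and END_MARKER are hypothetical.

```python
import re

END_MARKER = "."  # assumed punctuation token that closes an embedded pattern


def embed(pattern: str, cover: str) -> str:
    """Follow each of the first len(pattern) cover words by two spaces for
    the digit '1' and by one space for the digit '0', then add the marker."""
    words = cover.split()
    if len(words) < len(pattern):
        raise ValueError("cover text has too few words for this pattern")
    if END_MARKER in words:
        raise ValueError("cover text already contains the end marker token")
    carriers = "".join(
        word + ("  " if bit == "1" else " ") for bit, word in zip(pattern, words)
    )
    remaining = " ".join(words[len(pattern):])
    return carriers + END_MARKER + " " + remaining


def extract(stego: str) -> str:
    """Read the gaps after each word until the end marker is reached."""
    bits = []
    for word, gap in re.findall(r"(\S+)( +)", stego):
        if word == END_MARKER:
            return "".join(bits)
        bits.append("1" if len(gap) == 2 else "0")
    raise ValueError("no end marker found in the stego text")
```

Embedding the pattern "1011" in "the quick brown fox jumps over the lazy dog" places two spaces after "the", "brown" and "fox" and one space after "quick", followed by the marker; extract recovers "1011" from the result. Because the original spacing of the cover is normalised by split(), this sketch assumes a cover with single spaces between words.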

RESULTS
Based on the proposed approach and its respective implementation methodology, a steganography system has been developed with an interaction context represented by two forms, as given in Fig. 2 and 3 respectively. The first form, denoted Encoding, facilitates interaction with the presented encoding and concealing functions. Furthermore, it is augmented with quality indicators such as the size of the secret message, the size of the browsed cover text and the hiding ratio. The latter gives the utilization percentage of the cover text by the hidden message. The second form of the interaction context, denoted Decoding, facilitates interaction with the presented un-concealing function.
Through its interaction contexts, the proposed steganography system has been tested using several multilingual texts (Arabic and English). The results are summarized as follows:
Results for static stego-texts: The static stego-texts are generated using Huffman code with compression. We have tested texts of different sizes. Representative results are given in Table 1 in terms of: 1) the sizes of the Secret Message (SM), the Cover Text (CT) and the Stego Text (ST), and 2) the number of patterns that are hidden in the stego-text as the respective encoding of the secret message. Figure 2 shows a secret message and a cover text with a 25% utilization percentage.

DISCUSSION
The experimental results have demonstrated the efficiency and flexibility of the proposed text steganography system. Furthermore, the system is based on an approach that is formalized in a generic way. This enables different methods to be combined and adopted for its implementation and contributes to its robustness and scalability. For example:
• Two encoding methods have been adopted, the binary and the Huffman encoding. Such adoption permits comparing their advantages and disadvantages. It was found that the first method is characterized by its flexibility: it enables hiding dynamic secret texts. However, it requires cover texts of a considerably larger size. On the other hand, the second method is associated with extra overhead, in terms of time and space, needed for encoding and decoding
• The adopted embedding criteria combine suitability and randomness to ensure robustness of the stego-text
Compared to similar systems, such as the one suggested by Por and Delina [6], the proposed system has comparable results in terms of the ratios between the sizes of the secret messages and the cover texts. Our approach has achieved a 1:4 ratio, while in [6] a cover text with a size <16 KB is required for a secret message with a length <4 KB. However, our system has better quality indicators, such as a lower utilization percentage of the cover text and robustness. Furthermore, it can be adopted for dynamic and static secret texts as well as cover texts.

CONCLUSION
In this research, a generic information hiding model has been suggested. Based on this model, a text steganography system has been implemented. The system is characterized by its generality, scalability and flexibility. Although the proposed system has better quality indicators than those of similar systems, these indicators need further improvement, mainly the capacity of the cover text and the robustness of the stego-text. Hence, efforts in this direction constitute future research.

• The rewriting rules dr_i, …, dr_n of the un-concealing function have the objective of decoding the embedded white spaces within the individual stego-texts ST_i into their respective patterns P_i
• The individual patterns P_i are then decompressed, decrypted and decoded into the respective parts SM_i of the secret message SM
Based on display (3), the implementation of the function UCON is reduced to the instantiation of its generic definition by specific algorithms. For example, the decoding of the individual patterns P_i into their respective parts SM_i of the secret message SM is given in the following algorithm.
Decoding algorithm
Input: The individual pattern P_i
Output: The respective part SM_i of the secret message
Method:
c = 0
For j = 0 to (P_i.length / 8 - 1) {
    SM_i[j] = 0
    For n = 7 down to 0 {
        If (P_i[c++] = "1") { SM_i[j] = (SM_i[j] | (1 << n)) }
    }
}
Return SM_i
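The bit-assembly step of the decoding algorithm can be written as a short Python sketch. It assumes that a pattern is a string of "0" and "1" digits whose length is a multiple of eight and that bits are stored most significant first; decompression and decryption are outside its scope. The function name decode_pattern is hypothetical.

```python
def decode_pattern(pattern: str) -> bytes:
    """Assemble a string of '0'/'1' digits into bytes, most significant bit first."""
    if len(pattern) % 8 != 0:
        raise ValueError("pattern length must be a multiple of 8")
    out = bytearray(len(pattern) // 8)
    c = 0  # index of the next bit to consume
    for j in range(len(out)):
        for n in range(7, -1, -1):  # bit positions 7 down to 0
            if pattern[c] == "1":
                out[j] |= 1 << n
            c += 1
    return bytes(out)
```

For example, decode_pattern("0100100001101001") returns b"Hi", since 01001000 is the code of "H" and 01101001 is the code of "i".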

Table 1: Testing results for different secret messages
No. of patterns | Size of SM (KB) | Size of CT (KB) | Size of ST (KB)