Text Summarization Using Morphological Filtering of Intuitionistic Fuzzy Hypergraph

Text summarization has been an area of interest for many years. It refers to creating a concise version of a document without any loss of information. Researchers in the area of natural language processing have developed many abstractive and extractive methods for creating summaries. Abstractive methods modify the sentences and create a rewritten concise form, while extractive methods pick relevant sentences. The extractive method used in this study is a novel one which models the document as an Intuitionistic Fuzzy Hypergraph (IFHG). This IFHG is subjected to morphological filtering in order to create a concise summary. This is the first work which applies morphological operations on an IFHG that is modeled on a text. The method has generated summaries which are very similar to human generated summaries and has shown higher accuracy when compared with other machine generated summaries.


Introduction
Overview
Intuitionistic Fuzzy Hypergraphs (IFHG) were introduced in (Parvathi et al., 2009), where the authors also described the (α,β) cut and the dual intuitionistic hypergraph. The same authors (Parvathi et al., 2012) later developed operations such as complement, join and intersection on IFHG. Intuitionistic fuzzy sets have also been applied to career determination (Ejegwa et al., 2014), where a normalized Euclidean distance method is used to find a suitable career for students depending upon their marks in various subjects. Rather than merely having a membership and non membership value, a hesitation margin was also introduced for every node in the IFHG. Generalized strong IFHGs were introduced in (Samanta and Sumit, 2014) and can be used for partitioning and clustering. The modeling of a document as a hypergraph and its spectral partitioning (Dhanya et al., 2017) resulted in text clusters. Our paper shows how to model a document as an IFHG and apply morphological filtering on it to create a summary report. The organization of the paper is as follows. Section 2 covers related work in the field of text summarization. Section 3 describes how a document can be modeled as an intuitionistic fuzzy hypergraph. Section 4 illustrates the application of various morphological operations such as dilation and erosion on a document. Section 5 presents the filter operator on text, the design of the summary filter with dilation and erosion, and the implementation of the system. The advantages of the system are given in section 6. The result analysis and the comparison with existing methods are given in section 7. Section 8 is the conclusion, followed by the references.

Graph based Methods
The Google Brain team has developed and open sourced a TensorFlow model (TST, 2016) for text summarization, which generates news headlines on the annotated English Gigaword corpus; TensorFlow is an open source library for numerical computation using data flow graphs. Interesting parts of the document are extracted using some metric (tf-idf) to create the summary. There are many other graph based summarization methods, out of which five (HITS, positional power function, PageRank, undirected graphs and weighted methods) are compared in (Mihalcea, 2004), where HITS and PageRank seem to provide better performance. Weighted directed graphs (Borhan et al., 2014) are also created by taking into consideration a distortion measure. There, an edge is formed only if the distortion (semantic distance between nodes) is below a predefined threshold. In the multigraph method (Fatima et al., 2015), there can be multiple edges between two nodes (sentences). The number of edges equals the number of words common to both sentences. The results of this method are compared against many available online summarizers and show good performance. Lexical centrality (Erkan and Radev, 2004) is used in the LexRank method, where sentences similar to many other sentences are considered central to a topic. Given the similarity of each sentence to other sentences, the overall centrality of a sentence is calculated. The system has shown good results when evaluated against human summaries. When a graph is created from multiple documents, sentence selection is done with the segmented bushy path (Ribaldo et al., 2012) and depth first path methods. Redundancy removal is done at the end.

Neural Network based
Neural text summarization (Karthik, 2016) defines the task as generating an output sequence y 1 , y 2 , ..., y m for an input sequence x = x 1 , x 2 , ..., x n . The best summary is the one that maximizes a scoring function over (x, y). A subset of sentences of document D is created by predicting a label y L ∈ {0, 1}, where 0 stands for non inclusion in the summary and 1 stands for inclusion in the summary. All sentences are labelled by considering model parameters θ (Jianpeng and Mirella, 2016). In another approach, seven features of the document are extracted to create a feature vector which is fed to a neural network; feature fusion is then done (Kaikhah, 2004) and sentences are filtered. The system was tested on news articles and achieved good accuracy. In a further method, a set of eight features is extracted from a document and fed to a neural network with an input layer, a hidden layer and an output neuron. After finding high ranked sentences, rhetorical structure theory (Kulkarni, 2015) is applied to obtain a better summary. In the encoder decoder model (Urvashi, 2016) of text summarization, an encoder reads the input sequence and computes the hidden state representation h x , and a decoder uses h x to generate the target sequence y. Errors are back propagated from the decoder to the encoder through h x and a minimum entropy model is created. A feed forward auto encoder (Mahmood and Len, 2017) is trained to encode the input x in a concept space c(x). In the encoding phase, the dimensionality is reduced to give a number of codes. Here features are learned by the auto encoder rather than being manually engineered.

Genetic Algorithm Based
In a genetic algorithm based method (Carlos et al., 2004), each sentence of the document is represented by an attribute vector consisting of position, size, average tf-isf, similarity to title, similarity to keywords, cohesion with respect to other sentences, cohesion with respect to the centroid, depth of the sentence in a tree, direction of the sentence in a tree (both obtained after applying a hierarchical clustering algorithm), indicators of main concepts, and the presence of anaphors, proper nouns and discourse markers. The authors applied a multi objective GA and a single objective GA and reported better results for the multi objective GA. In another method, the document is represented using a DAG (Vahed et al., 2008) where every sentence S i is added to the graph in chronological order. Weights are then assigned using tf-isf and are used for calculating the similarity between two sentences. These similarities are used as edge weights. From these similarities, further features such as topic relation factor, cohesion factor and readability factor are formed. A fitness function based on these three factors is used to calculate the fitness of a chromosome consisting of 1s and 0s, where 1 represents the inclusion of a sentence in the summary and 0 represents its non inclusion. The system has demonstrated better results compared to others.

Fuzzy Logic Based
Many methods have been developed for text summarization using fuzzy systems. A number of parameters such as sentence position in paragraph, sentence length, similarity to title, similarity to keywords, similarity to text concept, proper nouns and sentence cohesion are used in fuzzy systems. The authors of (Farshad et al., 2008) compared the scores given by five judges to a vector based method and a fuzzy method; the fuzzy based summary reflected 77% of the concepts as opposed to 66% for the vector based method. In another method, the feature vector created for each sentence in the document includes title feature, sentence length, term weight, sentence position, sentence to sentence similarity, numerical data etc. The results, compared with the Word summarizer and the Copernic summarizer, were better. Almost the same set of features is used with a triangular membership function (Babar and Pallavi, 2015) which fuzzifies each score to three values: low, medium and high. A parallel summary using latent semantic analysis is also produced and both are merged to get the final summary. Experimental results have shown an average precision of 89%. A fuzzy system has also been compared with a neural network using features such as cue phrases, legal vocabulary, paragraph structure, citation, term weight, named entity, similarity to neighbouring sentences, absolute location etc., and the fuzzy based system demonstrated a better result (Megala et al., 2014; Rajesh et al., 2014). A fuzzy logic based inference system computes a score for each sentence, and sentences are selected from the highest score down to a threshold value. The results, compared with the Word summarizer, show a better output for the fuzzy system (Farshad et al., 2010).

General Methods
The sentiment of sentences has been computed and then used for text summarization. Here the total, absolute and average sentiment scores of sentences are calculated to generate a P% summary.
Different sentence selection methods (Babar and Pallavi, 2015) such as term weighting, similarity measure and coverage have been implemented, upon which a human learning algorithm is applied. In a DAG-structured topic hierarchy method (Ramakrishna et al., 2015), submodular optimization is done. The authors tried it on 1 million topics and 3 million correlation links. Many features such as transitive cover and truncated transitive cover, and several quality notions such as specificity, clarity, relevance and coherence, were considered. Text summarization has been used for sentiment analysis (Rupal and Yashvardhan, 2017) of reviews of different products such as iPhones, cameras and hard disks. The reviews, in four languages (English, German, French and Spanish), are extracted from amazon.in and subjected to language translation, aspect identification, text summarization and finally sentiment analysis. Here the summary is based on sentence to centroid score, cue phrase score, sentence position score, numerical data and tf-idf score. Textual summaries of long videos (Shagan et al., 2017) are generated using recurrent networks, where key frames are taken from impactful segments and converted to textual annotations. The sequence of events in the video is summarized to generate a paragraph description.

Preliminaries
Let [H IF , (µ n , γ n ), (µ e , γ e ), H n , H e ] be an intuitionistic fuzzy hypergraph with membership degree µ n and non membership degree γ n defined on the set of nodes H n , and membership degree µ e and non membership degree γ e defined on the set of hyperedges H e of H IF . When hypergraphs are used for document modeling, the sentences in the document form the hyperedges H e and the words in the document form the nodes H n . The same method can be used in the case of an IFHG, which additionally carries membership and non membership degrees for nodes and hyperedges. The membership value µ n of a node H n is the term priority p n of the word, i.e., the membership value of a word depends on the priority of the word. Words with low priority have a high non membership value, so the node H n which represents such a word has a low membership value µ n and a high non membership value γ n . Words with high priority have a high membership value, so the nodes H n which represent those words have a high membership value µ n and a low non membership value γ n . The membership and non membership values of the words are assigned according to Tables 1 to 3. All words in the document other than those given in Tables 1 to 3 have µ n = 0.5 and γ n = 0.5. Those are medium words whose presence does not affect the result of the morphological operations, which are defined on a sub IFHG X IF of H IF .
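The modeling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the word lists and their (µ, γ) values are hypothetical stand-ins for Tables 1 to 3, and the function names are introduced here only for illustration.

```python
# Hypothetical priority tables (stand-ins for Tables 1 to 3): word -> (mu, gamma).
HIGH_PRIORITY = {"player": (0.9, 0.1), "win": (0.8, 0.1)}
LOW_PRIORITY = {"arrest": (0.2, 0.7)}
NEUTRAL = (0.5, 0.5)  # default for all other words, as stated in the text


def build_hypergraph(sentences):
    """Map each preprocessed sentence to a hyperedge, i.e. a set of word nodes."""
    return {f"e{i}": set(s.split()) for i, s in enumerate(sentences, start=1)}


def node_values(hyperedges):
    """Look up (mu, gamma) for every node; unknown words are neutral."""
    nodes = set().union(*hyperedges.values())
    table = {**HIGH_PRIORITY, **LOW_PRIORITY}
    return {w: table.get(w, NEUTRAL) for w in nodes}


sentences = ["player win match", "police arrest suspect"]
H_e = build_hypergraph(sentences)   # hyperedges: sentence id -> set of words
values = node_values(H_e)           # nodes: word -> (mu, gamma)
```

Representing each hyperedge as a Python set keeps the later morphological operations down to plain set intersections and subset tests.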

Assigning Membership Degrees
The membership degree µ(n i ) of a priority node H n is the sum of the normalized term frequency and the membership value of the word (as given in Tables 1 and 2). For such words, the non membership degree is <= 1-µ(n i ). The non membership degree γ(n i ) of a non priority node H n is the sum of the normalized term frequency and the non membership value of the node (as given in Table 3). For such words, the membership degree is <= 1-γ(n i ). Here the normalized term frequency is the count of the word in the document divided by the number of words in the document. The degrees of a hyperedge are derived from those of its nodes. The non membership degree γ(e j ) of a hyperedge is the supremum of the non membership degrees of all the nodes H n in it, provided at least one H n belongs to the non priority set nP j . The membership degree of such edges will be <= 1-γ(e j ). Let us illustrate this IFHG modeling with a small sample text. The text under consideration, shown in Fig. 1, is a preprocessed one from which the stop words have been removed and which has been subjected to lemmatization.
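The degree assignment can be sketched as follows. This is only one possible reading of the rules above: the table values are hypothetical, the sum is clamped to 1, and the complementary degree is taken with equality at its upper bound (the text only gives the bound). The hyperedge case covers only edges that contain a non priority word, since that is the case the text spells out.

```python
from collections import Counter

PRIORITY = {"player": 0.6}       # hypothetical membership values (Tables 1, 2)
NON_PRIORITY = {"arrest": 0.6}   # hypothetical non membership values (Table 3)


def node_degrees(sentences):
    """Return word -> (mu, gamma) using normalized term frequency + table value."""
    words = [w for s in sentences for w in s]
    total = len(words)
    degrees = {}
    for w, count in Counter(words).items():
        ntf = count / total  # normalized term frequency
        if w in PRIORITY:
            mu = min(1.0, ntf + PRIORITY[w])
            gamma = 1.0 - mu          # assumption: bound taken with equality
        elif w in NON_PRIORITY:
            gamma = min(1.0, ntf + NON_PRIORITY[w])
            mu = 1.0 - gamma          # assumption: bound taken with equality
        else:
            mu, gamma = 0.5, 0.5      # neutral word
        degrees[w] = (mu, gamma)
    return degrees


def edge_non_membership(edge, degrees):
    """Supremum of node gammas; defined only if the edge has a non priority word."""
    if not any(w in NON_PRIORITY for w in edge):
        return None
    return max(degrees[w][1] for w in edge)


sentences = [["player", "win", "match"], ["police", "arrest", "player"]]
degrees = node_degrees(sentences)
gamma_e2 = edge_non_membership(sentences[1], degrees)
```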
This sample text consists of seven sentences. The membership and non membership values of its words are taken from Tables 1 to 3. These values, along with the normalized term frequency, give the membership and non membership degrees. For all words other than those in Tables 1 to 3, the membership and non membership values are 0.5. Here we require that the sum of the membership degree and non membership degree of a node (word) is <= 1 (Parvathi et al., 2009), i.e., µ(n i ) + γ(n i ) <= 1. Likewise, the sum of the membership degree and non membership degree of a hyperedge (sentence) is <= 1, i.e., µ(e i ) + γ(e i ) <= 1 (Parvathi et al., 2009). The IFHG for the above sample text can be drawn as in Fig. 2.
In Fig. 2, sentences are modeled as hyperedges and words are modeled as nodes. Nodes have both a membership degree µ(n i ) and a non membership degree γ(n i ). The hyperedges also have both a membership degree µ(e i ) and a non membership degree γ(e i ). Since there are seven sentences in the sample text in Fig. 1, there are seven hyperedges in Fig. 2. The hyperedge with the nodes n 1 , n 2 , n 8 and n 9 contains only priority words, so it has a good membership degree. Due to the presence of nodes n 11 and n 15 , which have a high non membership degree, the corresponding hyperedge has a low membership degree and a high non membership degree. That is, the presence of a single word with a high non membership degree γ(n i ) influences the non membership degree of the hyperedge.
Here X IF ⊂ H IF , such that X IF consists of nodes with membership degree > 0.5. Each hyperedge in X IF has at least one node with membership degree > 0.5 and does not contain any node with non membership degree > 0.5, i.e., the membership degree can be greater than 0.5, but the non membership degree should not exceed 0.5. X IF is thus a collection of priority sentences and priority words, as given in Fig. 3.
Now let us apply morphological operations (Bino et al., 2017; Dhanya et al., 2018a; 2018b) on this X IF . Let X n be the node set in X IF and X e be the edge set in X IF .
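The selection of the sub IFHG X IF can be sketched as below. The toy hypergraph and degrees are hypothetical, and the sketch assumes that the node set collects every node of H IF with membership degree above 0.5, which is one reading of the definition.

```python
def sub_ifhg(H_e, deg):
    """Return (X_n, X_e): priority nodes and priority hyperedges of H_IF."""
    # Priority edge: at least one node with mu > 0.5, no node with gamma > 0.5.
    X_e = {
        e: nodes
        for e, nodes in H_e.items()
        if any(deg[n][0] > 0.5 for n in nodes)
        and not any(deg[n][1] > 0.5 for n in nodes)
    }
    # Priority nodes: every node with mu > 0.5 (assumed reading).
    X_n = {n for n, (mu, _gamma) in deg.items() if mu > 0.5}
    return X_n, X_e


# Hypothetical toy data: edge id -> nodes, node -> (mu, gamma).
H_e = {
    "e1": {"player", "win"},
    "e2": {"player", "arrest"},
    "e3": {"weather", "report"},
}
deg = {
    "player": (0.9, 0.1), "win": (0.8, 0.1), "arrest": (0.2, 0.7),
    "weather": (0.5, 0.5), "report": (0.5, 0.5),
}
X_n, X_e = sub_ifhg(H_e, deg)
```

In this toy example only e1 qualifies: e2 is rejected because of the non priority word, and e3 has no priority word at all.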

δ n (X e )-Dilation with Respect to nodes
This morphological operation is defined as: Take all edges in X IF . This will result in X e . Take all nodes X n in X e . Here we select all hyperedges from H IF which have at least one node with membership degree >0.5 and which do not contain any node with non membership degree >0.5. Once we select such edges, we select the nodes in them with membership degree >0.5. This ultimately gives δ n (X e ). It retrieves a collection of priority words within priority sentences, as shown in Fig. 4.
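A minimal sketch of δ n (X e ) on a toy hypergraph follows; the data and the helper name `priority_edges` are hypothetical and serve only to illustrate the two selection steps described above.

```python
def priority_edges(H_e, deg):
    """Edges with at least one node mu > 0.5 and no node gamma > 0.5."""
    return {
        e for e, nodes in H_e.items()
        if any(deg[n][0] > 0.5 for n in nodes)
        and not any(deg[n][1] > 0.5 for n in nodes)
    }


def dilation_nodes(H_e, deg):
    """delta_n(X_e): priority words that occur inside priority sentences."""
    X_e = priority_edges(H_e, deg)
    return {n for e in X_e for n in H_e[e] if deg[n][0] > 0.5}


# Hypothetical toy data.
H_e = {"e1": {"player", "win", "match"}, "e2": {"goal", "arrest"}}
deg = {
    "player": (0.9, 0.1), "win": (0.8, 0.1), "match": (0.5, 0.5),
    "goal": (0.7, 0.2), "arrest": (0.2, 0.7),
}
result = dilation_nodes(H_e, deg)
```

Note that "goal" is a priority word but is not returned, because it only occurs in a sentence that also contains a non priority word.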
The dilation δ e (X n ) with respect to edges is defined as: Take all nodes X n . Find from H IF all the hyperedges which include X n . Here we select from X IF all nodes with membership degree >0.5 and find from H IF all hyperedges which contain those nodes. This gives all hyperedges which contain at least one node with membership degree >0.5. These hyperedges may or may not contain nodes with non membership degree >0.5. This dilation selects all text which has at least one priority word, as shown in Fig. 5.
... which overlap with the priority sentences. This is shown in Fig. 6.
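The dilation δ e (X n ), which retrieves every hyperedge containing at least one priority node, can be sketched as follows. The toy data is hypothetical.

```python
def dilation_edges(H_e, deg):
    """delta_e(X_n): all hyperedges of H_IF holding at least one priority node."""
    X_n = {n for n, (mu, _gamma) in deg.items() if mu > 0.5}
    return {e for e, nodes in H_e.items() if nodes & X_n}


# Hypothetical toy data.
H_e = {
    "e1": {"player", "win"},
    "e2": {"player", "arrest"},   # mixed sentence: still selected
    "e3": {"weather", "report"},  # no priority word: not selected
}
deg = {
    "player": (0.9, 0.1), "win": (0.8, 0.1), "arrest": (0.2, 0.7),
    "weather": (0.5, 0.5), "report": (0.5, 0.5),
}
selected = dilation_edges(H_e, deg)
```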

∆(X e )-Dilation
This dilation can be written as: Find all hyperedges X e . Find all nodes in X e . Let it be X n1 . Find all hyperedges H e and the nodes in them. Let it be H n1 . For all X n1 ∩ H n1 ≠ ∅, find the hyperedges from H IF . This retrieves all sentences which have at least one priority word occurring in the priority sentences of X IF , as represented in Fig. 7.
Take all nodes X n in X IF . Take all hyperedges in H IF which consist of these nodes only. This erosion, as seen in Fig. 8, strictly retrieves priority sentences.
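Both operations in this section reduce to simple set tests, as the following sketch shows. It follows the literal definitions (any shared node counts as overlap; the erosion keeps edges made of X n nodes only); the toy data and function names are hypothetical.

```python
def delta_dilation(H_e, X_e_ids):
    """Delta(X_e): hyperedges of H_IF sharing at least one node with X_e."""
    X_n1 = set().union(*(H_e[e] for e in X_e_ids))
    return {e for e, nodes in H_e.items() if nodes & X_n1}


def strict_erosion(H_e, X_n):
    """Hyperedges of H_IF whose nodes all belong to X_n."""
    return {e for e, nodes in H_e.items() if nodes <= X_n}


# Hypothetical toy data: e1 is the only priority sentence.
H_e = {
    "e1": {"player", "win"},
    "e2": {"player", "arrest"},
    "e3": {"weather", "report"},
    "e4": {"win", "match"},
}
X_e_ids = {"e1"}
X_n = {"player", "win"}

dilated = delta_dilation(H_e, X_e_ids)
eroded = strict_erosion(H_e, X_n)
```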

ε n (X e )-Erosion w.r.to node
The erosion ε n (X e ) can be written as follows: Take all hyperedges X e . Take its complement edges X e' in H IF . Take all nodes X n which are not in X e ∩ X e' , i.e., nodes of X e that are not shared with any edge of X e' . This retrieves all priority sentences which do not overlap with any non priority sentences. Now take the priority words in them, as shown in Fig. 9.

ε(X e ) -Hyperedge Erosion
The erosion ε(X e ) is defined as follows: Take all nodes in ε n (X e ). Take all edges from X IF which fully contain these nodes. This retrieves all priority sentences which do not overlap with the non priority sentences. This is illustrated in Fig. 10.
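The two erosions ε n (X e ) and ε(X e ) can be sketched together. This is one possible reading, assuming that "overlap" means sharing at least one node with an edge outside X e ; the toy data and function names are hypothetical.

```python
def non_overlapping_edges(H_e, X_e_ids):
    """Priority edges that share no node with any edge outside X_e."""
    complement_nodes = set().union(
        *(nodes for e, nodes in H_e.items() if e not in X_e_ids)
    )
    return {e for e in X_e_ids if not (H_e[e] & complement_nodes)}


def erosion_nodes(H_e, X_e_ids, deg):
    """epsilon_n(X_e): priority words of the non overlapping priority edges."""
    clean = non_overlapping_edges(H_e, X_e_ids)
    return {n for e in clean for n in H_e[e] if deg[n][0] > 0.5}


def erosion_edges(H_e, X_e_ids):
    """epsilon(X_e): the non overlapping priority edges themselves."""
    return non_overlapping_edges(H_e, X_e_ids)


# Hypothetical toy data: e1 and e2 are priority edges, e3 is not.
H_e = {
    "e1": {"player", "win"},
    "e2": {"goal", "score"},
    "e3": {"player", "arrest"},
}
deg = {
    "player": (0.9, 0.1), "win": (0.8, 0.1), "goal": (0.8, 0.1),
    "score": (0.5, 0.5), "arrest": (0.2, 0.7),
}
X_e_ids = {"e1", "e2"}
nodes_out = erosion_nodes(H_e, X_e_ids, deg)
edges_out = erosion_edges(H_e, X_e_ids)
```

Here e1 is dropped because it shares "player" with the non priority sentence e3, so only e2 and its priority word survive.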

[δ,∆](X IF ) -Dilation
This dilation can be written as follows: As seen in Fig. 11, it is obtained by joining ∆(X e ) and δ e (X n ). Take all edges which are common to δ e (X n ) and ∆(X e ). Include all such hyperedges and their nodes in the output. For the other edges in δ e (X n ), include only their nodes. This retrieves all sentences which overlap with the priority sentences, together with the words in them. It also retrieves all words in sentences which have both priority and non priority words and which do not overlap with others.

Implementation
The summarization, as shown in Fig. 12 and Algorithm 1, is implemented as a filter system developed in Python, with English news taken from online news sites as input. The English news on various topics is subjected to stop word removal and stemming. The preprocessed text is then represented as a weighted hypergraph (Dhanya et al., 2017). The weighted hypergraph is subjected to spectral partitioning, which leads to text clusters. The summary filter is then applied to each cluster formed. The sentences which do not fall under any of the clusters are treated as outliers and are removed. A Malayalam summarization system has also been developed using the same method, where a Malayalam stemmer (Dhanya et al., 2018c) is used to stem the words.

Filter Design
A filter is an operator which is idempotent and increasing on a domain D. Let X IF be the sub IFHG defined in section 4. If f(f(X IF )) = f(X IF ), then f is idempotent. If, for sub IFHGs X and Y, X ⊂ Y implies f(X) ⊂ f(Y), then f is increasing. f is a filter if both of these are satisfied. Let ε be the erosion operator and δ be the dilation operator. Then ε○δ is an operator, and if ε○δ(ε○δ(X)) = ε○δ(X), then ε○δ is a filter. That is, the filter here consists of an erosion composed with a dilation, or equivalently a dilation followed by an erosion. Such a filter can be used for text summarization. Text summarization can basically be considered as a filter which removes all unwanted sentences from a text. Equivalently, summarization is a filter operator which selects only the needed sentences from the given text.
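The two filter properties can be checked by brute force on a small example. The sketch below does not use the operators of this paper; it assumes a standard node/edge adjunction on a hypothetical toy hypergraph (dilation sends a node set to the edges touching it, erosion sends an edge set to the nodes whose incident edges all lie in it) and verifies that their composition ε○δ is idempotent and increasing.

```python
from itertools import combinations

# Hypothetical toy hypergraph: edge id -> nodes.
H_e = {"e1": {"a", "b"}, "e2": {"b", "c"}, "e3": {"d"}}
NODES = frozenset().union(*H_e.values())


def dilate(X):
    """Edges that contain at least one node of X."""
    return frozenset(e for e, nodes in H_e.items() if nodes & X)


def erode(E):
    """Nodes whose incident edges all belong to E."""
    return frozenset(
        n for n in NODES
        if all(e in E for e, nodes in H_e.items() if n in nodes)
    )


def closing(X):
    """Dilation followed by erosion."""
    return erode(dilate(X))


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


ALL = subsets(NODES)
idempotent = all(closing(closing(X)) == closing(X) for X in ALL)
increasing = all(closing(X) <= closing(Y) for X in ALL for Y in ALL if X <= Y)
```

Exhaustive checking is only feasible for tiny examples, but it is a convenient sanity test when experimenting with new operator combinations.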

Summary Filter
Text summarization can be done with the help of this filter operator, applied to the intuitionistic fuzzy hypergraph created from the text under consideration. The filter is designed as a combination of two morphological operators, namely dilation and erosion. Here dilation is designed as a conditional one, as explained in section 5.2.1, and erosion, explained in section 5.2.2, is designed as one which performs a complement operation. For implementing this conditional dilation, let us assume that our text contains certain star words, whose occurrence in sentences is valid even if they co-occur with non priority words. So for this summary filter, let us assume that our text consists of words of high priority, words of low priority, words with neutral priority and star words. Let us redefine the intuitionistic fuzzy hypergraph as [H IF , (µ n , γ n ), (µ e , γ e ), H n , H *n , H e , H *e ], where H *n is the star node and H *e is the edge which has the star node H *n . These star words are domain independent. Some of the star words are given in Table 4. Sentences which contain star words are definitely included in the summary text. To illustrate this, let us add one more sentence to our sample text: "The arrest of the famous player.....". This results in a new hyperedge with the following nodes: famous-n 21 , player-n 8 , arrest-n 15 . The modified intuitionistic fuzzy hypergraph after the addition of the above sentence is given in Fig. 13. The sub IFHG X IF is also modified, since it now includes the star nodes. The modified X IF is shown in Fig. 14.

Conditional dilation -δ c (X IF )
This conditional dilation is applied such that, while dilating the sub IFHG X IF , we consider the condition specified by c, where c is designed to select all hyperedges in H which contain star nodes given in Table 4. The conditional dilation retrieves from the intuitionistic fuzzy hypergraph all edges H *e which contain star nodes H *n , as shown in Fig. 15. Even though the non membership degree of the edge H *e is 0.7, it is retrieved by the dilation operation, since it contains the star node H *n .
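A minimal sketch of the conditional dilation is given below. The star word list is a hypothetical stand-in for Table 4, and the edge ids are illustrative.

```python
STAR_WORDS = {"famous"}  # hypothetical stand-in for the star words of Table 4


def conditional_dilation(H_e, star_words=STAR_WORDS):
    """delta_c(X_IF): all hyperedges of H_IF that contain a star node."""
    return {e for e, nodes in H_e.items() if nodes & star_words}


# Hypothetical toy data: e8 mirrors the added sample sentence.
H_e = {
    "e1": {"player", "win"},
    "e7": {"police", "arrest"},
    "e8": {"famous", "player", "arrest"},
}
star_edges = conditional_dilation(H_e)
```

The membership degrees play no role here: e8 is selected despite containing the non priority word "arrest", which matches the behaviour described above.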

Erosion -ε(H *e , X e )
This erosion retrieves all edges ε' from H IF which are not in H *e . It also takes all edges ε" from H IF which are not in X e . The intersection of the two yields the non priority edges. The complement of this yields the priority edges of the hypergraph H IF . This erosion eliminates all duplicate edges from H *e and X e and retrieves the most important sentences, which is the required feature of a summary. This erosion can be written as ε(H *e , X e ) = (H *e' ∩ X e' )', where H *e' is the complement of H *e and X e' is the complement of X e . The intuitionistic fuzzy sub hypergraph retrieved after filtering is shown in Fig. 16.

Algorithm 1 Summarization of text
1: Collect news related to various topics from online sites
2: Preprocess the sentences by subjecting them to stop word removal and stemming
3: Create weighted hypergraph H wτ of the text τ
4: Cluster the text τ using spectral partitioning (Dhanya et al., 2017) of hypergraph H wτ
5: for each cluster C i do
6:    Assign µ(n j ) and γ(n j ) for all words in C i
7:    Assign µ(e j ) and γ(e j ) for all sentences in C i
8:    Create intuitionistic fuzzy hypergraph H IF with nodes H n having (µ(n j ), γ(n j )) and hyperedges H e having (µ(e j ), γ(e j ))
9:    Create subgraph X IF of H IF with hyperedges X e having µ(e j ) > 0.5 and nodes X n having µ(n j ) > 0.5
10:   Apply conditional dilation H *e = δ c (X IF )
11:   Apply erosion ε(H *e , X e ) to form the summary
12: end for
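The complement-based erosion can be sketched directly with set differences. The function name and toy edge ids are hypothetical; the steps follow the verbal description above (complement of each set, intersection, complement again).

```python
def summary_erosion(H_e, star_edges, X_e):
    """Complement of (complement of H*_e intersected with complement of X_e)."""
    all_edges = set(H_e)
    not_star = all_edges - star_edges        # edges not in H*_e
    not_priority = all_edges - X_e           # edges not in X_e
    non_priority_edges = not_star & not_priority
    return all_edges - non_priority_edges    # edges kept in the summary


# Hypothetical toy data.
H_e = {
    "e1": {"player", "win"},
    "e7": {"police", "arrest"},
    "e8": {"famous", "player", "arrest"},
}
star_edges = {"e8"}   # output of the conditional dilation
X_e = {"e1"}          # priority edges of X_IF
summary = summary_erosion(H_e, star_edges, X_e)
```

By De Morgan's law the result equals the union of the star edges and the priority edges, and because sets are used, an edge present in both appears only once.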

Advantages Over Existing Systems
The summarization system designed here as a filter applied on an IFHG has several advantages over existing summarization methods. These are listed below.

Variety of Summary Filters
A filter is basically a composition of dilation and erosion or of erosion and dilation. The proposed method helps in creating a series of different types of filters by combining the morphological operators such as dilation and erosion discussed in section 4. Using these different types of filters, different types of summaries can be generated. Some of the filter designs other than the one discussed in section 5 are shown below:
• Filter 1 -δ(ε n (X e )) This filter is a composition of erosion ε n (X e ) and dilation δ. The erosion retrieves all nodes which are not in X e ∩X e' . The dilation operation then retrieves all hyperedges H e which contain the nodes retrieved by the erosion operator. This summary filter retrieves all sentences from the text with at least one priority word. However, this summary considers star words only if they are part of priority edges in X. This summary is comparatively long.
• Filter 2 -ε(δ n (X e )) This is a composition of dilation δ n (X e ) and erosion ε. The dilation operator retrieves the collection of priority nodes within priority edges. The erosion operator retrieves all hyperedges H e in H which consist of only the nodes returned by the dilation operator. This summary retrieves only pure priority sentences that have no non priority words in them. This is a very short summary.
• Filter 3 -ε(δ e (X n )) This is a composition of dilation δ e (X n ) and erosion ε. The dilation defined by δ e (X n ) takes all nodes in X and retrieves all edges from H which contain these nodes. The erosion takes the double complement of δ e (X n ). This is also a very short summary and it is very similar to the summary generated in section 5.2. More filters can be designed by combining the morphological operators defined in sections 4.1 to 4.8, resulting in the generation of different types of summaries.

Customized Summary
The summary generated by the filter is a customized one, as it requires the priorities of the user to be submitted before the summary is generated. Thus the summary is not a blind one, as it takes into consideration the preferences of the reader. The reader can give the priority and non priority words as input and the summary will be generated accordingly. The summary report is therefore tailored to the reader's stated preferences.

Result Analysis
The system is tested on Google Cloud Platform with 8 cores and 30 GB memory. The proposed system is compared with existing online text summarization systems, namely tools4noobs, summarization.net and splitbrain.org/services, on various data sets. The data sets consist of English news taken from online news sites. The news belongs to various domains such as travel, politics, health, sports and gadgets. The data is uploaded to the Mendeley repository. First, the news is subjected to clustering and then to summary generation using the IFHG method. The summaries generated by each of the above systems are compared with human created summaries. About 50 human summarizers were asked to create summaries for each data set. The sentences repeated most often among the 50 summaries are collected to form the final human summary, with which the existing systems and the IFHG method are compared. The Rouge-L, Rouge-2 and Rouge-1 scores are calculated and summarized in Tables 5 to 7. In these tables, 'P' stands for precision, 'R' stands for recall and 'F' stands for F-measure. The proposed work has shown an average precision of 0.88, average recall of 0.84 and average F-measure of 0.86. The similarity of the output of the proposed system and of the three online systems to the human summaries is shown in Fig. 17. For all the data sets, the system has generated summaries which have more than 90% similarity with human summaries.
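For readers unfamiliar with the reported measures, the following sketch computes a simplified Rouge-1 precision, recall and F-measure from clipped unigram overlap. It is an illustration only, assuming whitespace tokenization and lowercasing, and is not the official ROUGE toolkit or necessarily the tool used for the reported scores; the example sentences are hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, f_measure) from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-word minimum of the two counts
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical system summary versus hypothetical human summary.
p, r, f = rouge_1("the player won the match", "the player won")
```

The F-measure is the harmonic mean of precision and recall, which is consistent with the averages reported above: 2 × 0.88 × 0.84 / (0.88 + 0.84) ≈ 0.86.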

Conclusion
The system developed here has successfully modeled text using an IFHG, where words become nodes and sentences become hyperedges. Membership degrees and non membership degrees are assigned to nodes. Based on these, the membership degrees and non membership degrees of hyperedges are calculated. Various morphological operations are defined on the IFHG. The summary of the text is created by applying a filter operator on the IFHG. The system has given better performance when compared to other existing systems, and the summary filter has shown higher similarity with human generated summaries. The system combines multiple texts and treats them as a single one. The system can also be extended to multiple documents, where important words can be modeled as nodes and documents as hyperedges. In our system, there is only a single sub IFHG on which morphological operations are defined. Other enhancements, such as creating more than one sub IFHG and defining morphological operations on the intersection or union of those, are also possible. All these are left as future enhancements of the proposed work.