Fuzzy Swarm Based Text Summarization

: Problem statement: The aim of automatic text summarization systems is to select the most relevant information from an abundance of text sources. A daily rapid growth of data on the internet makes the achieve events of such aim a big challenge. Approach: In this study, we incorporated fuzzy logic with swarm intelligence; so that risks, uncertainty, ambiguity and imprecise values of choosing the features weights (scores) could be flexibly tolerated. The weights obtained from the swarm experiment were used to adjust the text features scores and then the features scores were used as inputs for the fuzzy inference system to produce the final sentence score. The sentences were ranked in descending order based on their scores and then the top n sentences were selected as final summary. Results: The experiments showed that the incorporation of fuzzy logic with swarm intelligence could play an important role in the selection process of the most important sentences to be included in the final summary. Also the results showed that the proposed method got a good performance outperforming the swarm model and the benchmark methods. Conclusion: Incorporating more than one technique for dealing with the sentence scoring proved to be an effective mechanism. The PSO was employed for producing the text features weights. The purpose of this process was to emphasize on dealing with the text features fairly based on their importance and to differentiate between more and less important features. The fuzzy inference system was employed to determine the final sentence score, on which the decision was made to include the sentence in the summary or not.


INTRODUCTION
The aim of automatic text summarization systems is to select the most relevant information from an abundance of text sources. A daily rapid growth of data on the internet makes the achieving of such aim as a big challenge. To overcome that challenge, much work is still required and may be it is beyond of only one technique. Therefore in this study, we investigate the incorporation of fuzzy logic with swarm intelligence. In automatic text summarization where the sentence score is based on the weights of the features, choosing those weights can be imprecise and uncertain, by the incorporation of fuzzy logic with swarm intelligence, so that risks, uncertainty, ambiguity and imprecise values can be flexibility tolerated.
Automatic text summarization researchers since Luhn's research [1] , they are trying to solve or at least relieve the challenge by proposing techniques for generating summaries. The summaries serve as quick guide to the interesting information, providing a short form for each document in the document set; reading summary makes decision about reading the whole document or not and it also serves as time saver.
The machine learning approaches [2][3][4][5][6][7][8] have proven their ability in improving the summarization performance. In our previous study [9] , we used Particle Swarm Optimization (PSO) [10] as machine learning for features selection problem in order to study the feature structure effect on the feature selection, one of the results obtained from that study is the learned features weights. In this research, we will try to apply the features weights produced by PSO in a proposed method for automatic text summarization problem. Based on our literature survey for applying fuzzy swarm for text summarization problem, we found that fuzzy swarm has not been introduced for such problem. PSO was successfully applied in some related problems like text classification and data clustering and could achieve high performance. Ziegler and Skubacz [11] proposed a method for extracting the text content from news web pages, where the web pages are in HTML format, particle swarm optimization was used to learn features thresholds from training set, the features having score greater or equal PSO-determined threshold are counted for their text to be classified as signal text and the features having score less than such threshold are not counted to their text and due to the later, the text may be classified as noise. The signal text is extracted as text content. Merwe and Engelbrecht [12] introduced PSO based clustering algorithm, where each particle in PSO represents a different group of centroids. Also in the same study, a hybrid PSO and K-means as clustering algorithm was investigated, where the Kmeans is run first and its results (clusters centroids) are used for initialing one particle in PSO while the remaining particles are initialized randomly. Cui et al. [13] have applied PSO for documents clustering. The particle components are same as they were used in [12] , where each particle consists of a number of centroids. The researchers have taken into account execution time of PSO compared with K-means which has less execution time in the dealing with large data set, so they introduced a hybrid of PSO and K-means, where the role of PSO is to determine the initial centroids for K-means algorithms. Wang et al. [14] presented PSO for web document classification, the highest weight terms were used as feature for the classification. The PSO was adopted for the classification problem, where each particle represents those highest weight terms. F-mean measure for precision and recall was used as fitness function.
Applying the fuzzy logic for text summarization still needs more investigation; a few studies were done in this direction, here we present some works which used fuzzy IF-THEN rules for scoring the sentences, Kiani-B and Akbarzadeh-T [15] presented text summarization system in which the features are used as input for the fuzzy system, based on the fuzzy rules each sentence receives score in the range between zero and one, the fuzzy rules were optimized using hybrid GA and GP. Kyoomarsi et al. [16] proposed fuzzy logic based text summarization, following Kiani-B and Akbarzadeh-T's way [15] , the difference between these two studies is in the later, the fuzzy rules were not optimized.
The good performance of PSO and fuzzy logic in above studies promises that their integration in single method can do well for automatic text summarization as well.

Particle Swarm Optimization (PSO):
Particle swarm optimization [10] originally relates to artificial life (Alife) in general and specifically it connects with bird flocking and fish schooling [17] . The Intelligence in PSO as any other swarm technique is a collective intelligence resulting in the collective behaviors of (unsophisticated) individuals interacting locally and with their environment causing coherent functional global patterns to emerge. Particle Swarm Optimization (PSO) which is inspired by the social behavior of bird flocking or fish schooling and Ant Colony Optimization (ACO) which is inspired by behavior of ants are the primary computational parts of swarm intelligence.
Particle swarm optimization was introduced by [10] as a stochastic, population-based evolutionary algorithm for problem solving. The key idea of PSO method is to simulate the shared behavior happening among the birds flocks or fish school.
PSO depends on its methodology, a population of individuals to discover favorable regions of the search space. Every member in the population is called particle and the group of all particles is called a swarm. Each particle flies in the search space with a velocity that is dynamically adjusted according to its own flying experience and its companions' flying experience and retains the best position it ever encountered in memory. The best position ever encountered by all particles of the swarm is also announced to all particles. The study of PSO starts by initially randomizing a group of solutions (particles), the swarm updates its best value every cycle based on (1) and (2) and then after several iterations finds the optimized solution: Where: V id (t) = The velocity of the particle i in the time point t in the search space along the dimension d P id (t) = The best position in which the particle previously got high fitness value, it is called pbest x id (t) = The current position of the particle i in the search space r 1 and r 2 = Random generated numbers in the range [0,1] p gd (t) = The overall best position in which a particle got best fitness value, it is called the gbest c 1 and c 2 = Acceleration parameters W = Inertia weight, its value is decreased linearly over the time from 0.9-0.4 [18] : Where: X id (t+1) = The new position which the particle must move to x id (t) = The current position of the particle V id (t+1) = The new velocity of the particle resulting in the calculation in (1) which mainly determines the new position of the particle The velocity of the particle must be in the range There are two types of PSO: Continuous particle swarm optimization which is to optimize continuous nonlinear problems [10] and binary particle swarm optimization [19] which is extension of continuous PSO, in which the particle position is represented as bit string rather than real numbers; the update of the position in continuous PSO is done directly by adding the velocity to the previous position but in binary PSO, the velocity is used only in the sigmoid function as in (3) to calculate the probability of the bit value to be changed to 1 or 0, where the value retrieved from the sigmoid function is compared with random generated value in the range between zero and one: Text Features: The features used in this study are five [9] : • Sentence Centrality: The sentence centrality consists of three features: The similarity, shared friends and shared n-grams between the sentence in hand and all other document sentences, normalized by n-1, n is the number of sentences in the document • Title feature: This feature is formed as average of two features which are title-help sentence (THS): The sentence containing n-gram terms of title and title-help sentence relevance sentence (THSRS): The sentence containing n-gram terms of any titlehelp sentence Where: 0.1 = Minimum score the sentence gets in case it's terms are not important W ij = As in (5) is the term weight (TF-ISF) of the term t ij in the sentence s 1 LS = Summary length and HTFS is highest term weights (TF-ISF) summation of a sentence in the document: • Key word feature: The top 10 words whose high TF-ISF score are chosen as key words • The similarity to first sentence: This feature is to score the sentence based on its similarity to the first sentence in the document, where in news article, the first sentence in the article is very important sentence Fuzzy logic: The term "fuzzy logic" resulted in the development of the theory of fuzzy sets by Zadeh [20] . Due to the limitation of classic logic is that deals only with two values, true or false created the need for extending it to be able to handle the partial truth (neither completely true nor completely false). The fuzzy logic is extension of the classical logic in form of generalization of the classical logic inference rules (like modus ponens, modus tollens and hypothetical syllogism) which has ability to deal with approximate reasoning [21] . The fuzzy set is an elaboration for the traditional set "crisp set" in which each member has a degree of membership to that set determined by membership function. The membership function is a function assigns membership degree to each member in the target set, the range of membership degree between zero and one. The computer can translate linguistic statement into actions based on a set of such IF-THEN rules of the fuzzy logic. The fuzzy IF-THEN rules are normally created as the form "if A then B" in which the condition is connected with actions, where A and B are fuzzy sets. The fuzzy logic has advantage in terms of simplicity of development and modification because the rules are well understandable and easy to modify, add new rules or remove existing rules. The typical fuzzy inference system consists of the following stages: Fuzzification: Fuzzification is a kind of uncertainty that requires fuzzy sets [21] , in which the input values are translated into grades of membership in the range between zero and one for linguistic terms of fuzzy sets using a membership function which is used to assign a grade to each linguistic term. Inference: The inference is the core part of a fuzzy system, which merges the facts obtained from the fuzzification part with a series of production rules to perform the fuzzy reasoning process [21] .
The most important fuzzy inference methods are Mamdani's fuzzy inference method [22] and Sugeno or Takagi-Sugeno-Kang method [23] . The rule consequent in Mamdani fuzzy systems is represented using fuzzy sets, while in Takagi-Sugeno-Kang fuzzy systems; it is form as linear functions of input variables. Typical fuzzy system is shown in Fig. 1.

Defuzzification:
The goal of defuzzification is to convert the fuzzy results of the inference into a crisp output.
The swarm based summarization: The swarm model is defined as combination of text features scores as in (6), where those features scores are adjusted using the weights resulting in the training of the Particle Swarm Optimization (PSO).
Therefore the first part in this model is for training PSO, 100 documents were selected from Document Understanding Conference (DUC) [24] data collection, DUC 2002 and used as training and testing data.  For more details and the experimental results refer to [25] . In this study, we employ the fuzzy logic for scoring the sentences instead of the formula in (6).

The fuzzy swarm based summarization:
To implement our fuzzy system, we use built-in Mamdani's fuzzy inference method [22] of Matlab fuzzy logic toolbox. Below the main parts of the fuzzy inference process are described:

Fuzzification:
The inputs are crisp numerical values of five features used in our, those values are limited to the universe of discourse in the range [0, 1]. The features values are adjusted using the weights resulting in the training of the particle swarm optimization (PSO; this forms the central point of merging of the fuzzy logic with swarm intelligence. To determine the degree to which the input values belong to each of the appropriate fuzzy sets, we use the trapezoidal membership function due to its simplicity and widely use. Three fuzzy sets are used: Low, medium and high. The trapezoidal membership function contains four parameters (a, b, c and d) with the four breakpoints of the trapezium which determine the shape of the function. Moreover the membership function is described by the two indices i and j. For example, the membership function A ij (a ij , b ij , c ij , d ij ) belongs to the i th fuzzy set and the j th input variable. B i (a i , b i , c i , d i ) is the output membership function of the i th fuzzy set. The rapezoidal curve is a function of a vector, x, (the j th fuzzy variable) in the i th fuzzy set and depends on the four scalar parameters a, b, c and d, as given by: where, ij ij ij ij a b c d ≤ ≤ ≤ must hold.
Or in short form: The parameters a and d locate the "feet" of the trapezoid and the parameters b and c locate the "shoulders." The output of the trapezoidal membership function is a fuzzy degree of membership (in the range [0, 1]) in the fuzzy set. Figure 5 shows the membership functions of fuzzification of the input value of the Sentence Centrality feature (SC).

Inference:
The facts resulted in the fuzzification step need to be merged with a series of the production rules (IF-THEN rules) to perform the fuzzy reasoning process; we defined around 200 IF-THEN rules for that purpose. The following is an example for those rules:

• If (WSS is H) and (SC is H) and (S_FD is M) and (SS_NG is H) and (KWRD is H) then (output is important)
The antecedent of the fuzzy rule in this example has more than one part. To get the output of such antecedent rule, the fuzzy operator is applied to obtain one number which will then be applied to the output function. We use AND operator, it was set as min (minimum) to select the antecedent part with minimum value as output of the antecedent rule. The output of the antecedent rule is used as input for implication process. In this process, we use min (minimum) to reshape the output fuzzy set by truncating it; the output fuzzy membership function is used in this study is the trapezoid membership function as shown in the Fig. 6.
The implication process is implemented for each fuzzy rule. The next sub-step in the inference process is aggregation of outputs of all fuzzy rules and combining them into a single fuzzy set which represents the final output variable.
Defuzzification: The last step in the fuzzy inference process is the defuzzification which is to convert the fuzzy results of the inference into a crisp output which represents the final score of the sentence. We use the centroid method [26] for defuzzification Eq. 9, which returns the center (one crisp number) of the area under the curve of the output fuzzy set resulting in the aggregation process: Where: z = The center of mass u c = The membership in class c at value z j After getting the scores of all sentences produced by the fuzzy inferences system, the sentences are reranked based on those scores in descending order, then the top n sentences are selected as summary, where n is equal to the compression rate which is 20% of the total number of the document sentences.

Generalizing the proposed method results via confidence limits:
The aim of generalization is to get one value which can express all values in the population of the results. For each summary, evaluation values (recall, precision and f-measure) are created using the evaluation measure ROUGE [27] . Measuring the performance of the proposed method needs to check each evaluation value separately. Doing so is tough job and a waste of resources. The solution is to use the sample of results (summaries evaluation values) to calculate a range within which any value in the population is likely (95% of the time) to fall. Therefore the range is called the 95% confidence interval. The minimum and maximum values in that range are called the confidence limits. The interval is all values between the confidence limits. The ROUGE [27] generalizes the evaluation results using bootstrapping (resampling) method.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) toolkit [27] is used for evaluation, where ROUGE compares a system generated summary against a human generated summary to measure the quality. ROUGE is the main metric in the DUC text summarization evaluations. It has different variants, in our experiment, we use ROUGE-N (N = 1 and 2) and ROUGE-L, the reason for selecting these measures is what was reported by same study [27] that those measures work well for single document summarization.
In DUC 2002 document sets, each document set contains two model or human generated summaries for each document. We gave the names H1 and H2 for those two model summaries. The human summary H2 is used as benchmark to measure the quality of our proposed method summary, while the human summary H1 is used as reference summary. Beside the human with human benchmark (H2-H1) (H2 against H1); we also use another benchmark which is MS word summarizer (Msword). Table 1 shows a comparison between the proposed method evaluation and the other three methods (the swarm model and the two benchmarks (Msword and H2-H1)) based on the average recall, precision and F-measure using ROUGE-1, ROUGE-2 and ROUGE-L, where those averages for the four methods (the proposed method, the swarm model, Msword and H2-H1) were generalized using the confidence limits (95%-confidence interval). The Fig. 7-9 visualize the same results drawn in the Table 1. The purpose of using the human summarizer (H2-H1) as benchmark is to show how much the performance of the proposed method, the swarm model and Msword summarizers is acceptable compared with that performance of the human (H2-H1). Based on the generalization of the results obtained by the four methods (the proposed method, the swarm model, Msword and H2-H1), the proposed method performs better than the swarm model and the Msword summarizer. The low overlapping between the summaries generated manually by the human makes achieving high evaluation values difficult even for human summarizer; we found that the overlapping between the two human summaries which we used in this study is 49% similar to each other.

DISCUSSION
The experimental results lead to two interesting observations. Firstly, the proposed method based on fuzzy logic and swarm intelligence (PSO) for text summarization problem showed good performance compared to other methods used in this study. Secondly, the low overlap between the summaries generated manually by the humans made achieving high evaluation values difficult. For instance, we found that the overlapping between the two human summaries (H2 and H1) which we used in this study is 49% similar to each other. The weights suggested by PSO promoted the scores of the highly important features, which give each text feature the right score it was worth. The experimental results supported the incorporation of fuzzy logic with swarm intelligence to make the risks, uncertainty, ambiguity and imprecise values for choosing the weights (scores) of the text features to be flexibly tolerated. For our future study, we will incorporate the proposed method with diversity based methods in a different hybrid model.

CONCLUSION
In this study, we introduced a method based on fuzzy logic and swarm intelligence (PSO) for text summarization problem. The weights suggested by PSO were used to adjust the text features scores. The fuzzy inference system was employed to use the adjusted features scores as inputs, based on which the sentences are evaluated and the most relevant sentences are selected to be included in the summary. The results showed that the proposed method has better performance outperforming the swarm model and the benchmark methods used in this study.