A Review on Automatic Text Summarization Approaches

Abstract: It has been more than 50 years since the initial investigation on automatic text summarization was started. Various techniques have been successfully used to extract the important content from text documents to represent a document summary. In this study, we review some of the studies that have been conducted in this still-developing research area. The review covers the basics of text summarization, the types of summarization, the methods that have been used and some areas in which text summarization has been applied. Furthermore, this paper reviews the significant efforts that have been put into studies concerning sentence extraction, domain specific summarization and multi document summarization and provides the theoretical explanation and the fundamental concepts related to them. In addition, the advantages and limitations of the approaches commonly used for text summarization are highlighted.


Introduction
It has been more than 50 years since Luhn started his initial investigation on automatic text summarization (Luhn, 1958). Since then, various techniques have been successfully used to extract the important content from a text document to represent the document summary (Gupta and Lehal, 2010; Nenkova and McKeown, 2011; Saggion and Poibeau, 2013). The aim of automatic text summarization is similar to the reason why humans create summaries, i.e., to produce a shorter representation of the original text. Over the years, a number of researchers have defined a summary from their own perspective. For instance, Sparck Jones defines a summary as a "reductive transformation of source text to summary text through content reduction by selection and generalization on what is important in the source" (Jones, 1999). Hovy defines a summary as "a text that is produced from one or more texts, that convey important information in the original text(s) and that is no longer than half of the original text(s) and usually significantly less than that" (Hovy, 2005).
Automatic text summarization systems can be categorized into several different types (Nenkova and McKeown, 2012; Saggion and Poibeau, 2013). They are generally categorized by input type (single or multi document), purpose (generic, domain specific or query-based) and output type (extractive or abstractive).
Single document summarization produces a summary of a single input document. Multi document summarization, on the other hand, produces a summary of multiple input documents. These multiple inputs are often documents discussing the same topic. Many of the early summarization systems dealt with single document summarization.
The purpose of generic summarization is to summarize all texts regardless of topic or domain; i.e., generic summaries make no assumptions about the domain of the source information and view all documents as homogeneous texts. The majority of the work that has been done revolves around generic summarization (Nenkova and McKeown, 2011). There have also been summarization systems centred upon particular domains of interest, for example, systems that summarize finance articles, biomedical documents, weather news, terrorist events and many more (Radev and McKeown, 1998; Verma et al., 2007; Wu and Liu, 2003). Often, this type of summarization requires domain specific knowledge bases to assist the sentence selection process. A query-based summary contains only the information queried by the user. The queries are typically natural language questions or keywords related to a particular subject. For instance, the snippets produced by search engines are an example of a query-based application (Nenkova and McKeown, 2011).
Extractive summaries, or extracts, are produced by identifying important sentences which are directly selected from the document. Most of the summarization systems that have been developed produce extractive summaries (Aliguliyev, 2009; Ko and Seo, 2008). In abstractive summarization, the selected document sentences are combined coherently and compressed to exclude unimportant sections of the sentences (Ganesan et al., 2010; Khan et al., 2015).
This study focuses on extractive text summarization and primarily reviews approaches concerning sentence extraction, domain specific summarization and multi document summarization.
The following section presents the approaches to sentence extraction. The discussion on domain specific summarization is given next, followed by the discussion on multi document summarization approaches. The paper ends with a conclusion.

Approaches to Sentence Extraction
The key concept of extractive summarization is to identify and extract important document sentences and put them together as a summary; i.e., the generated summary is a collection of original sentences. There are several approaches to sentence extraction. The following subsections will describe three approaches, namely, frequency based approach, feature based approach and machine learning based approach.

Frequency Based Approach
As discussed in the introduction, the early work on text summarization, pioneered by Luhn, assumed that important words in a document are repeated many times compared to the other words in the document (Luhn, 1958). Luhn thus proposed using word frequency to indicate the importance of sentences in a document. Since then, many summarization systems have used frequency based approaches in their sentence extraction process (Klassen, 2012). Two techniques that use frequency as a basic measure in text summarization are word probability and term frequency-inverse document frequency.

A. Word Probability
One of the simplest ways of using frequency is to take the raw frequency of a word, i.e., to simply count each occurrence of the word in the document. However, this measure can be greatly influenced by the document length. One way to adjust for the document length is to compute the word probability. The probability f(w) of a word w is given by Equation 1:
$$f(w) = \frac{n(w)}{N} \tag{1}$$
Where:
n(w) = The frequency count of the word w in the document
N = The total number of words in the document
The findings from the analysis carried out by Nenkova et al. (2006) on human-written summaries indicate that people tend to use word frequency to determine the key topics of a document. SumBasic (Nenkova and Vanderwende, 2005) is an example of a summarization system that exploits word probability to create summaries. The SumBasic system first computes the word probability (as given in Equation 1) from the input document. Then, for each sentence S_j, it computes the sentence weight as a function of word probability, namely the average probability of the words in the sentence (Equation 2):
$$\mathrm{weight}(S_j) = \frac{\sum_{w \in S_j} f(w)}{|\{w \mid w \in S_j\}|} \tag{2}$$
Based on the sentence weight, it then picks the best scoring sentences. Despite its simplicity (using only word probability), the SumBasic system was able to perform well in the Document Understanding Conference (DUC) 2004.
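The word probability scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the SumBasic implementation: the tokenizer, the function names and the choice of the average word probability as the sentence weight are assumptions made for the example, and any further steps of the full SumBasic system that are not described in the text are omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_probabilities(sentences):
    """Word probability as in Equation 1: f(w) = n(w) / N over the whole document."""
    words = [w for s in sentences for w in tokenize(s)]
    counts = Counter(words)
    total = len(words)
    return {w: c / total for w, c in counts.items()}


def sentence_weight(sentence, probs):
    """Sentence weight, taken here as the average probability of its words (assumption)."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(probs.get(w, 0.0) for w in words) / len(words)


def rank_sentences(sentences, k=1):
    """Return the k sentences with the highest weight."""
    probs = word_probabilities(sentences)
    ranked = sorted(sentences, key=lambda s: sentence_weight(s, probs), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    document = ["the cat sat", "the dog", "a bird flew away"]
    print(rank_sentences(document, k=1))
```

Because the weight is an average, short sentences made of frequent words are favoured; a practical system would usually remove stop words first.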

B. Term Frequency-Inverse Document Frequency
Term frequency-inverse document frequency (tf-idf) has traditionally been used in information retrieval to deal with frequently occurring terms or words in a corpus consisting of related documents (Jurafsky and Martin, 2009). Its purpose was to address the following question: Are all content words that frequently appear in documents equally important? For instance, a collection of news articles reporting on an earthquake disaster will obviously contain the word 'earthquake' in all documents.
Thus the idea of tf-idf is to reduce the weight of frequently occurring words by comparing their proportional frequency in the document collection. This property has made tf-idf one of the most widely used weighting schemes in extractive summarization (Filatova and Hatzivassiloglou, 2004; Fung and Ngai, 2006; Galley, 2006; Hovy and Lin, 1998). Here, the term frequency (tf) is defined as:
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \tag{3}$$
Where n_{i,j} represents the frequency count of the word i in document j.
Each word count is then normalized by the total number of words in document j. This term weight computation is similar to the word probability computation given in Equation 1. Next, the inverse document frequency (idf) (Jones, 1988) of a word i is computed:
$$\mathrm{idf}_{i} = \log \frac{|D|}{|\{d : t_i \in d\}|} \tag{4}$$
where the total number of documents in the corpus, |D|, is divided by the number of documents that contain the word i. Based on Equations 3 and 4, the tf-idf of word i in document j is computed:
$$\text{tf-idf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_{i} \tag{5}$$
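The tf-idf computation can be illustrated with a short Python sketch. The function names are hypothetical, documents are assumed to be pre-tokenized lists of words, and the natural logarithm in the idf term is an assumption (the standard formulation), since the base is not fixed by the description above.

```python
import math
from collections import Counter


def term_frequency(doc_tokens):
    """tf: count of each word normalized by the total number of words in the document."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return {w: c / total for w, c in counts.items()}


def inverse_document_frequency(corpus):
    """idf: log of (number of documents / number of documents containing the word)."""
    n_docs = len(corpus)
    doc_freq = Counter(w for doc in corpus for w in set(doc))
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}


def tf_idf(corpus):
    """Return one {word: tf-idf weight} dictionary per document in the corpus."""
    idf = inverse_document_frequency(corpus)
    return [{w: tf * idf[w] for w, tf in term_frequency(doc).items()} for doc in corpus]


if __name__ == "__main__":
    corpus = [
        ["earthquake", "damage", "city"],
        ["earthquake", "rescue"],
        ["earthquake", "aid", "aid"],
    ]
    for weights in tf_idf(corpus):
        print(weights)
```

In this toy corpus the word 'earthquake' occurs in every document, so its idf, and hence its tf-idf weight, is zero, which mirrors the earthquake example given in the text.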

Feature Based Approach
One natural way to determine the importance of a sentence is to identify the features that reflect the relevance of that sentence. Edmundson (1969) defined three features deemed indicative of sentence relevance, i.e., sentence position, presence of title words and cue words. For example, the beginning sentences in a document usually describe the main information concerning the document. Therefore, selecting sentences based on their position could be a reasonable strategy. The following features are commonly used to determine sentence relevance (Gupta and Lehal, 2010).

Title/Headline Word
Title words appearing in a sentence could suggest that the sentence contains important information.

Sentence Position
The beginning sentences in a document usually describe the main information concerning the document.

Sentence Length
Sentences which are too short may contain little information, while very long sentences are not suitable for inclusion in a summary.

Term Weight
Words or terms which have high occurrence within a document are used to determine the importance of a sentence.

Proper Noun
Proper nouns and named entities such as persons, organizations and locations mentioned in a sentence are considered to carry important information.
Figure 1 depicts the general model of a feature based summarizer. The scores for each feature are computed and combined for sentence scoring. Prior to sentence scoring, the features are given weights to determine their level of importance. Feature weighting is applied to determine the weight associated with each feature, and the sentence score is then computed as the linear combination of each feature score multiplied by its corresponding weight:
$$\mathrm{Score}(S) = \sum_{i=1}^{n} w_i \times f_i(S) \tag{6}$$
Where:
w_i = The weight of feature i
f_i = The score of feature i
Binwahlan et al. (2009) proposed a text summarization model based on Particle Swarm Optimization (PSO) to determine the feature weights. Bossard and Rodrigues (2011) used a genetic algorithm to approximate the best weight combination for their multi document summarizer. The differential evolution algorithm has also been used to scale the relevance of feature weights (Abuobieda et al., 2013a). The effect of different feature combinations was investigated by Hariharan (2010), who found that better results were obtained by combining term frequency weight with position and node weight.
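The weighted linear combination of feature scores can be sketched as follows. The three feature functions, their value ranges and the example weights are illustrative assumptions rather than the definitions used in any of the cited systems; in practice the weights would be tuned, for example by one of the optimization methods mentioned above.

```python
def position_score(index, n_sentences):
    """Earlier sentences score higher: 1.0 for the first sentence, lower afterwards."""
    return 1.0 - index / n_sentences


def title_score(sentence_words, title_words):
    """Fraction of distinct title words that also appear in the sentence."""
    title_set = set(title_words)
    if not title_set:
        return 0.0
    return len(set(sentence_words) & title_set) / len(title_set)


def length_score(sentence_words, min_len=3, max_len=30):
    """1.0 if the sentence length is within hypothetical bounds, else 0.0."""
    return 1.0 if min_len <= len(sentence_words) <= max_len else 0.0


def sentence_score(features, weights):
    """Linear combination: sum over features of weight_i * feature_score_i."""
    return sum(weights[name] * value for name, value in features.items())


def score_document(title, sentences, weights):
    """Score every sentence of a document with the three toy features above."""
    title_words = title.lower().split()
    scores = []
    for i, sentence in enumerate(sentences):
        words = sentence.lower().split()
        features = {
            "position": position_score(i, len(sentences)),
            "title": title_score(words, title_words),
            "length": length_score(words),
        }
        scores.append(sentence_score(features, weights))
    return scores


if __name__ == "__main__":
    weights = {"position": 0.5, "title": 0.3, "length": 0.2}  # assumed values
    sentences = ["the storm caused heavy damage", "ok", "people returned home on friday"]
    print(score_document("storm damage", sentences, weights))
```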
In later work, the incorporation of fuzzy rules for scoring sentences was studied. For instance, one of the constructed rules states "if (NoWordInTitle is VH) and (SentenceLength is H) and (TermWeight is VH) and (SentencePosition is H) and (SentenceSimilarity is VH) and (ProperNoun is H) and (ThematicWord is VH) and (NumbericalData is H) then (Sentence is important)". The experimental findings (tested on the DUC 2002 data set) showed that the fuzzy logic based method could outperform a general statistical method. A recent study also supports the advantages of using fuzzy reasoning to determine the importance of a sentence (Babar and Patil, 2015).

Machine Learning Approach
The Machine Learning (ML) approach can be applied if we have a set of training documents and their corresponding summary extracts (Neto et al., 2002). The objective of machine learning here is closely related to a classification problem, i.e., to learn a model from training data in order to determine the class to which an element belongs. In the case of text summarization, the training data consists of sentences labelled as "summary sentence" if they belong to the reference summary, or as "non-summary sentence" otherwise. Sentences are usually represented as feature vectors.

Fig. 1. A feature based summarization model
After learning from the collection of documents and their summaries, the trained model is able to identify potential summary sentences when a new document is given to the system. Next, we discuss some related work on machine learning methods.

A. Naive Bayes
One of the early works that incorporated machine learning was the study by Kupiec et al. (1995). They used a Naive Bayes classifier to learn from the data (a corpus of document/summary pairs). Their method uses features derived from Edmundson (1969), which are assumed to be independent of each other. Given a sentence s, the probability of it being chosen for inclusion in the summary is:
$$P(s \in S \mid F_1, F_2, \ldots, F_n) = \frac{\prod_{i=1}^{n} P(F_i \mid s \in S) \, P(s \in S)}{\prod_{i=1}^{n} P(F_i)} \tag{7}$$
Where F_1, F_2, …, F_n are the sentence features used for the classification (assumed to be independent of each other) and S is the summary to be generated.
Each sentence is then scored according to Equation 7 and ranked for summary selection.
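A Naive Bayes sentence scorer of this kind can be sketched as below. The sketch assumes binary (0/1) features, estimates all probabilities by counting over labelled training sentences and adds Laplace smoothing (the `alpha` parameter) to avoid zero counts; the smoothing and all names are assumptions for illustration and are not taken from Kupiec et al. (1995). With smoothing the result is best read as a ranking score rather than a calibrated probability.

```python
from collections import defaultdict


def train(feature_vectors, labels):
    """Count feature values overall and among summary sentences (label 1)."""
    n = len(labels)
    n_summary = sum(labels)
    n_features = len(feature_vectors[0])
    cond = [defaultdict(int) for _ in range(n_features)]  # counts given s in S
    marg = [defaultdict(int) for _ in range(n_features)]  # counts over all sentences
    for vec, label in zip(feature_vectors, labels):
        for i, value in enumerate(vec):
            marg[i][value] += 1
            if label:
                cond[i][value] += 1
    return {"prior": n_summary / n, "cond": cond, "marg": marg,
            "n_summary": n_summary, "n": n}


def score(vec, model, alpha=1.0):
    """P(s in S) * prod_i P(F_i | s in S) / P(F_i), with smoothing for binary features."""
    result = model["prior"]
    for i, value in enumerate(vec):
        p_given_summary = (model["cond"][i][value] + alpha) / (model["n_summary"] + 2 * alpha)
        p_feature = (model["marg"][i][value] + alpha) / (model["n"] + 2 * alpha)
        result *= p_given_summary / p_feature
    return result


if __name__ == "__main__":
    # Toy data: feature 0 = "contains a title word", feature 1 = "in first paragraph".
    vectors = [(1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (1, 1)]
    labels = [1, 1, 0, 0, 0, 1]
    model = train(vectors, labels)
    for vec in [(1, 1), (0, 0)]:
        print(vec, score(vec, model))
```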
The Naive Bayes classifier was also used in later works, but with richer features. Aone et al. (1999) included features such as tf-idf computed over nouns and named entities, and used a corpus of news documents for their experiments. Another extensive investigation using a similar framework was carried out by Neto et al. (2002). The authors employed a large variety of features, including both statistical and linguistic features. Their method, which uses the Naive Bayes classifier, significantly outperformed all the baseline methods. They also reported that the choice of classifier can strongly influence the performance of the summarizer.

B. Neural Network
Some researchers have utilized the learning capabilities of neural networks to learn summary sentence attributes. Kaikhah (2004) used a three layered feed-forward network model to learn the patterns in summary sentences (Fig. 2). Seven features were extracted from the input sentences. Once the network had learned the features that best represent summary sentences, feature fusion was performed by removing and combining certain features. The pruned network model was then applied to determine the summary sentences.
In another related work, a single document summarization system called NetSum was developed at Microsoft Research by Svore et al. (2007). The system was built to generate summaries using a neural network model. First, the training set (articles collected from CNN.com) is used to train the network model. The trained model is then used to rank new sentences. The NetSum system uses the RankNet algorithm (Burges et al., 2005) to perform sentence ranking. The evaluation showed that NetSum achieved statistically significant improvements over the baseline.
Other machine learning methods have also recently been used for text summarization. Hannah and Mukherjee (2014) proposed a trainable summarizer for classifying important sentences. The authors used a decision tree model which was trained to classify sentences as interesting or not interesting. Their approach outperformed the baseline approach.
Fig. 2. Feed-forward network model after training (Kaikhah, 2004)

Domain Specific Summarization
Much of the work reviewed in the previous sections involved generic summarization, whereby the relevance of a summary is decided based only on the input document, without relating it to its domain or the user's needs (Nenkova and McKeown, 2011). However, inputs such as medical documents, news documents or emails have special structures or unique characteristics which should be taken into account by the summarizer to produce more accurate information. Next, we review some of the work concerning domain specific text summarization.

Medical Summarization
Automatic summarization has been found to be very useful in the medical field. Summarization can help doctors obtain relevant information about a particular disease or information from patient records (Becher et al., 2002). It is also beneficial to patients or users who turn online to find information pertinent to their health problems (Kaicker et al., 2010). Furthermore, there are extensive resources that provide access to medical information and medical-related databases. For instance, there are over 20 million articles in MEDLINE, a biomedical database. Summarization is thus essential in such conditions to treat the problem of information overload. An early summarization system built for medical knowledge is Centrifuser (Elhadad et al., 2005; Kan et al., 2001). Centrifuser is a summarizer that helps consumers in their search for healthcare information by producing query-driven summaries. It represents document topics by a tree data structure and performs query mapping onto the topic trees to retrieve relevant sentences. Another medical summarizer, proposed by Fiszman et al. (2009), was built to generate summaries based on semantic abstraction to help physicians find the most salient information in MEDLINE citations for some specified diseases.
There are also researchers who utilize background knowledge (i.e., ontology) for medical summarization. An ontology can be used to describe domain-related information. Using an ontology, pieces of information can be related to each other through the common characteristics of a domain (Khelif et al., 2007).
One example is the utilization of UMLS, a medical ontology, to summarize biomedical articles (Verma et al., 2007). UMLS was used to match words in sentences that contain similar concepts. Likewise, Kogilavani and Balasubramanie (2009) employed UMLS to expand users' natural language queries with synonyms and semantically related concepts. Ontology has also been used by Naderi and Witte (2010) in the biomedical research area to summarize protein mutation impact information. They populated their ontology with protein mutation impact information and then used it to generate query based summaries.

News Summarization
Early work on news summarization dates back to the 1990s, when the SUMMONS summarizer was created (McKeown and Radev, 1995). SUMMONS was designed for summarizing single events (news articles related to terrorist events). It was built using a template-driven message understanding system, MUC-4 (Sundheim, 1992). The system first processes the full text and fills the template slots before synthesizing the summary from the extracted information.
Similar to the SUMMONS system is a system called RIPTIDES (White et al., 2001). It incorporates information extraction to support summarization. They use natural disaster scenario templates for each text and provide them as input to the summarization system. The summarizer first merges the templates into event oriented structure and then the importance scores are assigned to each slot/sentence to select the summary sentences.
Newsblaster (McKeown et al., 2002) was developed to summarize online news articles. The summarizer uses MultiGen (McKeown et al., 1999), which identifies common sentences from news articles using machine learning together with statistical techniques. Summaries are then produced by analyzing and fusing together the sentences.
In later work, Li et al. (2010) proposed the Ontology-enriched Multi-Document Summarization (OMS) system to generate query-relevant summaries for disaster management, applied to news and reports related to natural calamities. OMS maps sentences onto a domain-specific ontology. Nodes in the ontology are then matched against the user query, and the sentences attached to the matched nodes are extracted to form the summary. Another concept called fuzzy ontology was studied by Lee et al. (2005) to develop weather news summarization. Fuzzy ontology was found to be more suitable for treating domains with uncertainty.
Fig. 3. Comment-oriented blog summarization (Hu et al., 2007)
Based on an understanding of news structure, the utility of sub-events in a news topic was investigated by Daniel et al. (2003) in order to capture essential information and produce better summaries. Their study involved experiments carried out on news about the Gulf Air crash. In their experiment, human judges were asked to determine the sentences related to the predefined sub-events comprising the topic. Summaries were then created using selection algorithms. Their findings showed that the utilization of sub-events can improve performance and suggested that future efforts should be directed towards enhancing the automatic clustering of sub-events. In another related work, Kumar et al. (2014) exploited news structure by incorporating contextual information such as 'who', 'what' and 'where' in the sentence selection process. Contextual information was able to significantly improve news summarization.

Email/Blog Summarization
Studies on email and blog summarization have also been reported in the literature. In early research on email summarization, Nenkova and Bagga (2004) developed a system to generate summaries from email threads. It produces short "overview summaries" by extracting sentences only from the thread root message and its immediate follow-ups. To extract a sentence from the root message, the system finds the sentence whose nouns and verbs overlap most with the subject of the email. Similarly, to select a sentence from the follow-up emails, the overlap of nouns and verbs between the root email and each follow-up sentence is computed and the sentence with the largest overlap is chosen. Newman and Blitzer (2003) also addressed the problem of summarizing email threads. First, all the messages are clustered into message groups. Sentences in each group are scored using several features. Summaries are then extracted from each group. In another related work, Rambow et al. (2004) used email specific features and rules to extract sentences from emails. The features that they used take into account the structure of the email thread.
For research in blog summarization, the main context of the blogs is usually the writer's opinion. Zhou and Hovy (2006) proposed a summarization approach which was inspired by the work by Marcu (1999), who produces summary extracts using (abstract, text) tuples. Starting from the blog entry, they continuously remove sentences that are not related to the story (linked articles), while keeping sentences with maximal semantic similarity with the linked articles.
In later work, Hu et al. (2007) argued that comments from blog readers do change the understanding of the blog post. The authors use the words from the blog's comments to extract sentences. They integrate several word representativeness measures to weight the words appearing in the comments and perform sentence selection based on the representativeness of the words each sentence contains. Figure 3 shows the architecture of their blog summarization model.
Apart from personal blogs, summarization of legal blog entries has also been studied. Conrad et al. (2009) proposed a query-based summarization approach which is specific to legal blogs. The task was based on the Text Analysis Conference (TAC) 2008 task. Using the documents retrieved from the Blog Search Engine (www.blogsearchengine.com), they first filter out the sentences that do not match the query questions (questions related to topics from the legal domain). Then they apply FastSum (Schilder and Kondadadi, 2008), a summarization system which had previously been used to produce sentiment summaries (Schilder et al., 2008a; 2008b), to extract summaries from the retrieved blogs.

Multi Document Summarization
Concerns have been raised in the past regarding the size of the input that needs to be summarized. Since information can be collected from multiple sources, condensing this information is considered essential. Various types of multi document summarization methods have been developed by researchers (Nenkova and McKeown, 2012; Saggion and Poibeau, 2013). In this section we focus on two popular methods, i.e., the cluster based method and the graph based method (Gupta and Lehal, 2010; Haque et al., 2013). Besides these two methods, we also review some related work using the discourse based method, which has received much attention in recent years. For each of these methods, the primary concept is explained.

Cluster Based Method
Clustering refers to the grouping of similar instances into clusters. In our case, these instances are sentences. Clustering is done by computing the similarity between sentences, and sentences which are highly similar to each other are grouped into the same cluster. Different clusters may represent different subtopics. High scoring sentences from each cluster are then put together to form the summary. This process is depicted in Fig. 4. Radev et al. (2004) pioneered the use of cluster centroids for their multi-document summarizer, MEAD. Centroids are the top ranking tf-idf terms that represent the cluster. These cluster centroids are then used to identify the sentences in each cluster that are most similar to the centroid. The cosine similarity measure was used for this purpose. As a result, the summarizer selects the sentences which are most relevant to each cluster.
Building on the clustering approach, efforts have been put into making the overall text summarization process more effective. One that is worth mentioning here is determining the optimal number of clusters, for which Xia et al. (2011) adopted co-clustering theory. They determine the weights of sentences and terms based on the sentence-term co-occurrence matrix. The sentence-term matrix is designed to represent diversity and redundancy within multiple articles. Finally, the top-weighted sentence in every cluster is picked to form the summary until a user-preferred summary length is met. An evolutionary algorithm, the Differential Evolution algorithm, has also been used to optimize the data clustering process and was able to increase the quality of the generated text summaries (Abuobieda et al., 2013b).
Some researchers employ a clustering-based hybrid strategy to combine local and global search for sentence selection (Nie et al., 2006). This approach does not depend only on similarity to the cluster for sentence selection but also considers similarity to the overall document content. In another related work, the focus was on strengthening cluster diversity. To achieve this, Aliguliyev (2010) used the PSO algorithm with an added mutation operation adopted from genetic algorithms to optimize intra-cluster similarity and inter-cluster dissimilarity.
Cluster based methods have been successful in representing diversity and reducing redundancy within multiple articles. Although these can be considered advantages of clustering methods, as far as multi document summarization is concerned, a summary cannot be meaningful enough if the relevance of a sentence is judged merely based on the clusters. This is because in cluster based methods, sentences are eventually ranked according to their similarity to the cluster centroid, which simply represents frequently occurring terms.
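The centroid-based selection step can be sketched as follows. The sketch assumes that the sentences have already been grouped into clusters and, for brevity, builds the centroid from averaged raw term counts rather than from tf-idf weights; it is an illustration of the idea and not the MEAD implementation. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term vectors stored as dictionaries."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Average the term vectors of a cluster (raw counts here, an assumption)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def representative(cluster):
    """Return the sentence of the cluster that is most similar to its centroid."""
    vectors = [Counter(sentence.lower().split()) for sentence in cluster]
    center = centroid(vectors)
    best = max(range(len(cluster)), key=lambda i: cosine(vectors[i], center))
    return cluster[best]


if __name__ == "__main__":
    clusters = [
        ["quake hit city", "quake hit city hard", "rescue teams arrived"],
        ["stocks rose today", "stocks rose sharply today"],
    ]
    # One representative sentence per cluster forms the extract.
    print([representative(c) for c in clusters])
```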

Graph Based Method
Graph theory is used to model the connections or links that exist between objects. Generally, a graph is denoted G = (V, E), where V is the set of vertices (nodes) and E is the set of edges between vertices. In the context of text documents, a vertex represents a sentence and an edge carries the weight between two sentences. Using this approach, documents can be represented as a graph in which each sentence is a vertex and the weight of each edge corresponds to the similarity between the two sentences it connects.
In most of the literature on the graph based approach, the most widely used similarity measure is the cosine similarity measure (Erkan and Radev, 2004). An edge exists if the similarity weight is above some predefined threshold. Figure 5 shows an example graph for multiple documents. Once the graph is constructed for a set of documents, important sentences are identified following the idea that a sentence is important if it is strongly connected to many other sentences.
This approach differs from the cluster based approach, where sentences are ranked based on their closeness to the cluster centroid. Two well-known graph based ranking algorithms are the HITS algorithm (Kleinberg, 1999) and Google's PageRank (Brin and Page, 2012). Both methods have traditionally been used in Web-link analysis and social networks. LexRank (Erkan and Radev, 2004) and TextRank (Mihalcea and Tarau, 2004) are two successful graph-based ranking systems that implement these algorithms.
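The graph construction and ranking steps can be illustrated with the following sketch. It builds an unweighted sentence graph by thresholding cosine similarity over raw term counts and ranks the vertices with a simplified PageRank-style power iteration. The threshold, damping factor, iteration count and function names are assumptions chosen for the example, and the code is not the exact formulation used by LexRank or TextRank.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_graph(sentences, threshold=0.1):
    """Adjacency matrix: an edge exists when similarity exceeds the threshold."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) > threshold:
                adj[i][j] = adj[j][i] = 1
    return adj


def rank(adj, damping=0.85, iterations=50):
    """Simplified PageRank: each vertex shares its score equally among its neighbours."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(scores[j] / degree[j] for j in range(n) if adj[j][i])
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


if __name__ == "__main__":
    sentences = [
        "quake hit city",
        "quake damaged city",
        "city rebuilt after quake",
        "stocks rose today",
    ]
    scores = rank(build_graph(sentences))
    for sentence, value in sorted(zip(sentences, scores), key=lambda p: -p[1]):
        print(round(value, 4), sentence)
```

In this toy input the three earthquake sentences are mutually connected and receive equal, high scores, while the unrelated sentence is isolated and keeps only the baseline score. Isolated vertices are not treated specially here, so the scores do not sum to one.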
Further studies have been carried out to improve the ranking algorithm through modification. Wan and Yang (2006) assigned different weights to intra-document links and inter-document links. They gave more priority to sentences with many inter-document links. In later work, Hariharan and Srinivasan (2009) approached the graph based method differently, i.e., by discounting an already selected sentence, removing it from further consideration when ranking the remaining sentences in the document.
Apart from sentence level information, Wan (2008) and Wei et al. (2010) devised a document-sensitive graph model to explore document impact on graph-based summarization by incorporating both the document-level information and the sentence-to-document relationship in the graph-based ranking process. The document-level relations are used to adjust the weights of the vertices and the strength of the edges in the graph.
Graph based methods have been positively received by the multi document summarization research community, as they are able to identify 'prestigious' sentences across the documents. The resulting graph is also able to capture distinct topics from unconnected sub-graphs. However, this approach depends heavily on sentence similarity to generate the graph, without "understanding" the relationship between the sentences.

Discourse Based Method
This section also reviews studies related to discourse analysis, which involves analyzing the semantic relations that exist between textual units. In the case of multiple documents, some research works study the utility of cross-document relations to determine important sentences which are deemed relevant to the document collection. Radev (2000) initiated the study of cross-document relations and proposed the Cross-Document Structure Theory (CST) model. In this model, words, phrases or sentences can be linked with each other if they are semantically connected. Some examples of the semantic connections, or CST relations, between sentences are given in Table 1.
Past studies have claimed that CST is indeed useful for document summarization. Zhang et al. (2002) utilized CST to determine sentence relevance. First, they produce a multi document summary using a summarization system called MEAD (Radev et al., 2001). Then, they ask human experts to identify the CST relations that exist between sentences in the document set. At this point, the low scoring sentences are replaced with sentences that have many CST relations. The aim was to produce coherent summaries through the existence of relations between the summary sentences.
The effect of incorporating CST into the summarization process has likewise been studied by Jorge and Pardo (2010). They mainly investigate content selection methods for producing both informative and preference-based summaries. They tested their method using news articles from the CST News corpus (Aleixo and Pardo, 2008), which were annotated beforehand by human experts. The CST relations were utilized to treat repetition, complementarity and inconsistency among the diverse data sources. Nonetheless, the significant limitation of the above works is that the CST relations must be explicitly determined by humans.
Fig. 5. Example graph as depicted in (Erkan and Radev, 2004). Each node represents a sentence
Table 1. Examples of CST relations (Zhang et al., 2002)
Relationship | Description | Text span 1 (S1) | Text span 2 (S2)
Identity | The same text appears in more than one location | Tony Blair was elected for a second term today. | Tony Blair was elected for a second term today.
Equivalence | Two text spans have the same information content | Derek Bell is experiencing a resurgence in his career. | Derek Bell is having a comeback year.
Translation | Same information content in different languages | Shouts of "Viva la revolucion!" echoed through the night. | The rebels could be heard shouting, "Long live the revolution".
Subsumption | S1 contains all information in S2, plus additional information not in S2 | With 3 wins this year, Green Bay has the best record in the NFL. | Green Bay has 3 wins this year.
Contradiction | Conflicting information | There were 122 people on the downed plane. | 126 people were aboard the plane.
Historical background | S1 gives historical context to information in S2 | This was the fourth time a member of the Royal Family has gotten divorced. | The Duke of Windsor was divorced from the Duchess of Windsor yesterday.
To address this gap, recent studies have attempted to identify the CST relations directly from text documents to produce summaries. Zahri and Fukumoto (2011) determined the CST relations by applying an SVM classifier. The PageRank algorithm was used for sentence weighting, whereby the directionality in PageRank was determined using the identified CST relations. Based on these relations, they also adjust the connected sentences to handle the repetition issue.
In a similar study, Kumar et al. (2013) proposed a Genetic-CBR classifier to identify CST relations from un-annotated documents. Two techniques based on a voting model and fuzzy reasoning were used to rank the sentences (Kumar et al., 2014). These techniques use the identified CST relationships between the sentences for sentence scoring. Both studies showed that the CST based approach outperformed the cluster based method and the graph based method.

Conclusion
In this study, the fundamental concepts and methods related to automatic text summarization have been discussed. The study has been presented so that researchers new to this field are exposed to various automatic text summarization approaches and applications. The paper starts with a brief introduction to automatic text summarization and provides a review of past and present work found in the literature. Much of the discussion revolves around extractive text summarization and primarily reviews approaches concerning sentence extraction, domain specific summarization and multi document summarization. Each of the approaches discussed in this study possesses its own advantages for automatic summarization. However, there are a number of limitations pertaining to some approaches, and recent studies have attempted to address some of them. The next big challenge is to focus not only on the information content of the summary but also on the readability of the generated summary itself. The future trend of automatic text summarization is most likely to move in this direction.