A Hybrid Approach to Pronominal Anaphora Resolution in Arabic

One of the challenges in natural language processing is determining the intended referent of each pronoun in the discourse. Anaphora resolution is an important task for a number of natural language processing applications, such as information extraction, question answering and text summarization. Most earlier work on anaphora resolution has addressed English and other languages, whereas anaphora resolution in Arabic has not been sufficiently studied. In this study, a hybrid approach to resolving pronominal anaphora in Arabic is presented. The hybrid model combines a rule-based approach with a machine learning approach. Anaphors and their possible antecedents are identified in a rule-based manner that takes morphological information into account. The most probable candidate is then selected as the antecedent of the anaphor by a machine learning component based on the k-Nearest Neighbor (k-NN) algorithm. The appropriate features for this task were determined and their effect on the performance of anaphora resolution was investigated. Experiments were performed on a corpus of the Quran annotated with pronominal anaphora. The experimental results indicate that the proposed hybrid approach is reasonable and feasible for Arabic pronominal anaphora resolution.


Introduction
Anaphora is the use of an abstract or abbreviated linguistic form (the anaphor) to refer to an expression (the antecedent) introduced elsewhere in the discourse. It is one of the most frequent linguistic phenomena in natural language and must be resolved in order to establish coherence in a text. Anaphora resolution is the process of determining the antecedent of an anaphor and subsequently replacing the anaphor by its antecedent. The resolution of anaphors to their intended referents is considered one of the most intriguing and challenging problems in Natural Language Processing (NLP). Such resolution is important for two reasons: it helps uncover the meaning and role of the anaphor by finding the proper antecedent, and it helps to understand the text fully and correctly. Therefore, a wide range of practical applications require the successful identification and resolution of anaphora, including machine translation, question answering systems, text summarization or automatic abstracting, information extraction, language generation and dialog systems. Although a significant amount of work has been done for English and other languages, anaphora resolution in Arabic has not been sufficiently studied. Arabic has particular characteristics that make the pronominal anaphora resolution task difficult, such as morphological complexity, null pronominals and free word order. The most widespread type of anaphora is pronominal anaphora, which is realized by anaphoric pronouns. Due to their characteristics and high frequency of use, such anaphoric expressions pose great challenges to most fields of computational linguistics. The main reason is that these expressions do not carry much information on their own and therefore have to be resolved before they can be used.
The present study focuses exclusively on the resolution of pronominal anaphora in Arabic, where the purpose of the resolution process is to identify the antecedent to which a pronoun refers. Pronominal anaphora here covers personal pronouns, possessive pronouns and reflexive pronouns.
This paper presents a hybrid model for resolving Arabic pronominal anaphora that combines a rule-based approach with a machine learning approach. Anaphors and their possible antecedents are identified in a rule-based manner that takes morphological information into account. The most probable candidate is then selected as the antecedent of the anaphor by the module that performs the actual pronoun resolution, which is based on machine learning. The model was tested on a set of test examples that is distinct from the training set.

Related Work
Anaphora resolution has been researched extensively in various languages and has attracted increasing attention recently. Accordingly, several studies have applied different techniques to anaphora resolution in several languages, including rule-based and learning-based approaches, while only limited work has used hybrid approaches. In this section, this work is grouped into three categories: machine learning-based, rule-based and hybrid approaches.

Machine Learning Approach
In recent years, machine learning approaches have been widely applied to anaphora resolution and have achieved considerable success (Soon et al., 2001; Fei et al., 2008; Denis and Baldridge, 2008; Ng and Cardie, 2002; Yang et al., 2003). Arregi et al. (2010) introduced a machine learning approach to resolve pronominal anaphora in the Basque language. They compared different machine learning classifiers to identify the system best suited to the properties of Basque. The classifiers used include SVM, Multilayer Perceptron, NB, k-NN, Random Forest (RF), NB-Tree and Voting Feature Intervals (VFI). Suitable features were identified and used for Basque anaphora resolution. Furthermore, several experiments were conducted to analyze the contribution of each feature in order to determine the relative importance of the features. The corpus used to evaluate the proposed model was constructed from a 50,000-word part of the Eus3LB corpus and contains 349 annotated pronominal anaphora. The experiments show that some features are more helpful than others for Basque anaphora resolution.
Akilandeswari and Sobha (2012) developed a model for resolving pronominal anaphora in Tamil based on conditional random fields. They performed a manual in-depth analysis of a corpus containing the Tamil pronouns aval, avan and atu and their suffixed forms, together with the set of features needed to identify the anaphoric pronouns and their antecedents. The corpus they used was compiled from the web, taken from the tourism domain, and consists of 10000 sentences. The results indicated that, overall, the proposed approach is effective and encouraging. Hammami et al. (2010) proposed a Bayesian classifier for the identification of non-referential pronouns in Arabic texts. They evaluated their approach on common data sets and obtained encouraging results, indicating that the learning approach achieved better accuracy than the rule-based approach.

Rule-Based Approach
Many researchers have worked on anaphora resolution from a variety of perspectives. Early work on anaphora resolution was carried out almost entirely in a rule-based paradigm. This section introduces the rule-based approach, which relies heavily on linguistic information and domain (or general) knowledge (Holen, 2007; Khan et al., 2006; Mitkov, 1997; Lappin and Leass, 1994). Ali et al. (2008) presented a rule-based approach for resolving strong personal pronouns in the Pashto language. They re-implemented the algorithm of Ali et al. (2007), which relies on factors such as lexical and morphological information, subject and object preferences, section headings and recency preferences. They also added a few new rules to the existing ones and removed undesirable rules to improve performance and efficiency. As a result, the modified algorithm correctly resolved 338 of 397 personal pronoun occurrences, increasing the accuracy rate from 80 to 85.14%. Fallahi and Shamsfard (2011) proposed a rule-based approach for pronoun reference resolution in Persian texts. The proposed method is based on a set of manually extracted rules that determine the referent of a pronoun within the three sentences preceding the sentence in which the anaphor occurs. They created their own corpus, drawn from five Persian bloggers on the web. The experimental results indicated that the proposed rule-based approach is effective for Persian.
Naing and Thida (2014) presented a rule-based approach to pronominal anaphora resolution in the Myanmar language. They implemented their model using the Hobbs algorithm, which works only on the surface syntax of sentences in a given text. The corpus used in their experiments was collected from two different types of data set: short stories in Myanmar and basic Myanmar essays. Their system yielded an accuracy rate of 80%, indicating that the proposed system is reasonable for Myanmar anaphora resolution.

Hybrid Approach
Several techniques utilize both machine learning and linguistic knowledge. Such hybrid approaches have achieved considerable success in anaphora resolution (Nilsson, 2010; Holger, 2009; Hartrumpf, 2001; Chen and Ng, 2012). Dakwale et al. (2013) presented a hybrid approach to resolve entity-pronoun references in Hindi. They used a rule-based module built on dependency structures to resolve simple anaphoric references, whereas a supervised learning approach was used to resolve more ambiguous instances using grammatical and semantic features. The corpus used in their experiment was collected from treebank data containing news articles with an average size of 20 sentences. The test set contains 2162 entity pronouns and the test data contains 1071 entity pronouns. The results show that the rule-based system alone yielded a substantial F-measure of 0.6, indicating that dependency structures provide a suitable source of syntactic knowledge. In addition, the supervised learning approach achieves a significant improvement, and semantic information further improves resolution accuracy. Hinrichs et al. (2005) presented a hybrid approach for Computational Anaphora Resolution (CAR) of German that combines a rule-based pre-filtering component with a machine learning resolution module. In the rule-based module, they applied morphological filters to retain as potential antecedents only those NPs that match a given pronoun in number and gender. The actual pronoun resolution is performed in the machine learning module, which is based on Memory-Based Learning (MBL). The experiments were conducted on the TüBa-D/Z treebank of German newspaper text. In total, the corpus includes 15260 sentences, with an average of 19.46 sentences per text. The TüBa-D/Z corpus contains 7606 reflexive and personal pronouns, 2195 possessive pronouns and 99585 markables (i.e., potential antecedent NPs).
The results show that the proposed model, which yielded an F-measure of 73.4%, outperformed the results reported by Schiehlen (2004). D'Souza and Ng (2012) adopted both learning-based and rule-based approaches to handle anaphora resolution in biomedical literature. They hypothesized that both approaches had unique strengths and proposed combining these strengths in a hybrid approach for anaphora resolution in biomedical texts. The hybrid approach achieved an F-score of 60.9% on the BioNLP-2011 coreference dataset.

Data Annotation
In order to use a machine learning method, a suitably annotated corpus is needed. Therefore, this study uses the Quranic corpus, a corpus of the Quran annotated with the antecedent references of pronouns. The Quran is characterized by very frequent use of anaphors, and the corpus offers a comparatively large number of pronouns tagged with antecedent information. Table 1 shows the key statistics of the Quranic corpus.

System Architecture
This section presents a hybrid architecture for pronominal anaphora resolution. The hybrid approach combines a rule-based filtering module with a machine learning module. The following subsections discuss in detail the main steps of each component, including the pre-processing task, morphological filtering, feature extraction and the classification step.

Preprocessing
In the pre-processing stage, various NLP techniques are applied. These modules include tokenization, POS tagging and NP identification. The following describes these tasks briefly. Tokenization is the process of breaking up a stream of text into its constituent tokens so that the tokens can be fed into a morphological transducer or POS tagger for further processing. The tokenization module is responsible for identifying a word, a part of a word (or a clitic), a multiword expression, or a punctuation mark. Arabic is a clitic language: an Arabic token may consist of several lexical items, each with its own meaning and POS.
Part Of Speech (POS) tagging is the process of assigning the appropriate linguistic category (noun, verb, adjective, etc.) to each word in a sentence identified during the tokenization stage. The Arabic Statistical POS Tagger (ASPOST) is used (Albared et al., 2010). ASPOST is a statistical POS tagger that is trainable on different Arabic corpora. In addition, it provides the morphological features (gender, number, tense) assigned to each word. The system incorporates several methods for smoothing and for handling unknown words.
Noun phrase identification, or noun phrase chunking, deals with extracting noun phrases from a sentence. It takes sequences of POS-tagged tokens as input and identifies the boundaries of NPs. A set of static NP patterns is used to parse noun phrases from the text. For our purpose, nearly 200 NP patterns have been identified to parse Arabic NPs. The Arabic noun phrase chunker is a rule-based chunker, and the NP patterns were derived from prior linguistic knowledge.
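As an illustration of pattern-based chunking, the following minimal sketch matches static POS patterns against a tagged sentence using longest-match-first scanning. The tag names and the handful of patterns are hypothetical placeholders; the roughly 200 patterns used in this work are not reproduced here, and the matching strategy itself is an assumption.

```python
from typing import List, Tuple

# Hypothetical POS tags and NP patterns, for illustration only.
NP_PATTERNS: List[Tuple[str, ...]] = [
    ("DET", "NOUN", "ADJ"),
    ("NOUN", "ADJ"),
    ("DET", "NOUN"),
    ("NOUN", "NOUN"),
    ("PROPN",),
    ("NOUN",),
]


def chunk_noun_phrases(tags: List[str]) -> List[Tuple[int, int]]:
    """Return NP spans as (start, end) token indices, end exclusive."""
    # Try longer patterns first so that the longest NP wins at each position.
    patterns = sorted(NP_PATTERNS, key=len, reverse=True)
    spans = []
    i = 0
    while i < len(tags):
        for pattern in patterns:
            if tuple(tags[i:i + len(pattern)]) == pattern:
                spans.append((i, i + len(pattern)))
                i += len(pattern)
                break
        else:
            # No pattern starts at this token; move to the next one.
            i += 1
    return spans
```

Each returned span marks the boundaries of one candidate NP, which can then be passed on to the candidate selection step.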

Anaphora and Initial Antecedent Candidate Selection
In the initial phase, third-person pronouns, reflexive pronouns and possessive pronouns that refer to Noun Phrases (NPs) were marked according to their obviousness within a sentence. Next, potential candidates were selected as antecedents of the pronouns. Typically, all NPs preceding an anaphor are regarded as potential candidates for antecedents. Finding the anaphoric accessibility space is a crucial phase in the process of anaphora resolution. A search scope has to be identified, which is the space where the correct antecedent is likely to be found. The search scope of candidates for the antecedents of pronouns varies across languages.
A search limit of 17 sentences was adopted. Hence, for any anaphor, potential antecedents are sought within a window of 17 sentences preceding the sentence in which the anaphor occurs.
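The window-based candidate selection can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the `Mention` structure and its fields are hypothetical, and it is assumed that NPs preceding the anaphor in its own sentence are also admitted as candidates.

```python
from typing import List, NamedTuple


class Mention(NamedTuple):
    """Hypothetical container for an NP or pronoun."""
    text: str
    sentence: int  # index of the sentence containing the mention
    position: int  # token offset of the mention in the document


SEARCH_WINDOW = 17  # search limit in sentences, as stated in the text


def candidate_antecedents(anaphor: Mention,
                          noun_phrases: List[Mention],
                          window: int = SEARCH_WINDOW) -> List[Mention]:
    """Collect NPs that precede the anaphor within the sentence window."""
    candidates = [
        np for np in noun_phrases
        if np.position < anaphor.position
        and 0 <= anaphor.sentence - np.sentence <= window
    ]
    # Nearest candidates first, which is convenient for later steps.
    return sorted(candidates, key=lambda np: np.position, reverse=True)
```

Widening or narrowing `window` trades recall of the correct antecedent against the number of pairs the later stages have to process.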

Morphological and Syntactic Filter
The purpose of morphological filtering is to reduce the large number of pronoun-candidate pairs by retaining only those potential antecedents that match a given pronoun in number and gender. In order to pre-filter candidates based on morphological information for each type of anaphor (pronoun), all POS tags of the expected antecedents that the anaphor may refer to need to be identified. To achieve this, all pronominal anaphora types and their antecedents were manually analyzed, and a set of pre-filtering rules was identified and formulated.
The following is an example of a morphological filtering rule, where p is the pronoun and n is the candidate antecedent:
If (Number (p) is plural or singular) then Number (p) = Number (n) and Gender (p) = Gender (n)
else if (Number (p) is dual) then Number (p) = Number (n)
This rule admits a pair consisting of a personal pronoun and a candidate antecedent (a common noun, a proper noun, another personal pronoun, a relative pronoun + VP, or a (substitutive) conditional particle + VP) if both elements are either singular or plural and their gender features are compatible, or if both elements are dual.
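The example rule can be expressed as a small predicate. The sketch below assumes that the condition tests the number of the pronoun, as the accompanying explanation suggests; the `Morph` class and its string values are hypothetical, and the POS restriction on the candidate is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morph:
    """Hypothetical morphological description of a pronoun or candidate."""
    number: str  # "singular", "dual" or "plural"
    gender: str  # "masculine" or "feminine"


def is_compatible(p: Morph, n: Morph) -> bool:
    """Apply the example rule to pronoun p and candidate antecedent n."""
    if p.number in ("singular", "plural"):
        # Singular and plural pronouns must agree in number and gender.
        return p.number == n.number and p.gender == n.gender
    if p.number == "dual":
        # Dual pronouns only need to agree in number.
        return p.number == n.number
    return False
```

Pairs for which `is_compatible` returns `False` would be discarded before feature extraction.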

Feature Extraction
Feature extraction is an extremely important task in Natural Language Processing (NLP), especially for Machine Learning (ML) techniques that require data to be represented as a feature vector of attribute-value pairs. The selection and definition of features have a significant influence on the effectiveness of a learning-based system. The features defined here were therefore used as input to the classification model. In addition, this work studies their effect on the performance of anaphora resolution.
An extensive set of features was defined for anaphora resolution. The feature set is composed of 16 features. Each of these features describes a property of the anaphor, of its antecedent candidate or of their relationship. Each training or testing instance is represented by a feature vector. All features used in this study are summarized in Table 2. Each anaphor-antecedent pair in the corpus is represented using the following feature vector: (F1: value1, F2: value2, …, Fm: valuem)
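To make the attribute-value representation concrete, the sketch below builds a feature vector for one anaphor-candidate pair. Only four features are shown, and their definitions are assumptions made for illustration: the names echo features mentioned in this paper, but the exact definitions in Table 2 may differ, and the dictionary keys of the inputs are hypothetical.

```python
from typing import Dict, List, Union

FeatureValue = Union[int, bool, str]


def extract_features(anaphor: dict,
                     candidate: dict,
                     candidates: List[dict]) -> Dict[str, FeatureValue]:
    """Build an attribute-value vector for one anaphor-candidate pair."""
    # The nearest NP is taken to be the candidate with the largest offset,
    # since all candidates precede the anaphor.
    nearest = max(candidates, key=lambda c: c["position"])
    return {
        "ana_pron_type": anaphor["pron_type"],
        "candi_sent_dist": anaphor["sentence"] - candidate["sentence"],
        "nearest_np": candidate["position"] == nearest["position"],
        "candi_def_np": candidate["definite"],
    }
```

A full implementation would emit all 16 features in the same way, one entry per feature.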

Classification Methods
The training data for the k-Nearest Neighbor (k-NN) classifier is extracted from the training portion of the Quranic data set. For any pair of markables ana and candi that is coreferent, a positive training sample is extracted. Negative training samples are generated such that, for any positive pair ana and candi, all the intervening candidates between the anaphor and the correct candidate are extracted as negative instances. Given this training data, the k-NN classifier is built. k-NN is a well-known instance-based classifier that has proved effective for a broad range of text classification problems. It is known as a lazy learner, since it defers the decision on how to generalize beyond the training data until each new query instance is encountered. In the k-NN algorithm, a new input instance is assumed to belong to the same class as its k nearest neighbors in the training data set. After all the training data are stored in memory, a new input instance is classified with the class of its k nearest neighbors among all stored training instances. To categorize an antecedent, the k-NN classifier ranks the antecedent's neighbors among the training antecedents and then uses the class labels of the k most similar neighbors.
Given a test antecedent r, the system finds the k nearest neighbors among the training antecedents. The similarity score of each neighbor to the test antecedent is used as the weight of that neighbor's class. The weighted sum in k-NN classification can be written as follows:
$$\mathrm{score}(r, c_i) = \sum_{r_j \in k\text{-NN}(r)} \mathrm{sim}(r, r_j)\, \delta(r_j, c_i)$$
where k-NN(r) indicates the set of k nearest neighbors of antecedent r, sim(r, r_j) is the similarity score between r and r_j, and δ(r_j, c_i) is equal to 1 if r_j belongs to class c_i and 0 otherwise. The test antecedent r is assigned to the class that has the highest resulting weighted sum.
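The instance generation scheme can be sketched as follows. Markables are assumed to be identified by their index in document order, which is a simplification introduced here for illustration; the function name is hypothetical.

```python
from typing import List, Tuple


def make_training_pairs(anaphor: int,
                        antecedent: int) -> List[Tuple[Tuple[int, int], int]]:
    """Create labelled (anaphor, candidate) pairs for one resolved anaphor.

    Both arguments are indices of markables in document order, with the
    antecedent preceding the anaphor. Label 1 marks a coreferent pair and
    label 0 a non-coreferent pair.
    """
    if antecedent >= anaphor:
        raise ValueError("the antecedent must precede the anaphor")
    pairs = [((anaphor, antecedent), 1)]
    # Every markable between the antecedent and the anaphor is negative.
    for candidate in range(antecedent + 1, anaphor):
        pairs.append(((anaphor, candidate), 0))
    return pairs
```

For an anaphor at index 5 with its antecedent at index 2, this yields one positive pair and two negative pairs, for the markables at indices 3 and 4.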
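A minimal sketch of similarity-weighted k-NN voting is given below. The similarity function is not specified in the text, so cosine similarity over numeric feature vectors is assumed here; the function names and the default k are likewise illustrative.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumed here as the similarity score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_classify(query: Sequence[float],
                 training: List[Tuple[Sequence[float], str]],
                 k: int = 3) -> str:
    """Assign the class with the highest similarity-weighted vote."""
    scored = [(cosine(query, vector), label) for vector, label in training]
    # Keep the k most similar training instances.
    neighbours = sorted(scored, key=lambda item: item[0], reverse=True)[:k]
    totals = defaultdict(float)
    for similarity, label in neighbours:
        # Each neighbour adds its similarity to the score of its own class.
        totals[label] += similarity
    return max(totals, key=totals.get)
```

Because each neighbour votes with its similarity rather than with a count of one, a single very close neighbour can outweigh several distant ones.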

Advantages of the Proposed Method
Combining rule-based and machine learning-based approaches to anaphora resolution offers a number of advantages:
• The hybrid approach leverages the strengths of both rule-based and machine learning approaches
• Results indicate that the hybrid approach outperforms both rule-based and machine learning approaches
• Hybrid approaches are considered particularly attractive because they can automatically learn resolution regularities from the training data, which largely reduces the human effort needed to design and implement resolution strategies
• Hybrid approaches learn from training data and are consequently more flexible. They can combine different sources of information (features) in a soft way, in which the relevance of each feature is balanced by its prominence and frequency in the training instances
• Machine learning approaches alone cannot always account for specific restrictions that are based on precise observations. For this reason, more complex hybrid models that combine machine learning and rule-based methods are often used
In general, the main disadvantage of the proposed approach is that it may require careful re-engineering of the knowledge-based part for different languages.

Experimental Setup
Several experiments were conducted to evaluate the anaphora resolution system and to integrate different feature sets with the classification algorithm into a more accurate classification procedure. The experiments were conducted on the Quran corpus for Arabic anaphora resolution. The algorithms were evaluated using 5-fold cross-validation.
To measure the performance of the classification method, the experimental results were sorted into the following categories: True Positive (TP) is the number of instances correctly classified as coreferent, False Positive (FP) is the number of instances incorrectly classified as coreferent, False Negative (FN) is the number of coreferent instances classified as non-coreferent and True Negative (TN) is the number of non-coreferent instances correctly classified. Precision, recall and the F1-measure were used in this study. These metrics are defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
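The 5-fold protocol can be sketched as follows. The text does not state how instances were assigned to folds, so the round-robin assignment below is an assumption; in practice one might prefer to split by text segment so that pairs of the same pronoun do not appear in both training and test data.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(n_items: int,
                  k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Round-robin assignment of item indices to folds (an assumption).
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Each instance is used for testing exactly once, and the scores of the k runs are typically averaged.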
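As a small worked illustration, precision, recall and F1 can be computed from the TP, FP and FN counts in the standard way. The function name is hypothetical, and the counts in the usage note are made up for the example and are not results from this study.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from confusion counts."""
    # Guard against empty denominators by returning 0.0 in those cases.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With the illustrative counts TP = 8, FP = 2 and FN = 2, precision and recall are both 8/10 = 0.8, so F1 is also 0.8. TN does not enter any of the three measures.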

Experimental Results
In this study, the types of pronouns were restricted to third-person pronouns, reflexive pronouns and possessive pronouns. The aim of the morphological filter is to reduce the size of the set of candidates by removing pairs that are not morphologically compatible. In this experiment, the number of pairs retained or filtered out with respect to their morphological compatibility was assessed. The morphological filter reduced the candidate set from 60119 pairs to 18345 pairs, a decrease of approximately 69.4%. Table 3 shows the recall and precision achieved using the morphological filter.
To evaluate the resolution system's performance, experiments were performed on each feature set using the k-NN classifier. The model was trained and tested using the Quranic Arabic dataset. The experimental results are summarized in Table 4. Furthermore, the effect and relative importance of the features for the anaphora resolution model on the Quranic Arabic dataset were investigated. The main aim is to determine the importance of each attribute in a large feature set and how this attribute can be used to improve anaphora resolution. To this end, the experiments began with the full set of features, which yields the optimal results. The experiments were then repeated, leaving out one feature at a time and recording the resulting performance of the resolver module, which decreases by a certain amount. The larger this decrease, the more important the information that the feature contributes. Table 5 summarizes the result of the feature selection.
Based on the results, the five most important features for the performance of the system were Candi-subj NP, Frequency Indicator (LR), Nearest NP, Candi-def NP and Candi-First NP (FNP). These features contribute strongly to the system, so removing them noticeably reduces its performance. The least important features were Candi Sent Dist, Ref-pro and Ana-pron-type. These features contribute little to the system, so removing them causes only a slight decrease in its performance. It can be concluded that the candidate-related features are more important than the other features and substantially improve the system's overall performance.
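The leave-one-feature-out procedure can be sketched as follows. The `evaluate` callback is a hypothetical stand-in for training and testing the resolver on a given feature subset and returning a score such as F1; it is not part of the described system.

```python
from typing import Callable, List, Sequence, Tuple


def ablation_study(features: Sequence[str],
                   evaluate: Callable[[List[str]], float]) -> List[Tuple[str, float]]:
    """Rank features by the performance drop caused by removing each one."""
    baseline = evaluate(list(features))
    drops = {}
    for feature in features:
        reduced = [f for f in features if f != feature]
        # A larger drop means the feature contributes more information.
        drops[feature] = baseline - evaluate(reduced)
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)
```

The ranking returned by `ablation_study` lists the most important feature first, mirroring the way feature importance is reported in the text.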

Conclusion
A hybrid approach for resolving Arabic pronominal anaphora was presented. The model discussed in this study combines the advantages of rule-based methods and learning-based methods. An in-depth study was conducted on each feature to determine its importance and its effect on anaphora resolution. The effectiveness of the proposed approach was evaluated on the Arabic Quranic data set. The results show that hybrid approaches are more appropriate for the anaphora resolution task than standalone classification approaches, and that the proposed approach is a feasible method for resolving Arabic pronominal anaphora. Furthermore, the results indicate that some attributes are more important than others. In the future, the researchers aim to identify reference types and resolve other types of anaphora, as well as to incorporate more syntactic features into the feature set, such as grammatical role. These features may help to improve the performance of anaphora resolution in Arabic.