Research Article Open Access

Parsing Arabic Texts Using Rhetorical Structure Theory

H. I. Mathkour, A. A. Touir and W. A. Al-Sanea

Abstract

Problem Statement: Processing texts on the basis of Rhetorical Structure Theory (RST) has shown interesting results. RST improves the ability to extract the semantics behind the processed text. Applications such as information retrieval, text summarization, and text generation have been shown to give better results when they use RST. The applicability of RST to processing and understanding texts has been studied in several languages, but little work has been devoted to Arabic. Given an Arabic text, the more accurately the Arabic rhetorical relations are extracted, the more useful the subsequent text representation will be. This, in turn, leads to a better understanding of the text and, hence, better results. Approach: We present a framework for applying RST to the Arabic language in order to rhetorically parse, understand, and summarize Arabic texts. We discuss a new approach to extracting Arabic rhetorical relations that is based on studying the English relations, analyzing an Arabic corpus, and understanding and using Arabic cue phrases. Results: We obtain rhetorical relations based on Arabic cues and show how this approach contributes to improving the understanding of Arabic text. The study addresses the relations that arise from cues that act as connectors among Arabic clauses as well as words. Conclusion: The introduced approach suggests that taking text coherence into account when obtaining Arabic rhetorical relations suits the characteristics of the Arabic language and avoids the disadvantages of previous approaches. The obtained Arabic rhetorical relations will make it possible to build rhetorical trees for Arabic texts for use in text summarization and generation, information retrieval, and text segmentation while preserving the coherence of the text.
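The idea of deriving rhetorical relations from cue phrases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue table `CUE_RELATIONS`, the relation labels, and the helper `find_relations` are hypothetical names chosen for illustration, and the three Arabic connectives (roughly "because", "but", and "if") stand in for whatever cue inventory the corpus analysis would actually produce.

```python
import re
from dataclasses import dataclass

# Hypothetical cue table. The relation labels are illustrative assumptions,
# not the relation set derived in the paper.
CUE_RELATIONS = {
    "لأن": "Justification",  # roughly "because"
    "لكن": "Contrast",       # roughly "but"
    "إذا": "Condition",      # roughly "if"
}


@dataclass
class Relation:
    cue: str    # the cue phrase that signalled the relation
    label: str  # the (assumed) rhetorical relation name
    left: str   # text span before the cue
    right: str  # text span after the cue


def find_relations(sentence: str) -> list[Relation]:
    """Split a sentence at the first occurrence of each known cue and label it."""
    found = []
    for cue, label in CUE_RELATIONS.items():
        # Match the cue only as a whole whitespace-delimited token.
        match = re.search(rf"(?:^|\s){re.escape(cue)}(?:\s|$)", sentence)
        if match:
            left = sentence[:match.start()].strip()
            right = sentence[match.end():].strip()
            found.append(Relation(cue, label, left, right))
    return found


if __name__ == "__main__":
    # "The student was late because the bus broke down."
    for relation in find_relations("تأخر الطالب لأن الحافلة تعطلت"):
        print(relation)
```

This sketch matches cues only as free-standing tokens. Arabic connectives frequently attach to neighbouring words as clitics, so a realistic parser would need morphological segmentation before cue matching, and it would also need rules for deciding which span is the nucleus and which is the satellite.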

Journal of Computer Science
Volume 4 No. 9, 2008, 713-720

DOI: https://doi.org/10.3844/jcssp.2008.713.720

Submitted On: 12 May 2008
Published On: 30 September 2008

How to Cite: Mathkour, H. I., Touir, A. A. & Al-Sanea, W. A. (2008). Parsing Arabic Texts Using Rhetorical Structure Theory. Journal of Computer Science, 4(9), 713-720. https://doi.org/10.3844/jcssp.2008.713.720

Keywords

  • Rhetorical Structure Theory (RST)
  • parsing Arabic texts
  • rhetorical relations
  • Arabic corpus analysis
  • Arabic cue phrases
  • Arabic text coherence