Research Article | Open Access

Topic-Transformer for Document-Level Language Understanding

Oumaima Hourrane¹ and El Habib Benlahmar¹

¹ Hassan II University, Morocco

Abstract

Most natural language processing applications are framed as prediction problems over a limited context, typically a single sentence or paragraph, which does not reflect how humans perceive natural language. When reading a text, humans are sensitive to much wider context, such as the rest of the document or other relevant documents. This study focuses on simultaneously capturing syntax and global semantics from a text, thus acquiring document-level understanding. Accordingly, we introduce a Topic-Transformer that combines the benefits of a neural topic model, which captures global semantic information, and a transformer-based language model, which captures the local structure of texts both semantically and syntactically. Experiments on various datasets confirm that our model achieves lower perplexity than the standard Transformer architecture and recent topic-guided language models, and generates more coherent topics than the standard Latent Dirichlet Allocation (LDA) topic model.
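The paper's own implementation is not reproduced here, but the architecture the abstract describes can be sketched concretely. Below is a minimal illustrative sketch in PyTorch of one plausible way to fuse a VAE-style neural topic model with a transformer language model: a document's bag-of-words is encoded into topic proportions, projected, and added to the token embeddings before causal self-attention. All module names, layer sizes, and the fusion-by-addition choice are assumptions for illustration, not the authors' code; positional encodings and the topic model's KL regularizer are omitted for brevity.

```python
import torch
import torch.nn as nn


class NeuralTopicEncoder(nn.Module):
    """VAE-style encoder: bag-of-words counts -> document topic proportions."""

    def __init__(self, vocab_size: int, num_topics: int, hidden: int = 256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, num_topics)
        self.logvar = nn.Linear(hidden, num_topics)

    def forward(self, bow: torch.Tensor) -> torch.Tensor:
        h = self.body(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return torch.softmax(z, dim=-1)  # topic proportions theta (global semantics)


class TopicTransformerLM(nn.Module):
    """Transformer LM conditioned on the document's topic proportions."""

    def __init__(self, vocab_size: int, num_topics: int = 50,
                 d_model: int = 512, nhead: int = 8, num_layers: int = 4):
        super().__init__()
        self.topic_encoder = NeuralTopicEncoder(vocab_size, num_topics)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.topic_proj = nn.Linear(num_topics, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor, bow: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) token ids; bow: (batch, vocab_size) counts.
        theta = self.topic_encoder(bow)                                # (batch, num_topics)
        h = self.embed(tokens) + self.topic_proj(theta).unsqueeze(1)   # inject global topic signal
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.encoder(h, mask=causal)                               # local syntax and semantics
        return self.lm_head(h)                                         # next-token logits
```

Training such a model would typically optimize the language-modeling cross-entropy plus the topic encoder's KL term, and evaluation with perplexity and topic coherence would then correspond to the metrics reported in the abstract.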

Journal of Computer Science
Volume 18 No. 1, 2022, 18-25

DOI: https://doi.org/10.3844/jcssp.2022.18.25

Submitted On: 21 September 2021
Published On: 22 January 2022

How to Cite: Hourrane, O. & Benlahmar, E. H. (2022). Topic-Transformer for Document-Level Language Understanding. Journal of Computer Science, 18(1), 18-25. https://doi.org/10.3844/jcssp.2022.18.25

Keywords

  • Neural Topic Model
  • Neural Language Model
  • Topic-Guided Language Model
  • Document-Level Understanding
  • Long-Range Semantic Dependencies