Research Article Open Access

A New Text Mining Approach for Finding Protein-to-Disease Associations

Hisham Al-Mubaid¹ and Rajit K. Singh¹
  • ¹ University of Houston, United States


Discovering significant relationships between biological entities in text documents is an important task for biologists developing biological models for research and discovery, especially given the enormous volume of biomedical literature and the rate at which it grows every day. We propose a new text mining method for extracting associations between biological entities from text documents, and in our experiments we apply it to discovering protein-to-disease associations. The method uses two sets of documents on the topic of interest, a negative set and a positive (relevant) set, and combines positive and negative evidence through the concepts of expectation (ex), evidence (ev) and Z-scores to determine which associations are significant. Moreover, the method offers an efficient way to handle protein names, aliases and abbreviations and to disambiguate them from common abbreviations, gene symbols and similar terms. We evaluated the method on discovering protein-to-disease associations from Medline abstracts, and the results are very encouraging. In each experiment, we confirmed the correctness of the results against articles from Medline. The method discovered associations between certain proteins and diseases such as Alzheimer disease, Creutzfeldt-Jakob disease, Crohn disease, dengue, jaundice, lung cancer and others. For example, in the Alzheimer test the method ran on 83,933 abstracts and found that Alzheimer disease has a significant association with six proteins, including Amyloid beta A4 protein precursor, Apolipoprotein E precursor and Presenilin 1 [PMIDs: 8596911, 1465129, 8346443, 12614323, 8766720 and 8878479]. We further tested the method on previously discovered and published gene-disease relationships, and it also successfully supported those findings.
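The abstract names expectation (ex), evidence (ev) and Z-scores but does not give their formulas. As a rough illustration only, the sketch below assumes one common formulation: each document is reduced to the set of canonical protein names it mentions (aliases already resolved), the negative set supplies a background rate for each protein, the expected count (ex) in the positive set follows from that rate, and a binomial Z-score measures how far the observed count in the positive set exceeds it. The function names, the add-one smoothing and the `min_z` threshold are hypothetical choices for this sketch, not the authors' method.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def document_frequencies(docs: Iterable[Set[str]]) -> Counter:
    """Count how many documents mention each protein (each doc is a set of names)."""
    counts: Counter = Counter()
    for proteins in docs:
        counts.update(proteins)
    return counts


def association_zscores(positive: List[Set[str]],
                        negative: List[Set[str]]) -> Dict[str, float]:
    """Hypothetical binomial Z-score of each protein's document count in the
    positive set, relative to the rate observed in the negative (background) set."""
    n_pos, n_neg = len(positive), len(negative)
    pos_counts = document_frequencies(positive)
    neg_counts = document_frequencies(negative)
    scores: Dict[str, float] = {}
    for protein, observed in pos_counts.items():
        # Assumed add-one smoothing so a protein absent from the negative set
        # does not get a background rate of exactly zero.
        rate = (neg_counts[protein] + 1) / (n_neg + 2)
        expected = n_pos * rate                       # expected count ("ex")
        std = math.sqrt(n_pos * rate * (1 - rate))    # binomial standard deviation
        scores[protein] = (observed - expected) / std
    return scores


def significant_proteins(positive: List[Set[str]], negative: List[Set[str]],
                         min_z: float = 3.0) -> List[str]:
    """Return proteins whose Z-score reaches the (hypothetical) threshold, highest first."""
    scores = association_zscores(positive, negative)
    hits = [name for name, z in scores.items() if z >= min_z]
    return sorted(hits, key=lambda name: -scores[name])


if __name__ == "__main__":
    # Toy data: "APP" appears in every positive document but only one negative one;
    # "ALB" is equally common (50%) in both sets.
    positive = [{"APP", "ALB"} for _ in range(5)] + [{"APP"} for _ in range(5)]
    negative = [{"ALB"} for _ in range(50)] + [set() for _ in range(49)] + [{"APP"}]
    print(association_zscores(positive, negative))
    print(significant_proteins(positive, negative))
```

With this toy data, the protein seen in every positive document scores far above the threshold, while the one equally common in both sets scores zero and is not reported.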

American Journal of Biochemistry and Biotechnology
Volume 1, No. 3, 2005, pp. 145-152


Submitted On: 17 September 2005
Published On: 30 September 2005

How to Cite: Al-Mubaid, H. & Singh, R. K. (2005). A New Text Mining Approach for Finding Protein-to-Disease Associations. American Journal of Biochemistry and Biotechnology, 1(3), 145-152.




Keywords:
  • Biomedical text mining
  • information extraction
  • text mining
  • bioinformatics