American Journal of Biochemistry and Biotechnology

A New Text Mining Approach for Finding Protein-to-Disease Associations

Hisham Al-Mubaid and Rajit K. Singh

DOI: 10.3844/ajbbsp.2005.145.152


Volume 1, Issue 3

Pages 145-152


Discovering significant relationships between biological entities in text documents is an important task for biologists developing biological models for research and discovery, especially given the enormous and rapidly growing volume of biomedical literature. We propose a new text mining method for extracting associations between biological entities from text documents, and in our experiments we apply it to discovering protein-to-disease associations. The proposed method uses two sets of documents on the topic of interest (a negative set and a positive, or relevant, set) and applies the concepts of expectation (ex), evidence (ev) and Z-scores to combine positive and negative evidence when determining which associations are significant. Moreover, the method offers an efficient way to handle protein names, aliases and abbreviations and to disambiguate them from common abbreviations, gene symbols and similar terms. We evaluated the method on discovering protein-to-disease associations from Medline abstracts, and the results are very encouraging. In each experiment, we confirmed the correctness of the results through Medline articles. Our method discovered associations between certain proteins and various diseases, including Alzheimer disease, Creutzfeldt-Jakob disease, Crohn disease, dengue, jaundice and lung cancer. For example, in the Alzheimer test, the method ran on 83,933 abstracts and found that Alzheimer disease is significantly associated with 6 proteins, among them Amyloid beta A4 protein precursor, Apolipoprotein E precursor and Presenilin 1 [PMIDs: 8596911, 1465129, 8346443, 12614323, 8766720 and 8878479]. We further tested the method on some previously discovered and published gene-disease relationships, and it also succeeded in supporting those discoveries.
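The abstract names expectation (ex), evidence (ev) and Z-scores but does not give their exact definitions, so the following is only a minimal sketch under stated assumptions, not the authors' formulas. It assumes that each document is reduced to a set of terms, that a hypothetical alias table maps protein aliases to one canonical name, that ex is the number of positive-set documents expected to mention a term given its (add-one smoothed) rate in the negative set, that ev is the number of positive-set documents that actually mention it, and that Z is a binomial z-score of ev against ex. The names `normalize`, `association_z`, `rank_associations`, `ALIASES` and the threshold of 2.0 are all illustrative.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical alias table: alias (lowercased) -> canonical protein name.
ALIASES: Dict[str, str] = {"apolipoprotein e": "apoe"}


def normalize(doc: Set[str], aliases: Dict[str, str]) -> Set[str]:
    """Lowercase each term and map known aliases to a canonical name."""
    return {aliases.get(t.lower(), t.lower()) for t in doc}


def association_z(term: str, positive: List[Set[str]], negative: List[Set[str]]) -> float:
    """Binomial z-score of a term's positive-set count against its negative-set rate.

    Assumption: ex = n_pos * p, where p is the add-one smoothed fraction of
    negative documents containing the term; ev = observed positive count.
    """
    n_pos = len(positive)
    neg_count = sum(1 for d in negative if term in d)
    p = (neg_count + 1) / (len(negative) + 2)  # smoothing keeps 0 < p < 1
    ex = n_pos * p                              # expected count (ex)
    ev = sum(1 for d in positive if term in d)  # observed evidence (ev)
    sd = math.sqrt(n_pos * p * (1 - p))
    return (ev - ex) / sd


def rank_associations(positive: List[Set[str]], negative: List[Set[str]],
                      aliases: Dict[str, str], threshold: float = 2.0) -> List[Tuple[str, float]]:
    """Return (term, z) pairs with z >= threshold, highest z first."""
    pos = [normalize(d, aliases) for d in positive]
    neg = [normalize(d, aliases) for d in negative]
    terms = set().union(*pos) if pos else set()
    scored = [(t, association_z(t, pos, neg)) for t in terms]
    return sorted([s for s in scored if s[1] >= threshold], key=lambda s: -s[1])


if __name__ == "__main__":
    positive = [{"APOE", "amyloid"}, {"apolipoprotein E"}, {"apoe", "tau"}, {"amyloid"}]
    negative = [{"apoe"}, {"tau"}, {"insulin"}, {"tau"}, {"insulin"}, {"tau"}]
    for term, z in rank_associations(positive, negative, ALIASES):
        print(f"{term}: z = {z:.3f}")
```

On the toy data, "apoe" (three positive mentions after alias normalization, one negative) and "amyloid" (two positive, none negative) exceed the threshold, while "tau", which is common in the negative set, does not. Scoring against a background rate from the negative set is what lets frequent but non-specific terms drop out.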


© 2005 Hisham Al-Mubaid and Rajit K. Singh. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.