Research Article Open Access

Model Classification for Predicting the Post-Translational Modification (PTM) Glycosylation in Sequence O Using an Extreme Gradient Boosting Algorithm

Damayanti1,2, Sutyarso3, Akmal Junaidi4 and Favorisen Rosyking Lumbanraja4
  • 1 Faculty of Mathematics and Natural Sciences, Universitas Lampung, Lampung, Indonesia
  • 2 Faculty of Engineering and Computer Science, Universitas Teknokrat Indonesia, Lampung, Indonesia
  • 3 Department of Biology, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Lampung, Indonesia
  • 4 Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Lampung, Lampung, Indonesia

Abstract

Post-translational modification (PTM) is an important mechanism for regulating protein function. It refers to the covalent, usually enzymatic, modification of proteins during or after protein biosynthesis, and it plays an important role in modifying protein function and regulating gene expression. One such modification is glycosylation, the addition of a sugar group to a protein structure. One type is glycosylation in sequence O (O-linked glycosylation). Glycosylation has been linked to several illnesses, including diabetes, cancer, and influenza, so it is important to predict whether a given site is glycosylated or non-glycosylated. Glycosylation sites have mostly been identified with manual laboratory techniques, which are slow and require expensive equipment. A computational method that predicts glycosylation more quickly is therefore needed. The data used in this study are glycosylation data on sequence O obtained from the openly accessible UniProt website. The study aimed to improve the accuracy of predicting post-translational modification glycosylation in sequence O using extreme gradient boosting (XGBoost), a gradient boosting framework that tends to be faster. To increase accuracy, feature extraction experiments were conducted with the following feature types: AAIndex, hydrophobicity, SABLE, composition, CTD, and PseAAC. Features were selected with the MRMR approach, and the model was evaluated using k-fold cross-validation. The reported prediction performance for glycosylation in sequence O is an accuracy of 100%. These findings indicate that the XGBoost algorithm outperforms the methods reported in previous studies.
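
As a rough illustration of two steps named in the abstract, the sketch below computes a simple amino acid composition feature vector and builds k-fold train/test splits. It is not the authors' implementation: the helper names (`candidate_windows`, `composition`, `k_fold_indices`), the window size, the toy sequence, and the restriction of candidate sites to serine (S) and threonine (T) residues are all assumptions made for illustration. MRMR selection and the XGBoost classifier itself are not shown.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def candidate_windows(sequence, flank=3):
    """Yield (position, window) for each S/T residue (assumed candidate sites).

    Windows are centered on the residue; sequence ends are padded with 'X'.
    The flank size is a hypothetical choice, not taken from the article.
    """
    padded = "X" * flank + sequence + "X" * flank
    for i, residue in enumerate(sequence):
        if residue in "ST":
            yield i, padded[i : i + 2 * flank + 1]


def composition(window):
    """Return the fraction of each standard amino acid in the window."""
    counts = Counter(window)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # ignore padding 'X'
    return [counts[a] / total for a in AMINO_ACIDS]


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; each sample is held out exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test


# Toy usage on a made-up sequence (not UniProt data).
windows = list(candidate_windows("MSTPAGS"))
features = [composition(w) for _, w in windows]
splits = list(k_fold_indices(len(features), k=3))
```

In a fuller pipeline, each training split would be passed to a gradient boosting classifier and each held-out split used to estimate accuracy; that part depends on the external `xgboost` package and is left out here.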

Journal of Computer Science
Volume 20 No. 7, 2024, 758-767

DOI: https://doi.org/10.3844/jcssp.2024.758.767

Submitted On: 6 October 2023
Published On: 29 April 2024

How to Cite: Damayanti, Sutyarso, Junaidi, A. & Lumbanraja, F. R. (2024). Model Classification for Predicting the Post-Translational Modification (PTM) Glycosylation in Sequence O Using an Extreme Gradient Boosting Algorithm. Journal of Computer Science, 20(7), 758-767. https://doi.org/10.3844/jcssp.2024.758.767

Keywords

  • Glycosylation
  • XGBoost
  • Machine Learning
  • Sequence