The Relevance of Bioinformatic Tools in the Study of Polymorphisms of the B4GALNT2 Gene and its Association with Cancer

Corresponding Author: Atika Eddaikra, Department of Cell Biology and Physiology, Faculty of Nature and Life, SAAD DAHLEB University Blida1, Algeria, Algiers; B.P:270 Route de Soumâa – Blida, Algeria
Email: aeddaikra@yahoo.fr

Abstract: The human gene B4GALNT2 encodes an enzyme (β1,4-N-acetylgalactosaminyltransferase II) that controls the expression of the blood group Sda carbohydrate antigen. This gene is located at position 17q21.32 and consists of 11 exons. The characterization and understanding of genetic variation is a real challenge in human genetics, both for healthy individuals and for diseased ones. An in silico study of the polymorphism of the B4GALNT2 gene, carried out with a bioinformatic methodology that analyzes various databases and open-source web browsers, has shown that this gene has a polymorphic profile with a very large number of COSMIC SNPs associated with different types of cancer. The in silico prediction of the 3D structure is an important step to better understand the overall architecture of the B4GALNT2 protein. The model chosen in this study is that of chondroitin synthase, with a recovery percentage of 20.10% relative to the target sequence. Our findings suggest that these COSMIC polymorphisms are at the origin of a cellular disorder responsible for the initiation, birth and proliferation of tumors. Bioinformatics has become an indispensable tool in identifying and predicting the function of the B4GALNT2 gene and its relation to cancer.


Introduction
The human gene Beta-1,4-N-Acetyl-Galactosaminyltransferase II (B4GALNT2) (ID: 124872, OMIM: 111730) is located at 17q21.32 on chromosome 17 (NC_000017.11). It consists of 11 exons and encodes a glycosyltransferase that catalyzes the last step in the biosynthesis of the human Sda antigen (https://www.ncbi.nlm.nih.gov/). GalNAc is the immunodominant glycan of the Sda antigen, which is present on the red blood cells of more than 90% of individuals (Montiel et al., 2003). The B4GALNT2 gene is primarily expressed in the gastrointestinal (GI) epithelium of the colon, though lower levels of expression can be found in the kidneys, ileum, stomach and rectum (Kawamura et al., 2005).
Single Nucleotide Polymorphisms (SNPs) are point variations of a single nucleotide. They are the smallest form of polymorphism because they affect only one base pair. They are distributed throughout the human genome and are the most common form of genetic variation; indeed, they account for more than 90% of the differences between individuals. SNPs in genes that regulate DNA mismatch repair, the cell cycle, metabolism and immunity are associated with genetic susceptibility to cancer (Deng et al., 2017). The characterization and understanding of genetic variation is a real challenge in human genetics, both for healthy individuals and for patients. In addition, cancer of the digestive tract is a real public health problem. This type of cancer involves both environmental factors and genetic factors spanning many genes.
The present in silico study aims to identify the B4GALNT2 gene polymorphisms associated with cancer.

Results and Discussion
The identifier of our alignment request is NM_001159387.2. Alignment of the FASTA nucleotide sequence of transcript variant 2 of the B4GALNT2 gene using the NCBI BLAST tool reported a length of 8657, with 338 BLAST hits against 293 sequences. We retained only the hits with a similarity score of 99%, which corresponds to 4 BLASTN hits (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
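The filtering step described above can be sketched in a few lines. This is only an illustration, not the procedure used in the study: it assumes the hits were exported in BLAST tabular format (`-outfmt 6`, whose third column is percent identity) and that the "similarity score" corresponds to that column. The rows in `EXAMPLE_ROWS`, the subject names and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    subject_id: str
    percent_identity: float
    alignment_length: int


def parse_tabular(text):
    """Parse BLAST tabular output (outfmt 6): qseqid, sseqid, pident, length, ..."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(BlastHit(fields[1], float(fields[2]), int(fields[3])))
    return hits


def filter_by_identity(hits, min_identity=99.0):
    """Keep only the hits whose percent identity reaches the threshold."""
    return [hit for hit in hits if hit.percent_identity >= min_identity]


# Hypothetical rows for illustration only; they are NOT results from the study.
EXAMPLE_ROWS = [
    ["query", "subject_A", "99.95", "8657", "4", "0", "1", "8657", "1", "8657", "0.0", "15960"],
    ["query", "subject_B", "99.10", "5200", "47", "0", "1", "5200", "1", "5200", "0.0", "9300"],
    ["query", "subject_C", "97.40", "3100", "80", "1", "10", "3109", "5", "3104", "0.0", "5200"],
    ["query", "subject_D", "88.20", "900", "106", "2", "400", "1299", "1", "898", "1e-120", "1100"],
]
EXAMPLE_TABLE = "\n".join("\t".join(row) for row in EXAMPLE_ROWS)

if __name__ == "__main__":
    for hit in filter_by_identity(parse_tabular(EXAMPLE_TABLE)):
        print(hit.subject_id, hit.percent_identity)
```

Keeping the threshold as a parameter makes it easy to check how many hits would survive a stricter or looser cut-off.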
The search for SNP variations of the B4GALNT2 gene using the UCSC Genome Browser enabled us to visualize 180 common SNPs. In the RS track, SNPs are colored by class: synonymous coding (green), non-synonymous coding (red), intronic (black) and untranslated region (blue) (Fig. 1).
The track configuration allows us to display a COSMIC track related to clinical studies on cancer. This track shows that the accession numbers, in red, are distributed over the entire length of the B4GALNT2 gene. Indeed, in recent years, several Genome-Wide Association Studies (GWAS) have identified genetic variants or susceptibility loci associated with various human diseases, including cancer (Stadler et al., 2010).
At the level of the COSMIC track, we also find a variant, A40D, in the track that presents the genomic positions of natural and artificial amino acid variants from the UniProt/Swiss-Prot database. The data for these variants were selected by UniProt from scientific publications.
UCSC results show that the B4GALNT2 gene has 18.7% CpG islands. These are characterized by a sequence rich in CG dinucleotides.
Our results suggest that CG richness may prevent transcription factors from binding to DNA and thus block transcription. CpG islands are common near transcription initiation sites and may be associated with promoter regions. Normally, a C (cytosine) base followed immediately by a G (guanine) base (CpG) is rare in vertebrate DNA, since the C in this arrangement tends to be methylated, a modification linked to the epigenetic regulation of gene expression (Hanna et al., 2005).
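To make the notion of "CG richness" concrete, the sketch below computes two quantities commonly used to describe CpG islands: the GC fraction and the observed/expected CpG ratio, obs/exp = (number of CpG x N) / (number of C x number of G), where N is the sequence length. The thresholds in `looks_like_cpg_island` (length of at least 200 bp, GC fraction of at least 0.5, ratio of at least 0.6) are the commonly cited Gardiner-Garden and Frommer criteria; we assume, without having verified it for this study, that the browser track relies on comparable rules. The function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (gc_fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c_count + g_count) / n
    expected = (c_count * g_count) / n
    obs_exp = cpg_count / expected if expected else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the commonly cited length, GC and obs/exp thresholds (assumed here)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_length and gc_fraction >= min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    toy = "CGCGCGATAT"  # toy sequence, not taken from B4GALNT2
    print(cpg_stats(toy))
```

In practice these statistics are evaluated in sliding windows along the gene rather than on a single short sequence.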
We also note, in the tracks generated using UCSC, the existence of a variant from the UniProt/Swiss-Prot source with the identifier VAR_049238. This variant is a mutation in the coding sequence at position 40 (Ala>Asp): chr17:49133143-49133145; band 17q21.32; genomic size 3; strand +. The 40D variant corresponds to the SNP reference rs7207403. For this same rs identifier, the Ensembl browser allowed us to visualize the distribution of genotypic and allelic frequencies in populations. Similarly, the 40D variant allowed us to follow the link (http://hgdp.uchicago.edu/cgi-bin/gbrowse/HGDP/) (Fig. 2).
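Genotypic and allelic frequencies of the kind displayed by the browser are related by simple allele counting: each homozygote contributes two copies of one allele and each heterozygote one copy of each. The sketch below shows this relation for a biallelic SNP. The function names and the genotype counts are hypothetical and are not the values reported for this variant.

```python
def genotype_frequencies(n_ref_hom, n_het, n_alt_hom):
    """Return the frequencies of the three genotypes of a biallelic SNP."""
    total = n_ref_hom + n_het + n_alt_hom
    if total == 0:
        raise ValueError("at least one genotyped individual is required")
    return n_ref_hom / total, n_het / total, n_alt_hom / total


def allele_frequencies(n_ref_hom, n_het, n_alt_hom):
    """Return (reference, alternate) allele frequencies by allele counting."""
    total = n_ref_hom + n_het + n_alt_hom
    if total == 0:
        raise ValueError("at least one genotyped individual is required")
    n_alleles = 2 * total  # diploid: two alleles per individual
    ref_freq = (2 * n_ref_hom + n_het) / n_alleles
    alt_freq = (2 * n_alt_hom + n_het) / n_alleles
    return ref_freq, alt_freq


if __name__ == "__main__":
    # Hypothetical counts for one population, for illustration only.
    print(genotype_frequencies(50, 40, 10))
    print(allele_frequencies(50, 40, 10))
```

Applying the same function to the counts of each population gives a per-population comparison of allele frequencies.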
A simple click on rs7207403 allowed us to visualize the distribution of ancestral and derived alleles in different populations according to geographic location (Fig. 3).
The in silico prediction of the 3D structure is an important step to better understand the overall architecture of the B4GALNT2 protein. For this prediction, a total of 291 models were found to match the target sequence (B4GALNT2 protein sequence). This list was filtered by a heuristic method down to 50. Our chosen model is that of chondroitin synthase, with a recovery percentage of 20.10% relative to the target sequence (Figs. 4 and 5). For the UniProt entry Q8NHY0, the variant p.Pro459His was generated by the SwissModel navigator (https://web.expasy.org/variant_pages/VAR_035990.html).
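When a template is compared with a target sequence, two percentages are usually reported: sequence identity (identical residues among aligned positions) and coverage (the share of the target that is aligned to the template). The text above does not specify which of these the 20.10% refers to, so the sketch below computes both from a gapped pairwise alignment. The function name and the toy alignment are hypothetical and unrelated to the actual B4GALNT2 and chondroitin synthase sequences.

```python
def identity_and_coverage(target_aln, template_aln, gap="-"):
    """Return (percent identity, percent coverage) for two aligned sequences.

    Identity is computed over columns where both sequences have a residue;
    coverage is the fraction of target residues aligned to a template residue.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    target_length = sum(1 for residue in target_aln if residue != gap)
    aligned = 0
    identical = 0
    for target_res, template_res in zip(target_aln, template_aln):
        if target_res != gap and template_res != gap:
            aligned += 1
            if target_res == template_res:
                identical += 1
    identity = 100.0 * identical / aligned if aligned else 0.0
    coverage = 100.0 * aligned / target_length if target_length else 0.0
    return identity, coverage


if __name__ == "__main__":
    # Toy alignment for illustration only.
    identity, coverage = identity_and_coverage("MKT-LLAG", "MRTALL--")
    print(f"identity = {identity:.2f}%, coverage = {coverage:.2f}%")
```

A low identity combined with a high coverage typically indicates a distant template that nevertheless spans most of the target, which is why it helps to report both values.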
This variant was found in a colorectal cancer sample and is identified as a somatic mutation. The examination of the DisGeNET database allowed us to identify pathologies associated with the B4GALNT2 gene, such as colorectal cancer, carcinoma of the stomach, malignant neoplasm and muscular dystrophy (https://www.ebi.ac.uk/gwas/search?query=B4GALNT2).
A study in 2014 reports that, in the absence of cancer, the Sda epitope of the glycan is present. However, the absence of this epitope upsets the balance by causing a new Sda phenotype. In this case, another antigen, the sialyl Lewis x antigen (sLex), has been observed to be overexpressed. This antigen has been observed in multiple tumors. In colon cancer, the downregulation of the Sda epitope plays a potential role in the overexpression of sialyl Lewis antigens, increasing the formation of metastases. In addition, the Sda antigen is involved in the lytic function of murine cytotoxic T lymphocytes. Because of this, the expression of the Sda antigen has a significant impact on the physiology and pathology of different biological systems (Dall'Olio et al., 2014).

Conclusion
In light of this work, we can conclude that the COSMIC polymorphisms we have examined are at the origin of a cellular disorder responsible for the initiation, birth and proliferation of tumors. These polymorphisms may be due to a disruption of the epigenetic balance (methylation/acetylation). We suggest that the SNPs identified with the web browsers we used be considered for case-control studies, in order to assess the association of B4GALNT2 gene polymorphisms with susceptibility to cancer. They could then be used as genetic markers to identify the risk of gastrointestinal cancer. Bioinformatics is indeed an essential tool in the identification of SNPs and the prediction of B4GALNT2 gene function and its relation to cancer.

Author's Contributions
Eddaikra Atika: Participated in all experiments, coordinated the data-analysis and contributed to the writing of the manuscript. Designed the research plan and organized the study and coordinated the mouse work.