In silico Analysis of 4CL Family in Scutellaria baicalensis through Biocomputational Tools and Servers

1 Zhanjiang Key Laboratory of Biomedicine and Cardiovascular Disease Prevention in West Guangdong, Guangdong Medical University, Zhanjiang 524001, China
2 Cardiovascular Medicine Center, Affiliated Hospital of Guangdong Medical University, Zhanjiang 524001, China
3 Tibetan Collaborative Innovation Center of Agricultural and Animal Husbandry Resources, Tibet Agricultural and Animal Husbandry College, Nyingchi 860001, China
4 Laboratory of Vascular Surgery, Guangdong Medical University, Zhanjiang 524001, China
5 Laboratory of Cardiovascular Diseases, Guangdong Medical University, Zhanjiang 524001, China


Introduction
Bioinformatics is a science that applies mathematics and computer science to biological data in order to understand biology. In the post-genome era, protein structure and function are central topics in molecular biology, and a growing number of computational software packages and online servers have been developed to identify and characterize proteins and the nucleic acid sequences that encode them (Sivakumar et al., 2007;Lei et al., 2009). The physicochemical properties and biological functions of proteins can be studied effectively with bioinformatics methods (Ling et al., 2007;Lei et al., 2010).
Flavonoids are important plant secondary metabolites involved in flower coloration, interspecies interactions, disease defense, UV protection and responses to environmental challenges (Stefan and Axel, 2005;Chen et al., 2014). Flavonoids are synthesized through the phenylpropanoid pathway (part of which is shown in Fig. 1), and many of the enzymes involved have already been characterized. 4-Coumarate:coenzyme A ligase (4CL), located at a branch point of phenylpropanoid derivative biosynthesis, catalyzes the formation of 4-coumaroyl-CoA from 4-coumarate and coenzyme A (Gross and Zenk, 1974;Lei et al., 2011a). 4-Coumaroyl-CoA then serves as the substrate for various important reactions in the branch pathways of phenylpropanoid metabolism, including flavonoid biosynthesis (Dixon and Paiva, 1995;Hahlbrock and Scheel, 1989;Holton and Cornish, 1995). 4CL is therefore a key enzyme of the flavonoid biosynthesis pathway (Fan et al., 2007). Many studies have shown that 4cl genes form a multigene family: two 4cl genes have been cloned from Scutellaria baicalensis Georgi, and three 4cl genes have been isolated and characterized in hybrid poplar (Allina et al., 1998). Through enzymological characterization, genetic mutation and crystal structure modeling, the structure and evolution of 4CL have been studied extensively (Cukovic et al., 2001;Schneider et al., 2003), revealing several highly conserved active-site regions, such as Box I (SSGTTGLPKGV) and Box II (GEICIRG) (Stuible and Kombrink, 2001), as well as sbd I (N-terminal domain) and abd II (C-terminal domain).

Fig. 1. Flavonoid biosynthetic pathway in Scutellaria cells
Expression of individual 4cl family members is regulated by developmental stage (Zhao et al., 2003), tissue specificity (Kumar and Ellis, 2003) and environmental stress (Ehlting et al., 1999), which may account for the structural diversity of flavonoid compounds and their various biological functions. Nevertheless, little information is available about the molecular structure, physicochemical properties and function of the 4CL family in Scutellaria baicalensis (Lei and Shui, 2014).
S. baicalensis is distributed mainly in East Asia, and its dried roots have long been used in traditional Chinese medicine to treat inflammatory and bacterial diseases (Yamamoto, 1991;Huang et al., 2012;Xue et al., 2015). In the present study, bioinformatic analyses of the 4cl family from S. baicalensis were carried out, which may pave the way for further studies of the physicochemical properties of the 4CL protein family and of the molecular mechanism of flavonoid biosynthesis.

Database Analyses
Two complete sequences of the Sb4cl gene, including their coding regions (CDS), were obtained from the NCBI databases: 4CL1 (Accession: AB166767) and 4CL2 (Accession: AB166768). The accession numbers of the corresponding amino acid sequences are BAD90936 (4CL1) and BAD90937 (4CL2).
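The two records can also be retrieved programmatically. The following is a minimal Python sketch, not the procedure used in this study, that builds NCBI E-utilities efetch URLs for the accessions above; the helper names efetch_url and fetch_record are hypothetical, and only fetch_record needs network access. Passing rettype="fasta_cds_na" instead of "fasta" for the nuccore records should return only the annotated CDS.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accessions given in the text: nucleotide (nuccore) and protein records.
NUCLEOTIDE_ACCESSIONS = {"4CL1": "AB166767", "4CL2": "AB166768"}
PROTEIN_ACCESSIONS = {"4CL1": "BAD90936", "4CL2": "BAD90937"}


def efetch_url(db: str, accession: str, rettype: str = "fasta") -> str:
    """Build an efetch URL that returns a single record as plain text."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_BASE}?{urlencode(params)}"


def fetch_record(db: str, accession: str, rettype: str = "fasta") -> str:
    """Download one record (hypothetical helper; requires network access)."""
    with urlopen(efetch_url(db, accession, rettype), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Print the URLs only; no download is attempted here.
    for name, acc in NUCLEOTIDE_ACCESSIONS.items():
        print(name, efetch_url("nuccore", acc))
    for name, acc in PROTEIN_ACCESSIONS.items():
        print(name, efetch_url("protein", acc))
```

NCBI asks frequent users to add tool and email parameters and to respect its request-rate limits; those are omitted here for brevity.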

Bioinformatic Analyses
Comparative bioinformatic analysis of Sb4cl was performed using online resources including http://www.expasy.org and http://www.ncbi.nlm.nih.gov. Multiple alignment of the amino acid sequences of Sb4CL and 4CLs from other plant species was performed with Vector NTI Suite 8 (Lei et al., 2009). Physicochemical properties were analyzed with ProtParam (Gasteiger et al., 2005). Transmembrane helices, subcellular localization and hydrophobicity of the target proteins were predicted with TMHMM Server v.2.0 (Ikeda et al., 2002), TargetP 1.1 Server (Kristin and Siegfried, 2004) and ProtScale (Kyte and Doolittle, 1982), respectively. Motifs in the 4CL proteins were searched with ScanProsite. Conserved domains and coiled-coil structures were identified with the CDD (Marchler-Bauer and Bryant, 2004) and COILS (Lei et al., 2008) servers, respectively. Amino acid sequences of Sb4CL and 4CLs from five plant species were aligned with ClustalX (Thompson et al., 1997), and phylogenetic trees were then constructed in MEGA4.1 by the Maximum Parsimony (MP) and Neighbor-Joining (NJ) methods; the reliability of each node was assessed by bootstrap analysis with 1000 replicates (Saitou and Nei, 1987;Kumar et al., 2008). Finally, three-dimensional (3D) structures of the Sb4CL sequences were built by homology modeling with SWISS-MODEL (Guex and Peitsch, 1997;Schwede et al., 2003;Arnold et al., 2006) and then edited and displayed with WebLab ViewerLite 4.2.
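ProtParam reports GRAVY as the mean Kyte-Doolittle hydropathy over the whole sequence, and ProtScale plots a sliding-window average of the same scale. A minimal Python sketch of both calculations follows; the window of 9 residues and the unweighted averaging are assumptions chosen for illustration, and the function names are hypothetical rather than part of either server.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """GRAVY: sum of hydropathy values divided by sequence length.

    Raises KeyError for non-standard letters such as X.
    """
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq: str, window: int = 9) -> list[tuple[int, float]]:
    """Unweighted sliding-window hydropathy, keyed by the 1-based centre residue.

    The window size of 9 is an assumed default for illustration.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    seq = seq.upper()
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        score = sum(KD[aa] for aa in seq[start:start + window]) / window
        profile.append((start + half + 1, score))
    return profile
```

Positive profile values indicate hydrophobic stretches; long runs above roughly 1.6 are the kind of signal that transmembrane predictors such as TMHMM examine with more sophisticated models.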

Analyses of Structure and Properties
The nucleotide sequences of the two 4cl genes were analyzed with Vector NTI Suite 8. They had open reading frames (ORFs) of the same length, with the same start codon (ATG) and stop codon (TGA); the only difference was the length of the 5' untranslated region (UTR), which contained one base in 4cl2 but forty-one bases in 4cl1. As computed with the online tool ProtParam, several physicochemical parameters of the two 4CL members were almost identical (Table 1), including the molecular formula, isoelectric point (pI), molar extinction coefficient, grand average of hydropathicity (GRAVY) and the total numbers of negatively and positively charged residues.
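The ORF and 5'UTR comparison described above amounts to locating the first in-frame start and stop codons. A minimal Python sketch under two assumptions: the record begins at the 5' end of the transcript, and the first ATG followed by an in-frame stop codon is the true start. The helper first_orf is hypothetical and is not the Vector NTI implementation.

```python
from typing import Optional

# Standard stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(mrna: str) -> Optional[tuple[int, str]]:
    """Return (5'UTR length, ORF including its stop codon), or None.

    Assumes the sequence starts at the transcript 5' end, so the offset
    of the start codon equals the 5'UTR length.
    """
    seq = mrna.upper().replace("U", "T")
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, seq[start:i + 3]
        # No in-frame stop codon: try the next ATG downstream.
        start = seq.find("ATG", start + 1)
    return None
```

If the assumptions hold for the two GenBank records, the first returned value would correspond to the 5'UTR lengths reported above, and len(orf) // 3 gives the number of codons including the stop.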
GOR4 was used for secondary structure prediction. Sb4CL1 showed a mixed secondary structure: random coil, α-helix and extended strand accounted for 47.54%, 33.52% and 18.94% of residues, respectively. Sb4CL2 had a similar composition (Fig. 2), and the high proportion of coil reflects the abundance of hydrophobic proline and flexible glycine residues.
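The percentages above are simple residue counts over a per-residue prediction string. A minimal Python sketch, assuming the GOR4 output has been exported as one letter per residue (H for helix, E for extended strand, C for coil); the function name ss_composition is hypothetical.

```python
from collections import Counter

# Mapping from assumed one-letter GOR4 states to the names used in the text.
STATE_NAMES = {"H": "alpha helix", "E": "extended strand", "C": "random coil"}


def ss_composition(prediction: str) -> dict[str, float]:
    """Percentage of residues in each state, rounded to two decimals."""
    counts = Counter(prediction.upper())
    total = len(prediction)
    return {
        name: round(100.0 * counts[state] / total, 2)
        for state, name in STATE_NAMES.items()
    }
```

For Sb4CL1 the three reported percentages (47.54 + 33.52 + 18.94) sum to 100%, as expected when every residue receives exactly one state.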

Cytological Characterization and Phylogram Analysis
Subcellular localization prediction with the online TargetP 1.1 Server indicated that the Sb4CL proteins are localized in the cytosol and lack a transit peptide. TMHMM Server v2.0 identified no transmembrane region in either 4CL protein, implying that Sb4CL catalyzes its reactions on substrates in the cytoplasm without membrane transport.
After multiple alignment with ClustalX, two phylogenetic trees of 4CLs from seven plants were constructed in MEGA 4.1 with the ME and NJ methods. The two trees gave very similar results (Fig. 3): the two Sb4CL sequences were each other's closest relatives, with a bootstrap value of nearly 100.
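Distance-based tree building and bootstrap support both start from the aligned sequences. The Python sketch below illustrates two ingredients under stated assumptions: an uncorrected p-distance with pairwise deletion of gapped sites (MEGA offers this among other distance models), and the generation of one bootstrap pseudo-replicate by resampling alignment columns with replacement. Building a tree for each replicate and counting how often a clade recurs is left to MEGA or another phylogenetics program; the function names are hypothetical.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One pseudo-replicate: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are used for every sequence, so sites stay aligned.
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}
```

Repeating bootstrap_alignment 1000 times, as in the text, and rebuilding the tree each time yields the per-node support percentages that MEGA reports.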
CDD recognized an Acs domain in each Sb4CL protein, suggesting that Sb4CL belongs to the 4CL family. In addition, COILS predicted a distinct coiled-coil structure between residues 368 and 382 of the Sb4CL proteins; this region lies within the Acs domain, suggesting that important functional sites may be located there.
Furthermore, the amino acid sequences of the Sb4CL family and 4CLs from four other plant species were aligned in Vector NTI Suite 8 (Fig. 4). Six highly conserved regions were found, in order from the N-terminus to the C-terminus: I SSGTTGLPKGV, II QGYGMTE, III GEICIRG, IV GWLHTGD, V VDRLKELIK, VI PKSPSGKILR.
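Locating the six boxes in a new 4CL sequence can be sketched as an exact string search. The Python example below is a simplification, not the ScanProsite or Vector NTI method: it reports 1-based start positions of exact matches only, so a box that differs by even one residue in another species would be missed, whereas PROSITE patterns allow alternative residues at each position. The function name locate_motifs is hypothetical.

```python
import re

# Conserved boxes exactly as listed in the text.
MOTIFS = {
    "I": "SSGTTGLPKGV",
    "II": "QGYGMTE",
    "III": "GEICIRG",
    "IV": "GWLHTGD",
    "V": "VDRLKELIK",
    "VI": "PKSPSGKILR",
}


def locate_motifs(protein: str, motifs: dict[str, str] = MOTIFS) -> dict[str, list[int]]:
    """1-based start positions of every exact occurrence of each motif."""
    protein = protein.upper()
    hits = {}
    for name, motif in motifs.items():
        # Zero-width lookahead so that overlapping occurrences are also found.
        pattern = re.compile(f"(?={re.escape(motif)})")
        hits[name] = [m.start() + 1 for m in pattern.finditer(protein)]
    return hits
```

In a full-length 4CL sequence that contains all six boxes, the reported positions should increase from box I to box VI, matching the N- to C-terminal order given above.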
Three-dimensional models of the Sb4CL proteins were then built with SWISS-MODEL using firefly luciferase in complex with bromoform as the template, and displayed with WebLab ViewerLite. Several key functional domains are marked on the 3D structure in Fig. 5.

Discussion
The molecular structure and physicochemical properties of the Sb4CL family were analyzed with several bioinformatic tools. The 5'UTR of the 4cl1 gene contained forty-one bases, whereas that of 4cl2 contained only one, suggesting that expression of Sb4cl2 is unlikely to be regulated through its 5'UTR. Several physicochemical parameters were highly similar between the Sb4CL members, suggesting that the Sb4cl family is a group of genes with strong genetic conservation and related functions.
The abundant coil structures act as links within the polypeptide chains and interrupt ordered secondary structure. The Sb4CL family appears to be associated with the ligation of hydroxycinnamate esters and amides. The Sb4CL proteins were predicted to localize in the cytosol, consistent with Geza Hrazdina's report that flavonoids are synthesized in the cytoplasmic matrix (Hrazdina, 1992).
The 4cl gene has been reported in various plants, and its evolution has long been a focus of research on flavonoid metabolic regulation and genetic engineering (Lei et al., 2011b). It is therefore of interest to examine the evolutionary position of the Sb4cl family in phylogenetic trees (Huang et al., 2008). Within the Scutellaria 4cl gene family, Sb4cl1 was most closely related to Sb4cl2, which is consistent with 4CL being a conserved and committed enzyme of the flavonoid biosynthetic pathway.
An Acs domain was identified in each Sb4CL protein; this domain is responsible for the rate-limiting step in the synthesis of flavonoid precursors, i.e., the formation of CoA esters. Additionally, domain I (i.e., Box I mentioned above) is considered the AMP-binding motif in the 4CL catalytic reaction (Challis et al., 2000), which agrees with the PROSITE prediction that identified domain I as the AMP-binding domain signature. Domain I (SSGTTGLPKGV) has therefore become one of the hallmarks of the adenylate-forming enzyme superfamily (Fulda et al., 1994;Stuible et al., 2000). Meanwhile, domain III (i.e., Box II mentioned above) is absolutely conserved in all 4CL proteins, and its central cysteine residue participates directly in catalysis (Stuible et al., 2000).

Conclusion
Using computational software packages and online servers, bioinformatic analysis can provide useful characterization and prediction of protein structure and function. In the current study, the nucleotide sequences and corresponding amino acid sequences of the 4-coumarate:coenzyme A ligase family from S. baicalensis were aligned, analyzed and modeled with several bioinformatic tools, yielding predictions of their molecular structures and biochemical functions. The results showed almost no difference in molecular structure or physicochemical properties between the two members of the Sb4CL family, consistent with a shared function in flavonoid biosynthesis. This study may provide a theoretical basis for further research on the physicochemical properties of 4CL proteins and the molecular mechanism of flavonoid biosynthesis.

Author's Contributions
Guoming Li: Performed the study and/or contributed to data analysis and interpretation.
Xiaozhong Lan: Performed the study, wrote the manuscript and/or contributed to data analysis and interpretation.
Xiaorong Shui and Shian Huang: Performed the study and/or wrote the manuscript.
Can Chen: Takes full responsibility for the work as a whole, including the study design, access to data and the decision to submit and publish the manuscript.
Wei Lei: Wrote the manuscript and takes full responsibility for the work as a whole, including the study design, access to data and the decision to submit and publish the manuscript.

Ethics
This article is original and contains unpublished material. The corresponding author confirms that all of the other authors have read and approved the manuscript and that no ethical issues are involved.