Structure Prediction and Binding Site Analysis of Hepatotoxic Microcystin-LR Degrading MlrC-Like Protein from Burkholderia sp. using Computational Approaches

Corresponding Author: Dhananjaya P. Singh, ICAR-National Bureau of Agriculturally Important Microorganisms, Indian Council of Agricultural Research, Kushmaur, Maunath Bhanjan-275103, UP, India. Email: dpsfarm@rediffmail.com

Abstract: Microcystin-LR (MCYST-LR) is a well-characterized hepatotoxic heptapeptide produced by various species of cyanobacteria, including Microcystis aeruginosa. Burkholderia is a genus of bacteria able to degrade cyanobacterial toxins. This study predicts the structure of the microcystin-degrading MlrC-like protein from Burkholderia sp. strain CCGE1002 and examines its binding interaction with MCYST-LR. A three-dimensional model of the MlrC-like protein was generated using the composite-modeling-based I-TASSER server and was further assessed through different computational approaches. The generated model was found comparable to experimental structures. MCYST-LR was docked to the predicted model to investigate the ligand-protein interaction. The study provides structural insight into the binding mode of the MlrC-like protein of Burkholderia sp. with MCYST-LR and could help further in designing and modeling inhibitors for MCYST-LR.

The synthesis and secretion of microcystins (MCYSTs) into water by different cyanobacterial species is a main cause of toxicity in water bodies, and the problem has gained worldwide attention (Carmichael et al., 1985). Many MCYST variants, including the most potent, MCYST-LR (leucine-arginine), are prominent hepatotoxins, and chronic intake of even small quantities in drinking water leads to tumor growth in the human liver (Carmichael, 1994; Falconer, 1991). Owing to these facts, these toxins are classified as "possibly carcinogenic to humans (group 2B)" by the International Agency for Research on Cancer (Grosse, 2006).
Microcystins have a stable chemical structure in water, so water treatment approaches including filtration, coagulation and flocculation are inefficient at decreasing the concentration of these toxins in freshwater reservoirs (Manage et al., 2009). Thus, the risk associated with their toxicity remains very high. The presence of toxic cyanobacterial blooms in numerous natural freshwater bodies, and the possible use of such water for human consumption, initiated work on the biodegradation of cyanotoxins (Lam et al., 1995). A diversity of microbial species able to degrade microcystin and nodularin, with tentative identification of novel degradation intermediates, has been reported (Manage et al., 2009). It was shown that MCYST can be biodegraded by certain aquatic bacteria, such as strains of the genus Sphingomonas (Saitou et al., 2003; Valeria et al., 2006), owing to the existence of an enzymatic degradation mechanism. Certain other bacteria, such as Paucibacter toxinivorans, Sphingosinicella microcystinivorans in Japan and Burkholderia (Lemes et al., 2008), were also reported as biodegraders of these cyanotoxins. The biodegradation mechanism is reported to involve a gene cluster, mlrA, mlrB, mlrC and mlrD (Bourne et al., 2001; Saitou et al., 2003; Imanishi et al., 2005; Manage et al., 2009), through which particular variants, e.g., [D-Leu1]MC-LR, are biodegraded and/or biotransformed by aquatic microbes (Matthiensen et al., 2000). Burkholderia sp. (strain CCGE1002) is an agriculturally important bacterium that contributes to nitrogen fixation (Ormeño-Orrillo et al., 2012). This bacterium possesses an MCYST-degrading MlrC-like protein.
Several studies based on 3D structure modeling have been conducted, with wide-ranging applications in investigating mechanisms of degradation (Suresh et al., 2008), pollutant interactions (Srivastava et al., 2011; Librando and Pappalardo, 2012) and biotransformation of MCYST-LR (Jones et al., 1994; Lam et al., 1995). However, the three-dimensional structure of the MlrC-like protein and its interaction with MCYST-LR have not been worked out, although they could yield valuable information on the molecular structure and role of the protein and on how efficiently it binds the toxin.
Theoretical approaches have been applied for structure prediction (Suresh et al., 2015). We report the 3D structure of the MlrC-like protein, obtained by composite modeling that combines in silico approaches such as threading and ab initio prediction. The interaction of the MlrC-like protein of Burkholderia sp. with MCYST-LR was predicted through docking, and the possible binding sites and probable interactions with them were identified.

Materials and Methods
Sequence Retrieval and 3D Structure Prediction
The amino acid sequence of the MCYST-degrading MlrC-like protein (GenBank Accession no: ADG18989) of Burkholderia sp. strain CCGE1002 was taken from NCBI (http://www.ncbi.nlm.nih.gov). Primary properties of this protein sequence, i.e., total number of amino acid residues, molecular weight, theoretical pI, amino acid composition, number of residues with negative and positive charges and atomic composition, were computed using Expasy's ProtParam server (Wilkins et al., 1999).
An adequate template was required for comparative modeling of MlrC-like protein. For this purpose, protein sequence of MlrC-like protein was aligned against Protein Data Bank (PDB) (Berman et al., 2000) through BLAST (Altschul et al., 1990). Default parameters were used during alignment (algorithm: Blastp; expected threshold: 10; matrix: BLOSUM62; word size: 3; filter low complexity regions). No single template was identified to provide full length query coverage.
For 3D structure prediction of the full-length protein, the I-TASSER server (Zhang, 2008) was used. The template modeling score, i.e., TM-score, was calculated with the following formula to assess the topological similarity between the target and template protein structures (Zhang, 2008):

$$\text{TM-score} = \max\left[\frac{1}{L}\sum_{i=1}^{L_{\text{ali}}}\frac{1}{1+\left(d_i/d_0\right)^2}\right]$$

Here, L represents the length of the protein under investigation; L_ali is the number of aligned residue pairs; d_i is the distance between the ith pair of residues in the two structures; and d_0 is a scale used to normalize the TM-score so that the average TM-score for random protein pairs does not depend on protein size. The maximum is taken over all superpositions of the two structures.
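As an illustration of how the TM-score behaves, the following minimal Python sketch evaluates the sum 1/(1 + (d_i/d_0)^2) over aligned residue pairs, divided by the target length L, for one fixed superposition. It assumes the standard length-dependent scale d_0 = 1.24 (L - 15)^(1/3) - 1.8 from Zhang and Skolnick (2004). The function names, the 0.5 Å floor on d_0 and the rejection of very short chains are choices made here for illustration; the search over superpositions that the full TM-score requires is not implemented.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) used to normalize the TM-score."""
    if target_length <= 15:
        # Assumption for this sketch: very short chains are simply rejected.
        raise ValueError("target_length must exceed 15 residues")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Assumption: floor d0 at 0.5 Å so short chains do not give a tiny or negative scale.
    return max(d0, 0.5)


def tm_score_fixed(distances, target_length):
    """TM-score for ONE given superposition.

    distances     -- C-alpha distances (Å) of the aligned residue pairs
    target_length -- number of residues in the target protein (L)
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical example: a 511-residue target with every pair 1 Å apart.
    L = 511
    print(round(tm_score_fixed([1.0] * L, L), 3))
```

A perfect superposition (all distances zero over the full length) gives a score of 1, and each pair separated by exactly d_0 contributes one half of its maximum weight.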

Model Validation
Estimation of model quality is a crucial step in computational protein modeling, as the accuracy of a model determines its suitability for diverse biological and biochemical experimental purposes. A confidence score was calculated for the structure predicted by the I-TASSER server for quality estimation (Zhang, 2008). The confidence score is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly refinement simulations of I-TASSER (Zhang, 2008). In addition, the modeled structure was subjected to energy minimization, followed by a complete assessment of model quality with the Qualitative Model Energy Analysis (QMEAN) server (Benkert et al., 2009). The quality of the model was checked with Errat (Bowie et al., 1991), VERIFY3D (Eisenberg et al., 1997) and a Ramachandran plot constructed using the RAMPAGE server (Lovell et al., 2002).

Docking Studies
Docking studies were carried out with AutoDock4.2 and AutoDockTools (ADT) to explore the residues of the MlrC-like protein involved in interaction with the MCYST-LR toxin. As the binding site of MCYST-LR was not known, a blind docking approach was used so that the entire surface of the protein could be analyzed. For this purpose, a very large grid map (X = 126, Y = 126, Z = 126) was constructed with the maximum number of points in every dimension (Hetényi and van der Spoel, 2002). The macromolecule and ligand (MCYST-LR) structures were prepared for docking by adding polar hydrogen atoms, merging non-polar hydrogen atoms and defining the rotatable bonds. For the ligand MCYST-LR, Gasteiger charges were assigned and non-polar hydrogen atoms were merged. Finally, Kollman united-atom charges were assigned and the Lamarckian genetic algorithm (GA-LS) was selected to identify the best conformers.
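For blind docking, it is worth checking that the grid box actually encloses the whole protein. The sketch below computes the physical edge lengths of an AutoDock grid from the number of points and the grid spacing, and tests whether a set of atom coordinates fits inside the box. The spacing of 0.375 Å is AutoDock's default and is an assumption here, since the spacing used in this study is not stated; the helper names and the toy coordinates are hypothetical.

```python
def box_edges(npts, spacing):
    """Edge lengths (Å) of a grid with npts intervals per axis.

    AutoDock places npts + 1 grid points along each axis, so the edge
    length is npts * spacing.
    """
    return tuple(n * spacing for n in npts)


def centroid(coords):
    """Geometric center of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))


def box_covers(coords, center, npts, spacing):
    """True if every coordinate lies inside the box centered at `center`."""
    half = [edge / 2.0 for edge in box_edges(npts, spacing)]
    return all(
        abs(p[k] - center[k]) <= half[k] for p in coords for k in range(3)
    )


if __name__ == "__main__":
    npts = (126, 126, 126)   # grid dimensions reported in the text
    spacing = 0.375          # assumption: AutoDock default, not given in the text
    atoms = [(0.0, 0.0, 0.0), (20.0, 10.0, -5.0), (-15.0, 8.0, 12.0)]  # toy data
    print(box_edges(npts, spacing))
    print(box_covers(atoms, centroid(atoms), npts, spacing))
```

If the check fails for the real protein, a larger spacing widens the box at the cost of a coarser energy map.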

Physicochemical Properties
The physicochemical properties of the target protein, i.e., the MlrC-like protein of Burkholderia sp. strain CCGE1002, as computed using Expasy's ProtParam server (Wilkins et al., 1999), were a molecular weight of 54390.1 Da, 511 amino acids and a theoretical pI of 5.86 (Table 1).

Structure Prediction and Validation
A BlastP search against the PDB revealed no template with >50% sequence identity. As modeling confidence deteriorates below 50% sequence identity and is poor below 25% identity (Kopp and Schwede, 2004), we used the I-TASSER server, which implements a composite modeling approach, to determine the structure of the MlrC-like protein of Burkholderia. The target protein structure was modeled using restraints from the PDB template 3iuuA. The whole structure prediction process included template identification, structure reassembly, atomic model construction and selection of the best model (Zhang, 2009). The PDB template 3iuuA is composed of two domains: (i) a domain of unknown function (DUF1485) and (ii) the MlrC C-terminus domain. A domain search revealed the presence of both of these domains in the MlrC-like protein of Burkholderia (Table 2).
The calculation of a structural alignment between two protein structures is a crucial step in protein modeling. Unlike sequence alignment methods, structure alignment approaches focus specifically on maximizing the structural resemblance of the input proteins. The C-alpha root-mean-square deviation (RMSD), calculated by superposing the model of the MlrC-like protein on the crystal structure of the threading template, a putative metallopeptidase from Mesorhizobium (3iuuA), was 0.49 Å (Fig. 1).
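The RMSD reported here is the root of the mean squared distance between equivalent C-alpha atoms once the two structures have been superposed. A minimal sketch of that final step is given below. It assumes the coordinates are already superposed and paired one-to-one (the optimal superposition itself, e.g., by the Kabsch algorithm, is not performed), and the function name and toy coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (Å) between two equally long lists of superposed C-alpha coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Toy data: three C-alpha atoms and a copy displaced by 0.49 Å along x.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    template = [(x + 0.49, y, z) for x, y, z in model]
    print(ca_rmsd(model, template))
```

Because every atom pair is weighted equally, a few poorly aligned loop residues can dominate the RMSD, which is one reason it is usually read alongside the TM-score.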
The PDB structures closest to the modeled protein, along with the TM-scores of their structural alignments, are listed in Table 3. A TM-score >0.5 represents a model with correct topology, while a TM-score <0.17 indicates random similarity. The TM-score of our MlrC-like protein model was 0.93±0.06, which reflects high structural similarity of the target with the templates. TM-score is a sensitive scale for evaluating the quality of protein structure templates (Zhang and Skolnick, 2004), and the MlrC-like protein fits the quality parameters well. I-TASSER provides a C-score, i.e., confidence score, for evaluating the quality of predicted models. The C-score lies in the range of -5 to +2, where a higher value represents a model with higher confidence and vice versa (Sitao et al., 2007), and models with C-score > -1.5 possess the correct fold (Roy et al., 2010). The calculated C-score for the predicted 3D structure of the MlrC-like protein was 1.53, which favored the model.
Distorted geometries in protein models can be repaired by energy minimization, which moves atoms to release internal constraints. Energy minimization was performed using Chimera (Pettersen et al., 2004) with default parameters (steepest descent algorithm, a maximum of 100 minimization steps, step size 0.02 Å and update interval 10). Minimization did not improve model quality: 10 residues moved from the most favored regions into additionally allowed regions, while the residues in the generously allowed and disallowed regions of the Ramachandran plot were unchanged. Thus, the model prior to minimization was used for further analysis. The Ramachandran plot drawn with RAMPAGE (Lovell et al., 2002) validated the model, with 84.7% of residues in the favored region, 9.0% in the allowed region and 6.3% in the outlier region (Fig. 2). ERRAT is an algorithm used for verifying protein structures and assessing model building and refinement. It is based on the statistics of non-bonded interactions among different atom types and supports consistent decisions during refinement. The overall quality factor obtained from ERRAT was 61.034. Results from VERIFY3D revealed that 92.17% of residues had an average 3D-1D score higher than 0.2, which reflects a well-built model in which most residues are compatible with their folded conformation.
Moreover, to assess the quality of the predicted structure comprehensively, we used the QMEAN server (Benkert et al., 2009), which applies a composite scoring function, the QMEAN Z-score (Benkert et al., 2008; 2009), as an estimate of the overall quality of a model by relating it to experimentally solved reference structures of similar size (model size ±10%) in the PDB (Berman et al., 2000). The calculated QMEAN Z-score of our model was -1.06, which represents a reasonable homology model with correct folds.

Docking Studies
The chemical structure of the oligopeptide toxin MCYST-LR (PDB: 1LCM) is shown in Fig. 3. Docking analysis was performed with AutoDock4. In molecular docking, sampling and scoring are equally important components. All potential binding modes of one molecule are sampled relative to the other, and conformational adjustments may be considered during sampling (Huang and Zou, 2014). After scoring of the docking results, MCYST-LR was found to show greater binding affinity for the MlrC C-terminus domain of the MlrC-like protein of Burkholderia.
The complex with the best docking score is shown in Fig. 4, and the energy terms determined by AutoDock4 for the best pose are listed in Table 4. Residues Leu353, His381, Leu382, Ser383, His384, Arg386, Arg388, Asn427, Arg430, Ala431, Ala432, Gly433 and Glu435 of the MlrC-like protein were in close contact with MCYST-LR (within 4.5 Å), of which three residues, His381, Arg430 and Glu435, form the critical hydrogen bonds with the MCYST-LR toxin (Fig. 4 and Table 5).
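A contact list of this kind can be derived from the docked complex by collecting every protein residue that has at least one atom within the cutoff distance of any ligand atom. The following sketch shows that selection with the 4.5 Å cutoff used above. The input layout (tuples of residue name, residue number and coordinates), the function name and the toy coordinates are assumptions made for illustration and do not reflect the tool used in this study.

```python
import math


def residues_within(protein_atoms, ligand_coords, cutoff=4.5):
    """Residues with at least one atom within `cutoff` Å of any ligand atom.

    protein_atoms -- iterable of (res_name, res_num, (x, y, z)), one per atom
    ligand_coords -- list of (x, y, z) ligand atom coordinates
    Returns a sorted list of (res_num, res_name) tuples.
    """
    close = set()
    for res_name, res_num, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            close.add((res_num, res_name))
    return sorted(close)


if __name__ == "__main__":
    # Toy coordinates, not taken from the docked complex.
    protein = [
        ("HIS", 381, (0.0, 0.0, 0.0)),
        ("HIS", 381, (1.0, 0.0, 0.0)),
        ("LEU", 10, (20.0, 0.0, 0.0)),
        ("GLU", 435, (0.0, 4.4, 0.0)),
    ]
    ligand = [(0.0, 0.0, 0.5)]
    for res_num, res_name in residues_within(protein, ligand):
        print(f"{res_name}{res_num}")
```

Hydrogen bonds need stricter criteria than this distance cutoff alone (donor-acceptor distance and angle), so the three hydrogen-bonding residues are a subset that this sketch does not identify.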

Conclusion
Protein structure prediction approaches are not expected to replace experimental structure determination, but they can contribute significantly to closing the gap between the number of sequences and the number of structures available in the respective databases. We predicted the 3D structure of the MCYST-LR-degrading protein of Burkholderia sp. strain CCGE1002. Structure prediction was done by threading-based modeling, since no close homolog was available for the target. The predicted structure was further analyzed by different computational approaches for stability, efficacy, binding site prediction and docking with MCYST-LR. The structure assessment results provided evidence that the predicted model is stable and accurate. The results suggest that the MlrC-like protein of Burkholderia sp. strain CCGE1002 can be used as a potential target for modeling future inhibitors and for the biological degradation and/or biotransformation of MCYST-LR in freshwater bodies.