In Silico Characterization and Comparative Analysis of Allergenicity of Allergic Proteins from Different Food Sources

Corresponding Author: Sourav Chakraborty, Department of Genetic Engineering and Biotechnology, Shahjalal University of Science and Technology, Sylhet-3114, Bangladesh. Email: souravgeb91@gmail.com

Abstract: Allergy is a steadily increasing health problem for all age groups. In general, it is recommended that 10-35% of daily calories come from protein. A complete protein source, also called a high-quality protein, is one that provides all of the essential amino acids. Although animal-based foods, for example meat, poultry, fish, milk and eggs, are considered complete protein sources, they are also common causes of allergenicity. Among meat, prawn and egg sources, Bos taurus, Penaeus monodon and Gallus gallus, respectively, are found to be most responsible for triggering allergenicity. The current study identifies the best alternative sources for meat, prawn and egg through in silico characterization and comparative analysis of allergic proteins (Myoglobin, Ovomucoid, Lysozyme, Ovalbumin, Ovotransferrin, Tropomyosin) with other common sources of meat (Capra hircus, Ovis aries, Gallus gallus, Sus scrofa), prawn (Fenneropenaeus merguiensis, Macrobrachium rosenbergii, Metapenaeus ensis, Pandalus borealis) and egg (Anas platyrhynchos). Analyzing the results, we found that Gallus gallus, Macrobrachium rosenbergii and Anas platyrhynchos would be safe sources of meat, prawn and egg, respectively.


Introduction
The term food allergy is used to describe an adverse immune response to foods (Johansson et al., 2004). Allergy is a steadily increasing health problem for all age groups in the United States. Food allergies, mostly against milk, eggs, peanuts, soy, or wheat, affect up to 8% of infants and young children (Sampson, 1999a; 2005). A 2008 Centers for Disease Control and Prevention report indicated an 18% increase in childhood food allergy from 1997 to 2007, with an estimated 3.9% of children currently affected (Branum and Lukacs, 2008). One hypothesis is that this late onset may be the result of individuals being sensitized by long-term exposure to environmental factors that contain proteins similar to those in the known triggers of allergenic response (Sampson, 1999b; Vanek-Krebitz et al., 1995; Scheurer et al., 1999; Rabjohn et al., 1999). Recent studies have identified common molecular features of proteins from different sources, which could account for clinically important cross-reactivity (Breiteneder and Ebner, 2000; Jenkins et al., 2005) and sensitivity (Ferreira et al., 2004; Mari, 2001). Some common animal proteins from meat, egg, shrimp and cow's milk (Das et al., 2005) have been identified and characterized as major allergens. Allergy caused by the myoglobin protein of mammalian meats may be based on subtle changes of amino acids. Another cause may be the heat-resistant nature of the 17 kDa protein, which could be implicated in patients who do not tolerate well-cooked meat (Fuentes et al., 2004). The only major allergen (Pen a 1) identified in shrimp is the muscle protein tropomyosin (Daul et al., 1994). Surprisingly, there is no report of M. rosenbergii allergy in the medical literature. In the anaphylaxis study at the Siriraj hospital, Thailand, there were subpopulations of shrimp-allergic patients who developed anaphylaxis to freshwater shrimp but could tolerate seawater shrimp, or vice versa (Jirapongsananuruk et al., 2007).
A specific role of egg ovalbumin has been found in patients allergic to cow's milk; casein, along with two other milk proteins, immunoreacted with IgE antibody (Szabo and Eigenmann, 2000). Although infants can, in theory, be allergic to any food, one of the major food allergies is to hen's egg (Du Toit et al., 2009). Most people who are allergic to hen's eggs have antibodies which react to one of four proteins in the egg white: ovomucoid, ovalbumin, ovotransferrin and lysozyme (Platts-Mills and Ring, 2005). Fish allergy is one of the most common food allergies mediated by IgE antibody. Consumption of fish products can lead to symptoms such as skin rash, dermatitis, urticaria, angioedema, gastrointestinal problems, diarrhoea, respiratory distress and even fatal systemic anaphylactic reactions (Pascual et al., 1992; O'Neil et al., 1993). Allergenic proteins have been isolated from primary food sources such as egg (Mine and Rupa, 2004), including ovomucoid (Mizumachi and Kurisaki, 2003; Mine and Zhang, 2002; Mine et al., 2003) and lysozyme (Fremont et al., 1997), and from shrimp and related species (tropomyosins; Reese et al., 2002; Samson et al., 2004). The present study is an attempt to analyze the compositional and structural similarity among allergic proteins present in different foods, to compare allergenicity among these proteins and to identify common motifs. The newly discovered sequence motifs, along with the analyses of the structures of these allergens, will help not only in understanding the structure-function relationship of these allergens but also in identifying new allergic proteins.

Materials and Methods
The amino acid sequences of 18 different allergic proteins were retrieved from the NCBI (http://www.ncbi.nlm.nih.gov/genomes/FLU/). The ProtParam tool at ExPASy (http://www.expasy.org/tools/) was used to analyze the physicochemical properties of the proteins, i.e., amino acid composition, in all the species under consideration. The SOPMA tool at the ExPASy server was used for comparative secondary structure analysis. The protein sequences of the allergens were aligned using the ClustalW2 program (http://www.ebi.ac.uk/tools/clustalw2). Phylogenetic trees were constructed by the neighbor-joining method using Molecular Evolutionary Genetics Analysis (MEGA) software, version 5.2. The MEME software (http://meme.nbcr.net/meme/) was used to identify motifs in the different protein sequences.
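The amino acid composition step can be illustrated with a minimal sketch. It is not the ProtParam implementation; the function name and the toy peptide are hypothetical, and only the 20 standard amino acids are counted.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the percentage of each standard amino acid in a sequence.

    Characters outside the 20 standard one-letter codes are ignored
    (an assumption of this sketch).
    """
    residues = [aa for aa in sequence.upper() if aa in STANDARD_AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AMINO_ACIDS}


# Hypothetical toy peptide, not one of the 18 study sequences.
toy_peptide = "MGLSDGEWQL"
composition = amino_acid_composition(toy_peptide)
print(round(composition["G"], 2), round(composition["L"], 2))
```

Applied to each retrieved sequence, such a table of percentages gives the kind of per-species composition that the study compares.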

Results
The 18 protein sequences of allergens were retrieved from NCBI. The accession numbers of the retrieved sequences, along with species names, are listed in Table 1. Multiple sequence alignments produced with ClustalW2 are shown in three separate figures (Fig. 1 to 3). The phylogenetic tree built with the neighbor-joining method revealed three major clusters of protein sequences (Fig. 4). Biochemical features of the 18 allergic proteins, obtained using ProtParam, are listed in Table 2. Secondary structure analysis was done using SOPMA and is shown in Table 3. MEME analysis identified 3 frequently observed motifs (Table 4).

Discussion
The sequences were characterized by homology search, multiple sequence alignment, biochemical features, phylogenetic tree construction and motif search using various bioinformatics tools. Multiple sequence alignment using ClustalW2 showed that the myoglobin protein of Bos taurus has a similarity score of 97.4 with Capra hircus and Ovis aries. The myoglobin proteins of Gallus gallus and Sus scrofa showed similarity scores of 72.73 and 88.31, respectively, with Bos taurus (Fig. 1). The tropomyosin protein of Penaeus monodon showed a similarity score of 100 with Fenneropenaeus merguiensis, while with Macrobrachium rosenbergii, Metapenaeus ensis and Pandalus borealis the score ranged from 96.83 to 98.24 (Fig. 2). The egg allergic proteins ovomucoid, lysozyme, ovalbumin and ovotransferrin of Anas platyrhynchos and Gallus gallus showed similarity scores of 73.68, 80.27, 76.68 and 79.45, respectively (Fig. 3).
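As an illustration of how a pairwise similarity score of this kind can be derived, the sketch below computes percent identity between two already aligned sequences, counting only columns where neither sequence has a gap. This is one common definition and is assumed here; it is not claimed to reproduce the exact ClustalW2 scoring, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment, not taken from Fig. 1 to 3.
score = percent_identity("MGLSD-GEW", "MGLTDAGEW")
print(score)
```

In the toy alignment, 7 of the 8 gap-free columns match, so the score is 87.5.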
The phylogenetic tree based on protein sequences revealed three major clusters. Cluster 1, a cluster containing 5 sequences under study, includes meat allergens (Fig. 4).
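The clustering criterion behind the neighbor-joining method can be sketched as follows. The example shows only the first pair-selection step (the Q criterion), not the full tree construction performed in MEGA; the function name and the toy distance matrix are hypothetical and are not derived from the study sequences.

```python
from itertools import combinations


def nj_pair_to_join(names, dist):
    """Return the pair of taxa that neighbor joining would merge first.

    Uses Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    and picks the pair with the smallest Q value.
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if best_q is None or q < best_q:
            best_pair, best_q = (names[i], names[j]), q
    return best_pair, best_q


# Hypothetical symmetric distance matrix for five toy taxa.
taxa = ["a", "b", "c", "d", "e"]
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_pair_to_join(taxa, distances)
print(pair, q_value)
```

Note that the pair with the smallest raw distance (d and e) is not the one selected here; the Q criterion also accounts for how far each taxon lies from all the others.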
Biochemical features for this cluster are listed in Table 2. The total number of amino acid residues was 154, with variable molecular weights. The pI values of this cluster ranged from 6.63 to 7.96. Variations among the allergens in this group in terms of other physicochemical parameters, such as positively and negatively charged residues and hydropathicity (GRAVY), are given in Table 2. Aliphatic index analysis reveals uniformity in this group of allergens, within the range of 79.87 to 86.88. The aliphatic index of a protein measures the relative volume occupied by the aliphatic side chains of the amino acids alanine, valine, leucine and isoleucine. Globular proteins with a high aliphatic index have high thermostability, and an increase in aliphatic index increases protein thermostability (Ikai, 1980; Rawlings et al., 2006).
Cluster 2 includes 5 protein sequences and represents shrimp allergen sequences. The total number of amino acid residues was in the range of 274 to 284 and the pI values ranged from 4.66 to 4.73, showing less variation in pI than the cluster 1 sequences. The aliphatic index of this cluster was uniform, in the range of 77.08 to 79.85.
Cluster 3 includes 8 protein sequences and represents egg allergen sequences. The number of amino acid residues in this group ranged from 147 to 705, while the pI values of the majority of sequences were in the range of 4.67 to 6.50, except for lysozyme [Gallus gallus] (9.36). The aliphatic index of this group of sequences lay in the range of 64.0 to 81.70, while ovalbumin (90.18) was the most thermostable allergen among all three clusters.
The instability index is used to estimate the in vivo half-life of a protein (Guruprasad et al., 1990). Proteins reported to have an in vivo half-life of less than 5 h showed an instability index greater than 40, whereas those with a half-life of more than 16 h (Rogers et al., 1986) have an instability index of less than 40. The instability index of the allergic protein sequences under study was found to be less than 40 (Table 2).
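The aliphatic index and GRAVY values discussed above can be reproduced in a few lines. The sketch below assumes the formula of Ikai (1980), AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent, and the Kyte-Doolittle hydropathy scale for GRAVY. The function names and the toy peptide are hypothetical, and the code is an illustration rather than the ProtParam implementation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aliphatic_index(sequence):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def gravy(sequence):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Raises KeyError on non-standard residues (an assumption of this sketch).
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical toy peptide, not one of the 18 study sequences.
toy_peptide = "AVILG"
ai_value = aliphatic_index(toy_peptide)
gravy_value = gravy(toy_peptide)
print(round(ai_value, 2), round(gravy_value, 2))
```

Because valine, isoleucine and leucine carry weights above 1, a sequence rich in these residues scores a higher aliphatic index, which is the basis for the thermostability comparison between clusters.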
Secondary structure analysis demonstrated that the myoglobin protein of beef was most similar to that of goat and sheep meat. In the case of shrimp, alpha helix ranged from 99.27 to 100.00% and random coil from 0.70 to 0.73%, except for CBY17558.1 and ADC55381.4. Extended strand and beta turn were absent in all the sequences of cluster 2. In the egg allergens, alpha helix ranged from 11.70 to 45.85%, extended strand from 9.36 to 19.39%, beta turn from 2.86 to 12.24% and random coil from 29.25 to 76.02% (Table 3). MEME analysis showed that five amino acid residues of Cluster 1 represent motif 1. Cluster 2 is represented by motif 2, a unique motif 50 amino acid residues long. Motif 3 was present in 6 protein sequences representing cluster 3.
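Percentages such as those in Table 3 can be obtained from a per-residue state string. The sketch below assumes a four-state string using the letters h (alpha helix), e (extended strand), t (beta turn) and c (random coil); the function name and the toy state string are hypothetical, and no prediction is performed here.

```python
from collections import Counter

# Assumed four-state coding of a per-residue secondary structure string.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def secondary_structure_percentages(states):
    """Return the percentage of residues assigned to each of the four states."""
    s = states.lower()
    if not s:
        raise ValueError("empty state string")
    counts = Counter(s)
    unknown = set(counts) - set(STATE_NAMES)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    return {name: 100.0 * counts[code] / len(s)
            for code, name in STATE_NAMES.items()}


# Hypothetical toy state string, not a prediction for any study sequence.
percentages = secondary_structure_percentages("hhhhhhcctt")
print(percentages)
```

A sequence that is almost entirely helical, as reported for the tropomyosins, would show a value near 100 for alpha helix and zero for extended strand and beta turn.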

Conclusion
Comparing the variation among biochemical features of myoglobin from different organisms, we find that goat, sheep and pig meat are more similar to beef than chicken meat is in causing allergenicity. The heat stability of an allergic protein is one of the most important reasons for its allergenicity, and it depends on the value of the aliphatic index. The aliphatic index of the myoglobin protein of Gallus gallus is lower than that of the other four organisms. Chicken meat can therefore be a convenient alternative for people allergic to beef. In the case of shrimp, the biochemical features of the tropomyosin of Penaeus monodon show more similarity with Fenneropenaeus merguiensis, Metapenaeus ensis and Pandalus borealis than with Macrobrachium rosenbergii. Moreover, the aliphatic index of the tropomyosin of Macrobrachium rosenbergii is lower than that of the others. For that reason, Macrobrachium rosenbergii can be a suitable substitute for people allergic to Penaeus monodon. The allergic proteins of hen's egg white are more thermostable than those of duck because of their comparatively high aliphatic index. Duck egg can therefore be an appropriate alternative for infants allergic to hen's egg. Besides these findings, phylogenetic clustering and variation among the biochemical features of different allergens might contribute to further classification of highly diverse allergens and to their selection for various applications. Conserved sequences in motifs may be utilized for designing specific degenerate primers for the identification and isolation of types and classes of allergens, as numerous allergens are being isolated to assure food security. Variation in biochemical features may be a key source of information for the screening of novel allergens and for comparison with other classes of allergens. The functional attributes of the conserved motifs found need to be verified experimentally. This in silico analysis might be used in future genetic engineering efforts to assure food security.
for publishing our manuscript without the necessary charges, as we are inhabitants of Bangladesh.

Author's Contributions
Sourav Chakraborty: Manuscript writing and result verification.
Nazmul Hasan: Biochemical features data preparation for 18 proteins using ProtParam software.

Ethics
This article is original and contains unpublished material. The corresponding author, Sourav Chakraborty, confirms that all of the other authors have read and approved the manuscript.