Target Identification in Ory S1 Pollen Protein Allergen from Oryza sativa in the Course of Construction of Hypoallergenic Vaccines

Problem statement: Recombinant-based approaches mostly focus on genetic modification of allergens to produce molecules with reduced allergenic activity and conserved antigenicity, such as hypoallergens. Recombinant allergens represent promising tools for the diagnosis and therapy of type I allergy. This approach is probably feasible for every allergen with a known amino acid sequence. Approach: The primary aim of this study was to determine the consensus epitope from twenty homologous protein sequences of the Ory S1 allergenic protein from Oryza sativa (indica group) pollen. Molecular modeling calculations were used to investigate the allergenic protein models for the epitope. Results: Oryza sativa (japonica), Phleum pratense, Poa pratensis, Holcus lanatus, Lolium perenne, Triticum aestivum, Dactylis glomerata and Zea mays were found to be the most closely related among all the homologs (alignment score 1145-812) and were investigated further. The major binding pocket had an area of 604.5 Å² and a volume of 970 Å³, and another key binding pocket had an area of 425.6 Å² and a volume of 658.8 Å³. The residues found in the key site included ile2, lys13, cys14, ser15, lys16, pro17, ala25, leu26, ile27, tyr40, his41, phe42, asp43, leu44, ser45, gly46, leu47, ala48, met49, ala50, asp55, leu58, arg59, ala61, gly62, ile63, ile64, asp65, gln67 and phe68, corresponding to the allergen binding site and the IgE-binding epitope referred to in the title. Conclusion: These are the functional sites on the allergenic proteins that can be mutated to develop a hypoallergenic vaccine. These sites can be rationalized on the basis of simple arguments that lead to vaccine development, by predicting the structure of the allergenic epitopes and by comparative analysis.


INTRODUCTION
An allergy is a harmful immune response elicited by an antigen (the allergen) that is not itself intrinsically harmful. Grass pollens are well known among the hazardous bioaerosols that cause respiratory allergy. As important members of the grass family, rice plants contribute a huge pollen load in agricultural fields during flowering.
Oryza sativa is the cultivated rice, used as a staple food by the majority of the world's population. Pollen allergens of Oryza sativa are recognized in the official list of allergens of the International Union of Immunological Societies (IUIS) [1], which includes Ory S1, Ory S7 and Ory S12. The protein Ory S1 has been validated as an allergen on the basis of its recognition by IgE antibodies from allergic individuals [2].
Allergenic site identification means identifying the residues that lie in the binding pocket and in the IgE-binding epitope. Most allergens contain multiple motifs, though not all of them may prove to be good targets. The target motif must be selective in terms of IgE-binding epitopes [3]. Knowledge of the molecular nature of allergen-antibody interactions is important for understanding the mechanism of conventional immunotherapy, as well as for designing alternative immunotherapeutic strategies [4]. The allergy process has been widely studied in the last few decades, improving the understanding of allergic problems that affect varied age groups.
Vaccination is the most effective technique suggested nowadays for allergy prevention. The molecules developed for vaccination against allergy possess significantly reduced allergenicity in terms of IgE binding and therefore will not lead to anaphylactic reactions upon injection. This approach is probably feasible for every allergen with a known amino acid sequence, irrespective of the source (pollen, food, mites) from which it is derived [5]. Also, the products of agricultural biotechnology should be subjected to a careful and complete safety assessment for their allergenicity before commercialization. The identification and validation of protein allergens have become more important as more and more transgenic proteins are introduced into our food chains.
We need to look for the Immunoglobulin Epsilon (IgE) epitopes to confirm the allergic response to any allergenic or transgenic protein. If the bioinformatics methods are standardized and optimized, they may lead to complete exploitation of transgenic food and to the identification of targets for creating hypoallergenic vaccines. Moreover, many attempts to predict the allergenicity of a query protein from its amino acid sequence have been well documented [6].
The present study was intended to obtain homologous sequences for the Ory S1 allergenic protein, to analyze these sequences for allergenic sites and to validate the obtained targets, with the aim of identifying a consensus epitope and distinct sequence features.

MATERIALS AND METHODS
Sequence retrieval: The Ory S1 protein (query protein) sequence from Oryza sativa was retrieved from the NCBI database [7]. The Basic Local Alignment Search Tool (BLAST) was used to retrieve the homologous sequences by querying against the non-redundant database (nr).
Pairwise sequence alignment: A dynamic programming algorithm was used to align each homologous sequence with the query protein sequence to assess their proximity. The LALIGN program, which implements the Huang and Miller algorithm, was utilized for this study [8].
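The dynamic programming idea behind the pairwise step can be illustrated with a minimal sketch. This is not LALIGN itself: it is a plain Smith-Waterman-style local alignment score with a linear gap penalty, and the function name and the scoring values (match, mismatch, gap) are assumptions chosen only for illustration. A real run would use a substitution matrix and affine gap penalties.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b.

    Illustrative scoring only (assumed values), linear gap penalty.
    Only two rows of the dynamic programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Hypothetical short peptides, not sequences from this study
    print(local_alignment_score("KCSKP", "AAKCSKPAA"))
```

Ranking the homologs by such a score against the query gives the kind of ordering used to shortlist the closest sequences.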
Multiple sequence alignment: Nine close homologous sequences were selected for further study based on the pairwise alignment score. ClustalW, a neighbor-joining-based tool, was used to compute the Multiple Sequence Alignment (MSA) [9], and a consensus sequence was prepared based on the MSA.
Domain and motif detection: Domain positions of the sequences were identified using InterPro, an integrated resource for protein families, domains and sites. InterPro combines a number of databases (referred to as member databases) that use diverse methodologies and varying degrees of biological information on well-characterized proteins to derive domain positions in the protein sequence [10]. Multiple EM for Motif Elicitation (MEME) was also used to find common motifs in the MSA [11].
Protein homology modeling: All 20 allergen homolog sequences and the consensus domain sequences were modeled by homology modeling using the same template (1n10A). MODELLER, a homology modeling tool that implements comparative protein structure modeling by satisfaction of spatial restraints, was utilized in the present study to construct the protein models [12,13].
Binding pocket exploration: The CASTp server, which uses weighted Delaunay triangulation and the alpha complex for shape measurement of the domain structure, was used in the present study to locate the binding pockets [14]. This tool analytically measures the area and volume of each pocket and cavity, both in the solvent accessible surface (SA, Richards' surface) and in the molecular surface (MS, Connolly's surface). The obtained pockets were validated based on their functional significance and their contribution to the essentiality and allergenicity of the organisms. Finally, the functional significance of the target and its role in IgE interactions were validated in the light of a literature search.
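How a consensus sequence and per-position residue probabilities can be read off an MSA is sketched below. The function names, the 0.5 threshold and the use of "x" for non-conserved columns are assumptions made for this illustration and are not taken from ClustalW or MEME; the input is assumed to be a list of equal-length aligned strings with "-" as the gap character.

```python
from collections import Counter


def column_probabilities(alignment):
    """Per-column residue frequencies, ignoring gap characters ('-')."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        # An all-gap column yields an empty dictionary
        matrix.append({res: n / total for res, n in counts.items()} if total else {})
    return matrix


def consensus(alignment, threshold=0.5):
    """Most frequent residue per column, or 'x' if none reaches the threshold."""
    out = []
    for probs in column_probabilities(alignment):
        if not probs:
            out.append("-")
            continue
        # Sorting first makes ties resolve alphabetically
        residue, p = max(sorted(probs.items()), key=lambda kv: kv[1])
        out.append(residue if p >= threshold else "x")
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical aligned fragments, not data from this study
    print(consensus(["KCSKP", "KCSRP", "KCTAP"]))
```

Columns where a single residue dominates (for example a fully conserved cysteine) appear as that residue in the consensus, while variable columns are masked.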

RESULTS
A BLAST search was performed to fetch significant homologous sequences of Ory S1, which were shortlisted for comparison (Table 1). Pairwise alignment of the query sequence (Ory S1) with each shortlisted homolog was performed and the results were tabulated (Table 2). The three-dimensional structures of the homologous sequences were modeled and validated for plausibility. The RMSD of each structure with respect to the template (1n10A) was calculated to assess three-dimensional homology (Table 3). Based on the RMSD and the similarity score, the most homologous sequences with well-modeled structures were selected for target identification. The conserved residues in the domains of the structures were identified (Table 4) and the properties of the conserved domains were analyzed (Fig. 1). Based on the position-specific probability matrix (Fig. 1), the probability of each possible amino acid appearing at each position in the conserved domain was determined. From the selected sequences, subsets of highly conserved residues were retrieved and a multiple sequence alignment was performed to obtain the consensus sequences (Fig. 2). Furthermore, binding pocket analysis was carried out on all the structures and the potential pocket was identified based on the CASTp rating (Fig. 3). The residues spanning the potential pocket are tabulated in Table 5.
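The RMSD comparison between a model and its template can be illustrated with a minimal sketch. It assumes the two structures have already been superposed and that the atoms (for example C-alpha atoms) are paired in the same order; the function name and the coordinate format are assumptions for illustration, and a real comparison would first perform an optimal superposition.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists.

    Each list holds (x, y, z) tuples; both structures are assumed to be
    superposed already, so no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the modeled structures
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(rmsd(model, template))
```

A lower value indicates that the model backbone deviates less from the template.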

DISCUSSION
The homology search for Ory S1 fetched 20 homologous sequences ranging from grasses to higher plants (Table 1). The three-dimensional structures of the homologous sequences were modeled and validated by their RMSD against the template (1n10A), which ranged from 0.30 to 1.11, indicating that plausible models were obtained (Table 3). Based on the RMSD (0.30-0.47) and the sequence similarity score (1145-812) (Table 2), the nine most homologous sequences with well-modeled structures were selected for target identification: Oryza sativa (japonica), Phleum pratense, Poa pratensis, Holcus lanatus, Lolium perenne, Triticum aestivum, Dactylis glomerata and Zea mays. The conserved domains of the selected structures showed identity from position 86 to 164 (Table 4) in all these sequences, which confirms their evolutionary relatedness. These active domains were reported to belong to the DPBB_1 domain, a conserved region from rare lipoprotein A (RlpA) that has the Double-Psi Beta-Barrel (DPBB) fold. The position-specific probability matrix (Fig. 1) gave the probability of each possible amino acid appearing at each conserved position. It shows high conservation of cysteine residues, which is consistent with the well-documented role of disulfide bonds formed between cysteines in IgE binding in several other allergens [15,16]. A consensus was derived from the subset (residues 82-154) (Table 4). The consensus domain was modeled by homology with 1n10A as the template. The structure for the domain position showed a total of six binding pockets, of which the first major pocket had an area of 604.5 Å² (Fig. 3) and a volume of 970 Å³, indicative of sufficient volume for antigen-antibody interactions. As this pocket is common to the nine structures, the outcome of this strategy can be used to identify common allergenic epitopes for similar structures in a single pass.
The conserved cysteines (Table 5) may also play a major role in determining allergenicity, which is in line with previously documented studies [15,16]. Hence, the consensus region obtained here could be utilized for effective vaccine design against food allergens.

CONCLUSION
Consensus epitope identification from the available allergenic regions can speed up the response to patients' needs. If bioinformatics approaches are standardized and optimized, they can be used for the rapid identification of potential antigenic regions to develop hypoallergenic vaccines. The present study shows that allergenic epitopes share some common amino acid conservation in the allergenic domain positions 82-154, which are conserved with hydrophobic residues. The amino acids cys, lys, gly, ser and pro were found to be more conserved in the allergenic motif as well as in the binding pocket. The cysteine residue, highly conserved at four positions, may play a crucial role in antigenicity. We present the results of this study to the medical community for vaccine development.