MOLECULAR CHARACTERIZATION ON THE BASIS OF HA, NA AND M GENE REVEALED CHANGES IN CRITICAL AMINO ACID POSITIONS OF INFLUENZA A (H3N2) VIRUS CIRCULATING IN INDIA DURING 2011-2013

Influenza A (H1N1) virus is responsible for acute respiratory infection in humans, which occasionally causes epidemics and rarely a pandemic. Full-length DNA sequencing of the HA, NA and M genes of H3N2 virus from 2011-2013 was performed to determine the circulating strains in India and to monitor the status of various mutations associated with drug resistance and virulence. Genetic analysis of the HA gene of study samples in comparison to the reference strain showed maximum amino acid changes in epitopes of high neutralization efficiency (antigenic sites A, B and D), along with gain of a glycosylation site at position 45 (except in a single sample) and loss at position 126 (7 samples). The NA gene had four amino acid changes in the antibody binding region (S367N, K369T, S332F and S334I), along with gain and loss of glycosylation sites at positions 367 and 402, respectively, whereas no change was observed in its active site. Comparison of deduced amino acid sequences of study samples with Indian strains from earlier years showed six amino acid changes in three antigenic sites on the HA protein and three amino acid changes in the NA protein epitope sequence. All these changes indicate that the H3N2 virus is undergoing antigenic drift, which may result in the emergence of new antigenic variants. Our study provides, for the first time, a full-length matrix gene sequence of influenza A (H3N2) virus from India.


INTRODUCTION
The influenza virus is responsible for human respiratory infections and is a source of seasonal epidemics and occasional pandemics. There are three types of seasonal influenza: A, B and C. Influenza A viruses are further classified into subtypes based on different combinations of two surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA); to date, 18 HA and 11 NA subtypes have been identified (Tong et al., 2013). Among the many subtypes of influenza A viruses, H1N1 and H3N2 are currently circulating among humans.
Influenza epidemics occur yearly, resulting in hospitalizations and deaths, primarily among high-risk groups. Worldwide, annual epidemics result in about three to five million cases of severe illness and about 250,000 to 500,000 deaths (WHO, 2009). The H3N2 pandemic of 1968 caused approximately one million deaths worldwide (Kilbourne, 2006).
Influenza viruses belong to the family Orthomyxoviridae, and their segmented genome consists of 8 negative-sense single-stranded RNAs encoding 12 proteins. Influenza virus evolves and evades immune recognition either by point mutations in the HA and NA proteins (antigenic drift), resulting in epidemics, or by reassortment of gene segments from different viruses co-infecting the same cell (antigenic shift), which generates new strains that may lead to a pandemic like the 1918 Spanish flu (Knipe and Howley, 2007).
Science Publications AJID
The HA protein is a homotrimer, each monomer of which consists of HA1 and HA2 subunits; it is the major surface glycoprotein and antigenic determinant. HA1 forms the globular head domain, which contains the receptor binding site for the sialic acid receptor present on host cells, along with 5 antigenic sites (A-E) (Bush et al., 1999). HA1 mutates more frequently than HA2 and plays a major role in natural selection and antibody escape (Sun et al., 2013).
NA is encoded by segment 6. It is a tetramer of identical subunits (monomers), each containing an active site that is highly conserved across all influenza A and B viruses. NA catalyzes removal of the Neu5Ac receptor, facilitating release of viral particles and spread of infection. The use of two NA inhibitors, oseltamivir and zanamivir, in the current clinical treatment of influenza A virus infection targets this "receptor-destroying" activity on the cell surface: oseltamivir carboxylate (Tamiflu) and zanamivir (Relenza) are sialic acid analogs that interfere with the sialidase activity of NA (Colman, 1992).

MATERIALS AND METHODS
Outbreak samples (nasal and throat swabs in viral transport media) from patients with symptoms of fever, cough, sore throat, nasal catarrh or shortness of breath were collected and referred by clinicians from different regions of the country, and were received at the National Centre for Disease Control (NCDC), New Delhi, India. RNA was extracted from 140 µL of sample using the QIAamp Viral RNA Mini Kit (Qiagen, Germany) according to the manufacturer's protocol. Diagnosis of seasonal influenza virus infection was carried out as per WHO guidelines by real-time RT-PCR on a 7500 Real-Time PCR System (Applied Biosystems, USA) targeting the influenza gene, followed by typing and subtyping using HA and NA gene-specific primers (WHO, 2011).
Full-length sequencing of the HA, NA and M genes was carried out using M13-tailed sequencing primers and the process described earlier (Ghedin et al., 2005; Hoffmann et al., 2001) on a GeneAmp PCR System 9700 (Applied Biosystems, USA). PCR products were purified using the QIAquick PCR purification kit (Qiagen, Germany). Purified PCR products were sequenced using the Big Dye Terminator cycle sequencing ready reaction kit v.3.1 on a 7900 thermal cycler, and the sequencing PCR products were then purified using Centri-Sep™ Spin Columns (Princeton Separations, USA). Purified products were lyophilized and reconstituted in HiDi formamide before being placed onto an ABI 3130xl genetic analyzer for automatic sequencing (Applied Biosystems, USA). Raw sequences obtained were resolved and assembled using SeqScape software v2.6 (Applied Biosystems, USA). Nucleotide and protein sequence BLAST searches were performed against the GenBank database using the Basic Local Alignment Search Tool (BLAST) server of the National Center for Biotechnology Information (NCBI, National Institutes of Health, Bethesda, MD) (Altschul et al., 1997). Sequences for phylogenetic analysis were retrieved from the Influenza resource database (www.fludb.com website) and the Influenza virus resource database at NCBI. Multiple sequence alignment was performed at FluDB using MUSCLE (Bao et al., 2008; Squires et al., 2012), and phylogenetic analysis was performed in MEGA v6.0 using the maximum likelihood method with 500 bootstrap replicates (Tamura et al., 2013).
Glycosylation sites are defined by the motif N-X-T/S, where X is any amino acid except proline. NetNGlyc (http://www.cbs.dtu.dk/services/NetNGlyc/) was used to identify potential glycosylation sites in the HA and NA proteins of influenza A (H3N2) virus (Gupta et al., 2004).
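The motif definition above can be illustrated with a minimal Python sketch. It only scans for the N-X-T/S pattern with a regular expression and does not reproduce the NetNGlyc prediction used in this study; the function name `find_sequons` and the toy peptide are hypothetical and are not taken from the study data.

```python
import re

# N-X-S/T motif: Asn, then any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping motifs (e.g. "NNTS") all be found.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(protein):
    """Return the 1-based positions of the Asn in every N-X-S/T motif."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]

# Hypothetical toy peptide, not a real HA or NA sequence.
toy_peptide = "MKNGTANPSQNNTSA"
print(find_sequons(toy_peptide))  # [3, 11, 12]
```

Position 7 in the toy peptide is not reported because the residue after the asparagine is proline. A motif match marks only a potential site; whether it is actually glycosylated is a separate question.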
Comparison of the HA1 sequences of study samples (2011-2013) with sequences from earlier years from India (shown in supp Table 2) showed 8 amino acid changes (Q33R, S45N, T48I, T128A, N145S, A198S, V223I and N312S) in the HA1 region, of which 6 positions fall in antigenic sites: three (45, 48 and 312) in site C, two (128 and 198) in site B and one (145) in site A.
Mutations usually accumulate on five neutralizing epitope sites on HA1. Approximately 39% (17 of synonymous and 22 of non-synonymous type, shown in Table 1) of mutations fall in one of the five antigenic sites (A-E), the receptor binding site (RBS) or the epitope outside the antigenic sites (64 critical amino acids) of HA. Of the 39 mutations, 25 were observed in all the samples and the remaining 14 were present in most samples. The maximum, ten substitutions, was seen in antigenic site D, followed by site C with 9 changes, site B with 7 changes, site O (verified epitope outside antigenic site) with 6 changes, 5 changes each in site A and the RBS, and no change in site E (shown in Table 2 and Fig. 1).
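As an illustration of how such a comparison can be tabulated, the following Python sketch lists substitutions between two aligned sequences and groups them by antigenic site. It is not the authors' pipeline: the function names are hypothetical, and the site map contains only the six positions named in the paragraph above, so it is an assumption-laden subset rather than a complete HA1 site definition.

```python
# Antigenic-site membership for the six HA1 positions named in the text.
# ASSUMPTION: this is a partial map for illustration, not a full site table.
SITE_BY_POSITION = {45: "C", 48: "C", 312: "C", 128: "B", 198: "B", 145: "A"}

def call_substitutions(reference, sample, offset=1):
    """List changes such as 'S45N' between two aligned, equal-length sequences.

    Columns containing a gap ('-') or unknown residue ('X') are skipped.
    `offset` is the residue number assigned to the first alignment column.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for index, (ref, alt) in enumerate(zip(reference, sample)):
        if ref != alt and "-" not in (ref, alt) and "X" not in (ref, alt):
            changes.append(f"{ref}{index + offset}{alt}")
    return changes

def group_by_site(changes):
    """Group substitution strings by antigenic site; unmapped positions are dropped."""
    grouped = {}
    for change in changes:
        position = int(change[1:-1])
        site = SITE_BY_POSITION.get(position)
        if site is not None:
            grouped.setdefault(site, []).append(change)
    return grouped

# The eight HA1 changes reported in the text.
reported = ["Q33R", "S45N", "T48I", "T128A", "N145S", "A198S", "V223I", "N312S"]
print(group_by_site(reported))
# {'C': ['S45N', 'T48I', 'N312S'], 'B': ['T128A', 'A198S'], 'A': ['N145S']}
```

The grouping reproduces the counts stated in the text: three changes in site C, two in site B and one in site A, with Q33R and V223I falling outside the listed positions.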

Glycosylation Sites
Glycosylation sites are variable in both number and location on HA protein of different strains. Presence or absence of glycosylation site around the antigenic site plays a major role in antigenic variation, by masking or unmasking antigenic sites and changing the accessibility of antibody to antigenic sites. Study samples have variations in the glycosylation pattern of HA protein.
A total of 13 glycosylation sites were identified, at residues 8, 22, 38, 45, 63, 122, 126, 133, 144, 165, 246, 285 and 483. All sites were conserved in the samples except for the loss of glycosylation at position 126 in 7 samples and at position 45 in a single sample. All positions except 483 have been reported in previous studies (Sun et al., 2013). Among these sites, residues 45, 63, 122, 126, 133, 144, 165 and 246 occur on antigenic sites, and changes at these sites may alter the ability of antibodies to bind the antigenic site and may thus contribute to the generation of escape mutants.
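Gain and loss of glycosylation sites in an individual sample can be expressed as set differences against the 13 sites listed above. The sketch below is illustrative only; the function name `glycosylation_changes` and the example sample are hypothetical, with the sample constructed to mimic the loss at position 126 reported for 7 samples.

```python
# The 13 potential N-glycosylation sites reported for HA in the text.
HA_SITES = {8, 22, 38, 45, 63, 122, 126, 133, 144, 165, 246, 285, 483}

def glycosylation_changes(expected, observed):
    """Return (gained, lost) site positions as sorted lists."""
    gained = sorted(observed - expected)
    lost = sorted(expected - observed)
    return gained, lost

# Hypothetical sample lacking the site at position 126.
sample_sites = HA_SITES - {126}
gained, lost = glycosylation_changes(HA_SITES, sample_sites)
print(gained, lost)  # [] [126]
```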

Genetic Analysis of Neuraminidase (NA) Gene
NA exists as a homotetramer, each monomer of which contains an active site that is highly conserved across all influenza A and B viruses (Colman, 1992). Neuraminidase is a surface glycoprotein and performs various functions, such as clearance of 'decoy' receptors within the respiratory mucin, decreased viral superinfection (Huang et al., 2008) and increased virus infectivity (Goto and Kawaoka, 1998). However, mutations D93G, S332F and S334I were found only in samples from 2013. The D93G mutation was the most common, whereas S332F and S334I were seen in only two samples. Four amino acid changes were observed in the antibody binding region of NA (Jing et al., 2012; Gulati et al., 2002; Colman et al., 1983): two, S367N and K369T, were seen in all the samples, while the other two, S332F and S334I, were noticed in only two samples.
In total, eight potential glycosylation sites were identified in the NA protein, at positions 61, 70, 86, 146, 200, 234, 329 and 367. Overall, loss of the glycosylation site at position 402 and gain at position 367 were observed in comparison to the reference strain.
Limited sequences were available from earlier years for the NA gene from India. Comparison of these sequences with study samples showed amino acid changes at 8 positions (L81P, D93G, E221K, V263I, S332F and S334I, together with changes at positions 367 and 369), of which positions 221, 367 and 369 are part of the epitope recognized by antibodies (shown in Table 3).

Genetic Analysis of Matrix (M) Gene
Segment 7 of the influenza virus genome encodes two proteins: the major matrix protein M1, consisting of 252 amino acids, and a second matrix protein, M2, encoded by a separate ORF of segment 7. The M2 protein forms pH-gated proton channels in the viral lipid envelope (Lamb et al., 1994), which are the target of the M2 inhibitors amantadine and rimantadine. Single amino acid substitutions in the M2 protein at positions 26, 27, 30, 31 and 34 have been reported to confer resistance to these drugs (Hay et al., 1986). For sequence analysis, A/New York/392/2004 (NC-007368.1) was used as the reference strain. Amino acid changes seen in the M1 protein were V219I in all samples and S207N in one sample. Amino acid changes seen in the M2 protein were S31N in all samples, L54F in a single sample and G34E in four samples. The serine-to-asparagine (S31N) mutation, a molecular marker reported to be associated with amantadine resistance, was seen in all the samples. Phylogenetic analysis of the M gene by the maximum likelihood method showed maximum homology to the Texas 2012 strain, as shown in Fig. 4. Only five partial (~300 bp) M gene sequences were available from India before this study; comparison with study samples displayed no substitution.
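Screening M2 substitutions against the five positions listed above can be sketched as follows. This is a position-based flag only: the function name `resistance_markers` is hypothetical, and a hit at a listed position does not by itself establish resistance for that particular amino acid change, since only S31N is named in the text as a resistance marker.

```python
import re

# M2 positions at which single substitutions are reported in the text
# to confer resistance to amantadine and rimantadine.
M2_RESISTANCE_POSITIONS = {26, 27, 30, 31, 34}

def resistance_markers(substitutions):
    """Return the substitutions that fall on a listed M2 resistance position."""
    hits = []
    for substitution in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
        if match is None:
            raise ValueError(f"unrecognized substitution: {substitution!r}")
        if int(match.group(2)) in M2_RESISTANCE_POSITIONS:
            hits.append(substitution)
    return hits

# M2 changes reported for the study samples.
print(resistance_markers(["S31N", "L54F", "G34E"]))  # ['S31N', 'G34E']
```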

DISCUSSION
Influenza viruses can efficiently escape from host antibodies by accumulating mutations in their surface glycoproteins HA and NA (antigenic drift) or by introducing new subtypes of these glycoproteins through gene segment reassortment (antigenic shift). Antigenic drift has been observed to occur more frequently in H3N2 viruses than in any other seasonal virus, which is reflected by the number of times the H3N2 vaccine strain has been updated in comparison to H1N1 and influenza B virus (Sun et al., 2013).
We have reported 22 amino acid changes in five antigenic sites, with the maximum changes in site D and no change in site E, along with 5 changes in the RBS. It has been suggested that amino acid changes occurring in epitopes of high neutralization efficiency (i.e., epitopes A, B and D) are associated with antigenic drift, rather than those in epitopes of low neutralization efficiency (i.e., C and E) (Sun et al., 2013; Ndifon et al., 2009). It has also been proposed that a minimum of four substitutions in two or more antigenic sites of the HA protein is required for an epidemically important strain (Huang et al., 2011). The present study observed most substitutions in antigenic sites A, B and D, compared with zero substitutions in site E of the HA protein; this indicates that the virus is undergoing antigenic drift, which is also reflected by the changes in vaccine strain since 2004.
Phylogenetic analysis of study samples on the basis of the HA1 gene region formed a separate cluster for samples from each year. Samples from 2011 fall with the A/Singapore/GP1684/2011 strain. Samples from 2012 form one major cluster with the A/Ontario/006/2013 strain and one minor cluster with A/Texas/50/2012, the current vaccine strain. Samples from 2013, however, form two major clusters: one with the A/Kenya/254/2013 strain and another with the A/Florida/3476/2013 strain (shown in Fig. 2).
The HA gene of study samples had 10 amino acid changes (R33Q, S47P, N128T, R142G, N145S, V186G, P198S, F219S, K278N and E325K) in comparison to the A/Texas/50/2012 vaccine strain (shown in Table 4). All study samples had glycine at position 186, serine at 198 and phenylalanine at 219, similar to strains from 2010 and 2011, while the Texas/50/2012 strain has valine at position 186, proline at 198 and serine at 219. Five of the 10 amino acid changes (R33Q, S47P, N145S, K278N and E325K) were rare, seen in only 1-3 samples, and were also observed in 2011 strains from Singapore and Victoria and 2012 strains from New York and Kenya. The 2013 samples carried the R142G (all samples) and N128T (eight samples) mutations, similar to Ontario and Kenya strains from 2013. Despite all these changes, the A/Texas/50/2012 strain is effective in eliciting an immune response against the majority of the circulating strains (ECDPC, 2013). With mutations emerging in circulating strains, however, the vaccine strain may need to be updated.
In the NA gene, no changes were seen in the active sites reported to be associated with oseltamivir and zanamivir resistance. Gain of a single glycosylation site at position 367 and loss at position 402 were observed. A similar pattern has also been reported from Europe (INIMRL, 2014). The NA gene of study samples had 6 amino acid changes (G93D, H150R, E221K, V263I, S332F and S334I) in comparison to the A/Texas/50/2012 vaccine strain (shown in Table 5 and Fig. 3).
The matrix gene was the first gene to be targeted for diagnosis of any type or subtype of influenza virus infection, and it is also the target of M2 ion channel blockers such as amantadine. No changes were seen in the region targeted by diagnostic primers, but all samples had the S31N mutation reported to be associated with amantadine resistance (Hay et al., 1986). In phylogenetic analysis, samples from 2013 clustered with the A/Texas/50/2012 vaccine strain (shown in Fig. 4).
Sequencing data available from India showed continuous circulation of influenza A (H3N2). Comparison of deduced amino acid sequences of the HA1, NA and M genes between previous years' sequences from India and study samples showed changes in antigenic sites or epitope binding sites. Analysis of the HA1 sequence showed 6 amino acid changes in 3 antigenic sites (A, B and C), while the NA protein showed 3 changes in the epitope binding site.
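The proposed threshold can be written as a simple check. The sketch below is one possible reading of the criterion (at least four substitutions in total, spread over at least two antigenic sites); that interpretation, the function name `meets_drift_criterion` and its parameter names are assumptions made for illustration. The example input is the set of six HA1 antigenic-site changes found in the comparison with earlier Indian sequences.

```python
def meets_drift_criterion(changes_by_site, min_changes=4, min_sites=2):
    """Check the 'four substitutions in two or more antigenic sites' rule.

    ASSUMPTION: the rule is read as a total count across sites, with at
    least `min_sites` different sites carrying one or more substitutions.
    """
    total_changes = sum(len(changes) for changes in changes_by_site.values())
    sites_with_changes = sum(1 for changes in changes_by_site.values() if changes)
    return total_changes >= min_changes and sites_with_changes >= min_sites

# HA1 antigenic-site changes relative to earlier Indian sequences (from the text).
indian_comparison = {
    "A": ["N145S"],
    "B": ["T128A", "A198S"],
    "C": ["S45N", "T48I", "N312S"],
}
print(meets_drift_criterion(indian_comparison))  # True
```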

In the present study, the majority of mutations seen in the mature HA protein have also been reported from European countries. On this basis, all samples studied belong to group 3C. Most 2012 and 2013 samples (with Q33R, N145S and N278K mutations) seem to fall in the 3C.2 sub-group, while 2013 samples (with T128A, resulting in the loss of a potential glycosylation site, and R142G mutations) appear to belong to the 3C.3 sub-group (ECDPC, 2013). Mutations such as A198S, V223I and N312S (A214S, V239I and N328S according to HA0 numbering) in the HA protein and L81P in the NA protein seen in the present study were also observed in a study performed on 2011-12 samples in Southern China, and these mutations were similar to the A/Perth/16/2009 influenza vaccine strain for 2010-2011 (Zhong et al., 2013). Two mutations were seen for the first time, W427R (2 samples from 2013) and G479A (5 samples from 2013), while the R498K mutation (9 samples from 2013 and 1 sample from 2012) was rare in database sequences. These mutations are in the HA2 region, and their possible effect on HA protein function is yet to be determined. The glycosylation pattern in the HA protein of influenza virus is similar to the pattern reported previously. All sites are conserved in the samples except for the loss of glycosylation observed at position 126 in 7 samples and at position 45 in a single sample. All positions except 483 have been reported in previous studies (Sun et al., 2013). The study shows a gain of glycosylation sites in the antigenic region since 2004. This gain of glycosylation sites, along with changes in the antigenic sites, contributes towards the antigenic escape of the virus and the evolution of antigenic variants (new strains).
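The relation between the two numbering schemes quoted above can be made explicit. In the pairs given in the text, the HA0 position is the HA1 position plus 16 (198 to 214, 312 to 328); the sketch below treats that offset as an inference from those pairs, and the function name `shift_numbering` is hypothetical.

```python
import re

# ASSUMPTION: offset inferred from the pairs quoted in the text
# (A198S -> A214S, N312S -> N328S), not taken from an external reference.
HA1_TO_HA0_OFFSET = 16

def shift_numbering(substitution, offset):
    """Shift the position in a substitution label such as 'A198S' by `offset`."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"unrecognized substitution: {substitution!r}")
    ref, position, alt = match.groups()
    return f"{ref}{int(position) + offset}{alt}"

print(shift_numbering("A198S", HA1_TO_HA0_OFFSET))   # A214S
print(shift_numbering("N312S", HA1_TO_HA0_OFFSET))   # N328S
print(shift_numbering("V239I", -HA1_TO_HA0_OFFSET))  # V223I
```

Applying the same offset in reverse to the HA0 label V239I gives HA1 position 223, which agrees with the V223I change listed in the comparison with earlier Indian sequences.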

CONCLUSION
On the basis of the present study, we can conclude that the influenza A (H3N2) strain circulating in India is similar to the strains reported from Europe and other parts of Asia (ECDPC, 2013; WHO, 2013). The findings are also consistent with the change of vaccine strain for 2013 (from A/Perth/16/2009 to A/Victoria/361/2011) and 2014 (from A/Victoria/361/2011 to A/Texas/50/2012). In comparison to strains circulating in Europe, there was a single change in antigenic site A of the HA1 subunit at position 142 (R142G). This mutation was also observed in 2013 sequences from Kenya and Ontario (Canada). This is the first time that a full-length sequence of the M gene of influenza A (H3N2) has been submitted from India. The present study showed that both the HA and NA genes have acquired many non-synonymous mutations in antigenic sites and epitopes, indicating the role of antigenic drift in generating antibody escape mutants and antigenic variants. These changes indicate that the H3N2 virus is undergoing antigenic drift and emphasize the need for continuous monitoring of circulating influenza viruses in preparation for any unforeseen epidemic or pandemic.

ACKNOWLEDGEMENT
Author Sachin Kumar acknowledges financial support of the Council for Scientific and Industrial Research (CSIR), Delhi, India, during the course of study and thanks the staff of Division of Biotechnology and Microbiology for their support.

Database
Nucleotide sequence data reported are available in the GenBank database (NCBI) under accession numbers KF952356 to KF952430.

Conflict of Interest
The authors declare that they have no conflict of interest.