High-Order Interactions in Rheumatoid Arthritis Detected by Bayesian Method using Genome-Wide Association Studies Data

Problem statement: In order to reveal the missing genetic component of Rheumatoid Arthritis (RA) susceptibility, we carried out a genome-wide h igh-order epistatic interaction study for RA. Approach: A powerful Bayesian strategy was applied to analyz e the data of Genome-Wide Association Studies (GWAS) from the Welcome Trust C ase Control Consortium (WTCCC), where 319 high-order interactions were found across the w ole genome and many of which were validated by the GWAS data from the North American Rheumatoid Ar thritis Consortium (NARAC). Results: This is the first study intensively searching for high-o rder epistatic interactions genome-widely for RA. Conclusion: Our results suggest that high-order interactions m ight explain a big proportion of missing genetic component of RA. In the meanwhile, synapse, calcium ion binding and membrane part likely have interactive associations with RA. This finding mplies that not only autoimmune system but also nervous system can play an important role in RA.


INTRODUCTION
Rheumatoid Arthritis (RA) is the most common chronic inflammatory autoimmune disease that leads to progressive joint destruction with the prevalence up to 1% in adult populations (Gabriel, 2001;Hu et al., 2011).Twin studies estimated that genetic factors contribute to 60% of the susceptibility of RA (MacGregor et al., 2000).Multiple loci associated with RA susceptibility have been identified by genome-wide linkage and association studies (Cornelis et al., 1998;WTCCC, 2007;Plenge et al., 2007;Stahl et al., 2010).However, current findings only account for a small portion of the genetic component to the RA susceptibility (Stahl et al., 2010;Raychaudhuri et al., 2008;Imboden, 2009), while the most recent GWAS findings are ethnicity-specific (Terao et al., 2011;Freudenberg et al., 2011;Julia et al., 2008).In order to find the missing heritability of RA, it is critical to study epistatic interactions in which genetic variations may show weak marginal penetrance, but may interact with each other in complex ways (Manolio et al., 2009;Wu and Zhao, 2009).The complicated interacting structures likely exist in the pathogenesis of RA due to the sophisticated regulatory mechanisms encoded in the human genome.
This study provides the first genome-wide highorder interaction analysis for RA using Bayesian epistasis association mapping (BEAM and BEAM2) methods (Zhang and Liu, 2007;Zhang et al., 2011).Specifically, BEAM uses Markov chain Monte Carlo (MCMC) to 'interrogate' each marker conditional on the current status of other markers iteratively and has been proved to provide a higher statistical power than many commonly used interaction-mapping methods in GWAS (Culverhouse et al., 2004;Cook et al., 2004;Ritchie et al., 2001;Nelson et al., 2001).Furthermore, to capture the block-wise structure of human genome, Am.Med. J. 3 (1): 56-66, 2012Zhang et al. (2011) extended BEAM model to BEAM2, the latter incorporates LD blocks into the original Bayesian partition model.BEAM2 is able to simultaneously infer haplotype-blocks and select SNPs within blocks that are associated with the disease, either individually or through epistatic interactions with other SNPs across the genome.BEAM2 has shown great success in studying type-1 diabetes (Zhang et al., 2011).We applied BEAM and BEAM2 to the RA data from WTCCC (2007) and found fruitful interesting interacting structures for RA, many of which were validated by NARAC data (Plenge et al., 2007).In particular, our results show that not only autoimmune system but also nerve system plays an important role in genetic mechanisms of RA.

Analysis strategy:
Figure 1 shows a flow chart for our analysis strategy.We first applied BEAM2 (Zhang et al., 2011) to analyze the WTCCC RA data on each chromosome.BEAM2 is a sophisticated method that takes care of LD block structure while searching for epistasis.In the second step, we took the advantage of BEAM (Zhang and Liu, 2007) as a more efficient tool to search for genome-wide high-order interactions.In particualr, we pooled all BEAM2-selected candidate SNPs with posterior assocition probabilities > 0.5 from 22 autosomal chromosomes and ran the Bayesian variable partition model to search for high-order interactions across all the 22 chromosomes.Finally, these identified high-order interactions were validated by NARAC data (Plenge et al., 2007).

Data description:
The RA data set from the WTCCC (2007) contains 1999 RA patients, 1504 controls from the 1958 Birth Cohort (58C) and 1500 additional controls from National Blood Service (NBS).We removed all SNPs from the sample if they have genotype scores less than 0.9 in more than 20 individuals within each group of RA, 58C, or NBS.In addition, we removed all non-polymorphic SNPs, SNPs violating Hardy-Weinberg Equilibrium at a Bonferroni adjusted 0.05 level and SNPs with bad clustering quality according to the WTCCC summary report (WTCCC, 2007).After SNP filtration the dataset contains 301,653 high quality SNPs.
From the 90 unique SNPs involved in the 319 interactions identified above, we retrieved 18 SNPs from the RA GWAS data from NARAC (Plenge et al., 2007) that have good genotype data quality: Hardy-Weinberg Equilibrium P-values > 0.001, minor allele frequencies > 0.01 and the SNP-and subject-missing rates < 10%.The final data contain 2,002 subjects (862 cases and 1,140 controls).The missing genotypes were eliminated at each test on site.

Search
for chromosome-wise high-order interactions: Using BEAM2 to analyze each autosomal chromosome individually, we obtained the chromosome-wise posterior probabilities of SNPs associated with RA, which are shown in Fig. 2. It is clear that MHC region on Chromosome 6 strongly associates with RA: there are 72 SNPs located in MHC region with posterior probabilities > 0.5.At the same time, we also identified many associated SNPs outside MHC region: 85 SNPs across autosomal chromosomes except chr16 and chr21.Table 1 shows that our results are highly consistent with the original paper (WTCCC, 2007) on detecting single SNP effects.
Contr The full versions of Table 2 and 3 are given in supplementary files bestEpistasesFromWTCCC.txt and bestEpistasesFromWTCCC-SNPs.txt,respectively.From Table 2 and 3, we can see that most inter-chromosomal high-order interactions are among chr1, chr3, chr6 and chr9.A lot of them are more than 4-way interactions with very high Disease Relative Risk (DRR).Many protective diplotypes with DRR < 0.8 can also be found in Table 2. Diplotype frequencies in two control populations, 58C and NBS, are highly consistent, suggesting that these frequency estimations are stable.
Validation of interactions using NARAC data: Using NARAC data (Plenge et al., 2007) we sought to validate the high-order interactions identified by the WTCCC data.From the 90 unique SNPs involved in the 319 interactions, we retrieved 18 SNPs from NARAC data.For each supplementary table completetableX.txtobtained from the WTCCC data, whenever at least two NARAC SNPs are available for this interaction, we generated a supplementary table NARAC-completetableX.txt to list the diplotype frequencies and the P-values in the similar format.The supplementary file bestEpistasesFromWTCCC-NARAC.txtsummarizes the significant NARACvalidated high-order interactions involving at least three SNPs.These interactions have P-values < 2.94e-5 for a family-wise significance level of 0.05 after the Bonferroni adjustment based on 1701 possible diplotypes of these interactions.Am.Med. J. 3 (1): 56-66, 2012 Fig. 1: The flow chart of our Bayesian analysis strategy At the same time, these interactions also have moderately small P-values < 1e-4 for WTCCC data, as well as stable diplotype frequencies > 0.05 in either cases or controls for both NARAC and WTCCC data.The SNP annotations corresponding to these interactions are given in the supplementary file bestEpistasesFromWTCCC-NARAC-SNP.txt.Table 4 shows the selected high-order NARAC-SNP interactions that have DRR > 2 or are located on more than one chromosome (i.e., inter-chromosomal interactions).Table 5 shows the SNP annotations for selected interactions in Table 4.The validated interchromosomal interactions are on chromosomes 6 and 9 and chromosomes 6 and 3.
Synapse, calcium ion binding and membrane part associated with RA: MHC region on chromosome 6 is well-known for RA (Kozyryev and Zhang, 2012;Zhang et al., 2012).But very few other loci were detected without accounting for epistasis interaction.In our study, 31 associated SNPs were detected in 24 genes with posterior probabilities of association larger than 0.5 across 12 chromosomes (excluding chromosome 6) in WTCCC data.In order to test overrepresentation of biological pathways in these RA-associated genes, we use GOstat (Beissbarth and Speed, 2004) (http://gostat.wehi.edu.au/) to search enriched GO terms in these 24 RA-associated genes.Table 6 shows the top three GO terms: Synapse, calcium ion binding and membrane part (with Benjamini-corrected false discovery rates 0.0115, 0.0705 and 0.0705 respectively), the correspondingly associated genes and their chromosomes.

Am. Med. J. 3 (1): 56-66, 2012
Fig. 2: Chromosome-wise posterior probabilities of SNPs associated with RA on each chromosome.Dot indicates the marginal posterior probability of association per SNP, circle indicates the total posterior probability of association per SNP (i.e., the marginal plus the joint association probabilities).We connected the dot and circle for each SNP for better illustration.X-axis indicates the chromosomal position (Mb), y-axis shows the posterior probability

DISCUSSION
Rheumatoid arthritis is an inflammatory disease, primarily of the joints, with autoimmune features and a complex genetic component.To our limited knowledge this is the first time that synapse, calcium ion binding and membrane part are reported to be interactively associated with RA in whole-genome association studies.Rheumatoid arthritis is characterized by a chronic inflammation of the synovial joints (Feldmann et al., 1996) and factors like Fibroblast-Like Synoviocyte (FLS) and T cells are actively involved in joint deconstruction and synapse is the contact point of these factors (Tran et al., 2007).Also it is well-known that calcium ions and membrane are important parts in synaptic neurotransmission and peripheral and central nervous system is very important for joint protection (O'Connor and Vilensky, 2003).It is already known that neurogenic factors play very important roles in the etiopathogenesis of osteoarthritis (O'Connor and Vilensky, 2003).Our results suggest that synaptic neurotransmission could be as important for RA.

CONCLUSION
Our results suggest that high-order interactions might explain a big proportion of missing genetic component of RA.In the meanwhile, synapse, calcium Am.Med. J. 3 (1): 56-66, 2012

Table 2 :
Significant interactions obtained from the WTCCC data.The columns are: Inter: interaction index; Diplo.: diplotype, where number 0, 1 or 2 represents the copy number of Allele 2 (in Table3) at the corresponding SNP; Contr.1:diplotype frequency in 58C controls; Contr.2:diplotype frequency in NBS controls.Case: diplotype frequency in cases; P-value.12:Fisher's exact test P-values between two control groups; P-value: Fisher's exact test P-values between pooled controls and cases; DRR: disease relative risk

Table 6 :
Best GO terms for associated RA genes.