Genetic roots of multiple sclerosis
The genetics underlying who develops multiple sclerosis (MS) have been difficult to work out. Examining more than 47,000 cases and 68,000 controls with multiple genome-wide association studies, the International Multiple Sclerosis Genetics Consortium identified more than 200 risk loci in MS (see the Perspective by Briggs). Focusing on the best candidate genes, including a model of the major histocompatibility complex region, the authors identified statistically independent effects at the genome level. Gene expression studies detected that every major immune cell type is enriched for MS susceptibility genes and that MS risk variants are enriched in brain-resident immune cells, especially microglia. Up to 48% of the genetic contribution of MS can be explained through this analysis.
Structured Abstract
INTRODUCTION
Multiple sclerosis (MS) is an inflammatory and degenerative disease of the central nervous system (CNS) that often presents in young adults. Over the past decade, certain elements of the genetic architecture of susceptibility have gradually emerged, but most of the genetic risk for MS remained unknown.
RATIONALE
Earlier versions of the MS genetic map had highlighted the role of the adaptive arm of the immune system, implicating multiple different T cell subsets. We expanded our knowledge of MS susceptibility by performing a genetic association study in MS that leveraged genotype data from 47,429 MS cases and 68,374 control subjects. We enhanced this analysis with an in-depth and comprehensive evaluation of the functional impact of the susceptibility variants that we uncovered.
RESULTS
We identified 233 statistically independent associations with MS susceptibility that are genome-wide significant. The major histocompatibility complex (MHC) contains 32 of these associations, and one, the first MS locus on a sex chromosome, is found in chromosome X. The remaining 200 associations are found in the autosomal non-MHC genome. Our genome-wide partitioning approach and large-scale replication effort allowed the evaluation of other variants that did not meet our strict threshold of significance, such as 416 variants that had evidence of statistical replication but did not reach the level of genome-wide statistical significance. Many of these loci are likely to be true susceptibility loci. The genome-wide and suggestive effects jointly explain ~48% of the estimated heritability for MS.
Using atlases of gene expression patterns and epigenomic features, we documented that enrichment for MS susceptibility loci was apparent in many different immune cell types and tissues, whereas there was an absence of enrichment in tissue-level brain profiles. We extended the annotation analyses by analyzing new data generated from human induced pluripotent stem cell–derived neurons as well as from purified primary human astrocytes and microglia, observing that enrichment for MS genes is seen in human microglia, the resident immune cells of the brain, but not in astrocytes or neurons. Further, we have characterized the functional consequences of many MS susceptibility variants by identifying those that influence the expression of nearby genes in immune cells or brain. Last, we applied an ensemble of methods to prioritize 551 putative MS susceptibility genes that may be the target of the MS variants that meet a threshold of genome-wide significance. This extensive list of MS susceptibility genes expands our knowledge more than twofold and highlights processes relating to the development, maturation, and terminal differentiation of B, T, natural killer, and myeloid cells that may contribute to the onset of MS. These analyses focus our attention on a number of different cells in which the function of MS variants should be further investigated.
Using reference protein-protein interaction maps, these MS genes can also be assembled into 13 communities of genes encoding proteins that interact with one another; this higher-order architecture begins to assemble groups of susceptibility variants whose functional consequences may converge on certain protein complexes that can be prioritized for further evaluation as targets for MS prevention strategies.
CONCLUSION
We report a detailed genetic and genomic map of MS susceptibility, one that explains almost half of this disease’s heritability. We highlight the importance of several cells of the peripheral and brain resident immune systems—implicating both the adaptive and innate arms—in the translation of MS genetic risk into an auto-immune inflammatory process that targets the CNS and triggers a neurodegenerative cascade. In particular, the myeloid component highlights a possible role for microglia that requires further investigation, and the B cell component connects to the narrative of effective B cell–directed therapies in MS. These insights set the stage for a new generation of functional studies to uncover the sequence of molecular events that lead to disease onset. This perspective on the trajectory of disease onset will lay the foundation for developing primary prevention strategies that mitigate the risk of developing MS.
We list some of the immune cells in which we found an excess of MS susceptibility genes, implicating these cells as contributing to the earliest events that trigger MS. The sample size of our genome-wide association study is listed along with a circos plot illustrating the main results.
Abstract
We analyzed genetic data from 47,429 multiple sclerosis (MS) cases and 68,374 control subjects and established a reference map of the genetic architecture of MS that includes 200 autosomal susceptibility variants outside the major histocompatibility complex (MHC), one chromosome X variant, and 32 variants within the extended MHC. We used an ensemble of methods to prioritize 551 putative susceptibility genes that implicate multiple innate and adaptive pathways distributed across the cellular components of the immune system. Using expression profiles from purified human microglia, we observed enrichment for MS genes in these brain-resident immune cells, suggesting that these may have a role in targeting an autoimmune process to the central nervous system, although MS is most likely initially triggered by perturbation of peripheral immune responses.
Over the past decade, elements of the genetic architecture of multiple sclerosis (MS) susceptibility have gradually emerged from genome-wide and targeted studies (1–6). The role of the adaptive arm of the immune system, particularly its CD4+ T cell component, has become clearer, with multiple different T cell subsets being implicated (4). Although the T cell component plays an important role, functional and epigenomic annotation studies have begun to suggest that other elements of the immune system may be involved as well (7, 8). We assembled available genome-wide MS data to perform a meta-analysis followed by a systematic, comprehensive replication effort in large independent sets of subjects. This effort has yielded a detailed genome-wide genetic map that includes the first successful evaluation of the X chromosome in MS and provides a powerful platform for the creation of a detailed genomic map, outlining the functional consequences of most variants and their assembly into susceptibility networks (fig. S1).
Discovery and replication of genetic associations
We organized available (1, 2, 4, 5) and newly genotyped genome-wide data in 15 data sets, totaling 14,802 subjects with MS and 26,703 controls for our discovery study (tables S1 to S3) (9). After rigorous per-data-set quality control, we imputed all samples using the 1000 Genomes Project European panel, resulting in an average of 7.8 million imputed single-nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) of at least 1% (9). We then performed a meta-analysis, penalized for within-data set residual genomic inflation, across a total of 8,278,136 SNPs that had data in at least two data sets (9). Of these, 26,395 SNPs reached genome-wide significance (P < 5 × 10^-8; fixed-effects inverse-variance meta-analysis), and another 576,204 SNPs had at least nominal evidence of association (5 × 10^-8 < P < 0.05; fixed-effects inverse-variance meta-analysis). In order to identify statistically independent SNPs in the discovery set and to prioritize variants for replication, we applied a genome-partitioning approach (9). Briefly, we first excluded an extended region of ~12 Mb around the major histocompatibility complex (MHC) locus to scrutinize this distinct region separately, and we then applied an iterative method to discover statistically independent SNPs in the rest of the genome using conditional modeling. We partitioned the genome into regions by extracting 1 Mb on either side of the most statistically significant SNP and repeating this procedure until there were no SNPs with P < 0.05 (fixed-effects inverse-variance meta-analysis) left in the genome. Within each region, we applied conditional modeling to identify statistically independent effects (fig. S2). As a result, we identified 1961 non-MHC autosomal regions that included 4842 presumably statistically independent SNPs. We refer to these 4842 prioritized SNPs as “effects,” assuming that these SNPs tag a true causal genetic effect. Of these, 82 effects were genome-wide significant in the discovery analysis, and another 125 had P < 1 × 10^-5 (fixed-effects inverse-variance meta-analysis).
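The partitioning step described above can be illustrated with a minimal Python sketch. This is not the consortium's implementation: the `Snp` record, the `partition_genome` function, and the choice to consider only SNPs with P < 0.05 as region members are assumptions made for illustration, and the conditional modeling applied within each region is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    """Hypothetical record for one SNP from the discovery meta-analysis."""
    rsid: str
    chrom: str
    pos: int    # base-pair position
    p: float    # meta-analysis P value


def partition_genome(snps, flank=1_000_000, p_max=0.05):
    """Greedy partitioning sketch.

    Repeatedly pick the most significant remaining SNP, carve out the
    window of `flank` bp on either side of it, and stop once no SNP
    with P < p_max is left. Returns a list of (lead_snp, members).
    """
    # Assumption: only nominally associated SNPs take part in partitioning.
    remaining = [s for s in snps if s.p < p_max]
    regions = []
    while remaining:
        lead = min(remaining, key=lambda s: s.p)
        lo, hi = lead.pos - flank, lead.pos + flank
        members = [
            s for s in remaining
            if s.chrom == lead.chrom and lo <= s.pos <= hi
        ]
        regions.append((lead, members))
        taken = {s.rsid for s in members}
        remaining = [s for s in remaining if s.rsid not in taken]
    return regions
```

Each returned region carries its lead SNP and the nominally associated SNPs within the window; in the study, conditional modeling would then be run inside each such region to separate statistically independent effects.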
In order to replicate these 4842 effects, we analyzed two large-scale independent sets of data. First, we designed the MS Chip to directly replicate each of the prioritized effects (9) and, after a stringent quality check (table S4) (9), analyzed 20,360 MS subjects and 19,047 controls, which were organized into nine data sets. Second, we incorporated targeted genotyping data generated using the ImmunoChip platform on an additional 12,267 MS subjects and 22,625 control subjects that had not been used in either the discovery or the MS Chip subject sets (table S5) (3). Overall, we jointly analyzed data from 47,429 MS cases and 68,374 control subjects to provide a comprehensive genetic evaluation of MS susceptibility.
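For readers unfamiliar with the fixed-effects inverse-variance meta-analysis cited throughout, the following Python sketch shows how per-data-set log odds ratios might be combined. It is a generic textbook formulation under stated assumptions, not the authors' pipeline: the function name `fixed_effects_meta` is hypothetical, and the penalty for within-data set genomic inflation mentioned in the text is not modeled.

```python
import math


def fixed_effects_meta(betas, ses):
    """Combine per-data-set effects (log odds ratios) and standard errors.

    Each data set is weighted by 1 / SE^2. Returns the pooled effect,
    its standard error, the z score, and a two-sided normal P value.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal length")
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Illustrative inputs only (not values from the study).
pooled_beta, pooled_se, z, p = fixed_effects_meta(
    betas=[math.log(1.10), math.log(1.06)],
    ses=[0.020, 0.015],
)
pooled_or = math.exp(pooled_beta)  # pooled odds ratio
```

Because the weights are inverse variances, larger data sets with smaller standard errors dominate the pooled estimate, which is why adding the replication sets can sharpen an effect that was only suggestive in discovery.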
For 4311 of the 4842 effects (89%) that were prioritized in the discovery analysis, we could identify at least one tagging SNP in the replication data (table S6) (9); 156 regions had at least one genome-wide effect, and overall, 200 prioritized effects reached a level of genome-wide significance (GW) in these 156 regions (Fig. 1). Of these 200 effects, 62 represent secondary, independent, effects that emerged from conditional modeling within a given locus (table S7 and fig. S3) (9). The odds ratios (ORs) of these genome-wide effects ranged from 1.06 to 2.06, and the allele frequencies of the respective risk allele ranged from 2.1 to 98.4% in the European samples of the 1000 Genomes Project reference (mean, 51.3%; standard deviation, 24.5%) (table S8 and fig. S4). Of these 156 regions, 19.9% (31 out of 156) harbored more than one statistically independent GW effect. One of the most complex regions was the one harboring the EVI5 gene, which has been the subject of several reports with contradictory results (10–13). In this locus, we identified four statistically independent genome-wide effects, three of which were found under the same association peak (Fig. 2A), illustrating how our approach and the large sample size clarify associations described in smaller studies and can facilitate functional follow-up of complex loci.
Fig. 1. The circos plot displays the 4842 prioritized autosomal non-MHC effects and the associations in chromosome X. Joint analysis (discovery and replication) P values are plotted as lines (fixed-effects inverse-variance meta-analysis). The green inner layer displays genome-wide significance (P < 5 × 10^-8), the blue inner layer displays suggestive P values (5 × 10^-8 < P < 1 × 10^-5), and the gray layer displays P values > 1 × 10^-5. The 200 autosomal non-MHC genome-wide effects and the one on chromosome X are listed. Each vertical line in the inner layers represents one effect, and its color displays the replication status (supplementary materials, materials and methods): green (genome-wide), blue (suggestive), and red (nonreplicated). Plotted on the outer surface are 551 prioritized genes. The inner circle space includes protein-protein interactions (PPIs) among genome-wide genes (green) and between genome-wide genes and suggestive genes (blue) that are identified as candidates by using PPI networks (9).
Fig. 2. (A) Regional association plot of the EVI5 locus. Discovery P values (fixed-effects inverse-variance meta-analysis) are displayed. The layer tagged “Step 0” plots the associations of the marginal analysis, with the most statistically significant SNP being rs11809700 (odds ratio for the T allele, OR_T = 1.16; P = 3.51 × 10^-15). The “Step 1” layer plots the associations conditioning on rs11809700; rs12133753 is the most statistically significant SNP (OR_C = 1.14; P = 8.53 × 10^-9). “Step 2” plots the results conditioning on rs11809700 and rs12133753, with rs1415069 displaying the lowest P value (OR_G = 1.10; P = 4.01 × 10^-5). Last, “Step 3” plots the associations conditioning on rs11809700, rs12133753, and rs1415069, identifying rs58394161 as the most statistically significant SNP (OR_C = 1.10; P = 8.63 × 10^-4). All four SNPs reached genome-wide significance in the respective joint (discovery plus replication) analyses (table S7). Each of the four independent lead SNPs is highlighted with a triangle in the respective layer. (B) Regional association plot for the genome-wide chromosome X variant. Joint analysis P values (fixed-effects inverse-variance meta-analysis) are displayed. Linkage disequilibrium, in terms of r^2 based on the 1000 Genomes European panel, is indicated by a combination of color grade and symbol size. All positions are in human genome build 19 (hg19).
We also performed a joint analysis of available data on sex chromosome variants (9) and identified rs2807267 as genome-wide significant [odds ratio (OR) for the T allele (OR_T) = 1.07, P = 6.86 × 10^-9; fixed-effects inverse-variance meta-analysis] (tables S9 and S10). This variant lies within an enhancer peak specific for T cells and is 948 base pairs (bp) downstream of the RNA U6 small nuclear 320 pseudogene (RNU6-320P), a component of the U6 small nuclear ribonucleoprotein (snRNP) that is part of the spliceosome and responsible for the splicing of introns from pre-mRNA (Fig. 2B) (14). The nearest gene is VGLL1 (27,486 bp upstream), which has been proposed to be a co-activator of mammalian transcription factors (15). No variant on the Y chromosome had a P value lower than 0.05 (fixed-effects inverse-variance meta-analysis).
The MHC was the first MS susceptibility locus to be identified, and prior studies have found that the MHC harbors multiple independent susceptibility variants, including interactions within the class II human leukocyte antigen (HLA) genes (16, 17). We undertook a detailed modeling of this region to account for its long-range linkage disequilibrium and allelic heterogeneity using SNP data as well as imputed classical alleles and amino acids of the HLA genes in the assembled data. We confirmed prior MHC susceptibility variants (including a nonclassical HLA effect located in the TNFA/LST1 long haplotype) and extended the association map to uncover a total of 32 statistically independent effects at the genome-wide level within the MHC (Fig. 3 and table S11). Multiple HLA and nearby non-HLA genes have several independent effects that can now be identified because of our large sample; for example, the HLA-DRB1 locus has six statistically independent effects. Another finding involves HLA-B, which also appears to harbor six independent effects on MS susceptibility. The role of the nonclassical HLA and non-HLA genome in the MHC is also highlighted. Nine of the identified variants, nearly one-third, lie within either intergenic regions or in a long-range haplotype that contains several nonclassical HLA and other non-HLA genes (17).