Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility – Science Magazine

Genetic roots of multiple sclerosis

The genetics underlying who develops multiple sclerosis (MS) have been difficult to work out. Examining more than 47,000 cases and 68,000 controls with multiple genome-wide association studies, the International Multiple Sclerosis Genetics Consortium identified more than 200 risk loci in MS (see the Perspective by Briggs). Focusing on the best candidate genes, including a model of the major histocompatibility complex region, the authors identified statistically independent effects at the genome level. Gene expression studies detected that every major immune cell type is enriched for MS susceptibility genes and that MS risk variants are enriched in brain-resident immune cells, especially microglia. Up to 48% of the genetic contribution of MS can be explained through this analysis.

Science, this issue p. eaav7188; see also p. 1383

Structured Abstract

INTRODUCTION

Multiple sclerosis (MS) is an inflammatory and degenerative disease of the central nervous system (CNS) that often presents in young adults. Over the past decade, certain elements of the genetic architecture of susceptibility have gradually emerged, but most of the genetic risk for MS remained unknown.

RATIONALE

Earlier versions of the MS genetic map had highlighted the role of the adaptive arm of the immune system, implicating multiple different T cell subsets. We expanded our knowledge of MS susceptibility by performing a genetic association study in MS that leveraged genotype data from 47,429 MS cases and 68,374 control subjects. We enhanced this analysis with an in-depth and comprehensive evaluation of the functional impact of the susceptibility variants that we uncovered.

RESULTS

We identified 233 statistically independent associations with MS susceptibility that are genome-wide significant. The major histocompatibility complex (MHC) contains 32 of these associations, and one, the first MS locus on a sex chromosome, is found in chromosome X. The remaining 200 associations are found in the autosomal non-MHC genome. Our genome-wide partitioning approach and large-scale replication effort allowed the evaluation of other variants that did not meet our strict threshold of significance, such as 416 variants that had evidence of statistical replication but did not reach the level of genome-wide statistical significance. Many of these loci are likely to be true susceptibility loci. The genome-wide and suggestive effects jointly explain ~48% of the estimated heritability for MS.

Using atlases of gene expression patterns and epigenomic features, we documented that enrichment for MS susceptibility loci was apparent in many different immune cell types and tissues, whereas there was an absence of enrichment in tissue-level brain profiles. We extended the annotation analyses by analyzing new data generated from human induced pluripotent stem cell–derived neurons as well as from purified primary human astrocytes and microglia, observing that enrichment for MS genes is seen in human microglia, the resident immune cells of the brain, but not in astrocytes or neurons. Further, we have characterized the functional consequences of many MS susceptibility variants by identifying those that influence the expression of nearby genes in immune cells or brain. Last, we applied an ensemble of methods to prioritize 551 putative MS susceptibility genes that may be the target of the MS variants that meet a threshold of genome-wide significance. This extensive list of MS susceptibility genes expands our knowledge more than twofold and highlights processes relating to the development, maturation, and terminal differentiation of B, T, natural killer, and myeloid cells that may contribute to the onset of MS. These analyses focus our attention on a number of different cells in which the function of MS variants should be further investigated.

Using reference protein-protein interaction maps, these MS genes can also be assembled into 13 communities of genes encoding proteins that interact with one another; this higher-order architecture begins to assemble groups of susceptibility variants whose functional consequences may converge on certain protein complexes that can be prioritized for further evaluation as targets for MS prevention strategies.

CONCLUSION

We report a detailed genetic and genomic map of MS susceptibility, one that explains almost half of this disease’s heritability. We highlight the importance of several cells of the peripheral and brain-resident immune systems, implicating both the adaptive and innate arms, in the translation of MS genetic risk into an autoimmune inflammatory process that targets the CNS and triggers a neurodegenerative cascade. In particular, the myeloid component highlights a possible role for microglia that requires further investigation, and the B cell component connects to the narrative of effective B cell–directed therapies in MS. These insights set the stage for a new generation of functional studies to uncover the sequence of molecular events that lead to disease onset. This perspective on the trajectory of disease onset will lay the foundation for developing primary prevention strategies that mitigate the risk of developing MS.

The MS genetic map implicates microglia as well as multiple different peripheral immune cell populations in the onset of the disease.

We list some of the immune cells in which we found an excess of MS susceptibility genes, implicating these cells as contributing to the earliest events that trigger MS. The sample size of our genome-wide association study is listed along with a circos plot illustrating the main results.

Abstract

We analyzed genetic data of 47,429 multiple sclerosis (MS) and 68,374 control subjects and established a reference map of the genetic architecture of MS that includes 200 autosomal susceptibility variants outside the major histocompatibility complex (MHC), one chromosome X variant, and 32 variants within the extended MHC. We used an ensemble of methods to prioritize 551 putative susceptibility genes that implicate multiple innate and adaptive pathways distributed across the cellular components of the immune system. Using expression profiles from purified human microglia, we observed enrichment for MS genes in these brain-resident immune cells, suggesting that these may have a role in targeting an autoimmune process to the central nervous system, although MS is most likely initially triggered by perturbation of peripheral immune responses.

Over the past decade, elements of the genetic architecture of multiple sclerosis (MS) susceptibility have gradually emerged from genome-wide and targeted studies (1–6). The role of the adaptive arm of the immune system, particularly its CD4⁺ T cell component, has become clearer, with multiple different T cell subsets being implicated (4). Although the T cell component plays an important role, functional and epigenomic annotation studies have begun to suggest that other elements of the immune system may be involved as well (7, 8). We assembled available genome-wide MS data to perform a meta-analysis followed by a systematic, comprehensive replication effort in large independent sets of subjects. This effort has yielded a detailed genome-wide genetic map that includes the first successful evaluation of the X chromosome in MS and provides a powerful platform for the creation of a detailed genomic map, outlining the functional consequences of most variants and their assembly into susceptibility networks (fig. S1).

Discovery and replication of genetic associations

We organized available (1, 2, 4, 5) and newly genotyped genome-wide data in 15 data sets, totaling 14,802 subjects with MS and 26,703 controls for our discovery study (tables S1 to S3) (9). After rigorous per-data-set quality control, we imputed all samples using the 1000 Genomes Project European panel, resulting in an average of 7.8 million imputed single-nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) of at least 1% (9). We then performed a meta-analysis, penalized for within–data set residual genomic inflation, to a total of 8,278,136 SNPs, with data in at least two data sets (9). Of these, 26,395 SNPs reached genome-wide significance (P < 5 × 10⁻⁸; fixed-effects inverse-variance meta-analysis), and another 576,204 SNPs had at least nominal evidence of association (5 × 10⁻⁸ < P < 0.05; fixed-effects inverse-variance meta-analysis). In order to identify statistically independent SNPs in the discovery set and to prioritize variants for replication, we applied a genome-partitioning approach (9). Briefly, we first excluded an extended region of ~12 Mb around the major histocompatibility complex (MHC) locus to scrutinize this distinct region separately, and we then applied an iterative method to discover statistically independent SNPs in the rest of the genome using conditional modeling. We partitioned the genome into regions by extracting ±1 Mb on either side of the most statistically significant SNP and repeating this procedure until there were no SNPs with P < 0.05 (fixed-effects inverse-variance meta-analysis) left in the genome. Within each region, we applied conditional modeling to identify statistically independent effects (fig. S2). As a result, we identified 1961 non-MHC autosomal regions that included 4842 presumably statistically independent SNPs. We refer to these 4842 prioritized SNPs as “effects,” assuming that these SNPs tag a true causal genetic effect. 
Of these, 82 effects were genome-wide significant in the discovery analysis, and another 125 had P < 1 × 10⁻⁵ (fixed-effects inverse-variance meta-analysis).
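The region-building step of the genome-partitioning approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the consortium's pipeline: the function name `partition_genome`, the `(chromosome, position, P value)` tuple layout, and the toy SNPs are all hypothetical, and the conditional modeling applied within each region is omitted.

```python
from typing import List, Tuple

# Assumed layout for one SNP: (chromosome, position in bp, discovery P value).
Snp = Tuple[str, int, float]


def partition_genome(snps: List[Snp], window_bp: int = 1_000_000,
                     p_max: float = 0.05) -> List[Tuple[Snp, List[Snp]]]:
    """Greedy partitioning: take the most significant remaining SNP, claim
    every remaining SNP within +/- window_bp on the same chromosome, and
    repeat until no remaining SNP has P < p_max."""
    remaining = sorted(snps, key=lambda s: s[2])  # most significant first
    regions = []
    while remaining and remaining[0][2] < p_max:
        lead = remaining[0]
        members = [s for s in remaining
                   if s[0] == lead[0] and abs(s[1] - lead[1]) <= window_bp]
        regions.append((lead, members))
        # Filtering keeps the list sorted by P value.
        remaining = [s for s in remaining if s not in members]
    return regions


# Toy input (hypothetical values, not study data).
toy_snps = [
    ("1", 1_000_000, 1e-10),
    ("1", 1_500_000, 1e-4),
    ("1", 5_000_000, 0.01),
    ("2", 1_200_000, 0.2),
]
regions = partition_genome(toy_snps)
for lead, members in regions:
    print(lead, len(members))
```

In this toy input the two chromosome 1 SNPs that lie within 1 Mb of each other fall into one region, the third forms its own region, and the chromosome 2 SNP is never used as a lead because its P value exceeds 0.05.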

In order to replicate these 4842 effects, we analyzed two large-scale independent sets of data. First, we designed the MS Chip to directly replicate each of the prioritized effects (9) and, after stringent quality check (table S4) (9), analyzed 20,360 MS subjects and 19,047 controls, which were organized into nine data sets. Second, we incorporated targeted genotyping data generated using the ImmunoChip platform on an additional 12,267 MS subjects and 22,625 control subjects that had not been used in either the discovery or the MS Chip subject sets (table S5) (3). Overall, we jointly analyzed data from 47,429 MS cases and 68,374 control subjects to provide a comprehensive genetic evaluation of MS susceptibility.
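The P values reported throughout come from a fixed-effects inverse-variance meta-analysis. A minimal sketch of that standard estimator is given below, assuming each data set contributes a log odds ratio and its standard error; the function name and the toy inputs are hypothetical, and the penalization for residual genomic inflation mentioned above is not modeled.

```python
import math
from typing import List, Tuple


def fixed_effects_meta(
    estimates: List[Tuple[float, float]]
) -> Tuple[float, float, float]:
    """Combine per-data-set (log odds ratio, standard error) pairs.

    Returns (pooled log OR, pooled standard error, two-sided P value).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]  # inverse variances
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return pooled, pooled_se, p_value


# Toy input (hypothetical values, not study data): two equally precise data sets.
toy = [(0.10, 0.05), (0.20, 0.05)]
pooled, pooled_se, p_value = fixed_effects_meta(toy)
print(f"OR = {math.exp(pooled):.3f}, SE = {pooled_se:.4f}, P = {p_value:.2e}")
```

Because each data set is weighted by the inverse of its variance, larger and more precise data sets dominate the pooled estimate; with equal standard errors, as in the toy input, the pooled log odds ratio is simply the mean.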

For 4311 of the 4842 effects (89%) that were prioritized in the discovery analysis, we could identify at least one tagging SNP in the replication data (table S6) (9); 156 regions had at least one genome-wide effect, and overall, 200 prioritized effects reached a level of genome-wide significance (GW) in these 156 regions (Fig. 1). Of these 200 effects, 62 represent secondary, independent, effects that emerged from conditional modeling within a given locus (table S7 and fig. S3) (9). The odds ratios (ORs) of these genome-wide effects ranged from 1.06 to 2.06, and the allele frequencies of the respective risk allele ranged from 2.1 to 98.4% in the European samples of the 1000 Genomes Project reference (mean, 51.3%; standard deviation, 24.5%) (table S8 and fig. S4). Of these 156 regions, 19.9% (31 out of 156) harbored more than one statistically independent GW effect. One of the most complex regions was the one harboring the EVI5 gene, which has been the subject of several reports with contradictory results (10–13). In this locus, we identified four statistically independent genome-wide effects, three of which were found under the same association peak (Fig. 2A), illustrating how our approach and the large sample size clarify associations described in smaller studies and can facilitate functional follow-up of complex loci.

Fig. 1 The genetic map of multiple sclerosis.

The circos plot displays the 4842 prioritized autosomal non-MHC effects and the associations in chromosome X. Joint analysis (discovery and replication) P values are plotted as lines (fixed-effects inverse-variance meta-analysis). The green inner layer displays genome-wide significance (P < 5 × 10⁻⁸), the blue inner layer displays suggestive P values (5 × 10⁻⁸ < P < 1 × 10⁻⁵), and the gray layer displays P values > 1 × 10⁻⁵. Each line in the inner layers represents one effect. The 200 autosomal non-MHC genome-wide effects and the one chromosome X genome-wide effect are listed. The color of each vertical line in the inner layers displays the replication status of that effect (supplementary materials, materials and methods): green (genome-wide), blue (suggestive), and red (nonreplicated). Plotted on the outer surface are 551 prioritized genes. The inner circle space includes PPIs among genome-wide genes (green) and between genome-wide genes and suggestive genes (blue) that are identified as candidates by using PPI networks (9).

Fig. 2 Multiple independent effects in the *EVI5* locus and chromosome X associations.

(A) Regional association plot of the *EVI5* locus. Discovery P values (fixed-effects inverse-variance meta-analysis) are displayed. The layer tagged “Step 0” plots the associations of the marginal analysis, with the most statistically significant SNP being rs11809700 (OR_T = 1.16; P = 3.51 × 10⁻¹⁵). “Step 1” plots the associations conditioning on rs11809700; rs12133753 is the most statistically significant SNP (OR_C = 1.14; P = 8.53 × 10⁻⁹). “Step 2” plots the results conditioning on rs11809700 and rs12133753, with rs1415069 displaying the lowest P value (OR_G = 1.10; P = 4.01 × 10⁻⁵). Last, “Step 3” plots the associations conditioning on rs11809700, rs12133753, and rs1415069, identifying rs58394161 as the most statistically significant SNP (OR_C = 1.10; P = 8.63 × 10⁻⁴). All four SNPs reached genome-wide significance in the respective joint (discovery plus replication) analyses (table S7). Each of the four independent lead SNPs is highlighted by a triangle in the respective layer. (B) Regional association plot for the genome-wide chromosome X variant. Joint analysis P values (fixed-effects inverse-variance meta-analysis) are displayed. Linkage disequilibrium, in terms of r² based on the 1000 Genomes European panel, is indicated by a combination of color grade and symbol size. All positions are in human genome build 19 (hg19).

We also performed a joint analysis of available data on sex chromosome variants (9) and identified rs2807267 as genome-wide significant [odds ratio (OR) for T allele (OR_T) = 1.07, P = 6.86 × 10⁻⁹; fixed-effects inverse-variance meta-analysis] (tables S9 and S10). This variant lies within an enhancer peak specific for T cells and is 948 base pairs (bp) downstream of the RNA U6 small nuclear 320 pseudogene (RNU6-320P), a component of the U6 small nuclear ribonucleoprotein (snRNP) that is part of the spliceosome and responsible for the splicing of introns from pre-mRNA (Fig. 2B) (14). The nearest gene is VGLL1 (27,486 bp upstream), which has been proposed to be a co-activator of mammalian transcription factors (15). No variant in the Y chromosome had a P value lower than 0.05 (fixed-effects inverse-variance meta-analysis).

The MHC was the first MS susceptibility locus to be identified, and prior studies have found that the MHC harbors multiple independent susceptibility variants, including interactions within the class II human leukocyte antigen (HLA) genes (16, 17). We undertook a detailed modeling of this region to account for its long-range linkage disequilibrium and allelic heterogeneity using SNP data as well as imputed classical alleles and amino acids of the HLA genes in the assembled data. We confirmed prior MHC susceptibility variants (including a nonclassical HLA effect located in the TNFA/LST1 long haplotype) and extended the association map to uncover a total of 31 statistically independent effects at the genome-wide level within the MHC (Fig. 3 and table S11). Multiple HLA and nearby non-HLA genes have several independent effects that can now be identified because of our large sample; for example, the HLA-DRB1 locus has six statistically independent effects. Another finding involves HLA-B, which also appears to harbor six independent effects on MS susceptibility. The role of the nonclassical HLA and non-HLA genome in the MHC is also highlighted. One-third (9 out of 31) of the identified variants lie within either intergenic regions or in a long-range haplotype that contains several nonclassical HLA and other non-HLA genes (17).

Fig. 3 Independent associations in the major histocompatibility locus.

Regional association plot in the MHC locus. Only genome-wide statistically independent effects are listed. The order of variants on the x axis represents the order in which they were identified. The size of the circle represents different values of –log10(P value) (fixed-effects inverse-variance meta-analysis). Different colors are used to depict class I, II, III, and non-HLA effects. The y axis displays position in million base pairs.

Recently, we reported an interaction between HLA-DRB1*15:01 and HLA-DQA1*01:01 by analyzing imputed HLA alleles (16). In this work, we reinforced this analysis by analyzing SNPs, HLA alleles, and respective amino acids. We replicated the presence of interactions among class II alleles, but the second interaction term, besides HLA-DRB1*15:01, can vary depending on the other independent variants that are included in the model. First, we found that there are interaction models of HLA-DRB1*15:01 with other variants in the MHC that explain the data better than our previously reported HLA-DRB1*15:01/HLA-DQA1*01:01 interaction term (fig. S5). Second, we observed that there is a group of HLA*DQB1 and HLA*DQA1 SNPs, alleles, and amino acids that consistently rank among the best models with HLA-DRB1*15:01 interaction terms (fig. S6). This group of HLA-DRB1*15:01–interacting variants is consistently identified regardless of the marginal effects of other statistically independent variants that are added to the model, implying that these interaction terms capture a different subset of phenotypic variance and can be explored after the identification of the marginal effects. Last, we performed a sensitivity analysis by including interaction terms of HLA-DRB1*15:01 in each step and selecting the model with the lowest Bayesian information criterion instead of testing only the marginal results of the variants, as we did in the main analysis (classical model MHC analysis) (table S12). This sensitivity analysis also resulted in 32 statistically independent effects with a genome-wide significant P value (fixed-effects inverse-variance meta-analysis) (table S12), of which one-third (9 out of 32) were different from the ones in the classical model MHC analysis.
The main differences between the results of the two approaches were the inclusion of interaction of HLA-DRB1*15:01 and rs1049058 in step 3 and the stronger association of HLA*DPB1/2 effects over HLA*DRB1 effects in the sensitivity model (tables S12 and S13 and fig. S6). Thus, overall, our MHC results are not strongly affected by the analytic model that we have selected.

Characterization of non–genome-wide effects

The commonly used threshold of genome-wide significance (P = 5 × 10⁻⁸) has played an important role in making human genetic study results robust; however, several studies have demonstrated that non–genome-wide effects explain an important proportion of the effect of genetic variation on disease susceptibility (18, 19). More importantly, several such effects are eventually identified as genome-wide significant, given enough sample size and true effects (3). Thus, we also evaluated the non–genome-wide effects that were selected for replication, had available replication data (n = 4111 effects), but did not meet a standard threshold of genome-wide significance (P < 5 × 10⁻⁸). Specifically, we decided to stratify these 4111 effects into two main categories: (i) suggestive effects (S; n = 416), and (ii) nonreplicated effects (NR; n = 3695) (9). We used these categories in downstream analyses to further characterize the prioritized effects from the discovery study in terms of potential to eventually be replicated. We also included a third category: effects for which there were no data for replication in any of the replication sets [no data (ND); n = 532]. Furthermore, to add granularity in each category, we substratified the suggestive effects into two groups: (1a) strongly suggestive (sS; n = 117; 5 × 10⁻⁸ < P < 1 × 10⁻⁵, fixed-effects inverse-variance meta-analysis) and (1b) underpowered suggestive (unS; n = 299). Of these two categories of suggestive effects, the ones in the sS category have a high probability of reaching genome-wide significance as we increase our sample size in future studies (table S14) (9).

Heritability explained

To estimate the extent to which we have characterized the genetic architecture of MS susceptibility with our 200 genome-wide non-MHC autosomal MS effects, we calculated the narrow-sense heritability captured by common variation (h²g), the ratio of additive genetic variance to the total phenotypic variance (18, 20). Only the 15 strata of data from the discovery set had true genome-wide coverage, and hence, we used these 14,802 MS subjects and 26,703 controls for the heritability analyses. The overall heritability estimate for MS susceptibility in the discovery set of subjects was 19.2% [95% confidence interval (CI), 18.5 to 19.8%]. Heritability partitioning by using minor allele frequency or P value thresholds has led to substantial insights in previous studies (21), and we therefore applied a similar partitioning approach but in a fashion that took into consideration the study design and the existence of replication information from the two large-scale replication cohorts. First, we partitioned the autosomal genome into three components: (i) the super extended MHC (SE MHC), (ii) a component with the 1961 regions prioritized for replication (Regions), and (iii) the rest of the genome that had P > 0.05 (fixed-effects inverse-variance meta-analysis) in the discovery study (Nonassociated regions). Then, we estimated the h²g that can be attributed to each component as a proportion of the overall narrow-sense heritability observed. The SE MHC explained 21.4% of the h²g, with the remaining 78.6% being captured by the second component (Fig. 4A). Then, we further partitioned the non-MHC component into one that captured all 4842 statistically independent effects (Prioritized for replication), which explained the vast majority of the overall estimated heritability: 68.3%.
The “Nonprioritized” SNPs in the 1961 regions explained 11.6% of the heritability, which suggests that there may be residual linkage disequilibrium (LD) with prioritized effects or true effects that have not yet been identified (Fig. 4B).

Fig. 4 Heritability partitioning.

Proportion of the overall narrow-sense heritability under the liability model (~19.2%) explained with different genetic components. (A) The overall heritability is partitioned in the SE MHC, the 1961 regions that include all SNPs with P < 0.05 (Regions; fixed-effects inverse-variance meta-analysis), and the rest of the genome with P > 0.05 (Nonassociated regions). (B) The Regions are further partitioned to the seemingly statistically independent effects (Prioritized) and the residual effects (Nonprioritized). (C) The Prioritized component is partitioned on the basis of the replication knowledge to genome-wide effects (GW), suggestive (S), nonreplicated (NR), and no data (ND). The lines connecting the pie charts depict the component that is partitioned. All values were estimated by using the discovery data sets (n = 14,802 cases and 26,703 controls).

We then used the replication-based categories described above to further partition the “Prioritized” heritability component, namely “GW,” “S,” “NR,” and “ND” (Fig. 4C). The GW captured 18.3% of the overall heritability. Thus, along with the contribution of the SE MHC (20.2% in the same model), we can explain ~39% of the genetic predisposition to MS with the validated susceptibility alleles. This can be extended to ~48% if we include the suggestive (S) effects (9.0%). The nonreplicated (NR) effects captured 38.8% of the heritability, which could imply that some of these effects might be falsely nonreplicated—that these are true effects that need further data to emerge robustly or that their effect may be true and present in only a subset of the data. However, few of the 3695 NR effects would fall in either of the above two cases; the vast majority of these effects are likely to be false-positive results.
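The arithmetic behind the ~39% and ~48% figures can be checked directly from the percentages quoted above. The short sketch below only sums the reported components; the variable names are ours.

```python
# Percentages of the overall narrow-sense heritability, copied from the text.
SE_MHC = 20.2       # super extended MHC, estimated in the same model
GENOME_WIDE = 18.3  # genome-wide (GW) effects
SUGGESTIVE = 9.0    # suggestive (S) effects

validated = SE_MHC + GENOME_WIDE           # validated susceptibility alleles
with_suggestive = validated + SUGGESTIVE   # adding the suggestive effects

print(f"validated alleles: {validated:.1f}%")
print(f"plus suggestive effects: {with_suggestive:.1f}%")
```

The sums are 38.5% and 47.5%, which the text rounds to ~39% and ~48%.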

Functional implications of the MS loci, enriched pathways, and gene sets

Next, we began to annotate the MS effects. To prioritize the cell types or tissues in which the 200 non-MHC autosomal effects may exert their effect, we used two different approaches: one that leverages atlases of gene expression patterns and another that uses a catalog of epigenomic features such as deoxyribonuclease hypersensitivity sites (DHSs) (8, 9, 22–24). Significant enrichment for MS susceptibility loci was apparent in many different immune cell types and tissues, whereas there was an absence of enrichment in tissue-level central nervous system (CNS) profiles (Fig. 5). The enrichment is observed not only in immune cells that have long been studied in MS, such as T cells, but also in B cells, whose role has emerged more recently (25). Furthermore, although the adaptive immune system has been proposed to play a predominant role in MS onset (26), we now demonstrate that many elements of innate immunity, such as natural killer (NK) cells and dendritic cells, also display strong enrichment for MS susceptibility genes. At the tissue level, the role of the thymus is also highlighted, possibly suggesting a role of genetic variation in thymic selection of autoreactive T cells in MS (27). Public tissue-level CNS data—which are derived from a complex mixture of cell types—do not show an excess of MS susceptibility variants in annotation analyses. However, since MS is a disease of the CNS, we extended the annotation analyses by analyzing data generated from human iPSC-derived neurons as well as from purified primary human astrocytes and microglia (9). As seen in Fig. 6, enrichment for MS genes is seen in human microglia (P = 5 × 10⁻¹⁴) but not in astrocytes or neurons, suggesting that the resident immune cells of the brain may also play a role in MS susceptibility.

Fig. 5 Tissue- and cell-type enrichment analyses.

(A) Gene Atlas tissues and cell types gene expression enrichment. (B) DHS enrichment for tissues and cell types from the NIH Epigenetic Roadmap. Rows are sorted from immune cells or tissues to CNS-related ones. Both x axes display –log10 of Benjamini and Hochberg P values (FDR). The vertical black line highlights the threshold of significance for the enrichment analysis.
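For readers unfamiliar with the adjustment named in this caption, a minimal sketch of the standard Benjamini and Hochberg procedure follows. It is a generic implementation, not the enrichment software used in the study, and the input P values are hypothetical.

```python
import math
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending P values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that the adjusted values
    # never decrease as the raw P values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy input (hypothetical enrichment P values for four cell types).
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
neg_log10 = [-math.log10(q) for q in fdr]  # the quantity plotted on the x axes
print(fdr)
```

A cell type would then be called enriched when its adjusted value falls below the chosen FDR threshold, which corresponds to the vertical line in the figure.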

Fig. 6 Dissection of cortical RNA-seq data.

(A) A heatmap of the results of our analysis assessing whether a cortical eQTL is likely to come from one of the component cell types of the cortex: neurons, oligodendrocytes, endothelial cells, microglia, and astrocytes (in rows). Each column presents results for one of the MS brain eQTLs. The color scheme relates to the P value of the interaction term (linear regression), with red denoting a more extreme result. (B) The same results in a different form, comparing results of assessing for interaction with neuronal proportion (y axis) and microglial proportion (x axis). The *SLC12A5* eQTL is significantly stronger when accounting for neuronal proportion, and *CLECL1* is significantly stronger when accounting for microglia. The Bonferroni-corrected threshold of significance is highlighted by the dashed line. (C) Locus view of the *SLC12A5/CD40* locus, illustrating the distribution of MS susceptibility and the *SLC12A5* brain eQTL in a segment of chromosome 20 (x axis); the y axis presents the P value of association with MS susceptibility (top; fixed-effects inverse-variance meta-analysis) or *SLC12A5* RNA expression (bottom; linear regression). The lead MS SNP is denoted by a triangle; other SNPs are circles, with the intensity of the red color denoting the strength of LD with the lead MS SNP. (D) Plot of the level of expression, transcriptome-wide, for each measured gene in our cortical RNA-seq dataset (n = 455) (y axis) and purified human microglia (n = 10) (x axis) from the same cortical region. In blue, we highlight those genes that have greater than fourfold increased expression in microglia relative to bulk cortical tissue and that are expressed at a reasonable level in microglia. Each dot is one gene. Gray dots denote the 551 putative MS genes from our integrated analysis. *SLC12A5* and *CLECL1* are highlighted in red; in blue, we highlight a selected subset of the MS genes (many of them well validated) that are enriched in microglia. For clarity, we did not include all of the MS genes that fall in this category.

We repeated the enrichment analyses for the S and NR effects, aiming to test whether these have a similar enrichment pattern with the 200 GW effects. The S effects exhibited a pattern of enrichment that is similar to that of the GW effects, with only B cell expression reaching a threshold of statistical significance (fig. S7). This provides additional circumstantial evidence that this category of variants may harbor true causal associations. On the other hand, the NR enrichment results seem to follow a rather random pattern, suggesting that most of these effects are indeed not truly MS-related (fig. S7).

The strong enrichment of the GW effects in immune cell types motivated us to prioritize candidate MS susceptibility genes by identifying those susceptibility variants that affect RNA expression of nearby genes [cis expression quantitative trait loci (cis-eQTL) effects; ±500 kilobase pairs (kbp) around the effect SNP] (9). Thus, we interrogated the potential function of MS susceptibility variants in naive CD4⁺ T cells and monocytes from 211 healthy subjects as well as peripheral blood mononuclear cells (PBMCs) from 225 relapsing-remitting MS subjects. Out of the 200 GW MS effects, 36 (18%) had at least one tagging SNP (r² ≥ 0.5) that altered the expression of 46 genes [false discovery rate (FDR) < 5%] in naive CD4⁺ T cells (tables S15 and S16), and 36 MS effects (18%; 10 in common with the naive CD4⁺ T cells) influenced the expression of 48 genes in monocytes (11 genes in common with T cells). In the MS PBMC samples, 30% of the GW effects (60 out of the 200) were cis-eQTLs for 92 genes, with several loci being shared with those found in healthy T cells and monocytes (26 effects and 27 genes in T cells, and 21 effects and 24 genes in monocytes, respectively) (tables S15 and S16).
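The ±500 kbp cis window can be illustrated with a small gene-selection sketch. The text does not state whether distance is measured to the gene body or to the transcription start site, so the overlap rule below is an assumption, and the gene names and coordinates are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

CIS_WINDOW_BP = 500_000  # +/-500 kbp around the effect SNP, as in the text


def genes_in_cis(snp_pos: int, genes: Dict[str, Tuple[int, int]],
                 window_bp: int = CIS_WINDOW_BP) -> List[str]:
    """Return the names of genes whose [start, end] interval overlaps the
    window centered on the SNP (all coordinates on the same chromosome).

    Assumption: any overlap of the gene body with the window counts.
    """
    lo, hi = snp_pos - window_bp, snp_pos + window_bp
    return sorted(name for name, (start, end) in genes.items()
                  if start <= hi and end >= lo)


# Hypothetical genes on one chromosome (not real annotations).
toy_genes = {
    "GENE_A": (100_000, 150_000),
    "GENE_B": (900_000, 950_000),
    "GENE_C": (1_700_000, 1_750_000),
}
candidates = genes_in_cis(1_000_000, toy_genes)
print(candidates)
```

Only the genes returned by such a filter would be tested for association between the SNP genotype and RNA expression; the regression and FDR control themselves are not shown here.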

Because MS is a disease of the CNS, we also investigated a large collection of dorsolateral prefrontal cortex RNA sequencing profiles from two longitudinal cohort studies of aging (n = 455 subjects), which recruit cognitively nonimpaired individuals (9). This cortical sample provides a tissue-level profile derived from a complex mixture of neurons, astrocytes, and other parenchymal cells, such as microglia and occasional peripheral immune cells. In these data, we found that 66 of the GW MS effects (33% of the 200 effects) were cis-eQTLs for 104 genes. Over this CNS and the three immune sets of data, 104 GW effects were cis-eQTLs for 203 different genes (n = 211 cis-eQTLs), with several appearing to be specific for one of the cell or tissue types (table S16). Specifically, 21.2% (45 out of 211 cis-eQTLs) of these cortical cis-eQTLs displayed no evidence of association [P > 0.05, for linear regression (9), with any SNP with r² > 0.1] in the immune cell and PBMC results and are less likely to be immune-related (tables S16 and S17).

To further explore the challenging and critical question of whether some of the MS variants have an effect that is primarily exerted through a nonimmune cell, we performed a secondary analysis of our cortical RNA-sequencing (RNA-seq) data in which we attempted to ascribe a brain cis-eQTL to a particular cell type. Specifically, we assessed our tissue-level profile and adjusted each cis-eQTL analysis for the proportion of neurons, astrocytes, microglia, and oligodendrocytes estimated to be present in the tissue: The hypothesis was that the effect of a SNP with a cell type–specific cis-eQTL would be stronger if we adjusted for the proportion of the target cell type (Fig. 6 and fig. S8). As anticipated, almost all of the MS variants present in cortex remain ambiguous; it is likely that many of them influence gene function in multiple immune and nonimmune cell types. However, the SLC12A5 locus is different; here, the effect of the SNP is significantly stronger when we account for the proportion of neurons (Fig. 6, A and B), and the CLECL1 locus emerges when we account for the proportion of microglia. SLC12A5 is a potassium/chloride transporter that is known to be expressed in neurons, and a rare variant in SLC12A5 causes a form of pediatric epilepsy (28, 29). Although this MS locus may therefore appear to be a good candidate to have a primarily neuronal effect, further evaluation found that this MS susceptibility haplotype also harbors susceptibility to rheumatoid arthritis (30) and a cis-eQTL in B cells for the CD40 gene (31). Thus, the same haplotype harbors different functional effects in very different contexts, illustrating the challenge in dissecting the functional consequences of autoimmune variants in immune function as opposed to the tissue targeted in autoimmune disease. 
However, CLECL1 represents a simpler case of a known susceptibility effect that has previously been linked to altered CLECL1 RNA expression in monocytes (26, 32); its enrichment in microglial cells, which share many molecular pathways with other myeloid cells, is more straightforward to understand. CLECL1 is expressed at low levels in our cortical RNA-seq profiles because microglia represent just a small fraction of cells at the cortical tissue level, and CLECL1’s expression level is 20-fold greater when we compare its level of expression in purified human cortical microglia with the bulk cortical tissue (Fig. 6). CLECL1 therefore suggests a potential role of microglia in MS susceptibility, which is underestimated in bulk tissue profiles that are available in epigenomic and transcriptomic reference data. Overall, many genes that are eQTL targets of MS variants in the human cortex are most likely to affect multiple cell types. These brain eQTL results and the enrichment found in analyses of our purified human microglia data therefore highlight the need for more targeted, cell-type–specific data for the CNS to adequately determine the extent of its role in MS susceptibility.
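To make the idea of adjusting a cis-eQTL for cell-type proportion concrete, the following minimal sketch fits a linear model of expression on genotype with and without a cell-proportion covariate. It is one simple reading of "adjusting"; the authors' actual model is described in their supplementary materials (9) and may differ. The genotypes, proportions, and effect sizes are made-up toy values, and `ols` is a small helper written for this illustration.

```python
def ols(predictors, y):
    """Coefficients [intercept, b1, b2, ...] for y ~ 1 + predictors.

    Solves the normal equations by Gauss-Jordan elimination with pivoting.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        scale = a[c][c]
        a[c] = [v / scale for v in a[c]]
        for r in range(k):
            if r != c:
                factor = a[r][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][k] for i in range(k)]


# Toy data (hypothetical): allele counts and estimated neuron proportions.
genotype = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
neuron_prop = [0.50, 0.45, 0.48, 0.40, 0.42, 0.38, 0.41, 0.30, 0.35, 0.33]
# Noise-free expression with a true per-allele effect of 0.5.
expression = [1.0 + 0.5 * g + 2.0 * p for g, p in zip(genotype, neuron_prop)]

unadjusted_beta = ols([genotype], expression)[1]
adjusted_beta = ols([genotype, neuron_prop], expression)[1]
```

In this toy setting the neuron proportion happens to fall as the allele count rises, so the unadjusted estimate is attenuated (about 0.35), and the adjusted model recovers the per-allele effect of 0.5. This is the pattern of a stronger effect after adjustment that the text describes for SLC12A5.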

These eQTL studies begin to transition our genetic map into a resource outlining the likely MS susceptibility gene(s) in a locus and the potential functional consequences of certain MS variants. To assemble these single-locus results into a higher-order perspective of MS susceptibility, we turned to pathway analyses to evaluate how the extended list of genome-wide effects provides new insights into the pathophysiology of the disease. Acknowledging that there is no available method to identify all causal genes after genome-wide association study (GWAS) discoveries, we prioritized genes for pathway analyses while allowing several different hypotheses for mechanisms of action (9). In brief, we prioritized genes that (i) were cis-eQTLs in any of the eQTL data sets outlined above, (ii) had at least one exonic variant at r² ≥ 0.1 with any of the 200 effects, (iii) had high scores of regulatory potential by using a cell-specific network approach, and (iv) had a similar coexpression pattern as identified with DEPICT (33). Sensitivity analyses were performed that included different combinations of the above categories and included genes with intronic variants at r² ≥ 0.5 with any of the 200 effects (9). Overall, we prioritized 551 candidate MS genes (table S18; sensitivity analyses are provided in table S19) to test for statistical enrichment of known pathways. Approximately 39.6% (142 out of 358) of the Ingenuity Pathway Analysis canonical pathways (34) that overlapped with at least one of the identified genes were enriched for MS genes at an FDR < 5% (table S20). Sensitivity analyses that included different criteria to prioritize genes revealed a similar pattern of pathway enrichment (table S21) (9). The extensive list of susceptibility genes, which more than doubles the previous knowledge in MS, captures processes of development, maturation, and terminal differentiation of several immune cells that potentially interact to predispose to MS. 
In particular, the role of B cells, dendritic cells, and NK cells has emerged more clearly, broadening the prior narrative of T cell dysregulation that emerged from earlier studies (4). Given the overrepresentation of immune pathways in these databases, ambiguity remains as to where some variants may have their effect: Neurons and particularly astrocytes repurpose the component genes of many “immune” signaling pathways, such as the ciliary neurotrophic factor, nerve growth factor, and neuregulin signaling pathways that are highly significant in our analysis (table S20). These results—along with the results relating to microglia—emphasize the need for further dissection of these pathways in specific cell types to resolve where a variant is exerting its effect; it is possible that multiple, different cell types could be involved in disease because they all experience the effect of the variant.
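The prioritization and enrichment steps described above can be sketched as a union of gene sets followed by an over-representation test. The gene names, set sizes, and background size below are hypothetical. The one-sided hypergeometric test is a standard choice for over-representation analysis, not necessarily the test the authors used, which this section does not state.

```python
from math import comb


def enrichment_p(overlap, pathway_size, n_prioritized, n_background):
    """Upper-tail hypergeometric P value.

    Probability of seeing at least `overlap` pathway genes among
    `n_prioritized` genes drawn without replacement from `n_background`.
    """
    max_overlap = min(pathway_size, n_prioritized)
    total = comb(n_background, n_prioritized)
    tail = sum(
        comb(pathway_size, k) * comb(n_background - pathway_size, n_prioritized - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical gene sets, one per prioritization criterion (i)-(iv).
cis_eqtl_genes = {"GENE_A", "GENE_B"}
exonic_variant_genes = {"GENE_B", "GENE_C"}
regulatory_network_genes = {"GENE_D"}
coexpression_genes = {"GENE_A", "GENE_E"}

# A gene is prioritized if it satisfies any one of the criteria.
prioritized = (
    cis_eqtl_genes
    | exonic_variant_genes
    | regulatory_network_genes
    | coexpression_genes
)

# Hypothetical pathway and background of 20,000 genes.
pathway = {"GENE_A", "GENE_C", "GENE_X", "GENE_Y"}
p_value = enrichment_p(
    overlap=len(prioritized & pathway),
    pathway_size=len(pathway),
    n_prioritized=len(prioritized),
    n_background=20_000,
)
```

With one P value per pathway, a multiple-testing correction would then be applied across all tested pathways to obtain the FDR threshold quoted in the text.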

Pathway and gene-set enrichment analyses can only identify statistically significant connections of genes in already reported, and in some cases validated, mechanisms of action. However, the function of many genes is yet to be uncovered, and even for well-studied genes, the full repertoire of possible mechanisms is still incomplete. To complement the pathway analysis approach and to explore the connectivity of our prioritized GW genes, we performed a protein-protein interaction (PPI) analysis using GeNets (9, 35). About one-third of the 551 prioritized genes (n = 190; 34.5%) were connected (P = 0.052; permutation-based P value), and these could be organized into 13 communities, that is, subnetworks with higher connectivity (P < 0.002; permutation-based P value) (table S22). This compares with nine communities that could be identified by the previously reported MS susceptibility list (81 connected genes out of 307) (table S23) (3). Next, we leveraged GeNets to predict candidate genes on the basis of network connectivity and pathway membership similarity and tested whether our previously known MS susceptibility list could have predicted any of the genes prioritized by the newly identified effects. Of the 244 genes prioritized by new findings (out of the 551 overall prioritized genes), only five could be predicted given the old results (out of 70 candidates that emerge from the extrapolation of earlier data) (fig. S9 and table S24). In a similar fashion, we estimated that the list of 551 prioritized genes could predict 102 new candidate genes, four of which can be prioritized because they are in the list of suggestive effects (Fig. 1, fig. S10, and table S25).
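A permutation-based P value for network connectivity, as reported above, can be illustrated with the sketch below. It is a generic version under stated assumptions and not the GeNets procedure, whose null model is not described in this section: the null here draws random gene sets of the same size, whereas practical tools often also match on node degree. The network, the gene names, and the function names are hypothetical.

```python
import random


def count_internal_edges(genes, edges):
    """Number of interactions whose two endpoints are both in the gene set."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)


def permutation_p(observed, null_values):
    """One-sided empirical P value with the usual +1 correction."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (1 + exceed) / (1 + len(null_values))


def connectivity_p(genes, all_genes, edges, n_perm=999, seed=0):
    """Compare observed connectivity with random gene sets of equal size."""
    rng = random.Random(seed)
    observed = count_internal_edges(genes, edges)
    null = [
        count_internal_edges(rng.sample(all_genes, len(genes)), edges)
        for _ in range(n_perm)
    ]
    return permutation_p(observed, null)


# Hypothetical network of 20 genes with four interactions.
all_genes = [f"G{i}" for i in range(20)]
edges = [("G0", "G1"), ("G1", "G2"), ("G0", "G2"), ("G5", "G9")]
p_connected = connectivity_p(["G0", "G1", "G2"], all_genes, edges)
```

Because of the +1 correction, the smallest attainable P value is 1/(n_perm + 1), so a result reported as P < 0.002 requires at least several hundred permutations.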

Discussion

This detailed genetic map of MS is a powerful substrate for annotation and functional studies and provides a new level of understanding for the molecular events that contribute to MS susceptibility. Although the exact amount of MS’s heritability varies given the data and method used (36–38), we report that our findings can explain up to 48% of the heritability that can be estimated by using large-scale GWAS data. It is clear that these events are widely distributed across the many different cellular components of both the innate and adaptive arms of the immune system: Every major immune cell type is enriched for MS susceptibility genes. An important caveat is that many of the implicated molecular pathways, such as response to tumor necrosis factor–α and type I interferons, are repurposed in many different cell types, leading to an important ambiguity: Is risk of disease driven by altered function of only one of the implicated cell types, or are all of them contributing to susceptibility equally? This question highlights the important issue of the context in which these variants are exerting their effects. We have been thorough in our evaluation of available reference epigenomic data, but many different cell types and cell states remain to be characterized and could alter our summary. Further, interindividual variability has not been established in such reference data, which are typically produced from one or a handful of individuals; thus, this issue is better evaluated in the eQTL data, where we have examined a range of samples and states in a large number of subjects. Overall, although we have identified putative functional consequences for the identified MS variants, the functional consequence of most of these MS variants requires further investigation.

Even where a function is reported, further work is needed to demonstrate that the effect is the causal functional change. This is particularly true of the role of the CNS in MS susceptibility; we mostly have data at the level of the human cortex, a complex tissue with many different cell types, including resident microglia and a small number of infiltrating macrophages and lymphocytes. MS variants clearly influence gene expression in this tissue, and we must now (i) resolve the implicated cell types and whether pathways shared with immune cells are having their MS susceptibility effect in the periphery or in the brain, and (ii) more deeply identify additional functional consequences that may be present in only a subset of cells, such as microglia or activated astrocytes, that are obscured in the cortical tissue level profile. A handful of loci are intriguing in that they alter gene expression in the human cortex but not in the sampled immune cells; these MS susceptibility variants deserve close examination to resolve the important question of the extent to which the CNS is involved in disease onset. Thus, our study suggests that although MS is a disease whose origin may lie primarily within the peripheral immune compartment, where dysregulation of all branches of the immune system leads to organ-specific autoimmunity, there is a subset of loci with a key role in directing the tissue-specific autoimmune response. This is similar to our previous examination of ulcerative colitis, in which we observed enrichment of genetic variants mapping to colon tissue (7). This view is consistent with our understanding of the mechanism of important MS therapies, such as natalizumab and fingolimod, that sequester pathogenic immune cell populations in the peripheral circulation to prevent episodes of acute CNS inflammation. It also has important implications as we begin to consider prevention strategies to block the onset of the disease by early targeting of peripheral immune cells.

An important step forward in MS genetics, for a disease with a 3:1 preponderance of women being affected, is robust evidence for a susceptibility locus on the X chromosome. Although chromosome X associations cannot be the sole explanation for the preponderance of women among MS patients, the discovery of an MS locus on the X chromosome is an exciting first step toward understanding the genetic contributions of this strong sex bias. This result also highlights the need for additional, dedicated genetic studies of the sex chromosomes in MS because existing data have not been fully leveraged (39). Future studies will also need to incorporate the interaction of the autosomal genome with factors that can affect the sex bias, such as hormones (40).

This genomic map of MS, comprising the genetic map and its integrated functional annotation, is a foundation on which the next generation of projects will be developed. It is an important substrate with which to further dissect the genetic architecture of MS by accounting for the contribution of sex, evaluating the possibility of interaction among loci, and assessing other important factors, such as heterogeneity of effects across human populations or certain subsets of patients given the heterogeneity of this disease. In the current study, we have included individuals with either the relapsing remitting or the progressive form of MS because they are currently conceptualized to belong to the same disease spectrum. Further investigation may lead to the identification of variants that have an effect on the neurodegenerative component of MS, which is largely genetically distinct from MS susceptibility (41). Beyond the characterization of the molecular events that trigger MS, this map will also inform the development of primary prevention strategies because we can leverage this information to identify the subset of individuals who are at greatest risk of developing MS. Although insufficient by itself, an MS genetic risk score has a role to play in guiding the management of the population of individuals “at risk” of MS (such as family members) when deployed in combination with other measures of risk and biomarkers that capture intermediate phenotypes along the trajectory from health to disease (42). We thus report an important milestone in the investigation of MS and share a roadmap for future work: the establishment of a map with which to guide the development of the next generation of studies with high-dimensional molecular data to explore, first, the initial steps of immune dysregulation across both the adaptive and innate arms of the immune system and, second, the translation of this autoimmune process to the CNS, where it triggers a neurodegenerative cascade.

Materials and methods

Detailed materials and methods are listed in the supplementary materials (9). In brief, we analyzed genetic data from 15 GWASs of MS. For the autosomal non-MHC genome, we applied a partitioning approach to create regions of ±1 Mbp around the most statistically significant SNP. Then, we performed stepwise conditional analyses within each region to identify statistically independent effects (n = 4842). We replicated these effects in two large-scale replication cohorts: (i) nine data sets genotyped with the MS Replication Chip and (ii) eleven data sets genotyped with the ImmunoChip. Chromosomes X and Y were analyzed jointly across all the discovery and replication data sets. The extended MHC region was also analyzed jointly across all data sets. We further imputed HLA class I and II alleles and corresponding amino acids. Statistically independent effects in the autosomal non-MHC genome were grouped into four categories after replication: (i) genome-wide effects (GW), (ii) suggestive effects (S), (iii) nonreplicated (NR), and (iv) no replication data (ND). Narrow-sense heritability was estimated for various combinations of these effects, and the extended MHC region, to quantify the amount of the heritability our findings could explain. Next, we leveraged enrichment methods and tissue or cell reference data sets to characterize the potential involvement of the identified MS effects in the immune and central nervous system, at the tissue and cellular level. We developed an ensemble approach to prioritize genes putatively associated with the identified effects, leveraging cell-specific eQTL studies, network approaches, and genomic annotations. We performed pathway analyses to characterize canonical pathways statistically enriched for the putative causal genes. Last, we leveraged protein-protein interaction networks to quantify the degree of connectivity of the putative causal genes and identify new mechanisms of action.
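The partitioning step can be read as a greedy procedure: take the most significant remaining SNP, claim every SNP within ±1 Mbp of it, and repeat. The sketch below implements that reading only. The full procedure, including how region boundaries are handled and the stepwise conditional analysis that follows, is given in the supplementary materials (9) and is not reproduced here. The function name, tuple layout, and all SNP values are hypothetical.

```python
def partition_regions(snps, flank_bp=1_000_000):
    """Greedily partition SNPs into regions around lead SNPs.

    `snps` is a list of (chromosome, position, p_value) tuples.
    Returns a list of (lead_snp, member_snps) pairs.
    """
    remaining = sorted(snps, key=lambda s: s[2])  # most significant first
    regions = []
    while remaining:
        lead = remaining[0]
        members = [
            s for s in remaining
            if s[0] == lead[0] and abs(s[1] - lead[1]) <= flank_bp
        ]
        regions.append((lead, members))
        # Remove claimed SNPs so that each SNP belongs to exactly one region.
        remaining = [s for s in remaining if s not in members]
    return regions


# Hypothetical association results.
snps = [
    ("1", 1_000_000, 1e-12),
    ("1", 1_500_000, 1e-6),   # within 1 Mbp of the first lead SNP
    ("1", 2_500_000, 1e-9),   # 1.5 Mbp away: forms its own region
    ("2", 1_200_000, 1e-4),   # different chromosome
]
regions = partition_regions(snps)
```

Within each resulting region, a stepwise conditional analysis would then add the lead SNP as a covariate and retest the remaining SNPs to look for further independent signals.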

The International Multiple Sclerosis Genetics Consortium

Nikolaos A Patsopoulos^1,2,3,4, Sergio E. Baranzini⁵, Adam Santaniello⁵, Parisa Shoostari^4,6,7*, Chris Cotsapas^4,6,7, Garrett Wong^1,3, Ashley H. Beecham⁸, Tojo James⁹, Joseph Replogle^2,3,4,10, Ioannis S. Vlachos^1,3,4, Cristin McCabe⁴, Tune H. Pers¹¹, Aaron Brandes⁴, Charles White^4,10, Brendan Keenan¹², Maria Cimpean¹⁰, Phoebe Winn¹⁰, Ioannis-Pavlos Panteliadis^1,4, Allison Robbins¹⁰, Till F. M. Andlauer^13,14,15, Onigiusz Zarzycki^1,4, Bénédicte Dubois¹⁶, An Goris¹⁶, Helle Bach Søndergaard¹⁷, Finn Sellebjerg¹⁷, Per Soelberg Sorensen¹⁷, Henrik Ullum¹⁸, Lise Wegner Thørner¹⁸, Janna Saarela¹⁹, Isabelle Cournu-Rebeix²⁰, Vincent Damotte^20,21, Bertrand Fontaine^20,22, Lena Guillot-Noel²⁰, Mark Lathrop^23,24,25, Sandra Vukusic^26,27,28, Achim Berthele^14,15, Viola Pongratz^14,15, Dorothea Buck^14,15, Christiane Gasperi^14,15, Christiane Graetz^15,29, Verena Grummel^14,15, Bernhard Hemmer^14,15,30, Muni Hoshi^14,15, Benjamin Knier^14,15, Thomas Korn^14,15,30, Christina M. Lill^15,31,32, Felix Luessi^15,31, Mark Mühlau^14,15, Frauke Zipp^15,31, Efthimios Dardiotis³³, Cristina Agliardi³⁴, Antonio Amoroso³⁵, Nadia Barizzone³⁶, Maria D. Benedetti^37,38, Luisa Bernardinelli³⁹, Paola Cavalla⁴⁰, Ferdinando Clarelli⁴¹, Giancarlo Comi^41,42, Daniele Cusi⁴³, Federica Esposito^41,44, Laura Ferrè⁴⁴, Daniela Galimberti^45,46, Clara Guaschino^41,44, Maurizio A. Leone⁴⁷, Vittorio Martinelli⁴⁴, Lucia Moiola⁴⁴, Marco Salvetti^48,49, Melissa Sorosina⁴¹, Domizia Vecchio⁵⁰, Andrea Zauli⁴¹, Silvia Santoro⁴¹, Nicasio Mancini⁵¹, Miriam Zuccalà⁵², Julia Mescheriakova⁵³, Cornelia van Duijn^53,54, Steffan D. Bos⁵⁵, Elisabeth G. Celius^55,56, Anne Spurkland⁵⁷, Manuel Comabella⁵⁸, Xavier Montalban⁵⁸, Lars Alfredsson⁵⁹, Izaura L. 
Bomfim⁶⁰, David Gomez-Cabrero^60,61,62, Jan Hillert⁶⁰, Maja Jagodic⁶⁰, Magdalena Lindén⁶⁰, Fredrik Piehl⁶⁰, Ilijas Jelčić^63,64, Roland Martin^63,64, Mirela Sospedra^63,64, Amie Baker⁶⁵, Maria Ban⁶⁶, Clive Hawkins⁶⁶, Pirro Hysi⁶⁷, Seema Kalra⁶⁸, Fredrik Karpe⁶⁸, Jyoti Khadake⁶⁹, Genevieve Lachance⁶⁷, Paul Molyneux⁶⁷, Matthew Neville⁶⁸, John Thorpe⁷⁰, Elizabeth Bradshaw¹⁰, Stacy J. Caillier⁵, Peter Calabresi⁷¹, Bruce A. C. Cree⁵, Anne Cross⁷², Mary Davis⁷³, Paul W. I. de Bakker^2,3,4†, Silvia Delgado⁷⁴, Marieme Dembele⁷¹, Keith Edwards⁷⁵, Kate Fitzgerald⁷¹, Irene Y. Frohlich¹⁰, Pierre-Antoine Gourraud^5,76, Jonathan L Haines⁷⁷, Hakon Hakonarson^78,79, Dorlan Kimbrough^3,80, Noriko Isobe^5,81, Ioanna Konidari⁸, Ellen Lathi⁸², Michelle H. Lee¹⁰, Taibo Li⁸³, David An⁸³, Andrew Zimmer⁸³, Lohith Madireddy⁵, Clara P. Manrique⁸, Mitja Mitrovic^4,6,7, Marta Olah¹⁰, Ellis Patrick^10,84,85, Margaret A. Pericak-Vance⁸, Laura Piccio⁷¹, Cathy Schaefer⁸⁶, Howard Weiner⁸⁷, Kasper Lage⁸², ANZgene, IIBDGC, WTCCC2, Alastair Compston⁶⁴, David Hafler^4,88, Hanne F. Harbo^54,55, Stephen L. Hauser⁵, Graeme Stewart⁸⁹, Sandra D’Alfonso⁹⁰, Georgios Hadjigeorgiou³³, Bruce Taylor⁹¹, Lisa F. Barcellos⁹², David Booth⁹³, Rogier Hintzen⁹⁴, Ingrid Kockum⁹, Filippo Martinelli-Boneschi^41,42, Jacob L. McCauley⁸, Jorge R. Oksenberg⁵, Annette Oturai¹⁶, Stephen Sawcer⁶², Adrian J. Ivinson⁹³, Tomas Olsson⁹, Philip L. De Jager^4,10

¹Systems Biology and Computer Science Program, Ann Romney Center for Neurological Diseases, Department of Neurology, Brigham & Women’s Hospital, Boston, MA 02115, USA. ²Division of Genetics, Department of Medicine, Brigham & Women’s Hospital, Harvard Medical School, Boston, MA, USA. ³Harvard Medical School, Boston, MA 02115, USA. ⁴Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA. ⁵Department of Neurology, University of California at San Francisco, Sandler Neurosciences Center, 675 Nelson Rising Lane, San Francisco, CA 94158, USA. ⁶Department of Neurology, Yale University School of Medicine, New Haven, CT 06520, USA. ⁷Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA. ⁸John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL 33136, USA. ⁹Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden. ¹⁰Center for Translational and Computational Neuroimmunology, Multiple Sclerosis Center, Department of Neurology, Columbia University Medical Center, New York, NY, USA. ¹¹The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, 2100, Denmark. ¹²Center for Sleep and Circadian Neurobiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA. ¹³Max Planck Institute of Psychiatry, 80804 Munich, Germany. ¹⁴Department of Neurology, Klinikum rechts der Isar, Technical University of Munich, 81675 Munich, Germany. ¹⁵German competence network for multiple sclerosis. ¹⁶KU Leuven Department of Neurosciences, Laboratory for Neuroimmunology, Herestraat 49 bus 1022, 3000 Leuven, Belgium. ¹⁷Danish Multiple Sclerosis Center, Department of Neurology, Rigshospitalet, University of Copenhagen, Section 6311, 2100 Copenhagen, Denmark. ¹⁸Department of Clinical Immunology, Rigshospitalet, University of Copenhagen, Section 2082, 2100 Copenhagen, Denmark. 
¹⁹Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland. ²⁰ICM-UMR 1127, INSERM, Sorbonne University, Hôpital Universitaire Pitié-Salpêtrière 47 Boulevard de l’Hôpital, F-75013 Paris. ²¹UMR1167 Université de Lille, Inserm, CHU Lille, Institut Pasteur de Lille. ²²CRM-UMR974 Department of Neurology Hôpital Universitaire Pitié-Salpêtrière 47 Boulevard de l’Hôpital F-75013 Paris. ²³Commissariat à l′Energie Atomique, Institut Genomique, Centre National de Génotypage, Evry, France. ²⁴Fondation Jean Dausset – Centre d’Etude du Polymorphisme Humain, Paris, France. ²⁵McGill University and Genome Quebec Innovation Center, Montreal, Canada. ²⁶Hospices Civils de Lyon, Service de Neurologie, sclérose en plaques, pathologies de la myéline et neuro-inflammation, F-69677 Bron, France. ²⁷Observatoire Français de la Sclérose en Plaques, Centre de Recherche en Neurosciences de Lyon, INSERM 1028 et CNRS UMR 5292, F-69003 Lyon, France. ²⁸Université de Lyon, Université Claude Bernard Lyon 1, F-69000 Lyon, France; Eugène Devic EDMUS Foundation against multiple sclerosis, F-69677 Bron, France. ²⁹Focus Program Translational Neuroscience (FTN), Rhine Main Neuroscience Network (rmn2), Johannes Gutenberg University-Medical Center, Mainz, Germany. ³⁰Munich Cluster for Systems Neurology (SyNergy), 81377 Munich, Germany. ³¹Department of Neurology, Focus Program Translational Neuroscience (FTN), and Immunology (FZI), Rhine-Main Neuroscience Network (rmn2), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany. ³²Genetic and Molecular Epidemiology Group, Institute of Neurogenetics, University of Luebeck, Luebeck, Germany. ³³Neurology Department, Neurogenetics Lab, University Hospital of Larissa, Greece. ³⁴Laboratory of Molecular Medicine and Biotechnology, Don C. Gnocchi Foundation ONLUS, IRCCS S. Maria Nascente, Milan, Italy. ³⁵Department of Medical Sciences, Torino University, Turin, Italy. 
³⁶Department of Health Sciences and Interdisciplinary Research Center of Autoimmune Diseases (IRCAD), University of Eastern Piedmont, Novara, Italy. ³⁷Centro Regionale Sclerosi Multipla, Neurologia B, AOUI Verona, Italy. ³⁸Fondazione IRCCS Cà Granda, Ospedale Maggiore Policlinico, Italy. ³⁹Medical Research Council Biostatistics Unit, Robinson Way, Cambridge CB2 0SR, UK. ⁴⁰MS Center, Department of Neuroscience, A.O. Città della Salute e della Scienza di Torino and University of Turin, Torino, Italy. ⁴¹Laboratory of Human Genetics of Neurological complex disorder, Institute of Experimental Neurology (INSPE), Division of Neuroscience, San Raffaele Scientific Institute, Via Olgettina 58, 20132, Milan, Italy. ⁴²Department of Biomedical Sciences for Health, University of Milan, Milan, Italy. ⁴³University of Milan, Department of Health Sciences, San Paolo Hospital and Filarete Foundation, viale Ortles 22/4, 20139 Milan, Italy. ⁴⁴Department of Neurology, Institute of Experimental Neurology (INSPE), Division of Neuroscience, San Raffaele Scientific Institute, Via Olgettina 58, 20132, Milan, Italy. ⁴⁵Neurology Unit, Department of Pathophysiology and Transplantation, University of Milan, Dino Ferrari Center, Milan, Italy. ⁴⁶Fondazione IRCCS Ca’ Granda, Ospedale Policlinico, Milan, Italy. ⁴⁷Fondazione IRCCS Casa Sollievo della Sofferenza, Unit of Neurology, San Giovanni Rotondo (FG), Italy. ⁴⁸Center for Experimental Neurological Therapies, Sant’Andrea Hospital, Department of Neurosciences, Mental Health and Sensory Organs, Sapienza University, Rome, Italy. ⁴⁹Istituto Neurologico Mediterraneo (INM) Neuromed, Pozzilli, Isernia, Italy. ⁵⁰Department of Neurology, Ospedale Maggiore, Novara, Italy. ⁵¹Laboratory of Microbiology and Virology, University Vita-Salute San Raffaele, Hospital San Raffaele, Milan, Italy. ⁵²Department of Health Sciences and Interdisciplinary Research Center of Autoimmune Diseases (IRCAD), University of Eastern Piedmont, Novara, Italy. 
⁵³Department of Neurology, Erasmus MC, Rotterdam, Netherlands. ⁵⁴Nuffield Department of Population Health, Big Data Institute, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, Oxford OX3 7LF, UK. ⁵⁵Department of Neurology, Institute of Clinical Medicine, University of Oslo, Norway. ⁵⁶Department of Neurology, Oslo University Hospital, Oslo, Norway. ⁵⁷Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway. ⁵⁸Servei de Neurologia-Neuroimmunologia, Centre d’Esclerosi Múltiple de Catalunya (Cemcat), Institut de Recerca Vall d’Hebron (VHIR), Hospital Universitari Vall d’Hebron, Spain. ⁵⁹Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden. ⁶⁰Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden. ⁶¹Translational Bioinformatics Unit, NavarraBiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Navarra, Spain. ⁶²Mucosal and Salivary Biology Division, King’s College, London Dental Institute, London, UK. ⁶³Neuroimmunology and MS Research (nims), Neurology Clinic, University Hospital Zurich, Frauenklinikstrasse 26, 8091 Zurich, Switzerland. ⁶⁴Department of Neuroimmunology and MS Research, Neurology Clinic, University Hospital Zürich, Frauenklinikstrasse 26, 8091 Zürich, Switzerland. ⁶⁵University of Cambridge, Department of Clinical Neurosciences, Addenbrooke’s Hospital, BOX 165, Hills Road, Cambridge CB2 0QQ, UK. ⁶⁶Keele University Medical School, University Hospital of North Staffordshire, Stoke-on-Trent ST4 7NY, UK. ⁶⁷Department of Twin Research and Genetic Epidemiology, King’s College London, London, SE1 7EH, UK. ⁶⁸NIHR Oxford Biomedical Research Centre, Diabetes and Metabolism Theme, OCDEM, Churchill Hospital, Oxford UK. ⁶⁹NIHR BioResource, Box 299,University of Cambridge and Cambridge University Hospitals NHS Foundation Trust Hills Road, Cambridge CB2 0QQ, UK. 
⁷⁰Department of Neurology, Peterborough City Hospital, Edith Cavell Campus, Bretton Gate, Peterborough PE3 9GZ, UK. ⁷¹Department of Neurology, Johns Hopkins University School of medicine, Baltimore MD. ⁷²Multiple sclerosis center, Department of neurology, School of medicine, Washington University St Louis, St Louis MO. ⁷³Center for Human Genetics Research, Vanderbilt University Medical Center, 525 Light Hall, 2215 Garland Avenue, Nashville, TN 37232, USA. ⁷⁴Multiple Sclerosis Division, Department of Neurology, University of Miami, Miller School of Medicine, Miami, FL 33136, USA. ⁷⁵MS Center of Northeastern NY 1205 Troy Schenectady Rd, Latham, NY 12110, USA. ⁷⁶Université de Nantes, INSERM, Centre de Recherche en Transplantation et Immunologie, UMR 1064, ATIP-Avenir, Equipe 5, Nantes, France. ⁷⁷Population and Quantitative Health Sciences, Department of Epidemiology and Biostatistics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106-4945 USA. ⁷⁸Center for Applied Genomics, The Children’s Hospital of Philadelphia, 3615 Civic Center Blvd., Philadelphia, PA 19104, USA. ⁷⁹Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia PA, USA. ⁸⁰Department of Neurology, Brigham & Women’s Hospital, Boston, 02115 MA, USA. ⁸¹Departments of Neurology and Neurological Therapeutics, Neurological Institute, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka City, Fukuoka 812-8582 Japan. ⁸²The Elliot Lewis Center, 110 Cedar St, Wellesley MA, 02481, USA. ⁸³Broad Institute of Harvard University and MIT, Cambridge, 02142 MA, USA. ⁸⁴School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia. ⁸⁵Westmead Institute for Medical Research, University of Sydney, Westmead, NSW 2145, Australia. ⁸⁶Kaiser Permanente Division of Research, Oakland, CA, USA. 
⁸⁷Ann Romney Center for Neurological Diseases, Department of Neurology, Brigham & Women’s Hospital, Boston, 02115 MA, USA. ⁸⁸Departments of Neurology and Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA. ⁸⁹Westmead Millennium Institute, University of Sydney, New South Wales, Australia. ⁹⁰Department of Health Sciences and Interdisciplinary Research Center of Autoimmune Diseases (IRCAD), University of Eastern Piedmont, Novara, Italy. ⁹¹Menzies Research Institute Tasmania, University of Tasmania, Australia. ⁹²UC Berkeley School of Public Health and Center for Computational Biology, USA. ⁹³Westmead Millennium Institute, University of Sydney, New South Wales, Australia. ⁹⁴Department of Neurology and Department of Immunology, Erasmus MC, Rotterdam, Netherlands. ⁹⁵UK Dementia Research Institute, University College London, Gower Street, London WC1E 6BT, UK.

*Present address: Center for Computational Medicine, Peter Gilgan Centre for Research and Learning, Hospital for Sick Children (SickKids), Toronto, ON M5G 0A4, Canada.

†Present address: Vertex Pharmaceuticals, 50 Northern Avenue, Boston, MA 02210, USA.

Acknowledgments: We thank the Harvard Aging Brain Study (HABS; P01AG036694). We thank the Biorepository Facility and the Center for Genome Technology laboratory personnel (specifically S. West, S. Clarke, D. Martinez, and P. Whitehead) within the John P. Hussman Institute for Human Genomics at the University of Miami for centralized DNA handling and genotyping for this project. The IMSGC acknowledges W. Edgerly and L. Edgerly, J. Carlos and E. Carlos, M. Crowninshield, and W. Fowler and C. Fowler, whose enduring commitments were critical in creation of the Consortium. We thank the volunteers from the Oxford Biobank (www.oxfordbiobank.org.uk) and the Oxford National Institute for Health Research (NIHR) Bioresource for their participation. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health, German Ministry for Education and Research, German Competence Network MS (BMBF KKNMS). Funding: This investigation was supported in part by grants from the National MS Society (NMSS) to A.J.I. on behalf of the International MS Genetics Consortium (RG 4198-A-1 and AP-3758-A-16). Other grants include a Harry Weaver Neuroscience Scholar Award from the NMSS to P.L.D. (JF2138A1) as well as NIH grants RC2 GM093080 and R01AG036836. This work was also supported by a postdoctoral fellowship from the National Multiple Sclerosis Society (FG 1938-A-1) and a Career Independence Award from the National Multiple Sclerosis Society (TA 3056-A-2) to N.A.P. and National Multiple Sclerosis Society award AP3758-A-16. N.A.P. has been supported by Harvard NeuroDiscovery Center and an Intel Parallel Computing Center award, the U.S. National Multiple Sclerosis Society (grants RG 4680-A-1), and the NIH/NINDS (grant R01NS096212). T.A. 
was supported by the German Federal Ministry of Education and Research (BMBF) through the Integrated Network IntegraMent, under the auspices of the e:Med Programme (01ZX1614J), Swedish Medical Research Council; Swedish Research Council for Health, Working Life and Welfare, Knut and Alice Wallenberg Foundation, AFA insurance, Swedish Brain Foundation, and the Swedish Association for Persons with Neurological Disabilities. This study makes use of data generated as part of the Wellcome Trust Case Control Consortium 2 project (085475/B/08/Z and 085475/Z/08/Z), including data from the British 1958 Birth Cohort DNA collection (funded by the Medical Research Council grant G0000934 and the Wellcome Trust grant 068545/Z/02) and the UK National Blood Service controls (funded by the Wellcome Trust). The study was supported by the Cambridge NIHR Biomedical Research Centre, UK Medical Research Council (G1100125), and the UK MS society (861/07); NIH/NINDS: R01 NS049477, NIH/NIAID: R01 AI059829, NIH/NIEHS: R01 ES0495103; Research Council of Norway grant 196776 and 240102; NINDS/NIH R01NS088155; Oslo MS association and the Norwegian MS Registry and Biobank and the Norwegian Bone Marrow Registry; Research Council KU Leuven, Research Foundation Flanders; AFM, AFM-Généthon, CIC, ARSEP, ANR-10-INBS-01 and ANR-10-IAIHU-06; Research Council KU Leuven, Research Foundation Flanders; Inserm ATIP-Avenir Fellowship and Connect-Talents Award; German Ministry for Education and Research, German Competence Network MS (BMBF KKNMS); and Dutch MS Research Foundation. TwinsUK is funded by the Wellcome Trust, Medical Research Council, European Union, the NIHR-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London. 
The recall process was supported by the NIHR Oxford Biomedical Research Centre Programme; Italian Foundation of Multiple Sclerosis (FISM grants, Special Project “Immunochip” 2011/R/1, 2015/R/10); NMSS (RG 4680A1/1); and the MultipleMS EU project. We acknowledge the Lundbeck Foundation and Benzon Foundation for support (THP). This research was supported by grants from the Danish Multiple Sclerosis Society, the Danish Council for Strategic Research [grant 2142-08-0039], Novartis, Biogen (Denmark) A/S, the Sofus Carl Emil Friis og Hustru Olga Doris Friis Foundation, and the Foundation for Research in Neurology. The Observatoire Français de la Sclérose en Plaques (OFSEP) is supported by a grant provided by the French government and handled by the Agence Nationale de la Recherche, within the framework of the Investments for the Future program, under the reference ANR-10-COHO-002, by the Eugène Devic EDMUS Foundation against multiple sclerosis, and by the ARSEP Foundation.

Competing interests: H.F.H. has received modest travel support and honoraria for advice or lecturing from Norwegian branches of Biogen Idec, Sanofi-Genzyme, Merck, Novartis, Roche, and Teva and a modest unrestricted research grant from Novartis, as well as research grants from the South Eastern Norwegian Health Authorities and the Research Council of Norway. D.H. has been a consultant or scientific advisory board (SAB) member for the following companies: Compass Therapeutics, EMD Serono, Genentech, Novartis Pharmaceuticals, Proclara Bioscience, Sanofi Genzyme, and Versant Venture. We believe none of these consulting activities or grants represent a conflict of interest for this work. M.S. received consulting fees or speaking honoraria from Biogen, Merck, Novartis, Sanofi, Teva, and Roche. K.L. is a paid consultant and Chair of the Genetics Advisory Panel of Biogen (Cambridge, USA). K.L. is a cofounder and co-owner of Intomics (Copenhagen, DK). K.L. has been a paid consultant to Merck (Boston, USA) and GoldFinch Bio (Cambridge, USA). H.W.
receives grant funding from the National Institutes of Health, National Multiple Sclerosis Society, Verily Life Sciences, EMD Serono, Biogen, Teva Pharmaceuticals, Sanofi, Novartis, Genentech, and Tilos Therapeutics as well as consulting fees from Genentech, Tiziana Life Sciences, IM Therapeutics, Magnolia Therapeutics, MedDay Pharmaceuticals, and vTv Therapeutics. P.d.B. owns stock in Vertex Pharmaceuticals. H.F.H. receives funds from the South Eastern Norwegian Health Authorities.

Data availability: The GWASs used in the discovery phase are available in the following repositories: (i) dbGAP, phs000275.v1.p1, phs000139.v1.p1, phs000294.v1.p1, and phs000171.v1.p1; and (ii) European Genome-phenome Archive database, EGAD00000000120, EGAD00000000022, and EGAD00000000021. The MS Chip and ImmunoChip data are available from the respective EGA accession nos.: EGAS00001003216 and EGAS00001003219. The ImmVar data are available in the Gene Expression Omnibus (GEO): GSE56035. The MS PBMC data are available in GEO: GSE16214. Human Gene Atlas: http://snpsea.readthedocs.io/en/latest/data.html#geneatlas2004-gct-gz. ImmGen: http://snpsea.readthedocs.io/en/latest/data.html#immgen2012-gct-gz. The brain-related data are available in Synapse: www.synapse.org/#!Synapse:syn2580853/wiki/409844. The list of putative associated MS genes is available for public access in GeNets (https://apps.broadinstitute.org/genets). A subset of the data are available for MS-related studies only because the parent studies’ consent does not permit deposition into a repository of genetic data: (i) The ANZGENE GWAS data are available by means of request to the ANZGENE Consortium; a direct request can be made through https://msra.org.au. The request should state the purpose of the study, its relation to MS, and a list of the data being requested.
(ii) Sharing of individual participant data was not included in the informed consent of the Rotterdam MS study because there is potential risk of revealing participants’ identities, and it is not possible to completely anonymize the data. The Rotterdam MS data are available by email to k.kreft@erasmusmc.nl. The request should state the purpose of the study, its relation to MS, and a list of the data being requested. (iii) The Rotterdam Study control data are available to interested researchers upon request. Requests can be directed to the study’s data manager Frank J. A. van Rooij (f.vanrooij@erasmusmc.nl). The following website contains more information about this cohort: www.ergo-onderzoek.nl/wp/contact. (iv) The genetic data from MS cases and controls recruited through the Kaiser Permanente Division of Research and the University of California, Berkeley, are available from the Institutional Data Access/Ethics Committee at the University of California, Berkeley, to researchers who meet the criteria for access to confidential data (contact R. Harris, rharris@berkeley.edu; please reference the manuscript title and corresponding author in your communication). Corresponding summary statistics for these three GWASs (ANZGENE, Rotterdam, and Berkeley) are available upon request.