Fmrp targets or not: long, highly brain-expressed genes tend to be implicated in autism and brain disorders
© Ouwenga and Dougherty; licensee BioMed Central. 2015
Received: 21 August 2014
Accepted: 5 February 2015
Published: 11 March 2015
Many studies have demonstrated a robust statistical overlap between genes whose transcripts are reported as Fragile X Mental Retardation Protein (Fmrp)-binding targets and genes implicated in various psychiatric disorders, including autism. However, it is not clear how to interpret this overlap as the Fmrp protein itself is not considered to be central to all instances of these conditions.
We tested whether Fmrp binding may be a proxy for some other features of these transcripts. Reviewing recent literature on the cross-linking and immunoprecipitation (CLIP)-derived targets of Fmrp in the brain, and the literature on identifying genes thought to mediate autism and other psychiatric disorders, reveals that both appear to be disproportionately made up of highly brain-expressed genes. This suggests a parsimonious explanation—that the overlap between Fmrp targets and neuropsychiatric candidate genes might be secondary to simple features such as transcript length and robust expression in the brain. Indeed, reanalyzing Fmrp high-throughput sequencing of RNAs isolated by CLIP (HITS-CLIP) data suggests that approximately 60% of CLIP tag depth can be predicted by gene expression, coding sequence length, and transcript length. Furthermore, there is a statistically significant overlap between autism candidate genes and random samples of long, highly brain-expressed genes, whether they are Fmrp targets or not.
Comparison of known Fmrp-binding targets to candidate gene lists should be informed by both of these features.
Keywords: FMRP interactome Autism Genome-wide association
In 2011, Darnell et al. published a study on the Fragile X Mental Retardation Protein (Fmrp) that demonstrated, through brute-force biochemistry and elegant informatics, a fundamental role for Fmrp in the stalling of ribosomes in the brain. Included was a table of the RNAs identified as bound to Fmrp. While the authors were careful to note that their analysis likely “…underestimates the true number of Fmrp-regulated mRNAs,” this table has gradually come to be taken as the de facto Fmrp regulon, that is, the comprehensive set of transcripts regulated by Fmrp. Since then, it has become common in the psychiatric genetics literature to examine the intersection between the risk genes of a disorder and these Fmrp targets, often demonstrating a significant overlap between the two (for example, [2-8]). However, while statistically significant, these results are difficult to interpret. Does this mean the Fmrp protein is central to all of these diseases and processes? Or is Fmrp binding serving as a proxy for some other features of the genes that may parsimoniously explain their contribution to genetic risk? Here, we test a simple alternative explanation for these Fmrp-related findings: both these Fmrp targets and genes that moderate neurocognitive traits contain a disproportionate number of long and highly brain-expressed genes.
To test whether this shared bias might sometimes lead to a statistical overlap between Fmrp targets and trait-associated genes in the brain, we conducted a simple experiment examining the overlap between the reported Fmrp targets and a set of genes involved in a neurogenetic trait unrelated to Fragile X Syndrome or psychiatric disorder. It has been recognized that, since body weight tracks with consumptive behaviors, obesity is strongly influenced by genes that are expressed in the brain. Thus, we tested the statistical overlap between reported genes for “obesity-related traits” and the reported Fmrp targets and saw a statistical enrichment of a magnitude not too different from that sometimes seen in the literature for psychiatric disorder (P < 0.005). Thus, the results of our experiment are consistent with the explanation that genes mediating any neurogenetic trait may show overlap with the reported Fmrp targets simply because both sets overlap with the set of genes highly expressed in neural tissue. Indeed, genes reported for several other neurocognitive traits also overlap Fmrp targets with nominal (P < 0.05) significance, for example, “hippocampal atrophy” (P < 0.009), “Alzheimer’s disease” (P < 0.022), and “cognitive performance” (P < 0.012).
A linear model based on transcript expression and length predicts a substantial proportion of Fmrp HITS-CLIP data
Gene sets: Fmrp count >1 (n = 7,207 genes); Fmrp count >16 (n = 1,228 genes)
Model: Fmrp count ~ transcript abundance; p < 2.2e-16
Model: Fmrp count ~ Cds length; p < 2.2e-16
Model: Fmrp count ~ transcript length; p < 2.2e-16
Model: Fmrp count ~ abundance + length (either); p < 2.2e-16
Model: Fmrp count ~ abundance + Cds length + transcript length; p < 2.2e-16
In either case, we found that random samples of abundant transcripts (Figure 1C) or of transcripts with long Cds (Figure 2C) overlapped significantly with the SFARIdb and rDNV genes, though not to the extent of the Fmrp targets. To make sure our results were not particular to a single sampling, we sampled 1,000 such gene lists. Most overlapped significantly with the autism candidates (Figures 1D and 2D), though again not as significantly as the reported Fmrp targets. Very similar results can be seen by comparing a contingency table overlapping the Fmrp targets and the rDNV genes relative to all brain-expressed genes (P < 6.02e-12) with the contingency table calculated for just the genes with expression in the top quantile (P < 0.0002): gene expression level alone accounts for some of the overlap between rDNV genes and Fmrp targets, but not all of it.
We have shown that the overlap between the reported Fmrp targets and at least two autism candidate gene lists can be reproduced by simply selecting for similarly long and highly expressed genes in the brain. This is consistent with long and highly brain-expressed genes also being more likely to be under selective constraint or to contain critical exons, and it provides a straightforward explanation for why the Fmrp target list overlaps so frequently with sets of genes implicated in psychiatric disease by genetic studies. Similar features may also explain why the Fmrp target list frequently overlaps the results of brain transcriptomic studies. This parsimonious explanation thus obviates complex hypotheses that require the Fmrp protein itself to be involved in the mechanism for many diverse disorders or different forms of ASD. Of course, mutations in Fmrp clearly do still cause Fragile X Syndrome, the most common form of monogenic ASD, and thus continued research into this protein remains important for that reason alone.
This model also provides reasonable explanations for two other puzzles about the Fmrp targets. First, it could explain why studies of the Fmrp targets in HEK cells are less concordant with other studies and why HEK cell data overlap marginally, if at all, with psychiatric disorder candidate gene lists. The HEK cell data should be biased towards genes that are long and highly expressed in HEK cells, which will likely include few neural-specific transcripts. Second, this model might explain why identifying strong cis motifs or other features in the RNA that might mediate Fmrp binding has proven challenging. Efforts to model the affinity of Fmrp for particular mRNAs will likely be aided by first removing the variance in the Fmrp CLIP data that can be explained by transcript length, Cds length, and transcript abundance. The authors of  suspected a bias towards highly expressed genes, but recognized that the data were not available at that time to adjust for it, particularly if the level of Fmrp varies substantially across cell types in the brain. Thus, the definition of the Fmrp targets can probably now be revisited both with greater sensitivity and with models incorporating these covarying factors, to identify additional features of the transcripts that account for the remaining variance in Fmrp binding.
In the end, it may well be that these studies find that Fmrp does bind preferentially to those transcripts whose protein levels most require precise regulation for normal CNS function. It is not unreasonable that this set of genes would also be vulnerable to haploinsufficiency [8,9] and of course be expressed highly in the brain. And a set of genes needing more precise regulation may indeed be selected by evolution to be longer (that is, allowing more potential sites for regulatory motifs). Thus, Fmrp binding may have been serving as a useful proxy for these other features. However, in the interim, we have provided a table (Additional file 4: Table S1) with precalculated weightings for length, expression, or length and expression for measurably brain-expressed genes. This can be used for drawing random samples for comparison to candidate gene lists, to help determine whether the candidate list is enriched in Fmrp targets specifically and/or long, highly brain-expressed transcripts generally.
Comparisons to GWAS and GTEX
Eight hundred forty-two Fmrp targets were identified from Supplemental Table 2 of . Genes associated with cognitive traits were downloaded from the NHGRI GWAS Catalog. Highly expressed genes in the brain were defined as the 842 genes with the highest average RPKM across all brain samples in the genotype-tissue expression (GTEX) collection (1/31/13 data release, summarized to genes, all brain samples averaged). Statistical overlap was calculated in R using Fisher’s exact test (right-side probability) with a genome size of 20,000.
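The overlap test described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study: it computes the right-sided Fisher's exact probability as a hypergeometric tail. The function name `overlap_pvalue` and its argument names are hypothetical, and only the default genome size of 20,000 comes from the text.

```python
from math import comb


def overlap_pvalue(n_overlap, n_list_a, n_list_b, genome_size=20000):
    """Right-sided Fisher's exact P: chance of an overlap >= n_overlap.

    Treats list B as a random draw of n_list_b genes from a genome of
    genome_size genes, of which n_list_a belong to list A.
    """
    max_overlap = min(n_list_a, n_list_b)
    # Number of ways to draw list B with exactly k genes from list A,
    # summed over every overlap at least as large as the observed one.
    tail = sum(
        comb(n_list_a, k) * comb(genome_size - n_list_a, n_list_b - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    # Python integers are exact, so the ratio is formed without overflow.
    return tail / comb(genome_size, n_list_b)
```

A call such as `overlap_pvalue(30, 842, 200)` (counts invented purely to show the call shape) returns the probability of seeing 30 or more shared genes by chance. Changing `genome_size` changes the answer, which is why the sampling analyses later use the smaller brain-expressed background instead.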
All experiments involving mice were approved by the Washington University Animal Studies Committee. For each replicate, cortical dissections were performed on three C57BL/6 male mice 21 days post birth. Tissue was homogenized in standard homogenization buffer (10 μL/mL pH 7.5 tris-Cl (Invitrogen 15567–027), containing 0.25 M sucrose (IBI IB37160), 1 μL/mL RNasin (Promega N251B), SuperRNasin (Ambion AM2696), protease inhibitor cocktail at 1 tablet per 10 mL (Roche 04693132001), 1 mM tetrodotoxin citrate (Tocris Bioscience 1069), and 0.5 mM DL-dithiothreitol (646563-10X)), and the homogenate was centrifuged for 10 min at 1,000 rcf. The supernatant was then treated with 10X lysis buffer (10% IGEPAL (Sigma I8896-50ML), 300 mM DHPC (Avanti 850306P), 100 mM HEPES (Sigma H0887), 1.5 M KCl (Ambion AM9640G), and 50 mM MgCl2 (Ambion AM9530G)) for 10 min and centrifuged again for 15 min at 20,000 rcf. RNA was collected from 60 μL of supernatant using a QIAGEN RNeasy Mini Kit (74106) with 2-mercaptoethanol (Sigma M7522) and DNase treatment (Qiagen 79254). Sequencing libraries were amplified (21 cycles) using the Nugen Amplification Kit Ovation® RNA-Seq System V2 (7102). Standard Illumina adapter ligation, library preparation, and sequencing were performed on an Illumina HiSeq by the Genome Technology Access Center at Washington University in St. Louis. Resulting reads were trimmed for quality and contaminating adapters. Possible rRNA contamination was filtered out by aligning with Bowtie2 to rRNA sequences from GenBank, ENSEMBL, and UCSC’s RepeatMasker track. Remaining sequences were then mapped to the Ensembl 75 mouse genome. Counts per million reads (CPM) for each gene were quantified using HTSeq. Data represent the average of three biological replicates.
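For readers unfamiliar with the CPM scale, the normalization and replicate averaging described above can be sketched as below. This is an illustrative Python sketch, not the pipeline used in the study. The function names, the gene labels, and the pseudocount added before taking log2 are all assumptions; the text does not state how logCPM was computed.

```python
from math import log2


def counts_per_million(counts):
    """Scale raw per-gene read counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * count / total for gene, count in counts.items()}


def mean_log_cpm(replicates, pseudocount=1.0):
    """Average CPM across replicates, then take log2 (pseudocount is an assumption)."""
    cpms = [counts_per_million(rep) for rep in replicates]
    genes = cpms[0].keys()
    return {
        gene: log2(sum(cpm[gene] for cpm in cpms) / len(cpms) + pseudocount)
        for gene in genes
    }
```

Each replicate is passed as a dictionary of raw counts keyed by gene, for example `mean_log_cpm([{"geneA": 10, "geneB": 90}, {"geneA": 20, "geneB": 80}])` with made-up counts.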
Comparisons to length and expression
We then intersected these data with Supplemental Table 2C of Darnell et al. for all genes with a matching gene symbol and extracted the Fmrp high-throughput sequencing of RNAs isolated by cross-linking immunoprecipitation (HITS-CLIP) tag count (Fmrp.sum), Cds length, and transcript length. For sampling analyses, we used all genes with measurable expression in the brain (logCPM > 2 in RNAseq data), as only brain-expressed genes could have been captured by a brain HITS-CLIP experiment. We further filtered to keep only those genes with annotated Cds and transcript lengths (final effective genome size = 9,544). All variables were converted to log2 scale for normality prior to correlation and linear regression, and for these analyses genes with <1 CLIP tag were excluded (final gene number, 7,207). A spreadsheet aggregating all of these variables is provided (Additional file 4: Table S1).
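The regression step above can be sketched in Python as a small ordinary least-squares fit that reports adjusted R squared. This is a stand-in for the linear models the study fitted in R, not the authors' code. The function names are hypothetical, and the inputs are assumed to be already log2-transformed, as the text describes.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(y, predictors):
    """Fit y ~ intercept + predictors; return (coefficients, adjusted R^2).

    y is a list of responses (e.g. log2 CLIP tag counts); predictors is a
    list of equally long columns (e.g. log2 abundance, log2 Cds length).
    """
    n, p = len(y), len(predictors)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p + 1)] for j in range(p + 1)]
    xty = [sum(r[j] * y[i] for i, r in enumerate(rows)) for j in range(p + 1)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, adj_r2
```

The residuals from such a fit (observed minus fitted values) are the part of the CLIP signal that expression and length do not explain, and they are the natural input for any later search for sequence features. Solving the normal equations directly is adequate for a handful of predictors; a QR-based solver would be more robust for ill-conditioned designs.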
Candidate gene lists
For the SFARIdb analysis, we used the list of all unique genes (gene-score table, as downloaded on 8/7/14); rDNV genes are from Supplemental Table seven from , the dnv_LGDs_prb column.
Sampling random gene sets
To generate sets of 716 random genes with the same distribution of expression as the 716 Fmrp genes surviving the filters above, we computed a kernel density estimate on the logCPM of these genes (function ‘density’ in R) as well as a kernel density estimate on all 9,544 genes, and used the ratio of the two to assign each of the 9,544 genes a sampling probability based on its expression level. A similar sampling was done using a kernel density estimate of the Cds length, or a 2D kernel density estimate (function ‘kde2d’) of both length and expression. Fisher’s tests for overlap with the sampled lists were calculated as above, with a genome size of 9,544. For !Fmrp lists, sampling was conducted on the 8,828 non-Fmrp genes, either with the same probabilities as above or by taking those of the 8,828 with the highest probabilities (Top !Fmrp). All sampling was without replacement.
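The density-ratio sampling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the R code used in the study. The bandwidth is passed explicitly rather than chosen by R's default rule, the function names are hypothetical, and the draw without replacement uses the Efraimidis-Spirakis key method as a convenient stand-in for sequential weighted sampling.

```python
import math
import random


def gaussian_kde(values, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = len(values) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return density


def sampling_weights(target_values, background_values, bandwidth):
    """Weight each background gene by target density / background density."""
    target_density = gaussian_kde(target_values, bandwidth)
    background_density = gaussian_kde(background_values, bandwidth)
    # The background density is never zero at a background point, because
    # each point contributes to its own kernel sum.
    return [target_density(x) / background_density(x) for x in background_values]


def weighted_sample_without_replacement(items, weights, k, rng):
    """Draw k distinct items, favoring large weights (Efraimidis-Spirakis keys)."""
    keyed = []
    for item, w in zip(items, weights):
        if w > 0:
            # Larger weights push the key u ** (1 / w) towards 1.
            keyed.append((rng.random() ** (1.0 / w), item))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]
```

Here `target_values` would be the logCPM (or log2 Cds length) of the Fmrp targets and `background_values` the same variable for every gene eligible for sampling. Repeating the draw with a seeded `random.Random` instance gives the many matched gene lists whose overlaps are then tested.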
Fmrp: fragile X mental retardation protein
CLIP: cross-linking and immunoprecipitation
SFARI: Simons Foundation Autism Research Initiative
CPM: counts per million reads
GWAS: genome-wide association study
Thanks to R. Sears, M. Rieger, and D. O’Brien for helpful discussions on this topic. J.D.D. is supported by the NIH (R21MH099798, DA038458-01, and R01MH100027) and the Children’s Discovery Institute of Washington University (MDII2013269), and R.O. by an NIH training grant (2 T32 GM081739). The Genome Technology Access Center is supported by P30 CA91842 and by ICTS/CTSA Grant UL1TR000448 from the NIH.
- Darnell JC, Van Driesche SJ, Zhang C, Hung KY, Mele A, Fraser CE, et al. FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell. 2011;146(2):247–61.
- Devys D, Lutz Y, Rouyer N, Bellocq JP, Mandel JL. The Fmr-1 protein is cytoplasmic, most abundant in neurons and appears normal in carriers of a fragile X premutation. Nat Genet. 1993;4(4):335–40.
- Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580–5.
- Yeo GSH, Heisler LK. Unraveling the brain regulation of appetite: lessons from genetics. Nat Neurosci. 2012;15(10):1343–9.
- Suhl JA, Chopra P, Anderson BR, Bassell GJ, Warren ST. Analysis of FMRP mRNA target datasets reveals highly associated mRNAs mediated by G-quadruplex structures formed via clustered WGGA sequences. Hum Mol Genet. 2014;23(20):5479–91.
- Iossifov I, O’Roak BJ, Sanders SJ, Ronemus M, Krumm N, Levy D, et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature. 2014;515(7526):216–21.
- Shohat S, Shifman S. Bias towards large genes in autism. Nature. 2014;512(7512):E1–2.
- Samocha KE, Robinson EB, Sanders SJ, Stevens C, Sabo A, McGrath LM, et al. A framework for the interpretation of de novo mutation in human disease. Nat Genet. 2014;46(9):944–50.
- Uddin M, Tammimies K, Pellecchia G, Alipanahi B, Hui PZ, Wang ZZ, et al. Brain-expressed exons under purifying selection are enriched for de novo mutations in autism spectrum disorder. Nat Genet. 2014;46(7):742–7.
- Ascano Jr M, Mukherjee N, Bandaru P, Miller JB, Nusbaum JD, Corcoran DL, et al. FMRP targets distinct mRNA sequence elements to regulate protein expression. Nature. 2012;492(7429):382–6.
- Purcell SM, Moran JL, Fromer M, Ruderfer D, Solovieff N, Roussos P, et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature. 2014;506(7487):185–90.
- Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42(D1):D1001–6.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.