In this study, we performed a pilot sequencing analysis aimed at identifying potential genetic risk factors for autism in a large pedigree, focusing on inherited mutations. We attempted multiple complementary analytical approaches, each of which identified one to a few candidate genes. We were not able to confirm specific disease-causing mutations with certainty, but we uncovered multiple rare mutations unique to the family, as well as several candidate genes that harbor suspected deleterious coding or non-coding mutations. Among them, based on prior literature, ANK3 is a highly plausible candidate gene that may increase the susceptibility to ASDs in this family. Given that autism is a complex neuropsychiatric disease, it is likely that multiple contributing variants in the family may increase susceptibility; therefore, even if a specific candidate gene does contribute to disease risk, we caution that a single candidate gene may not be entirely responsible (that is, necessary and sufficient) for the genetic risk of autism in this pedigree. Although our findings are restricted to this specific family, these new candidates can certainly be evaluated in future sequencing studies to establish their true relevance to autism susceptibility.
We applied a whole genome sequencing strategy to reveal specific genetic mutations that may confer susceptibility to ASDs in a single family, and these results can also be compared with exome sequencing studies on schizophrenia, ADHD, and other neurodevelopmental disorders. A recent study revealed that the de-novo mutation rate might play a major role in schizophrenia, and a large excess of non-synonymous changes was identified by whole exome sequencing of 53 sporadic cases, 22 unaffected controls, and their parents. In another study on schizophrenia, four of the 15 de-novo mutations identified in eight probands were nonsense mutations. In a previous small-scale exome sequencing study screening for attention deficit/hyperactivity disorder (ADHD) genes in a multiplex pedigree, multiple rare coding variants were identified but were not prioritized based on bioinformatics predictions. In comparison, our study specifically identified rare and family-specific variants rather than de-novo mutations.
We initially focused on inherited mutations that are likely to be recessive, an approach that shares some similarity with a very recent exome sequencing study on ASD families enriched for inherited causes due to consanguinity. Other studies have focused on sporadic mutations in families where the parents have been characterized as most likely ‘unaffected’ by autism [17–22], and several observations support the hypothesis that the genetic basis for ASDs in sporadic cases may differ from that seen in families with multiple affected individuals, with some of the former possibly more likely to result from de-novo mutation events than from inherited variants. As an approach complementary to ongoing exome sequencing studies aiming to detect de-novo mutations in ASDs [17–22], we specifically selected a multiplex family to test our ability to find inherited mutations that increase risk for ASDs.
In addition to finding inherited mutations, one unique aspect of our study is the use of whole-genome sequence data, which enabled us to perform exploratory analysis on non-coding variants. Given that candidate non-coding variants far outnumber coding variants, we had to apply highly stringent filtering criteria to focus on those that are most likely to be functionally relevant. These criteria include bioinformatics predictions based on evolutionary constraint, as well as experimental evidence from the ENCODE project. As our knowledge of non-coding variants and the bioinformatics approaches for analyzing them improve, we may be able to better interrogate the sequencing data to identify disease-causing non-coding variants.
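The stringent filtering of non-coding variants described above can be sketched as follows. This is a minimal illustration and not the pipeline used in this study: the record fields (`conservation`, `in_encode_element`, `population_freq`), the function name, and both thresholds are hypothetical and were chosen only to show how a constraint score, an experimental annotation, and rarity might be combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonCodingVariant:
    chrom: str
    pos: int
    conservation: float       # hypothetical constraint score; higher = more constrained
    in_encode_element: bool   # hypothetical flag: overlaps an experimentally annotated element
    population_freq: float    # hypothetical allele frequency in reference populations


def filter_noncoding(variants, min_conservation=2.0, max_freq=0.01):
    """Keep variants that are rare, constrained, and experimentally annotated.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    kept = []
    for v in variants:
        if v.population_freq > max_freq:
            continue  # too common to count as a rare variant
        if v.conservation < min_conservation:
            continue  # weak evidence of evolutionary constraint
        if not v.in_encode_element:
            continue  # no supporting experimental annotation
        kept.append(v)
    return kept
```

Requiring all three conditions at once is what makes such a filter stringent; relaxing any one of them would enlarge the candidate list considerably.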
We also need to emphasize that previous studies all used the Illumina platform, whereas our study used the CG platform, which represents a different type of sequencing technology and generates vastly different types of output files for downstream analysis. As the Illumina platform uses open data formats, a variety of academic and commercial tools have been developed to analyze data from the Illumina sequencers and improve variant calls; in comparison, the CG platform takes a proprietary, ‘black-box’ approach, so that researchers generally have to rely on variant calls and associated quality scores provided by CG. A recent study comprehensively compared these two platforms and found that 12% of the called variants are discordant between platforms, yet >60% of these discordant variants were indeed present in the genome based on Sanger validation. Another recently published study compared data from the 1000 Genomes Project and Complete Genomics, and demonstrated that 19% of the single nucleotide variants (SNVs) reported from common genomes are unique to one dataset. Therefore, current sequencing studies on neuropsychiatric diseases, including ours, may all suffer significantly from false-negative variant calls, and may miss a portion of disease-causing variants. Combining data from orthogonal platforms may partially reduce this problem, although this will result in higher sequencing and analytical costs.
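To make the notion of cross-platform discordance concrete, the sketch below compares two call sets for the same genome. It is an illustration only: the function name and the representation of a variant as a `(chrom, pos, ref, alt)` tuple are assumptions, and the denominator used here (the union of both call sets) is one possible convention that may differ from the definitions used in the cited comparisons.

```python
def discordance(calls_a, calls_b):
    """Return (fraction discordant, calls unique to A, calls unique to B).

    Each call is assumed to be a hashable (chrom, pos, ref, alt) tuple.
    The fraction is computed over the union of both call sets.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 0.0, set(), set()
    only_a = a - b  # called on platform A but not on platform B
    only_b = b - a  # called on platform B but not on platform A
    return (len(only_a) + len(only_b)) / len(union), only_a, only_b
```

Variants unique to one platform are candidates for either a false positive on that platform or a false negative on the other, which is why orthogonal validation is needed to decide between the two.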
In the current study, we first made the assumption that the ASD in the pedigree might be caused by just a handful of mutations with high penetrance, and under such a model we were able to identify a list of possible candidate genes. However, in practice, there may be a spectrum of diseases manifesting in each individual, with an as-yet-unknown balance of oligogenic and polygenic modes of inheritance. The approaches that we used were therefore somewhat ad hoc, and we were unable to generate statistical support for these candidate genes. Indeed, the appropriate statistical threshold to determine functional relevance, in the context of prior biological knowledge, is not well developed. In summary, our study represents one of the first examples demonstrating the feasibility of whole genome sequencing of familial samples and of analyzing inherited mutations in ASDs. Ultimately, we believe that studies focusing on de-novo or inherited mutations can complement each other and reveal a more comprehensive picture of susceptibility to ASDs, once sufficient sample sizes have been reached by the community.