Behavioral signatures related to genetic disorders in autism

Background: Autism spectrum disorder (ASD) is well recognized to be genetically heterogeneous. It is assumed that the genetic risk factors give rise to a broad spectrum of indistinguishable behavioral presentations.
Methods: We tested this assumption by analyzing the Autism Diagnostic Interview-Revised (ADI-R) symptom profiles in samples comprising six genetic disorders that carry an increased risk for ASD (22q11.2 deletion, Down’s syndrome, Prader-Willi, supernumerary marker chromosome 15, tuberous sclerosis complex and Klinefelter syndrome; total n = 322 cases, groups ranging in sample sizes from 21 to 90 cases). We mined the data to test the existence and specificity of ADI-R profiles using a multiclass extension of support vector machine (SVM) learning. We subsequently applied the SVM genetic disorder algorithm on idiopathic ASD profiles from the Autism Genetics Resource Exchange (AGRE).
Results: Genetic disorders were associated with behavioral specificity, indicated by the accuracy and certainty of SVM predictions; one-by-one genetic disorder stratifications were highly accurate, leading to 63% accuracy of correct genotype prediction when all six genetic disorder groups were analyzed simultaneously. Application of the SVM algorithm to AGRE cases indicated that the algorithm could detect similarity of genetic behavioral signatures in idiopathic ASD subjects. Also, affected sib pairs in the AGRE were behaviorally more similar when they had been allocated to the same genetic disorder group.
Conclusions: Our findings provide evidence for genotype-phenotype correlations in relation to autistic symptomatology. SVM algorithms may be used to stratify idiopathic cases of ASD according to behavioral signature patterns associated with genetic disorders. Together, the results suggest a new approach for disentangling the heterogeneity of ASD.


Background
Autism spectrum disorder (ASD) is a behaviorally defined syndrome characterized by variable abnormalities in social interactions and communication, in association with restricted interest patterns and unusual stereotyped behaviors. There has been a concerted effort over the last 20 years to identify causal genetic risk factors and as a result, an increasing number of rare, highly penetrant genetic variants are being implicated [1]. When present, these rare variants are thought to account for a large proportion of an individual's genetic liability to the condition. Currently, specific genetic etiologies, including rare single nucleotide and copy number variants (CNVs) as well as larger chromosomal variations, can be identified in around 15 to 20% of patients [2][3][4][5]. These findings highlight the complexity of the genetic architecture and heterogeneity of ASD and indicate that by using standard case-control designs, extremely large sample sizes will be required to unravel the heterogeneity and map the dysregulated signaling pathways involved in the pathophysiology of ASD [4][6][7][8][9].
The variability in phenotypic expression of autism observed in monozygotic twin pairs, coupled with the evidence from molecular genetic studies supporting a polygenic multi-factorial liability model, has led to the recognition that the many genetic risk factors for autism give rise to a broad spectrum of behavioral presentations and hence the concept of autism as a spectrum disorder. The adoption of this model has led to an implicit assumption that specific genotype-phenotype correlations are unlikely to exist. However, there is evidence that ASD symptoms may be dissociable at the genetic level. Different genetic linkage regions have been obtained for social interaction and repetitive behavioral domains in ASD patients [10], and distinct developmental trajectories of social and repetitive behavior exist in the ASD population [11]. Moreover, in recent years, a growing interest has developed in the possibility that particular genetic disorders may give rise to characteristic patterns of autistic symptomatology. This interest is based on the assumption that perturbations in associated pathophysiological pathways would lead to relatively constrained and more specific phenotypic outcomes [12]. Indeed, a number of recent studies, involving a variety of genetic conditions including 16p11.2 and 7q11.23 CNVs, Williams syndrome, fragile X syndrome and neurofibromatosis, have indicated the existence of genetic disorder-specific behavioral profiles that encourage further efforts in this direction [4][13][14][15][16]. Building on these findings, we postulated that well-defined genetic conditions could give rise to relatively distinct patterns of autistic symptomatology. The designation of these patterns may be relevant to dissect ASD heterogeneity as other risk factors that perturb converging pathophysiological pathways, for example related to the genetic conditions, might lead to similar patterns of autistic symptomatology.
In the present study, we have undertaken a proof of concept study to determine if these genotype-phenotype correlations exist and whether they could be useful to disentangle the heterogeneity of ASD and complement future genetic studies. Support vector machine (SVM) learning was used to analyze 'signatures' of autistic symptomatology in six genetic developmental disorders associated with an increased risk for ASD [17][18][19][20]. Based on the premise that other risk factors which dysregulate the same pathways may give rise to similar 'signature' patterns of behavior, we aimed to apply the SVM algorithms derived from genetic disorders to cases of idiopathic ASD. Finally, we investigated whether the SVM algorithm would detect enhanced behavioral similarity in affected sib pairs from the Autism Genetics Resource Exchange (AGRE) multiplex families. Figure 1 provides an overview of the different steps involved in the study.

Subjects
The six genetic disorders we included in the study were: 22q11.2 deletion syndrome (22q11DS), Down's syndrome (DS) [21], Prader-Willi syndrome (PWS), supernumerary marker chromosome 15 (SMC15), tuberous sclerosis complex (TSC) and Klinefelter syndrome (XXY); total n = 322 cases, groups ranging in sample size from 21 to 90 cases. Cases were recruited through patient associations/charities or centers for clinical genetics or pediatrics as part of a collaborative effort between the Department of Psychiatry of the University Medical Centre in Utrecht in the Netherlands and the Institute of Psychiatry, King's College London in the UK. Appropriate local ethical board approval was obtained (Medical Research Ethics Committee, METC, of the University Medical Centre in Utrecht and the College Research Ethics Committee, CREC, in London). Informed consent for each participant in the cohorts was obtained and included the use of data for the analysis we carried out for this paper. The genetic disorders had been diagnosed through clinical genetic centers and confirmed by routine molecular and cytogenetic analysis. The total sample consisted of 322 verbal subjects. Each of the six genetic disorders has previously been shown to be associated with an increased risk of ASD [6][7][22][23][24][25]. The cases were drawn from studies that had originally been designed to elucidate the behavioral phenotypes associated with each of the six genetic disorders [22][23][24][25][26][27]. As far as possible, the samples were ascertained without reference to the presence of ASD. For more details on recruitment procedures and inclusion criteria for the genetic disorder subtypes please see previous publications [22][23][24][25][26]. All subjects were included in the analyses, regardless of the presence of an ASD diagnosis, in order to evaluate the widest range of symptom profiles.
However, for technical reasons concerning the measurement of ASD symptomatology, only verbal individuals were included in the analyses. Estimates of intellectual abilities were available for the majority of subjects (>80%) and had been assessed by different standardized measures according to age and ability level [28][29][30][31][32]. Table 1 shows the sample characteristics.
The AGRE database was used for the selection of idiopathic subjects (http://www.agre.org) [33,34]. AGRE cases were included in the analyses if they fulfilled Autism Diagnostic Interview-Revised (ADI-R) criteria for an ASD and complete ADI-R algorithm data were available (see criteria under Measures). All verbal simplex probands in the AGRE cohort with complete ADI-R algorithm data and scoring above the ASD threshold (n = 375) were assigned the label 'AGRE0'. Among the multiplex families we identified all verbal affected sib pairs. Within these affected pairs one sib was allocated to 'AGRE1' while the other was allocated to 'AGRE2'. Therefore, AGRE1 and AGRE2 consisted of those verbal subjects with ASD with at least one related verbal sibling with ASD (both n = 433).

Measures
Autism symptom variables were extracted from the ADI-R which was used to interview the parents of each subject [35]. The ADI-R is an established interview schedule for assessing autism diagnoses but may also be used to assess profiles of autistic symptomatology [36,37], and as phenotype variables in large genetic population studies of ASD [38][39][40][41]. The interview focuses on identifying key symptoms that characterize the syndrome [12,36,37]. A subset of 37 items from the ADI-R is used to create a diagnostic algorithm, which documents behaviors reported between the 4th and 5th birthday, regarded as the optimal window to detect ASD. As a consequence, the use of the diagnostic algorithm data minimizes the possible confound of age-related developmental effects on symptomatology. ADI-R items are scored as: 0, no ASD behavioral symptom present; 1, specified behavior definitely present but not clearly enough to warrant a code of 2; or 2, specified ASD symptom definitely present. In addition, for some items a code of 3 is given if the behavior impacts markedly on or disrupts family life. When computing the algorithm scores, a code of 3 is recoded as a 2. For this study, we used these algorithm scores, with a range of 0 to 2 instead of 0 to 3, to assign equal weight to all items entered in the analyses. Because certain symptoms of the communication impairments characterizing ASD can only be observed in verbal individuals, there are separate scores for verbal and non-verbal individuals. An overview of the description of the ADI-R items and the ADI-R domains of the algorithm is provided in Table 2.
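The recoding of item codes into algorithm scores described above can be illustrated with a minimal Python sketch. The function name and the example codes are hypothetical and serve only as illustration; the sketch handles only the codes 0 to 3 mentioned in the text and treats any other code as out of scope.

```python
def to_algorithm_score(code):
    """Map a raw item code (0-3) to the 0-2 algorithm range; 3 is recoded as 2."""
    if code not in (0, 1, 2, 3):
        # Codes other than 0-3 are not covered by this sketch.
        raise ValueError(f"unexpected item code: {code}")
    return min(code, 2)

# Hypothetical raw item codes for one subject (not study data).
raw_codes = [0, 1, 2, 3, 3, 1]
algorithm_scores = [to_algorithm_score(c) for c in raw_codes]
print(algorithm_scores)
```

Capping at 2 gives every item the same range, which is the property the analysis relies on when assigning equal weight to all items.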
The classification of an ASD in this study was based on ADI-R criteria used in genetic studies and the AGRE collection: ASD is diagnosed when scores in all domains are met or when scores are met in two core symptom domains, in addition to the 'age of onset' domain, but are one point away from meeting autism criteria in the one remaining core symptom domain [35,42]. Reliability of the ADI-R in a population with mild to moderate mental retardation has been established [43].
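The two-route classification rule above can be written out as a short Python sketch. The domain names and the cutoff values in `EXAMPLE_CUTOFFS` are assumptions made for illustration and are not taken from this paper; the function only encodes the logic stated in the text.

```python
CORE_DOMAINS = ("social", "communication", "repetitive")

# Assumed placeholder cutoffs for illustration only; not taken from the paper.
EXAMPLE_CUTOFFS = {"social": 10, "communication": 8, "repetitive": 3, "onset": 1}

def meets_asd_criteria(scores, cutoffs=EXAMPLE_CUTOFFS):
    """Return True if the domain scores satisfy either route of the rule."""
    met = {d: scores[d] >= cutoffs[d] for d in cutoffs}
    if all(met.values()):
        return True  # route 1: every domain cutoff is met
    if not met["onset"]:
        return False  # route 2 requires the age-of-onset domain
    missed = [d for d in CORE_DOMAINS if not met[d]]
    # route 2: exactly one core domain is missed, and by one point only
    return len(missed) == 1 and cutoffs[missed[0]] - scores[missed[0]] == 1

full = {"social": 12, "communication": 9, "repetitive": 4, "onset": 1}
near = dict(full, social=9)   # one point short in a single core domain
far = dict(full, social=8)    # two points short
print(meets_asd_criteria(full), meets_asd_criteria(near), meets_asd_criteria(far))
```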

Statistical analysis
Standard principal component analysis (PCA) of ADI-R item scores was used to investigate the extent of overlap between the symptom profiles of the different genetic groups.
Figure 1 Overview of the different steps undertaken in the study. Step 1: development of SVM classifier to assess the presence and strength of behavioral signatures among genetic syndromes. Step 2: application of the classifier derived in step 1 to AGRE samples to test if similarity in behavioral signatures can be detected among idiopathic ASD subjects. Step 3: application of the classifier derived in step 1 to sibling pairs with idiopathic ASD (AGRE) to test relative familiality of behavioral signatures derived from genetic syndromes. AGRE, Autism Genetics Resource Exchange; ASD, autism spectrum disorder; SVM, support vector machine.
The SVM method was used as a supervised learning method (incorporating the knowledge of the genotype) to classify genotype membership on the basis of ADI-R item scores. SVM is currently one of the most popular machine learning methods used in data mining, due to its firm theoretical foundation and strong performance in applications. A radial basis kernel function was used, with optimal gamma and cost parameter values determined in a nested n-fold or, equivalently, leave-one-out cross-validation (LOOCV) procedure, n being the number of observations in the sample. Each observation in turn was left out of the sample, and an SVM classifier was optimized and built on the remaining n − 1 observations. In this way, an independent assessment of correctness of the predicted class can be achieved for each observation in the sample, resulting in an independent estimate of the accuracy of SVM on the whole sample.
In each one of the remaining samples, the optimization with respect to the gamma and cost parameter was achieved by applying a second LOOCV procedure, in which each of these n − 1 observations in turn was left out of the sample and SVM models were fitted to the remaining n − 2 observations, using a grid of combinations of gamma and cost parameter values. In a similar fashion as described above, accuracy was determined for every combination of gamma and cost parameter values on the grid, and the optimal combination was taken to be the one giving the highest accuracy. Finally, an SVM model was fitted to the n − 1 observations remaining in the outer loop using these optimal values.
SVM by nature is a method for binary (two group) classification, so a multiclass (k classes) extension was used, based on the 'one-against-one' approach, in which k(k − 1)/2 binary classifiers are trained; the appropriate 'predicted' class is found by a voting scheme, choosing the most frequently assigned class by the binary classifiers. Thus, the class assigned by SVM is the one with the maximum votes from all one-versus-one (2-group) classifications, based on the decision values of the 2-group classifiers. These decision values can also, post hoc, be used to obtain a predicted probability for each class, which can be used as outcome parameters to evaluate the confidence of SVM predictions.
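The nested leave-one-out procedure described above can be sketched in a few lines of Python. To keep the sketch free of dependencies, a k-nearest-neighbour classifier stands in for the radial-basis SVM, and its single hyperparameter k stands in for the gamma and cost grid; the toy data and all function names are hypothetical.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Stand-in classifier: majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loocv_accuracy(data, k):
    """Leave each observation out in turn and score the prediction made without it."""
    hits = sum(knn_predict(data[:i] + data[i + 1:], x, k) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)

def nested_loocv(data, grid):
    """Outer loop estimates accuracy; inner loop tunes k on the remaining n - 1 points."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]                              # n - 1 observations
        best_k = max(grid, key=lambda k: loocv_accuracy(rest, k))   # inner LOOCV (n - 2 fits)
        hits += knn_predict(rest, x, best_k) == y                   # refit with tuned value
    return hits / len(data)

# Hypothetical toy data: two well-separated groups of two-item profiles.
toy = [((0.0, 0.1), "A"), ((0.2, 0.0), "A"), ((0.1, 0.3), "A"), ((0.3, 0.2), "A"),
       ((2.0, 2.1), "B"), ((2.2, 1.9), "B"), ((1.9, 2.3), "B"), ((2.1, 2.0), "B")]
accuracy = nested_loocv(toy, grid=[1, 3])
print(accuracy)
```

The point of the nesting is that the held-out observation never influences the choice of hyperparameters, so the outer accuracy remains an independent estimate.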
The software used was the libSVM program, implemented through the SVM function in the e1071 library in R [44].
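The one-against-one voting scheme can likewise be illustrated with a small Python sketch. The binary rule used here (nearest of two scalar centroids) is a hypothetical stand-in for a trained 2-group SVM, and the class names and centroid values are invented for illustration; the sketch does not reproduce libSVM's decision values or probability estimates.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, binary_predict):
    """Collect one vote per class pair and return the most frequently chosen class."""
    votes = Counter(binary_predict(a, b, x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0], votes

# Hypothetical scalar centroids for six classes (k = 6 gives 15 pairwise classifiers).
centroids = {"c1": 0.0, "c2": 1.0, "c3": 2.0, "c4": 3.0, "c5": 4.0, "c6": 5.0}

def nearest_of_two(a, b, x):
    """Toy 2-group classifier: pick whichever of the two centroids is closer to x."""
    return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b

predicted, votes = one_vs_one_predict(2.2, list(centroids), nearest_of_two)
print(predicted, sum(votes.values()))
```

With six classes the scheme trains 6 × 5 / 2 = 15 binary classifiers, and a class can collect at most five votes, one from each pairing in which it takes part.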

Identification of behavioral signatures relating to each genetic disorder
As a starting point, we explored the distribution of autism symptom profiles in the genetic disorder sample by PCA. The PCA plot showed that, on average, some genetic disorder profiles were overlapping where others were more clearly separable (Figure 2). This picture indicated that unsupervised statistical analysis was not sufficiently sensitive to optimally distinguish genetic disorder-related profiles. This notion was confirmed following cluster analysis (k-means clustering) of the ADI-R data in the genetic disorder sample, which did not identify any relevant clusters (data not shown).
To perform a more sophisticated pattern analysis, we turned to machine learning analysis. We used SVM as a supervised learning method to investigate genotype-phenotype relationships between the six genetic disorders and the item scores from the ADI-R algorithm. The essential difference with the unsupervised PCA or clustering analysis used above is that the SVM approach incorporates the knowledge of the genotype in the analysis. The SVM allocations to genetic disorder groups occurred in two steps. First, the SVM analyzed 2-group, 'one-against-one' comparisons. Subsequently, the multiclass extension was used to select the most appropriate 'predicted' genetic disorder class for each subject on the basis of the most frequently assigned class by the binary classifiers. The binary one-by-one comparisons showed high accuracies of up to 97% of correct genetic group allocations (Table 3). As a result, a total of 63% of cases was correctly allocated by the multiclass comparison using the LOOCV method, whereas random prediction (without prior knowledge of genetic group) would have resulted in 21% accuracy (Table 4). Interestingly, in all groups apart from DS, the averages of the post-hoc predicted probabilities were highest for the corresponding genetic disorder class, indicating that the SVM algorithm was able to predict correct disorder classes with a high degree of confidence (Table 4).
To further evaluate the validity of the prediction model, we investigated the correlation between the predicted probabilities and the proportion of cases correctly assigned to each genetic group, based on LOOCV output. This tests the expectation of the model that higher probabilities reflect greater confidence in prediction, as shown by increasing 'correctness' in classification. We observed a significant correlation (P = 0.002) between the predicted probabilities and the likelihood of correct classification, which provides support for the robustness of the model and encouraged us to test the classifier in further samples.
We were interested to identify which behaviors contributed most to the predictions by SVM. Therefore, the importance (weight) of each of the ADI-R items to the SVM classifier was extracted. The result of this analysis showed that four of the top five most influential items pertained to ASD symptoms that related to the quality of social interaction (Table 5). By contrast, the five least influential items were more concerned with aberrant communication and repetitive behaviors.
It was notable that the predicted probabilities in SMC15 cases were also relatively high for prediction to the PWS group. This seemed plausible, as both disorders are associated with differences in the 'dosage' of genes located in chromosome 15q11-13. By contrast, SMC15 could be clearly discriminated from 22q11DS by SVM, which corresponded with a lack of overlap in the PCA between these two groups (Figure 2). Interestingly, SMC15 and 22q11DS are both characterized by low average intelligence, suggesting that the behavioral differences are independent of general intellectual ability. To rule out the influence of IQ on prediction accuracy, we re-analyzed the data, including IQ as an additional predictor. The average accuracy of the SVM predictions was essentially unchanged (63.0% versus 62.5%), indicating that IQ was not a confounding factor. The poor prediction for the DS group was due to a frequent misallocation to the PWS group; 17 of the DS cases were being incorrectly assigned to the PWS group. Indeed, an overlap between DS and PWS groups was also apparent in the PCA of the symptom profiles (Figure 2).
We also tested the accuracy of SVM class assignment among the subset of individuals who scored above the ADI-R threshold for ASD (n = 123). This resulted in similar assignment accuracies and predicted probabilities (data not shown). In subsequent analyses we used the algorithm derived from all patients from our genetic disorder samples, irrespective of whether they met formal criteria for ASD diagnosis, since from a clinical perspective, we also wanted to include the profiles of subjects who scored below ADI-R thresholds for ASD.

Testing the SVM classification algorithm in idiopathic ASD
Next, we considered whether the genetic disorder algorithm could detect a degree of similarity in patterns of autistic behavior in a sample of 'idiopathic' cases. To test this hypothesis, we applied the algorithm to ADI-R data obtained from the AGRE dataset. It should be noted that the AGRE sample functioned as a 'blind' sample in this context, as we could not validate the outcome with genetic labels. Therefore, we performed analyses to indicate whether the algorithm would detect meaningful associations or whether these would not differ from random associations, that is, associations not informed by genetic disorder labels. Thus, we generated randomly permuted ADI-R item data from the AGRE0 dataset and compared the distribution of predicted probabilities in the real data (AGRE0 and genetic disorder sample) with that in the randomly generated data. The probabilities differed significantly between these groups. As expected, the highest predicted probabilities were observed among the genetic disorder cases, and the lowest probabilities were observed in the randomly generated AGRE subsample. There was a significant difference between the genetic groups and AGRE0 (P = 0.0024), between the genetic groups and random data (P <0.001) and between AGRE0 and random data (Figure 3). Most importantly, the probabilities in AGRE0 were significantly higher than those in the randomly configured data (P <0.001). This indicated that the algorithm derived from the genetic disorders detected non-random pattern information. Subsequently, we applied the genetic disorder classifier to the AGRE0 sample to analyze the distribution of genetic disorder allocations in the blind AGRE subsamples. The genetic disorder algorithm assigned the highest probabilities and most cases to the TSC group and the lowest probabilities and fewest cases to the DS and PWS groups.
We observed a similar distribution of SVM predicted probabilities in the AGRE1 and AGRE2 samples, essentially replicating the result obtained for AGRE0. Again, TSC was by far the most commonly assigned class, whereas DS and PWS were the least frequently assigned classes. The predicted probabilities and group predictions for AGRE0, AGRE1 and AGRE2 are summarized in Table 6. It should be noted that these predictions were achieved by forcing all individuals into one of the six categories, which means that frequent allocation should be interpreted as indicative of relative phenotype similarity. As such, the application of the genetic disorder classifier to AGRE samples seemed to indicate enhanced relative similarity of AGRE profiles to the TSC group. To support this notion, we plotted the AGRE0 ADI-R profiles in the PCA plot of the genetic disorder sample, which confirmed that, on average, the TSC group displayed most similarity to AGRE0 (Figure 4). In addition, the 22q11DS, SMC15 and XXY groups also displayed some closeness to AGRE0, which also seems to be reflected in their occasional allocation by the genetic disorder classifier. We contrasted these predictions in the AGRE sample with random predictions; we generated SVM models by randomly permuting the six labels relating to the genetic disorders. Thus, random genetic labels were linked to the existing symptom profiles, thereby destroying the original relationship between ADI-R score profiles and the genetic groups. By analyzing the allocations arising from these random classifier algorithms, we could check which distribution of allocation would arise by chance, that is, not informed by existing genetic disorder profiles. We repeated this exercise 1,000 times in order to gain robust results. The results showed that most cases were assigned to the 22q11DS and PWS groups. This result was most likely due to the fact that these disorders were the two largest groups in the genetic disorder sample. It should be noted that this result was strikingly different from the allocation in AGRE by the randomly permuted genetic labels.
Together, these analyses on blind AGRE samples indicated that the algorithm of the genetic disorder sample could detect an extent of relative similarity in ADI-R profile patterns among idiopathic subjects.
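The label-permutation check used in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions: a nearest-centroid rule stands in for the SVM, and the profiles, labels and function names are hypothetical toy values rather than study data.

```python
import random
from collections import Counter

def fit_centroids(profiles, labels):
    """Stand-in for classifier training: mean profile per label."""
    sums, counts = {}, Counter(labels)
    for profile, lab in zip(profiles, labels):
        acc = sums.setdefault(lab, [0.0] * len(profile))
        for j, value in enumerate(profile):
            acc[j] += value
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def allocate(centroids, profile):
    """Force a profile into the class whose centroid is closest."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], profile)))

def permuted_allocations(profiles, labels, targets, n_perm, seed=0):
    """Tally target allocations from classifiers trained on shuffled labels."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)              # break the profile-label relationship
        model = fit_centroids(profiles, shuffled)
        tally.update(allocate(model, t) for t in targets)
    return tally

# Hypothetical toy data: labelled training profiles and unlabelled target profiles.
train_profiles = [[0, 0], [0, 1], [2, 2], [2, 1]]
train_labels = ["g1", "g1", "g2", "g2"]
targets = [[0, 0.5], [2, 1.5]]
real = [allocate(fit_centroids(train_profiles, train_labels), t) for t in targets]
null_tally = permuted_allocations(train_profiles, train_labels, targets, n_perm=100)
print(real, dict(null_tally))
```

Comparing the allocation counts from the genuine labels with the tally from the permuted labels shows which allocations would be expected by chance alone, for example through unequal group sizes.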

Behavioral signatures in sibling pairs with idiopathic ASD
To test our expectation that the signature patterns derived from the genetic disorders relate to genotype-phenotype associations, we hypothesized that affected sibs (siblings) would be assigned to the same genetic disorder class significantly more often, and be relatively more similar in their behavioral profile, than non-related subjects. To test this, we examined the concurrence in class assignment (chi-squared test) and the correlation between affected sib pairs in the SVM-assigned class and predicted probabilities.
Significant dependence between the class assignment of siblings in AGRE1 and that of the other sibling in AGRE2 was indicated (chi-squared = 43, df = 25, P = 0.015). Furthermore, the predicted probabilities for the assigned class in AGRE1 (sib1) were significantly correlated with the predicted probabilities of their affected sibling in AGRE2 (sib2) (Pearson's correlation r = 0.20, P <0.001) (Figure 5). To exclude the possibility that these correlations were driven by severity rather than specificity of ADI-R profiles, we tested both as predictors: the severity of the proband symptom scores did not predict the predicted probability of the sibling, whereas the predicted probability score did predict the probability score of the sibling (sibling 1 as predictor of sibling 2: mean items score P = 0.18; probability score P = 1.5e-05; sibling 2 as predictor of sibling 1: mean items score P = 0.86; probability score P = 7e-05).
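For readers who want to see where the 25 degrees of freedom come from, the chi-squared test of independence on a 6 × 6 table of sibling class assignments has (6 − 1)(6 − 1) = 25 degrees of freedom. The Python sketch below computes the Pearson statistic and the degrees of freedom for an arbitrary contingency table; the tables shown are hypothetical and the function name is illustrative.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n   # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df

# Hypothetical 2 x 2 table, small enough to check by hand (expected count 15 per cell).
stat_small, df_small = chi_squared([[10, 20], [20, 10]])
# A 6 x 6 table (sib 1 class by sib 2 class) always has 25 degrees of freedom.
_, df_six = chi_squared([[1] * 6 for _ in range(6)])
print(round(stat_small, 3), df_small, df_six)
```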
Interestingly, the correlation in prediction probabilities was driven by a correlation (r = 0.35) between sib pairs assigned to the same class compared with 'discordant' sibs (r = −0.18), that is, sibling pairs that had not been assigned to the same class. In addition, we found that the covariance in probabilities between sibs was greater when both sibs were assigned to the same genetic disorder class (F-test for equality of variances of the difference in probability, P <0.001). To confirm the notion of enhanced behavioral similarity between siblings allocated to the same genetic disorder class, we examined the ADI-R scores directly. We used the first principal component (PC1) of the ADI-R scores as a summary measure. Overall (disregarding genetic disorder class), the PC1s of sibs were not significantly correlated (r = 0.081, P = 0.089), but when split out for concordance of genetic disorder prediction, the correlations were 0.71 and −0.16 for concordant sibs and discordant sibs, respectively, with P <0.001 for 'concordant' versus 'discordant' sibs. Overall, the sibling analysis indicated that the familial liability to ASD may be partitioned according to the relative likelihood of disturbance related to certain genetic disorders.
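The split of sibling correlations by concordance of class assignment can be sketched in Python as follows. The sibling pairs below are invented values chosen to give perfectly linear relationships, so they illustrate the computation only and say nothing about the study's results; the function names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def correlation_by_concordance(pairs):
    """pairs: (class_sib1, prob_sib1, class_sib2, prob_sib2) for each sib pair."""
    result = {}
    for name, keep in (("concordant", True), ("discordant", False)):
        subset = [(p1, p2) for c1, p1, c2, p2 in pairs if (c1 == c2) == keep]
        result[name] = pearson(*zip(*subset))
    return result

# Hypothetical sib pairs: assigned class and predicted probability for each sib.
sib_pairs = [
    ("TSC", 0.2, "TSC", 0.3), ("TSC", 0.4, "TSC", 0.5), ("XXY", 0.6, "XXY", 0.7),
    ("TSC", 0.2, "XXY", 0.7), ("XXY", 0.4, "TSC", 0.5), ("TSC", 0.6, "SMC15", 0.3),
]
by_group = correlation_by_concordance(sib_pairs)
print(by_group)
```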

Discussion
This study demonstrates that patterns of autistic symptomatology can be associated with specific genetic disorders. There has been much speculation that such genotype-phenotype correlations exist but so far only limited evidence to support the conjecture. Our results are consistent with findings from animal research and suggest that different pathophysiological pathways underlie certain behavioral deficits [4,45].
Figure 3 SVM predicted probabilities of the original genetic groups, AGRE0 singleton dataset and randomly generated scores for the AGRE0 singleton dataset. Mean SVM probabilities differed significantly between the genetic groups and AGRE0 (P = 0.0024), between the genetic groups and random data (P <0.001) and between AGRE0 and random data (P <0.001). SVM, support vector machine.
Figure 4 PCA plot of ADI-R profiles of subjects in the genetic disorder sample, with the AGRE0 subsample inserted. PC2 is the dimension with the most differentiating contrast among the genetic disorder groups. AGRE0, on average, has negative values on PC1 and is around 0 on PC2. The TSC group (5) is also on average 0 on PC2, similar to AGRE0, and has the most negative average on PC1. Groups 1, 4 and 6 also display some closeness to AGRE0. Colors/numbers/letters denote genetic disorder subgroups. 1, 22q11.2 deletion syndrome; 2, Down's syndrome; 3, Prader-Willi syndrome; 4, supernumerary marker chromosome 15; 5, tuberous sclerosis complex; 6, Klinefelter syndrome; A, AGRE0. ADI-R, Autism Diagnostic Interview-Revised; PCA, principal component analysis; TSC, tuberous sclerosis complex.
The current study is the first to test the specificity of genetic behavioral phenotypes using a machine learning paradigm. The ADI-R algorithm items comprised a comparatively small number of symptom features, yet we used this small set of items to classify our cases. The total number of correct allocations (63%) was substantial given the fact that six groups were compared. Indeed, this result was derived from one-by-one genetic disorder comparisons, in which strong contrasts were evident. It was notable, however, that the SVM algorithm derived from the current sample differentiated between some classes better than others. This variability might be explained by the variation in sample sizes; thus, in future, larger samples will need to be investigated. It was also notable that the ratings of the pattern of social dysfunction were among the best contributors to class prediction, raising the possibility that particular styles of social impairment may be related to particular genetic risk factors.
Although differences in the typology of social impairments have been noted in ASD [46], differences in the types of social impairment have not been studied in detail and are only partially captured by the ADI-R items. For instance, social avoidance is commonly reported in fragile X syndrome, as another example of social behavioral specificity within a genetic disorder associated with ASD [47,48]. It seems likely that with the incorporation of more symptoms and other phenotypic features, such as the presence of comorbid behavioral problems like those associated with ADHD [49], the ability to assign cases to specific classes of genetic disorder may be improved. The inclusion of other conditions such as fragile X syndrome may also help further map the patterns of genotype-phenotype correlations. Together, these extensions may reveal further contrasts or overlaps between genetic disorders that are biologically meaningful. For instance, it was already interesting that the prediction probabilities for SMC15 were similar to those for PWS. Both disorders are associated with abnormalities in the dosage of genes located in the 15q11-13 region and likely lead to perturbations in similar pathophysiological pathways.
The subjects of this study were included because they were ascertained for the presence of a genetic disorder and were assessed regardless of the presence or absence of behavioral concerns. Although this approach is likely to have minimized ascertainment biases, some bias cannot be ruled out. However, any enrichment of behavioral abnormalities in these cohorts is unlikely to give rise to the specific patterns of associations identified here. It was reassuring in this respect that the algorithm derived from all cases in the genetic disorder samples gave comparable results to the analyses that included only the subjects who scored above the ADI-R threshold for ASD. Analysis confirmed that IQ did not seem to act as a confounding factor in the SVM predictions. Also, the influence of age and medication as confounds could be ruled out, as the ADI-R algorithm codes behaviors reported between the 4th and 5th birthday [35].
The application of the genetic disorder algorithm to AGRE samples indicated that the behavioral patterns observed in cases of idiopathic autism were not random. Therefore, these results could be used to estimate relative similarity to behavioral profiles designated from the genetic disorders. In addition, the sibling analysis showed correlation of SVM predictions between affected sib pairs. These findings indicate the feasibility of partitioning familiality into components according to patterns of autistic symptomatology, for example concordance in relative similarity to behavioral profiles related to the genetic disorders. This notion should be followed up by studies that incorporate genetic or pathway information to validate the behavior-based stratification in idiopathic samples. For instance, our allocation in idiopathic ASD to TSC-derived patterns may be supported by molecular data showing mammalian target of rapamycin (mTOR) pathway deregulation. Such a result would support the view that perturbation of the mTOR signaling cascade is a common pathophysiological feature of human neurological disorders, including mental retardation syndromes and ASDs [49]. If confirmed, such results could complement future gene searches, since stratification on the basis of behavioral profile may significantly increase the power to detect which (combination of) genetic disorder related pathways are most prominently involved. Indeed, the notion that pathophysiological processes are shared in syndromic and idiopathic cases of ASD is supported by a recent study that showed converging synaptic pathophysiology between syndromic (that is, caused by a defined genetic disorder) and non-syndromic rodent models of autism [50].
Moreover, genotype stratification may also have important treatment implications, as other animal studies suggest that the best treatment approaches for some genetic disorders (for example fragile X syndrome) may be unsuitable for others (for example tuberous sclerosis) [49].

Conclusion
Our proof of concept study indicates the existence of 'signature' autistic behavioral profiles that index underlying genetic risk processes. These signatures may be helpful in disentangling the etiological and phenotypic heterogeneity evident in ASD, but warrant replication in larger and independent samples. The approach presented in this study could hold promise as a means of stratifying patients who may benefit from treatments targeted at specific pathways and as a way of identifying those patients in whom interventions may have unwanted effects.