Difficulties in social communication are a defining clinical feature of autism. However, the underlying neurobiological heterogeneity has impeded targeted therapies and requires new approaches to identifying clinically relevant bio-behavioural subgroups. In the largest autism cohort to date, we comprehensively examined difficulties in facial expression recognition, a key process in social communication, as a bio-behavioural stratification biomarker, and validated them against clinical features and neurofunctional responses.
Between 255 and 488 participants aged 6–30 years with autism, typical development and/or mild intellectual disability completed the Karolinska Directed Emotional Faces task, the Reading the Mind in the Eyes Task and/or the Films Expression Task. We first examined mean-group differences on each test. Then, we used a novel intersection approach that compares two centroid and connectivity-based clustering methods to derive subgroups based on the combined performance across the three tasks. Measures and subgroups were then related to clinical features and neurofunctional differences measured using fMRI during a fearful face-matching task.
We found significant mean-group differences on each expression recognition test. However, cluster analyses showed that these were driven by a low-performing autistic subgroup (~ 30% of autistic individuals who performed below 2SDs of the neurotypical mean on at least one test), while a larger subgroup (~ 70%) performed within 1SD on at least 2 tests. The low-performing subgroup also had on average significantly more social communication difficulties and lower activation in the amygdala and fusiform gyrus than the high-performing subgroup.
Findings of autism expression recognition subgroups and their characteristics require independent replication. This is currently not possible, as there is no other existing dataset that includes all relevant measures. However, we demonstrated high internal robustness (91.6%) of findings between two clustering methods with fundamentally different assumptions, which is a critical pre-condition for independent replication.
We identified a subgroup of autistic individuals with expression recognition difficulties and showed that this related to clinical and neurobiological characteristics. If replicated, expression recognition may serve as bio-behavioural stratification biomarker and aid in the development of targeted interventions for a subgroup of autistic individuals.
Autism spectrum disorderFootnote 1 (henceforth ‘autism’) is a life-long neurodevelopmental condition, behaviourally defined by difficulties in social communication alongside repetitive and restricted behaviours and interests . Over recent years, the clinical and etiological heterogeneity of autism has been identified as a key barrier for mapping clinical features to underlying mechanisms, and to the development of targeted support or interventions . Several neurocognitive and neurobiological characteristics have been identified at the group level, but none have been found to be either universal (i.e. applying to all autistic people) or specific (i.e. many differences are also seen in other neurodevelopmental or psychiatric conditions ) to autism. These observations have prompted recent research efforts to move beyond the analysis of group-level differences to identify tests or markers that can be used to stratify autism into clinically relevant biological subgroups . A stratification biomarker could be any objectively measurable characteristic that can be used to subdivide or stratify a condition into more homogeneous subgroups. It could be specific for a subpopulation within an established clinical category (here autism) or may be transdiagnostic; that is, pertaining to a subpopulation of individuals across diagnostic categories. A stratification biomarker may be constructed from a univariate measure, whereby clinically relevant differences are indicated from certain cut-off values, or derived from multi-variate measures. Availability of validated stratification biomarkers is critical for the development of better targeted interventions.
The current study explored difficulties in facial expression recognition as a potential stratification marker for social communication difficulties in autism. Expression recognition is a plausible candidate marker because the ability to recognise (and respond to) others’ emotional states based on their facial expressions is a fundamental social perceptual skill that modulates many aspects of social interaction and communication. It emerges in infancy , becomes more sophisticated across development and has a well-defined neurobiological basis .
Difficulties in expression recognition in autism have been reported at both the behavioural and neurobiological level. Overall, findings indicate that the nature and severity of expression recognition difficulties in autism may depend on both participant characteristics (e.g. age, intellectual ability level) and task-related factors [7, 8]. For example, difficulties in recognising basic emotions (e.g. happy, sad, fear) have been predominantly found in studies with young autistic children [9, 10] or those with co-occurring learning disabilities. Some behavioural studies reported that adolescents/adults with an IQ in the average range had difficulties on tests assessing complex emotions , emotions displayed only briefly [12, 13] or subtle emotions ; others found no group differences . Furthermore, neuroimaging studies have revealed neurofunctional differences in autistic individuals during emotion processing tasks in key areas comprising the ‘social brain network’, including the amygdala and right posterior fusiform gyrus (FG) (also known as the ‘fusiform face area’) [16, 17]. The fusiform face area is thought to be involved in the processing of faces  and other objects that people have significant expertise with , while the amygdala plays a critical role in the processing of emotional and socially relevant information, including the recognition of emotions from facial expressions. The spatial location and direction of effect vary; for example, some studies reported that the autistic group had on average under-activation in the amygdala , while others found overactivation in the autistic group .
However, to date, the majority of behavioural and neuroimaging studies have focused on mean-group differences, and often included sample sizes of 15–20 individuals per group (see ), which are too small to meaningfully subdivide the autism group. Significant difficulties found at the group level, especially with small or moderate effect sizes, may not apply to particular individuals within that group . It is important to ascertain whether only subgroups of autistic individuals have difficulties in expression recognition, and to what extent those difficulties impact clinical features or functional outcomes, in order to better target therapies.
Hence, in this study we explored the role of expression recognition as a candidate stratification biomarker in a large group of autistic individuals diverse in age and intellectual ability. We employed three facial expression recognition tasks: the Karolinska Directed Emotional Faces (KDEF) , the Reading the Mind in the Eyes Test (RMET) , and the Films Expression task (FET) . First, we carried out case–control comparisons on each task and ascertained the frequency and severity of difficulties. Second, we used clustering approaches to identify subgroups based on combined performance across all three tasks. Here, we performed clustering separately for the autistic and non-autistic groups, with the primary goal of parsing heterogeneity in expression recognition among autistic people. We then repeated the same approach in the control group to ascertain whether similar (or different) patterns may exist in non-autistic individuals.
Finally, to externally validate expression recognition as a potential stratification biomarker, we compared performance on single measures and subgroups in terms of differences in social communication symptoms, functional outcomes, and regional activation measured using fMRI in key areas implicated in emotional face processing . We hypothesised that difficulties in expression recognition would be associated with increased social communication difficulties and neurofunctional differences, and that the combined performance profiles across tasks may be more sensitive than individual task performance. Such a profile could potentially serve as a stratification biomarker in autism.
Methods and materials
Participants were recruited as part of the EU-AIMS Longitudinal European Autism Project (LEAP), a multi-centre longitudinal European initiative seeking to identify biomarkers in autistic people aged 6–30 years . Participants completed a battery of clinical, cognitive, EEG and MRI assessments at one of the following six centres: Institute of Psychiatry, Psychology and Neuroscience, King’s College London, UK; Autism Research Centre, University of Cambridge, UK; Radboud University Nijmegen Medical Centre, the Netherlands; University Medical Centre Utrecht, the Netherlands; Central Institute of Mental Health, Mannheim, Germany; University Campus Bio-Medico of Rome, Italy. The study was approved by the respective research ethics committees at each site (IRAS, UK). Informed written consent was obtained from all participants, or—for minors or those unable to give informed consent—from a parent or legal guardian.
Participants were sorted into cohorts based on age and intellectual ability: adults between 18–30 years (schedule A), adolescents between 12–17 years (schedule B), children between 6–11 years (schedule C)—all with IQ in the normal range—and adolescents/adults with mild intellectual disability (ID) (IQ 50–74) between 12–30 years (schedule D). For further details about the study design and clinical characterisation of the cohort, see Loth et al.  and Charman et al. . The current study reports on a subset of measures acquired at the follow-up assessment time-point (2016–2018), as two facial expression tests were only added to the protocol at this time.
In the following analyses, we compared the autism group (including autistic individuals with mild ID, defined by FSIQ < 75) to a ‘control group’, comprising individuals with typical development (TD) and mild intellectual disability (ID, defined by IQ < 75). We also divided each group by study schedule as outlined above to investigate whether group differences may be greater at certain developmental stages (e.g. in children vs. adults), or in individuals with mild ID.
Participants in the autism group had an existing clinical diagnosis of autism spectrum disorder according to the DSM-IV-TR/ICD-10 or DSM-5 criteria at study entry. For research purposes, we also conducted a structured parent interview—the Autism Diagnostic Interview-Revised (ADI-R; ), which combines historic and current symptoms; and administered the Autism Diagnostic Observation Schedule 2 (ADOS-2; ) to participants, which measures current behaviours. We used the ADOS Calibrated Severity Score (CSS) for Social Affect (SA) as a measure of current social symptoms.
To assess autism traits, the parent-reported total t-score on the Social Responsiveness Scale Second Edition (SRS-2; ) was used in all participants except for TD adults, where the self-report version was used. The Repetitive Behavior Scale-Revised (RBS-R; ) was used to assess the severity of repetitive behaviours and the Vineland Adaptive Behavior Scales-2nd Edition (VABS;  social adaptive behaviour subscale) to measure social adaptive functioning. Neuropsychiatric symptoms were measured using the Development and Wellbeing Assessment (DAWBA) probability bands .
We examined the most common neuropsychiatric symptoms in autism (anxiety, depression, ADHD, behavioural disorder) and generated a total ‘neuropsychiatric symptom score’ based on all disorders. DAWBA scores reflect six levels of prediction (i.e. ~ 0.1%, ~ 0.5%, ~ 3%, ~ 15%, ~ 50%, > 70%) of the probability of a disorder, ranging from very unlikely to probable. We derived a binary measure of predicted DAWBA risk by combining Levels 0–3 as ‘absent’ (i.e. having a risk score of ~ 0.5%, ~ 3%, ~ 15%) and Levels 4–5 as ‘present’ (~ 50%, > 70%). In the autism and ID groups, all neuropsychiatric symptoms were allowed. An inclusion criterion for participants in the ‘typically developing’ control group was the absence of any existing clinical diagnosis (including mental health problems). This contributed to the relatively low rate of psychiatric/mental health features in the control group. Medication use was assessed using an in-house questionnaire.
Facial expression recognition tasks
The Karolinska Directed Emotional Faces task (KDEF)  tests for recognition of six basic emotions (happy, sad, angry, fearful, surprised, disgusted) and uses long presentation times (20 s).
The Reading the Mind in the Eyes Task (RMET)  requires participants to identify complex emotions based only on the eye region of a face. Depending on age, participants received either an adult (36 items), adolescent (31 items), or child (28 items) version of the test.
The Films Expression Task (FET)  tests for recognition of both basic and complex emotions from film stills and has short presentation times (500 ms). Depending on age and ability level, participants received an adult version with 56 items (or a child version, created for LEAP) with a subset of 36 items that only employed emotion terms that children within the targeted age range typically understand. Examples of test stimuli for all three tasks are given in Fig. 1. More details of each task characteristics, procedures, control tests and reliability information are provided in Additional file 1: Materials 1.
Across all three tasks, accuracy and accuracy-adjusted reaction times (ART, the mean time of each passed trial divided by the fraction of passed trials) were used as the main dependent variables.
We examined group differences overall as well as in each age group (corresponding to different task versions) in order to see whether difficulties were more pronounced at some developmental stages than in others.
To investigate case–control differences in response accuracy, we used nonparametric Kruskal–Wallis tests, as data distributions were skewed towards high performance across tests (Shapiro–Wilk normality tests: KDEF: p < 0.001; RMET: p < 0.001; FET: p < 0.001), and nonparametric ANOVA (fANCOVA), controlling for the effects of VIQ and neuropsychiatric symptoms (see Results) on task performance. To investigate case–control differences in ART, ANOVAs or ANCOVAs were used for all three tasks. Age was not controlled for as two of the tasks used different versions for different age groups, so we report findings split by age groups. We also did not control for sex but tested for the effect of sex on each task.
We used Bonferroni corrections to correct for the number of tests (i.e. 0.05/6 = 0.008).
Spearman’s correlations (and partial correlations, controlling for the previously documented effect of VIQ on expression recognition) were used to examine dimensional relationships of each expression recognition task to clinical features or ROIs (regions of interest in neuroimaging, see below). All analyses were performed in Rstudio Version 1.1.453 2009–2018. The number of participants varied between tasks. All details are provided in Table 1 (See below).
Clustering analyses were carried out in 129 autistic and 96 control participants who had completed all three behavioural expression recognition tasks. Cluster analysis was carried out using both hierarchical clustering (Ward’s linkage, Euclidean distance) and K-means (25 random starts). These two methods were chosen because they represent the most common variants of centroid and connectivity-based clustering, respectively, which are based on different assumptions and tend to produce the most divergent results. Here we carried out clustering separately for the autism and control groups, as our main goal was to examine heterogeneity among autistic individuals. More details on the clustering analysis are given in Additional file 1: Materials 2.
We had valid data of a subset of 132 participants from a well-established emotional face-matching task, the Hariri task, while fMRI blood-oxygen-level-dependent (BOLD) responses were recorded . As per LEAP study protocol , this task was mainly administered to schedules A and B to reduce scan time for children and those with mild ID. There were no significant differences between the sub-sample with fMRI data and the overall sample in terms of age or sex composition, but the sub-sample with fMRI data had on average significantly higher IQ (p < 0.001).
Please see Additional file 1: Materials 3 for more information on the fMRI paradigm, data acquisition, and analysis.
Participant characteristics for each task are given in Table 1.
First, we examined the effect of study site, schedule (and separately age and IQ), sex, medication, and neuropsychiatric symptoms on expression recognition, as detailed in Table 1. There were no significant effects of study site or medication on any accuracy scores or ART (all p > 0.05). However, there were significant effects of schedule on all tests (all p < 0.001). In both the autism and control groups, performance on each expression recognition task was moderately correlated with verbal, performance, and full-scale IQ (autism r’s: 0.34–0.53, control group r’s: 0.24–0.44, all p < 0.001). Age was weakly correlated with accuracy on the KDEF (r = 0.15, p < 0.05) in the autism group, but not significantly in the control group (r = 0.11, p > 0.05); presumably because the task employed expressions of basic emotions and long presentation times, and age effects may only occur in younger children than those tested in this cohort. There were no age effects on the RMET and FET, which employed different age-related versions.
Sex had no significant effect on the KDEF (p > 0.05), but we found nominally significant sex differences on the FET (p < 0.05) and a non-significant trend on the RMET (p = 0.06), such that males performed marginally worse than females. In the autism group, there was no significant difference between those with and without neuropsychiatric symptoms on any task performance (all p > 0.05). In the control group, we found a significant effect of neuropsychiatric symptoms on the RMET (p < 0.05, r = − 0.18) and the FET (p < 0.01, r = 0.31). This effect was mainly driven by a small number of participants with ID, as any psychiatric disorder was an exclusion criterion in the ‘typically developing’ group (8 in the RMET and 5 in the FET); hence, the following analyses were performed overall, and split by schedule to compare more homogeneous age/IQ groups, and with/without controlling for IQ and neuropsychiatric symptoms. The results of preliminary analyses are summarised in Table 2.
KDEF. The autism group had, on average, significantly lower accuracy scores than the comparison group (χ2 = 15.17, pcorrected < 0.001, r = 0.18). Analyses split by schedule showed significant differences in autistic adults (χ2 = 8.8, pcorrected < 0.05, r = 0.26) and children: (χ2 = 16.01, pcorrected < 0.01, r = 0.34); but not in adolescents: (χ2 = 3.2, p > 0.05) or those with ID: (χ2 = 0.4 p > 0.05). Furthermore, autistic participants were on average significantly less accurate than the comparison group in identifying expressions denoting anger (χ2 = 21.02, pcorrected < 0.001, r = 0.25), disgust (χ2 = 13.44, pcorrected < 0.01, r = 0.18) and surprise (χ2 = 8.38, pcorrected < 0.05, r = 0.16), but not fear. ART did not differ between the two groups overall, or for any specific type of emotion. Overall, between-group differences remained significant after controlling for neuropsychiatric symptoms (all p < 0.001).
RMET. On the RMET, we observed significantly reduced average response accuracy in the autism group overall (χ2 = 22.25, pcorrected < 0.01, r = − 0.23). When split by schedule the mean-group effect for adults and children was only significant at the uncorrected level (adults: χ2 = 5.68, p < 0.05, r = 0.24; adolescents: χ2 = 6.47, p < 0.05, r = 0.2; children: χ2 = 0.64, p < 0.05, r = 0.21) and non-significant for adolescents/adults with mild ID (χ2 = 1.48, p < 0.05). There were no significant group differences in ART overall, or for any subgroup (all p > 0.05), except between those with mild ID (χ2 = 5.29, pcorrected < 0.05, r = 0.4) with autistic participants showing faster ART than non-autistic participants with mild ID. Overall between-group differences remained significant after controlling for VIQ and neuropsychiatric symptoms (all pcorrected < 0.05).
FET. Overall, there was a significant mean-group difference in accuracy on the FET (χ2 = 11.59, pcorrected < 0.001, r = 0.24). However, this difference was driven by the adults (χ2 = 11, p < 0.01, r = 0.43), with no other schedules showing significance, and became non-significant after controlling for VIQ and neuropsychiatric symptoms (p > 0.05). Autistic adults also had on average slower reaction time on accurate trials than non-autistic adults (χ2 = 9.18, pcorrected < 0.05, r = 0.36).
When analyses were split between trials with simple versus complex emotions, we found significant group differences between autistic and control adolescents and adults in basic emotions (adults: χ2 = 10.59, pcorrected < 0.05, r = − 0.36; adolescents: χ2 = 5.8, pcorrected < 0.05, r = 0.30) and only nominally significantly in the adults in complex emotions (χ2 = 9.44, p < 0.05, r = − 0.23). Autistic children did not differ from comparison children in recognition accuracy for basic or complex emotions, nor did participants with mild ID (all p > 0.05).
On the visual processing control task there were no significant group differences in response accuracy (χ2 = 0.064, p > 0.05) or ART (χ2 = 0.90, p > 0.05), which rules out the possibility that performance differences on the FET were due to more general difficulties in processing briefly presented stimuli.
Additional file 1: Materials 4 shows that in both groups, the accuracy measures (and ART measures) were moderately positively correlated with one another. Accuracy and ART were only negatively correlated on the FET, indicating that on this task difficulties in correctly identifying expressions were accompanied with longer response times and vice-versa.
Table 3 gives participant details of the subset of participants who had completed all three expression recognition tasks and were included in the clustering analyses.
We repeated between-group comparisons on each expression recognition test in this group and found similar patterns to the overall group in case–control comparisons. The autism group had significantly lower accuracy in the RMET (r = − 0.22, p < 0.01) and the FET (r = − 0.28, p < 0.001), though not in the KDEF (p > 0.05), our test for basic emotions.
Cluster analyses were carried out separately for the autism and control groups using the accuracy scores of each test. For each group, a two-cluster solution was found. Further, 91.6% of participants were found in the same clusters by hierarchical clustering and K-means, while 8.4% were inconsistently clustered.
In the autism group, the larger cluster comprised n = 83 (69.7%) individuals and was subsequently characterised by ‘high expression recognition’ performance; the smaller cluster comprised 36 (30.3%) individuals and was characterised by ‘low expression recognition’ performance (see Fig. 3).
Ten of the 36 autistic individuals (27.7%) in the low-performing cluster also had ID, and 17 (47.2%) had neuropsychiatric symptoms. Twenty-nine (75%) were male 12 were children (33.3%), 7 adolescents (19.4%) and 17 adults (47.2%).
The high-performing autistic cluster included only 1 participant with ID but 29 (34.9%) with neuropsychiatric symptoms. Fifty-one (61.1%) were male; 21 were children (25.3%), 29 adolescents (34.9%) and 33 adults (39.8).
This suggests that in the autism group, expression recognition difficulties persisted across all IQ levels: although almost all autistic individuals with ID fell in the low-performing cluster, the majority of autistic low performers actually had IQ in the normal range. There was no significant difference between autistic high- and low-expression recognition performers in terms of neuropsychiatric symptoms.
In the control group, 66 (73.3%) fell in the high expression recognition cluster and 24 (26.7%) in the low expression recognition cluster. In the low-performing cluster, 8 individuals (33.3%) also had ID; 3 (12.5%) had neuropsychiatric symptoms. Both autistic and non-autistic individuals in the low-performing clusters had on average significantly lower verbal, performance, and full-scale IQ than individuals in the high-performing clusters (all p < 0.001).
To examine individual task performance within each cluster, we used age-adjusted medians of the typically developing group (excluding those with ID) as a reference point. In the autism high-performing expression recognition cluster, the majority of participants performed above or within one SD of the age-group-related median (KDEF: 82%, RMET: 78.3%, FET: 82%). In the autism low-performing cluster, the majority of participants performed below 2 SD of their age-group-related median (KDEF: 58.3%, RMET: 58.3%, FET: 66.7%). In the control high-performing expression recognition cluster, the majority of individuals performed at or above their age-group-related median in all three tasks (KDEF: 63.6%, RMET: 74.2%, FET: 53%). However, only approximately a third of participants in the low-performing control cluster performed below 2 SD of their age-group-related median (KDEF: 37.5%, RMET: 29.2%, FET: 29.2%). This indicates that the cluster boundaries were more stringent in the autism than control group.
Interestingly, individuals in the ‘non-consistent’ clusters were predominantly characterised by low performance on the RMET but good performance on the KDEF and FET, which may indicate a potential separate subgroup (see Fig. 3A, B, bottom).
Associations between expression recognition measures/clusters and clinical features
Table 3 shows participant characteristics of the individuals included in clustering analysis, including correlations of accuracy on each task to clinical features and social adaptive functions. In the autism group, performance on each expression recognition test was significantly related to current social symptom severity (ADOS, CSS) as well as level of social adaptive functioning (VABS-2; all r’s: 0.19–0.36, p < 0.01). Repetitive behaviours (RBS-R) only significantly correlated with the FET (r = − 0.28, p < 0.01). The strongest associations were found with the FET throughout, which remained significant after the effect of VIQ was controlled for (all r’s: − 0.29 to 0.21, p < 0.05). In the control group, only FET performance was significantly related to autism traits, which again remained stable after controlling for VIQ (see Fig. 4A, B).
When the high- and low-performing autism clusters were compared to each other in terms of clinical features, we found significant subgroup differences in severity of current social symptoms (t(88) = − 3.9, p < 0.001, r = 0.39), autism traits (t(100) = 3.39, p < 0.001, r = 0.31), repetitive behaviours (t(96) = 3.28, p < 0.001, r = 0.33), and level of social adaptive functioning (t(103) = 3.39, p < 0.001, r = 0.33) (see Fig. 5). Subgroups explained 14.8% of variance in current social symptoms, 8.8% of variance in autism traits, 10% in repetitive behaviours, and 11.1% in social adaptive functioning. Similar variance was explained by FET subgrouping alone (SRS: 16.2%; ADOS: 17.6%; RBS: 9.2%; VABS: 16.7).
In previous studies, VIQ was found to be the strongest predictor of clinical features. . Therefore, we also subdivided the autism group into IQ subgroups (above/below IQ 75) as a bar for comparison. The cut-off of 75 was used to correspond with the definition of our ID group in LEAP . Subgrouping by IQ explained 10.2% of variance in current social symptoms (t(220) = − 5.6, p < 0.001), 2.1% in autism traits (t(263) = − 2.4, p < 0.05), 2.9% in repetitive behaviours (t(259) = − 0.28, p < 0.05), and 11.5% in social adaptive functioning (t(247) = 5.7, p < 0.001). This shows that expression recognition subgroups explained more variance in clinical features than could be explained by IQ alone.
Relationship between clusters and individual task performance, respectively, and neurofunctional differences
To externally validate the performance scores and clusters in terms of underpinning mechanisms, we investigated associations with neurofunctional responses during an established emotional face-matching task. The characteristics of participants with fMRI data are given in Table 4. Case–control comparisons revealed no significant between-group differences in amygdala or fusiform gyrus activation overall or within age groups (all F > 0.36 < 2.7, all p > 0.08).
In the autism group, accuracy on the FET was dimensionally related to increasing left amygdala activation (r = 0.24, p = 0.05) (Fig. 6A). Comparisons between clusters also showed that the autistic low-performing subgroup had on average significantly lower activation in the right amygdala (F(2, 64) = 5.3, pcorr < 0.05), left and right middle FG (left: F(2,65) = 5.9, pcorr < 0.05, right: F(2,65) = 9.16, pcorr < 0.01) and left and right posterior FG (left: F(2,66) = 6.3, pcorr < 0.01, right: F(2,66) = 5.3, pcorr < 0.05), than the autistic high-performing cluster. Reduced activation in the right amygdala (Fig. 6B) and right middle FG (Fig. 6C, D) explained 22.2% and 11.31% of variance, respectively. In the control group, we saw no significant dimensional relationships to functional activation or subgroup differences (all F < 0.042, p > 0.05).
This study investigated the role of expression recognition as a potential stratification biomarker for social communication difficulties in autism. We tested a large group of autistic individuals, diverse in age and intellectual ability, on three tests assessing expression recognition of simple and complex emotions with varying presentation times. In a first step, we examined group-level differences. We then explored which measure(s), alone or in combination using cluster analyses, were most sensitive to differences in clinical features and/or neurofunctional responses.
In line with previous studies, we found significant mean-group differences in accuracy across all three tests [13, 23, 24], with medium effect sizes similar to those reported in a previous meta-analysis . In the autism group, difficulties were on average stronger on tests with complex emotions and/or brief presentation times, as indicated by somewhat higher effect sizes and, in some age groups, slower reaction times. However, contrary to the assumption that neurofunctional measures may be more sensitive than behavioural responses, in the present study we found only group-level differences in behavioural responses, but not in fMRI BOLD activity in regions previously associated with the recognition of facial emotion expressions.
Cluster analysis revealed two cluster solutions for both the autism and control groups. High agreement between hierarchical clustering and K-means (91.6%) indicated valid capture of the underlying data cluster structure. While both the majority of autistic (69.7%) and control participants (73.3%) fell in the high-performing clusters (the majority performing within or above 1 SD of the age-adjusted typically developing medians on at least two tests), 30.3% of autistic and 26.6% of control individuals fell in the low-performing clusters. Inspection of participant characteristics showed that the low-performing autistic cluster comprised individuals across all IQ levels (i.e. 27.7% had intellectual disability). Neuropsychiatric symptoms did not substantially affect expression recognition as 47.2% of autistic people in the low-performing cluster but also 34.9% of autistic people in the high-performing cluster had neuropsychiatric symptoms.
It should be noted that in the present study, we carried out clustering separately for the autism and control groups. This was motivated by our main goal to parse heterogeneity in expression recognition within the autism group, and to explore the clinical and functional relevance of expression recognition performance in autism. We repeated the approach in the control group to see whether there may be similar or different patterns of expression recognition performance across the three different tasks in non-autistic individuals. Although we found a two-cluster solution in both groups, the boundary threshold for ‘low-performing’ was somewhat more stringent in the autism group than the comparison group. Future studies may use the alternative, transdiagnostic approach of performing clustering on the combined sample and then characterise clusters in terms of potential enrichment by diagnostic groups.
Taken together, by comparing findings from our mean-group comparisons to the results from our cluster analyses, we highlight that mean-group differences are unsuited to making inferences about specific autistic individuals . Although the autism group were on average less accurate in expression recognition than the control group, in fact only a proportion of autistic people—as well as a proportion of non-autistic people—had difficulties in expression recognition.
Relationships to clinical features and neurofunctional underpinnings
A critical criterion for a test or measure to qualify it as a stratification biomarker consists of demonstration that it relates to—or predicts—a clinically relevant outcome. In the autism group, we observed significant subgroup differences in clinical features, which accounted for 9–14% of variance in autism traits, current severity of social difficulties and social adaptive behaviour. Previous studies suggested level of intellectual ability to be among the strongest predictors for functional outcomes in autism . We therefore also subgrouped our autistic participants by IQ. This showed that subgrouping by behavioural expression recognition patterns explained 1.5–3 times more variance than was explained by IQ subgroups.
Furthermore, by relating behavioural expression recognition against neurofunctional activation during a fearful face-matching task, we showed that the autistic expression recognition subgroups significantly differed from one another in terms of functional activation patterns. More specifically, the autistic low-performing subgroup had significantly lower activation in the amygdala and fusiform gyrus bilaterally than the high-performing subgroup, explaining 20% and 10% of variance, respectively. This reaffirms the integral role of these structures in emotion recognition and provides a neurobiological substrate for the uncovered heterogeneity of emotion recognition in autism.
Implications of findings for stratified or precision medicine approaches
Our finding that only subgroups of autistic and non-autistic people had behavioural difficulties in expression recognition, which were related to both clinical features and functional outcomes, highlights the need for more targeted ‘precision’ medicine or intervention approaches, rather than inferring treatment needs from the broad autism diagnosis alone.
First, we showed that not every autistic person requires support in expression recognition. In fact, the majority of autistic individuals appear to have no more difficulties in expression recognition than the majority of non-autistic individuals.
Second, analyses of neurofunctional responses indicate that the behavioural expression recognition difficulties we observed may be due to partly different mechanisms in autistic and non-autistic people. In the autism group, expression recognition difficulties were significantly related to lower amygdala and fusiform gyrus activation, while in the non-autistic group they were not. These findings should be treated as preliminary as they were based only on a subset of our participants who had completed all behavioural tests and the fMRI paradigm, and they were largely restricted to people without ID. However, if replicated, they may indicate that autistic and non-autistic subpopulations may have expression recognition difficulties for different reasons, and therefore may benefit from different behavioural or pharmacological interventions to increase their ability to recognise emotions from faces.
For measures of expression recognition to be used as (multi- or univariate) stratification markers in future clinical settings, we sought to empirically derive cut-off values. Emphasis on sensitivity vs. specificity (and thus stringency of cut-off values) depends on investigators’ or clinicians particular context of use, such as the cost of including individuals in a stratified clinical trial and their likely benefits . Moreover, from a practical perspective (e.g. costs, participant burden) it is useful to know whether multi-variate markers composed of performance on different measures have significantly greater accuracy than univariate markers based on a single index. We showed that of the three tests used, FET performance had highest accuracy in assigning individuals to the high vs. low-performing subgroups. This suggests that in the current instance FET performance alone (univariate marker) approximated the information provided by multi-variate assessments (clustering from 3 tests).
First, our findings will require independent replication—not only of cluster assignments/composition, but critically also of the relationship between clusters and clinical and neurofunctional features. Independent replication is often considered as a gold standard. It was not possible here as we are not aware of any current dataset that includes the relevant measures. However, we demonstrated internal robustness of results by only including participants who were consistently assigned to the same subgroup using two clustering approaches with fundamentally different assumptions. We argue that internal robustness—that is, generating identical results with different methods on the same data—is a critical though frequently understudied requirement before moving on to external replication (different methods, different data and likely participant characteristics).
A second limitation was that we had different sample sizes for different analyses. While we used the total number of participants who had completed a given task for the group comparisons of each to maximise power, numbers were reduced for clustering analyses by only including participants who had completed all three behavioural tasks. However, we replicated the mean-group findings in this smaller group. Furthermore, in the external validation study, only a subset of participants participated in the fMRI experiment, and per protocol this excluded participants with mild ID, to reduce participant burden as they had also completed another MRI session. Although the sample size (N = 137) was high for a task-related fMRI study in autism research, validation of the brain-behaviour associations was limited by the sizes of the low-performing clusters.
Third, future work should further optimise and standardise behavioural facial expression recognition tasks. The three tests included in this study were chosen because they tested different aspects of expression recognition, showed at least adequate performance characteristics (such as test–retest reliability), and/or obtained large effect sizes . However, further indicators relevant for clinical use, such as sensitivity to change, remain to be established. We found the FET to be most sensitive in subgroup assignments, and to account for the highest variance in clinical symptoms and functional activation. As effect sizes were comparable to those attained in the combined clusters, from a practical perspective, the FET alone may be carried forward as a proxy marker. The FET was less sensitive in children than adolescents/adults, possibly because a proportion of TD children also showed difficulties. Future task optimisation should aim to increase discriminant validity, particularly in children. Standardisation and creation of normative ranges will also help to interpret individual test scores based on age/ability level, or other demographic variables.
In a large and heterogeneous autism sample, we found that around 30% of autistic individuals had difficulties with expression recognition, which were related to both clinical features and neurofunctional differences. This indicates that a subset of autistic individuals may benefit from targeted support or interventions in expression recognition. Standardisation of tasks and replication of findings will be necessary to further validate expression recognition as a clinically relevant stratification biomarker for autism.
Availability of data and materials
The datasets generated and/or analysed during the current study are not publicly available due to an embargo period but are available from the corresponding author on reasonable request.
We use the term autism here in favour of the official term autism spectrum disorder to reflect the preference by several members of the autism community that advise on AIMS-2-TRIALS to stress differences over deficits.
Karolinska Directed Emotional Faces
Reading the Mind in the Eyes Task
Films Expression Task
Longitudinal European Autism Project
Autism Diagnostic Interview-Revised
Autism Diagnostic Observation Schedule 2
ADOS Calibrated Severity Score for Social Affect
Social Responsiveness Scale Second Edition
Repetitive Behavior Scale-Revised
Vineland Adaptive Behavior Scales-2nd Edition
Development and Wellbeing Assessment probability bands
Accuracy-adjusted response times
Regions of interest
American Psychiatric Association. American Psychiatric Association. DSM-5 Task Force. Diagnostic and statistical manual of mental disorders: DSM-5. 5th ed. Washington, D.C.: American Psychiatric Association; 2013. xliv, 947 p.
Loth E, Charman T, Mason L, Tillmann J, Jones EJH, Wooldridge C, et al. The EU-AIMS Longitudinal European Autism Project (LEAP): design and methodologies to identify and validate stratification biomarkers for autism spectrum disorders. Mol Autism. 2017;8:24.
Baron-Cohen S, Jolliffe T, Mortimore C, Robertson M. Another advanced test of theory of mind: evidence from very high functioning adults with autism or asperger syndrome. J Child Psychol Psychiatry. 1997;38(7):813–22.
Ashwin C, Baron-Cohen S, Wheelwright S, O’Riordan M, Bullmore ET. Differential activation of the amygdala and the “social brain” during fearful face-processing in Asperger Syndrome. Neuropsychologia. 2007;45(1):2–14.
Baron-Cohen S, Wheelwright S, Hill J, Raste Y, Plumb I. The, “Reading the Mind in the Eyes” Test revised version: a study with normal adults, and adults with Asperger syndrome or high-functioning autism. J Child Psychol Psychiatry. 2001;42(2):241–51.
Lord C, Rutter M, Le Couteur A. Autism Diagnostic Interview-Revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J Autism Dev Disord. 1994;24(5):659–85.
Goodman R, Ford T, Richards H, Gatward R, Meltzer H. The Development and Well-Being Assessment: description and initial validation of an integrated assessment of child and adolescent psychopathology. J Child Psychol Psychiatry. 2000;41(5):645–55.
Black DO, Wallace GL, Sokoloff JL, Kenworthy L. Brief report: IQ split predicts social symptoms and communication abilities in high-functioning children with autism spectrum disorders. J Autism Dev Disord. 2009;39(11):1613–9. https://doi.org/10.1007/s10803-009-0795-3.
We would like to thank all participants for their time and cooperation, without whom this study would not have been possible.
This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under Grant agreement No 777394 for the project AIMS-2-TRIALS. This Joint Undertaking receives support from the European Union's Horizon 2020 research and innovation programme and EFPIA and AUTISM SPEAKS, Autistica, SFARI. Additional funding was received by the European Community’s Seventh Framework Programme under the Grant agreement no. 602805 (Project EU-AGGRESSOTYPE) and no. 602450 (Project EU-IMAGEMEND), and by the German Federal Ministry of Education and Research under the Grant no. 01ZX1314GM (Project IntegraMent) and Grant no. 01GQ1102. C.M. is a recipient of an Olympia-Morata grant of the University Heidelberg.
Authors and Affiliations
Department of Psychiatry and Psychotherapy, University of Frankfurt, Frankfurt, Germany
Department of Forensic and Neurodevelopmental Sciences, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK
Hannah Meyer-Lindenberg, Bethany Oakley, Jumana Ahmad, Hannah L. Hayward, Jennifer Cooke, Daisy Crawley, Tony Charman, Declan G. Murphy, Michael J. Brammer & Eva Loth
Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, University of Heidelberg/Medical Faculty Mannheim, Mannheim, Germany
Carolin Moessnang, Tobias Banaschewski, Heike Tost & Andreas Meyer-Lindenberg
Department of Psychology, Social Work and Counselling, Faculty of Education and Health, Greenwich University, London, UK
Centre for Brain and Cognitive Development, Department of Psychological Sciences, University of London, Birkbeck London, UK
Luke Mason & Emily J. H. Jones
Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, UK
Rosemary Holt & Simon Baron-Cohen
Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, The Netherlands
Julian Tillmann, Christian Beckmann & Jan K. Buitelaar
EL, EJ, TC, SB-C, TB, AM-L, JB, and DM substantially contributed to the conception or design of the LEAP study. HM-L, CM, BO, JA, LM, HH, JC, DC, RH, JT, HT, and MB substantially contributed to the acquisition, analysis, or interpretation of data for the work. HM-L, CM, BO, JA, LM, EJ, HH, JC, DC, RH, JT, TC, SB-C, TB, CB, HT, AM-L, JB, DC, MB, and EL were involved in drafting the work or revising it critically for important intellectual content; final approval of the version to be published; and agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All authors read and approved the final manuscript.
The study was approved by the local ethical committees of the participating centres, and written informed consent was obtained from all participants. For minors or those without capacity to consent, ethics consent was obtained from a parent or guardian. KCL, UCAM: London Queen Square Health Research, Authority Research Ethics Committee (ID/reference number: 13/LO/1156). RUNMC, UMCU: Instituut Waarborging Kwaliteit en Veiligheid, Commissie Mensgebonden Onderzoek, Regio Arnhem-Nijmegen (Radboud University Medical Centre Institute Ensuring, Quality and Safety Committee on Research Involving Human Subjects Arnhem-Nijmegen) (ID/reference number: 2013/455). CIMH: UMM Universitatsmedizin Mannheim, Medizinische Ethik Commission II (UMM University Medical Mannheim, Medical Ethics Commission II) (ID/reference number: 2014-540 N-MA). UCBM: Università Campus Bio-Medico di Roma, Comitato Etico (University Campus Bio-Medical Ethics Committee of Rome) (ID/reference number: 18/14 PAR ComET CBM).
Consent for publication
JT is a paid consultant to F. Hoffmann-La Roche Ltd. TC has served as a paid consultant to F. Hoffmann-La Roche Ltd. and Servier and has received royalties from Sage Publications and Guilford Publications. TB served in an advisory or consultancy role for Lundbeck, Medice, Neurim Pharmaceuticals, Oberberg GmbH, Takeda, and Infectopharm. He received conference support or speaker’s fee by Lilly, Medice, and Takeda. He received royalties from Hogrefe, Kohlhammer, CIP Medien, Oxford University Press; the present work is unrelated to these relationships. AM-L has received consultant fees in the past two years from Boehringer Ingelheim, Elsevier, Lundbeck Int. Neuroscience Foundation, Lundbeck AS, The Wolfson Foundation, Thieme Verlag, Sage Therapeutics, von Behring Stiftung, Fondation FondaMental, Janssen-Cilag GmbH, MedinCell, Brain Mind Institute, CISSN. Furthermore, he has received speaker fees from Italian Society of biological Psychiatry, Merz-Stiftung, Forum Werkstatt Karlsruhe, Lundbeck SAS France, BAG Psychiatrie Oberbayern. JB has been in the past 3 years a consultant to/member of advisory board of/and/or speaker for Takeda/Shire, Roche, Medice, Angelini, Janssen, and Servier. He is not an employee of any of these companies, and not a stock shareholder of any of these companies. He has no other financial or material support, including expert testimony, patents, royalties. EL is an Associate Editor at Molecular Autism. DM has been paid for advisory board work by F. Hoffmann-La Roche Ltd. and Servier, and for editorial work by Springer. The other authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Meyer-Lindenberg, H., Moessnang, C., Oakley, B. et al. Facial expression recognition is linked to clinical and neurofunctional differences in autism.
Molecular Autism13, 43 (2022). https://doi.org/10.1186/s13229-022-00520-7