Towards robust and replicable sex differences in the intrinsic brain function of autism

Background Marked sex differences in autism prevalence accentuate the need to understand the role of biological sex-related factors in autism. Efforts to unravel sex differences in the brain organization of autism have, however, been challenged by the limited availability of female data. Methods We addressed this gap by using a large sample of males and females with autism and neurotypical (NT) control individuals (ABIDE; Autism: 362 males, 82 females; NT: 409 males, 166 females; 7–18 years). Discovery analyses examined main effects of diagnosis, sex and their interaction across five resting-state fMRI (R-fMRI) metrics (voxel-level Z > 3.1, cluster-level P < 0.01, gaussian random field corrected). Secondary analyses assessed the robustness of the results to different pre-processing approaches and their replicability in two independent samples: the EU-AIMS Longitudinal European Autism Project (LEAP) and the Gender Explorations of Neurogenetics and Development to Advance Autism Research. Results Discovery analyses in ABIDE revealed significant main effects of diagnosis and sex across the intrinsic functional connectivity of the posterior cingulate cortex, regional homogeneity and voxel-mirrored homotopic connectivity (VMHC) in several cortical regions, largely converging in the default network midline. Sex-by-diagnosis interactions were confined to the dorsolateral occipital cortex, with reduced VMHC in females with autism. All findings were robust to different pre-processing steps. Replicability in independent samples varied by R-fMRI measures and effects with the targeted sex-by-diagnosis interaction being replicated in the larger of the two replication samples—EU-AIMS LEAP. Limitations Given the lack of a priori harmonization among the discovery and replication datasets available to date, sample-related variation remained and may have affected replicability. Conclusions Atypical cross-hemispheric interactions are neurobiologically relevant to autism. They likely result from the combination of sex-dependent and sex-independent factors with a differential effect across functional cortical networks. Systematic assessments of the factors contributing to replicability are needed and necessitate coordinated large-scale data collection across studies.


Background
Autism spectrum disorder (autism) is characterized by a marked male preponderance in prevalence with three times more males being diagnosed than females [1]. This pronounced sex-differential prevalence implies that sexrelated biological factors are likely implicated in the neurobiology of autism. However, little is known about the differential underlying neural expressions in males and females with autism. Such knowledge could widen our understanding of potential underlying mechanisms of autism and related neurodevelopmental conditions [2].
This has motivated research into the impact of biological sex on brain organization in autism [2][3][4][5]. With the widely accepted view that the neurobiology of autism involves differences in large-scale brain networks [6,7], resting-state functional magnetic resonance imaging (R-fMRI) has proven to be a valuable complementary tool for investigating atypicalities in intrinsic functional connectivity (iFC). While the exact nature of the intrinsic brain organization in autism remains to be established [6], research on the impact of biological sex differences in autism is just beginning to emerge.
Several R-fMRI studies have focused on autism-related sex differences in iFC [2,[8][9][10][11][12][13][14][15][16][17][18]. They vary on the extent of the functional networks and intrinsic properties examined. Most of them examined the strength of iFC between one or more regions/networks selected a priori [8-12, 14, 15], or via data-driven analyses [18]. A few others investigated either local or homotopic iFC across the whole brain [2,13,17]. Across these different efforts, the pattern of findings have also been mixed; some studies supported the predictions from the 'extreme male brain' theory [12,13], whereas others supported the predictions from the 'gender-incoherence' theory [8,11,14,15,18]. The 'extreme male brain' model predicts that brain characteristics in males and females with autism will resemble those in neurotypical males (i.e., shifts towards maleness in both sexes [19]). R-fMRI results consistent with a shift towards maleness in autism were reported in both Ypma et al. [12] and Kozhemiako et al. [13,17]. The 'gender incoherence' model predicts that brain characteristics in females with autism resemble those of neurotypical males, whereas brain characteristics in males with autism resemble those of neurotypical females (i.e., androgynous patterns in the sexes [20]). The 'gender incoherence' model has been supported by findings from prior R-fMRI studies [8,11,14], where the results largely revealed hyper-connectivity in females with autism similar to neurotypical (NT) males and hypo-connectivity in males with autism similar to NT females. Such seemingly inconsistent findings of sex-related differences were in part addressed by Floris et al. [2] who showed that, at least in males with autism, distinct patterns of atypical sex-differentiation coexist, and vary as a function of the neural networks involved. However, the intrinsic brain organization in females with autism has remained largely unclear and the scarce availability of female datasets in most studies may have contributed to the variability in findings in males and females [21,22].
Beside the role of small female samples, prior inconsistencies in autism-related sex differences in R-fMRI can be due to other factors that impact reproducibility. For example, while there are growing concerns on the role of pre-processing strategies [44,45], a recent study showed that their impact on autism-related mean group-differences is minimal [46]. Additionally, while several studies have reported some degree of consistency on R-fMRI findings across either independent, or partially overlapping, samples [41,[47][48][49], results from other studies have raised concerns on the replicability of group-mean diagnostic effects [46,50]. However, none of these studies have explicitly examined robustness and replicability of sex-by-diagnosis interaction effects, which account for a potentially relevant source of variability in autismbiological sex. Thus, we conducted secondary analyses to assess the extent to which the pattern of findings obtained in our discovery analyses were also observed (a) after applying different nuisance pre-processing steps that have been previously validated, though used inconsistently in the autism literature [46], and (b) across two independent, multisite R-fMRI datasets: the EU-AIMS Longitudinal European Autism Project (LEAP) [51,52] and the Gender Explorations of Neurogenetics and Development to Advance Autism Research (GENDAAR) dataset [53]-i.e., robustness and replicability.

Discovery sample: ABIDE I and II
For discovery analyses, we examined the R-fMRI dataset with one of the largest number of females and males in both the autism and the NT groups available to date, selected from the Autism Brain Imaging Data Exchange (ABIDE) repositories ABIDE I and II [22,23]. The final ABIDE I and II dataset of N = 1019 included N = 82 females with autism, N = 362 males with autism, N = 166 neurotypical females (NT F), and N = 409 neurotypical males (NT M), aggregated across 13 sites. Specific selection criteria are described in Supplementary Methods in the Additional file 1 and depicted as a figure in the Additional file 1: Figure S1. Briefly, we selected cases between 7 and 18 years of age (the ages most represented across ABIDE sites), with MRI data successfully completing brain image co-registration and transformation to standard space, with FIQ between 70 and 148 and with mean framewise displacement (mFD) [54] within three times the interquartile range (IQR) + the third quartile (Q3) of the sample (i.e., mFD 0.39 mm). Further steps included matching for mean age across groups, as well as for mFD and IQ within diagnostic groups. This latter step limited the number of exclusions while keeping average group motion low (mFD < 0.2 mm) and sampling biases that may result when matching neurodevelopmental conditions to NT around intrinsic features such as IQ [55,56]. At each step, any sites with less than three individual datasets per diagnostic/sex groups were excluded. Demographics and characteristics of this sample are summarized in Table 1 and in Supplementary Methods in the Additional file 1.

Discovery analysis pre-processing pipeline
We examined five whole-brain R-fMRI metrics previously reported to reflect typical sex differences and found to be atypical in autism, including (1) PCC-iFC, (2) VMHC, (3) ReHo, (4) DC and (5) fALFF (see Additional file 1: Supplementary Methods). R-fMRI image pre-processing steps included: slice time correction, 24 motion parameters regression [57], component-based noise reduction (CompCor) [58], removal of linear and quadratic trends, and band-pass filtering (0.01-0.1 Hz, for all metrics but fALFF). Functional-to-anatomical co-registration was achieved by Boundary Based Registration (BBR) using FSL FLIRT [59]. Linear and nonlinear spatial normalization of functional echo planar images (EPIs) to Montreal Neurological Institute 152 (MNI152) stereotactic space (2 mm 3 isotropic) was done using ANTS registration (Advanced Neuroimaging Tools) [60]. Computation of voxel-mirrored homotopic connectivity (VMHC) followed registration to a symmetric template. All R-fMRI derivatives were smoothed by a 6 mm FWHM Gaussian kernel. To account for site and collection time variability across each of the data collections in ABIDE I and II data repositories, site effects were removed using the ComBat function available in python [61] (https ://githu b.com/ brent p/comba t.py). This approach has been shown to effectively account for scanner-related variance in multisite R-fMRI data [61]. For further details see Additional file 1: Supplementary Methods.

Discovery group-level analyses
Statistical Z-maps were generated within study-specific functional volume masks including all voxels in MNI space present across all subjects. Main effects of diagnosis and sex along with their interaction were explored by fitting a general linear model (GLM) including diagnosis or/and sex as the regressors of interest respectively, and age and mean framewise displacement (mFD) [54] as nuisance covariates. In primary analyses, we did not include FIQ as a covariate as this is thought to be suboptimal when comparing groups selected from populations carrying intrinsic IQ differences such as autism and NT [56]. Nevertheless, to provide an indication as to whether IQ may affect primary findings, in supplementary Table 1 Characterization of sample merged across ABIDE I and II Cluster masks are overlaid on inflated brain maps generated by BrainNet Viewer ABIDE Autism Brain Imaging data exchange, ADI-R Autism Diagnostic Interview-Revised, ADOS-2 Autism Diagnostic Observation Schedule-2, ASD Autism Spectrum Disorder, CSS Calibrated Severity Score, F females, IQ intellectual quotient, M males, Mean FD mean framewise displacement [54]; NT neurotypical, RRB restricted repetitive behaviors a ABIDE I data collections: KKI, Leuven2, NYU, OHSU, Pitt, SDSU, Stanford, UCLA1, UM1, and Yale. ABIDE II data collections: ABIDEII-GU1, ABIDEII-KKI1, ABIDEII-KKI2, ABIDEII-NYU1, ABIDEII-OHSU1, ABIDEII-SDSU1, ABIDEII-UCD1 and ABIDEII-UCLA1. KKI and ABIDEII-KKI1, NYU and ABIDEII-NYU1, SDSU and ABIDEII-SDSU1, OHSU and ABIDEII-OHSU1 and UCLA1 and ABIDEII-UCLA1 were merged into one site across ABIDE I and ABIDE II collections b FIQ was available for 362 males with ASD (2 missing from UM1, ABIDEII-SDSU1),

Mean (SD) [Range] Mean (SD) [Range] Mean (SD) [Range] Mean (SD) [Range]
Age 13 analyses FIQ was also included as an additional nuisance regressor. We applied gaussian random field theory correction based on strict voxel-level threshold of Z > 3.1 as recommended by [62] and cluster level P < 0.01, given the assessment of five R-fMRI metrics in the same study (i.e., P 0.05/5 R-MRI metrics = 0.01).

Functional relevance of sex differences in autism
Post-hoc analyses were conducted to functionally characterize the sex-by-diagnosis interaction result(s). First, to explore the cognitive domains implicated in the cluster(s), we quantified the percentage of its overlap with 12 cognitive ontology maps [63] thresholded at P = 1e−5.
We labelled these components based on the top five tasks each component recruits [2]. Second, we used the Neurosynth Image Decoder (http://neuro synth .org/decod e/) [64] to visualize the terms most strongly associated with the significant cluster. After excluding anatomical (e.g., occipital) and redundant terms (synonyms [e.g., saccades and eye movements], plurals [e.g., object and objects] or noun/adjective/adverb equivalents [e.g., vision and visual]), we visualized the top 27 terms showing correlations with the cluster map between r = 0.64 and r = 0.10. Third, to explore potential clinical relevance of the significant cluster, we explored brain-behavior relationships as a function of sex within the autism group. Specifically, we ran a GLM examining the interaction between biological sex and available ADOS calibrated severity total score (CSS) [65], as well as social-affect (SA) and restricted, repetitive behavior (RRB) subscores (see Additional file 1: Supplementary Methods) with the dependent variable(s) being the R-fMRI metric(s) extracted from the cluster mask(s) showing a statistically significant sex-by-diagnosis effect(s).

Robustness and replicability Robustness
We assessed whether patterns of results from the discovery analyses were observable with two other nuisance regression analytical pipelines that include commonly used data preprocessing steps. One pipeline included global signal regression (GSR) [66] which has often been used in autism studies; the other included Independent Component Analysis-Automatic Removal of Motion Artifacts (ICA-AROMA) [67] which is a relatively novel but increasingly utilized approach [46]. Given the scope of the present study, unlike prior work focusing on a wide range of individual preprocessing pipelines [46], we selected GSR and ICA-AROMA as examples of previously validated approaches thought to have impact on motion and physiological noise [45]. To assess robustness of the results observed in discovery analyses, following the voxel-level GLM, we extracted means from the masks corresponding to the same clusters that showed significant effects. These values were averaged across all the voxels in the cluster mask for a given R-fMRI metric. We used them to implement a full regression model including the predictors of interest (sex, diagnosis and their interaction), as well as age and mFD as nuisance regressors and compute effect sizes as partial eta squared (η p 2 ) and their confidence intervals using the R-package 'effectsize' . For visualization purposes we also used regressions (including sex, diagnosis, sex-by-diagnosis interaction, age, and mFD) to obtain the residuals of these mask-averaged values.

Replicability
Similarly, we assessed whether the group patterns observed in significant clusters identified in discovery analyses were observed in two relatively large-scale, independent datasets selected from (a) the EU-AIMS Longitudinal European Autism Project (LEAP), a large multi-site European initiative aimed at identifying biomarkers in autism [51,52] and (b) the Gender Explorations of Neurogenetics and Development to Advance Autism Research (GENDAAR) dataset collected by the GENDAAR consortium and shared in the National Database for Autism Research [53]. For details on autism and NT inclusion and exclusion criteria for these samples, as well as our selection process, see Additional file 1: Supplementary Methods [52,53]. The resulting EU-AIMS LEAP (N = 309) R-fMRI datasets comprised N = 133 males and N = 43 females with autism as well as N = 85 NT males, and N = 48 NT females (see Additional file 1: Table S1); resulting GENDAAR (N = 196) R-fMRI datasets comprised N = 43 males and N = 44 females with autism, as well as N = 56 NT males and N = 53 NT females (see Additional file 1: Table S2). For a comparison of demographic and clinical information between ABIDE, EU-AIMS LEAP and GENDAAR, see Table S3, S4 and S5 in the Additional file 1. After applying the same ComBat and pre-processing pipeline as used in the ABIDE-based discovery analyses, we extracted each of the R-fMRI metrics means from the Z-maps. As for robustness, we extracted values for each R-fMRI metric from the masks corresponding to the clusters showing statistically significant effects in discovery analyses and computed the corresponding effect sizes and residuals using the same methods described above.
For both robustness and replicability, discovery findings were determined to be robust and/or replicable (R+) based on two criteria: (1) the group mean difference(s) observed were in the same direction as those identified in the findings from discovery analyses [68] and (2) their effects were not negligible as defined by partial eta squared (i.e,.η p 2 ≥ 0.01 any small, medium or large effects were R+) which is also consistent with prior work [41]. Finally, for consistency across analyses we also computed cluster-level effect sizes of the discovery findings using the same approach described above.

Discovery analyses: ABIDE Main effects of diagnosis
Analyses revealed a total of seven clusters showing a significant effect of diagnosis (voxel-level Z > 3.1; cluster-level P < 0.01, corrected) for three of the five R-fMRI metrics: PCC-iFC (three clusters), VMHC (two clusters) and ReHo (two clusters); Fig. 1 Figure S2 and Table S6).
Autism-related hyper-connectivity was only evident for PCC-iFC with left superior lateral occipital cortex, temporal occipital fusiform cortex and occipital fusiform gyrus (Additional file 1: Figure S2). These results remained essentially unchanged when additionally controlling for FIQ (Additional file 1: Figure S3). Further, to verify that these findings were not driven by particular acquisition site(s), post-hoc analyses computed group means for diagnostic subgroups for the R-fMRI metrics extracted at the cluster-level masks excluding one out of the 13 ABIDE sites at a time. The pattern of results was essentially unchanged (Additional file 2: Figure S5a).

Main effects of sex
Analyses revealed clusters showing statistically significant main sex differences (voxel-level Z > 3.1; cluster-level P < 0.01, corrected), again for three R-fMRI metrics out of five in a total of 10 clusters: PCC-iFC (five clusters), VMHC (three clusters), and ReHo (two clusters). Findings involved lateral and medial portions of the DN with bilateral PCC and precuneus showing the highest overlap ( Fig. 1 and Additional file 1: Figure S2). Specifically, Fig. 1 Overlap across R-fMRI metrics for main effects of diagnosis and sex. Upper panel: the surface inflated maps depict the extent of overlap across clusters showing significant main effects of diagnosis (left) and sex (right) across any of three resting state fMRI (R-fMRI) metrics showing statistically significant effects (Z > 3.1, P < 0.01). Purple clusters represent areas of significant group differences emerging for only one of any of the three R-fMRI measures, orange and yellow clusters indicate measures with overlap among 2 and 3 R-fMRI measures (see Additional file 1: Figure S2 for statistical maps of main effects for each R-fMRI metric). Cluster masks are overlaid on inflated brain maps generated by BrainNet Viewer. Lower panel: For each of the yellow and orange clusters in panel A, the table lists the cluster's anatomical label based on the Harvard Oxford atlas, the specific R-fMRI metrics involved, and the group difference direction (ASD < NT or M < F in blue, ASD > NT or M > F in red). L left hemisphere, R right hemisphere, PCG/FP paracingulate cortex/frontal pole, ACC anterior cingulate cortex, PCC/Prec posterior cingulate cortex/precuneus, ASD autism spectrum disorder, NT neurotypical, M males, F females regardless of diagnosis, relative to females, males showed decreased PCC-iFC with paracingulate cortex and frontal pole, right middle frontal gyrus, bilateral superior lateral occipital cortex and bilateral PCC and precuneus. Males also showed decreased VMHC and ReHo localized in PCC and precuneus. Decreased ReHo was also evident in the left angular gyrus and lateral occipital cortex in females relative to males (Fig. 1, Additional file 1: Figure S2 and Table S6). These results remained essentially unchanged when additionally controlling for FIQ (Additional file 1: Figure S3). Post-hoc analyses assessing the consistency of these findings across sites, as described above, revealed a similar pattern of results (Additional file 3: Figure S5b).

Sex-by-diagnosis interaction effect
Statistically corrected voxel-wise analyses (voxel-level Z > 3.1; cluster-level P < 0.01) revealed one cluster of significant sex-by-diagnosis interaction only for VMHC which was localized in the dorsolateral occipital cortex ( Fig. 2a and Additional file 1: Table S6). Post-hoc clusterlevel group means showed that NT females had higher VMHC than the three other groups, whereas autism females had lower VMHC than the three other groups (Fig. 2a). Similar to the main effects, results remained essentially unchanged when additionally controlling for FIQ (Additional file 1: Figure S3). Analyses assessing the consistency of these findings across sites, as described above, showed a similar pattern of results (Additional file 1: Figure S4).

Functional relevance of autism-related sex differences
Post-hoc analyses to functionally characterize this VMHC sex-by-diagnosis interaction indicated that the VMHC cluster in superior lateral occipital cortex overlapped with cognitive maps involved in higher-order visual, oculomotor, cognitive flexibility and language-related processes (Fig. 3a). Further, as shown in Fig. 3b, the most common terms were primarily related to lower-order visual processing and higher-order visual cognition, such as 'visuospatial' and 'spatial attention. ' To explore potential clinical relevance of the VMHC dorsolateral occipital cluster, we explored brain-behavior relationships as a function of sex, within the autism group using three available ADOS scores (calibrated severity total score, and non-calibrated social affect and RRB subscores; see Additional file 1: Supplementary Methods). Although not surviving a strict Bonferroni correction for multiple testing (i.e., 0.05/3 = 0.02), an interaction effect was observed for ADOS social affect scores. It revealed that more severe social affect deficits (F (1,311) = 4.44, p = 0.036) were associated with decreased VMHC in females with autism (r = − 0.29), but not in males with autism (r = 0.03).
Given that ABIDE data were aggregated and released when calibrated social affect scores [69] were not available to assess potential differences in language abilities and age, analyses were repeated after including ADOS module (ADOS Module 2 to 4) as a nuisance covariate: results remained unchanged (F (1,306) = 5.0, p = 0.026) as they did also after removing the few data with the less represented ADOS module 2 (see Additional file 1: Supplementary Methods). There were no significant findings, with regard to the CSS total score and non-calibrated RRB sub-score (Fig. 3c), even at an exploratory statistical threshold of P < 0.05.

Robustness
The same pattern of results identified in discovery analyses was observed when the datasets were pre-processed using GSR or ICA-AROMA, across the three R-fMRI metrics in all the clusters identified in the primary analyses across main effects of diagnosis, sex and their interaction; effect size ranges from small to moderate as in discovery analyses (η p 2 range = 0.01-0.07; Fig. 2b, Fig. 4, Additional file 1: Table S6 and Additional file 3: Figure  S6).

Main effects of diagnosis
Main effects of diagnosis showed higher replicability (i.e., non-negligible η p 2 effects showing a similar group mean pattern as observed in discovery analyses) in GENDAAR than in EU-AIMS LEAP. Specifically, across the three R-fMRI metrics that showed significant diagnostic group differences in discovery analyses, six of the seven clusters (86%) in GENDAAR were replicated (η p 2 range = 0.01-0.04); only two of those seven (29%) were replicated in EU-AIMS LEAP (η p 2 range = 0.01-0.04). Nevertheless, clusters showing decreased ReHo in ASD versus NT across the insula and central operculum, as well as in the frontal pole were replicated across all samples (Fig. 4, Additional file 1: Table S6 and Additional file 3: Figure  S7a).

Main effects of sex
Across all three R-fMRI metrics, the main effects of sex observed in the discovery analyses were evident in both independent samples, for most clusters in the EU-AIMS LEAP (80%; 8/10) with effects size ranging from small to moderate (η p 2 range = 0.01-0.06) and for half of the clusters in GENDAAR (50%; 5/10), albeit with small effects (η p 2 range = 0.01-0.02) (Fig. 4, Additional file 1: Table S6 and Additional file 3: Figure S7b). Notably, the pattern of typical sex differences localized along the default network midline (i.e., decreased VMHC, ReHo and PCC-iFC) was replicated across both independent samples (Fig. 4b).

Sex-by-diagnosis interaction effect
The pattern of autism-related VMHC sex differences observed in discovery analyses in the superior lateral occipital cortex was observed in the EU-AIMS LEAP dataset, (η p 2 = 0.01) (Fig. 2c, Fig. 4, Additional file 1: Table S6). In the GENDAAR dataset, while group means in males with autism, NT males and NT females showed a similar direction as in the ABIDE discovery For all graphs VMHC data are shown as residuals obtained after regressing out mean framewise displacement and age effects. L left, R right, A anterior, P posterior findings, females with autism differed in magnitude and in the direction of group differences, resulting in a negligible effect with η p 2 < 0.01 (Fig. 2c).

Discussion
We examined autism-related sex differences for intrinsic functional brain organization across multiple R-fMRI metrics in a large discovery sample of males and females with autism relative to age-group matched NT selected from the ABIDE repositories [22,23]. Analyses revealed significant main effects of sex and diagnosis across intrinsic functional connectivity (iFC) of the posterior cingulate cortex, regional homogeneity and voxel-mirrored homotopic connectivity (VMHC) in several cortical regions. Notably, main effects converged along the midline of the default network. In contrast, sex-by-diagnosis interactions were limited to VMHC in the superior lateral occipital cortex. Placed in the context of sex and diagnostic main effects on interhemispheric homotopic connectivity in cortical regions, this result suggests that atypical interhemispheric interactions are pervasive in autism but reflect a combination of sex-independent (i.e., main effect of diagnosis common across sexes) and sex-dependent (i.e., sex-by-diagnosis interaction) effects, each specific to a different functional cortical system. This sex-by-diagnosis interaction effect was robust to distinct pre-processing strategies as those observed for main effects. Further, despite the lack of a priori harmonization for data acquisition among the three samples, this finding was replicable in the larger of the two independent samples (i.e., EU-AIMS LEAP). On one hand, this, together with largely replicable main effects of sex with variable replicability of main diagnostic effects by sample, suggests that inter-sample replicability of R-fMRI can be feasible in autism when sources of variability in diagnostic groups are accounted for in samples sized properly to address such variability. On the other hand, our results highlight the urgent need to obtain multiple harmonized datasets properly powered to systematically address and understand sources of heterogeneity, including and beyond the role of biological sex.

Sex-dependent and sex-independent atypical interhemispheric interactions in autism
VMHC reflects inter-hemispheric homotopic relations. The strength has been suggested to index coordinated cross-hemispheric processing: stronger VMHC indexes weaker hemispheric specialization and vice versa [33,70]. Several lines of evidence support the notion that the neurobiology of autism is related to atypical hemispheric interactions, including homotopic connectivity and hemispheric lateralization [35,[71][72][73][74][75][76][77][78][79][80]. VMHC and functional hemispheric lateralization have also been shown to be sex-differential in NT [33,81,82]. The dorsolateral occipital association cortex identified in our discovery analyses is known to serve hemispherically specialized processes, such as visuospatial coordination [83]. Thus, our findings of NT males' VMHC in dorsolateral occipital cortex being lower than that of NT females are consistent with the notion of increased hemispheric lateralization in this cortical region in NT males relative to NT females. In our data, females with autism instead showed even lower VMHC than NT males, while males with autism showed slightly higher VMHC than NT males. This pattern is indicative of 'gender-incoherence' [20] as males and females with autism display the opposite pattern expected in NT per their biological sex. Findings of 'gender incoherence' have been reported in earlier neuroimaging studies of autism using different modalities [3,84,85]. Among them, several R-fMRI studies explicitly focusing on detecting sex-by-diagnosis interactions (i.e., the regression model included a sex-bydiagnosis interaction term) [3,11] yielded a pattern of results consistent with 'gender incoherence. ' In contrast, other studies [12][13][14] reported a pattern consistent with the 'extreme male brain' model [19]-i.e., a shift towards maleness in both females and males with autism. While the seemingly diverging conclusions of these two sets of studies may be attributed to methodological differences, such as the extent of brain networks explored and the statistical modelling employed, findings from our prior work suggest that both shifts towards either maleness or femaleness co-occur in the intrinsic brain of males with autism, in a network-specific manner [2]. However, such prior work did not include female data. Thus, by not directly assessing sex-by-diagnosis interactions, unlike the present study, results could not point to patterns affecting diagnostic differences between the sexes versus those that are common to autism across sexes [4]. This is relevant for efforts focusing on identifying underlying mechanisms. Findings resulting from sex-by-diagnosis interactions may shed light on sex-differential mechanisms that are atypical in autism and may reflect sexspecific susceptibility mechanisms. On the other hand, atypicalities common for both sexes may reflect factors central to the emergence of autism, regardless of whether they overlap with patterns known to be differential between sexes [86]. Interestingly, a recent study based on a sample selected from GENDAAR [16] revealed that the iFC between the nucleus accumbens (selected a priori) and a region of the dorsolateral occipital cortex partially overlapping with that identified by our VMHC analyses, was differentially modulated by the aggregate number of oxytocin receptor risk alleles in females with autism versus NT females and versus males with autism. Although VMHC was not directly tested in the said study [16], its result in dorsolateral occipital cortex is consistent with our observation of atypical sexual differentiation of this visual network region and, together, suggest the need for future whole-brain studies of oxytocin effects in autism. Along with the sex-dependent autism patterns, our analyses found statistically significant main effects of diagnosis in inter-hemispheric interactions indexed by VMHC in distinct cortical circuits. These were localized along the midline of the DN (paracingulate/frontal cortex consistently and PCC/precuneus) where main effects of PCC-iFC and ReHo also converged. Our results are consistent with prior reports of atypical intrinsic organization of the DN in autism [12,23,26,[87][88][89]. Together they support a common, sex-independent role of DN in autism. This is also highlighted by a recent autism neurosubtyping study that identified three latent iFC factors, all sharing DN atypicalities along with their neurosubtype-specific patterns [90]. Building on this evidence to disentangle the specific role of each of the factors affecting autism in sex-independent and sex-dependent ways, a necessary next step is to engage in novel large-scale data collection efforts including more female data.

Robustness, replicability and sources of variability
The growing awareness of the replication crisis in neuroscience [91][92][93] motivated our analyses examining robustness and replicability of findings. While a comprehensive and systematic reproducibility assessment is beyond the scope of the present study, here we focused on examining whether the findings observed in the discovery analyses were also seen after using different preprocessing pipelines-robustness-as well as in fully independent, albeit of convenience samples (i.e., not harmonized a priori with each other)-replicability. To this end, given the lack of consensus on quantitative metrics of replicability, we opted to use measures of effect size. These are considered complementary to null hypothesis significance testing [94]. In the context of this study, given the use of convenience samples of different sizes, their selection was considered an advantageous and practical means to provide information on the magnitude of group differences in diagnosis and sex, as well as their interaction. Here, we considered findings to be robust and/or replicable for any non-negligible effects (i.e., η p 2 ≥ 0.01 [95]). We reasoned that given their distributed and heterogenous nature [6], atypicalities in the autism connectome can stem from a combination of differently sized non-negligible effects, as shown for autism in other biological domains such as genetics [96,97].
With this in mind, across the two preprocessing methods examined here, the patterns of findings were consistent with those observed in discovery analyses across all R-fMRI metrics and effects. These robust results are consistent with a prior study by He and colleagues [46] reporting that differences in a wider range of pre-processing pipelines have marginal effects on variation in diagnostic group average comparisons. Our study confirms and builds on this earlier report by extending findings of robustness to sex group mean differences and their interactions with diagnosis.
A more nuanced picture emerged from the intersample analyses as replicability varied by sample, across the effects and R-fMRI metrics examined. Specifically, while inter-sample main effects of sex were moderately to largely replicable across R-fMRI metrics on both independent samples (~ 50 to 80% of the clusters in GENDAAR and EU-AIMS-LEAP, respectively), replicability of diagnostic effects significantly varied by sample (86 to 29%) across R-fMRI metrics. This is at least in part consistent with findings by King et al. [50] who showed that, depending on the R-fMRI feature examined, diagnostic group differences varied across samples. Even in this scenario, King et al. [50] also reported that findings of decreased homotopic connectivity in autism were relatively more stable than other R-fMRI metrics. This observation, combined with the replicability of our VMHC sex-by-diagnosis interaction findings in the larger of the two independent samples (EU-AIMS LEAP), suggests that measures of homotopic connectivity may have specific biological relevance for autism. It is also possible that given the moderate to high test-retest reliability, VMHC is more suitable in efforts assessing replicability [98,99].
The striking clinical and biological heterogeneity in autism should be considered as a major contributor to discrepancies in findings of studies focusing on the main effects of diagnostic group means contrasts/interactions [100][101][102][103]. Against this background, we interpret our replicability findings on diagnostic effects and, in turn, diagnosis-by-sex interactions. Inter-sample differences may have contributed to the more variable results of replicability on the diagnosis main effects. These may include autism symptom level, age, and IQ, albeit secondary analyses suggested that the examined IQ range did not substantially affect the pattern of discovery results. For example, the EU-AIMS LEAP sample was on-average older, had lower VIQ and most notably, lower symptom severity across all subscales of the ADOS and ADI-R than the ABIDE sample. On the other hand, the GENDAAR sample (which has greater number of replicable diagnostic mean group patterns) did not differ from ABIDE in these variables, except for mean age. Furthermore, a fact that is often neglected, is that the NT groups may also present with considerable sample heterogeneity between studies [100,104]. For instance, our NT controls in the EU-AIMS LEAP sample had lower VIQ than both ABIDE and GENDAAR NT controls. This has potentially influenced the low replicability of diagnosis main effects in EU-AIMS LEAP.
In contrast, sex-by-diagnosis interaction effect on VMHC in the dorsolateral occipital cortex was replicable in the larger sample, the EU-AIMS LEAP, but not in GENDAAR. Small samples introduce larger epistemic variability (i.e., greater variation related to known and unknown confounds) [105]. Increasing the number of subjects/data allows mitigating epistemic variability and, thus, capturing the underlying variability of interest. Thus, although the rate of replicability for the main effect of diagnosis in EU-AIMS LEAP was limited, accounting for biological sex, a known key source variability in autism, may have substantiated a replicable sex-by-diagnosis pattern in this larger sample. In line with sample size concerns, using four datasets sized between 36 and 44 individuals selected from the ABIDE repository, He et al. [46] found low similarity rates of diagnostic grouplevel differences on the strength of iFC edges in contrast with the largely similar pattern of results across pipelines. Of note, unlike prior efforts [46,50], we controlled for site effects within each of the samples (i.e., ABIDE, GENDAAR and EU-AIMS LEAP), using ComBat. Future large-scale harmonized data collections are needed to control and assess the impact of inter-sample variability. Taken together, these findings highlight that sample differences can impact replicability.
Beyond clinical and biological sources of variation, samples may differ in MRI acquisition methods, as well as in approaches used to mitigate head motion during data collection and its impact on findings [106]. Adequately controlling for head motion remains a key challenge for future studies assessing inter-sample replicability. For the present study, we excluded individuals with high motion, retained relatively large samples with group average low motion (mean ± standard deviation of mFD range = 0.09-0.16 ± 0.06-0.10 mm), as well as included mFD at the second-level analyses as a nuisance covariate. Overall, the extent to which each sample-related factor affects replicability needs to be systematically examined in future well-powered studies. Only this type of studies will allow for emerging subtyping approaches to dissect heterogeneity by brain imaging features using a range of data-driven methods [107,108], including normative modelling [72,109,110].
Inter-sample differences and methodological differences, beyond nuisance regression, may have contributed to some differences in findings between the present and earlier studies, conducted with independent or partially overlapping samples [11,23,41]. For example, Alaerts et al. [11] also examined sex-by-diagnosis interaction in PCC-iFC in a dataset selected from ABIDE I only. Although their pattern of results was consistent with the 'gender incoherence' model, the resulting circuit(s) did not involve the dorsolateral occipital cortex as identified with VMHC in the present study. Along with differences in samples selected from the same data repositories, other methodological choices may also affect results.
For example, prior studies differed with the present one in the inclusion of sex-by-diagnosis interaction [17], the extent of the whole-brain voxel-based analyses [15,16], or the statistical threshold utilized [23]. Nevertheless, it is remarkable that even in light of these differences, consistent results have emerged including the overarching atypical inter-hemispheric interactions in autism, and sex-dependent and sex-independent atypical intrinsic brain function across distinct functional networks.

Limitations
Along with the inter-sample differences resulting from the lack of sufficiently available harmonized multi-site replication datasets in the field, other limitations of this study should be addressed in future efforts. One regards the lack of measures differentiating the effects of sex versus those of gender so as to disentangle their relative roles (e.g., gender-identity and gender-expression) in the intrinsic brain properties [111]. Further, in-depth cognitive measures to directly characterize the role of VMHC findings were not available. Additional behavioral measures are needed to establish whether our result in VMHC of the dorsolateral occipital cortex mainly applies to low-level (bottom-up) visual processing differences or higher-level (top-down) attentional/controlled processes in males and females with autism. As a neurodevelopmental condition, autism shows striking inter-individual differences in clinical and developmental trajectories, as well as outcomes. Thus, age may influence symptom presentation [112,113] and neurobiology [13,110,114]. Despite the considerable size of the samples available for this study, it is still difficult to sufficiently cover a broad age range across both males and females and diagnostic groups across contributing sites, and to evaluate age effects appropriately. Even larger crosssectional samples are needed to derive meaningful agerelated information that ultimately requires confirmation in longitudinal studies. Such longitudinal studies would allow to examine the potential impact of puberty and related surge of sex steroid hormones reported in NT boys and girls [115], in autism specifically. Here, we have no direct measure of puberty other than age, but future studies should aim to include such measures. Further, the value of a large-scale and publicly available multi-site resource such as ABIDE also comes with unavoidable site differences which must be considered in data selection, analyses and interpretation of results. Although residual site-related effects may have remained in findings even after using the novel Bayesian approach for correcting for batch-effects, replicability in independent samples suggest that effects are not simply driven by site variability. These results are consistent with earlier reports of reproducible imaging biomarkers even when accounting for inter-site differences in multisite datasets such as the ABIDE I repository [47]. Finally, despite the advantages of effect sizes over p-value when comparing independently collected samples of different sizes and potentially different variances, it is important to acknowledge that they are not without limitations [116] and, thus, should be interpreted with caution. Similar to p values, they are influenced by sample sizes and have the equivalent risks of p-hacking. Finally, the standard errors or effect sizes can be large-a concern we addressed by reporting their confidence intervals.

Conclusions
The present work revealed sex differences in the intrinsic brain of autism, particularly in dorsolateral occipital interhemispheric interactions, which were robust to pre-processing pipeline decisions and replicable in the larger of the two independent samples. While differences in nuisance regression pipelines have little influence on the consistency of findings, sample heterogeneity represents a challenge for replicability of findings. Lateralized cognitive functions and crosshemispheric interactions should be further explored in relation to sex differences in autism while addressing this challenge with future harmonized data acquisition efforts with even larger samples.