Placebo response in pharmacological and dietary supplement trials of autism spectrum disorder (ASD): systematic review and meta-regression analysis

Background Placebo response in autism spectrum disorder (ASD) might dilute drug-placebo differences and hinder drug development. Therefore, this meta-analysis investigated placebo response in core symptoms. Methods We searched ClinicalTrials.gov, CENTRAL, EMBASE, MEDLINE, PsycINFO, WHO-ICTRP (up to July 8, 2018), and PubMed (up to July 4, 2019) for randomized pharmacological and dietary supplement placebo-controlled trials (RCTs) with a minimum of seven days of treatment. Single-group meta-analyses were conducted using a random-effects model. Standardized mean changes (SMC) of core symptoms in placebo arms were the primary outcomes, and placebo positive response rates were a secondary outcome. Predictors of placebo response were investigated with meta-regression analyses. The protocol was registered with PROSPERO ID CRD42019125317. Results Eighty-six RCTs with 2360 participants on placebo were included in our analysis (87% in children/adolescents). The majority of trials were small, single-center, with a duration of 8–12 weeks, and published after 2009. Placebo response in social-communication difficulties was SMC = − 0.32, 95% CI [− 0.39, − 0.25], in repetitive behaviors − 0.23 [− 0.32, − 0.15], and in scales measuring overall core symptoms − 0.36 [− 0.46, − 0.26]. Overall, 19%, 95% CI [16–22%], of participants were at least much improved with placebo. Caregiver (vs. clinician) ratings, lower risk of bias, flexible dosing, larger sample sizes and number of sites, less recent publication year, baseline levels of irritability, and the use of a threshold of core symptoms at inclusion were associated with a larger placebo response in at least one core symptom domain. Limitations Only about 40% of the trials had an apparent focus on core symptoms. Investigation of the differential impact of predictors on placebo and drug response was impeded by the use of diverse experimental interventions with essentially different mechanisms of action.
An individual-participant-data meta-analysis could allow for a more fine-grained analysis and provide more informative answers. Conclusions Placebo response in ASD was substantial and predicted by design- and participant-related factors, which could inform the design of future trials in order to improve the detection of efficacy in core symptoms. Potential solutions could be the minimization and careful selection of study sites as well as rigorous participant enrollment and the use of measurements of change not solely dependent on caregivers.


Keywords: Autism spectrum disorder, Placebo, Trials

Background
Autism spectrum disorder (ASD) is a group of heterogeneous neurodevelopmental conditions, characterized by social-communication difficulties as well as repetitive-restricted behaviors and sensory abnormalities [1]. The prevalence is about 1–2% [2, 3], and lifetime costs are substantial (at US $1.4–2.44 million per individual) [4]. Behavioral interventions are the cornerstone of treatment, and there is still no approved medication for the core symptoms [5]. Despite that, about half of the individuals with ASD, who might be more susceptible to side effects than neurotypical populations [5], receive psychotropic drugs [6]. Currently approved medications target associated symptoms, e.g., aripiprazole and risperidone for irritability [5]. Therefore, there is an unmet need to develop effective and safe treatments that target causal pathophysiological pathways, improve core symptoms, and improve quality of life.
In spite of the recent advances in "translational" research, late-stage clinical trials for neurodevelopmental disorders have failed [7]. The low success rate could be explained by several factors, such as poor translational validity of preclinical models, true lack of drug efficacy, and suboptimal trial design [8]. One concern is also that placebo effects might dilute effect sizes. However, the magnitude and predictors of placebo response in core symptoms of ASD are still unknown; they have been investigated only in post-hoc analyses of single trials [9, 10] and in meta-analyses using aggregated outcome measures, potentially confounded by associated symptoms [11, 12]. In summary, placebo response may play an important role in the failure of clinical trials and the subsequent lack of approved medications for core symptoms. In order to improve the design and sensitivity of future trials, we meta-analyzed placebo response of core symptoms in pharmacological and dietary supplement ASD trials.

Participants and interventions

Participants
We included participants with a diagnosis of ASD established using standardized diagnostic criteria (e.g., DSM-III, ICD-10, or more recent versions) and/or validated diagnostic tools (e.g., ADI-R) [5]. There were no restrictions in terms of age, sex, ethnicity, setting, severity, or the presence of co-occurring conditions.

Interventions
Any pharmacological treatment or dietary supplement compared with placebo was eligible. We excluded psychological/behavioral and combination interventions (since placebo response might be confounded by the active component of the combination) as well as other interventions (e.g., elimination diets, milk formulations, or homeopathy). The minimum duration of treatment was 7 days, since we aimed to investigate a broad range of data but to exclude trials with a clearly very short duration, e.g., single-dose interventions. There was no restriction in terms of route of administration or dosing schedule.

Type of studies
Blinded and unblinded randomized placebo-controlled trials (RCTs) were eligible. In the case of cross-over studies, we used only data from the first phase of the cross-over to avoid carryover effects [14]. We excluded studies with placebo-controlled discontinuation or cluster randomization [15], studies published before 1980, and studies with fewer than ten participants [16]. Risk of bias of included studies was evaluated by at least two independent reviewers (SS, OC, AR) using the Cochrane Collaboration risk-of-bias tool [17]. Disagreements were resolved by discussion, and if needed, a third author was involved (SL, JST). Studies with a high risk of bias in sequence generation or allocation concealment were excluded (e.g., allocation by alternation or by an unblinded investigator). Studies were further classified as having an overall low, moderate, or high risk of bias [18].

Outcome measures and data extraction
We investigated placebo response in core symptoms. The following primary outcomes, as measured by published scales, were analyzed: (1) social-communication difficulties (e.g., ABC-L/SW [28] or VABS-Socialization [29]), (2) repetitive-restricted behaviors (e.g., ABC-S [28] or CYBOCS-PDD [30]), and (3) overall measures of core symptoms (e.g., SRS [31] or CARS [32]). There is no agreement on the optimal outcome measures to use in clinical trials of ASD, so preference was given to the aforementioned most frequently used scales (Additional file 3: eAppendix-5.3) [5, 33–36]. A higher score indicated more difficulties, and when necessary, scores were minus-transformed. In the primary analysis, we pooled all studies, preferring ratings by clinicians (observations or interviews) to those by caregivers/teachers. As secondary outcomes, we analyzed separate results by type of rater as well as positive response to treatment, defined as at least much improvement in the CGI-I, preferably anchored to global autism or core symptoms (when more than one CGI-I evaluation was reported). When the number of participants with a positive response was not reported, it was imputed from the mean and standard deviation (SD) of the CGI-I using a validated method (Additional file 3: eAppendix-2.2) [37, 38].
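As an illustration of this kind of imputation, the sketch below estimates the number of responders from a reported CGI-I mean and SD under a normality assumption. The cut-point of 2.5 (midway between "much improved" = 2 and "minimally improved" = 3) and the function names are assumptions for illustration, not the exact procedure of the cited references.

```python
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def impute_responders(mean_cgi_i: float, sd_cgi_i: float, n: int,
                      cutoff: float = 2.5) -> int:
    """Estimate the number of CGI-I responders (score <= 2, i.e., at
    least 'much improved') from the reported mean and SD, assuming
    CGI-I scores are approximately normally distributed.
    The cut-point of 2.5 is an illustrative assumption."""
    p = norm_cdf((cutoff - mean_cgi_i) / sd_cgi_i)
    return round(n * p)
```

For example, a placebo arm of 100 participants with a mean CGI-I of 3.0 and an SD of 1.0 would be imputed to have about 31 responders under these assumptions.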
At least two independent reviewers/contributors selected relevant records and extracted data from eligible studies into an Access database (SS, OC, IB, AR, AC, GD, MK, YZ, and TF). Intention-to-treat data were preferred when available; for positive response to treatment, if the original authors presented only the results of the completer population, we assumed that participants lost to follow-up did not have a positive response to treatment. Missing SDs were calculated according to the following hierarchy: from available statistics (e.g., SE, p values, t tests) [39], from medians/ranges [40], by pooling subscales (e.g., SRS subscales, assuming a correlation of 0.5) [41], or using a validated imputation method [39, 42]. Corresponding authors were contacted by e-mail for additional data, with a reminder e-mail in case of no response (complete list in Additional file 3: eAppendix-4).
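Two steps of this hierarchy can be sketched with standard formulas: recovering an SD from a standard error (SD = SE × √n), and the SD of a sum of subscales under an assumed common pairwise correlation (0.5 in the primary analysis). Function names are illustrative.

```python
from math import sqrt

def sd_from_se(se: float, n: int) -> float:
    """Recover a standard deviation from a reported standard error."""
    return se * sqrt(n)

def pooled_subscale_sd(sds, r: float = 0.5) -> float:
    """SD of a sum of subscale scores, assuming a common pairwise
    correlation r between subscales:
    Var(sum) = sum(var_i) + 2 * r * sum_{i<j} sd_i * sd_j
    """
    var = sum(s ** 2 for s in sds)
    for i in range(len(sds)):
        for j in range(i + 1, len(sds)):
            var += 2.0 * r * sds[i] * sds[j]
    return sqrt(var)
```

For instance, an SE of 2 with n = 25 corresponds to an SD of 10, and two subscales with SDs of 3 and 4 pool to √(9 + 16 + 12) = √37 under r = 0.5.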

Statistical analysis

Synthesis of the results
Single-group meta-analyses of placebo arms were conducted using a random-effects model [43]. The effect size for continuous outcomes (core symptoms) was the standardized mean change (SMC) with raw score standardization, using the baseline SD of the placebo arm [44, 45]. When baseline SDs were not reported, change or follow-up SDs were used. In the primary analysis, a common pre-post correlation of 0.5 [41] was used for the calculation of the variance of SMC [44]. Positive response rates were logit-transformed for the analysis and back-transformed for presentation [46]. Heterogeneity was evaluated by visual inspection of forest plots and with the χ² test (p value < 0.1) and the I² statistic (considerable heterogeneity when > 50%); since the χ² test might detect small amounts of clinically unimportant heterogeneity, we based our evaluation on I² [17].
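As a rough sketch of these computations (not the authors' actual code), the following assumes Becker's approximation for the variance of the standardized mean change given a pre-post correlation r, and DerSimonian-Laird random-effects pooling; function names are illustrative.

```python
from math import sqrt

def smc_and_variance(mean_change, sd_baseline, n, r=0.5):
    """Standardized mean change with raw-score standardization
    (negative values indicate improvement) and its approximate
    variance given an assumed pre-post correlation r (0.5 in the
    primary analysis); Becker's (1988) approximation is assumed."""
    smc = mean_change / sd_baseline
    var = 2.0 * (1.0 - r) / n + smc ** 2 / (2.0 * n)
    return smc, var

def dersimonian_laird(effects, variances):
    """Pool one effect size per study with DerSimonian-Laird
    random effects; returns (pooled effect, 95% CI, tau^2)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    # Cochran's Q and the method-of-moments between-study variance
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2
```

For example, a placebo arm with n = 25, a mean change of − 3 points, and a baseline SD of 10 yields SMC = − 0.30; pooling such study-level SMCs with their variances gives the summary estimates reported in the forest plots.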

Sensitivity analyses and publication bias
Predefined sensitivity analyses of the primary outcomes were conducted using a fixed-effects model, or by excluding studies with a genetic syndrome as an inclusion criterion, studies using only diagnostic tools, single-blind studies, studies shorter than 4 weeks, studies presenting only completers data, studies with an at least moderate overall risk of bias, and studies with estimated SDs (imputed, from medians/ranges, or from pooled subscales). Post-hoc, we excluded studies without baseline SDs, and we used pre-post correlations of 0.25 and 0.75 for the calculation of the variance of SMC [41]. Regarding responder rates, we post-hoc excluded studies with imputed responder rates [38]. We explored small-study effects as a proxy for publication bias with contour-enhanced funnel plots, Egger's test [47], and trim-and-fill [48].
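Egger's test can be sketched as an ordinary regression of the standardized effect (effect / SE) on precision (1 / SE): an intercept far from zero suggests funnel-plot asymmetry. The z and p values reported later in the paper correspond to testing this intercept against zero; this minimal version (illustrative names, no weighting refinements) returns only the intercept.

```python
def egger_intercept(effects, ses):
    """Egger's regression test sketch: regress effect_i / se_i on
    1 / se_i by ordinary least squares and return the intercept,
    whose distance from zero indicates small-study effects."""
    y = [e / s for e, s in zip(effects, ses)]
    x = [1.0 / s for s in ses]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx
```

A z-test on this intercept (intercept divided by its standard error) gives the significance test used in the sensitivity analyses.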

Meta-regression analyses
The dependent variable was the SMC, and the independent variables were selected from a list of covariates from the literature [9, 11, 12, 49–51]. First, we conducted univariable and then multivariable meta-regressions, similar to our previous analyses in schizophrenia [51]: we used the factors that were significant in the univariable analysis and then a formal backward stepwise algorithm with a removal criterion of p = 0.15. Meta-regressions were not performed for categorical covariates with fewer than five data points per level. Spearman's ρ was calculated post-hoc between the SMCs of placebo and experimental intervention, as well as between covariates.
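A minimal illustration of a univariable meta-regression: weighted least squares of study-level SMCs on a single covariate with inverse-variance weights, with a normal-approximation p value for the slope. This is a simplification (the paper's analyses are mixed-effects meta-regressions, and the multivariable model adds backward stepwise removal at p = 0.15); names and data are illustrative.

```python
from math import sqrt, erf

def wls_meta_regression(smcs, variances, covariate):
    """Univariable weighted least squares of study SMCs on one
    covariate, weights 1/variance; returns (slope, SE of slope,
    two-sided normal-approximation p value for the slope)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, covariate)) / sw
    my = sum(wi * y for wi, y in zip(w, smcs)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, covariate))
    sxy = sum(wi * (x - mx) * (y - my)
              for wi, x, y in zip(w, covariate, smcs))
    slope = sxy / sxx
    se_slope = sqrt(1.0 / sxx)
    z = slope / se_slope
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
    return slope, se_slope, p
```

In the multivariable step, covariates with the largest p value above 0.15 would be removed one at a time until all remaining covariates fall below that threshold.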
Intervention-related factors
Intervention-related factors were route of administration (oral versus others) [52], type of experimental intervention (pharmacological versus dietary supplement), and dosing schedule (fixed versus flexible).
Study-related factors
Study-related factors were duration of treatment (weeks), publication year, washout from psychotropic medications (coded post-hoc as the presence of washout or not, because definitions varied), placebo lead-in with exclusion of those showing a positive response, type of rater (clinicians versus caregivers), total sample size, number of sites, %academic sites, number of arms and medications, %participants on placebo, sponsorship (industry-funded/patent application versus industry-independent), country of origin (US versus not only US), and risk-of-bias domains.
Participant-related factors
Participant-related factors were the presence of any associated conditions as inclusion criteria (i.e., irritability, ADHD, and other conditions apart from intellectual disability or genetic syndrome), mean age and age group (children/adolescents versus adults/mixed, post-hoc), %participants with intellectual disability (at least mild or IQ < 70), %female (post-hoc), ethnicity (%Caucasian/Hispanic, post-hoc), and baseline BMI (post-hoc) [9]. Due to inconsistent reporting of baseline severity [11, 12], we used the CGI-Severity (range 1–7) as a measure of global severity and the ABC-Irritability (range 0–45) as a measure of serious behavioral problems [53]. Baseline severity in core symptoms could not be investigated as a potential predictor due to the large diversity of scales; standardization methods (such as using the lower and upper limits of the measurement scale [54]) could not be utilized because trials reported both raw and standard scores (such as for the VABS) or T-scores (such as for the SRS). We also examined the use of a threshold of core symptom severity for inclusion (not only for the confirmation of diagnosis).

Description of included studies
The PRISMA flow diagram is presented in Fig. 1. In this analysis, 86 (k) studies were included, 71% comparing pharmacological treatments and 29% dietary supplements with placebo (eAppendix-5.1 and Table-S). Of the 86 studies, 75 were conducted in children/adolescents, eight in adults, and three included both age groups. The overall sample size (n) was 5365, 44% on placebo. The majority of studies were parallel (85%), single-center (60%, indicated in k = 78), and double-blind (only one was single-blind [57] and none was open), with two arms (88%) and small sample sizes (median 45). About half of the studies (48%) had a duration of 12 weeks or more and three lasted less than 4 weeks [57–59] (median 10 weeks, IQR 8–12); about half used a fixed dose schedule (51%, k = 84) and had a washout from psychotropic drugs (55%, k = 75), yet definitions and durations varied. A placebo lead-in with exclusion of those with a positive response was used in five studies.
All of the studies used standardized diagnostic criteria, except five that used only diagnostic tools [60–64]. Associated conditions were required for inclusion in 29 studies (irritability in 52% of them), and a genetic syndrome in one (neurofibromatosis-type-1) [60]. Core symptoms were the primary focus in 34 trials, while the focus was unclear in 23 studies. Nine studies included participants using a threshold of core symptom severity, using the ABC-L/SW, CARS, RLRS, SRS, and YBOCS versions (in five out of eight). Participants on placebo had a median age of 8. Overall, 40% of the studies had an overall low risk of bias, 52% moderate, and 8% high. The description of methods was adequately reported in more than half of the studies for sequence generation (63%), allocation concealment (54%), and blinding (72%). Missing outcomes were adequately addressed in 60%, with a median overall dropout rate from placebo of 12.9% (attrition rates were reported in k = 70 of 86 trials). Of the studies, 23% had a high risk of selective reporting, and 13% a high risk in other biases, mainly due to imbalances between groups (Additional file 3: eAppendix-5.2). Finally, 38% of the studies were industry-sponsored (including five in which investigators applied for a patent on the experimental intervention), and sponsorship was unclear in three studies.

Sensitivity analysis and publication bias
The results did not change materially in sensitivity analyses (Additional file 3: eAppendix-6.1). There was no indication of small-study effects, as a proxy for publication bias, from visual inspection of the funnel plot or Egger's test (z = − 0.38, p = 0.70) (Additional file 3: eAppendix-6.2). Also, the fixed- and random-effects summaries were identical, an indication that smaller and larger studies give similar results.

Meta-regressions
The results of the univariable and multivariable meta-regression analyses are presented in Table 1. In the multivariable meta-regression, using the backward selection procedure, other bias, baseline ABC-Irritability, and the type of rater remained as covariates in the model, but the latter two were not significant due to their interaction with the other covariates. In a model without ABC-Irritability (available in 31 studies), publication year, other bias, and type of rater remained; the latter was not significant.
Sensitivity analysis and publication bias
Sensitivity analyses did not change the results materially, though there was a small difference between the fixed- and random-effects summary estimates, indicating possible small-study effects. Egger's test yielded a marginal p value (z = 1.71, p = 0.09); it has been suggested that a threshold of 0.1 be employed for this test. By visual inspection of the funnel plot, we detected a possible asymmetry (Additional file 3: eAppendix-6.2), and the trim-and-fill adjusted placebo response was − 0.33. These covariates remained in the multivariable model, but the use of a threshold of core symptoms at inclusion was not significant. Nevertheless, the findings might have been driven by three antidepressant trials in children/adolescents [65–67], with larger sample sizes (~150) and multiple sites (3, 6, and 18), as well as flexible dosing and a threshold of CYBOCS-PDD for inclusion (Table 1).

Overall core symptoms
Primary analysis
Forty-five studies with 1063 participants were included in the primary analysis (Fig. 4). Caregivers filled about half of the scales (51%).

Fig. 2 Placebo response in scales measuring social-communication difficulties. Squares and bars represent standardized mean changes (SMC) and 95% confidence intervals for each study. The size of the square is proportional to the weight of the study in the meta-analysis. The diamond represents the pooled SMC. Heterogeneity is quantified with a χ² test (Q) and I². *In Chugani 2016, standard errors might have been reported as SDs; therefore, we calculated SDs from the reported values (no reply from the corresponding author). It should be noted that in Niederhofer 2003, an aggregated score of ABC-L/SW rated by both caregivers and teachers was reported; in Amminger 2007, ABC-L/SW was rated by clinicians of the day care center. Scale: the scale used (clinician-rated scales based on observation or interviews were preferred in the primary analysis); n: the number of participants on placebo; mean: mean change from baseline to endpoint (negative values for improvement); sd: the standard deviation used for the standardization (baseline standard deviations were preferred); SMC: standardized mean change; 95% CI: 95% confidence intervals; k: total number of studies included in the analysis
Sensitivity analysis and publication bias
Sensitivity analyses did not change the results materially, no asymmetry was detected in the funnel plot, and Egger's test yielded a marginal p value (z = − 1.82, p = 0.07). In the multivariable model, allocation concealment and number of sites were both significant. Number of medications and the use of a placebo lead-in did not have sufficient data for all outcomes, while number of arms, selective reporting, and the use of the threshold of core symptoms did not have sufficient data for meta-regressions in overall core symptoms.

Fig. 3 Placebo response in scales measuring repetitive behaviors. Squares and bars represent standardized mean changes (SMC) and 95% confidence intervals for each study. The size of the square is proportional to the weight of the study in the meta-analysis. The diamond represents the pooled SMC. Heterogeneity is quantified with a χ² test (Q) and I². *In Chugani 2016, standard errors might have been reported as SDs; therefore, we calculated SDs from the reported values (no reply from the corresponding author). In Amminger 2007, ABC-S was rated by clinicians of the day care center. Scale: the scale used (clinician-rated scales based on observation or interviews were preferred in the primary analysis); n: the number of participants on placebo; mean: mean change from baseline to endpoint (negative values for improvement); sd: the standard deviation used for the standardization (baseline standard deviations were preferred); SMC: standardized mean change; 95% CI: 95% confidence intervals; k: total number of studies included in the analysis

Secondary outcomes Placebo response by type of rater
Results based on scales completed by different types of raters (Additional file 3: eAppendix-6.4) were similar to those of the meta-regressions by type of rater (one effect size per study; clinician ratings were preferred whenever available).

CGI-I positive response rates
The overall positive response rate, defined as at least much improvement in the CGI-I, was 19% [16–22%] (k = 57, n = 1686, I² = 53%) (Fig. 5). The anchoring system of the CGI was unclear in 35 studies, while seven considered both core and associated symptoms (three used the OACIS [69]), three reported separate evaluations for global autism symptoms and for the trial target symptom, three considered mainly core symptoms, and nine associated symptoms (two reported the RUPP framework [70]) (Table-S).

Correlation between placebo and drug response
SMCs of placebo and experimental intervention were correlated in social-communication difficulties (Spearman's ρ = 0.525, p < 0.001) and overall core symptoms (ρ = 0.539, p < 0.001), but no correlation was found in repetitive behaviors (ρ = 0.233, p = 0.096) (Additional file 3: eAppendix-6.5).

Discussion
In pharmacological and dietary supplement ASD trials, placebo response was substantial and comparable among core symptoms; about 20% of the participants were at least much improved with placebo. We found potential predictors of larger placebo response in at least one symptom domain, i.e., baseline irritability, the use of a threshold of core symptoms at inclusion, caregiver ratings, larger sample size and number of sites, lower risk of bias, flexible-dosing, and less recent publication year.

Predictors of placebo response Participant-related factors
It has been argued that placebo response might be larger in children/adolescents than adults [71]. We did not find a difference between age groups or an effect of mean age. Nonetheless, extrapolations between age groups should be interpreted with caution because the majority of studies were in pediatric populations (87%). Other participant characteristics did not predict placebo response (e.g., sex, ethnicity, BMI, intellectual disability).
Low baseline severity has been found to predict placebo response in most psychiatric conditions [50]. We did not find an effect of baseline global severity (CGI-S), yet available data were sparse (baseline CGI-S was reported in fewer than half of the studies, k = 38, 44%) and narrowly ranged between 3.88 and 6 (Additional file 3: eAppendix-5.1), also because most of the studies required participants to be at least moderately ill (i.e., CGI-S ≥ 4). Baseline severity in core symptoms could not be analyzed as a potential predictor due to the large diversity of scales. On the other hand, we found that trials using a cut-off of core symptoms for inclusion might have a larger placebo response in repetitive behaviors, yet this association was not significant in a multivariable meta-regression, and it might have been driven by three antidepressant trials that used a cut-off of the clinician-administered scale CYBOCS-PDD [65–67]. Trials that utilize a baseline score cut-off could be prone to regression-to-the-mean effects as well as baseline score inflation, especially for clinician-administered scales and under participant recruitment pressure [72]. These effects could be partially avoided by using different scales for assessing participants at inclusion and as primary outcomes [73], yet this might be challenging given the lack of optimal scales in ASD. Centralized raters blind to inclusion criteria might also reduce baseline inflation and increase inter-rater reliability, yet the execution of the trial could become complicated [72]. Since inflated scores are usually very close to the inclusion cut-off, a potential solution could be to conduct the primary analysis including only participants above a higher cut-off (blinded to the investigators) than the inclusion cut-off [74].
The presence of an associated condition was required as an inclusion criterion in about one-third of the trials (29 out of 86), and it was not found to predict placebo response. Nevertheless, since co-occurring symptoms and diagnoses are highly prevalent in participants with ASD [5], it can be expected that participants in other studies also had associated symptoms of varying levels. Accordingly, the median baseline ABC-Irritability was 17.18, IQR [13.71–22.70], while normative data suggest a mean of 12.8 [75]. Thus, our sample in general could consist of participants with somewhat higher levels of irritability. Indeed, the most frequently investigated associated condition in our sample was irritability (k = 15), and the presence of an associated condition was correlated with baseline ABC-Irritability (ρ = 0.49, p < 0.001, Additional file 3: eAppendix-5.4). We found that baseline ABC-Irritability was associated with a larger placebo response in social-communication difficulties, yet this association was not significant in a multivariable meta-regression. The contrary was found in a fairly large trial (n = 149) investigating citalopram for repetitive behaviors, yet its participants had lower levels of irritability (mean ABC-Irritability = 11.2) [9].

Fig. 5 Placebo positive response rates. Squares represent the proportion of CGI-I positive responders and its 95% confidence interval for each study. The size of the squares is proportional to the weight of the study. The diamonds represent the pooled proportion and its 95% confidence intervals for each subgroup and overall. Heterogeneity is quantified with a χ² test (Q) and I². CGI-I positive responders: number of participants with a positive response defined as at least much improvement in the CGI-I (if not reported, it was imputed using a validated method); Total: total number of participants on placebo
Additionally, a small 8-week observational study investigating the effects of participation in a study protocol suggested that placebo-effects may be mainly observed in children with higher levels of irritability [76]. Such participation effects could be decreased by a screening phase with adequate duration, which could also investigate the stability of symptoms and incorporate a potential washout of psychotropic drugs. However, no effect was found for the use of a washout phase and there were not enough data to investigate the use of a placebo lead-in phase, which is in general not recommended [72].

Design-and intervention-related factors
Caregiver ratings seemed to be more prone to placebo response in social-communication difficulties, but the effect was not consistent in multivariable meta-regressions. It has been argued that placebo-by-proxy effects are important components of placebo response in child/adolescent psychiatry, since they can alter caregiver perception of symptoms (thus directly improving scores on caregiver scales) and/or modify caregiver behaviors toward children and subsequently improve symptoms (thus improving scores also on non-caregiver scales) [71, 77]. In addition, many of the existing scales were not designed to measure change but rather to serve as screening (e.g., SRS [31]) or diagnostic tools (CARS [32] and ADOS [78]), and efforts have been made for their improvement and adaptation, such as the ADOS calibrated severity score [79]. Given the lack of optimal scales, the CGI has been extensively used, and it is recommended for all trials irrespective of their target in order to investigate global autism symptoms and incorporate both core and associated symptoms [80, 81]. However, the anchoring system of the CGI should be clearly reported, since it could vary materially among trials with different target symptoms (Table-S).
Therefore, there is a critical need to develop standardized and sensitive measures of core symptoms that do not solely depend on caregivers [82, 83]. The semi-structured interview of the VABS might be a promising measure of change in social-communication difficulties [33], with potential sensitivity to detect efficacy [68, 84] and empirically derived cut-offs for minimal clinically important differences [85]. Recent instruments have also been developed, among others the Brief Observation of Social Communication Change (BOSCC) [86, 87], the Autism Behavior Inventory [88], and the Autism Impact Measure (AIM) [89], but their utility is yet to be determined. Patient- (or parent-) reported outcomes have also recently gained greater attention [90], yet they should not be considered immune to placebo-effects [91]. The utilization of scales that require more extensive training and experience (e.g., ADOS, BOSCC, and VABS) might be challenging in larger-scale trials, and a low inter-rater reliability could increase the variance of measurements and subsequently decrease drug-placebo differences. A notable example is the multi-center arbaclofen trial [84], in which the VABS was to be completed by the same clinician and caregiver for each participant. However, there was quite low adherence to the protocol (a rater change occurred for about 25% of the participants), potentially because VABS-Socialization was a secondary outcome, not expected to be sensitive in the context of the trial. A post-hoc per-protocol analysis restricted to participants with no rater change found a significant improvement of arbaclofen in comparison to placebo, in contrast to the non-significant difference of the primary analysis [84]. Therefore, proper training of the raters and inter-rater reliability of the measurements, as well as guidance and adherence to the protocol, should be ensured, especially in multi-site trials.
Sample size and number of sites have been suggested as predictors of placebo response [50, 51, 92]. We also found that a larger sample size was associated with a larger placebo response in repetitive behaviors, yet the results might be driven by three antidepressant trials [65–67]. This association could also be explained by potential publication bias and the small-study effects found in the funnel plot (see Additional file 3: eAppendix-6.2), since the results of less precise trials with larger placebo responses in repetitive behaviors might not have been published. Additionally, sample size was closely related to the number of sites (Spearman's ρ = 0.77, p < 0.001, see Additional file 3: eAppendix-6), which predicted placebo response in overall core symptoms, yet the latter finding was driven by another outlier study with 26 sites [68]. Trials with more sites were more frequently industry-sponsored (ρ = 0.27, p = 0.04) and included fewer academic sites (ρ = − 0.51, p < 0.001). It should be noted, though, that the majority of included studies were single-center (median number of sites 1), had academic sites (about 83% consisted only of academic sites), and had small sample sizes (median 45); therefore, the results cannot be extrapolated to a wider range of potential values. Nevertheless, more sites and the recruitment of non-academic professional sites, which could have less experience and enroll competitively, might increase variability and be prone to less rigorous participant selection and baseline score inflation [73, 74, 92]. Therefore, trials should be well powered, yet extremely large sample sizes could be avoided; sites should be carefully selected and their number kept to the minimum feasible.
Studies at low risk of bias in the "other bias" domain (mainly baseline imbalance) and in allocation concealment were associated with larger placebo response in social-communication and overall core symptoms, respectively. It is intriguing that studies of better quality in terms of risk of bias might have a larger placebo response. However, these risk of bias domains evaluate the randomization process, and in inadequately randomized trials control groups might have a poorer prognosis [93].
The association between dosing schedule and placebo response can be puzzling; e.g., both flexible- [94] and fixed-dosing schedules [95] have been associated with larger placebo responses in depression. We found an association between flexible dosing and larger placebo response in repetitive behaviors, yet it was driven by three antidepressant trials [65][66][67]. Flexible dosing could allow dose optimization guided by clinical response and/or the occurrence of side effects. The dose titration schedule and criteria, as well as the starting dose and dose ranges, should be carefully selected in the context of large placebo responses. For example, in one of the aforementioned antidepressant trials, large placebo responses (> 25% reduction from baseline in CYBOCS-PDD) might have impeded dose escalation from a low starting dose (2 mg of fluoxetine) to a stable appropriate dose (> 10 mg) for a sufficient duration of treatment (> 4 weeks) [67]. On the other hand, dose-response studies are a special type of fixed-dosing study that might be prone to larger placebo responses: they are multi-arm, so participants have an increased chance of receiving active medication, and they usually require larger sample sizes and multiple sites. These factors have been associated with a larger placebo response in psychiatry [50], yet not all of them were replicated in our analysis, probably due to the limited number of studies with those characteristics. A notable example is the dose-response study of aripiprazole [96], which had a placebo positive response rate of 33% in comparison to 16% in the similarly designed but flexible-dosing study [97] (Fig. 3). However, this pattern has not always been observed, such as in the risperidone trials, i.e., 14% in the dose-response study [98] in comparison to 12% [53] and 18% [99] in the flexible-dose studies.
Country of origin and type of experimental intervention (pharmacological or dietary supplement) were not found to predict placebo response, in contrast to a previous meta-analysis [11], which also included many Iranian trials with risperidone-combined treatments that were excluded from our review (combination treatments such as risperidone + placebo were excluded, see Additional file 3: eAppendix-4). Therefore, the findings of the previous meta-analysis could have been confounded by larger responses in combined placebo groups, i.e., the response of risperidone + placebo.
There is no clear consensus about adequate trial duration, and half of the included studies lasted at least 12 weeks. The duration of a trial should be based on the mechanism of action of the experimental intervention, and a longer duration could be required in order to observe sustained changes in core symptoms [100]. We did not find an effect of trial duration, although shorter-term trials have been associated with larger placebo response in psychiatry [50]. However, in longer-term trials including young children, anticipated developmental trajectories could also explain placebo effects and subsequently mask drug-placebo differences [101]. Therefore, developmentally based scales might be necessary to overcome this challenge [82], and trial designs could include additional follow-up assessments in order to confirm the stability of improvement [101].
In most psychiatric disorders, placebo response has increased over a period of 60 years [49,50,102], but this trend was not replicated in ASD trials, which were more recent, mainly published between 2009 and 2017. If anything, placebo response in social-communication difficulties might have decreased over the years. However, this effect was not found when ABC-Irritability was included in the multivariable meta-regression. Temporal changes in the definition of ASD and in research practices might play an important role per se, as differences between ASD and neurotypical populations might have decreased over the years [103].

Limitations
Our analysis has limitations. First, it focused on placebo response in core symptoms in trials of pharmacological and dietary supplement interventions. Therefore, we did not investigate placebo response in associated symptoms or in psychological/behavioral or multimodal interventions, which could also be of interest. However, core symptoms were the apparent focus in only about 40% of the included trials, while many trials focused on associated symptoms, mainly irritability or ADHD symptoms. Second, there was a large diversity of scales used, as well as a wide variability in their use, e.g., different CGI-I anchoring systems. Third, moderators of drug-placebo differences were not investigated, and efforts to minimize placebo response could also affect drug response, since the two were correlated in social-communication difficulties and overall core symptoms, but not in repetitive behaviors (Additional file 3: eAppendix-6.5). In addition, some predictors might have a different impact on placebo and drug response [51]. Nevertheless, a more fine-grained analysis was impeded by the use of diverse experimental interventions with essentially different mechanisms of action (Additional file 3: eAppendix-5.1), contrary to, e.g., schizophrenia [49,51,102], for which antipsychotics are the cornerstone of treatment [104].
Fourth, a common estimated pre-post correlation was used, but effect sizes were not materially changed in sensitivity analyses (Additional file 3: eAppendix-6.1). Fifth, despite the large number of eligible studies, about half did not provide data in spite of our efforts (authors of 85% of included studies published after 1990 could be contacted, with a reminder e-mail in case of no response, and 17% of them provided additional data/clarifications, Additional file 3: eAppendix-4), and a priori we did not use data from the whole crossover period (in forty trials), in order to avoid carry-over effects [14]. Sixth, because information on many predictors, especially participant-related factors (Additional file 3: eAppendix-5), was missing in many studies, we could not employ a full multivariable meta-regression and focused on a series of univariable meta-regressions. Therefore, we cannot exclude the possibility of omitted-variable bias in the results, i.e., the effect of omitted variables may be attributed to the predictor considered in the univariable meta-regression. It should be noted that meta-regressions of aggregate data are observational in nature and prone to ecological fallacy; thus, our findings should be considered exploratory and hypothesis-generating, also considering that there was no adjustment for multiple testing. Accordingly, an individual-participant-data meta-analysis could allow for a more fine-grained analysis and further elucidate the impact of participant-level factors, such as age, sex, and baseline severity of core/associated symptoms.
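The sensitivity of the standardized mean change to the assumed pre-post correlation can be illustrated with a minimal sketch. This is not the exact estimator used in our analysis; it assumes the common raw-score standardization (d = change / baseline SD) with its usual approximate variance, and the trial numbers are hypothetical. The key point it shows is that the assumed correlation r affects the precision (standard error) of each study's effect size, but not the point estimate itself, which is consistent with effect sizes being robust in sensitivity analyses.

```python
import math

def smc_raw(pre_mean, post_mean, pre_sd, n, r):
    """Standardized mean change (raw-score standardization) and its
    approximate sampling variance:
        d   = (post - pre) / sd_pre
        var ≈ 2 * (1 - r) / n + d**2 / (2 * n)
    where r is the (often unreported) pre-post correlation that has to
    be assumed when only baseline and endpoint summaries are available."""
    d = (post_mean - pre_mean) / pre_sd
    var = 2.0 * (1.0 - r) / n + d**2 / (2.0 * n)
    return d, var

# Hypothetical placebo arm: mean score improves from 20 to 17 (SD 9, n = 30).
# Varying the assumed correlation changes the SE, not the point estimate.
for r in (0.3, 0.5, 0.7):
    d, var = smc_raw(20.0, 17.0, 9.0, 30, r)
    print(f"r={r}: SMC={d:.2f}, SE={math.sqrt(var):.3f}")
```

In a single-group random-effects meta-analysis, these per-study variances feed into the study weights, so a sensitivity analysis over r (as in eAppendix-6.1) re-runs the pooling with different assumed correlations and checks whether the pooled SMC changes materially.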

Conclusions
In order to increase the detection of efficacy of experimental interventions for ASD, high-quality and adequately powered trials are required, and predictors of placebo response should be considered. Extremely large sample sizes should be avoided, and when multiple sites are needed, they should be carefully selected, trained, and monitored, and their number kept to the minimum feasible. This would also facilitate a more rigorous selection of participants and a higher inter-rater reliability of measurements. Furthermore, scales that do not solely depend on caregiver reports could be selected as primary outcomes, since placebo-by-proxy effects are expected. Nevertheless, our findings highlight the urgent need for optimal and developmentally based measures of change in core symptoms [82,83]. The mechanism of action of the experimental intervention could guide the selection of an appropriate, yet sufficiently long, trial duration, as well as of the dose schedule and dose ranges. Participant-related factors, such as age, sex, and baseline severity of core/associated symptoms, as well as factors that could differentially moderate drug response, warrant further investigation. Last, in order to facilitate comparability between studies and synthesis of evidence, trials should better characterize their participants and improve their reporting, including the CGI anchoring system.

Additional file 2.
Additional file 3.

Funding
This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 777394 for the project AIMS-2-TRIALS. This Joint Undertaking receives support from the European Union's Horizon 2020 research and innovation programme and EFPIA and AUTISM SPEAKS, Autistica, SFARI. CA, MP and DF were supported by the Spanish Ministry of Science, Innovation and Universities; Instituto de Salud Carlos III, co-financed by ERDF Funds from the European Commission, "A way of making Europe"; CIBERSAM; Madrid Regional Government (B2017/BMD-3740 AGES-CM-2); European Union Structural Funds and European Union Seventh Framework Program and H2020 Program; Fundación Familia Alonso, Fundación Alicia Koplowitz and Fundación Mutua Madrileña. The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication.

Availability of data and materials
All data generated during this study are included in this published article (and its supplementary information files). The datasets analyzed during the current study are available from the corresponding author on reasonable request.
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
In the last 3 years, Stefan Leucht has received honoraria as a consultant/advisor and/or for lectures from LB Pharma, Otsuka, Lundbeck, Boehringer Ingelheim, LTS Lohmann, Janssen, Johnson & Johnson, TEVA, MSD, Sandoz, Sanofi-Aventis, Angelini, Recordati, Sunovion, and Gedeon Richter. David Fraguas has been a consultant for and/or has received fees from Angelini, Eisai, IE4Lab, Janssen, Lundbeck, and Otsuka. He has also received grant support from Instituto de Salud Carlos III (Spanish Ministry of Science, Innovation and Universities) and from Fundación Alicia Koplowitz. Mara Parellada has received educational honoraria from Otsuka; research grants from FAK, Fundación Mutua Madrileña (FMM), Instituto de Salud Carlos III (Spanish Ministry of Science, Innovation and Universities), and European ERA-NET and H2020 calls; and travel grants from Otsuka and Janssen. She has been a consultant for Exeltis and Servier. Celso Arango has been a consultant to or has received honoraria or grants from Acadia, Angelini, Gedeon Richter, Janssen Cilag, Lundbeck, Otsuka, Roche, Sage, Sanofi, Servier, Shire, Schering Plough, Sumitomo Dainippon Pharma, Sunovion and Takeda. In the last 3 years, Maximilian Huhn has received speaker honoraria from Janssen. Declan Murphy has received consulting fees from Roche. The other authors have nothing to disclose.