Fig. 2 | Molecular Autism

From: Identifying the neurodevelopmental and psychiatric signatures of genomic disorders associated with intellectual disability: a machine learning approach

Performance of final models on test data. A Plot of performance (AUROC) of four machine learning (ML) models (ANN = artificial neural network, penalised LR = penalised logistic regression, random forest, RBF SVM = radial basis function support vector machine) fit to 7 variable sets (all variables = all 176 variables; ANN = 30 most important variables in an ANN fit to all variables; penalised LR = 30 most important variables in a penalised logistic regression fit to all variables; random forest = 30 most important variables in a random forest model fit to all variables; > 1 Model = variables identified as being in the 30 most important variables by more than one ML model; > 2 Models = variables identified as being in the 30 most important variables by more than two ML models; SVM = 30 most important variables in a radial basis function SVM fit to all variables). Points show the median posterior AUROC; error bars show the 95% credible interval of the AUROC. B Receiver operating characteristic curves for the four ML models, using the 30 variables from the random forest variable set. C Top: histogram of the predicted probability of ND-GC status for the 100 participants in our testing dataset, using the best performing random forest model; bottom: sensitivity and specificity of the model's classification at different thresholds for categorising a predicted probability. D Calibration plot for the best performing RF model. Points show performance in each decile, vertical lines show 95% confidence intervals, the thick diagonal line shows a linear model fit to the data, and the shaded area shows the 95% confidence interval of the linear model. A perfectly calibrated model would follow the dashed diagonal line. E Variable importance for the best performing model. Mean dropout loss is the mean change in model AUROC after a given variable is permuted (repeated 500 times). The horizontal line indicates (1 − AUROC) of the full model; therefore, variables with mean values above this line have a negative impact on model fit when permuted. Variable definitions are provided in Additional file 1: Table S7.
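The quantities summarised in panels B to E can be illustrated with a short, dependency-free Python sketch. It is not the authors' code, which the caption does not show; the function names (`auroc`, `sens_spec`, `calibration_bins`, `dropout_loss`), the toy data and the stand-in `predict` function are all hypothetical. It assumes labels coded 0/1 (1 = ND-GC), that a case is called positive when its predicted probability is at or above the threshold, that calibration "deciles" are equal-sized groups ordered by predicted probability, and that dropout loss is measured as 1 − AUROC, in line with the reference line in panel E.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC: chance a random positive scores above a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def sens_spec(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when probabilities >= threshold are called positive (panel C)."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def calibration_bins(y_true: Sequence[int], probs: Sequence[float],
                     n_bins: int = 10) -> List[Tuple[float, float]]:
    """(mean predicted probability, observed positive rate) per equal-sized group, sorted by prediction (panel D)."""
    pairs = sorted(zip(probs, y_true))
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            bins.append((sum(p for p, _ in chunk) / len(chunk),
                         sum(y for _, y in chunk) / len(chunk)))
    return bins


def dropout_loss(predict: Callable[[List[Row]], List[float]], X: List[Row], y_true: Sequence[int],
                 col: int, n_repeats: int = 500, seed: int = 0) -> float:
    """Mean loss (1 - AUROC) after randomly permuting column `col` of X, over n_repeats shuffles (panel E)."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        losses.append(1.0 - auroc(y_true, predict(X_perm)))
    return sum(losses) / len(losses)


# Toy usage with made-up data: the hypothetical "model" returns the first variable as its probability.
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = [[0.1, 5.0], [0.2, 3.0], [0.3, 8.0], [0.6, 1.0], [0.4, 7.0], [0.7, 2.0], [0.8, 9.0], [0.9, 4.0]]


def predict(rows: List[Row]) -> List[float]:
    return [row[0] for row in rows]


probs = predict(X)
full_loss = 1.0 - auroc(y, probs)  # plays the role of the horizontal reference line in panel E
print(sens_spec(y, probs, threshold=0.5))
print(calibration_bins(y, probs, n_bins=4))
print(full_loss, dropout_loss(predict, X, y, col=0), dropout_loss(predict, X, y, col=1))
```

Because the stand-in model ignores the second variable, permuting it leaves the loss at the full-model value, whereas permuting the first variable pushes the mean loss towards 0.5 (chance level), above the reference line. Under these assumptions, the change in AUROC described for panel E is a variable's dropout loss minus `full_loss`.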
