Classification family | Models used | Built-in sparsifying or other penalization | Under-sampling used | Relevance |
---|---|---|---|---|
Penalized linear regression | Linear regression<br>Lasso<br>Ridge<br>Elastic net<br>Relaxed lasso | L1 penalization (lasso, relaxed lasso)<br>L2 penalization (ridge)<br>L1 + L2 (elastic net) | Yes | ∙ Very interpretable<br>∙ Simple model<br>∙ Linear like ADOS<br>∙ Can use gradation in label (ASD vs spectrum) |
Nearest neighbors | Nearest shrunken centroids | L1 penalization | Yes | ∙ Can identify subgroups within classes, which is likely for our sample<br>∙ Simple model |
General linear models for classification | LDA (L1)<br>Logistic regression (L1, L2) | L1 and L2 penalization | No | ∙ Simple model<br>∙ Interpretable<br>∙ Based on linear assumptions |
Support vector machines | Linear kernel (L1)<br>Polynomial kernel<br>Radial kernel<br>Exponential kernel | L1 penalization (linear kernel)<br>Regularization parameter | No | ∙ Can capture more complex shapes in data when using nonlinear kernels |
Tree-based classifiers | Decision tree<br>Random forest<br>Gradient boosting<br>AdaBoost | Tree depth<br>Number of trees | No | ∙ Performs well on categorical data<br>∙ Better captures feature interactions<br>∙ Tree is interpretable<br>∙ Boosting techniques often give higher accuracy than simpler models |
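A minimal sketch of how representatives of these families could be fit and compared with scikit-learn. The synthetic dataset, the particular models chosen, and all hyperparameter values (`C`, `shrink_threshold`, `n_estimators`) are illustrative assumptions, not the configuration used in this work:

```python
# Hedged sketch: one representative per classification family from the table,
# scored with 5-fold cross-validation on synthetic data (not the study's data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestCentroid
from sklearn.svm import SVC

# Illustrative binary classification problem.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

models = {
    # L1-penalized logistic regression: sparsifying and interpretable.
    "logreg_l1": LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
    # Nearest shrunken centroids: NearestCentroid with a shrink threshold.
    "shrunken_centroids": NearestCentroid(shrink_threshold=0.5),
    # SVM with a radial kernel; C is the regularization parameter.
    "svm_rbf": SVC(kernel="rbf", C=1.0),
    # Tree-based ensemble; n_estimators sets the number of trees.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```

Under-sampling of the majority class, where the table indicates it, would be applied to the training folds before fitting; it is omitted here to keep the sketch short.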