Sparsifying machine learning models identify stable subsets of predictive features for behavioral detection of autism

Table 2 Summary of feature selection techniques used for classifiers without sparsity-enforcing parameters

Feature score	Description	Sparsifying coefficient	Advantages
ANOVA	The k most discriminative features when doing the ANOVA test	k	Simple test; fast; a priori information on which features would not be useful in classification, using only the variance of each feature
Lasso	Nonzero coefficients of the Lasso trained on the data for a given L₁ coefficient	L₁ coefficient	Linear model; features used by a more parsimonious model
Tree	The k most important features when building a full decision tree on the data	k	Good with categorical data, as it can use multiple cuts per feature, unlike linear models

The third column gives the parameter that the full model uses as its sparsifying coefficient in the grid search.
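
To make the ANOVA row concrete, the following is a minimal sketch of top-k selection by one-way ANOVA F statistic, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the helper names `anova_f` and `top_k_features` and the toy data are hypothetical, and the sketch assumes at least two classes and more samples than classes.

```python
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the classes in labels.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    grand = fmean(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(vs) * (fmean(vs) - grand) ** 2 for vs in groups.values())
    ss_within = sum((v - fmean(vs)) ** 2 for vs in groups.values() for v in vs)
    if ss_within == 0:
        # Perfectly separated (or constant) feature: avoid dividing by zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def top_k_features(X, y, k):
    """Return (indices of the k highest-scoring features, all F scores)."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: feature 0 separates the classes strongly,
# feature 1 weakly, feature 2 hardly at all.
X = [
    [0.0, 1.0, 5.0],
    [0.1, 2.0, 4.0],
    [0.2, 1.5, 5.0],
    [1.0, 2.5, 4.0],
    [1.1, 2.0, 5.0],
    [0.9, 3.0, 4.0],
]
y = [0, 0, 0, 1, 1, 1]

selected, scores = top_k_features(X, y, k=2)
print(selected)  # prints [0, 1]
```

Here k plays the role of the sparsifying coefficient from the third column: a grid search would rerun the selection for each candidate k and train the downstream classifier on the retained columns. The Lasso and Tree rows follow the same pattern with a different score, namely the nonzero coefficients of a fitted Lasso or the feature importances of a fitted decision tree.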

ISSN: 2040-2392