
Table 2 Summary of feature selection techniques used for classifiers without sparsity enforcing parameters

From: Sparsifying machine learning models identify stable subsets of predictive features for behavioral detection of autism

Feature score | Sparsifying coefficient | Advantages
--- | --- | ---
The k most discriminative features according to the ANOVA test | k | ∙ Simple test ∙ Fast ∙ A priori information on which features would not be useful for classification, using only the variance of each feature
Nonzero coefficients of the Lasso trained on the data for a given L1 coefficient | L1 coefficient | ∙ Linear model ∙ Features used by a more parsimonious model
The k most important features when building a full decision tree on the data | k | ∙ Good with categorical data, as it can use multiple cuts per feature, unlike linear models

1. The Sparsifying coefficient column gives the parameter that the full model uses as the sparsifying coefficient in the grid search.
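The first two selection rules in the table can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (anova_f_scores, select_k_best_anova, lasso_coordinate_descent, select_lasso_nonzero) are hypothetical; the Lasso is assumed to minimise (1/2n)||y - Xw||^2 + alpha*||w||_1 on centred data, with alpha standing in for the L1 coefficient and the 0/1 class labels treated as regression targets; and the toy data are synthetic. Looping over k and alpha mimics a grid search over the sparsifying coefficient. The decision-tree row is not reproduced here.

```python
import numpy as np


def anova_f_scores(X, y):
    """One-way ANOVA F-statistic of each column of X across the classes in y."""
    classes = np.unique(y)
    n_samples, n_classes = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    # Between-class sum of squares: spread of the class means around the grand mean.
    ss_between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand_mean) ** 2
                     for c in classes)
    # Within-class sum of squares: spread of samples around their own class mean.
    ss_within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                    for c in classes)
    return (ss_between / (n_classes - 1)) / (ss_within / (n_samples - n_classes))


def select_k_best_anova(X, y, k):
    """Sorted indices of the k features with the largest F-statistic."""
    scores = anova_f_scores(X, y)
    return np.sort(np.argsort(scores)[::-1][:k])


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Assumed objective: (1/2n)||y - Xw||^2 + alpha*||w||_1 on centred X and y."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)  # centring absorbs the unpenalised intercept
    yc = y - y.mean()
    w = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    residual = yc.copy()  # equals yc - Xc @ w while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Remove feature j from the fit, then solve for w[j] alone.
            residual += Xc[:, j] * w[j]
            rho = Xc[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            residual -= Xc[:, j] * w[j]
    return w


def select_lasso_nonzero(X, y, alpha):
    """Indices of the features whose Lasso coefficient is exactly nonzero."""
    w = lasso_coordinate_descent(X, y.astype(float), alpha)
    return np.flatnonzero(w)


# Hypothetical toy data: only features 0 and 3 carry class information.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.8 * X[:, 3] + 0.3 * rng.normal(size=200) > 0).astype(int)

# Sweep each sparsifying coefficient, as a grid search would.
for k in (1, 2, 3):
    print("ANOVA, k =", k, "->", select_k_best_anova(X, y, k))
for alpha in (0.01, 0.1, 0.3):
    print("Lasso, alpha =", alpha, "->", select_lasso_nonzero(X, y, alpha))
```

A larger alpha tends to drive more coefficients exactly to zero, so a grid over alpha plays the same role for the Lasso that a grid over k plays for the ANOVA filter.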