Table 2 Summary of feature selection techniques used for classifiers without sparsity enforcing parameters

From: Sparsifying machine learning models identify stable subsets of predictive features for behavioral detection of autism

| Feature score | Description | Sparsifying coefficient | Advantages |
|---|---|---|---|
| ANOVA | The k most discriminative features when doing the ANOVA test | k | ∙ Simple test ∙ Fast ∙ A priori information on what features would not be useful in classification using only the variance for each feature |
| Lasso | Nonzero coefficients of the Lasso trained on the data for a given L1 coefficient | L1 coefficient | ∙ Linear model ∙ Features used by a more parsimonious model |
| Tree | The k most important features when building a full decision tree on the data | k | ∙ Good with categorical data as it can use multiple cuts per feature, unlike linear models |

  1. The third column gives the parameter that the full model uses as its sparsifying coefficient in the grid search.
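
The three feature scores summarized in the table can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy, which the table itself does not name, and uses synthetic data in place of the study's dataset. The helper names `anova_top_k`, `lasso_nonzero`, and `tree_top_k`, the values of `k` and `alpha`, and the choice to fit the Lasso directly on 0/1 class labels are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (hypothetical; not the dataset used in the paper).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)


def anova_top_k(X, y, k):
    """Indices of the k features with the highest ANOVA F-scores."""
    selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    return np.flatnonzero(selector.get_support())


def lasso_nonzero(X, y, alpha):
    """Indices of features with a nonzero Lasso coefficient.

    alpha plays the role of the L1 coefficient: larger values
    drive more coefficients to exactly zero.
    """
    model = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    return np.flatnonzero(model.coef_)


def tree_top_k(X, y, k):
    """Indices of the k most important features of a fully grown tree."""
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    order = np.argsort(tree.feature_importances_)[::-1]
    return np.sort(order[:k])


anova_idx = anova_top_k(X, y, k=5)
lasso_idx = lasso_nonzero(X, y, alpha=0.05)
tree_idx = tree_top_k(X, y, k=5)

print("ANOVA:", anova_idx)
print("Lasso:", lasso_idx)
print("Tree: ", tree_idx)
```

In a grid search of the kind the footnote describes, `k` or `alpha` would be varied over a range and the downstream classifier retrained on each selected subset. The subsets chosen by the three scores need not agree.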