Feature score | Description | Sparsifying coefficient | Advantages |
---|---|---|---|
ANOVA | The k most discriminative features according to the ANOVA test | k | ∙ Simple test ∙ Fast ∙ Gives a priori information on which features would not be useful for classification, using only the variance of each feature |
Lasso | Features with nonzero coefficients in a Lasso model trained on the data for a given L1 coefficient | L1 coefficient | ∙ Linear model ∙ Features used by a more parsimonious model |
Tree | The k most important features when building a full decision tree on the data | k | ∙ Good with categorical data, as it can use multiple cuts per feature, unlike linear models |
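
To make the first two rows of the table concrete, here is a minimal pure-Python sketch of an ANOVA top-k selector and a Lasso nonzero-coefficient selector. It is illustrative only: the function names, the toy data, the coordinate-descent solver, and the objective scaling (1/(2n) times the squared error plus the L1 coefficient times the L1 norm of the weights) are all assumptions, not the implementation used to produce the table. The tree-based score is omitted for brevity.

```python
from statistics import mean


def anova_f_scores(X, y):
    """One-way ANOVA F statistic of each feature column against the labels y."""
    classes = sorted(set(y))
    df_between = len(classes) - 1
    df_within = len(X) - len(classes)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        grand_mean = mean(column)
        groups = [[v for v, lab in zip(column, y) if lab == c] for c in classes]
        # Variance explained by class membership vs. variance left inside classes.
        between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
        within = sum((v - mean(g)) ** 2 for g in groups for v in g)
        if within == 0:
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append((between / df_between) / (within / df_within))
    return scores


def top_k_features(scores, k):
    """Indices of the k highest-scoring features (k is the sparsifying coefficient)."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def center_columns(X):
    """Subtract each column's mean so the Lasso sketch needs no intercept."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exact zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_nonzero_features(X, y, l1_coef, n_iter=100):
    """Indices of features with nonzero Lasso weights (coordinate descent).

    Assumed objective: (1 / 2n) * ||y - Xw||^2 + l1_coef * ||w||_1,
    with X and y already centered.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][m] * w[m] for m in range(p) if m != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, l1_coef) / z if z > 0 else 0.0
    return [j for j in range(p) if w[j] != 0.0]


# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [
    [1.0, 0.3, 5.0],
    [1.2, -0.2, 5.1],
    [0.9, 0.1, 4.9],
    [3.0, 0.2, 5.0],
    [3.1, -0.1, 5.2],
    [2.9, -0.3, 4.8],
]
y = [0, 0, 0, 1, 1, 1]

X_centered = center_columns(X)
y_centered = [v - mean(y) for v in y]

print(top_k_features(anova_f_scores(X, y), k=1))
print(lasso_nonzero_features(X_centered, y_centered, l1_coef=0.1))
```

In this sketch the two selectors are tuned differently, matching the "Sparsifying coefficient" column: the ANOVA selector takes the number of features k directly, while the Lasso selector only controls sparsity indirectly, with a larger L1 coefficient driving more weights to exactly zero.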