Rank and select features (input variables)
Load data
We use the MAGIC telescope data from the UCI Machine Learning Repository. Earlier we split these data into training and test sets in the proportion 2:1. In this exercise, we use only the training data and estimate the model accuracy by cross-validation. The training set has 12,742 observations and 10 features.
load MagicTelescope
size(Xtrain)
ans = 12742 10
Prepare a pool of parallel workers
To speed up computation, we use the Parallel Computing Toolbox in MATLAB. In this example, parallel computation is used twice. First, we estimate the classifier accuracy for sequential backward elimination (SBE) using 5-fold cross-validation, with each fold processed in parallel. Second, we rank variables by the "feature-based sensitivity of posterior probabilities" (FSPP) method: we remove one variable at a time and evaluate the classifier accuracy on the feature set with that variable eliminated. Evaluation of the reduced feature sets is carried out in parallel.
To prepare for parallel computation, we open a pool with 3 workers. The computer used in this example has 4 cores, so we could open at most 4 workers.
matlabpool open 3
Starting matlabpool using the 'local' profile ... connected to 3 workers.
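To confirm that the pool opened as expected, you can query its size (a minimal check using the matlabpool syntax of this MATLAB release):

```matlab
% Query the number of open workers; returns 0 if no pool is open.
nWorkers = matlabpool('size')
```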
Run sequential feature selection
The sequentialfs function in the Statistics Toolbox provides sequential forward selection (SFS) and sequential backward elimination (SBE).
First, we need to specify a function that evaluates the optimized criterion. We set Q to an anonymous function handle with 4 input arguments: training X, training Y, test X, and test Y. We pass the training data to the fitensemble function to grow an ensemble of 100 bagged trees for classification. Then we apply the loss method of the trained ensemble to estimate the classification error for the test data. sequentialfs assumes that the criterion is computed by summing over all observations in the test set, whereas loss returns the mean classification error. To compensate for this discrepancy, we multiply the output of loss by the number of observations in the test set.
Having set up Q, we define options for sequentialfs. By setting UseParallel to 'always', we tell sequentialfs to execute cross-validation in parallel. We use cross-validation with 5 folds specified by the 'CV' argument. We set the search direction to 'backward' for SBE. By setting the minimal number of in-model features to zero, we tell the search to continue eliminating features until none is left. If we neglected to do so, sequentialfs would stop removing features as soon as Q increased.
sequentialfs returns two output arguments: a vector of in-model feature indices and a search history consisting of the criterion values and in-model features at every step. In this case, the vector of in-model features is empty because we set NFeatures to 0. Since we do not need it, we replace the first output with a tilde.
Q = @(xtrain,ytrain,xtest,ytest) ...
    size(xtest,1)*loss( fitensemble(xtrain,ytrain,...
    'Bag',100,'Tree','type','classification'), xtest, ytest);
options = statset;
options.UseParallel = 'always';
[~,history] = sequentialfs(Q,Xtrain,Ytrain,'CV',5,...
    'Direction','backward','NFeatures',0,'Options',options);
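As a sanity check on the sum-versus-mean compensation, here is a sketch with hypothetical fold sizes and errors (the numbers are illustrative, not from these data):

```matlab
% Two hypothetical folds with 100 and 50 test observations and mean
% classification errors 0.20 and 0.10. Multiplying each mean error by the
% fold size yields a summed criterion per fold; sequentialfs adds these up
% and divides by the total number of observations.
nTest   = [100 50];
meanErr = [0.20 0.10];
critPerFold = nTest .* meanErr;          % summed errors: [20 5]
cvCrit = sum(critPerFold)/sum(nTest)     % 25/150, the overall error rate
```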
Obtain feature ranks and criterion values
In-model features at every step of the search are saved in the history output as a logical array of size D-by-D for D features. Every row of this array shows the features included in the model at the respective step. Features with large ranks (unimportant) are eliminated first, and features with low ranks (important) are eliminated last. We save the feature ranks in the variable ranked. We then reproduce the plot of the classification error against the model size shown earlier in the text.
D = size(Xtrain,2);
ranked = 1 + D - sum(history.In,1)
figure;
plot(D:-1:1,history.Crit,'bo','MarkerSize',10);
grid on;
xlabel('Number of variables in the model');
ylabel('Classification error');
axis([0.5 10.5 0 0.4]);
ranked = 1 9 7 8 6 4 5 10 2 3
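If you prefer an explicit ordering of features from most to least important, the ranks can be sorted (a small sketch based on the ranked vector computed above):

```matlab
% Sort the ranks; order(1) is the index of the most important feature.
[~,order] = sort(ranked);
disp(order)   % for the ranks above: 1 9 10 6 7 5 3 4 2 8
```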

Estimate feature importance by FSPP
First, we bag 100 trees on the full feature set and obtain PfitFull, the posterior class probability estimates by 5-fold cross-validation.
Then we eliminate one feature at a time and estimate Pfit, the posterior class probabilities without this feature. The mean magnitude of the change in the posterior probability estimates for the signal class ('Gamma') is used to compute the importance of this feature. These importance values are saved in the fsppImp variable.
The loop over D features is a parfor loop. parfor stands for "parallel for". One pass of the loop runs independently on one of the parallel workers. Instead of looping through the features consecutively, we process up to 3 features at the same time.
cvens = fitensemble(Xtrain,Ytrain,'Bag',100,'Tree','type','classification',...
    'kfold',5);
[~,PfitFull] = kfoldPredict(cvens);
PfitFull = PfitFull(:,strcmp(cvens.ClassNames,'Gamma'));
fsppImp = zeros(1,D);
parfor d=1:D
    X = Xtrain;
    X(:,d) = [];
    cvens = fitensemble(X,Ytrain,'Bag',100,'Tree','type','classification',...
        'kfold',5);
    [~,Pfit] = kfoldPredict(cvens);
    Pfit = Pfit(:,strcmp(cvens.ClassNames,'Gamma'));
    fsppImp(d) = mean(abs(PfitFull-Pfit));
end
Show the importance values obtained by FSPP.
fsppImp
fsppImp =
  Columns 1 through 7
    0.1382    0.0461    0.0454    0.0458    0.0570    0.0557    0.0496
  Columns 8 through 10
    0.0454    0.0712    0.0601
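To compare the FSPP ranking with the SBE ranking obtained earlier, we can sort the importance values in descending order (a sketch; features with equal importance keep their original order):

```matlab
% Feature indices sorted from most to least important by FSPP.
[~,fsppOrder] = sort(fsppImp,'descend')
```

From the printed values, both methods identify feature 1 as by far the most important and agree on the top three features (1, 9, and 10).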