Learning by radial basis functions
Load data
Just like in the kernel ridge regression example, we use the MAGIC telescope data from the UCI repository. We split the data into training and test sets in the proportion 2:1 and code class labels as a numeric vector: +1 for gamma and -1 for hadron. The training data have Ntrain=12742 observations and D=10 input variables.
load MagicTelescope;
[Ntrain,D] = size(Xtrain)
Ntest = size(Xtest,1);
Ytrain = 2*strcmp('Gamma',Ytrain)-1;
Ytest = 2*strcmp('Gamma',Ytest)-1;
Ntrain = 12742
D = 10
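The MAT-file already contains the 2:1 split. Purely for reference, a similar holdout split could be produced with the cvpartition function provided in the Statistics Toolbox; the sketch below assumes hypothetical full-dataset variables X and Y and is not the code used to prepare MagicTelescope.mat.

% Hypothetical sketch of a 2:1 holdout split with cvpartition;
% X and Y stand for a full (unsplit) dataset and are not variables
% defined in this example.
cvp = cvpartition(size(X,1),'holdout',1/3);   % keep 1/3 for testing
Xtr = X(training(cvp),:);  Ytr = Y(training(cvp));
Xte = X(test(cvp),:);      Yte = Y(test(cvp));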
We then standardize the variables and choose the kernel following the same steps as in the kernel ridge regression example.
Standardize variables
See the kernel ridge regression example (Section 14.2.1) for an explanation.
[Xtrain,mu,sigma] = zscore(Xtrain);
Xtest = bsxfun(@minus,Xtest,mu);
Xtest = bsxfun(@rdivide,Xtest,sigma);
Define kernel
See the kernel ridge regression example (Section 14.2.1) for an explanation.
sigma = 1;
kernelfun = @(x1,x2) exp(-sum((x1-x2).^2,2)/2/sigma^2);
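Because the sum in the kernel runs along the second dimension, the anonymous function accepts two matrices with the same number of rows and returns the kernel value for every pair of corresponding rows. A quick check, not part of the original code:

% Gaussian kernel value between the first two standardized training
% observations; a scalar between 0 and 1.
k12 = kernelfun(Xtrain(1,:),Xtrain(2,:));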
Find radial basis function (RBF) centers
To find the kernel expansion centers, we cluster the data using the kmeans function provided in the Statistics Toolbox. By quick experimentation (not shown here) we find that M=500 clusters give a reasonable balance between speed and accuracy of the classification model.
kmeans searches for cluster centers iteratively. It starts from a random initial assignment of observations to clusters and then, at each subsequent iteration, reassigns observations and updates the centers whenever doing so improves the cluster fitness (the total sum of point-to-centroid distances). We set the maximal number of iterations to 1000 to limit the search time. Because the output of kmeans can depend on the initial random assignment, we set the 'replicates' parameter to 10: kmeans then repeats the entire clustering 10 times from different random initializations and returns the most fit solution. For reproducibility, we also seed the random number generator by calling rng before kmeans.
M = 500;
options = statset('MaxIter',1000);
rng(1);
[~,C] = kmeans(Xtrain,M,'options',options,'replicates',10);
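If you want to inspect the fitness measure mentioned above, kmeans returns it as its third output, sumd, the within-cluster sums of point-to-centroid distances; the replicate with the smallest total is the one kept. A sketch of this optional diagnostic, which reruns the clustering and is not part of the original code:

% Optional diagnostic: within-cluster sums of distances (sumd) and the
% total objective that the 10 replicates compete on. Reseeding with
% rng(1) reproduces the same clusters as above.
rng(1);
[idx,Cchk,sumd] = kmeans(Xtrain,M,'options',options,'replicates',10);
totalWithinClusterDist = sum(sumd)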
The kmeans function returns C, an M-by-D array of cluster centers. These centers generally do not coincide with actual points in the dataset. To position every expansion center exactly at a training point, we find the nearest neighbor of every cluster center in the training set using the knnsearch function provided in the Statistics Toolbox. This function returns the indices of the observations in Xtrain closest to the cluster centers in C; we then replace each row of C with the corresponding training point.
idxNearestX = knnsearch(Xtrain,C);
C = Xtrain(idxNearestX,:);
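As a quick sanity check (not in the original code), every row of C should now match a training observation exactly:

% Each expansion center must coincide with some row of Xtrain.
assert(all(ismember(C,Xtrain,'rows')));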
Map the training data into the feature space
We compute an Ntrain-by-M matrix of features for the training data. This computation is similar to the one in the kernel ridge regression example.
Gtrain = zeros(Ntrain,M);
for m=1:M
    q = C(m,:);
    Gtrain(:,m) = kernelfun(Xtrain,repmat(q,Ntrain,1));
end
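The loop above evaluates the kernel one center at a time. If the pdist2 function provided in the Statistics Toolbox is available, the same matrix can be obtained in one vectorized call; a sketch assuming the Gaussian kernel defined above (the same idea applies to the test features below):

% Equivalent vectorized computation: pdist2 returns the Ntrain-by-M
% matrix of Euclidean distances between training points and centers.
GtrainVec = exp(-pdist2(Xtrain,C).^2/(2*sigma^2));
% max(abs(GtrainVec(:)-Gtrain(:))) should be at numerical precision.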
We then compute the condition number of the feature matrix, defined (for the 2-norm) as the ratio of the largest to the smallest singular value. In the kernel ridge regression example, the condition number of the full Ntrain-by-Ntrain feature matrix is on the order of 1e+28. Here it is on the order of 1e+3, so the new matrix with 500 features likely does not need to be regularized; we take the optimal regularization parameter to be zero.
cond(Gtrain)
ans = 2.5416e+03
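As a check of the definition above (not in the original code), the same value can be recovered from the singular values of Gtrain:

% The 2-norm condition number equals the ratio of the extreme singular values.
s = svd(Gtrain);
condCheck = max(s)/min(s)   % agrees with cond(Gtrain)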
Map the test data into the feature space
In a similar fashion, we fill an Ntest-by-M matrix of features for the test data.
Gtest = zeros(Ntest,M);
for m=1:M
    q = C(m,:);
    Gtest(:,m) = kernelfun(Xtest,repmat(q,Ntest,1));
end
Compute the predicted response
In the absence of regularization, the optimal coefficients of the kernel expansion can be found in the same way as linear regression coefficients, by using the backslash operator. We multiply the test feature matrix by the estimated coefficients to obtain the predicted response.
alpha = Gtrain\Ytrain;
Yfit = Gtest*alpha;
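Had the condition number been large, a regularized fit could be used instead of the plain backslash solve; a sketch of the ridge version with a hypothetical regularization parameter lambda:

% Hypothetical ridge-regularized fit (not needed here, where lambda = 0):
lambda = 0.1;                                  % example value, not tuned
alphaRidge = (Gtrain'*Gtrain + lambda*eye(M)) \ (Gtrain'*Ytrain);
YfitRidge = Gtest*alphaRidge;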
Compute HIACC
Following the paper by Bock et al., we estimate the mean signal efficiency at background acceptance values of 0.1 and 0.2, just as we did in the kernel ridge regression example. The obtained HIACC value is noticeably lower than the 0.837 obtained by kernel ridge regression.
[~,tprkr] = perfcurve(Ytest,Yfit,1,'xvals',[0.1 0.2])
accKR = mean(tprkr)
tprkr = 0.7118    0.8968
accKR = 0.8043
The modest loss in accuracy comes with a more than 25-fold reduction in the CPU and memory requirements of the predictive model.
Ntrain/M
ans = 25.4840