The FieldTrip Multivariate Module (FMM) is a generic machine learning toolbox with additional support for the analysis of neuroimaging datasets and is written in object-oriented Matlab. It is especially suitable for applications in Brain-Computer Interfacing (BCI) and multivariate pattern analysis (MVPA). Contrary to other FieldTrip modules, it makes use of object-orientation and requires Matlab version >= 7.6.0 (R2008a). The module offers support for preprocessing, feature selection, classification, regression and reconstruction.
FieldTrip also offers a more high-level way of calling the module using the statistics_crossvalidate function (i.e., it automatically performs cross-validation for some specified multivariate analysis procedure). Please consult the multivariateanalysis tutorial for this.
The organization of the FMM folders is as follows:
The basic philosophy of FMM is that new methods can be added as toolboxes which are called through generic wrapper code. This makes it easy to extend the toolbox to fit your own needs. The currently used external toolboxes are listed here:
Different multivariate methods can be concatenated together in a processing pipeline, constituting a multivariate analysis ft_mv_analysis, or MVA for short. Finally, an ft_mv_analysis object can either be used standalone or be evaluated on data using a ft_mv_crossvalidator object. This automates the whole process of analyzing (neuroimaging) data.
In FMM, a dataset is given by a multidimensional array, where the first dimension indexes datapoints and the remaining dimensions are arbitrarily chosen, e.g., size(X) = trials x channels x frequencies. Internally, the module represents this data as trials x features matrix. Missing values are represented using NaN.
An exception to this rule holds for ft_mv_timeseries objects, where the last dimension must index time, e.g., size(X) = trials x channels x frequencies x timepoints. Internally the module represents this data as a trials x features x timepoints matrix.
Multiple datasets are represented as a cell-array of multidimensional arrays. E.g., X = {X1 X2} and Y = {Y1 Y2} represents two input datasets and two output datasets.
A multivariate analysis is a Matlab object which can be constructed as follows:
obj = ft_mv_analysis({method1,...,methodN})
where the argument is a cell-array of multivariate methods. By calling
obj = obj.train(X,Y)
we can train our MVA on input X and output Y. Finally, we can test on new data U by calling
V = obj.test(U)
such that V contains the predictions. The MVA object will sequentially call each ft_mv_method in the pipeline where output of the previous method will be input to the next method. The method obj.name will return the MVA as a string and the method obj.model will return the associated model, in this case the model of the last method in the pipeline (more on this later).
Multivariate methods are organized into different subclasses according to the way they transform input data X to output data Y. Currently supported methods are preprocessors, which preprocess the data, selectors, which perform some kind of feature selection, predictors, which make predictions for new data, and timeseries methods, which perform timeseries analysis on the data. Many methods handle missing data, multiple outputs, multiple datasets and online (sample by sample) learning. If a method does not handle any of these situations then it will throw an error. Handling of some of these situations can be done by invoking a wrapper function.
All ft_mv_method classes are structured as follows (here we show an example of a predictor):
classdef mymethod < ft_mv_predictor
properties
% insert class-specific properties here
end
methods
function obj = mymethod(varargin)
% constructor
obj = obj@ft_mv_predictor(varargin{:});
end
function obj = train(obj,X,Y)
% estimate mvmethod parameters
end
function Y = test(obj,X)
% compute output
end
function X = invert(obj,Y)
% optional function which inverts the mapping performed by this class
end
function [m,d] = model(obj)
% returns the model parameters as a cell-array m
% desc contains a description of each element of m
end
Note that a predictor can perform classification tasks, regression tasks or represent multiple-input multiple-output (MIMO) models. In case of K-class classification, it is assumed that the output vector Y consists of labels 1 : K. For example, Y = [2 1 1 2 1 2]′ assigns examples 2, 3 and 5 to class 1 and the remaining examples to class 2. In case of regression, Y just consists of the observed real output per example. In case of MIMO models , the same behaviour holds, but Y consists of multiple columns where each column stands for a separate output.
In the following subsection we list the currently supported objects. More complete descriptions can be found in the help of the respective methods.
Base classes should not be called directly but rather form the core of the module.
A multivariate analysis is just a sequence of multivariate methods {method1 method2 method3 …} that are called in this order and where the output of the previous method acts as input to the next method. Example:
>> f = ft_mv_analysis({ft_mv_standardizer ft_mv_svm})
f =
ft_mv_analysis
Properties:
mvmethods: {[1x1 ft_mv_standardizer] [1x1 ft_mv_svm]}
Methods
>> f.name
ans =
{ ft_mv_standardizer ft_mv_svm }
Abstract class for multivariate methods. Introduces the verbose property to allow methods to show or hide output.
Helper class to handle multiple datasets in a uniform manner.
Helper class to handle multiple outputs in a uniform manner.
Preprocessors are used to preprocess or filter the input data X in such a way that it will be more suitable for making predictions.
Abstract base class from which other preprocessors are derived. Should not be called directly.
Centers and scales the data such that it has mean zero and a standard deviation of one.
Whitens the input data.
Common spatial pattern algorithm.
Principal component analysis. Uses parameter proportion. It defines the proportion of pc's or number of pc's. If < 1 then it is interpreted as a proportion of accounted variance. Otherwise it is treated as the absolute number of pc's. If empty then all components are used (default = 0.80).
Selectors are used to perform feature selection on the input data X.
The base class from which other selectors are derived. It defines properties validator and subset. The validator is used to select the optimal feature subset using an inner cross-validation. The subset defines the indices of the original features which are selected by the algorithm.
Performs feature selection by computing a univariate measure for each feature and taking the best N features based on a crossvalidator.
Performs a searchlight analysis. Suppose our features are given by voxels in multiple fMRI volumes. The searchlight moves a sphere through this volume and computes decoding performance per sphere. Below, an example is shown using a sphere of radius 3 and a step size of 3. Both radius and step size are measured in number of features (e.g. voxels). In order to define the structure of our search space, we use indims to specify the dimensions of the input data (e.g. [110 110 37] for our fMRI volumes). The used decoding procedure is by default a 5-fold cross-validation using a support vector machine but this behaviour can be overridden.
sl = ft_mv_searchlight('radius',3,'step',3,'indims',dims);
sl = sl.train(X,Y);
estimating 39 spheres with radius 3 with 3 steps for a
volume of size [110 110 37]
estimating sphere 1 of 39
estimating sphere 2 of 39
estimating sphere 3 of 39
estimating sphere 4 of 39 ...
average sphere volume: 52.2308
performance for sphere 1 of 39: 0.736842 (p-value: 0.342782)
performance for sphere 2 of 39: 0.842105 (p-value: 0.148915)
performance for sphere 3 of 39: 0.789474 (p-value: 0.267257)
performance for sphere 4 of 39: 0.684211 (p-value: 0.546494)
...
Furthermore, we could use a mask to specify our volume (e.g. a grey matter mask in case of fMRI data) or specify an irregular neighbourhood structure. Please see the function documentation for further details.
Predictors are used to predict new outputs Y based on input data X. Some predictors expect discrete outputs Y and others expect continuous outputs Y. Some methods are suitable for predicting multiple outputs. Methods which are not directly suitable for multiple outputs will process each output independently and combine the results.
Abstract class from which other predictors are derived. Should not be called directly.
Bayesian logistic regression with a multivariate Laplace prior as described here.
Naive Bayes classifier with normally distributed feature values.
Abstract class. Superclass of the following kernel methods.
Support vector machine.
Kernel logistic regression.
Regularized kernel least squares.
Regularized Fisher discriminant analysis.
Fast implementation of the elastic net algorithm as described here. This method can be used to perform regularized linear or logistic regression:
unregularized linear regression
ft_mv_glmnet('lambda',0,'family','gaussian')
unregularized logistic regression
ft_mv_glmnet('lambda',0,'family','binomial')
L2 regularized logistic regression
ft_mv_glmnet('alpha',0,'lambda',1,'family','binomial')
L1 regularized logistic regression
ft_mv_glmnet('alpha',1,'lambda',1,'family','binomial')
elastic net linear regression
ft_mv_glmnet('alpha',0.5,'lambda',1,'family','binomial')
Instead of setting the regularization parameter lambda manually, we can also optimize it using an inner cross validation as follows:
ft_mv_glmnet('validator',ft_mv_crossvalidator('nfolds',5),'family','gaussian') % 5-fold inner cross validation
Recursive least squares.
Sparse orthonormalized partial least squares algorithm.
Timeseries methods are used to predict new outputs Y based on input data X which should be of size repetitions x features x timepoints.
Status: unstable
Abstract class from which all timeseries objects are derived.
Hidden Markov model
Linear dynamical system
Meta analysis methods have in common that they operate on other multivariate analysis methods.
Base class for meta analysis methods.
For many of the described methods there are free parameters which need to be optimized. For instance, the C parameter of the SVM or the regularization parameters L1 and L2 for the elastic net. In those cases, it is useful to employ the ft_mv_gridsearch. This object can be used to optimize free parameters of a multivariate analysis in a fully automated way. The grid search is called as folllows:
ft_mv_gridsearch('mva',mymva,'validator',ft_mv_crossvalidator... ('metric',mymetric),'mvidx',myidx,'vars',myvars,'vals',myvals)
where mymva is the employed MVA, mymetric the employed performance measure to test which configuration is optimal, myidx the index of the method in mymva that is to be optimized, myvars the variable of mymva that needs to be optimized and myvals the values which that variable may assume. This sounds complicated but it is not. Let’s clarify with some examples. Suppose we wish to optimize the C parameter of an SVM in the range logspace(-3,3,7). Then, we can use
% crossvalidator with 80% of the data and accuracy as the metric
cv = ft_mv_crossvalidator('nfolds',0.8,'metric','accuracy');
opt = ft_mv_gridsearch('verbose',true,'mva',svm,'validator',cv,...
'vars','C','vals',logspace(-3,3,7))
% standard behavior from here on
opt = opt.train(X,Y) ...
Wrapper class to implement one-against-one classification in case of multiple classes. For example,
ft_mv_one_against_one('mva',ft_mv_svm,'combfun',[])
creates an SVM for all pairs of class labels and combines the predictions using the default combination function (sum of the probabilities).
Wrapper class to implement one-against-rest classification in case of multiple classes. For example,
ft_mv_one_against_one('mva',ft_mv_svm,'combfun',[])
creates an SVM for each class label and combines the predictions against the remaining class labels using the default combination function (sum of the probabilities).
Ensemble methods evoke methods in parallel and combine their outputs in some prespecified way. In FMM this is realized through the ensemble object. For example, we can apply a naive Bayes classifier and a support vector machine through ensemble methods as follows:
m = ft_mv_ensemble(’mvas’,{ft_mv_naive ft_mv_svm},’combfun’,myfun); m = m.train(X,Y); ...
Here, the user-specified function myfun specifies how the ouputs of naive Bayes and the SVM should be combined. For example, myfun could implement a majority vote as follows
function y = myfun(x)
y = zeros(size(x{1}));
for k=1:length(x)
[temp,pcls] = max(x{k},[],2);
for p=1:length(pcls)
y(p,pcls(p)) = y(p,pcls(p)) + 1;
end
end
% resolve ties for p=1:size(y,1)
m = find(ismember(y(p,:),max(y(p,:))));
y(p,:) = 0;
if length(m) > 1
m = m(ceil(rand*length(m)));
end
y(p,m) = 1;
end
Likelihood based classification. Takes another probabilistic method m, builds a model for each class and then compares likelihoods. Requires the method m.likelihood(X), which returns the log likelihood of the data X, to be defined. Example:
X = randn(20,10,15); % trials x features x timepoints
X(1:10,:,:) = X(1:10,:,:) + 0.5;
Y = [ones(1,10) 2*ones(1,10)]';
k = ft_mv_likelihood('mva',ft_mv_hmm('indims',[10 15]));
k = k.train(X,Y);
k.test(X) % show log likelihood of training data
We can either use a MVA to perform online state estimation or to perform an offline analysis of neuroimaging data. The former is used in BCI applications (prediction) whereas the latter is used in offline analysis of BCI data or MVPA (model inference). Online state estimation is handled in the next section. Here, we describe statistics for models learned on offline data.
Suppose we have acquired an offline dataset where subjects had to attend to the left or right visual field. In a MVPA approach, we want to get an estimate of how well our MVA can predict the at- tended location in individual trials and which features (brain regions, channels, latencies, frequencies, etc) contributed to this prediction. This can be assessed using the ft mv crossvalidator object. It splits the data into separate folds and learns for each fold a model on the remaining folds. For example, in case of ten-fold cross-validation we will obtain ten different models which are evaluated on ten different parts of the data. It is quite easy to perform such an analysis using FFM, as shown below.
% create crossvalidator object which % standardizes the data and applies an svm
cv = ft_mv_crossvalidator('mva',{ft_mv_standardizer ft_mv_svm});
% train cv on input data X and output data Y
cv = cv.train(X,Y);
% display classification accuracy
cv.metric = 'accuracy';
disp(cv.performance);
>> 0.80 % 80% correctly classified
% display outcome of mcnemar test
cv.sigtest = 'mcnemar';
disp(cv.significance);
>> 1e-4 % p-value; null-hypothesis rejected
In the example, a ten-fold cross-validation is performed by standardizing the data and applying a support vector machine. Subsequently, the classification accuracy (proportion of correctly classified trials) is computed. Finally, an approximate binomial test (McNemar test) is computed which compares the predictions with that of a naive classifier that assigns all outcomes to the majority class. The different performance measures and significance tests can be examined by consulting the help for ft_mv_performance and ft_mv_significance.
One may changenfolds to any number of folds, with nfolds = inf implementing leave-one- out cross-validation. Specifying a proportion for nfolds will use that proportion for training and the remaining data for testing. Finally, specifying nfolds = ’online’ will simulate an online session by taking all trials in sequence where we train on trials 1, . . . , n and test on trial n + 1 for n = 0, . . . , N − 1.
If you want to manually define which trials should end up in which folds then this can be done via either the trainfolds or testfolds property. These should define an Nx1 cell-array containing the trial indices for each of the N folds. Just one of the properties needs to be defined since the other property will automatically take the complement of the trials in each fold.
We have not yet mentioned how to determine which features were responsible for the predictions. This is realized through the model field of the cross-validator. It returns the average model that is produced by the last method in the specified MVA; in our case, the model of a SVM. It is up to each multivariate method to specify how its model is defined. For example, for the above example, assuming we used only five features, we have:
disp(cv.model); >> [ 0.0176 -0.0305 0.0445 0.0904 0.0710] disp(cv.description) >> 'primal form parameters; positive values indicate condition 2'
Sometimes it can be very expensive to save all the parameters learned in each fold. In that case, you may want to set the property compact to true. This will still save the average model but not all the parameters separately for each fold.
Performance measures of prediction results. Operates on the design matrix and the predictions. Called as
res = ft_mv_performance(design,post,metric)
The following performance metrics can be used:
Significance testing of prediction results. Operates on the design matrix and the predictions. Called as
res = ft_mv_significance(design,post,sigtest)
and returns a p-value as the result. The sigtest determines the significance test. It supports
A helper function which allows easy testing of a new dataset.
Multitask learning is the the notion that if we have multiple datasets (e.g., subjects, sessions) then we can learn better models for each dataset by being informed by the other datasets. For instance, if multiple datasets are given to ft mv blogreg then the same features are coupled between the different datasets. This leads to dataset-specific models that are easier to compare. For instance, in the following example we apply this method to two datasets, both generated by adding random noise to the original data.
>> load 69digits;
>> X1 = X + 0.05*randn(size(X));
>> X2 = X + 0.1*randn(size(X));
>> a = ft_mv_test(’mva’,{ft_mv_blogreg},’X’,X1,’Y’,Y);
>> disp(a)
0.6600
>> a = ft_mv_test(’mva’,{ft_mv_blogreg},’X’,X2,’Y’,Y);
>> disp(a)
0.5800
>> a = ft_mv_test(’mva’,{ft_mv_blogreg(’taskcoupling’,100)},’X’,{X1 X2},’Y’,Y);
>> disp(a)
[0.6900] [0.5900]
Hence, if we apply multitask learning then the results per dataset are influenced by the other datasets. The taskcoupling parameter effectively couples the models:
>> disp(corr(d.model{1,1},d.model{1,2}))
0.9999
Weaker coupling will cause weaker correlations between the models.
If we use the toolbox for prediction then we need to work under strict time constraints. During training we want an optimal MVA to be learned quickly and be able to update our MVA with new incoming data. During testing we want to obtain an MVA output fast such that the online system does not stall. In the following example, we show an example of how online training and testing is realized.
% create a naive Bayes classifier clf = ft_mv_naive % train on initial data clf = clf.train(X1,Y1) % get output for new data; produces a posterior over classes out1 = clf.test(x1); % train some more (update the trained classifier) clf = clf.train(X2,Y2); % get output for new data; produces a posterior over classes out2 = clf.test(x2);
In the above example we used a naive Bayes classifier but the MVA can be arbitrarily complex. For example,
clf = ft_mv_analysis({ft_mv_standardizer ... ft_mv_filterer(’maxfeatures’,10) ft_mv_svm})
creates a MVA which first standardizes the data, then performs feature selection and finally applies a support vector machine.
Some of the described methods can be resource intensive. Most notably, cross validation, grid search, feature selection and the application of ensemble methods since they all require iterating over a collection of multivariate analyses. These methods all support parallel computing as implemented through FieldTrip’s peer distributed computing module.
This module is maintained by Marcel van Gerven (m.vangerven@donders.ru.nl).
Share this page: