This module is dedicated to the multivariate analysis of neuroimaging data. Contrary to other modules, it makes use of object-orientation and requires Matlab version >= 7.6.0 (R2008a). The module offers support for preprocessing, feature selection, classification, regression and reconstruction.
The module can be used for offline analysis in order to produce statistics through cross-validation as an alternative to more conventional statistical tests. The module can also be used for online analysis, for example in BCI experiments using the realtime module. We encourage the sustained development of the multivariate module by other developers.
Here we describe the multivariate module in detail. FieldTrip also offers a more high-level way of calling the module using the statistics_crossvalidate function (i.e., it automatically performs cross-validation for some specified multivariate analysis procedure). Please consult the multivariateanalysis tutorial for this.
This module is maintained by Marcel van Gerven (marcelge@cs.ru.nl). It is also available as a standalone toolbox called NMLT.
The core of this module is formed by a multivariate analysis. This is an object which holds a number of multivariate methods and can be instantiated as follows:
myproc = mva({mvmethod1 ... mvmethodN})
where an mvmethod can be a preprocessor, feature selector, or predictor. A predictor is either a classifier, a regressor or a reconstructor.
An multivariate analysis procedure can be learned using
myproc = myproc.train(X,Y)
Here, X is typically an N x M_1 x … x M_u multidimensional array of N examples and M_1 x … x M_u measurements (features) and Y is the design matrix which is typically an N x K_1 x … K_v multidimensional array of class labels or real-valued numbers.
Some of the predictors offer support for transfer learning, which can be thought of as a group level analysis in conventional statistics. In that case, data and design are cell arrays which contain multiple datasets and their associated designs.
An mva can be applied to unlabeled data using
Y = myproc.test(X)
which produces the multivariate analysis results. E.g., Y(1,:) = [0.1 0.9] would be the posterior probability distribution for one sample in a binary classification problem. For a regression problem, the outcomes are the predicted means (and possibly other statistics) of each sample. Reconstruction problems can be interpreted as classification/regression problems with multiple outputs. Examples are given below.
Multivariate methods are preprocessors, feature selectors, or predictors: classifiers, regressors or reconstructors. A preprocessor transforms the input data in order to facilitate subsequent classification. A feature selector is used to create a smaller number of features per example. This allows one to localize regions of interest and also often improves classification performance. A multivariate analysis normally ends with a predictor since this produces the outcomes of interest. Multivariate methods make use of a special field called params where all estimated parameters are stored.
In this section, we list some of the methods which are currently stable; all other methods in the toolbox can be considered beta versions. These methods are as yet undocumented and can be used at your own risk. For more information please check the function documentation by calling
help multivariate
in your Matlab session. For more information on the implemented methods, use one of
help preprocessors help featureselectors help classifiers help regressors help reconstructors
A standardizer can be used to standardize the data to have zero mean and a standard deviation of one.
A pcanalyzer performs a principal component analysis. If called with
pcanalyzer('proportion',p)
it will return the transformation using a restricted number of components. A value of p >= 1 specifies the number of components and 0 < p < 1 uses that number of components such the explained variance is p.
Feature selectors are methods which identify some optimal feature subset. This optimal subset can then be used as input to a predictor. For example,
mva(filterer('nfeatures',10) svmmethod())
creates a multivariate analysis which uses a filterer method as feature selector (see below) and gives the optimal feature subset as input to svmmethod.
A filterer orders the features according to some criterion and selects the first N features that give best performance. If we use
filterer('filter',@mutual_information,'nfeatures',10)
then the best 10 features will be used according to the mutual information filter (see help file for additional filters).
A more complex approach is to use a prespecified validator object which tests what the optimal number of features is. It can be called as follows:
filterer('filter',@mutual_information,'procedure',nb,'validator',crossvalidator('nfolds',0.9))
This creates a filterer that computes the mutual information between each feature and the class labels. It selects the first N features for which crossvalidation performance is best. The crossvalidator uses a naive Bayes classifier, trains on 90 percent of the data and tests on the remaining ten percent of the data.
A searchlight analysis uses a sphere with a certain radius and moves across the input data in order to get local performance estimates. For example,
searchlight('procedure',svmmethod,'step',2,'radius',3,'metric','accuracy')
will use cross-validation based on svmmethod and the evaluation metric accuracy to get an estimate for each sphere. The estimated average performance per feature can be retrieved by calling searchlight.getmodel after training.
The naive Bayes classifier assumes that P(C,X) can be written as P(C)P(X1|C)…P(XM|C) where P(Xi|C) are normal distributions. It can simply be called using
nb()
Other distributions can be used by calling the generalized naive Bayes classifier. For instance, we can use exponential distributions
gnb('conditional','exponential')
or truncated gaussians
truncgauss = @(x,mu,sigma)(normpdf(x,mu,sigma)./(normcdf(500,mu,sigma)-normcdf(0,mu,sigma))); truncmle = @(x)(mle(x,'pdf',truncgauss,'start', [nanmean(x) nanstd(x)],'lower', [0 0])); gnb('conditional',truncgauss,'mle',truncmle)
In the former case, we can specify the conditional as a string (see code for more options). In the latter case, both the conditional and maximum likelihood estimates are functions that need to be specified.
linear discriminant analysis can be called using
da()
but typically it is wiser to use a regularized version such as regularized linear discriminant analysis (rlda) or regularized fisher discriminant analysis (rfda).
A kernel method can be called using
kernelmethod()
Note that in this case, we use a particular kernel method. We may also use
kernelmethod('method',@l2svm_cg)
where the method is the (default) l2 regularized support vector machine @l2svm_cg, regularized kernel least squares @rkls, or kernel logistic regression @klr_cg.
In the above cases, we use a default setting for the regularization parameter C. Of course, this parameter can also be specified:
kernelmethod('method',@l2svm_cg,'C',90)
It could be the case that the mex files don't compile on your machine. As an alternative you may use svmmethod() which is the default classifier when using high level functions or svmnative() which is a simple wrapper to the bioinformatics toolbox SVM function.
logistic regression can simply be called using
logreg()
L1 and L2 regularization is implemented through the respective regularization parameters lambda and nu. Note that nonzero values for both amounts to elastic net logistic regression.
Group-sparsifying logistic regression combined feature selection and classification since it uses only some features during classification. If it is called using
gslr()
and will find a sparse solution using only some features. Many more options are possible which allow the grouping of features, e.g., for finding a small number of important channels or regions of interest. This algorithm is slow and a maximum number of used groups can be set using the maxgroup option. Please refer to the function documentation for other options. This functionality is now largely replaced by the Bayesian method blogreg.
Logistic regression with a Gaussian prior on the regression coefficients. Uses either Assumed Density Filtering or the Laplace method as approximation technique. Called as
lrgauss_adf
and
lrgauss_lap
One of the most sophisticated methods in the toolbox. Called as
blogreg
and implements the ideas mentioned in the paper Efficient Bayesian multivariate fMRI analysis using a sparsifying spatio-temporal prior. This is currently the method of choice for doing classification. The scale parameter influences the regularization towards zero. Coupling of variables can be realized by specifying a prior. However, it is easier to realize by using the parameter coupling. This is a vector which specifies for each input dimension the strength of the coupling. E.g., suppose our data has dimensions ntrials x dimx x dimy x dimz and we use coupling [100 0 10] then a coupling with strength 100 will be assumed for the x dimension, no coupling for the y dimension and a coupling of 10 will be assumed for the z dimension. For other options, please consult the code.
linear regression can simply be called using
linreg()
Analogous to logistic regression, L1 and L2 regularizers can be specified, giving the lasso, ridge regressor or elastic net regularizer (when both L1 and L2 are specified).
not yet supported…
Not yet supported…
In order to use the module for offline analysis there are two options. Either, one can use timelockstatistics, freqstatistics or sourcestatistics, as mentioned in the multivariateanalysis tutorial, or, in order to gain more control, one can use lower level functions. For examples, in order to perform ten-fold cross-validation which standardizes the data and uses naive Bayes as a classifier, we may call:
cv = crossvalidator('procedure',{standardizer() nb()});
cv = cv.validate(X,Y);
cv.evaluate('metric','accuracy')
cv.significance
These calls construct a crossvalidator, validate using data and a design matrix, evaluate the classification accuracy and compute significance of the results. Note that other metrics can be computed using evaluate and significance also allows for some more advanced options. For more information, use one of
help validator.evaluate help validator.significance
For online analysis, we simply specify our favorite multivariate analysis, e.g.,
myproc = mva({standardizer svmmethod});
Then, the procedure can be trained with
myproc = myproc.train(X,Y);
using previously acquired data. Online, each data sample which comes in can be transformed into a prediction using:
clf = myproc.predict(sample);
This retrieves in this case the class with highest posterior probability.
In order to test a separate test set you may use
Y = myproc.test(testdata);
which may again be evaluated using
classifier.evaluate(Y,testdesign);
Note that different evaluation metrics and significance tests are implemented for classifiers, regressors, and reconstructors. Please consult one of
help classifier.evaluate help regressor.evaluate help reconstructor.evaluate help classifier.significance help regressor.significance help reconstructor.significance
for the possibilities. You can also check out the realtime module in order to learn how to do realtime processing with FieldTrip.
In this section, we describe some more advanced options of the classification module.
For many of the methods, we have free parameter we need to choose. Often, this is hard and we would like to have some sort of inner cross-validation which identifies the most optimal setting. This can be realized with the optimizer class. It allows the specification of several free parameters and their possible values. The optimizer will then do a grid search over all value combinations and use a specified cross-validation procedure to test performance. For example,
mva({optimizer('mvmethod',elasticlr,'validator',crossvalidator('nfolds',0.8,'verbose',true),'vars',{'lambdaL1' 'lambdaL2'},'vals',{[1 10] [1 10]},'metric','accuracy')})
will construct a multivariate analyis where we use the elasticlr classifier and where we do a grid search by varying the parameters lambdaL1 and lambdaL2 using values 1 and 10 (i.e., we test 4 possible configurations). Performance is tested by computing classification accuracy using a cross-validator which uses 80 percent of the data for training and the rest for testing. This procedure can be used to optimize any free parameter(s).
Ensemble methods apply multiple classifiers to a dataset. This can be realized by a nested cell-array in a multivariate analysis, which indicates that the containing methods should be evaluated in parallel. The resulting posteriors should be combined and then type-cast to the appropriate predictor class. For example
myproc = mva({standardizer {da svmmethod} combiner classifier});
standardizes the input dataset, applies discriminant analysis and a support vector machine in parallel, combines the outputs (posteriors) using a combination rule (product by default) and casts the results to a classifier output. Nested sequences can be realized by specifying a nested mva. E.g.,
myproc = mva({{mva({standardizer da}) mva({standardizer svmmethod})} combiner classifier});
realizes the same as the statement above.
Transfer learning allows us to learn models for multiple datasets where we get better models by borrowing statistical power from the other datasets. Multiple datasets can be used by specifying data and design matrices as cell arrays. If the receiving methods are not transfer learners then each dataset will be handled separately. If the receiving methods are transfer learners then the datasets will be considered together (but the results will still be defined w.r.t. individual datasets). Although there are more transfer learners available, we currently use
blogreg_transfer('taskcoupling',100)
whose parameter taskcoupling specifies the coupling between datasets (100 stands for a very strong coupling). Other parameters are similar to those of the blogreg method. I.e., we can combine transfer learning with coupling of other input dimensions etc.
The classification module contains support for (dynamic) Bayesian networks, hidden Markov models, Kalman filters and all that. These methods make use of the bayesbrain toolbox, which is part of the module but can also be used as a standalone toolbox.
More descriptions coming soon…
This code is still under heavy development. Contributions are welcome. For example, if you have a new multivariate method which you would like to include, simply do the following. Suppose you wish to create a file mymethod.m which implements a classifier. Then you may use the following template (regressors and reconstructors are implemented along the same lines):
classdef mymethod < classifier properties end methods function obj = mymethod(varargin) obj = obj@classifier(varargin{:}); end function p = estimate(obj,X,Y) % p is a struct that should contain the learned parameters and can later be called as obj.params end function Y = map(obj,X) % Y should return the posteriors as a matrix end function m = getmodel(obj) % return the classifier parameters % this function is useful but not required. end end end
Often, the simplest thing to do is to use this code as a wrapper which calls the train and test functions of your own toolbox. If the code is working then it can be added upon request. Send the wrapper and the (optional) toolbox for inclusion to marcelge@cs.ru.nl. Similar steps can be followed for preprocessors, feature selectors, regressors and reconstructors. Once the code is included I will prompt you for some documentation that is to be included on this Wiki. If you are a FieldTrip developer then you can contribute everything through the SVN repository.