Multivariate analysis of MEG/EEG data

Introduction

The objective of this tutorial is to give an introduction to multivariate analysis of neuroimaging data. Multivariate methods aim to find task-related features in the data by predicting to which task single trials belong. The idea is that if the features can be used for prediction then those features must be important from the perspective of the task. Note that this is very different from classical statistical testing, where such features are identified by pooling over multiple trials and/or subjects and where features are typically assumed to be independent. In this tutorial, you will learn to apply standard classification algorithms such as the Support Vector Machine to electrophysiological data.

This tutorial builds on skills acquired in the preprocessing, event related averaging and time-frequency analysis tutorials.

Note that multivariate analysis is a very broad topic which we only touch upon in this tutorial. An overview of the methods implemented by the multivariate module can be found here.

Background

Multivariate methods present an alternative approach to determine relevance of features in the data. Furthermore, they allow for real-time analysis of neuroimaging data which allows for new experimental designs based on the idea of brain-state dependent stimulation as well as the development of brain-computer interfaces. The methods described are also heavily used for the purpose of multivariate fMRI analysis. For an introduction into the use of these methods please consult the following tutorial.

Procedure

In this tutorial we will use classifiers to analyze a brain-computer interfacing dataset which has been used in this paper. In short: 275-channel MEG data was acquired while the subject was instructed with a centrally presented cue to covertly attend to the left (127 trials) or to the right (128 trials) visual field (one faulty sensor was removed in the subsequent analyses). The experimental question is whether we can predict on a single-trial level to which condition (attention to the left or to the right) the single trials belong.

The data has already been segmented into the trials of interest using ft_definetrial and has been preprocessed with ft_preprocessing. The data has been detrended and downsampled to 300 Hz. The trials start at cue offset and end 2.5 seconds later. The subject has been attending to either the left or right direction during this period. No artifact rejection has been performed.

You can find the data here.

In the following, we will work our way through the time- and frequency-domain analysis pipelines as shown in the figure.

Sensor level classification in the time domain

We will start by analyzing the data in the time domain for subject 1.

load covattpre1;

We now perform a timelock analysis in order to make the data suitable as input to ft_timelockstatistics. That is, we are going to predict attention direction from temporal data. For the purpose of demonstration we will focus on occipital channels only.

cfg             = [];
cfg.parameter   = 'trial';
cfg.keeptrials  = 'yes'; % classifiers operate on individual trials
cfg.channel     = {'MLO' 'MRO'}; % occipital channels only
tleft   = ft_timelockanalysis(cfg,left);
tright  = ft_timelockanalysis(cfg,right);

Now we specify cross-validation as a method. This ensures that we will perform a classification of our data based on ten-fold cross-validation. This splits up the data into ten partitions or folds and attempts to build ten different classifiers using the remaining nine folds. The end result is then averaged over folds.

cfg         = [];
cfg.layout  = 'CTF275.lay';
cfg.method  = 'crossvalidate';

We also need to specify a design matrix; this is simply a vector with labels 1 for the trials belonging to data for the first condition and labels 2 for trials belonging to data for the second condition

cfg.design  = [ones(size(tleft.trial,1),1); 2*ones(size(tright.trial,1),1)]';

Let's focus on the last segment of the data

cfg.latency     = [2.0 2.5]; % final bit of the attention period

Finally, we call ft_timelockstatistics which uses the default classification procedure; namely a standardization of the data (subtraction of the mean and division by the standard deviation), followed by applying a linear support vector machine:

stat = ft_timelockstatistics(cfg,tleft,tright);

The stat structure now contains the classification performance (proportion of correctly classified trials)

stat.performance

but it is not significant at p=0.05, as shown by the p-value:

stat.pvalue

Note that we may be interested in other representations of classification performance such as the contingency matrix with true classes in rows and predicted classes in columns. One way to compute this is to use the option

cfg.metric = 'contingency';

when running ft_timelockstatistics. A more direct way is to call

stat.cv.performance('contingency')

which re-evaluates the results in the stat.cv object. This object contains all the results about the classification, including the trained classifiers. Please consult the help of the ft_mv_performance function for other available metrics.

We may also plot the parameters of the used classifier as if it were neuroimaging data. For each n-th condition we will obtain a stat.modeln field, containing the classification parameters for that condition. The interpretation of the parameters depends on the used classifier. By default, the employed classifier is a Support Vector Machine (SVM).

cfg             = [];
cfg.interplimits = 'electrodes'; 
cfg.interpolation = 'linear';
cfg.layout      = 'CTF275.lay';
cfg.xlim        = [2.0 2.5];
cfg.zparam      = 'model1';
cfg.comments    = '';
cfg.colorbar    = 'yes';
ft_topoplotER(cfg,stat);

Note that the plot is hard to interpret. The fact that contributions extend beyond the selected channels is due to interpolation artifacts. If we look at individual features using imagesc(stat.model1) then it will be found that all features are used due to the way classifiers operate. One way to solve this is to use dimensionality reduction or feature selection. We will see examples later in this tutorial.

Exercise 1

* Explain which information the contingency matrix gives you, which the accuracy does not.

* Redo the above analysis with a latency of [0 0.5]. Explain what you believe to be the optimal latency with which to analyse this data.

* Suppose you use a dataset consisting of randomly generated data. What do you expect when you test classifier performance using the same data? And what do you expect if you use a second randomly generated dataset to test the classifier? Use the concepts of overfitting and generalization in your explanation.

* Suppose you try multiple different classification procedures and find at some point that you reach a classification performance that is significantly better than chance at p=0.05. Should you trust this result? Why (not)?

Sensor level classification in the frequency domain

We now try to classify the same data in the frequency domain. Therefore, we need to perform a frequency analysis. Let's focus on the alpha band at the end of the attention period.

cfg              = [];
cfg.output       = 'pow';
cfg.method       = 'mtmconvol';
cfg.taper        = 'hanning';
cfg.foi          = 8:2:14;
cfg.t_ftimwin    = ones(length(cfg.foi),1).*0.5;
cfg.channel      = {'MLO' 'MRO'};
cfg.toi          = 2.0:0.1:2.5;
cfg.keeptrials   = 'yes'; % classifiers operate on individual trials
tfrleft           = ft_freqanalysis(cfg, left);
tfrright          = ft_freqanalysis(cfg, right);

Now we call freqstatistics with crossvalidate as our method.

cfg         = [];
cfg.layout  = 'CTF275.lay';
cfg.method  = 'crossvalidate';
cfg.design  = [ones(size(tfrleft.powspctrm,1),1); 2*ones(size(tfrright.powspctrm,1),1)]';
stat = ft_freqstatistics(cfg,tfrleft,tfrright);

We can compare classification performance with the previous results

stat.performance
stat.pvalue

Again, we may plot the classifier parameters to obtain a so-called importance map

cfg             = [];
cfg.interplimits = 'electrodes'; 
cfg.interpolation = 'linear';
cfg.layout      = 'CTF275.lay';
cfg.zparam      = 'model1';
cfg.comment     = '';
cfg.colorbar    = 'yes';
ft_topoplotTFR(cfg,stat);

Instead of ten-fold cross-validation, we could also perform a so-called leave-one-out cross-validation. This can be realized by setting 'cfg.nfolds = inf'. However, this will take a lot of time since we need to build as many classifiers as there are trials in our data. You may also use a percentage of the data to train a classifier while the remaining percentage is used to test the classifier. E.g., 'cfg.nfolds=0.80' will use 80 percent of the data for training and the remaining 20 percent of the data for testing.

Exercise 2

* Give your interpretation of the importance map you have plotted.

* Rerun the previous cross-validation with 'cfg.nfolds=0.80'. Explain the difference and motivate why it is important to perform cross-validation instead of just dividing the data into one training and one test set.

Dimensionality reduction and feature selection

One important thing to consider when classifying neuroimaging data is that we wish to reduce as much as possible the number of features used to classify the data. There are two ways to achieve this: either we map the input data to another space with a smaller number of dimensions or we select a number of dimensions from the original space. Going back to our analysis of timelocked data, we could for instance use common spatial patterns (see this paper for an explanation) to map our data to a different space.

cfg         = [];
cfg.layout  = 'CTF275.lay';
cfg.method  = 'crossvalidate';
cfg.design  = [ones(size(tleft.trial,1),1); 2*ones(size(tright.trial,1),1)]';
cfg.mva = {ft_mv_standardizer ft_mv_csp('numchan',38) ft_mv_svm};
cfg.latency = [2.0 2.5];
stat = ft_timelockstatistics(cfg,tleft,tright);
stat.performance

or when analyzing the data in the frequency domain, we could for instance perform a feature selection in the original space

cfg         = [];
cfg.layout  = 'CTF275.lay';
cfg.method  = 'crossvalidate';
cfg.design  = [ones(size(tfrleft.powspctrm,1),1); 2*ones(size(tfrright.powspctrm,1),1)]';
cfg.mva = {ft_mv_standardizer ft_mv_glmnet('lambda',0.1)};

This multivariate analysis standardizes the data and subsequently calls elastic net logistic regression with regularization parameter lambda equal to 0.1. This parameter influences how many features will be selected for classification. The larger the parameter, the less features will be used. For a description of the used algorithm, you may consult the following paper. The elastic net algorithm gives the following result

stat = ft_freqstatistics(cfg,tfrleft,tfrright);
stat.performance

If we inspect stat.model1 then we find that just a small number of features from the total of 912 possible features are used. This is also reflected in the topoplot.

cfg             = [];
cfg.layout      = 'CTF275.lay';
cfg.zparam      = 'model1';
cfg.comment     = '';
cfg.colorbar    = 'yes';
ft_topoplotTFR(cfg,stat);

Exercise 3

Use matlab to compute the number of non-zero elements in stat.model1.

Repeat the above analysis with a different value for lambda. Explain the results.

Lambda is a free parameter in our model. How would you determine the optimal setting for this parameter?

If we use more and more features then classification performance will first go up but eventually starts to degrade. Explain why this may happen.

Suppose we wish to select the optimal feature subset by testing all possible subsets. How many subsets do we need to test when we have n features in total?

Multitask learning

Typically, different classifiers are constructed for individual datasets. However, sometimes it may be beneficial to combine multiple datasets. For example, in order to construct classifiers that in some way are related to each other. This is of importance when we wish to move from the individual to the group level in our analysis. Another benefit of this approach is that if we have little data to begin with, it can be useful to combine all data in order to obtain classifiers that show improved predictive performance. This is for instance important in the context of brain-computer interfacing, where we wish to construct good classifiers with as little data as possible. The idea of learning classifiers by combining multiple datasets is known as multitask learning.

A naive way to do multitask learning is to pool the trials of multiple datasets and apply a standard classifier. However, this is in general not a very good idea since it ignores the fact that there is real variability between these datasets. Hence, we would like to capture the commonalities as well as the differences between datasets. Here, we discuss one approach. We use a Bayesian logistic regression algorithm (described here) that will try to find the same features in multiple datasets but will fit parameters with respect to these features for individual datasets. This is achieved by coupling features over datasets.

As an example we will split up the trials for our subject into two subsets to artifically create two separate datasets:

earlyleft  = tfrleft; earlyleft.powspctrm = tfrleft.powspctrm(1:63,:,:,:); 
earlyright = tfrright; earlyright.powspctrm = tfrright.powspctrm(1:63,:,:,:); 

lateleft  = tfrleft; lateleft.powspctrm = tfrleft.powspctrm(64:126,:,:,:); 
lateright = tfrright; lateright.powspctrm = tfrright.powspctrm(64:126,:,:,:); 

In the design matrix we specify the conditions but now we also specify the dataset index in the cfg.dataset field. Furthermore, we call freqstatistics by listing the datasets in the sequence implied by the design. The classifier ft_mv_blogreg will now perform the multitask learning for us. We obtain the following result:

cfg         = [];
cfg.layout  = 'CTF275.lay';
cfg.method  = 'crossvalidate';
design1     = [ones(size(earlyleft.powspctrm,1),1); 2*ones(size(earlyright.powspctrm,1),1)];
design2     = [ones(size(lateleft.powspctrm,1),1); 2*ones(size(lateright.powspctrm,1),1)];
cfg.design  = [design1; design2]';
cfg.dataset = [ones(length(design1),1); 2*ones(length(design2),1)];
cfg.mva = {ft_mv_standardizer() ft_mv_blogreg};

stat = ft_freqstatistics(cfg,earlyleft,earlyright,lateleft,lateright);

stat.performance

If we plot the parameters for both datasets then we can observe the effect that the same features are selected whereas the estimated parameter values are different. Note that modelx_y stands for the x-th model of the y-th dataset.

cfg             = [];
cfg.layout      = 'CTF275.lay';
cfg.zparam      = 'model1';
cfg.comment     = '';

stat2 = stat;
stat.model1  = stat.model1_1;
stat2.model1 = stat2.model1_2;

subplot(1,2,1);
ft_topoplotTFR(cfg,stat);
subplot(1,2,2);
ft_topoplotTFR(cfg,stat2);

The figure shows the parameter plots for the two datasets. Note that the same features are found whereas the obtained parameter values are different.

Exercise 4

BCI data is often collected over multiple sessions. Explain why multitask learning may help to get better results as compared with just combining the multisession data into one big dataset.

Conclusion

In this tutorial we have touched on a number of important issues in the classification of neuroimaging data. However, we have barely scratched the surface of this field since there are many more possibilities to explore. First, many different procedures can be devised that use different forms of preprocessing, feature selection and/or prediction. E.g., we may want to deal with continuous instead of discrete outputs (regression versus classification), we may want to perform a Bayesian analysis which also gives error bars on the predictions, or we may want to use timeseries analysis in order to predict changes in ongoing activity.

In order to use some of the more advanced methods it is required to call lower level functions in FieldTrip's multivariate module directly, as described here. In order to construct experimental designs that make use of multivariate analysis or to build BCI or neurofeedback applications we have developed the realtime module.

—-

This tutorial was last tested with version 20110409 of FieldTrip by Jan-Mathijs, using Matlab 2009b on a 64-bit Linux platform.

tutorial/multivariateanalysis.txt · Last modified: 2011/12/15 16:45 by robert

You are here: starttutorialmultivariateanalysis
This DokuWiki features an Anymorphic Webdesign theme, customised by Eelke Spaak and Stephen Whitmarsh.
www.chimeric.de Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0