The following is intended to provide guidelines and suggestions how to set up a chain of analysis steps that makes most efficient use of your (and your computer’s) time. Much effort can be spared by some foresight and by using a structure that allows you to retrace your steps – something you will need to do, sooner or later. In addition, Fieldtrip has a certain philosophy of use. Especially for a beginner this might be implicit and at times confusing. By suggesting the following way to set up an analysis pipeline we hope to help you along your way. Note, though, that although made in collaboration with others, the following is only the first try in the hope it will be improved upon further by FieldTrip community.
The analysis of an experiment typically involves a lot of repetition as similar analysis steps are taken for every condition and for every subject. Also, the same steps are often repeated with only slightly different settings (e.g. filter, timing). Because of this we should program our own functions around the FieldTrip functions. Fieldtrip functions are not intended to be just typed into Matlab’s command window. If you do, you are guaranteed to lose record of preceding steps, repeat yourself unnecessary, or unknowingly change settings between subjects or conditions. Another no-no is the practice of collecting all your steps in one large m-file and copy-pasting parts in the command window. Besides becoming easily cluttered with previous tries, different filter settings, etc., it does not create a clear continuity between steps, and most importantly, does not permit batching. Batching is the ultimate aim of any analysis pipeline. It means that in the end most of your analysis steps can be repeated over all subjects and/or conditions with a single command.
As stated before, by making our own function around FieldTrip functions we can in a later stage easily repeat them, e.g. over multiple subjects. However, every subject or condition can have different filenames, difference variables, different filter-settings, different trials that have to be rejected, etc. A good idea is to write all your subject-specific details in a separate m-file. You can choose to have one m-file per subject, or one in which you combine all subjects. In the current example we will use the first option:
example file: Subject01.m
subjectdata.subjectdir = 'Subject01-Marloes_Huiskens'; subjectdata.mfiledir = '/home/electromag/stewhi/Scripts'; subjectdata.datadir = 'mtw05a_1200hz_20090819_04_600Hz.ds'; subjectdata.subjectnr = '01'; subjectdata.MRI = 'huiskens_m'; % subject made a mistake on the first trial
Save this as Subject01.m in a personal folder that you will need to add to the Matlab path. Using the command line you can retrieve this personal data by calling “Subject01” or from any script by using “eval(‘Subject01’)”. This will return the structure subjectdata, containing all the fields. We use this structure as input for our own functions, giving us a flexible way of combining generic functions and subject-specific settings. In addition, you could use this file to add further comments such as ”% subject made a mistake on the first trial”.
After opening a new m-file write something in the style of:
function output = MyOwnFunction(input) begin output = sqrt(input); end
Make sure the filename is the same as the functionname (e.g. MyOwnFunction) and save it in your private folder. Now you can, from within any script or from the command line, use your function. E.g. “MyOwnFunction(4)” will give you the answer “2”. To put the answer in a variable for future use you need to call something like “test = MyOwnFunction(4)”. This is the way most Fieldtrip functions work: you provide data and configuration settings as input and the function will return data into the variable(s) you specified. Probably Fieldtrip will in the future also save the returned data to disk, but for now we need to do that ourselves – using our own functions. In addition, by using subject m-files as described above, we can also save stuff there. For instance which time-points to reject (after visual inspection):
subjectdata.visualartifacts = [160611,162906; 473717,492076; 604850,606076; 702196,703615; 736261,738205; 850361,852159;887956,895200; 959974,972785;1096344,1099772];
This will save us a lot of disk space since we do not have to save all in-between steps (e.g. before artifact rejection) while making it easy to retrace our steps to any previous time in the analysis pipeline.
In the end we’ll end up with a collection of several functions, either depending on the output of previous functions (e.g. preprocessing or artifact rejection) while others could in principle be called in parallel (e.g. averaging per condition or per subject). This could result in an analysis pipeline such as this (simplified) one:
This will allow us to automate most of the steps that do not require manual labor (in this case that is only when we need to do a visual inspection of the data to reject artifacts). This is called batching. Large dataset will often require quite some processing time and it will therefore often be the case that a batch will be run overnight. The worst that can happen is that the next morning you’ll see some red lines in your Matlab command window just because of a small mistake in one of the first subjects. Therefore, you might want to use the try-catch option in Matlab. Whenever something goes wrong between the try and catch it will jump to the catch after which it will just continue. E.g.:
for i = 1:number_of_subjects
try
my_preprocessing_function(i)
catch
disp([‘Something was wrong with Subject’ int2str(i) ‘! Continuing with next in line’]);
end
end
Neurophysiological data can become quite large with the result that disk space, RAM and processing time can become compromised. Also, how to program memory-efficient in Matlab is also not always obvious. Some common issues will now be discussed but it is highly advised to read the following links sooner or later:
http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_prog/brh72ex-25.html http://www.mathworks.com/support/tech-notes/1100/1106.html
First off, consider if you really need to save all intermediate steps in your analysis pipeline. For instance, it might not take that much time to calculate your planar gradient and it will save you a lot of disk space it you only have to write your axial data to disk.
When using MEG data you might very well be able to downsample your data. This will save disk space and will speed up all subsequent processing steps. You can then backup your original data on external disks or DVD’s for long-term storage.
Do make sure you save the important parameters (e.g. rejected trials) so you can always rerun your script.
Within a script or function make sure you clear large variables that you don’t need anymore using the clear statement. Note that Matlab’s memory use might not be intuitive. For instance, reloading a large dataset into the same variable will result in matlab allocating twice the memory you need:
Using fieldtrip within your own functions will make it easier to stick to generic variable- and file names, and to organize your data in separate folders.
Only import into MATLAB as much of a large data set as you need for the problem you are trying to solve. Many users are tempted to try and load the entire file first, and then process it with MATLAB. This is not always necessary. Use the whos function with the -file option to preview the file. This command displays each array in the MAT-file that you specify and the number of bytes in the array:
whos -file session1.mat Name Size Bytes Class Attributes S2 1x1 723 struct x 100x200 72 double sparse Mat4 4x20 640 double A 3151872x1 3151872 uint8 Seq 1x912211 912211 int8
If there are large arrays in the MAT-file that you do not need for your current task, you can selectively import only those variables that you want using load, for instance:
seq = load(‘session1.mat’,’Seq’).
Avoid creating large temporary variables, and also make it a practice to clear those temporary variables you do use when they are no longer needed. For example, when you create a large array of zeros, instead of saving to a temporary variable A, and then converting A to a single:
A = zeros(1e6,1); As = single(A);
use just the one command to do both operations:
A = zeros(1e6,1,'single');
Using the repmat function, array preallocation and for loops are other ways to work on nondouble data without requiring temporary storage in memory.
When working with large data sets, be aware that Matlab makes a temporary copy of an input variable if the called function modifies its value. This temporarily doubles the memory required to store the array. One way to use less memory in this situation is to use nested functions. A nested function shares the workspace of all outer functions, giving the nested function access to data outside of its usual scope. In the example shown here, nested function setrowval has direct access to the workspace of the outer function myfun, making it unnecessary to pass a copy of the variable in the function call. When setrowval modifies the value of A, it modifies it in the workspace of the calling function. There is no need to use additional memory to hold a separate array for the function being called, and there also is no need to return the modified value of A:
function myfun
A = magic(500);
function setrowval(row, value)
A(row,:) = value;
end
setrowval(400, 0);
disp('The new value of A(399:401,1:10) is')
A(399:401,1:10)
end
Matlab provides you with different sizes of data classes, such as double and uint8, so you do not need to use large classes to store your smaller segments of data. For example, it takes 7,000 KB less memory to store 1,000 small unsigned integer values using the uint8 class than it does with double.
In the course of a MATLAB session, memory can become fragmented due to dynamic memory allocation and deallocation. For and while loops that incrementally increase the size of a data structure each time through the loop can add to this fragmentation as they have to repeatedly find and allocate larger blocks of memory to store the data. When memory is fragmented, there may be plenty of free space, but not enough contiguous memory to store a new large variable. To make more efficient use of your memory, preallocate a block of memory large enough to hold the matrix at its final size before entering the loop. When you preallocate memory for an array, MATLAB reserves sufficient contiguous space for the entire full-size array at the beginning of the computation. Once you have this space, you can add elements to the array without having to continually allocate new space for it in memory.
The following code creates a scalar variable x, and then gradually increases the size of x in a for loop instead of preallocating the required amount of memory at the start:
x = 0; for k = 2:1000 x(k) = x(k-1) + 5; end
Change the first line to preallocate a 1-by-1000 block of memory for x initialized to zero. This time there is no need to repeatedly reallocate memory and move data as more values are assigned to x in the loop:
x = zeros(1, 1000); for k = 2:1000 x(k) = x(k-1) + 5; end
For more information on preallocation, see http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_prog/f8-784135.html#f8-793781
When you are working with a very large data set repeatedly or interactively, clear the old variable first to make space for the new variable. Otherwise, MATLAB requires temporary storage of equal size before overriding the variable. For example,
a = rand(100e6,1) % 800 MB array a = rand(100e6,1) % New 800 MB array ??? Error using ==> rand Out of memory. Type HELP MEMORY for your options. clear a a = rand(100e6,1) % New 800 MB array
Now we covered the basics it’s time for some specific examples. The following function will load the data as specified in Subject01.m, uses the databrowser for visual inspection of artifacts, rejects those trials containing artifacts and then saves the data in a separate folder as “01_preproc_dataM.mat”. You can simply call it by “do_preprocess_MM(‘Subject01’);”
function do_preproces_MM(Subjectm)
cfg = [];
if nargin == 0
disp('Not enough input arguments');
return;
end
eval(Subjectm);
outputdir = 'AnalysisM';
%%% define trials
cfg.dataset = [subjectdata.subjectdir filesep subjectdata.datadir];
cfg.trialdef.eventtype = 'frontpanel trigger';
cfg.trialdef.prestim = 1.5;
cfg.trialdef.poststim = 1.5;
cfg.continuous = 'no';
cfg.lpfilter = 'no';
cfg.continuous = 'yes';
cfg.trialfun = 'motormirror_trialfun'; % located in \Scripts
cfg.channel = 'MEG';
cfg.layout = 'EEG1020.lay';
cfg = ft_definetrial(cfg);
%%% if there are visual artifacts already in subject m-file use those. They will show up in databrowser
try
cfg.artfctdef.eog.artifact = subjectdata.visualartifacts;
catch
end
%%% visual detection of jumps etc
cfg.continuous = 'yes';
cfg.blocksize = 20;
cfg.eventfile = [];
cfg.viewmode = 'butterfly';
cfg = ft_databrowser(cfg);
%%% enter visually detected artifacts in subject m-file;
fid = fopen([subjectdata.mfiledir filesep Subjectm '.m'],'At');
fprintf(fid,'\n%s\n',['%%% Entered @ ' datestr(now)]);
fprintf(fid,'%s',['subjectdata.visualartifacts = [ ' ]);
if isempty(cfg.artfctdef.visual.artifact) == 0
for i = 1 : size(cfg.artfctdef.visual.artifact,1)
fprintf(fid,'%u%s%u%s',cfg.artfctdef.visual.artifact(i,1),' ',cfg.artfctdef.visual.artifact(i,2),';');
end
end
fprintf(fid,'%s\n',[ ' ]; ']);
fclose all;
%%% reject artifacts
cfg.artfctdef.reject = 'complete';
cfg = ft_rejectartifact(cfg);
%%% make directory, if needed, to save all analysis data
if exist(outputdir) == 0
mkdir(outputdir)
end
%%% Preprocess and SAVE
dataM = ft_preprocessing(cfg);
save([outputdir filesep subjectdata.subjectnr '_preproc_dataM'],'dataM','-V7.3')
clear all;