How to get started with peer distributed computing at the Donders Centre?

The best way to get started is by just giving it a try! At the DCCN we have a number of peerslaves running by default on the mentat cluster. These slaves can be used by everyone with a mentat account.

Log in to mentat and start MATLAB. Then add the peer toolbox to your path

addpath /home/common/matlab/fieldtrip/peer
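
To confirm that the toolbox was picked up, a quick check along the following lines may help. This is only a minimal sketch: it uses the standard MATLAB functions exist and which, and assumes the toolbox location given above.

```matlab
% add the peer toolbox to the path (location as given above)
addpath /home/common/matlab/fieldtrip/peer

% exist returns 2 when the name resolves to a file on the MATLAB path
if exist('peercellfun', 'file') == 2
  fprintf('peercellfun found at %s\n', which('peercellfun'));
else
  warning('peercellfun is not on the path, check the addpath location');
end
```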

Look at the list of peers

>> peerlist
peer: init
peer: spawning tcpserver thread
peer: spawning announce thread
peer: spawning discover thread
peer: spawning expire thread
there are  36 peers running in total (9 hosts, 4 users)
there are   1 peers running on  1 hosts as master
there are  30 peers running on  6 hosts as idle slave with 92.9 GB memory available
there are   0 peers running on  0 hosts as busy slave with 0 bytes memory required
there are   5 peers running on  4 hosts as zombie
idle slave  at public@mentat005.fcdonders.nl:1701, group = unknown, memavail = 8.0 GB, hostid = 2283800828
idle slave  at public@mentat005.fcdonders.nl:1702, group = unknown, memavail = 8.0 GB, hostid = 1668688972
idle slave  at public@mentat005.fcdonders.nl:1703, group = unknown, memavail = 4.0 GB, hostid = 1720787997
idle slave  at public@mentat005.fcdonders.nl:1704, group = unknown, memavail = 4.0 GB, hostid = 1114567742
idle slave  at public@mentat005.fcdonders.nl:1705, group = unknown, memavail = 8.0 GB, hostid = 2308744299
idle slave  at public@mentat005.fcdonders.nl:1706, group = unknown, memavail = 2.0 GB, hostid = 962355142
idle slave  at public@mentat005.fcdonders.nl:1707, group = unknown, memavail = 4.0 GB, hostid = 2482260930
idle slave  at public@mentat005.fcdonders.nl:1708, group = unknown, memavail = 2.0 GB, hostid = 2693960735
idle slave  at public@mentat005.fcdonders.nl:1709, group = unknown, memavail = 2.0 GB, hostid = 1168376778
idle slave  at public@mentat005.fcdonders.nl:1710, group = unknown, memavail = 2.0 GB, hostid = 1752673633
idle slave  at public@mentat005.fcdonders.nl:1711, group = unknown, memavail = 2.0 GB, hostid = 1473668088
idle slave  at public@mentat005.fcdonders.nl:1712, group = unknown, memavail = 2.0 GB, hostid = 1328645800
idle slave  at public@mentat241.dccn.nl:1701, group = unknown, memavail = 2.3 GB, hostid = 1359877943
idle slave  at public@mentat241.dccn.nl:1702, group = unknown, memavail = 2.4 GB, hostid = 1192383462
idle slave  at public@mentat241.dccn.nl:1703, group = unknown, memavail = 2.3 GB, hostid = 777082088
idle slave  at public@mentat241.dccn.nl:1704, group = unknown, memavail = 2.4 GB, hostid = 1994378816
idle slave  at public@mentat242.dccn.nl:1704, group = unknown, memavail = 2.8 GB, hostid = 2622928418
idle slave  at public@mentat242.dccn.nl:1705, group = unknown, memavail = 100.0 MB, hostid = 2767823420
idle slave  at public@mentat243.dccn.nl:1705, group = unknown, memavail = 2.8 GB, hostid = 4171164277
idle slave  at public@mentat243.dccn.nl:1706, group = unknown, memavail = 2.8 GB, hostid = 813068931
idle slave  at public@mentat243.dccn.nl:1707, group = unknown, memavail = 2.8 GB, hostid = 1428041752
idle slave  at public@mentat243.dccn.nl:1708, group = unknown, memavail = 2.8 GB, hostid = 996939241
idle slave  at public@mentat244.dccn.nl:1701, group = unknown, memavail = 3.4 GB, hostid = 3640941140
idle slave  at public@mentat244.dccn.nl:1706, group = unknown, memavail = 3.5 GB, hostid = 3169256307
idle slave  at public@mentat244.dccn.nl:1707, group = unknown, memavail = 3.2 GB, hostid = 1773755914
idle slave  at public@mentat245.dccn.nl:1701, group = unknown, memavail = 1.8 GB, hostid = 833242056
idle slave  at public@mentat245.dccn.nl:1702, group = unknown, memavail = 1.8 GB, hostid = 680637367
idle slave  at public@mentat245.dccn.nl:1703, group = unknown, memavail = 1.8 GB, hostid = 4216756569
idle slave  at public@mentat245.dccn.nl:1704, group = unknown, memavail = 1.8 GB, hostid = 3783967794
idle slave  at stavpel@mentat242.dccn.nl:1701, group = unknown, memavail = 4.0 GB, hostid = 2184992123
master      at stavpel@mentat236.dccn.nl:1701, group = unknown, memavail = 4.0 GB, hostid = 1430939496
zombie      at marzwi@mentat235.dccn.nl:1701, group = unknown, memavail = 4.0 GB, hostid = 4093926114
zombie      at public@mentat242.dccn.nl:1702, group = unknown, memavail = 1.5 GB, hostid = 3549471657
zombie      at public@mentat242.dccn.nl:1703, group = unknown, memavail = 3.6 GB, hostid = 2143043125
zombie      at public@mentat244.dccn.nl:1708, group = unknown, memavail = 3.2 GB, hostid = 2112539164
zombie      at roboos@mentat001.fcdonders.nl:1701, group = unknown, memavail = 4.0 GB, hostid = 3109138994

The slaves allow you to execute a job. Try the following

peercellfun(@pause, {10, 10, 10, 10, 10, 10})

and compare it to

cellfun(@pause, {10, 10, 10, 10, 10, 10})

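Both calls execute six pauses of 10 seconds each, i.e. in total 60 seconds of time that a computer will be “busy”. With cellfun the pauses run one after the other on your own computer, whereas peercellfun distributes them over the idle slaves, so that they can run at the same time.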
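
The difference between the two calls can be made visible by timing them. The following is a minimal sketch using tic and toc; the variable names args, tlocal and tpeer are only illustrative, and the time taken by peercellfun will depend on how many slaves are idle at that moment.

```matlab
% six jobs that each pause for 10 seconds
args = repmat({10}, 1, 6);

% local, sequential execution: expected to take about 60 seconds
t = tic;
cellfun(@pause, args);
tlocal = toc(t);

% distributed execution over the idle slaves
t = tic;
peercellfun(@pause, args);
tpeer = toc(t);

fprintf('cellfun: %.1f s, peercellfun: %.1f s\n', tlocal, tpeer);
```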

And now something more interesting

>> peercellfun(@rand, {1, 1, 1, 1})
submitted 1/4, collected 0/4, busy 1
submitted 2/4, collected 0/4, busy 2
submitted 3/4, collected 0/4, busy 3
submitted 4/4, collected 0/4, busy 4
submitted 4/4, collected 2/4, busy 2
submitted 4/4, collected 4/4, busy 0
approximate speedup ratio 0.028728
ans =
    0.8147    0.8147    0.8147    0.8147

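You should notice that computing a single random number on 4 remote computers is not really efficient: the speedup ratio is 0.029, which suggests that the distributed call was roughly 35 times slower than running the four calls locally. For such small jobs the communication overhead dominates. Note also that all four slaves return the same number, which suggests that they all start from the same random number generator state.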

Now try

>> peercellfun(@rand, {1, 2, 3, 4})
submitted 1/4, collected 0/4, busy 1
submitted 2/4, collected 0/4, busy 2
submitted 3/4, collected 0/4, busy 3
submitted 4/4, collected 0/4, busy 4
submitted 4/4, collected 2/4, busy 2
submitted 4/4, collected 3/4, busy 1
submitted 4/4, collected 4/4, busy 0
??? Error using ==> peercellfun at 302
Non-scalar in Uniform output, at index 2, output 1. Set 'UniformOutput' to false.

Since the 4 calls return outputs of different sizes, the results cannot be concatenated into a single array. Instead you should return them as a cell array

>> peercellfun(@rand, {1, 2, 3, 4}, 'UniformOutput', false)
submitted 1/4, collected 0/4, busy 1
submitted 2/4, collected 0/4, busy 2
submitted 3/4, collected 0/4, busy 3
submitted 4/4, collected 0/4, busy 4
submitted 4/4, collected 2/4, busy 2
approximate speedup ratio 0.110529
ans = 
    [0.8147]    [2x2 double]    [3x3 double]    [4x4 double]
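
Judging from the error message, peercellfun follows the same 'UniformOutput' convention as the standard cellfun, so the handling of the cell array output can be tried locally first. The sketch below uses cellfun rather than peercellfun and therefore does not need any peers; the variable c is only illustrative.

```matlab
% local equivalent of the call above; the result is a 1x4 cell array
c = cellfun(@rand, {1, 2, 3, 4}, 'UniformOutput', false);

% each cell holds a square matrix of a different size
for i = 1:numel(c)
  fprintf('output %d is %dx%d\n', i, size(c{i}, 1), size(c{i}, 2));
end
```

The individual results are accessed with curly braces, e.g. c{3} for the 3x3 matrix.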

Now you can scale up, e.g. with the following

n = 100;
peercellfun(@pause, repmat({10}, 1, n));

to create some long-duration jobs, or

n = 100;
x = randn(500,500);
peercellfun(@inv, repmat({x}, 1, n), 'UniformOutput', false);

to create some CPU-intensive jobs.

An interesting example is the following. Create a MATLAB function file peertest.m in your own directory with the following content

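function peertest(x, y)
% PEERTEST allocates approximately x MB of memory and keeps the CPU busy for y seconds

% use the memory: x MB worth of doubles, which take 8 bytes each
tmp1 = zeros(x*1024*1024/8,1);

% create the cpu load by repeatedly inverting a random matrix for y seconds
stopwatch = tic;
while toc(stopwatch)<y
  tmp2 = inv(randn(100));
end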

and evaluate

n = 50;
peercellfun(@peertest, repmat({1000}, 1, n), repmat({30}, 1, n));

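This should result in 50 evaluations, each internally using about 1 GB of memory and each creating 30 seconds of high-CPU computations. Compare this to a cellfun evaluation with the same arguments, which runs the 50 evaluations one after the other on your own computer.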
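
The memory figure follows from the allocation in peertest: x*1024*1024/8 doubles of 8 bytes each occupy x MB. A minimal sketch to verify this locally with the standard whos function is given below; the variable names nelem and info are only illustrative, and the example itself allocates roughly 1 GB on the machine where you run it.

```matlab
x = 1000;                  % requested memory in MB, as in the call above
nelem = x*1024*1024/8;     % number of doubles that peertest allocates
tmp1 = zeros(nelem, 1);    % the same allocation as inside peertest

% whos reports the size in bytes; convert to MB for display
info = whos('tmp1');
fprintf('%d elements, %.1f MB\n', nelem, info.bytes/1024^2);

clear tmp1                 % release the memory again
```

With x = 1000 this should report 1000.0 MB, i.e. slightly less than 1 GB when counted in units of 1024 MB.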

If you start another MATLAB session on the same or another machine, you can do

while(1); peerlist; pause(1); end

to monitor the behaviour of all the peers.
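
The loop above runs until you interrupt it with Ctrl-C. If you prefer the monitoring to stop by itself, a variant along the following lines could be used. This is a sketch only; the variable duration is an illustrative name and 60 seconds is an arbitrary choice.

```matlab
duration = 60;             % how long to keep monitoring, in seconds
t = tic;
while toc(t) < duration
  peerlist;                % print the current list of peers
  pause(1);                % wait one second before refreshing
end
```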

faq/how_to_get_started_with_peer_distributed_computing_at_the_donders_centre.txt · Last modified: 2010/09/02 16:23 by robert
