The Parallelized PiBeta Analyzer

This page describes the parallelized version of the PiBeta analyzer and how to use it. The parallelization became necessary because fitting our DSC data is too time consuming for a single computer. By using modern Linux clusters, this problem can be overcome. To keep the use as simple as with the standard PiBeta analyzer, a scheme based on PVM (Parallel Virtual Machine, see the PVM home page) has been introduced. With this scheme, the analyzer takes care of spawning itself on multiple nodes of a Linux cluster, distributing data events and collecting the results.

Installation

The installation requires the newest Midas version 1.8.2, in particular the file mana.o. On the PSI Fast cluster this has already been installed; Bryan Wright should be contacted to install it on the various UVa clusters. In addition, the PVM library version 3.4.2 or later is required.

For the PVM system to work, a few environment variables have to be defined. Here is a typical .cshrc file:

setenv PVM_ROOT /usr/share/pvm3
setenv XPVM_ROOT /usr/share/xpvm
setenv PVM_ARCH `$PVM_ROOT/lib/pvmgetarch`
setenv PATH "${PATH}:.:$PVM_ROOT/lib/LINUX"
It might be that PVM_ROOT is at a different location, such as /usr/local/pvm3.
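
If the login shell is bash rather than csh, the equivalent settings would look like this (only a sketch for ~/.bashrc, assuming the same installation paths):

export PVM_ROOT=/usr/share/pvm3
export XPVM_ROOT=/usr/share/xpvm
export PVM_ARCH=$($PVM_ROOT/lib/pvmgetarch)
export PATH="${PATH}:.:$PVM_ROOT/lib/LINUX"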

The analyzer is compiled as usual, except that it is linked against pmana.o instead of mana.o, plus the libpvm3.a library. Below is an example snippet of the required Makefile; note the pmana.o and libpvm3.a entries:

...
analyzer: $(LIB) $(LIB_DIR)/pmana.o analyzer.o $(MODULES)
  $(FF) $(FFLAGS) -o analyzer $(LIB_DIR)/pmana.o analyzer.o \
  $(MODULES) $(CERNLIB_DIR)/libpacklib.a /usr/share/pvm3/lib/LINUX/libpvm3.a \
  $(LIB) $(LDFLAGS) $(LIBS)
...

When the analyzer compiles correctly, one can proceed with the next step.

PVM configuration

The PVM system needs to be configured in order to obtain access to a number of cluster nodes. This can be done using the PVM command:

[stefan@fast ~]$ pvm
pvm> add node1
add node1
1 successful
                    HOST     DTID
                   node1    80000
pvm> add node2
add node2
1 successful
                    HOST     DTID
                   node2    c0000
pvm> conf
conf
3 hosts, 1 data format
                    HOST     DTID     ARCH   SPEED       DSIG
                    fast    40000    LINUX    1000 0x00408841
                   node1    80000    LINUX    1000 0x00408841
                   node2    c0000    LINUX    1000 0x00408841
pvm> quit
quit

Console: exit handler called
pvmd still running.

In the above case, the nodes node1 and node2 have been added to the "virtual" cluster, which can be inspected using the conf command.
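
Instead of adding the nodes by hand in each session, the pvm console can also be started with a host file that lists the nodes to be added automatically. The following is only a sketch; the file name hostfile and the node names are examples:

# hostfile: one cluster node per line, the local host is added automatically
node1
node2

[stefan@fast ~]$ pvm hostfile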

I have not yet worked out a scheme for who should use how many nodes. If nodes are used by two or more people in parallel, the performance drops. So it would be better if nodes were used exclusively by individuals, e.g. node1-5 by person A, node6-10 by person B and so on.

The optimal number of nodes for an analysis task can vary. I made some tests with and without DSC fitting. The following figure shows the computation time vs. number of nodes for 200k events (a single run), without DSC fitting (/analyzer/module switches/DSC = n).

As can be seen, the time for one run starts at 360 sec. ("0" nodes means the standard, non-parallelized analyzer, and "1" means that the master analyzer which distributes the events runs on the central node and one client node does the analysis). When adding more nodes, the time drops to about 80 sec., which is already reached with five nodes. Adding more nodes does not help, because the system is then limited by the hard disk and network bandwidth.

The picture is different when fitting is used, since this is a much more CPU-intensive process. The following figure shows the computation time for 2k events (1/100 of a run), with DSC fitting:

Since the system here is limited only by CPU power, adding more than five nodes still helps. But even with 20 nodes it takes ~45 min. to analyze one run, which tells me that we have to optimize the fitting algorithm, a task which HP will probably take over.

Once the configuration is defined, one can leave the pvm program using the quit command, which leaves the configuration active.
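
To shut the virtual machine down completely instead, for example to start over with a fresh configuration, the halt command can be used inside the pvm console; it stops the pvmd daemons on all nodes:

[stefan@fast ~]$ pvm
pvm> halt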

User code modifications

In order to run the parallel analyzer correctly, one has to tell the system which events should be analyzed in parallel and which should not. Usually you want the (normal) trigger events parallelized, meaning each event is distributed to only one node. The scaler events, on the other hand, should be sent to all nodes, one copy to each of them. Otherwise the first scaler event would be sent to node1, the second to node2 and so on, which would be problematic if one wants to do a normalization based on scalers.

To tell the system which event to "farm" out, the GET_FARM flag needs to be put into the analyzer request like this:

ANALYZE_REQUEST analyze_request[] = {

  { "Trigger",            /* event name */
    1,                    /* event ID */
    TRIGGER_ALL,          /* trigger mask */
    GET_SOME | GET_FARM,  /* get some events, farmed under PVM */
    "SYSTEM",             /* event buffer */
    TRUE,                 /* enabled */
    "", "",
    NULL,                 /* analyzer routine */
    trigger_module,       /* module list */
    trigger_bank_list,    /* bank list */
    10000,                /* RWNT buffer size */
    TRUE,                 /* use tests for this event */
  },
All other events are kept untouched.

Running the analyzer

For the parallel version of the analyzer to work, it has to be started from the directory where it resides, e.g. /home/[yourname]/analyzer. If the data files are in a different directory, the full path has to be given, for example:

analyzer -i /data/myname/run12345.mid -o /data/myname/run12345.rz
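Several runs can be analyzed in sequence with a small shell loop. The following is only a sketch for csh (the shell used in the examples above); the run numbers are examples:

foreach run (12345 12346 12347)
  analyzer -i /data/myname/run$run.mid -o /data/myname/run$run.rz
end
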
When the analyzer is started, it displays a notification about the number of nodes it is using:

[stefan@fast ~/analyzer]$ analyzer -i run36660.mid -o run36660.rz
Parallelizing analyzer on 5 machines
Running analyzer offline. Stop with "!"
Load ODB from run 36660...OK
run36660.mid:20671 events
The following things are slightly different:

Troubleshooting

If the analyzer crashes, it can happen that shared memory is left over not just on one machine but on all nodes. A command has been written which kills all leftover analyzer processes and shared memory. To execute it on one machine, enter:

mcleanup
To run it on all cluster machines, enter:

brsh mcleanup
The brsh command can also be used to distribute other commands to all cluster nodes. Note that all nodes see the same disk directories, which reside on the central node.
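
For example, before claiming nodes for an analysis one can check how busy they are. This assumes that brsh simply forwards its argument to every node, as in the mcleanup example above:

brsh uptime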


S. Ritt, 17 Nov 2000