Getting Started - Hello World!, part 1

The RMS software is packaged with three executable scripts: hello1, hello2 and hello3. Running them tests your access to the software and its ability to run across your cluster. The source code for these scripts is shown and explained in Getting Started - Hello World!, part 2.

The first script to run is hello1, which tests your access to RMS. This example uses “a b c d” as the command-line arguments, but the script accepts any text as arguments. When you run hello1, the output should look similar to this (“$” is the command-line prompt, and the text after it is the command that was run):

$ hello1 a b c d
Hello a!
Hello b!
Hello c!
Hello d!
Said hello to all of the arguments!

The hello1 script runs only on the current computer, not across the cluster. The second script, hello2, generates equivalent output, but it executes across the cluster. Running it should generate the following output:

$ hello2 a b c d
Hello a, from the cluster!
Hello b, from the cluster!
Hello c, from the cluster!
Hello d, from the cluster!
Said hello to all of the arguments from the cluster!
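If you want to repeat this check automatically (for example, after changing your cluster configuration), the comparison can be scripted. The Python sketch below is not part of RMS: it assumes hello1 and hello2 are on your PATH, exit with status 0 on success, and print exactly the lines shown above, in that order. The helper names expected_hello_lines and check_hello are hypothetical.

```python
import subprocess


def expected_hello_lines(args, from_cluster=False):
    """Lines hello1 (or hello2, if from_cluster is True) should print,
    based on the sample output in this guide."""
    where = ", from the cluster" if from_cluster else ""
    lines = [f"Hello {arg}{where}!" for arg in args]
    tail = " from the cluster" if from_cluster else ""
    lines.append(f"Said hello to all of the arguments{tail}!")
    return lines


def check_hello(script, args, from_cluster=False):
    """Run script with args; return True if it exits with status 0 and
    prints the expected lines on stdout, in order.

    Assumption: the script is on PATH and writes its greetings in argument order.
    """
    try:
        result = subprocess.run([script, *args], capture_output=True, text=True)
    except FileNotFoundError:
        # The script is not on PATH, so access to RMS is not set up.
        return False
    expected = expected_hello_lines(args, from_cluster)
    return result.returncode == 0 and result.stdout.splitlines() == expected
```

For example, check_hello("hello1", ["a", "b", "c", "d"]) tests local access, and check_hello("hello2", ["a", "b", "c", "d"], from_cluster=True) tests the cluster path.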

Be aware that hello2 will run slower (possibly much slower) than hello1. RMS first queues a job for a compute node on the cluster, then sends the commands to the RMS program running on that node for execution. The time it takes to produce the output therefore depends on how long the cluster takes to allocate a compute node, plus the time needed to send the commands to that node and run them. To gauge how long that might be, start an interactive job on the cluster and time how long it takes to get a command prompt.
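Another rough way to see this overhead is to time hello1 and hello2 with the same arguments; the difference approximates the cost of allocating a compute node and dispatching the commands to it. The Python sketch below uses a hypothetical helper, time_command, and assumes both scripts are on your PATH. A single run can vary a lot with cluster load, so treat the result as a rough gauge only.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of strings) to completion and return elapsed wall-clock seconds.

    Output is captured so it does not clutter the terminal; a non-zero exit
    status raises subprocess.CalledProcessError.
    """
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    subprocess.run(cmd, check=True, capture_output=True)
    return time.monotonic() - start
```

For example, time_command(["hello2", "a", "b", "c", "d"]) minus time_command(["hello1", "a", "b", "c", "d"]) gives an approximate figure for the cluster overhead.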

If hello2 takes much longer than that to execute, use qstat (or your cluster’s equivalent) to check whether a compute job is queued or running. RMS names its jobs “worker1”, “worker2”, …, one for each compute node it allocates. If the job is still queued, RMS is waiting for the cluster to run the remote worker program. If no job is queued or running, run “cat RMS_hello2*/worker1.pbs.err” to see whether the RMS worker reported an error before it was able to contact the head RMS process.
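When RMS allocates more than one compute node, checking each worker’s error file by hand gets tedious. The Python sketch below collects every non-empty error file in one pass. It assumes the other workers’ files follow the same naming as worker1.pbs.err (worker2.pbs.err and so on) inside the RMS_hello2* directory; that naming beyond worker1 is an assumption, and the .pbs.err suffix may differ on non-PBS clusters. report_worker_errors is a hypothetical helper.

```python
import glob
import os


def report_worker_errors(pattern="RMS_hello2*/worker*.pbs.err"):
    """Return {path: contents} for each non-empty file matching pattern.

    The default pattern assumes every worker writes <name>.pbs.err, as
    worker1 does; adjust it for other schedulers or run names.
    """
    errors = {}
    for path in sorted(glob.glob(pattern)):
        # An empty error file means that worker wrote nothing to it.
        if os.path.getsize(path) > 0:
            with open(path) as f:
                errors[path] = f.read()
    return errors


if __name__ == "__main__":
    found = report_worker_errors()
    if not found:
        print("No worker error output found.")
    for path, text in found.items():
        print(f"== {path} ==")
        print(text, end="")
```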

If hello2 reports an error, that indicates the cluster configuration is not right; you’ll need to configure access to the cluster properly so that RMS can run jobs on it.

The third script, hello3, does the same computation across the cluster as hello2, but displays its output in the form you will typically see when you run RMS with your own scripts. Running hello3 should display output similar to the following. On screen, the progress lines between “Pipeline execution starting” and “Pipeline execution completed” overwrite one another as the commands run across the cluster; they are listed separately here to show the sequence of progress messages.

$ hello3 a b c d
Commands:  5 commands to be executed.
[Wed Jan 20, 11:46am]:Pipeline execution starting.
[Wed Jan 20, 11:46am]:     hello[4]: 4q,0r,0f,0c
[Wed Jan 20, 11:46am]:     hello[4]: 3q,1r,0f,0c
[Wed Jan 20, 11:46am]:     hello[4]: 2q,2r,0f,0c
[Wed Jan 20, 11:46am]:     hello[4]: 1q,3r,0f,0c
[Wed Jan 20, 11:46am]:     hello[4]: 0q,4r,0f,0c
[Wed Jan 20, 11:46am]:     hello[4]: 0q,3r,0f,1c
[Wed Jan 20, 11:46am]:     hello[4]: 0q,2r,0f,2c
[Wed Jan 20, 11:46am]:     hello[4]: 0q,1r,0f,3c
[Wed Jan 20, 11:46am]:     helloAll[1]: 1q,0r,0f,0c
[Wed Jan 20, 11:46am]:     helloAll[1]: 0q,1r,0f,0c
[Wed Jan 20, 11:46am]:Pipeline execution completed.

In this output, “hello” and “helloAll” are the names of the two steps in the hello3 RMS script. The number in brackets is how many commands that step will run (4 hello commands and 1 helloAll command), and the letters stand for queued (q), running (r), failed (f) and completed (c). Each progress line reports how many of the step’s commands are in each of these states, giving a real-time view of how the computation is executing across the cluster and whether it has made any progress recently.
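If you capture these progress lines (for example, to log how long each step spends queued), their format is regular enough to parse. A minimal Python sketch, assuming step names consist of letters, digits and underscores, and that the counts always appear as <n>q,<n>r,<n>f,<n>c exactly as in the sample above; StepProgress and parse_progress are hypothetical names, not part of RMS.

```python
import re
from typing import NamedTuple, Optional


class StepProgress(NamedTuple):
    """Counts from one progress line, e.g. "hello[4]: 3q,1r,0f,0c"."""
    step: str       # step name, e.g. "hello" or "helloAll"
    total: int      # number in brackets: commands in this step
    queued: int     # q
    running: int    # r
    failed: int     # f
    completed: int  # c


# Assumes step names are word characters (letters, digits, underscore).
PROGRESS_RE = re.compile(
    r"(?P<step>\w+)\[(?P<total>\d+)\]:\s*"
    r"(?P<q>\d+)q,(?P<r>\d+)r,(?P<f>\d+)f,(?P<c>\d+)c"
)


def parse_progress(line: str) -> Optional[StepProgress]:
    """Return the step counts in line, or None if it is not a progress line."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return StepProgress(
        match["step"],
        int(match["total"]),
        int(match["q"]),
        int(match["r"]),
        int(match["f"]),
        int(match["c"]),
    )
```

In the sample output above, the four counts on every line add up to the bracketed total, so comparing queued + running + failed + completed against total is a quick sanity check on captured lines.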

Whenever RMS runs in this mode (the default execution mode), it writes two files, in this case RMS_hello3.stdout and RMS_hello3.stderr. These contain the standard output and standard error text from the script’s commands, ordered as if the commands had been executed sequentially, regardless of the order in which they actually ran across the cluster. So if you run “cat RMS_hello3.stdout”, you will see the same output that hello2 generated.
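One way to get this sequential ordering while commands finish in any order is to number each command by its position in the sequential run, buffer outputs that arrive early, and release each one once all of its predecessors have finished. The Python sketch below illustrates that idea; it is an assumption about how such ordering can work, not a description of RMS’s internals, and ordered_output is a hypothetical name.

```python
def ordered_output(completions):
    """Yield command outputs in sequential order.

    completions: iterable of (index, text) pairs in the order commands
    finished on the cluster, where index is the command's 0-based position
    in sequential order. An output is held back until every command before
    it has finished.
    """
    pending = {}      # outputs that finished ahead of an earlier command
    next_index = 0    # position of the next output to release
    for index, text in completions:
        pending[index] = text
        # Release every buffered output whose predecessors are all done.
        while next_index in pending:
            yield pending.pop(next_index)
            next_index += 1
```

In this sketch, a command that never reports completion holds back all later output, so a real tool needs a policy for failed or lost commands.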