Batch Systems – Guardians of the Supercomputer

In most cases the nodes of your supercomputer will not be sufficient to run all the calculations of all users at the same time. Thus, someone has to schedule the calculations onto the nodes such that all users get a fair share of computation time and the supercomputer is used as efficiently as possible.

If you share your supercomputer with one or two colleagues, you might be able to do this manually by mailing or calling each other. However, for a supercomputer with dozens or even hundreds of users this is a task to be handled by software: the batch system.

Batch systems handle “jobs” or “batches”. Usually a job contains a list of resources it needs and a list of commands which should be executed on these resources. Most batch systems support interactive and non-interactive jobs. Within an interactive job you start your applications, wait for their completion and react to the results. A non-interactive job usually is a script with the commands to be run. The advantage of non-interactive jobs: they can run at night, while you sleep.

Besides managing jobs, the batch system can also show you various status information about your cluster. Most batch systems can show you how many resources are in use and which are currently free. They can also show you when your jobs are expected to start or finish. If you need to run an interactive job, they can reserve the resources for that run for the next day, if you tell them to.

The batch systems most commonly found on clusters today are

  • SLURM
  • PBS / Torque
  • IBM LoadLeveler
  • GridScheduler

Follow the links above for a short introduction to and usage guide for each batch system.

 

The SLURM Batch System

The Slurm Workload Manager is an open source batch system. It is used by more than 50% of the Top500 supercomputers. In contrast to other batch systems, SLURM organizes nodes within partitions. One node can be part of multiple partitions, and the partitions may have different restrictions.

Status of the Supercomputer

With SLURM you can display the current state of your cluster with:

  • sinfo
    • This shows information on the nodes and available partitions
  • squeue
    • This shows information on the currently running and waiting jobs
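On a busy cluster the squeue listing gets long. As a rough sketch, the helper below condenses it to one line per job state. The function name summarize_states is made up for this example; it only reads state names from standard input, so it can be tried without a cluster. On a SLURM system it would be fed by squeue -h -o "%T" (no header, state column only).

```shell
#!/bin/sh
# summarize_states (hypothetical helper): count how often each job state
# appears on standard input, one state name per line.
summarize_states() {
    sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# On a cluster with SLURM one would run:
#   squeue -h -o "%T" | summarize_states
# Here sample input stands in for the squeue output.
printf 'RUNNING\nPENDING\nRUNNING\n' | summarize_states
```

For the sample input this prints one line for PENDING and one for RUNNING, each followed by its count.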

Jobs

Batch systems handle “jobs” or “batches”. Usually a job contains a list of resources it needs and a list of commands which should be executed on these resources. For example, a typical Gromacs simulation needs 16 CPU cores, 32 GB of memory and 2 hours of computation time. The command to start the simulation is “mdrun”.

Starting Interactive Jobs with SLURM

You can now request an interactive session from the batch system. For the SLURM batch system the command would be:

salloc --time=02:00:00 --ntasks=16 --mem=32768

SLURM will now try to allocate the resources you requested. Depending on the utilization of your HPC cluster, this can take anywhere from a few seconds to several days:

salloc: Pending job allocation 17295
salloc: job 17295 queued and waiting for resources

When the resources are available and have been allocated to you, SLURM responds with:

salloc: Granted job allocation 17295

You can now run the Gromacs simulation with

mdrun

When the simulation has finished within the requested two hours, you can free the resources simply by typing

exit

salloc: Relinquishing job allocation 17295
salloc: Job allocation 17295 has been revoked.

If the simulation has not finished by then, SLURM will cancel your job shortly after the two-hour deadline.

Starting Non-Interactive Jobs with SLURM

If you want to start multiple jobs or don’t want to wait until there are free resources for an interactive job, write job scripts! A simple SLURM job script for the Gromacs simulation above would look like this:

#!/bin/bash
#SBATCH --job-name=GromacsSim1
#SBATCH --time=2:00:00
#SBATCH --ntasks=16
#SBATCH --mem=32768

cd gromacs/sim1/
mdrun

Save it as “GromacsSim1.sbatch” and submit it to the batch system with

sbatch GromacsSim1.sbatch
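If several simulations differ only in name and directory, the job scripts can be generated instead of written by hand. The following is a minimal sketch under some assumptions: the function name make_job_script, its argument order and the directory gromacs/sim2 are invented for this example. It takes the memory in GB and converts it to megabytes, the default unit of the --mem option.

```shell
#!/bin/sh
# make_job_script (hypothetical helper): print a SLURM job script to stdout.
# Arguments: job name, wall time (H:MM:SS), number of tasks, memory in GB,
# working directory. The command at the end (mdrun) is taken from the text.
make_job_script() {
    name=$1 walltime=$2 ntasks=$3 mem_gb=$4 workdir=$5
    mem_mb=$((mem_gb * 1024))   # --mem is given in megabytes by default
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=$name
#SBATCH --time=$walltime
#SBATCH --ntasks=$ntasks
#SBATCH --mem=$mem_mb

cd "$workdir" || exit 1
mdrun
EOF
}

# Print the script for a second simulation (directory name is a placeholder).
make_job_script GromacsSim2 2:00:00 16 32 gromacs/sim2
```

Redirecting the output into a file (for example > GromacsSim2.sbatch) gives a script that can be submitted with sbatch like the one above.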

To see the status of your job type

squeue

If the job is no longer listed, it has finished. You should find the log files of the job (by default named slurm-<jobid>.out) in the directory you submitted the job from.
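Instead of running squeue by hand again and again, a small loop can do the checking. This is only a sketch: the function names job_in_queue and wait_for_job are invented here, and the 60-second default interval is an arbitrary choice. It relies on squeue -h -j <jobid> printing nothing once the job is gone.

```shell
#!/bin/sh
# job_in_queue (hypothetical helper): succeed while squeue still lists the job.
# squeue -h suppresses the header, -j restricts the listing to one job ID.
job_in_queue() {
    [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]
}

# wait_for_job (hypothetical helper): poll until the job has left the queue.
# Second argument: seconds between checks (default 60).
wait_for_job() {
    while job_in_queue "$1"; do
        sleep "${2:-60}"
    done
    echo "job $1 is no longer in the queue"
}

# Usage on a cluster, with the job ID printed by sbatch:
#   wait_for_job 17295
```

A job that has left the queue may have completed, failed or been cancelled, so the log files are still worth a look.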

Stopping SLURM Jobs

To cancel a job run

scancel <jobid>

You can also cancel all of your own jobs with

scancel -u <user name>

 

Running Applications on a Supercomputer

There are two types of supercomputers: the ones that have batch systems and the ones that don’t. Batch systems provide a way to start applications on nodes of the cluster that have free resources. Without a batch system you would have to manually search for a node which is not used by another user to start your application. You can of course run your application on a node which is occupied by another user, but this will most likely slow down both your application and the other user’s. In the best case both applications will run fine; in the worst case, one may crash because there is not enough memory in the system. Batch systems exist to avoid this. There are different batch systems out there, so you might have to ask your system administrator which one is installed on your cluster. Or you can try some commands to see which ones work on your supercomputer.

Searching for Free Nodes on a Cluster Without a Batch System

To log in to a node you have to know its name. If you don’t know what the nodes are called, ask your administrator or take a look at the /etc/hosts file:

cat /etc/hosts

This should return a list of names and their IP addresses (numbers). Ignore the numbers.
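To get only the names out of such a file, the addresses and comments can be stripped automatically. A small sketch, assuming the usual hosts-file layout of an address followed by one or more names; the function name host_names and the sample entries are made up for illustration.

```shell
#!/bin/sh
# host_names (hypothetical helper): read hosts-file syntax on stdin and print
# the first name listed for each address, skipping comments and blank lines.
host_names() {
    sed 's/#.*//' | awk 'NF >= 2 { print $2 }'
}

# Real use: host_names < /etc/hosts
# Sample input so the sketch runs anywhere:
printf '127.0.0.1 localhost\n# compute nodes\n10.0.0.1 node01 n1\n10.0.0.2 node02\n' | host_names
```

For the sample input this prints localhost, node01 and node02, one per line.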

Now that you know the names of the nodes, check whether you can connect and run commands on these nodes:

ssh <node name> uptime

If you can connect and run commands, you should see the output of the “uptime” command. This command tells you how long the node has been running and (more importantly) how high the “load” on the node is:

[Figure: output of the uptime command in a Linux terminal]

The last three numbers are the load averages: roughly how many processes were running or waiting for a CPU core, averaged over the last 1, 5 and 15 minutes respectively. The higher the number, the busier the node.
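Checking many nodes one at a time gets tedious, so the load can be collected in a loop. The following is a sketch, not a finished tool: the function names load1 and node_loads and the node names are placeholders. It assumes password-less ssh access and forces the C locale so that the load is printed with a decimal point.

```shell
#!/bin/sh
# load1 (hypothetical helper): read one line of uptime output on stdin and
# print the 1-minute load average (the first of the three numbers).
load1() {
    sed -e 's/.*load averages*: *//' -e 's/[, ].*//'
}

# node_loads (hypothetical helper): print "<node> <1-minute load>" for each
# node name given as argument. BatchMode stops ssh from asking for a password;
# nodes that cannot be reached are skipped.
node_loads() {
    for node in "$@"; do
        line=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" env LC_ALL=C uptime) || continue
        printf '%s %s\n' "$node" "$(printf '%s\n' "$line" | load1)"
    done
}

# Parsing a sample line (no cluster needed):
echo ' 10:05:01 up 12 days,  3:04,  2 users,  load average: 0.15, 0.10, 0.05' | load1
# On a cluster, sorted so that the least loaded node comes first:
#   node_loads node01 node02 | sort -k2,2n
```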

Once you have found a node with a low “load”, log in to this node by running

ssh <nodename>

Here you can now start your application as described in Starting Applications on Linux – The Path Matters.

Using a Batch System to Start Your Jobs on Nodes with Free Resources

As can be seen above, running applications on a cluster without a batch system is cumbersome and might result in overloaded nodes. If your cluster does not have a batch system, ask your administrator to install “Torque”. It is free of charge and available in the repositories of most Linux distributions.

All batch systems provide a command to show you the currently available resources and free nodes. To identify your batch system try the following commands:

  • If “sinfo” returns a list of nodes and their state, you have a SLURM batch system.
  • If “qstat -n” returns a list of jobs and the nodes they run on, you have a PBS/Torque batch system.
  • If “llclass” returns a list of job classes, you have an IBM LoadLeveler batch system.
  • If “qhost” returns a list of nodes and their state, you have a GridScheduler batch system.
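The checks in the list above can be sketched as a small script. Note the simplification: it only tests whether the client command is installed and does not run it, so a positive answer is a hint rather than proof. The function name detect_batch_system is made up, and the order of the checks is a heuristic (qhost is tried before qstat because Grid Engine variants also ship a qstat command).

```shell
#!/bin/sh
# detect_batch_system (hypothetical helper): guess the batch system from the
# client commands that are installed on this machine.
detect_batch_system() {
    if command -v sinfo >/dev/null 2>&1; then
        echo "SLURM"
    elif command -v llclass >/dev/null 2>&1; then
        echo "IBM LoadLeveler"
    elif command -v qhost >/dev/null 2>&1; then
        echo "GridScheduler"
    elif command -v qstat >/dev/null 2>&1; then
        echo "PBS/Torque"
    else
        echo "none found"
        return 1
    fi
}

# Print the guess; "|| true" keeps the script going if nothing was found.
detect_batch_system || true
```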

Use the link above to learn how to start applications on free nodes with the batch system installed on your cluster.

Top 10 Linux Command Line Tools You Should Know About

1. man

Type man before any command and you will get a manual page with a description of the command and its options. If this is too much detail, try whatis for a short description or add --help after the command as an option.

2. mc

If you don’t like typing all the commands for handling files and directories or editing files, use the Midnight Commander! You can start this program by typing mc.

You will see two columns showing directories. You can change to the other column by pressing the TAB key of your keyboard.

With the F* keys of your keyboard you can execute the respective commands in the lower row. So for example to copy a file you highlight the file or directory you want using the cursor keys or your mouse and press “F5”. You can also use your mouse and click on the “Copy” button in the lower row to do the same.

The Midnight Commander is able to display the content of and unpack archive files like zip, tar, tar.gz and tar.bz2. You can handle these files like directories: just highlight the file and press Enter to see the content. Be careful with large files: in most cases, MC needs to unpack the files in a temporary location. For large files this might take a while and can fill all the free space in the temporary location (usually /tmp).

3. wget

Need to download something from the Internet? Type wget http://www.example.com/file and the file will be downloaded.

4. screen

Screen is a very useful tool if you want to run an application in the background or without being actively logged in to the cluster. It is also a good idea to work within a screen session over a bad internet connection: if the connection fails, you can later resume your work where you left off.

Type screen to start a new virtual terminal. When you want to log off, press Ctrl + A to get into the “command” mode and then press D to “detach” from the screen terminal. To resume your work in the screen session or check its status, type screen -r.

To exit, just type exit.

5. grep

Grep is a very powerful tool that helps you find anything within a text. This text can be a file or the output of another command. To search for the word “hello” in all files in the current directory, type grep "hello" * If grep finds a file with “hello” in it, it prints the file name and the full line the word was found in. You can also search recursively through all subdirectories with the -r option. To search everywhere below your home directory, type: grep -r hello /home/nico/

As mentioned above, you can also search for something within the output of another command. This is done by “piping” the output of this command into grep with the special “|” character: command | grep "hello"

For example, you can search for a file name in the output of the ls command with ls -alh | grep "hello" This will list all files in the current directory with “hello” in the file name.
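A few more grep options are worth knowing. The sketch below builds a throw-away directory with two sample files (the file names and contents are invented) and shows line numbers (-n), case-insensitive search that lists only file names (-i and -l), and counting matches in piped input (-c).

```shell
#!/bin/sh
# Build a throw-away directory with two small files to search in.
demo_dir=$(mktemp -d)
printf 'hello world\nsecond line\n' > "$demo_dir/a.txt"
printf 'Hello again\n' > "$demo_dir/b.txt"

# -n adds the line number to each matching line.
numbered=$(grep -n "hello" "$demo_dir/a.txt")
# -i ignores upper/lower case, -l prints only the names of matching files.
files=$(cd "$demo_dir" && grep -il "hello" *)
# -c counts matching lines; here: how many directory entries end in .txt.
txt_count=$(ls "$demo_dir" | grep -c '\.txt$')

printf '%s\n' "$numbered" "$files" "$txt_count"
rm -rf "$demo_dir"
```

The first search reports line 1 of a.txt, the second finds both files because of -i, and the count is 2.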

6. time

The time command measures the runtime of the command that follows it. To measure the runtime of your application, type time ./myapp

7. date

Running date prints the current date and time in a human-readable format on the terminal. You can format the output of the date command by adding special formatting options (which can be found on the man page). For example, the command date +%y-%m-%d_%H%M will return “15-08-30_1005” on August 30th, 2015 at 10:05 am.
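A common use of such a format string is to give every run its own log file. A minimal sketch; the function name stamped_name and the prefix run are chosen for illustration only.

```shell
#!/bin/sh
# stamped_name (hypothetical helper): build a file name such as
# run_15-08-30_1005.log from a prefix and the current date and time.
stamped_name() {
    printf '%s_%s.log\n' "$1" "$(date +%y-%m-%d_%H%M)"
}

# Example: pick the log file name for this run.
logfile=$(stamped_name run)
echo "would write to $logfile"
```

Because year, month and day come first, such names sort chronologically in a directory listing. Using %Y instead of %y gives a four-digit year.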

8. find

Find searches for files below the current directory or the one supplied to it. To find all files and directories with “hello” in their name, you can type either find . -name "*hello*" or find . | grep hello Find can also search for certain types of files. To list only directories, add -type d: find . -name "*hello*" -type d
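The -name option compares its pattern against the whole file name, so wildcards are needed to match a part of it. The sketch below demonstrates this in a throw-away directory; all file and directory names are invented.

```shell
#!/bin/sh
# Build a small directory tree to search in.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/hello_dir/sub"
touch "$demo_dir/say_hello.txt" "$demo_dir/other.txt"

# Everything with "hello" somewhere in its name (quotes keep the shell
# from expanding the wildcards before find sees them).
all_hits=$(cd "$demo_dir" && find . -name "*hello*" | sort)
# The same search restricted to directories.
dir_hits=$(cd "$demo_dir" && find . -name "*hello*" -type d)
# Without wildcards only a file called exactly "hello" would match.
exact_hits=$(cd "$demo_dir" && find . -name hello)

printf 'all: %s\n' $all_hits
printf 'directories: %s\n' "$dir_hits"
printf 'exact: %s\n' "${exact_hits:-<nothing>}"
rm -rf "$demo_dir"
```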

9. tail / head

The commands tail and head print the last and first 10 lines of a file, respectively. You can change the number of lines by adding the “-n” option followed by the number of lines to print.
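Both commands also read from a pipe, and chaining them cuts a slice out of the middle of a file. A short sketch in which twenty numbered lines stand in for a real file:

```shell
#!/bin/sh
# Twenty numbered lines stand in for a real file.
numbers=$(seq 1 20)

first_three=$(printf '%s\n' "$numbers" | head -n 3)
last_two=$(printf '%s\n' "$numbers" | tail -n 2)
# Chaining both prints a slice from the middle: lines 11 to 15.
middle=$(printf '%s\n' "$numbers" | head -n 15 | tail -n 5)

printf '%s\n' "$first_three" "$last_two" "$middle"
```

To watch a growing file, such as the log of a running job, tail -f keeps the file open and prints new lines as they are appended.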

10. awk

AWK is the Swiss Army knife for automatically editing and analysing structured text data. The command ls -alh | awk '{print $3 "," $6}' returns only the third and sixth column of the ls -alh output, separated by a comma. To use awk on a text file you can either use cat: cat myfile | awk '{print $3 "," $6}' or redirect the file into awk: awk '{print $3 "," $6}' < myfile
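Beyond printing columns, awk can calculate with them and filter lines by their content. The sketch below uses a small made-up table (name, cores, hours) to show column selection, a sum over all lines, and a condition on a column.

```shell
#!/bin/sh
# Sample data: name, cores, hours (whitespace-separated columns).
job_table='sim1 16 2
sim2 32 4
sim3 8 1'

# Print the first and third column, separated by a comma.
columns=$(printf '%s\n' "$job_table" | awk '{ print $1 "," $3 }')
# Sum over all lines: core-hours = cores * hours.
core_hours=$(printf '%s\n' "$job_table" | awk '{ total += $2 * $3 } END { print total }')
# Keep only lines that satisfy a condition: jobs with more than 8 cores.
big_jobs=$(printf '%s\n' "$job_table" | awk '$2 > 8 { print $1 }')

printf '%s\n' "$columns" "$core_hours" "$big_jobs"
```

The sum comes out as 168 core-hours (32 + 128 + 8), and the filter keeps sim1 and sim2.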

For a more detailed introduction see the simple awk tutorial by David A. Holland.

11. du

The du -h command shows the size of all directories in and below the current directory in a human-readable format (add -a to list individual files as well). This is especially useful if you want to know the size of a whole directory. Type du -hs Downloads to display only the total size of the “Downloads” directory, without a separate line for each subdirectory in it.

12. diff

The diff command shows the difference between two files line by line. Just run it with diff file1 file2 and it will show you the lines that differ between those files.
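As a small illustration, the sketch below compares two made-up three-line files that differ in their second line. It also uses the exit status of diff, which is 0 for identical files and 1 for differing ones, to make a decision in a script.

```shell
#!/bin/sh
# Two sample files that differ only in line 2.
work=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$work/file1"
printf 'alpha\nBETA\ngamma\n' > "$work/file2"

# diff exits with status 0 if the files are identical and 1 if they differ,
# so it can be used directly in an if statement.
if diff "$work/file1" "$work/file2" > /dev/null; then
    verdict="identical"
else
    verdict="different"
fi

# Capture the report itself; "|| true" keeps exit status 1 from aborting
# scripts that run with "set -e".
changes=$(diff "$work/file1" "$work/file2" || true)

printf '%s\n' "$verdict" "$changes"
rm -rf "$work"
```

In the report, 2c2 means that line 2 was changed; lines starting with < come from the first file and lines starting with > from the second.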