Batch Systems – Guardians of the Supercomputer

In most cases the nodes of your supercomputer will not be sufficient to run all the calculations of all users at the same time. Thus, someone has to schedule the calculations onto the nodes such that all users get a fair share of computation time and the supercomputer is used as efficiently as possible.

If you share your supercomputer with only one or two colleagues, you might be able to do this manually by mailing or calling each other. However, for a supercomputer with dozens or even hundreds of users this is a task to be handled by software – the batch system.

Batch systems handle “jobs” or “batches”. Usually a job contains a list of resources it needs and a list of commands which should be executed on these resources. Most batch systems support interactive and non-interactive jobs. Within an interactive job you start your applications, wait for their completion and react to the results. A non-interactive job usually is a script with the commands to be run. The advantage of non-interactive jobs: they can run at night, while you sleep.

Besides managing jobs, the batch system can also show you various status information about your cluster. Most batch systems can show you how many resources are in use or which are currently free. They can also show you when your jobs will start or finish. If you need to run an interactive job, they can reserve the resources for that run for the next day – if you tell them.

The most common batch systems found on clusters today are

  • SLURM
  • PBS / Torque
  • IBM LoadLeveler
  • GridScheduler

Follow the links above for a short introduction to and usage guide for each batch system.


The SLURM Batch System

The Slurm Workload Manager is an open source batch system. In contrast to other batch systems, SLURM organizes nodes within partitions. One node can be part of multiple partitions, and the partitions may have different restrictions. SLURM is used by more than 50% of the Top500 supercomputers.

Status of the Supercomputer

With SLURM you can display the current state of your cluster with:

  • sinfo
    • This shows information on the nodes and available partitions
  • squeue
    • This shows information on the currently running and waiting jobs

Jobs

Batch systems handle “jobs” or “batches”. Usually a job contains a list of resources it needs and a list of commands which should be executed on these resources. For example, a typical Gromacs simulation needs 16 CPU cores, 32 GB of memory and 2 hours of computation time. The command to start the simulation is “mdrun”.

Starting Interactive Jobs with SLURM

You can now request an interactive session from the batch system. For the SLURM batch system the command would be:

salloc --time=02:00:00 --ntasks=16 --mem=32768

SLURM will now try to allocate the resources you requested. Depending on the utilization of your HPC cluster, this can take anywhere from a few seconds to several days:

salloc: Pending job allocation 17295
salloc: job 17295 queued and waiting for resources

When the resources are available and have been allocated for you, SLURM eventually responds with:

salloc: Granted job allocation 17295

You can now run the Gromacs simulation with

mdrun
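How the 16 requested tasks are actually used depends on how Gromacs was built on your cluster. If it was built with MPI support, the parallel run is typically launched through SLURM’s srun inside the allocation – the binary name mdrun_mpi below is only an assumption and may be different on your system:

srun --ntasks=16 mdrun_mpi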

When the simulation has finished within the requested two hours, you can free the resources simply by typing

exit

salloc: Relinquishing job allocation 17295
salloc: Job allocation 17295 has been revoked.

If the simulation did not finish, SLURM will cancel your job shortly after the two-hour deadline.

Starting Non-Interactive Jobs with SLURM

If you want to start multiple jobs or don’t want to wait until there are free resources for an interactive job, write job scripts! A simple SLURM job script for the Gromacs simulation above would look like this:

#!/bin/bash
#SBATCH --job-name=GromacsSim1
#SBATCH --time=2:00:00
#SBATCH --ntasks=16
#SBATCH --mem=32768

cd gromacs/sim1/
mdrun

Save it as “GromacsSim1.sbatch” and submit it to the batch system with

sbatch GromacsSim1.sbatch

To see the status of your job type

squeue

If the job is no longer listed, it has finished. You should find the log files of the job in the directory you submitted the job from.
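Unless you chose a different file name with the --output option, SLURM writes everything the job prints to a file called slurm-<jobid>.out in the submission directory. You can inspect it, for example, with

less slurm-<jobid>.out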

Stopping SLURM Jobs

To cancel a job run

scancel <jobid>

You can also cancel all of your own jobs with

scancel -u <user name>


Running Applications on a Supercomputer

There are two types of supercomputers: the ones that have a batch system and the ones that don’t. A batch system provides a way to start applications on nodes of the cluster that have free resources. Without a batch system you would have to manually search for a node that is not used by another user before starting your application. You can of course run your application on a node that is already occupied by another user, but this will most likely slow down both your application and the other user’s. In the best case both applications will run fine – in the worst case, one of them may crash because there is not enough memory in the system. Batch systems exist to avoid exactly this. There are different batch systems out there, so you might have to ask your system administrator which one is installed on your cluster – or you can try some commands to see which ones work on your supercomputer.

Searching for Free Nodes on a Cluster Without a Batch System

To log in to a node one has to know its name. If you don’t know what the nodes are called, ask your administrator – or take a look at the /etc/hosts file:

cat /etc/hosts

This should return a list of names and their IP addresses (numbers). Ignore the numbers.

Now that you know the names of the nodes, try to see if you can connect and run commands on these nodes:

ssh <node name> uptime

If you can connect and run commands you should see the output of the “uptime” command. This command tells you how long the node has been powered on and (more importantly) how high the “load” on the node is:

Linux Terminal Uptime Command

The last three numbers are the load averages: roughly speaking, how many processes were using or waiting for a CPU core on average during the last 1, 5 and 15 minutes respectively. The higher the number, the more processes are running on the node.
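To check several nodes in one go, you can loop over their names in the shell. The node names below are only placeholders – use the names you found in /etc/hosts:

# print the load of each node in the list
for node in node01 node02 node03; do
    ssh $node uptime
done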

If you have found a node with a low “load”, log in to this node by running

ssh <nodename>

Here you can now start your application as described in Starting Applications on Linux – The Path Matters.

Using a Batch System to Start Your Jobs on Nodes with Free Resources

As can be seen above, running applications on a cluster without a batch system is cumbersome and might result in overloaded nodes. If your cluster does not have a batch system, ask your administrator to install “Torque”. This is available for most Linux distributions free of charge and within their repositories.

All batch systems provide a command to show you the currently available resources and free nodes. To identify your batch system try the following commands:

  • If “sinfo” returns a list of nodes and their state, you have a SLURM batch system.
  • If “qstat -n” returns a list of nodes and their state, you have a PBS/Torque batch system.
  • If “llclass” returns a list of nodes and their state, you have an IBM LoadLeveler batch system.
  • If “qhost” returns a list of nodes and their state, you have a GridScheduler batch system.

Use the links above to learn how to start applications on free nodes with the batch system installed on your cluster.


Starting Applications on Linux – The Path Matters

To start an application on the Linux command line one has to know its name and its location. Important applications like editors or other tools can be found in the /usr/bin/ directory. To start the editor “nano” one can enter the following command on the command line:

/usr/bin/nano

PATH

However, the /usr/bin directory is also listed in the so-called “PATH” environment variable. All applications in directories which are listed in the PATH variable can be run by simply typing their name. In the case of the “nano” editor:

nano

To check what directories are currently listed in the PATH variable you can print its content with:

echo $PATH

If you want to start your own application which is not in a directory in the PATH variable, you always need to enter the path where the application can be found. If you are currently  within the directory of the application you have to type

./my-application

to start it. Otherwise the shell will only search for “my-application” within the directories listed in the PATH variable.

You can add directories to the PATH variable by running the following command:

export PATH=$PATH:/path/to/app1/:/path/to/app2/

The shell will replace $PATH with the current content of the PATH variable and add whatever you write after it. The directories are separated by a colon.
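This change only lasts for the current shell session. To make it permanent, the export line is usually appended to the ~/.bashrc file in your home directory (assuming bash is your login shell), for example with

echo 'export PATH=$PATH:/path/to/app1/' >> ~/.bashrc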

Supercomputers with Modules

If you are lucky, your administrator installed Modules on your cluster and configured the application you want to run in it. If she did this correctly, you only need to load the appropriate module and can then start your application by its name. The Modules tool adds the application’s location to your PATH variable (and sets other needed environment variables).

So for example if you want to run Gromacs, you don’t even have to know where it is installed on your system. Just search for the module in the output of

module avail

and add the module via

module load gromacs/5.1.0

Now to  start Gromacs you only need to enter “gromacs”.


The Linux Command Line – the “Shell”

While there are ways to work remotely on a supercomputer using a Graphical User Interface (GUI), in most cases the fastest and most convenient one is to use the “shell”. The shell is a program that takes commands from the keyboard and gives them to the operating system to execute. When you connect to an HPC cluster with PuTTY, PuTTY lets you interact with the shell.

PuTTY Terminal

After entering a command, use the enter key of your keyboard to execute it. A very simple and yet very useful command is “ls”. Type “ls” and press enter to list the content of the directory you are currently in:

ls command in a terminal

With the shell, you can manage files, start, hold, resume and end applications, and analyse the output of your applications.
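A few everyday examples – the directory and file names used here are only placeholders:

pwd                      # show the directory you are currently in
cd my-project            # change into the directory my-project
ls -lh                   # list its content with human-readable file sizes
cp results.txt backup/   # copy a file into the backup directory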

A very good introduction to the shell and to working with Linux in general can be found at http://linuxcommand.org. They also offer a book for sale / download.


Top 10 Linux Command Line Tools You Should Know About

1. man

Type man before any command and you will get a manual page with a description of the command and its options. If this is too much detail, try whatis for a short description or add --help after the command.
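For example, to read about the ls command (any other command name works the same way):

man ls
whatis ls
ls --help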

2. mc

If you don’t like typing all the commands for handling files and directories or editing files, use the Midnight Commander! You can start this program by typing mc.

You will see two columns showing directories. You can change to the other column by pressing the TAB key of your keyboard.

With the F* keys of your keyboard you can execute the respective commands in the lower row. So for example to copy a file you highlight the file or directory you want using the cursor keys or your mouse and press “F5”. You can also use your mouse and click on the “Copy” button in the lower row to do the same.

The Midnight Commander is able to display the content of and unzip / untar packed files like zip, tar, tar.gz and tar.bz2. You can handle these files like directories: just highlight the file and press enter to see the content. Be careful with large files – in most cases, MC needs to unpack the files in a temporary location. For large files this might take a while and can fill up all the free space in the temporary location (usually /tmp).

3. wget

Need to download something from the Internet? Type wget http://www.example.com/file and the file will be downloaded.

4. screen

Screen is a very useful tool if you want to run an application in the background or without being actively logged in to the cluster. It is also a good idea to work within a screen session over a bad internet connection – if the connection fails, you can later resume your work where you left it.

Type screen to start a new virtual terminal. When you want to log off, press Ctrl + A to get into the “command” mode and then press D to “detach” from the screen terminal. If you want to resume your work in the screen session or check its status, type screen -r.

To exit, just type exit.
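A typical workflow looks like this – the session name mysim is chosen here only for illustration:

screen -S mysim      # start a new screen session named mysim
# work inside the session, then press Ctrl + A followed by D to detach
screen -ls           # list your screen sessions
screen -r mysim      # reattach to the session later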

5. grep

Grep is a very powerful tool. It helps you to find anything within a text. This text can be within a file or the output of another command. To search for the word “hello” within all files in the current directory, type grep "hello" *. If grep finds a file with “hello” in it, it will print the file name and the full line the word was found in. You can also search recursively in all subdirectories with the -r option. To search in every directory in your home, type: grep -r hello /home/nico/

As mentioned above, you can also search for something within the output of another command. This is done by “piping” the output of that command into grep with the special “|” character: command | grep "hello"

For example you can search for a file name in the output of the ls command with ls -alh | grep "hello". This will list all files with “hello” in their name in the current directory.

6. time

The time command measures the runtime of the command that follows it. To measure the runtime of your application type time ./myapp

7. date

Running date prints the current date in a human-readable format on the terminal. One can format the output of the date command by adding special formatting options (which can be found on the man page). For example, the command date +%y-%m-%d_%H%M will return “15-08-30_1005” on August 30th, 2015 at 10:05 am.
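A common use is giving log files a unique time stamp – myapp is just a placeholder for your own application:

./myapp > run_$(date +%y-%m-%d_%H%M).log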

8. find

Find searches for files below the current directory or the one supplied to it. To find all files and directories named “hello”, one can either type find . -name hello or find . | grep hello (the second form also matches anything with “hello” somewhere in its path). Find can also search for certain types of files. To list only directories with that name, type find . -name hello -type d

9. tail / head

The commands tail and head print the last and first 10 lines of a file respectively. One can change the number of lines by adding the “-n” option followed by the number of lines to print.
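For example, to look at a (hypothetical) log file called simulation.log:

tail -n 50 simulation.log    # print the last 50 lines
head -n 5 simulation.log     # print the first 5 lines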

10. awk

AWK is the Swiss-army knife for automatically editing and analysing structured text data. The command ls -alh | awk '{print $3 "," $6}' will return only the third and sixth column of the ls -alh output, separated by a comma. To use awk on a text file one can either use cat: cat myfile | awk '{print $3 "," $6}' or use a redirection: awk '{print $3 "," $6}' < myfile

For a more detailed introduction see the simple awk tutorial by David A. Holland.

11. du

The du -h command shows the size of all files and directories in and below the current directory in a human-readable format. This is especially interesting if you want to know the size of a whole directory. Type du -hs Downloads to display only the total size of the “Downloads” directory without listing all the subdirectories and files in it.

12. diff

The diff command shows the difference between two files line by line. Just run it with diff file1 file2 and it will show you the lines that differ between those files.


Modules – Selecting the Right Software for the Job

The Environment Modules tool helps to organize and select different pre-installed software versions and implementations. It is a very common tool on HPC clusters.

To check if Modules is installed and what software is available through it on your supercomputer, type

module avail

This should return a list of software packages. As one can see in the picture below, there are multiple applications installed on TITAN. To run a simulation with Gromacs 5.1, load the  appropriate module with

module add gromacs/5.1.0
or
module load gromacs/5.1.0

As Gromacs requires an MPI implementation, one is loaded automatically. You can see a list of activated modules with

module list

To remove a module type

module rm gromacs/5.1.0

The modules tool sets your environment variables (PATH and LD_LIBRARY_PATH) such that you only need to type “gromacs” to start a Gromacs run.


Module Environment Usage

Optimized Mathematical Libraries

If your application does linear algebra or fast Fourier transform (FFT) calculations, it is most likely able to use highly optimized mathematical libraries. These libraries contain functions for common tasks like matrix and vector multiplications or computing the discrete Fourier transform (DFT). Popular libraries are the “BLAS” and “LAPACK” libraries or the “FFTW”.

There are different implementations and versions available. The ATLAS (Automatically Tuned Linear Algebra Software) library should be available on every Linux cluster since it is open source and, if installed correctly by your administrator, comparable to commercial implementations. The Intel MKL (Math Kernel Library) is a commercial library optimized by Intel for their processors. It also runs perfectly on AMD CPUs. While the ATLAS library only supports BLAS and LAPACK functions, the Intel MKL comes with so-called wrappers. These wrappers allow the Intel MKL to be used for DFT calculations, as they are compatible with the FFTW interface.

The FFTW (Fastest Fourier Transform in the West) library provides functions for DFT calculations.

There are many more libraries available for different tasks. If you write your own applications, try to find a library which does what you want. It will most likely save you a lot of time implementing, testing and tuning these functions yourself.

Check your available modules – your administrator most likely preinstalled one of these libraries on your cluster.
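Since the module names differ from site to site, you can search the module list for common library names. Note that on many systems module avail prints its list to the error stream, hence the 2>&1 redirection:

module avail 2>&1 | grep -i -E "mkl|blas|lapack|fftw"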


Compilers

Compilers translate the human-readable source code of applications into machine code which is understood by the processor. While doing so, they also try to optimize the machine code so that it runs faster on your machine.

If your open source application is not preinstalled on your cluster, you might need to compile it yourself. Most applications come with auto-detection scripts which detect what compilers and libraries are available on your cluster. The application is then compiled with the compilers found by this script.

There are many compilers available on the market – the most common found on clusters are the GCC, Intel Parallel Studio or PGI compilers. If you want to have a fast running application, you might want to try different compilers.

GCC

The GCC (GNU Compiler Collection) is available on all clusters running Linux. However, the default version might be outdated. Check the version with “gcc -v” and compare it to the ones your administrator provides through the modules environment.

To compile a simple application by yourself you can run the “gcc”, “gfortran” or “g++” command for C, Fortran or C++ code respectively.
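For example, to compile a small C or Fortran program with basic optimizations (the file names are placeholders):

gcc -O2 -o myapp myapp.c
gfortran -O2 -o mysim mysim.f90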

Intel Parallel Studio

Since many HPC clusters are equipped with Intel or AMD CPUs, the Intel compilers may be available as well. Compared to GCC, the Intel Fortran compiler “ifort” in particular produces faster-running applications in many cases. The Intel Parallel Studio also comes with the Intel Math Kernel Library (Intel MKL). This library provides highly optimized mathematical functions for linear algebra and FFT calculations.
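A sketch of compiling a Fortran program with ifort and linking it against the Intel MKL – the exact flag for MKL linking (-mkl here) depends on the compiler version, so check the documentation on your cluster:

ifort -O2 -mkl -o mysim mysim.f90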

PGI Compilers

PGI was recently bought by Nvidia. Thus, they are now focusing on GPGPUs and OpenACC. However, their Fortran compilers have also produced faster-running applications than GCC’s Fortran compiler in some cases.

Accelerators – Of GPGPUs and Knights

In 2005, application developers started searching for ways to accelerate their applications by harnessing the processing power of graphics cards. Graphics Processing Units (GPUs) have so-called “shaders”, which are used by video game developers to create 3D effects on objects like water waves or hair. As these shaders are programmable, scientific application developers looked for ways to use them for their own purposes. The hardware vendors (AMD and Nvidia) noticed this trend and started to create interfaces to help application developers. GPUs which can be used for general-purpose calculations are called “GPGPUs”.

Today, there are a lot of interfaces and programming models available. AMD started with OpenCL and Nvidia introduced CUDA for their GPUs. As developers have to largely rewrite their application code for these, Intel saw a chance to develop an accelerator which understands normal code written for CPUs. Eventually, they introduced the Intel Xeon Phi “Knights Corner” in 2011 with up to 61 cores. Similar to GPUs, the Xeon Phi “Knights Corner” is an add-on card which is plugged into a PCIe slot of a computer. While Intel also supports OpenCL, their focus is on OpenMP.

If your application supports CUDA, OpenCL or OpenMP, ask your administrator if there are nodes with GPUs or Xeon Phis available. Using these accelerators can greatly increase the performance of your application.

The plot below shows a comparison of different GPUs and CPUs running a simulation of thousands of atoms with the GROMACS molecular dynamics simulation software. As one can see, the Nvidia GTX 580 GPGPU outperforms an Intel Core i7 by a factor of more than 5 in three of the tests.

GROMACS 4.5 Performance Comparison: GPUs vs. CPUs

A GROMACS simulation ran on different CPUs and GPUs. Source: http://www.gromacs.org/Documentation/Installation_Instructions_4.5/GROMACS-OpenMM