Installing Torque

Below are instructions for installing Torque on a multiprocessor Ubuntu Linux server. In this setup the same machine acts as server, scheduler, submission node and compute node. These notes have been borrowed from this blog post (thanks!) and are kept here for future records only. The version of Ubuntu used in this case was 14.04 LTS.

The first thing to note is that all of these commands should be run as root. Then we must make sure that the first line of the /etc/hosts file reads as follows

127.0.0.1   localhost
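A quick way to sanity-check that entry before going any further (the grep pattern here is just an illustration):

```shell
# Check that the loopback entry for localhost is present in /etc/hosts
grep -E '^127\.0\.0\.1[[:space:]]+localhost' /etc/hosts && echo "hosts entry OK"
```

If nothing is printed, fix /etc/hosts before continuing, since Torque's name resolution depends on it.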

Next comes the installation of some packages, which we do using Ubuntu's package manager.

apt-get install torque-server torque-client torque-mom torque-pam

After installation we stop these services, since the initial Torque configuration does not work as one would hope. To do this, type the following in the terminal

/etc/init.d/torque-mom stop
/etc/init.d/torque-scheduler stop
/etc/init.d/torque-server stop

We can then create a fresh Torque setup with the following

pbs_server -t create

When prompted about overwriting the existing database, answer yes (y). Then kill the just-started server instance with the following command so that configuration can continue

killall pbs_server

Next we set up the server process. In my case the server is simply called localhost; I ran into problems when trying to use a different server name.

echo localhost > /etc/torque/server_name
echo localhost > /var/spool/torque/server_priv/acl_svr/acl_hosts
echo root@localhost > /var/spool/torque/server_priv/acl_svr/operators
echo root@localhost > /var/spool/torque/server_priv/acl_svr/managers

The next step is to add the compute nodes. Since we are using the head node as the compute node, we only need the following (np=56 declares 56 processor slots on this machine)

echo "localhost np=56" > /var/spool/torque/server_priv/nodes
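On a different machine the slot count can be derived from the actual CPU count instead of hard-coded. A small sketch, writing to a temporary file for illustration (the real target is /var/spool/torque/server_priv/nodes):

```shell
# Derive the slot count from the machine's CPU count
np=$(nproc)
nodes_file=$(mktemp)   # stand-in for /var/spool/torque/server_priv/nodes
echo "localhost np=${np}" > "$nodes_file"
cat "$nodes_file"
```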

Then we configure the MOM process that handles the compute node, telling it where the server is

echo localhost > /var/spool/torque/mom_priv/config

After all of this we restart the services, in the following order

/etc/init.d/torque-server start
/etc/init.d/torque-scheduler start
/etc/init.d/torque-mom start

Finally we use qmgr to enable scheduling, create the default queue and configure the server to allow submissions from itself

qmgr -c "set server scheduling = true"
qmgr -c "set server keep_completed = 300"
qmgr -c "set server mom_job_sync = true"
# create default queue
qmgr -c "create queue batch"
qmgr -c "set queue batch queue_type = execution"
qmgr -c "set queue batch started = true"
qmgr -c "set queue batch enabled = true"
qmgr -c "set queue batch resources_default.walltime = 1:00:00"
qmgr -c "set queue batch resources_default.nodes = 1"
qmgr -c "set server default_queue = batch"
# configure submission pool
qmgr -c "set server submit_hosts = localhost"
qmgr -c "set server allow_node_submit = true"
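At this point the configuration can be inspected against the running server. These verification commands only work once pbs_server is up:

```shell
# Dump the full server and queue configuration
qmgr -c "print server"
# List the known compute nodes and their state (should show localhost, np = 56)
pbsnodes -a
```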

You can now test whether everything is working with an interactive job

qsub -I

As an additional test, you can run this simple PBS script, test.sh

#!/bin/bash
# Move to the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# Write the list of allocated nodes to cluster_nodes
cat "$PBS_NODEFILE" > ./cluster_nodes

Submit it by typing the following command in your terminal

qsub test.sh
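Inside a job, PBS sets $PBS_O_WORKDIR (the directory the job was submitted from) and $PBS_NODEFILE (a file listing the allocated node slots, one line per slot). A stand-alone illustration of what test.sh does, with those variables stubbed out by hand:

```shell
# Stub out the variables PBS would normally set (illustration only)
export PBS_O_WORKDIR=$(mktemp -d)
export PBS_NODEFILE="$PBS_O_WORKDIR/nodefile"
printf 'localhost\nlocalhost\n' > "$PBS_NODEFILE"   # e.g. two allocated slots

# Same steps as test.sh
cd "$PBS_O_WORKDIR"
cat "$PBS_NODEFILE" > ./cluster_nodes
wc -l < ./cluster_nodes    # prints 2, one line per allocated slot
```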

Example PBS script

This is just an example PBS script for submitting jobs on the ARCHER supercomputing facility.

#!/bin/bash --login
#PBS -N jobname

# Select 1 node
#PBS -l select=1

#PBS -l walltime=24:00:00
#PBS -m abe
#PBS -M name@emailprovider.org

# Replace this with your budget code
#PBS -A budget

# Move to the directory the script was submitted from
#export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR)
#echo $PBS_O_WORKDIR
#exit
#cd $PBS_O_WORKDIR
cd "/work/directory/"

# Load the GROMACS module
module add gromacs

# Run GROMACS using default input and output file names
# (${file} must be defined before this point; pull and k are not used
# in the commands shown here)
pull=1
k=100

options="-s ${file} -deffnm ${file}"

aprun -n 24 mdrun_mpi $options