To submit jobs, first log in to meyerc-login01, the head node of compara, for example using ssh -X myself@meyerc-login01, replacing myself with your own login id. On the head node you may edit files, for example your job submission scripts, but other than that interactive jobs are not allowed and will be killed.
qsub is the command used to submit jobs to the cluster. Here is a simple example to make sure things are working:
echo date | qsub
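If everything works, the job's output (the current date) soon appears in a file named STDIN.o<jobid>, with stderr in STDIN.e<jobid>; STDIN is Torque's default job name for scripts read from standard input.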
The qsub command has a number of options which you should use to specify how your jobs should be run. The manual page for qsub gives you the full list. The most common options are:
- -d path
- Defines the working directory path to be used for the job executable. Make sure you specify the full path.
- -o filename
- Defines the path to be used for the standard output stream of the job. Make sure you specify the full path.
- -e filename
- Defines the path to be used for the standard error stream of the job. Make sure you specify the full path.
- -q queuename
- Sends the job to the specified queue.
- -l resources_list
- Specifies the resources to be allocated to this job, for example mem and cput; see man pbs_resources for the full list of options and examples.
- -t [int]-[int]
- Starts a job array: sends multiple copies of the job to the cluster, each with a different task id in the range [int]-[int]. This is a good way of running the same job many times (e.g. with several different parameter settings). In the submitted script, the task id can be accessed through the environment variable PBS_ARRAYID, so the script can check its own task id and run with the appropriate settings (see the array sketch after the example script below).
When using a script my_script.sh to submit jobs, all of the above qsub options can be moved to the script by placing lines starting with #PBS [option] after the
shebang line. Example:
#!/bin/bash
#PBS -d /ubc/cs/research/my_directory/
#PBS -q default
#PBS -l mem=10gb,cput=60
my_command_line
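Submit the script with qsub my_script.sh. As a minimal sketch of a job array (params.txt and my_command_line are illustrative; each line of params.txt holds the settings for one task):
#!/bin/bash
#PBS -d /ubc/cs/research/my_directory/
#PBS -q default
#PBS -t 1-10
# PBS_ARRAYID is this task's id in the range 1-10
PARAMS=$(sed -n "${PBS_ARRAYID}p" params.txt)
my_command_line $PARAMS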
5 Torque/PBS setup & storage
First, it is a good idea to always specify a job queue. This helps the scheduler place jobs more efficiently and ensures every job gets the correct time and memory resources. There are two sets of queues, those with default priority and those with high priority. Unless you have a compelling reason to use the high-priority queues, you should use the default ones. Please check with the designated IT people first if you want to use the high-priority queues.
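For example, to send a job to the default queue:
qsub -q default my_script.sh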
Second, use the right filesystem for the right purpose:
- /data/meyer is the share stored on fs02, the same /data/meyer as on your workstations and on max-cluster, and is available (via NFS) on all meyerc* nodes. Given the history of this share, we recommend avoiding any computations that use it directly; try to limit it to copying data from and to it.
- /data/basecalls is mounted over NFS from fs02 and can be used read-only.
- /data/meyerc is a filesystem stored on meyerc-fs01 and is available only on meyerc* nodes. As it was designed purely for storing data, we recommend avoiding its use in computations, except perhaps to read input from it while writing the output to /scratch.
- /home/{username} is shared across all meyerc* nodes from meyerc-fs01, is available only inside your cluster, and is not the same /home as on max-cluster.
- /scratch is a local SSD and is where all computation I/O should go; before you start using it, be aware of its size and that it is not shared across nodes.
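A minimal staging sketch (the project paths and my_command_line are illustrative):
mkdir -p /scratch/$USER                  # node-local scratch; not shared across nodes
cp /data/meyer/project/input.dat /scratch/$USER/
cd /scratch/$USER
my_command_line input.dat > output.dat   # do the heavy I/O here
cp output.dat /data/meyer/project/       # copy results back when done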
The queues are configured as follows (qmgr settings):
Create and define queue default
- set queue default queue_type = Execution
- set queue default resources_max.mem = 382gb
- set queue default resources_max.nodes = 16
- set queue default resources_max.walltime = 24:00:00
- set queue default resources_max.nodect = 16
- set queue default resources_default.nodes = 1
- set queue default resources_default.walltime = 24:00:00
- set queue default resources_default.ncpus = 1
- set queue default resources_default.mem = 8gb
- set queue default resources_default.neednodes = small
- set queue default resources_default.nodect = 1
- set queue default acl_group_enable = True
- set queue default acl_groups = AG_Meyer
- set queue default acl_groups += bimsb_itsupport
- set queue default acl_group_sloppy = True
- set queue default enabled = True
- set queue default started = True
Create and define queue bigmem
- set queue bigmem queue_type = Execution
- set queue bigmem resources_max.mem = 1530gb
- set queue bigmem resources_max.nodect = 2
- set queue bigmem resources_max.walltime = 96:00:00
- set queue bigmem resources_default.nodect = 1
- set queue bigmem resources_default.walltime = 96:00:00
- set queue bigmem resources_default.ncpus = 1
- set queue bigmem resources_default.mem = 16gb
- set queue bigmem resources_default.neednodes = big
- set queue bigmem acl_group_enable = True
- set queue bigmem acl_groups = AG_Meyer
- set queue bigmem acl_groups += bimsb_itsupport
- set queue bigmem acl_group_sloppy = True
- set queue bigmem enabled = True
- set queue bigmem started = True
Create and define queue longtime
- set queue longtime queue_type = Execution
- set queue longtime resources_max.mem = 382gb
- set queue longtime resources_max.nodect = 16
- set queue longtime resources_max.walltime = 9999:00:00
- set queue longtime resources_default.nodect = 1
- set queue longtime resources_default.walltime = 336:00:00
- set queue longtime resources_default.ncpus = 1
- set queue longtime resources_default.mem = 8gb
- set queue longtime resources_default.neednodes = small
- set queue longtime acl_group_enable = True
- set queue longtime acl_groups = AG_Meyer
- set queue longtime acl_groups += bimsb_itsupport
- set queue longtime acl_group_sloppy = True
- set queue longtime disallowed_types = interactive
- set queue longtime enabled = True
- set queue longtime started = True
Create and define queue longbigmem
- set queue longbigmem queue_type = Execution
- set queue longbigmem resources_max.mem = 1530gb
- set queue longbigmem resources_max.nodect = 2
- set queue longbigmem resources_max.walltime = 9999:00:00
- set queue longbigmem resources_default.nodect = 1
- set queue longbigmem resources_default.walltime = 336:00:00
- set queue longbigmem resources_default.ncpus = 1
- set queue longbigmem resources_default.mem = 16gb
- set queue longbigmem resources_default.neednodes = big
- set queue longbigmem acl_group_enable = True
- set queue longbigmem acl_groups = AG_Meyer
- set queue longbigmem acl_groups += bimsb_itsupport
- set queue longbigmem acl_group_sloppy = True
- set queue longbigmem disallowed_types = interactive
- set queue longbigmem enabled = True
- set queue longbigmem started = True
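For example, to run a long, high-memory job within the limits above (the resource values are illustrative; stay below the per-queue maxima):
qsub -q longbigmem -l mem=500gb,walltime=240:00:00 my_script.sh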
6 What are my jobs doing?
You can use qstat to check the status of your jobs. You can also use the -m flag of qsub (e.g. -m e) to get an email when your job is finished. By default, only active jobs are shown, i.e. those that are queued or still running. Check the manual page of qstat for how to get more detailed information on your jobs. Some useful flags are -q to see the global status of each queue, -f for details of each job, and -n for node information of each job.
If you realize you want to stop one of your jobs, you should first get the job's ID using qstat. You
can then kill the job using qdel job_id. Alternatively, qdel all will kill all of your own currently running jobs. Have a look at the manual page of qdel for more details.
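A quick monitoring sketch (the job id 1234 and the mail address are illustrative):
qstat -u $USER                                # list your own jobs
qstat -f 1234                                 # full details for job 1234
qdel 1234                                     # kill job 1234
qsub -m e -M you@mdc-berlin.de my_script.sh   # mail you when the job ends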
7 Shared software: GUIX
A shared Guix is available; it is managed on the meyerc-guix server. While personal profiles can be managed from any node, the shared profiles can be adjusted only from the meyerc-guix server and only by the members of compara/meyerc_admins.
A shared profile can be created like this (a minimal sketch; the profile path and the package are illustrative, and this must be run on meyerc-guix):
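guix package -p /gnu/shared_profiles/bioinfo -i samtools   # creates the profile if it does not yet exist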
The custom guix-bimsb repository is not enabled by default, but you can use it by running git clone https://github.com/BIMSBbioinfo/guix-bimsb.git inside the folder /gnu/custom_repos/ and then setting the GUIX_PACKAGE_PATH=/gnu/custom_repos/.../ variable (see http://guix.mdc-berlin.de/documentation.html#sec-7-2). For example (guix-bimsb is the directory the clone creates by default):
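cd /gnu/custom_repos/
git clone https://github.com/BIMSBbioinfo/guix-bimsb.git
export GUIX_PACKAGE_PATH=/gnu/custom_repos/guix-bimsb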
If a piece of software is not available via Guix, or it is a precompiled binary, you can install it (drop it) in the shared location /usr/local/shared/. Again, this can be done only from the meyerc-guix server and only by the members of meyerc_admins and compara admins.
/usr/local/shared/bin/ is already in the default $PATH for all users on all computing nodes.
8 Houston, I think we have a problem ...
Things can go wrong for a number of reasons. Here is more information on whom to contact when.
If you have problems making sense of any of the above, read the corresponding manual pages and the HPC User Guide provided by the HPC team.
Please note that we rely on all users of compara not to abuse the system. If you create a mess that could easily have been avoided by using test jobs and an appropriate combination of qsub flags, you risk being removed from the list of compara users.
If you have suggestions on how to improve the performance of compara in general, or you encounter some other issue, please email Dan.Munteanu@mdc-berlin.de, cc-ing Irmtraud.
Managed by the Meyer Lab at BIMSB.
Updated on 29/10/2020