Compara [Meyer Lab's high-performance cluster]
1 What is compara ?

Compara is a computer cluster for research in genomics, hence its name. The cluster:

  • Overview:
    • normal nodes: 16 x 72 cores, 386 GB RAM, 440 GB /scratch, 60 GB /tmp
    • big nodes: 2 x 80 cores, 1.5 TB RAM, 1.3 TB /scratch, 390 GB /tmp
  • runs Linux as its operating system
  • uses Torque/PBS to submit and schedule jobs
2 How do I get started ?

    Before you can actually submit any jobs to compara, you first need to check a few things.

  • Make sure you have an MDC username/login and that you are a member of the Meyer Lab.
  • You can access the cluster directly from any desktop PC on the Campus-LAN and WiFi (mdc-intern, mdcguest, eduroam). Just connect via SSH (Secure SHell) to the login node meyerc-login01 with your username and password.
  • On UNIX-like systems (Linux, macOS), type: ssh username@meyerc-login01.mdc-berlin.net (authenticate with your MDC password).
  • On Windows systems, download and use an SSH client such as PuTTY or MobaXterm.
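    If you connect often, you can store the connection details in ~/.ssh/config on your own machine. A minimal sketch (the host alias is arbitrary; replace myself with your MDC username):

    Host compara
        HostName meyerc-login01.mdc-berlin.net
        User myself

    After that, typing ssh compara is enough to reach the login node.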
3 How do I submit jobs ?

    The main idea behind having a cluster and a job queuing system is that jobs are distributed across the nodes of the cluster, which is far more efficient than running jobs interactively on individual machines. This is the main reason why interactive jobs are not allowed.

    To submit jobs, you first need to log in to the head node of compara, for example using ssh -X myself@meyerc-login01, replacing myself with your own login id. On the head node, you may edit files, for example your job submission scripts, but other than that interactive jobs are not allowed and will be killed.

    For submitting a job, there are two options:

  • option 1 (preferred): writing a script and submitting this with qsub using:
    qsub my_script.sh
  • option 2: submitting your job's command line (my_command_line) directly with qsub using:
    qsub [qsub_options] my_command_line
  • Always submit a test job before submitting any number of real jobs.
  • Option 1 has many advantages. It allows you to move all the qsub_options, which would otherwise have to be typed directly on the command line, into the script. You can thus re-run jobs easily and also keep a full record of how each job was submitted. A small example of both options is shown below.
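    For example, a quick test could be submitted either way (the queue name default is only illustrative here; queues are described in section 5):

    # option 1: everything recorded in the script
    qsub my_script.sh

    # option 2: pipe a one-off command directly to qsub, with options on the command line
    echo "my_command_line" | qsub -q default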

    4 Basic commands & qsub ?

    The batch system offers the following basic commands to accomplish common user tasks on the cluster:

    Task                                      Command
    submit jobs                               qsub, qresub, qrsh, qlogin, qsh, qmake, qtcsh
    check job status                          qstat
    modify jobs                               qalter, qhold, qrls
    check job accounting after job end        qacct
    check cluster messages after job fails    qmesg
    delete jobs                               qdel
    display cluster state                     qstat, qhost, qselect, qquota
    display node state                        qwho, qhost
    display cluster configuration             qconf

    qsub is the command used to submit jobs to the cluster. Here is a simple example to make sure things are working:
    echo date | qsub

    The qsub command has a number of options which you should use to specify how your jobs should be run. The manual page for qsub gives you the full list. The most common qsub_options are:

    -d path
    Defines the working directory path to be used for the job executable. Make sure you specify the full path.
    -o filename
    Defines the path to be used for the standard output stream of the job. Make sure you specify the full path.
    -e filename
    Defines the path to be used for the standard error stream of the job. Make sure you specify the full path.
    -q queuename
    Sends the job to the specified queue.
    -l resources_list
    Specifies the resources to be allocated to this job, for example mem and cput; see all options and examples in man pbs_resources.
    -t [int]-[int]
    Starts a job array. Sends multiple copies of the job to the cluster, each with a different task id in the range [int]-[int]. This is a good way of running the same job many times (e.g. with several different parameter settings). In the submitted script, the task id can be accessed through the environment variable PBS_ARRAYID, so the script can check its own task id and run the job with the appropriate settings.
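    As an illustration, several of these options can be combined in a single call; the directory, file names, queue and memory value below are placeholders only:

    qsub -d /home/myself/my_project -o /home/myself/my_project/job.out -e /home/myself/my_project/job.err -q default -l mem=8gb my_script.sh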

    When using a script my_script.sh to submit jobs, all of the above qsub options can be moved to the script by placing lines starting with #PBS [option] after the shebang line. Example:
    #!/bin/bash
    #PBS -d /home/myself/my_directory/
    #PBS -q default
    #PBS -l mem=10G,cput=60
    my_command_line
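    For job arrays (the -t option above), a minimal sketch of a submission script could look like this; the input file naming scheme is only an assumption for illustration:

    #!/bin/bash
    #PBS -d /home/myself/my_directory/
    #PBS -q default
    #PBS -l mem=8gb
    #PBS -t 1-10
    # each of the 10 copies of the job picks its own input via its task id
    my_command_line input_${PBS_ARRAYID}.txt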

    5 Torque/PBS setup & storage?

    First, it is a good idea to always specify a job queue. This helps the scheduler run jobs more efficiently and ensures every job gets the correct time and memory resources. There are two sets of queues, those with default priority and those with high priority. Unless you have a compelling reason to use the high-priority queues, you should use the default ones. Please check with the designated IT people first if you want to use the high-priority queues.

  • /data/meyer is the share stored on fs02, common with /data/meyer on your workstations and the max-cluster, and available (via NFS) on all meyerc* nodes. Given the history of this share, we recommend avoiding any computations that involve it (try to limit its use to copying data from and to it).
  • /data/basecalls is mounted over NFS from fs02 and can only be used read-only.
  • /data/meyerc is a filesystem stored on meyerc-fs01 and is available only on meyerc* nodes. As it was designed just for storing data, we recommend avoiding using it in computations, except perhaps to read from it while writing the output to /scratch.
  • /home/{username} is shared across all meyerc* nodes from meyerc-fs01; it is available only inside this cluster and is not the same /home as on the max-cluster.
  • /scratch is a local SSD and is the place where all computation I/O should go; before starting to use it, please be aware of its size and also that it is not shared across nodes (see the sketch below).
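    A common pattern that follows these recommendations is to stage data through /scratch inside the job script; the paths and file names below are placeholders only:

    #!/bin/bash
    #PBS -q default
    #PBS -l mem=8gb
    # copy the input from shared storage to the node-local scratch disk
    WORKDIR=/scratch/${USER}_${PBS_JOBID}
    mkdir -p "$WORKDIR"
    cp /data/meyer/my_project/input.fastq "$WORKDIR/"
    cd "$WORKDIR"
    # run the computation against local files only
    my_command_line input.fastq > output.txt
    # copy the results back to shared storage and clean up
    cp output.txt /data/meyer/my_project/
    rm -rf "$WORKDIR"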
  • Queue and resource configuration:

  • Create and define queue default
    • set queue default queue_type = Execution
    • set queue default resources_max.mem = 382gb
    • set queue default resources_max.nodes = 16
    • set queue default resources_max.walltime = 24:00:00
    • set queue default resources_max.nodect = 16
    • set queue default resources_default.nodes = 1
    • set queue default resources_default.walltime = 24:00:00
    • set queue default resources_default.ncpus = 1
    • set queue default resources_default.mem = 8gb
    • set queue default resources_default.neednodes = small
    • set queue default resources_default.nodect = 1
    • set queue default acl_group_enable = True
    • set queue default acl_groups = AG_Meyer
    • set queue default acl_groups += bimsb_itsupport
    • set queue default acl_group_sloppy = True
    • set queue default enabled = True
    • set queue default started = True
  • Create and define queue bigmem
    • set queue bigmem queue_type = Execution
    • set queue bigmem resources_max.mem = 1530gb
    • set queue bigmem resources_max.nodect = 2
    • set queue bigmem resources_max.walltime = 96:00:00
    • set queue bigmem resources_default.nodect = 1
    • set queue bigmem resources_default.walltime = 96:00:00
    • set queue bigmem resources_default.ncpus = 1
    • set queue bigmem resources_default.mem = 16gb
    • set queue bigmem resources_default.neednodes = big
    • set queue bigmem acl_group_enable = True
    • set queue bigmem acl_groups = AG_Meyer
    • set queue bigmem acl_groups += bimsb_itsupport
    • set queue bigmem acl_group_sloppy = True
    • set queue bigmem enabled = True
    • set queue bigmem started = True
  • Create and define queue longtime
    • set queue longtime queue_type = Execution
    • set queue longtime resources_max.mem = 382gb
    • set queue longtime resources_max.nodect = 16
    • set queue longtime resources_max.walltime = 9999:00:00
    • set queue longtime resources_default.nodect = 1
    • set queue longtime resources_default.walltime = 336:00:00
    • set queue longtime resources_default.ncpus = 1
    • set queue longtime resources_default.mem = 8gb
    • set queue longtime resources_default.neednodes = small
    • set queue longtime acl_group_enable = True
    • set queue longtime acl_groups = AG_Meyer
    • set queue longtime acl_groups += bimsb_itsupport
    • set queue longtime acl_group_sloppy = True
    • set queue longtime disallowed_types = interactive
    • set queue longtime enabled = True
    • set queue longtime started = True
  • Create and define queue longbigmem
    • set queue longbigmem queue_type = Execution
    • set queue longbigmem resources_max.mem = 1530gb
    • set queue longbigmem resources_max.nodect = 2
    • set queue longbigmem resources_max.walltime = 9999:00:00
    • set queue longbigmem resources_default.nodect = 1
    • set queue longbigmem resources_default.walltime = 336:00:00
    • set queue longbigmem resources_default.ncpus = 1
    • set queue longbigmem resources_default.mem = 16gb
    • set queue longbigmem resources_default.neednodes = big
    • set queue longbigmem acl_group_enable = True
    • set queue longbigmem acl_groups = AG_Meyer
    • set queue longbigmem acl_groups += bimsb_itsupport
    • set queue longbigmem acl_group_sloppy = True
    • set queue longbigmem disallowed_types = interactive
    • set queue longbigmem enabled = True
    • set queue longbigmem started = True
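    In practice, you pick the queue whose walltime and memory ceiling fits your job; the requested values below are purely illustrative:

    # default: up to 382gb memory, 24h walltime
    qsub -q default -l mem=32gb,walltime=12:00:00 my_script.sh
    # bigmem: up to 1530gb memory, 96h walltime
    qsub -q bigmem -l mem=800gb,walltime=48:00:00 my_script.sh
    # longtime / longbigmem: for jobs that need to run longer than that
    qsub -q longtime -l mem=32gb,walltime=200:00:00 my_script.sh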
6 What are my jobs doing ?

    You can use qstat to check the status of your jobs. You can also use the -m flag of qsub to get an email when your job is finished. By default, only active jobs are shown, i.e. those that are still running. Check the manual page of qstat for how to get more detailed information on your jobs. Some useful flags are -q to see the global status of each queue, -f for details of each job, and -n for node information of each job.

    If you realize you want to stop one of your jobs, you should first get the job's ID using qstat. You can then kill the job using qdel job_id. Alternatively, qdel all will kill all of your own currently running jobs. Have a look at the manual page of qdel for more details.
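    A typical monitoring session might look like this (the job id is made up):

    qstat -u $USER     # list your own jobs and their states
    qstat -f 12345     # full details of job 12345
    qstat -n 12345     # node(s) allocated to job 12345
    qdel 12345         # kill job 12345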

    7 Shared software: GUIX
  • A shared Guix installation is available and is managed on the meyerc-guix server. While personal profiles can be managed from any node, the shared profiles can be adjusted only from the meyerc-guix server and only by members of compara/meyerc_admins.
  • A shared profile can be created as shown in the sketch at the end of this section.
  • The custom guix-bimsb repository is not enabled by default, but you can use it by running git clone https://github.com/BIMSBbioinfo/guix-bimsb.git inside the folder /gnu/custom_repos/ and then setting the GUIX_PACKAGE_PATH=/gnu/custom_repos/.../ variable (see http://guix.mdc-berlin.de/documentation.html#sec-7-2).
  • If a piece of software is not available via Guix, or it is a precompiled binary, you can install it (drop it) in the shared location /usr/loca/shared/; again, this can be done only from the meyerc-guix server and only by members of meyerc_admins and the compara admins.
  • /usr/loca/shared/bin/ is already available in default $PATH for all users on all computing nodes
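    As a sketch (the profile path and package names below are only examples, not an agreed convention), a shared profile could be created on meyerc-guix and then activated from a job script like this:

    # on meyerc-guix, as a member of meyerc_admins: install packages into a shared profile
    guix package -p /gnu/shared_profiles/genomics -i samtools bwa

    # on a compute node or in a job script: activate that profile
    GUIX_PROFILE=/gnu/shared_profiles/genomics
    . "$GUIX_PROFILE/etc/profile"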
8 Houston, I think we have a problem ...

    Things can go wrong for a number of reasons. Here is more information on whom to contact when.

  • If you have problems making sense of any of the above, read the corresponding manual pages or the HPC User Guide provided by the HPC team.
  • Please note that we rely on all users of compara not to abuse the system. If you create a mess which could easily have been avoided by using test jobs and an appropriate combination of qsub flags, you risk being removed from the list of compara users.
  • If you have suggestions on how to improve the performance of compara in general, or you encounter some other issue, please email Dan.Munteanu@mdc-berlin.de, cc'ing Irmtraud.
Managed by the Meyer Lab at BIMSB.

    Updated on 29/10/2020