HUB (Hands-on Understanding of Bioinformatics) Workshop, Series 1

High-performance computing for bioinformatics

The Bioinformatics Hub is pleased to present the first HUB Workshop Series, focused on practical Linux and high-performance computing (HPC) skills for bioinformatics research. This workshop is designed to help participants build foundational command-line proficiency and develop the core competencies needed to use ARC (Advanced Research Computing)—the HPC environment at the University of Calgary.

This hands-on session covers Linux essentials—navigation, permissions, and file management—tailored for shared clusters. Instead of memorizing commands, you'll build a mental model of system logic, giving you the confidence to navigate and troubleshoot any HPC environment.

HUB Workshop Series I

Workshop information

The workshop will be delivered as structured, hands-on training in Linux and HPC for bioinformatics research. Topics include navigation and file management, permission structures, terminal-based text editing, and essential command-line workflows. Participants will also be introduced to HPC fundamentals, including resource allocation and job scheduling, with practical use of Slurm for job submission and monitoring. The overall goal is to help participants operate confidently in a Linux-based HPC environment to support reproducible and scalable bioinformatics analyses at the University of Calgary.


Venue


Registration

  • Priority registration will be offered to Bioinformatics Hub Showcase attendees.
  • Registration is on a first-come, first-served basis.
  • Limited to 10 participants to maximize individual engagement and hands-on support.

Prerequisites

This workshop is designed for beginners; no prior experience with Linux is required. You do not need an ARC account, as we are unable to run the workshop on ARC itself. Instead, we will provide a dedicated demo server that simulates both administrative and user perspectives. This ensures that all participants explore Linux in a consistent environment, with shared folders configured specifically for the workshop.

Please bring your own laptop. The workshop will be demonstrated on macOS- and Linux-based systems. Participants using Windows are welcome and will be provided with instructions for connecting to the cluster.


Materials

All workshop materials, including practice scripts, step-by-step command-line exercises, and slides, will be shared electronically. The slides include guided hands-on exercises designed to reinforce core Linux skills and Slurm-based job management, and can be found below.

To ensure participants can focus on core concepts rather than the complexities of software installation, the Bioinformatics Hub provides a fully pre-configured workshop environment. We have engineered a dedicated demo cluster that mirrors institutional architecture, featuring a comprehensive suite of bioinformatics tools deployed via containerization. These tools are already integrated into the system environment, allowing for a seamless experience where software dependencies are managed automatically in the background.


Workshop schedule

09:00 - 09:30
Pre-workshop. Login support for the demo server

Optional pre-workshop support session to help participants connect to the demo server and confirm that their laptop is ready for the hands-on training.

09:30 - 10:30
Session 1. Linux Foundations for HPC
  • Introduces foundational Linux skills for working in a shared bioinformatics computing environment.
  • Covers filesystem navigation, paths, file and directory management, and safe command-line practices.
  • Emphasizes practical understanding and confidence over memorization.
Hands-on exercise
  • Connect to the demo server: ssh
  • Navigate directories using standard Linux commands: pwd, cd, ls
  • Create and organize a mock bioinformatics project workspace: mkdir, cp, mv, rm
  • Inspect file contents and practice basic file operations: cat, head, tail, more, less
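
The exercise sequence above can be sketched as the following command walk-through. The server address, username, and directory names (e.g. `rnaseq_project`) are illustrative placeholders, not the actual workshop setup:

```shell
# Connect to the demo server (hostname and username are placeholders):
# ssh student01@demo-server.example.org

# Confirm where you are, then look around
pwd
ls -l

# Create a mock bioinformatics project workspace
mkdir -p rnaseq_project/data rnaseq_project/scripts rnaseq_project/results

# Add a sample file, then copy, rename, and remove the copy
echo "sample_A,48,liver" > rnaseq_project/data/samples.csv
cp rnaseq_project/data/samples.csv rnaseq_project/data/samples_backup.csv
mv rnaseq_project/data/samples_backup.csv rnaseq_project/data/samples_v2.csv
rm rnaseq_project/data/samples_v2.csv

# Inspect file contents (for long files, use `less` to scroll)
cat rnaseq_project/data/samples.csv
head -n 5 rnaseq_project/data/samples.csv
tail -n 5 rnaseq_project/data/samples.csv
```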
10:45 - 12:00
Session 2. Understanding HPC Architecture & Permissions
  • Introduces the core structure of HPC environments, including login nodes, compute nodes, shared storage, and shared research directories.
  • Explains how permissions, ownership, and group structure support secure and collaborative work in shared systems.
  • Covers key data management practices, including project organization, README.txt documentation, shared-directory permissions, symbolic links, and backup principles.
Hands-on exercise
  • Explore a shared project directory with incorrect permissions and ownership settings: ls -l
  • Inspect user and group information for collaborative access in /etc/passwd and /etc/group
  • Correct permissions so collaborators can work safely and efficiently in the same environment: chmod, chgrp, chown
  • Create a simple README.txt and organize a project folder using the recommended shared-storage structure: vi, >, >>
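
A minimal sketch of the permissions workflow is shown below. The group name (`bioinfo`) and username (`alice`) are placeholders; `chgrp` and `chown` typically require group membership or administrator rights, so those lines are left commented:

```shell
# Inspect current permissions and ownership in a project directory
mkdir -p shared_project
ls -l shared_project

# User and group definitions live in read-only system files:
# less /etc/passwd
# less /etc/group

# Grant the owning group full access so collaborators can work here
chmod 770 shared_project
# chgrp bioinfo shared_project   # 'bioinfo' is a placeholder group name
# chown alice shared_project     # changing owners usually needs admin rights

# Document the project and build a simple shared-storage structure
mkdir -p shared_project/raw_data shared_project/results
echo "Project: demo RNA-seq analysis" >  shared_project/README.txt
echo "Contact: workshop participant"  >> shared_project/README.txt

# A symbolic link exposes a nested directory under a convenient name
ln -s shared_project/raw_data raw_data_link
ls -l
```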
12:00 - 13:00
Lunch Break

A complimentary lunch will be provided for all participants.

13:00 - 14:30
Session 3. Using HPC Effectively with Slurm
  • Introduces the fundamentals of job scheduling in HPC systems.
  • Covers resource requests, job submission, monitoring, and troubleshooting with Slurm.
  • Helps participants run bioinformatics analyses as scalable scheduled workflows.
Hands-on exercise
  • Examine a sample Slurm job script, including the #SBATCH header lines
  • Revise the script to improve or correct the submission workflow: vi
  • Submit and monitor a job on the demo cluster: sbatch, squeue, sacct
  • Inspect output and error logs: cat, less
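
The steps above can be sketched as follows. The job name, resource values, and filenames are illustrative, not the workshop's actual script; `sbatch`, `squeue`, and `sacct` only work on a machine running Slurm, so they are shown as comments:

```shell
# Write a minimal Slurm job script (resource values are illustrative)
cat > demo_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=demo_job       # name shown in squeue output
#SBATCH --output=demo_job_%j.out  # %j expands to the numeric job ID
#SBATCH --time=00:05:00           # wall-clock limit (HH:MM:SS)
#SBATCH --ntasks=1                # a single task
#SBATCH --mem=1G                  # memory request
echo "Running on host: $(hostname)"
echo "Job finished."
EOF

# On the demo cluster the script would be submitted and monitored with:
# sbatch demo_job.sh
# squeue -u $USER
# sacct -j <jobid>
# cat demo_job_<jobid>.out

# The #SBATCH lines are plain comments to bash, so the job body can be
# sanity-checked locally before submission:
bash demo_job.sh
```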
14:45 - 15:30
Capstone exercise

Create a structured project directory on ARC (if you have an account) or on the demo server, then run a Slurm job that calculates the factorials of 1 through 20 using R, Python, or a provided executable. Save the results to an output file, verify that the job completed successfully, and then copy the final output to a shared folder with the correct ownership and permissions for group use.
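
One possible solution sketch, using Python for the factorial step; the directory layout, script names, and the `bioinfo` group are placeholders, and the Slurm directives follow the same pattern practiced in Session 3:

```shell
# Build the project structure
mkdir -p capstone/scripts capstone/results

# Factorial step (R or a provided executable would work equally well)
cat > capstone/scripts/factorials.py <<'EOF'
import math
for n in range(1, 21):
    print(n, math.factorial(n))
EOF

# Slurm wrapper: on the cluster, submit with `sbatch capstone/scripts/run_factorials.sh`
cat > capstone/scripts/run_factorials.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=factorials
#SBATCH --output=factorials_%j.log
python3 capstone/scripts/factorials.py > capstone/results/factorials.txt
EOF

# Run the job body locally to verify, then check the last result
bash capstone/scripts/run_factorials.sh
tail -n 1 capstone/results/factorials.txt   # 20 2432902008176640000

# Copy the output to a shared folder with group-friendly permissions
# ('bioinfo' is a placeholder group; chgrp may require membership)
mkdir -p shared
cp capstone/results/factorials.txt shared/
chmod 664 shared/factorials.txt
# chgrp bioinfo shared/factorials.txt
```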

15:30 - 16:00
Q&A

Final discussion, questions, and workshop wrap-up.


Objectives

This workshop is designed to build both a conceptual mental model and practical proficiency in Linux-based high-performance computing. By the end of this workshop, participants will be able to:

  1. Navigate the Linux filesystem confidently using standard command-line tools.
  2. Manage data safely by creating, moving, renaming, and removing files and directories.
  3. Interpret and modify permissions and group ownership to support collaborative research.
  4. Understand the structural components of an HPC cluster, including login nodes, compute nodes, and shared storage.
  5. Explain the role of a job scheduler and how resources are allocated in a shared environment.
  6. Submit and monitor jobs using the Slurm workload manager on a dedicated workshop cluster.
  7. Understand the role and responsibilities of HPC administrators and know what support they can provide.

FAQ

Will the skills I learn on the demo server transfer to ARC?

While the workshop runs on a dedicated demo server, we have meticulously configured it to mirror the ARC environment at the University of Calgary. Our goal is to ensure a seamless transition of skills, from filesystem architecture to Slurm command syntax. By using this mirrored setup, we can explore administrative and user-level logic in a controlled space without the typical restrictions of a production cluster.

Why not run the workshop directly on ARC or TALC?

Please be aware that ARC administrators generally restrict the creation of multiple temporary IDs on ARC. For educational purposes, similar requests are typically redirected to the Teaching and Learning Cluster (TALC). If you encounter specific real-world configuration issues or unique errors during your research, we recommend consulting AI assistants such as Gemini or ChatGPT for rapid troubleshooting and script refinement.

Using our dedicated demo cluster allows us to explore "behind-the-scenes" administrative logic—such as how groups, setgid bits, and Slurm partitions are configured—which is typically hidden from users on TALC or ARC. This provides a deeper understanding of how the system serves you as a researcher.

Is the demo server connected to ARC?

No. The workshop demo server is a standalone environment designed specifically for training. However, the directory structures and job scripts you develop during the session will be fully portable. We will provide a way for you to export your files and work before the session ends so that you can replicate the same workflow later on ARC.

What software do I need to install before the workshop?

You only need a standard SSH client. macOS and Linux users can use the built-in Terminal. Windows users can use PowerShell or Command Prompt (standard in Windows 10/11). To ensure a smooth start for all participants, we will host a pre-workshop session from 9:00 AM to 9:30 AM. Please arrive early during this window so we can provide a quick connection guide and confirm that everyone is successfully logged into the demo server before the main program starts.

Can I use my own data during the hands-on exercises?

To ensure consistent performance for all participants, we ask that you use only the provided datasets during the hands-on exercises. Please note that our dedicated demo server is a Mac Mini; while it is meticulously configured to simulate HPC logic and mental models, it has limited computational capacity compared to the ARC production environment and is not intended for large-scale research analysis.

Why is the workshop limited to 10 participants?

This workshop has been developed and will be delivered by a single instructor from the Bioinformatics Hub. To ensure participants receive meaningful hands-on support, individualized troubleshooting, and opportunities for deeper discussion, we are intentionally keeping the group small.

In addition, the workshop relies on a dedicated demo server that has been carefully configured to mirror the ARC environment. Because we are unable to use ARC directly for training, the demo infrastructure has practical resource limits. Participants will connect through a wireless router attached to the demo server, and limiting the workshop to 10 participants helps ensure stable performance and a smooth, responsive experience throughout the session.

For groups of six or more, the Bioinformatics Hub is also available to offer this workshop as a separate group training session for a fee.

Will the workshop materials be publicly available?

Transparency in data analysis is a core value of the Bioinformatics Hub. For that reason, we make our workshop code and documentation openly available whenever possible.

You can find the slide deck in the Materials section. It includes the guidelines, instructions, and code used in the workshop. The instructions are optimized for the demo server used during the session, but most steps should also be reproducible on ARC. During the workshop, the instructor may also share additional practical tips and context.

Thank you very much for your interest and support.



Useful readings