Now that you are connected to the head node, familiarize yourself with the cluster structure by running the following set of commands.
SLURM from SchedMD is one of the batch schedulers that you can use in AWS ParallelCluster. For an overview of the SLURM commands, see the SLURM Quick Start User Guide.
`sinfo` shows both the instances we currently have running and those that are not running (think of this as a queue limit). Initially we’ll see all the nodes in state `idle~`, meaning no instances are running. When we submit a job we’ll see some instances go into state `alloc`, meaning they’re completely allocated, or `mix`, meaning some but not all of their cores are allocated. After a job completes, the instance stays around for a few minutes (the default cooldown is 10 minutes) in state `idle%`. This can be confusing, so we’ve summarized it in the table below:

| State | Description |
| --- | --- |
| `idle~` | Instance is not running but can launch when a job is submitted. |
| `idle%` | Instance is running and will shut down after `ScaledownIdletime` (default 10 minutes). |
| `mix` | Instance is partially allocated. |
| `alloc` | Instance is completely allocated. |
Run `sinfo` to see the partitions and node states, and `squeue` to see queued and running jobs:

```bash
sinfo
squeue
```
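On a freshly created cluster, with no jobs submitted yet, the output looks roughly like the following (the partition name, node names, and counts are illustrative and depend on your cluster configuration):

```bash
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up   infinite     10  idle~ compute-dy-c5n18xlarge-[1-10]
$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
```

`squeue` prints only its header here because nothing is queued or running yet.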
Environment Modules are a fairly standard tool in HPC used to dynamically change your environment variables (`PATH`, `LD_LIBRARY_PATH`, etc.).
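You can see this in action by comparing `PATH` before and after loading a module (a quick sketch; the `intelmpi` module name matches what ships on the cluster, but check `module av` if it differs on yours):

```bash
echo $PATH              # baseline PATH, no modules loaded
module load intelmpi    # prepends the Intel MPI directories to your environment
echo $PATH              # PATH now includes the MPI bin directory
which mpirun            # resolves to the module's install tree
module unload intelmpi  # restores the original environment
```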
The cluster comes with `intelmpi` and `openmpi` pre-installed. These MPI versions are compiled with support for EFA, the high-speed interconnect. List the available modules, load Intel MPI, and verify the version:

```bash
module av
module load intelmpi
mpirun -V
```
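If the module loaded correctly, `mpirun -V` prints an Intel MPI version banner along these lines (the exact version and build depend on the cluster image):

```bash
$ mpirun -V
Intel(R) MPI Library for Linux* OS, Version 2021.x Build ...
Copyright Intel Corporation.
```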
Next, take a look at the shared storage. `showmount -e localhost` lists the directories the head node exports over NFS, and `df -h` shows all mounted filesystems and their usage:

```bash
showmount -e localhost
df -h
```
In the `df -h` output you’ll see a line like:

```bash
172.31.21.202@tcp:/zm5lzbmv 1.1T 1.2G 1.1T 1% /shared
```

This is a 1.2 TB filesystem (reported as 1.1T because `df` uses binary units), mounted at `/shared`, that’s 1% used.
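Since we’re about to install software there, it’s worth a quick sanity check that `/shared` is writable from the head node (a throwaway test file; the filename is arbitrary):

```bash
touch /shared/.write-test && rm /shared/.write-test && echo "/shared is writable"
```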
In the next section we’ll install Spack on this shared filesystem!