Mount hook
The source code of Sarus includes a hook able to perform an arbitrary sequence of bind mounts and device mounts (including device whitelisting in the related cgroup) into a container.
When activated, the hook enters the mount namespace of the container and performs the mounts it received as CLI arguments. The formats for such arguments are the same as those for the --mount and --device options of sarus run. This design choice has several advantages:
Reuses established formats adopted by popular container tools.
Explicitly states mount details and provides increased clarity compared to lists separated by colons or semicolons (for example, path lists used by other hooks).
Reduces the effort to go from experimentation to consolidation: the mounts for a feature can be explored and prototyped on the Sarus command line, then transferred directly into a Mount hook configuration file.
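As an illustration of the argument format (a hypothetical parser for clarity, not the hook's actual code), a --mount value is a comma-separated list of key=value pairs, with standalone flags such as readonly:

```python
def parse_mount_arg(arg):
    """Parse a --mount value such as
    'type=bind,src=/a,dst=/b,readonly' into a dict.
    Standalone flags (fields without '=') are stored as True."""
    mount = {}
    for field in arg.split(","):
        if "=" in field:
            key, value = field.split("=", 1)
            mount[key] = value
        else:
            mount[field] = True  # flag field, e.g. 'readonly'
    return mount

print(parse_mount_arg("type=bind,src=/etc/configs,dst=/etc/host_configs,readonly"))
```

Because each mount is spelled out as explicit key=value pairs, a configuration prototyped with sarus run can be copied verbatim into the hook's arguments.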
In effect, the hook produces the same outcome as entering its --mount and --device option arguments on the command line of Sarus (or of other engines with a similar CLI, like Docker and Podman).
However, the hook provides a way to group together the definition and execution of mounts related to a specific feature. By doing so, the feature's complexity is abstracted from the user, and feature activation becomes either convenient (e.g. via a single CLI option) or completely transparent (e.g. if the hook is always active, or if it relies on an OCI annotation from the container image). Some example use cases are described in this section.
Note
Compared to the MPI or Glibc hooks, the Mount hook does not check ABI or version compatibility of mounted resources, and it does not deduce on its own the mount destination paths within the container, since its purpose is not strictly tied to replacing library stacks.
Hook installation
The hook is written in C++ and is compiled when building Sarus, without the need for additional dependencies. Sarus' installation scripts also automatically install the hook in the $CMAKE_INSTALL_PREFIX/bin directory.
In short, no specific action is required to install the Mount hook.
Hook configuration
The program is meant to be run as a createContainer hook and accepts option arguments with the same formats as the --mount or --device options of sarus run.
The hook also supports the following environment variables:
LDCONFIG_PATH
(optional): Absolute path to a trusted ldconfig program on the host. If set, the program at this path is used to update the container's dynamic linker cache after performing the mounts.
The following is an example of an OCI hook JSON configuration file enabling the Mount hook:
{
"version": "1.0.0",
"hook": {
"path": "/opt/sarus/bin/mount_hook",
"args": ["mount_hook",
"--mount=type=bind,src=/usr/lib/libexample.so.1,dst=/usr/local/lib/libexample.so.1",
"--mount=type=bind,src=/etc/configs,dst=/etc/host_configs,readonly",
"--device=/dev/example:rw"
],
"env": [
"LDCONFIG_PATH=/sbin/ldconfig"
]
},
"when": {
"always": true
},
"stages": ["createContainer"]
}
Example use cases
Libfabric provider injection
Libfabric is a communication framework which can be used as a middleware to abstract network hardware from an MPI implementation. Access to different fabrics is enabled through dedicated software components, which are called libfabric providers.
Fabric provider injection [1] consists of bind mounting a dynamically-linked provider and its dependencies into a container, so that containerized applications can access a high-performance fabric which is not supported in the original container image. For a formal introduction, evaluation, and discussion of the advantages of this approach, please refer to the reference publication.
To facilitate the implementation of fabric provider injection, the Mount hook supports the <FI_PROVIDER_PATH> wildcard (angle brackets included) in --mount arguments. FI_PROVIDER_PATH is an environment variable recognized by libfabric itself, which can be used to control the path where libfabric searches for external, dynamically-linked providers.
The wildcard is recognized by the hook during the acquisition of CLI arguments, and is substituted with a path determined through the following conditions:
1. If the FI_PROVIDER_PATH environment variable exists within the container, its value is taken.
2. If FI_PROVIDER_PATH is unset or empty in the container's environment, and the LDCONFIG_PATH variable is configured for the hook, then the hook searches for a libfabric library in the container's dynamic linker cache and obtains its installation path. The wildcard value is then set to "libfabric library install path"/libfabric, which is the default search path used by libfabric. For example, if libfabric is located at /usr/lib64/libfabric.so.1, the wildcard value will be /usr/lib64/libfabric.
3. If it is not possible to determine a value with the previous methods, the wildcard value is set to /usr/lib.
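The resolution order above can be sketched as follows (a simplified illustration; the real hook obtains the libfabric path by parsing the container's linker cache through the configured ldconfig program):

```python
import os

def resolve_fi_provider_path(container_env, libfabric_path_from_cache=None):
    """Resolve the <FI_PROVIDER_PATH> wildcard.
    container_env: the container's environment variables (dict).
    libfabric_path_from_cache: path of libfabric as found in the
    container's dynamic linker cache, or None if unavailable
    (e.g. when LDCONFIG_PATH is not configured for the hook)."""
    # 1. Use FI_PROVIDER_PATH from the container, if set and non-empty.
    value = container_env.get("FI_PROVIDER_PATH")
    if value:
        return value
    # 2. Otherwise, derive libfabric's default external-provider search
    #    path from the linker cache: "<libfabric install dir>/libfabric".
    if libfabric_path_from_cache:
        return os.path.join(os.path.dirname(libfabric_path_from_cache),
                            "libfabric")
    # 3. Fall back to /usr/lib.
    return "/usr/lib"
```

For example, with an empty container environment and libfabric found at /usr/lib64/libfabric.so.1, the wildcard resolves to /usr/lib64/libfabric.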
The following is an example of hook configuration file using the wildcard to perform the injection of the GNI provider, enabling access to the Cray Aries high-speed interconnect on a Cray XC50 supercomputer:
{
"version": "1.0.0",
"hook": {
"path": "/opt/sarus/default/bin/mount_hook",
"args": ["mount_hook",
"--mount=type=bind,src=/usr/local/libfabric/1.18.0/lib/libfabric/libgnix-fi.so,dst=<FI_PROVIDER_PATH>/libgnix-fi.so",
"--mount=type=bind,src=/opt/cray/xpmem/default/lib64/libxpmem.so.0,dst=/usr/lib/libxpmem.so.0",
"--mount=type=bind,src=/opt/cray/ugni/default/lib64/libugni.so.0,dst=/usr/lib64/libugni.so.0",
"--mount=type=bind,src=/opt/cray/udreg/default/lib64/libudreg.so.0,dst=/usr/lib64/libudreg.so.0",
"--mount=type=bind,src=/opt/cray/alps/default/lib64/libalpsutil.so.0,dst=/usr/lib64/libalpsutil.so.0",
"--mount=type=bind,src=/opt/cray/alps/default/lib64/libalpslli.so.0,dst=/usr/lib64/libalpslli.so.0",
"--mount=type=bind,src=/opt/cray/wlm_detect/default/lib64/libwlm_detect.so.0,dst=/usr/lib64/libwlm_detect.so.0",
"--mount=type=bind,src=/var/opt/cray/alps,dst=/var/opt/cray/alps",
"--mount=type=bind,src=/etc/opt/cray/wlm_detect,dst=/etc/opt/cray/wlm_detect",
"--mount=type=bind,src=/opt/gcc/10.3.0/snos/lib64/libatomic.so.1,dst=/usr/lib/libatomic.so.1",
"--device=/dev/kgni0",
"--device=/dev/kdreg",
"--device=/dev/xpmem"
],
"env": [
"LDCONFIG_PATH=/sbin/ldconfig"
]
},
"when": {
"annotations": {
"^com.hooks.mpi.enabled$": "^true$",
"^com.hooks.mpi.type$": "^libfabric$"
}
},
"stages": ["createContainer"]
}
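With the configuration above, the feature is activated through the annotations matched by the hook's when clause. For instance (illustrative command line; image and command are placeholders):

```
$ sarus run --annotation=com.hooks.mpi.enabled=true \
      --annotation=com.hooks.mpi.type=libfabric \
      <image> <command>
```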
Accessing a host Slurm WLM from inside a container
The Slurm workload manager from the host system can be exposed within containers through a set of bind mounts. Doing so enables containers to submit new allocations and jobs to the cluster, opening up the possibility for more elaborate workflows.
The key components to bind mount are the binaries for Slurm commands, the host Slurm configuration, the MUNGE socket, and any related dependencies. The following is an example of a hook configuration file enabling access to the host Slurm WLM on a Cray XC50 system at CSCS:
{
"version": "1.0.0",
"hook": {
"path": "/opt/sarus/default/bin/mount_hook",
"args": ["mount_hook",
"--mount=type=bind,src=/usr/bin/salloc,dst=/usr/bin/salloc",
"--mount=type=bind,src=/usr/bin/sbatch,dst=/usr/bin/sbatch",
"--mount=type=bind,src=/usr/bin/sinfo,dst=/usr/bin/sinfo",
"--mount=type=bind,src=/usr/bin/squeue,dst=/usr/bin/squeue",
"--mount=type=bind,src=/usr/bin/srun,dst=/usr/bin/srun",
"--mount=type=bind,src=/etc/slurm,dst=/etc/slurm",
"--mount=type=bind,src=/usr/lib64/slurm,dst=/usr/lib64/slurm",
"--mount=type=bind,src=/var/run/munge,destination=/run/munge",
"--mount=type=bind,src=/usr/lib64/libmunge.so.2,dst=/usr/lib64/libmunge.so.2",
"--mount=type=bind,src=/opt/cray/alpscomm/default/lib64/libalpscomm_sn.so.0,dst=/usr/lib64/libalpscomm_sn.so.0",
"--mount=type=bind,src=/opt/cray/alpscomm/default/lib64/libalpscomm_cn.so.0,dst=/usr/lib64/libalpscomm_cn.so.0",
"--mount=type=bind,src=/opt/cray/swrap/default/lib64/libswrap.so.0,dst=/usr/lib64/libswrap.so.0",
"--mount=type=bind,src=/opt/cray/socketauth/default/lib64/libsocketauth.so.0,dst=/usr/lib64/libsocketauth.so.0",
"--mount=type=bind,src=/opt/cray/comm_msg/default/lib64/libcomm_msg.so.0,dst=/usr/lib64/libcomm_msg.so.0",
"--mount=type=bind,src=/opt/cray/sysadm/default/lib64/libsysadm.so.0,dst=/usr/lib64/libsysadm.so.0",
"--mount=type=bind,src=/opt/cray/codbc/default/lib64/libcodbc.so.0,dst=/usr/lib64/libcodbc.so.0",
"--mount=type=bind,src=/opt/cray/nodeservices/default/lib64/libnodeservices.so.0,dst=/usr/lib64/libnodeservices.so.0",
"--mount=type=bind,src=/opt/cray/sysutils/default/lib64/libsysutils.so.0,dst=/usr/lib64/libsysutils.so.0",
"--mount=type=bind,src=/opt/cray/pe/atp/libAtpDispatch.so,dst=/opt/cray/pe/atp/libAtpDispatch.so",
"--mount=type=bind,src=/opt/cray/pe/atp/3.14.5/slurm/libAtpSLaunch.so,dst=/opt/cray/pe/atp/3.14.5/slurm/libAtpSLaunch.so",
"--mount=type=bind,src=/usr/lib64/libxmlrpc-epi.so.0,dst=/usr/lib64/libxmlrpc-epi.so.0",
"--mount=type=bind,src=/usr/lib64/libodbc.so.2,dst=/usr/lib64/libodbc.so.2",
"--mount=type=bind,src=/usr/lib64/libexpat.so.1,dst=/usr/lib64/libexpat.so.1",
"--mount=type=bind,src=/usr/lib64/libltdl.so.7,dst=/usr/lib64/libltdl.so.7",
"--mount=type=bind,src=/opt/cray/job/default/lib64/libjob.so.0,dst=/usr/lib64/libjob.so.0",
"--mount=type=bind,src=/opt/cray/job/default/lib64/libjobctl.so.0,dst=/usr/lib64/libjobctl.so.0",
"--mount=type=bind,src=/opt/cray/ugni/default/lib64/libugni.so.0,dst=/usr/lib64/libugni.so.0",
"--mount=type=bind,src=/usr/lib64/libjansson.so.4,dst=/usr/lib64/libjansson.so.4",
"--mount=type=bind,src=/opt/cscs/jobreport/jobreport.so,dst=/opt/cscs/jobreport/jobreport.so",
"--mount=type=bind,src=/opt/cscs/nohome/nohome.so,dst=/opt/cscs/nohome/nohome.so",
"--mount=type=bind,src=/usr/lib64/libslurm.so.36,dst=/usr/lib64/libslurm.so.36",
"--mount=type=bind,src=/usr/lib64/libcurl.so.4,dst=/usr/lib64/libcurl.so.4",
"--mount=type=bind,src=/usr/lib64/libnghttp2.so.14,dst=/usr/lib64/libnghttp2.so.14",
"--mount=type=bind,src=/usr/lib64/libssh.so.4,dst=/usr/lib64/libssh.so.4",
"--mount=type=bind,src=/usr/lib64/libpsl.so.5,dst=/usr/lib64/libpsl.so.5",
"--mount=type=bind,src=/usr/lib64/libssl.so.1.1,dst=/usr/lib64/libssl.so.1.1",
"--mount=type=bind,src=/usr/lib64/libcrypto.so.1.1,dst=/usr/lib64/libcrypto.so.1.1",
"--mount=type=bind,src=/usr/lib64/libldap_r-2.4.so.2,dst=/usr/lib64/libldap_r-2.4.so.2",
"--mount=type=bind,src=/usr/lib64/liblber-2.4.so.2,dst=/usr/lib64/liblber-2.4.so.2",
"--mount=type=bind,src=/usr/lib64/libsasl2.so.3,dst=/usr/lib64/libsasl2.so.3",
"--mount=type=bind,src=/usr/lib64/libyaml-0.so.2,dst=/usr/lib64/libyaml-0.so.2"
],
"env": [
"LDCONFIG_PATH=/sbin/ldconfig"
]
},
"when": {
"annotations": {
"^com.hooks.slurm.activate$": "^true$"
}
},
"stages": ["createContainer"]
}
The following is an example usage of the hook as configured above:
$ srun --pty sarus run --annotation=com.hooks.slurm.activate=true -t debian:11 bash
nid00040:/$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
nid00040:/$ srun --version
slurm 20.11.8
nid00040:/$ squeue -u <username>
JOBID USER ACCOUNT NAME ST REASON START_TIME TIME TIME_LEFT NODES CPUS
714067 <username> <account> sarus R None 12:48:41 0:40 59:20 1 24
nid00040:/$ salloc -N4 /bin/bash
salloc: Waiting for resource configuration
salloc: Nodes nid0000[0-3] are ready for job
nid00040:/$ srun -n4 hostname
nid00002
nid00003
nid00000
nid00001
# exit from inner Slurm allocation
nid00040:/$ exit
# exit from the container
nid00040:/$ exit
References