Native MPI hook (MPICH-based)
Sarus's source code includes a hook able to import native MPICH-based MPI implementations inside the container. This is useful in case the host system features a vendor-specific or high-performance MPI stack based on MPICH (e.g. Intel MPI, Cray MPI, MVAPICH) which is required to fully leverage a high-speed interconnect.
When activated, the hook will enter the mount namespace of the container, search for dynamically-linkable MPI libraries and replace them with functional equivalents bind-mounted from the host system.
For the replacements to work seamlessly, the hook checks that the host and container MPI implementations are ABI-compatible according to the standards defined by the MPICH ABI Compatibility Initiative. The Initiative is supported by several MPICH-based implementations, including MVAPICH, Intel MPI, and Cray MPT. ABI compatibility and its implications are further discussed here.
Hook installation
The hook is written in C++ and is compiled when building Sarus, without
requiring additional dependencies. The Sarus installation scripts also
automatically install the hook in the $CMAKE_INSTALL_PREFIX/bin directory.
In short, no specific action is required to install the MPI hook.
Hook configuration
The program is meant to be run as a createContainer hook and does not accept arguments. The following environment variables must be defined:
LDCONFIG_PATH
: Absolute path to a trusted ldconfig program on the host.
MPI_LIBS
: Colon-separated list of full paths to the host's libraries that will substitute the container's libraries. The ABI compatibility is checked by comparing the version numbers specified in the libraries' file names according to the specifications selected with the variable MPI_COMPATIBILITY_TYPE.
The following optional environment variables are also supported:
MPI_COMPATIBILITY_TYPE
: String determining the logic adopted to check the ABI compatibility of MPI libraries. If defined, must be one of major, full, strict. If unset or set to an unexpected value, defaults to major. The checks performed for compatibility in the different cases are as follows:
major
The major numbers (first from the left) must be present and equal.
This is equivalent to checking that the soname of the libraries is the same.
full
The major numbers (first from the left) must be present and equal.
The host's minor number (second from the left) must be present and should be greater than or equal to the container's minor number. If the container's minor number is greater than the host's (i.e. the container library is probably being replaced with an older revision), the hook prints a verbose log message but proceeds, in an attempt to let the container application run.
strict
The major numbers (first from the left) must be present and equal.
The host's minor number (second from the left) must be present and equal to the container's minor number.
This compatibility check is in agreement with the MPICH ABI version number schema.
MPI_DEPENDENCY_LIBS
: Colon-separated list of absolute paths to libraries that are dependencies of the MPI_LIBS. These libraries are always bind mounted in the container under /usr/lib.
BIND_MOUNTS
: Colon-separated list of absolute paths to generic files or directories that are required for the correct functionality of the host MPI implementation (e.g. specific device files). These resources will be bind mounted inside the container with the same path they have on the host. If a path corresponds to a device file, that file will be whitelisted for read/write access in the container's devices cgroup.
HOOK_ROOTLESS
: String indicating whether the hook is being run under a rootless container runtime. It determines some of the actions undertaken by the hook before performing its bind mounts, for example if identity switches are required to validate the mounts or to work with "root squashed" filesystems. By default, the hook operates in fully privileged mode, assuming "real root" capabilities. This is the way the hook is run under Sarus, and in such a case it is recommended to leave this environment variable unset. If this variable is set to True (case-insensitive), the hook assumes rootless execution. This setting is intended to enable using the hook under unprivileged tools like rootless Podman or Enroot.
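The three MPI_COMPATIBILITY_TYPE modes described above can be illustrated with a minimal Python sketch. This is not the hook's actual C++ code: the function names parse_abi_version and is_compatible are hypothetical, the version numbers are assumed to be the dot-separated integers following ".so" in the file name, and the sketch assumes that both minor numbers must be present in the full and strict modes.

```python
import logging
import re
from typing import Optional, Tuple

log = logging.getLogger("mpi_hook_sketch")  # hypothetical logger name


def parse_abi_version(lib_path: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (major, minor) from a file name such as libmpi.so.12.0.5."""
    match = re.search(r"\.so((?:\.\d+)*)$", lib_path)
    if match is None:
        return None, None
    numbers = [int(n) for n in match.group(1).split(".") if n]
    major = numbers[0] if len(numbers) > 0 else None
    minor = numbers[1] if len(numbers) > 1 else None
    return major, minor


def is_compatible(host_lib: str, container_lib: str, mode: str = "major") -> bool:
    """Decide whether host_lib may replace container_lib under the given mode."""
    if mode not in ("major", "full", "strict"):
        mode = "major"  # unset or unexpected values default to "major"

    host_major, host_minor = parse_abi_version(host_lib)
    cont_major, cont_minor = parse_abi_version(container_lib)

    # All modes: major numbers must be present and equal.
    if host_major is None or cont_major is None or host_major != cont_major:
        return False
    if mode == "major":
        return True

    # Assumption of this sketch: both minor numbers must be present.
    if host_minor is None or cont_minor is None:
        return False
    if mode == "strict":
        return host_minor == cont_minor

    # "full": an older host minor only triggers a log message, then proceeds.
    if host_minor < cont_minor:
        log.warning("host library %s is older than container library %s",
                    host_lib, container_lib)
    return True
```

For instance, with a host libmpi.so.12.0.5 and a container libmpi.so.12.1.0, the sketch accepts the replacement in major mode, accepts it with a warning in full mode, and rejects it in strict mode.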
The following is an example OCI hook JSON configuration file enabling the MPI hook:
{
    "version": "1.0.0",
    "hook": {
        "path": "/opt/sarus/bin/mpi_hook",
        "env": [
            "LDCONFIG_PATH=/sbin/ldconfig",
            "MPI_LIBS=/usr/lib64/mvapich2-2.2/lib/libmpi.so.12.0.5:/usr/lib64/mvapich2-2.2/lib/libmpicxx.so.12.0.5:/usr/lib64/mvapich2-2.2/lib/libmpifort.so.12.0.5",
            "MPI_DEPENDENCY_LIBS=",
            "BIND_MOUNTS="
        ]
    },
    "when": {
        "annotations": {
            "^com.hooks.mpi.enabled$": "^true$"
        }
    },
    "stages": ["createContainer"]
}
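Before deploying a configuration file like the one above, it can be checked against the requirements listed in this section. The following Python sketch is one possible way to do so. It is not part of Sarus: check_hook_env and REQUIRED_VARS are hypothetical names, and the sketch only verifies that the mandatory variables are defined and that every listed path is absolute.

```python
import json
import posixpath
from typing import List

REQUIRED_VARS = ("LDCONFIG_PATH", "MPI_LIBS")
PATH_VARS = ("LDCONFIG_PATH", "MPI_LIBS", "MPI_DEPENDENCY_LIBS", "BIND_MOUNTS")


def check_hook_env(config_text: str) -> List[str]:
    """Return a list of problems found in the hook's "env" entries."""
    config = json.loads(config_text)

    # Turn ["NAME=value", ...] into a {"NAME": "value"} dictionary.
    env = {}
    for entry in config["hook"].get("env", []):
        name, _, value = entry.partition("=")
        env[name] = value

    problems = [f"{name} is not defined"
                for name in REQUIRED_VARS if not env.get(name)]

    # Every colon-separated entry must be an absolute path.
    for name in PATH_VARS:
        for path in filter(None, env.get(name, "").split(":")):
            if not posixpath.isabs(path):
                problems.append(f"{name}: '{path}' is not an absolute path")
    return problems
```

An empty list means the sketch found nothing to report. It does not check that the paths exist on the host or that the libraries are ABI-compatible.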
Configuring to leverage Sarus annotations and CLI options
Sarus automatically generates the com.hooks.mpi.enabled=true OCI annotation
if the --mpi command line option is passed to sarus run.
This annotation can be used in the when conditions of the hook configuration
to tie the activation of the hook to the presence of the --mpi option:
"when": {
    "annotations": {
        "^com.hooks.mpi.enabled$": "^true$"
    }
}
Additionally, the --mpi-type option of sarus run automatically generates
both the com.hooks.mpi.enabled=true and com.hooks.mpi.type=<value>
annotations, where the value of com.hooks.mpi.type is the value passed to
the CLI option.
This makes it possible to configure multiple MPI hooks for different native
MPI stacks (for example differing in implementation, compiler, or underlying
communication framework) and choose one through the sarus run command line.
For example:
# With these conditions, a hook will be enabled by '--mpi-type=mpich3'
"when": {
    "annotations": {
        "^com.hooks.mpi.enabled$": "^true$",
        "^com.hooks.mpi.type$": "^mpich3$"
    }
}
# With these conditions, a hook will be enabled by '--mpi-type=mpich4-libfabric'
"when": {
    "annotations": {
        "^com.hooks.mpi.enabled$": "^true$",
        "^com.hooks.mpi.type$": "^mpich4-libfabric$"
    }
}
When multiple hooks are configured, the defaultMPIType parameter in sarus.json
can be used to define the default MPI hook for the system, allowing users to
enable it by passing just the --mpi option.
Note
The rules for the OCI hook configuration file state that a hook is enabled
only if all the when conditions match.
This means that if the administrator did not define defaultMPIType in the
Sarus configuration and the user did not provide --mpi-type=<value> in the CLI
arguments, the hooks configured with the mpi.type condition will NOT be enabled.
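The "all conditions must match" rule can be illustrated with a short Python sketch. It is a simplified model, not the matching code used by Sarus or the OCI runtime: hook_enabled is a hypothetical helper, and it assumes that each key/value pattern pair in the when annotations must match at least one annotation of the container, using regular expression search.

```python
import re
from typing import Dict


def hook_enabled(when_annotations: Dict[str, str],
                 container_annotations: Dict[str, str]) -> bool:
    """True only if every pattern pair matches some container annotation."""
    return all(
        any(re.search(key_pattern, key) and re.search(value_pattern, value)
            for key, value in container_annotations.items())
        for key_pattern, value_pattern in when_annotations.items()
    )


# Conditions of a hook tied to '--mpi-type=mpich3'.
mpich3_conditions = {
    "^com.hooks.mpi.enabled$": "^true$",
    "^com.hooks.mpi.type$": "^mpich3$",
}

# Annotations generated by '--mpi' alone, with no defaultMPIType configured.
only_mpi = {"com.hooks.mpi.enabled": "true"}

# Annotations generated by '--mpi-type=mpich3'.
with_type = {"com.hooks.mpi.enabled": "true", "com.hooks.mpi.type": "mpich3"}

print(hook_enabled(mpich3_conditions, only_mpi))   # False: type condition unmatched
print(hook_enabled(mpich3_conditions, with_type))  # True: both conditions match
```

Under this model, a hook configured only with the com.hooks.mpi.enabled condition is enabled in both cases, which is why hooks with identical conditions are activated together.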
Note
Multiple hooks can be enabled at the same time by configuring them with identical conditions.