Native MPI hook (MPICH-based)

Sarus’s source code includes a hook that imports native MPICH-based MPI implementations into the container. This is useful when the host system provides a vendor-specific or high-performance MPI stack based on MPICH (e.g. Intel MPI, Cray MPI, MVAPICH) that is required to fully leverage a high-speed interconnect.

When activated, the hook enters the mount namespace of the container, searches for dynamically linkable MPI libraries, and replaces them with functional equivalents bind-mounted from the host system.

For the replacements to work seamlessly, the hook checks that the host and container MPI implementations are ABI-compatible according to the standards defined by the MPICH ABI Compatibility Initiative. The Initiative is supported by several MPICH-based implementations, including MVAPICH, Intel MPI, and Cray MPT. ABI compatibility and its implications are further discussed here.

Hook installation

The hook is written in C++ and is compiled when building Sarus, without requiring additional dependencies. Sarus’s installation scripts also automatically install the hook in the $CMAKE_INSTALL_PREFIX/bin directory. In short, no specific action is required to install the MPI hook.

Sarus configuration

The hook is meant to be run as a prestart hook and does not accept arguments; its actions are controlled through the following environment variables:

  • LDCONFIG_PATH: Absolute path to a trusted ldconfig program on the host.

  • MPI_LIBS: Colon-separated list of full paths to the host’s libraries that will replace the container’s libraries. The ABI compatibility check is performed by comparing the version numbers specified in the libraries’ file names as follows:

    • The major numbers (first from the left) must be equal.

    • The host’s minor number (second from the left) should be greater than or equal to the container’s minor number. If the container’s minor number is greater than the host’s, the hook prints a warning but proceeds, in an attempt to let the container application run.

    • If the host’s library name does not contain the version numbers or contains only the major version number, the missing numbers are assumed to be zero.

    This compatibility check follows the MPICH ABI version number scheme.

  • MPI_DEPENDENCY_LIBS: Colon-separated list of absolute paths to libraries that are dependencies of the MPI_LIBS. These libraries are always bind-mounted in the container under /usr/lib.

  • BIND_MOUNTS: Colon-separated list of absolute paths to generic files or directories that are required for the host MPI implementation to function correctly (e.g. specific device files). These resources will be bind-mounted inside the container at the same path they have on the host. If a path corresponds to a device file, that file will be whitelisted for read/write access in the container’s devices cgroup.
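
The version comparison described under MPI_LIBS can be illustrated with a short Python sketch. This is not the hook's actual implementation (the hook is written in C++); the names parse_abi_version and check_abi are hypothetical, and the sketch assumes that missing version numbers default to zero for container libraries as well, which the text above only states for host libraries. It also only labels a major-number mismatch as "incompatible" without modelling what the hook does in that case.

```python
import re
from typing import NamedTuple


class AbiVersion(NamedTuple):
    major: int
    minor: int


def parse_abi_version(library_path: str) -> AbiVersion:
    """Extract (major, minor) from a file name such as libmpi.so.12.0.5."""
    filename = library_path.rsplit("/", 1)[-1]
    match = re.search(r"\.so((?:\.\d+)*)$", filename)
    if match is None:
        raise ValueError(f"not a shared library name: {filename}")
    numbers = [int(n) for n in match.group(1).split(".") if n]
    # Missing version numbers are assumed to be zero.
    numbers += [0] * (2 - len(numbers))
    return AbiVersion(numbers[0], numbers[1])


def check_abi(host_lib: str, container_lib: str) -> str:
    """Hypothetical helper: classify a host/container library pair."""
    host = parse_abi_version(host_lib)
    container = parse_abi_version(container_lib)
    if host.major != container.major:
        return "incompatible"  # major numbers must be equal
    if host.minor < container.minor:
        return "warning"  # the hook warns but still proceeds
    return "compatible"


host = "/usr/lib64/mvapich2-2.2/lib/libmpi.so.12.0.5"
print(check_abi(host, "/usr/lib/libmpi.so.12.0.0"))  # compatible
print(check_abi(host, "/usr/lib/libmpi.so.12.1.0"))  # warning
print(check_abi("/opt/mpi/libmpi.so", "/usr/lib/libmpi.so.12"))  # incompatible
```

In the last call the host library name carries no version numbers, so it is treated as version 0.0 and its major number does not match the container's 12. The container paths are made up for the example.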

The following is an example OCI hook JSON configuration file that enables the MPI hook:

{
    "version": "1.0.0",
    "hook": {
        "path": "/opt/sarus/bin/mpi_hook",
        "env": [
            "LDCONFIG_PATH=/sbin/ldconfig",
            "MPI_LIBS=/usr/lib64/mvapich2-2.2/lib/libmpi.so.12.0.5:/usr/lib64/mvapich2-2.2/lib/libmpicxx.so.12.0.5:/usr/lib64/mvapich2-2.2/lib/libmpifort.so.12.0.5",
            "MPI_DEPENDENCY_LIBS=",
            "BIND_MOUNTS="
        ]
    },
    "when": {
        "annotations": {
            "^com.hooks.mpi.enabled$": "^true$"
        }
    },
    "stages": ["prestart"]
}
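
Because MPI_LIBS, MPI_DEPENDENCY_LIBS and BIND_MOUNTS are colon-separated lists, writing them by hand is error-prone when many libraries are involved. The following Python sketch shows one way such a configuration file could be generated; the helper names build_mpi_hook_config and _join_absolute are hypothetical and not part of Sarus, and the paths are the ones from the example above.

```python
import json
from typing import Iterable


def _join_absolute(paths: Iterable[str]) -> str:
    """Join paths with ':' after checking that each one is absolute."""
    paths = list(paths)
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute path, got: {path!r}")
    return ":".join(paths)


def build_mpi_hook_config(hook_path, ldconfig_path, mpi_libs,
                          dependency_libs=(), bind_mounts=()):
    """Hypothetical helper: assemble the hook configuration as a dict."""
    env = [
        f"LDCONFIG_PATH={ldconfig_path}",
        "MPI_LIBS=" + _join_absolute(mpi_libs),
        "MPI_DEPENDENCY_LIBS=" + _join_absolute(dependency_libs),
        "BIND_MOUNTS=" + _join_absolute(bind_mounts),
    ]
    return {
        "version": "1.0.0",
        "hook": {"path": hook_path, "env": env},
        "when": {"annotations": {"^com.hooks.mpi.enabled$": "^true$"}},
        "stages": ["prestart"],
    }


config = build_mpi_hook_config(
    hook_path="/opt/sarus/bin/mpi_hook",
    ldconfig_path="/sbin/ldconfig",
    mpi_libs=[
        f"/usr/lib64/mvapich2-2.2/lib/{name}.so.12.0.5"
        for name in ("libmpi", "libmpicxx", "libmpifort")
    ],
)
print(json.dumps(config, indent=4))
```

Empty lists produce empty values (e.g. "MPI_DEPENDENCY_LIBS="), matching the example configuration above.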

Sarus support at runtime

The annotation com.hooks.mpi.enabled=true is automatically generated by Sarus when the --mpi command-line option is passed to sarus run.