Native MPI hook (MPICH-based)

Sarus's source code includes a hook able to import native MPICH-based MPI implementations into the container. This is useful when the host system features a vendor-specific or high-performance MPI stack based on MPICH (e.g. Intel MPI, Cray MPI, MVAPICH) which is required to fully leverage a high-speed interconnect.

When activated, the hook will enter the mount namespace of the container, search for dynamically-linkable MPI libraries and replace them with functional equivalents bind-mounted from the host system.
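The two core steps, joining the container's mount namespace and bind-mounting a host library over a container library, can be sketched in C++ with the Linux setns(2) and mount(2) system calls. This is a minimal sketch, not the hook's actual code. The helper names are hypothetical. The container's PID and rootfs directory are assumed to be known (OCI hooks receive the container state, including its PID, on standard input), and host paths are assumed to still be reachable from the container's mount namespace at prestart time. Calling the two system-call functions requires CAP_SYS_ADMIN.

```cpp
#include <fcntl.h>      // open, O_RDONLY, O_CLOEXEC
#include <sched.h>      // setns, CLONE_NEWNS (g++ defines _GNU_SOURCE by default)
#include <sys/mount.h>  // mount, MS_BIND
#include <sys/types.h>  // pid_t
#include <unistd.h>     // close
#include <stdexcept>
#include <string>

// Handle of a process's mount namespace, e.g. "/proc/1234/ns/mnt".
std::string mountNamespacePath(pid_t pid) {
    return "/proc/" + std::to_string(pid) + "/ns/mnt";
}

// Hypothetical helper: location of a container path under the rootfs directory.
// Assumes `rootfs` has no trailing slash and `containerPath` is absolute.
std::string rootfsTarget(const std::string& rootfs, const std::string& containerPath) {
    return rootfs + containerPath;
}

// Join the mount namespace of the container's process.
void enterMountNamespace(pid_t pid) {
    int fd = open(mountNamespacePath(pid).c_str(), O_RDONLY | O_CLOEXEC);
    if (fd < 0) throw std::runtime_error("cannot open mount namespace handle");
    int rc = setns(fd, CLONE_NEWNS);
    close(fd);
    if (rc != 0) throw std::runtime_error("setns(CLONE_NEWNS) failed");
}

// Replace an existing container library in place by bind-mounting the host file over it.
void replaceLibrary(const std::string& hostLib, const std::string& rootfs,
                    const std::string& containerLib) {
    const std::string target = rootfsTarget(rootfs, containerLib);
    if (mount(hostLib.c_str(), target.c_str(), nullptr, MS_BIND, nullptr) != 0)
        throw std::runtime_error("bind mount failed: " + target);
}
```

A bind mount over a single file requires the target file to already exist, which holds here because the hook replaces libraries that it found in the container.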

In order for the replacements to work seamlessly, the hook will check that the host and container MPI implementations are ABI-compatible according to the standards defined by the MPICH ABI Compatibility Initiative. The Initiative is supported by several MPICH-based implementations, including MVAPICH, Intel MPI, and Cray MPT. ABI compatibility and its implications are further discussed here.

Hook installation

The hook is written in C++ and is compiled together with Sarus, without requiring additional dependencies. Sarus' installation scripts also install the hook automatically in the $CMAKE_INSTALL_PREFIX/bin directory. In short, no specific action is required to install the MPI hook.

Hook configuration

The program is meant to run as a prestart hook and does not accept arguments; its actions are controlled through a few environment variables:

  • LDCONFIG_PATH: Absolute path to a trusted ldconfig program on the host.

  • MPI_LIBS: Colon-separated list of absolute paths to the host's libraries that will substitute the container's libraries. The ABI compatibility check is performed by comparing the version numbers specified in the libraries' file names as follows:

    • The major numbers (first from the left) must be equal.

    • The host's minor number (second from the left) must be greater than or equal to the container's minor number. If the container's minor number is greater than the host's, the hook will print a warning but will still proceed, in an attempt to let the container application run.

    • If the host's library name does not contain the version numbers or contains only the major version number, the missing numbers are assumed to be zero.

    This compatibility check is in agreement with the MPICH ABI version numbering scheme.

  • MPI_DEPENDENCY_LIBS: Colon-separated list of absolute paths to libraries that are dependencies of the MPI_LIBS. These libraries are always bind-mounted in the container under /usr/lib.

  • BIND_MOUNTS: Colon-separated list of absolute paths to generic files or directories that are required for the correct operation of the host MPI implementation (e.g. specific device files). These resources will be bind-mounted inside the container at the same path they have on the host. If a path corresponds to a device file, that file will be whitelisted for read/write access in the container's devices cgroup.
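The parsing of these colon-separated variables and the version rules for MPI_LIBS can be modeled as follows. This is a hypothetical C++ sketch, not the hook's implementation. The function names are illustrative, the version is read from the numbers following ".so" in the file name, and missing numbers default to zero for both libraries (a simplification of the rule stated for the host library). The warning case is returned as a separate result instead of being printed.

```cpp
#include <array>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated value such as MPI_LIBS. Empty fields are skipped,
// so an empty MPI_DEPENDENCY_LIBS= yields an empty list.
std::vector<std::string> splitColonList(const std::string& value) {
    std::vector<std::string> items;
    std::stringstream stream(value);
    std::string item;
    while (std::getline(stream, item, ':')) {
        if (!item.empty()) items.push_back(item);
    }
    return items;
}

// {major, minor, patch} taken from the numbers after ".so" in a file name,
// e.g. "libmpi.so.12.0.5" -> {12, 0, 5}. Missing numbers are assumed to be zero.
std::array<int, 3> parseSoVersion(const std::string& path) {
    std::array<int, 3> version{0, 0, 0};
    const auto pos = path.rfind(".so");
    if (pos == std::string::npos) return version;
    std::stringstream stream(path.substr(pos + 3));  // e.g. ".12.0.5" or ""
    if (stream.get() != '.') return version;         // no version suffix at all
    std::string field;
    for (std::size_t i = 0; i < version.size() && std::getline(stream, field, '.'); ++i) {
        if (field.empty() || !std::isdigit(static_cast<unsigned char>(field[0]))) break;
        version[i] = std::stoi(field);
    }
    return version;
}

enum class AbiCheck { Compatible, CompatibleWithWarning, Incompatible };

// Majors must be equal; a container minor newer than the host's only warns.
AbiCheck checkAbi(const std::string& hostLib, const std::string& containerLib) {
    const auto host = parseSoVersion(hostLib);
    const auto container = parseSoVersion(containerLib);
    if (host[0] != container[0]) return AbiCheck::Incompatible;
    if (host[1] < container[1]) return AbiCheck::CompatibleWithWarning;
    return AbiCheck::Compatible;
}
```

For example, a host libmpi.so.12.0.5 paired with a container libmpi.so.12.1.0 yields CompatibleWithWarning, while a host libmpi.so.40 paired with a container libmpi.so.12 is rejected.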

The following is an example of an OCI hook JSON configuration file enabling the MPI hook:

{
    "version": "1.0.0",
    "hook": {
        "path": "/opt/sarus/bin/mpi_hook",
        "env": [
            "LDCONFIG_PATH=/sbin/ldconfig",
            "MPI_LIBS=/usr/lib64/mvapich2-2.2/lib/libmpi.so.12.0.5:/usr/lib64/mvapich2-2.2/lib/libmpicxx.so.12.0.5:/usr/lib64/mvapich2-2.2/lib/libmpifort.so.12.0.5",
            "MPI_DEPENDENCY_LIBS=",
            "BIND_MOUNTS="
        ]
    },
    "when": {
        "annotations": {
            "^com.hooks.mpi.enabled$": "^true$"
        }
    },
    "stages": ["prestart"]
}

Configuring to leverage Sarus annotations and CLI options

Sarus automatically generates the com.hooks.mpi.enabled=true OCI annotation if the --mpi command line option is passed to sarus run. This annotation can be used in the hook configuration's when conditions to tie the activation of the hook to the presence of the --mpi option:

"when": {
    "annotations": {
        "^com.hooks.mpi.enabled$": "^true$"
    }
}

Additionally, the --mpi-type option of sarus run automatically generates both the com.hooks.mpi.enabled=true and com.hooks.mpi.type=<value> annotations, where the value of com.hooks.mpi.type is the value passed to the CLI option. This makes it possible to configure multiple MPI hooks for different native MPI stacks (for example, differing in implementation, compiler, or underlying communication framework) and to choose one through the sarus run command line. For example:

# With these conditions, a hook will be enabled by '--mpi-type=mpich3'
"when": {
    "annotations": {
        "^com.hooks.mpi.enabled$": "^true$",
        "^com.hooks.mpi.type$": "^mpich3$"
    }
}

# With these conditions, a hook will be enabled by '--mpi-type=mpich4-libfabric'
"when": {
    "annotations": {
        "^com.hooks.mpi.enabled$": "^true$",
        "^com.hooks.mpi.type$": "^mpich4-libfabric$"
    }
}

When multiple hooks are configured, the defaultMPIType parameter in sarus.json can be used to define the default MPI hook for the system and allow users to enable it just by using the --mpi option.
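The relationship between the CLI options, defaultMPIType, and the generated annotations can be summarized with a small C++ model. This is a sketch under stated assumptions. The function is hypothetical and is not Sarus's code. Giving an explicit --mpi-type precedence over defaultMPIType is also an assumption, not documented behavior.

```cpp
#include <map>
#include <optional>
#include <string>

using Annotations = std::map<std::string, std::string>;

// Hypothetical model: which MPI annotations a `sarus run` invocation would carry.
Annotations mpiAnnotations(bool mpiFlag,                                   // --mpi
                           const std::optional<std::string>& mpiType,      // --mpi-type=<value>
                           const std::optional<std::string>& defaultMPIType) {  // sarus.json
    Annotations annotations;
    if (!mpiFlag && !mpiType) return annotations;  // MPI support not requested
    annotations["com.hooks.mpi.enabled"] = "true";
    if (mpiType) {
        annotations["com.hooks.mpi.type"] = *mpiType;         // explicit choice (assumed to win)
    } else if (defaultMPIType) {
        annotations["com.hooks.mpi.type"] = *defaultMPIType;  // --mpi with a system default
    }
    return annotations;
}
```

In this model, passing only --mpi on a system without defaultMPIType produces com.hooks.mpi.enabled alone, with no com.hooks.mpi.type annotation.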

Note

The rules for the OCI hook configuration file state that a hook is enabled only if all the when conditions match.

This means that if the administrator did not define defaultMPIType in the Sarus configuration and the user did not provide --mpi-type=<value> in the CLI arguments, hooks configured with the com.hooks.mpi.type condition will NOT be enabled.
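The all-conditions-must-match rule can be illustrated with a simplified C++ model of the annotations condition, using POSIX extended regular expressions for keys and values. This is not the matching code of Sarus or of any container runtime. The function name is hypothetical, and only the annotations condition is modeled: every key/value pattern pair must match at least one of the container's annotations.

```cpp
#include <map>
#include <regex>
#include <string>

using Annotations = std::map<std::string, std::string>;

// Simplified model of "when": {"annotations": {...}}.
bool annotationsMatch(const Annotations& conditions, const Annotations& annotations) {
    for (const auto& [keyPattern, valuePattern] : conditions) {
        const std::regex keyRegex(keyPattern, std::regex::extended);
        const std::regex valueRegex(valuePattern, std::regex::extended);
        bool found = false;
        for (const auto& [key, value] : annotations) {
            if (std::regex_search(key, keyRegex) && std::regex_search(value, valueRegex)) {
                found = true;
                break;
            }
        }
        if (!found) return false;  // a single unmatched condition disables the hook
    }
    return true;
}
```

Under this model, a hook requiring ^mpich3$ for com.hooks.mpi.type does not match a container that carries only com.hooks.mpi.enabled=true. A hook with only the com.hooks.mpi.enabled condition and a hook that also requires mpich3 both match a container started with --mpi-type=mpich3.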

Note

Multiple hooks can be enabled at the same time by configuring them with identical conditions.