User guide
Basic end-to-end user workflow
Steps to do on your workstation:
Build the container image using the Docker tool and Dockerfiles.
Push the Docker image into Docker Hub registry (https://hub.docker.com).
Steps to do on the HPC system:
Pull the Docker image from the Docker Hub registry using Sarus.
Run the image at scale with Sarus.
An explanation of the different steps required to deploy a Docker image using Sarus follows.
1. Develop the Docker image
First, the user has to build a container image. This boils down to writing a Dockerfile that describes the container, executing the Docker command line tool to build the image, and then running the container to test it. Below you can find an overview of what the development process looks like. We provide a brief introduction to using Docker; for a detailed explanation, please refer to the Docker Get Started guide.
Let's start with a simple example. Consider the following Dockerfile where we install Python on a Debian Jessie base image:
FROM debian:jessie
RUN apt-get -y update && apt-get install -y python
Once the user has written the Dockerfile, the container image can be built:
$ docker build -t hello-python .
The previous step takes the content of the Dockerfile and builds our container image. The first entry, FROM debian:jessie, specifies a base Linux distribution image as a starting point for our container (in this case, Debian Jessie); Docker will fetch this image from its registry if it is not already available locally. The second entry, RUN apt-get -y update && apt-get install -y python, executes the command that follows the RUN instruction, updating the container with the resulting changes to its software environment: in this case, we install Python using the apt-get package manager.
Once the build step has finished, we can list the available container images using:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-python latest 2e57316387c6 3 minutes ago 224 MB
We can now spawn a container from the image we just built (tagged as hello-python), and run Python inside the container (python --version) to verify the version of the installed interpreter:
$ docker run --rm hello-python python --version
Python 2.7.9
One of the conveniences of building containers with Docker is that this process can be carried out solely on the user's workstation/laptop, enabling quick iterations of building, modifying, and testing of containerized applications that can then be easily deployed on a variety of systems, greatly improving user productivity.
Note
Reducing the size of the container image, besides saving disk space, also speeds up the process of importing it into Sarus later on. The easiest ways to limit image size are cleaning the package manager cache after installations and deleting source codes and other intermediate build artifacts when building software manually. For practical examples and general good advice on writing Dockerfiles, please refer to the official Best practices for writing Dockerfiles.
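As a minimal sketch of the first technique, the earlier Dockerfile could clean the apt package cache in the same RUN instruction that performs the installation, so the deleted files never persist in an image layer:
FROM debian:jessie
RUN apt-get -y update && apt-get install -y python \
    && rm -rf /var/lib/apt/lists/*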
A user can also access the container interactively through a shell, enabling quick testing of commands. This can be a useful step for evaluating commands before adding them to the Dockerfile:
$ docker run --rm -it hello-python bash
root@c5fc1954b19d:/# python --version
Python 2.7.9
root@c5fc1954b19d:/#
2. Push the Docker image to Docker Hub
Once the image has been built and tested, you can log in to Docker Hub (requires an account) and push it, so that it becomes available from the cloud:
$ docker login
$ docker push <user name>/<repo name>:<image tag>
Note that in order for the push to succeed, the image has to be correctly tagged with the same <user name>/<repo name>:<image tag> identifier you intend to push to. Images can be tagged at build time by supplying the -t option to docker build, or afterwards by using docker tag. In the case of our example:
$ docker tag hello-python <repo name>/hello-python:1.0
The image tag (the last part of the identifier, after the colon) is optional. If absent, Docker will set the tag to latest by default.
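For instance, tagging at build time and then pushing could look like this (reusing the <repo name> placeholder from above):
$ docker build -t <repo name>/hello-python:1.0 .
$ docker push <repo name>/hello-python:1.0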
3. Pull the Docker image from Docker Hub
Now the image is stored in the Docker Hub and you can pull it into the HPC system using the sarus pull command followed by the image identifier:
$ sarus pull <repo name>/hello-python:1.0
While performing the pull does not require specific privileges, it is generally advisable to run sarus pull on the system's compute nodes through the workload manager: compute nodes often have better hardware and, in some cases like Cray XC systems, large RAM filesystems, which greatly reduce the pull time and allow pulling larger images.
Should you run into problems because the pulled image doesn't fit in the default filesystem, you can specify an alternative temporary directory with the --temp-dir option.
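For example, a RAM-backed filesystem such as /dev/shm (assuming one is available on your system) can be used as the temporary directory:
$ sarus pull --temp-dir=/dev/shm <repo name>/hello-python:1.0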
You can use sarus images to list the images available on the system:
$ sarus images
REPOSITORY TAG IMAGE ID CREATED SIZE SERVER
<repo name>/hello-python 1.0 6bc9d2cd1831 2018-01-19T09:43:04 40.16MB docker.io
4. Run the image at scale with Sarus
Once the image is available to Sarus we can run it at scale using the workload manager. For example, if using SLURM:
$ srun -N 1 sarus run <repo name>/hello-python:1.0 python --version
Python 2.7.9
As with Docker, containers can also be used through a terminal, enabling quick testing of commands:
$ srun -N 1 --pty sarus run -t debian bash
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ exit
The --pty option to srun and the -t/--tty option to sarus run are needed to properly set up the pseudo-terminals and achieve a familiar user experience.
You can tell the previous example was run inside a container by querying the specifications of your host system. For example, the OS of Cray XC compute nodes is based on SLES and not Debian:
$ srun -N 1 cat /etc/os-release
NAME="SLES"
VERSION="12"
VERSION_ID="12"
PRETTY_NAME="SUSE Linux Enterprise Server 12"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:12"
Additional features
Pulling images from third-party registries
By default, Sarus tries to pull images from Docker Hub. To pull an image from a registry different from Docker Hub, enter the server address as part of the image reference. For example, to access the NVIDIA GPU Cloud:
$ srun -N 1 sarus pull --login nvcr.io/nvidia/k8s/cuda-sample:nbody
username: $oauthtoken
password:
...
To work with images not pulled from Docker Hub (including the removal detailed in a later section), you need to enter the image reference as displayed by the sarus images command in the first two columns:
$ sarus images
REPOSITORY TAG IMAGE ID CREATED SIZE SERVER
nvcr.io/nvidia/k8s/cuda-sample nbody 29e2298d9f71 2019-01-14T12:22:25 91.88MB nvcr.io
$ srun -N1 sarus run nvcr.io/nvidia/k8s/cuda-sample:nbody cat /usr/local/cuda/version.txt
CUDA Version 9.0.176
Pulling images from private repositories
Remote registries may host image repositories which are not public, but require credentials in order to access their content. To retrieve images from a private repository, the sarus pull command offers several options.
The --login option allows entering credentials through an interactive prompt after launching the command:
$ srun -N 1 sarus pull --login <privateRepo>/<image>:<tag>
username: user
password:
...
--login also supports piping credentials to stdin, provided that the user name and password are separated by a newline character:
$ srun -N 1 printf '<user>\n<password>' | sarus pull --login <privateRepo>/<image>:<tag>
The -u/--user option allows providing the user name as part of the command line, while the --password-stdin option reads the password from stdin. These two options complement each other naturally:
$ srun -N 1 cat passwordFile.txt | sarus pull --username <user> --password-stdin <privateRepo>/<image>:<tag>
It is also possible to combine the -u/--user option with --login (for example, to automatically populate the username field in the interactive prompt); on the other hand, the --password-stdin option cannot be used in conjunction with --login.
Managing the registry authentication file
Sarus internally relies on Skopeo to download images from remote registries. When using the --login option, Sarus passes credentials to Skopeo through a containers-auth.json(5) authentication file, which is generated using data only for the specific repository being accessed.
The authentication file is created within the ${XDG_RUNTIME_DIR}/sarus directory if the XDG_RUNTIME_DIR environment variable is defined and its value is the path to an existing directory.
Otherwise, the file is created in the Sarus local repository for the current user.
Note
The path to the local repository for a given user can be obtained from the localRepositoryBaseDir parameter in the Sarus configuration file, according to the pattern <localRepositoryBaseDir>/<user name>/.sarus.
The file is owned by the user who launched the sarus pull command, and is set to have owner-only read and write access to prevent exposing the registry credentials.
The authentication file generated by Sarus is intended to be specific to each pull invocation, and is automatically removed at the end of a successful pull process. If an image pull terminates abnormally, the file may be left lingering in the filesystem. In that case, it can be removed manually by the owner; otherwise, it will be overwritten and removed by the next successful sarus pull of a private image.
Pulling images by digest (immutable identifier)
Container images are usually pulled using a tag, which is an arbitrary label to differentiate images within the same repository. Image tags are mutable, and can potentially point to different images at different times, for example when an image is rebuilt.
Sarus supports the capability to pull images using a digest, an immutable identifier which uniquely and consistently points to a specific version of an image. Digests are useful to increase clarity and reproducibility in container workflows by allowing users to reference exact software stack versions.
As defined by the OCI Image Specification, digests take the form of a string following the <algorithm>:<encoded> pattern. The algorithm portion indicates the cryptographic algorithm used for the digest, while the encoded portion represents the result of the hash function.
To pull an image by digest, append the digest to the image name using @ as separator:
$ sarus pull debian@sha256:039f72a400b48c272c6348f0a3f749509b18e611901a21379abc7eb6edd53392
# image : docker.io/library/debian@sha256:039f72a400b48c272c6348f0a3f749509b18e611901a21379abc7eb6edd53392
# cache directory : "/home/<user>/.sarus/cache"
# temp directory : "/tmp"
# images directory : "/home/<user>/.sarus/images"
# image digest : sha256:039f72a400b48c272c6348f0a3f749509b18e611901a21379abc7eb6edd53392
Getting image source signatures
Copying blob 5492f66d2700 done
Copying config 3c3ca0ede6 done
Writing manifest to image destination
Storing signatures
> unpacking OCI image
> make squashfs image: "/home/<user>/.sarus/images/docker.io/library/debian/sha256-039f72a400b48c272c6348f0a3f749509b18e611901a21379abc7eb6edd53392.squashfs"
It is possible to combine tag and digest in the argument of the sarus pull command. In this case, Sarus proceeds to pull the image indicated by the digest and completely ignores the tag. This behavior is consistent with other container tools like Docker, Podman and Buildah:
$ sarus pull alpine:3.15.2@sha256:73c155696fe65b68696e6ea24088693546ac468b3e14542f23f0efbde289cc97
# image : docker.io/library/alpine:3.15.2@sha256:73c155696fe65b68696e6ea24088693546ac468b3e14542f23f0efbde289cc97
# cache directory : "/home/<user>/.sarus/cache"
# temp directory : "/tmp"
# images directory : "/home/<user>/.sarus/images"
# image digest : sha256:73c155696fe65b68696e6ea24088693546ac468b3e14542f23f0efbde289cc97
Getting image source signatures
Copying blob 3aa4d0bbde19 done
Copying config e367198082 done
Writing manifest to image destination
Storing signatures
> unpacking OCI image
> make squashfs image: "/home/<user>/.sarus/images/docker.io/library/alpine/sha256-73c155696fe65b68696e6ea24088693546ac468b3e14542f23f0efbde289cc97.squashfs"
Even if the tag is ignored by the pull, it can still serve as a visual aid for users writing or reading the command, helping them understand which image the digest points to.
Pulling images using a proxy
Sarus delegates communication with remote registries to Skopeo, which in turn uses the Golang standard library to implement http connections. The usage of a proxy by the Golang http package can be controlled through a set of environment variables.
The HTTP_PROXY and HTTPS_PROXY variables (or their lowercase versions) are used to indicate proxies for the respective protocols. For https requests, HTTPS_PROXY takes precedence over HTTP_PROXY.
To connect to a specific registry without going through the proxy, the NO_PROXY or no_proxy environment variables can be used. These variables should contain a comma-separated list of values for which a proxy connection is not used.
The value of each environment variable may be either a complete URL or a host[:port] string, in which case the http scheme is assumed. The supported schemes are http, https, and socks5.
For example:
http_proxy=socks5://127.0.0.1:3128
https_proxy=https://127.0.0.1:3128
no_proxy=localhost,127.0.0.1,*.docker.io
Pulling images from insecure registries
Images from remote registries are downloaded through Skopeo, which by default uses secure connections and the TLS protocol.
To enable image pulls from a registry unable to provide the required TLS certificates, the registry must be declared as "insecure" in a registry configuration file for container tools.
The main registry configuration file for a given user is located at ${HOME}/.config/containers/registries.conf; if that file does not exist, /etc/containers/registries.conf is used instead.
In addition to the main file, drop-in configuration files can be created in ${HOME}/.config/containers/registries.conf.d or in /etc/containers/registries.conf.d. Drop-in configuration files are loaded after the main configuration, and their values overwrite any previous setting. For more details about drop-in configuration directories, please refer to the containers-registries.conf.d(5) manpage.
The following example shows how to create a drop-in registry configuration file defining an insecure registry located at localhost:5000:
$ mkdir -p ${HOME}/.config/containers/registries.conf.d
$ cat <<EOF > ${HOME}/.config/containers/registries.conf.d/insecure-localhost.conf
> [[registry]]
> prefix = "localhost:5000"
> insecure = true
> location = "localhost:5000"
> EOF
Notice the insecure = true setting. For full details about the format and the available settings of a container registry configuration file, please refer to the containers-registries.conf(5) manpage.
Once an insecure registry has been entered in a configuration file, it is possible to pull images from it:
$ sarus pull localhost:5000/library/alpine
# image : localhost:5000/library/alpine:latest
# cache directory : "/home/docker/.sarus/cache"
# temp directory : "/tmp"
# images directory : "/home/docker/.sarus/images"
# image digest : sha256:e7d88de73db3d3fd9b2d63aa7f447a10fd0220b7cbf39803c803f2af9ba256b3
Getting image source signatures
Copying blob 59bf1c3509f3 done
Copying config d539cd357a done
Writing manifest to image destination
Storing signatures
> unpacking OCI image
> make squashfs image: "/home/docker/.sarus/images/localhost:5000/library/alpine/latest.squashfs"
$ sarus images
REPOSITORY TAG IMAGE ID CREATED SIZE SERVER
localhost:5000/library/alpine latest d539cd357acb 2022-03-15T17:24:08 2.61MB localhost:5000
Warning
Please note that the use of insecure registries represents a significant security risk; they should therefore only be used in exceptional cases, such as local testing.
Download cache
During image pulls, Sarus stores individual image components (like filesystem layers and OCI configuration files) downloaded from registries in a cache directory, so they can be reused by subsequent pull commands.
The location of the cache directory for the current user is displayed at the beginning of the sarus pull command output, alongside image properties and the other directories used by Sarus. The location of the cache can also be derived from the localRepositoryBaseDir parameter in the sarus.json configuration file, using the path format <localRepositoryBaseDir>/<username>/.sarus/cache.
The contents of the download cache can be deleted at any time to free up storage space.
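For example, assuming the default cache location shown in the sarus pull output earlier (adjust the path if your site configures localRepositoryBaseDir differently):
$ rm -rf "$HOME/.sarus/cache"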
Loading images from tar archives
If you do not have access to a remote Docker registry, or you are uncomfortable uploading your images to the cloud in order to pull them, Sarus can load images from tar archives generated by docker save.
First, save an image to a tar archive using Docker on your workstation:
$ docker save --output debian.tar debian:jessie
$ ls -sh
total 124M
124M debian.tar
Then, transfer the archive to the HPC system and use the sarus load command, followed by the archive filename and the reference you want to give to the Sarus image:
$ sarus images
REPOSITORY TAG IMAGE ID CREATED SIZE SERVER
$ srun sarus load ./debian.tar my_debian
> expand image layers ...
> extracting : /tmp/debian.tar/7e5c6402903b327fc62d1144f247c91c8e85c6f7b64903b8be289828285d502e/layer.tar
> make squashfs ...
> create metadata ...
# created: <user home>/.sarus/images/load/library/my_debian/latest.squashfs
# created: <user home>/.sarus/images/load/library/my_debian/latest.meta
$ sarus images
REPOSITORY TAG IMAGE ID CREATED SIZE SERVER
load/library/my_debian latest 2fe79f06fa6d 2018-01-31T15:08:56 47.04MB load
The image is now ready to use. Notice that the origin server for the image has been labeled load to indicate that the image was loaded from an archive. As with sarus pull, we recommend loading tar archives from compute nodes. Should you run out of space while unpacking the image, sarus load also accepts the --temp-dir option to specify an alternative unpacking directory.
As with images from third-party registries, to use or remove loaded images you need to enter the image reference as displayed in the first two columns of the sarus images output (repository[:tag]).
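For example, to run the image loaded above:
$ srun sarus run load/library/my_debian cat /etc/os-release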
Displaying image digests
Images pulled by digest do not have a tag associated with them. In order to run or remove such images, it is necessary to provide the full digest after the image name. The digests of the images available in the Sarus local repository can be displayed using the --digests option of the sarus images command:
$ sarus images --digests
REPOSITORY TAG DIGEST IMAGE ID CREATED SIZE SERVER
alpine latest sha256:73c155696fe65b68696e6ea24088693546ac468b3e14542f23f0efbde289cc97 e3671980822d 2022-03-25T13:17:13 2.61MB docker.io
fedora latest sha256:36af84ba69e21c9ef86a0424a090674c433b2b80c2462e57503886f1d823abe8 04d13a5c8de5 2022-03-25T13:17:57 50.03MB docker.io
ubuntu <none> sha256:dcc176d1ab45d154b767be03c703a35fe0df16cfb1cc7ea5dd3b6f9af99b6718 4f4768f23ea4 2022-03-25T13:21:40 26.41MB docker.io
Running images by digest
To run images pulled by digest, append the digest to the image name using @ as separator:
$ sarus images --digests
REPOSITORY TAG DIGEST IMAGE ID CREATED SIZE SERVER
alpine <none> sha256:73c155696fe65b68696e6ea24088693546ac468b3e14542f23f0efbde289cc97 e3671980822d 2022-03-25T14:28:45 2.61MB docker.io
$ sarus run alpine@sha256:73c155696fe65b68696e6ea24088693546ac468b3e14542f23f0efbde289cc97 cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.15.2
PRETTY_NAME="Alpine Linux v3.15"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"
As with the sarus pull command, if both tag and digest are specified the tag is ignored and the image is looked up using the digest:
$ sarus run alpine:1.0@sha256:73c155696fe65b68696e6ea24088693546ac468b3e14542f23f0efbde289cc97 cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.15.2
PRETTY_NAME="Alpine Linux v3.15"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"
Removing images
To remove an image from Sarus's local repository, use the sarus rmi command:
$ sarus images
REPOSITORY TAG IMAGE ID CREATED SIZE SERVER
library/debian latest 6bc9d2cd1831 2018-01-31T14:11:27 40.17MB docker.io
$ sarus rmi debian:latest
removed docker.io/library/debian/latest
$ sarus images
REPOSITORY TAG IMAGE ID CREATED SIZE SERVER
To remove images pulled from third-party registries or images loaded from local tar archives, you need to enter the image reference as displayed by the sarus images command:
$ sarus images
REPOSITORY TAG IMAGE ID CREATED SIZE SERVER
load/library/my_debian latest 2fe79f06fa6d 2018-01-31T15:08:56 47.04MB load
$ sarus rmi load/library/my_debian
removed load/library/my_debian/latest
To remove images pulled by digest, append the digest to the image name using @ as separator:
$ sarus rmi ubuntu@sha256:dcc176d1ab45d154b767be03c703a35fe0df16cfb1cc7ea5dd3b6f9af99b6718
removed image docker.io/library/ubuntu@sha256:dcc176d1ab45d154b767be03c703a35fe0df16cfb1cc7ea5dd3b6f9af99b6718
Naming the container
The --name option of the sarus run command line can be used to assign a custom name to the container. If the option is not specified, Sarus assigns a name of the form sarus-container-<random string>.
$ sarus run --name=my-container <other options> <image>
Killing a container
A running container can be killed, i.e. stopped and deleted, using the sarus kill command, for example:
$ sarus kill my-container
Listing running containers
Users can list their currently running containers with the sarus ps command. Containers started by other users are not shown.
$ sarus run --name my-container -t ubuntu:22.04
...
$ sarus ps
ID PID STATUS BUNDLE CREATED OWNER
my-container 651945 running /opt/sarus/default/var/OCIBundleDir 2024-02-19T12:57:26.053166138Z root
Environment
Environment variables within containers are set by combining several sources, in the following order (later entries override earlier entries):
Host environment of the process calling Sarus
Environment variables defined in the container image, e.g., Docker ENV-defined variables
Modification of variables related to the NVIDIA Container Toolkit
Modifications (set/prepend/append/unset) specified by the system administrator in the Sarus configuration file. See here for details.
Environment variables defined using the -e/--env option of sarus run. The option can be passed multiple times, defining one variable per option. The first occurring = (equals sign) character in the option value is treated as the separator between the variable name and its value:
$ srun sarus run -e SARUS_CONTAINER=true debian bash -c 'echo $SARUS_CONTAINER'
true
$ srun sarus run --env=CLI_VAR=cli_value debian bash -c 'echo $CLI_VAR'
cli_value
$ srun sarus run --env NESTED=innerName=innerValue debian bash -c 'echo $NESTED'
innerName=innerValue
If no = is provided in the option value, Sarus treats the string as the variable name and takes the value from the corresponding variable in the host environment. This can be used to override a variable set in the image with the value from the host. If no = is provided and a matching variable is not found in the host environment, the option is ignored and the variable is not set in the container.
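A minimal sketch of this host-forwarding behavior (HOST_VAR is a hypothetical variable name):
$ export HOST_VAR=value_from_host
$ srun sarus run --env HOST_VAR debian bash -c 'echo $HOST_VAR'
value_from_host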
Accessing host directories from the container
System administrators can configure Sarus to automatically mount facilities like parallel filesystems into every container. Refer to your site documentation or system administrator to know which resources have been enabled on a specific system.
Mounting custom files and directories into the container
By default, Sarus creates the container filesystem environment from the image and the host system as specified by the system administrator. It is possible to request additional paths from the host environment to be mapped to some other path within the container using the --mount option of the sarus run command:
$ srun -N 1 --pty sarus run --mount=type=bind,source=/path/to/my/data,destination=/data -t debian bash
The previous command would cause /path/to/my/data on the host to be mounted as /data within the container. The --mount option can be specified multiple times, once for each mount to be performed.
--mount accepts a comma-separated list of <key>=<value> pairs as its argument, much like the Docker option of the same name (for reference, see the official Docker documentation on bind mounts). As with Docker, the order of the keys is not significant. A detailed breakdown of the possible flags follows in the next subsections.
Mandatory flags
type: represents the type of the mount. Currently, only bind (for bind mounts) is supported.
source: absolute path, accessible by the user on the host, that will be mounted in the container. Can alternatively be specified as src.
destination: absolute path where the filesystem will be made available inside the container. If the directory does not exist, it will be created. It is possible to overwrite other bind mounts already present in the container; however, the system administrator retains the power to disallow user-requested mounts to any location at their discretion. Can alternatively be specified as dst or target.
Bind mounts
In addition to the mandatory flags, regular bind mounts can optionally add the following flag:
readonly (optional): causes the filesystem to be mounted as read-only. This flag takes no value.
The following example demonstrates the use of a custom read-only bind mount.
$ ls -l /input_data
drwxr-xr-x. 2 root root 57 Feb 7 10:49 ./
drwxr-xr-x. 23 root root 4096 Feb 7 10:49 ../
-rw-r--r--. 1 root root 1048576 Feb 7 10:49 data1.csv
-rw-r--r--. 1 root root 1048576 Feb 7 10:49 data2.csv
-rw-r--r--. 1 root root 1048576 Feb 7 10:49 data3.csv
$ echo "1,2,3,4,5" > data4.csv
$ srun -N 1 --pty sarus run --mount=type=bind,source=/input_data,destination=/input,readonly -t debian bash
$ ls -l /input
-rw-r--r--. 1 root 0 1048576 Feb 7 10:49 data1.csv
-rw-r--r--. 1 root 0 1048576 Feb 7 10:49 data2.csv
-rw-r--r--. 1 root 0 1048576 Feb 7 10:49 data3.csv
-rw-r--r--. 1 root 0 10 Feb 7 10:52 data4.csv
$ cat /input/data4.csv
1,2,3,4,5
$ touch /input/data5.csv
touch: cannot touch '/input/data5.csv': Read-only file system
$ exit
Note
Bind-mounting FUSE filesystems into Sarus containers
By default, all FUSE filesystems are accessible only by the user who mounted them; this restriction is enforced by the kernel itself. Sarus, however, is a privileged application that sets up the container as the root user in order to perform some specific actions.
To allow Sarus to access a FUSE mount point on the host, in order to bind mount it into a container, use the FUSE option allow_root when creating the mount point.
For example, when creating an EncFS filesystem:
$ encfs -o allow_root --nocache $PWD/encfs.enc/ /tmp/encfs.dec/
$ sarus run -t --mount=type=bind,src=/tmp/encfs.dec,dst=/var/tmp/encfs ubuntu ls -l /var/tmp
Note that allow_root can only be passed if the option user_allow_other is defined in /etc/fuse.conf, as stated in the FUSE manpage.
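For reference, this typically amounts to having the following line present in /etc/fuse.conf (editing the file requires administrator privileges):
user_allow_other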
Mounting custom devices into the container
Devices can be made available inside containers through the --device option of sarus run. The option can be entered multiple times, specifying one device per option. By default, device files are mounted into the container at the same path they have on the host. A different destination path can be entered using a colon (:) as separator after the host path. All paths used in the option value must be absolute:
$ srun sarus run --device=/dev/fuse debian ls -l /dev/fuse
crw-rw-rw-. 1 root root 10, 229 Aug 17 17:54 /dev/fuse
$ srun sarus run --device=/dev/fuse:/dev/container_fuse debian ls -l /dev/container_fuse
crw-rw-rw-. 1 root root 10, 229 Aug 17 17:54 /dev/container_fuse
When working with device files, the --mount option should not be used, since access to custom devices is disabled by default through the container's device cgroup. The --device option, on the other hand, also whitelists the requested devices in the cgroup, making them accessible within the container.
By default, the option grants read, write and mknod permissions to devices; this behavior can be controlled by adding a set of flags at the end of the option value, again using a colon as separator. The flags must be a combination of the characters rwm, standing for read, write and mknod access respectively; the characters may come in any order, but must not be repeated. The access flags can be entered regardless of whether a custom destination path is present.
The full syntax of the option is thus --device=host-device[:container-device][:permissions].
The following example shows how to mount a device with read-only access:
$ srun --pty sarus run -t --device=/dev/example:r debian bash
$ echo "hello" > /dev/example
bash: /dev/example: Operation not permitted
$ exit
Important
Sarus and the --device option cannot grant more permissions to a device than those allowed on the host. For example, if a device on the host is set for read-only access, Sarus cannot enable write or mknod access.
This is enforced by the implementation of device cgroups in the Linux kernel. For more details, please refer to the kernel documentation.
Image entrypoint and default arguments
Sarus fully supports image entrypoints and default arguments as defined by the OCI Image Specification.
The entrypoint of an image is a list of arguments that will be used as the command to execute when the container starts; it is meant to create an image that will behave like an executable file.
The image default arguments will be passed to the entrypoint if no argument is provided on the command line when launching a container. If the entrypoint is not present, the first default argument is treated as the executable to run.
When creating container images with Docker, the entrypoint and default arguments are set using the ENTRYPOINT and CMD instructions respectively in the Dockerfile. For example, this file will generate an image printing arguments to the terminal by default:
FROM debian:stretch
ENTRYPOINT ["/bin/echo"]
CMD ["Hello world"]
After building such an image (we will arbitrarily call it echo) and importing it into Sarus, we can run it without passing any arguments:
$ srun sarus run <image repo>/echo
Hello world
Entering a command line argument will override the default arguments passed to the entrypoint:
$ srun sarus run <image repo>/echo Foobar
Foobar
The image entrypoint can be changed by providing a value to the --entrypoint option of sarus run. It is important to note that when changing the entrypoint, the default arguments are discarded as well:
$ srun sarus run --entrypoint=cat <image repo>/echo /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
The entrypoint can be removed by passing an empty value to --entrypoint. This is useful, for example, for inspecting and debugging containers:
$ srun --pty sarus run --entrypoint "" -t <image repo>/echo bash
$ env | grep ^PATH
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
$ exit
Note
Using the "adjacent value" style to remove an image entrypoint
(i.e. --entrypoint=""
) is not supported. Please pass the empty string
value separated by a whitespace.
Working directory
The working directory inside the container can be controlled using the -w/--workdir option of the sarus run command:
$ srun -N 1 --pty sarus run --workdir=/path/to/workdir -t debian bash
If the path does not exist, it is created inside the container.
If the -w/--workdir option is not specified but the image defines a working directory, the container process will start there. Otherwise, the process will start in the container's root directory (/). Using image-defined working directories can be useful, for example, for simplifying the command line when launching containers.
When creating images with Docker, the working directory is set using the WORKDIR instruction in the Dockerfile.
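As a minimal sketch (the /app path is just an illustrative choice):
FROM debian:stretch
WORKDIR /app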
PID namespace
The PID namespace for the container can be controlled through the --pid option of sarus run. Currently, the supported values for the option are:
host (default): use the host's PID namespace for the container. This transparently supports MPI implementations relying on the ranks having different PIDs when running on the same physical host, or using shared memory technologies like Cross Memory Attach (CMA).
private: create a new PID namespace for the container. Having a private PID namespace is also referred to as "using PID namespace isolation" or simply "using PID isolation".
$ sarus run --pid=private alpine:3.14 ps -o pid,comm
PID   COMMAND
    1 ps
Note
Consider using an init process when running with a private PID namespace if you need to handle signals or run many processes into the container.
Adding an init process to the container
By default, Sarus only executes the user-specified application within the container. When using a private PID namespace, the container process is assigned PID 1 in the new namespace. The PID 1 process has unique features in Linux: most notably, the process will ignore signals by default and zombie processes will not be reaped inside the container (see [1] , [2] for further reference).
If you need to handle signals or reap zombie processes (which can be useful when executing several different processes in long-running containers), you can use the --init option to run an init process inside the container:
$ srun -N 1 sarus run --pid=private --init alpine:3.14 ps -o pid,comm
PID COMMAND
1 init
7 ps
Sarus uses tini as its default init process.
Warning
Some HPC applications may be subject to performance losses when run with an init process. Our internal benchmarking tests with tini showed overheads of up to 2%.
Setting OCI annotations
OCI annotations are defined in the OCI Runtime Specification as a means to provide arbitrary metadata for the container in the form of a key-value map. Annotation keys usually express a hierarchical namespace structure, with domains separated by . (full stop) characters.
Annotations can be useful to control how Sarus selects OCI hooks or, depending on their implementation, certain features of OCI hooks and OCI runtimes.
To set an annotation when running a container, use the --annotation option of sarus run. The option can be passed multiple times, defining one annotation per option. The first occurring = (equals sign) character in the option value is treated as the separator between the annotation key and its value.
$ srun -N 1 sarus run --annotation com.documentation.example.key=value debian true
Annotations set from the Sarus command line take precedence over other annotations, for example those coming from the container image (sometimes also known as "image labels") or those created automatically by Sarus itself.
For example, the hooks shipped alongside Sarus use the com.hooks.logging.level
annotation to determine their verbosity level. The annotation is created internally
by Sarus to match the verbosity of the engine. A different verbosity level for the
hooks can be defined by overriding the annotation as follows:
# Set Sarus hooks to print debug-level messages (level 0)
# while Sarus keeps its normal verbosity
$ srun -N 1 sarus run --annotation=com.hooks.logging.level=0 debian true
Please refer to your site documentation or your system administrator to learn about custom features related to OCI annotations on a specific system.
Verbosity levels and help messages
To run a command in verbose mode, enter the --verbose global option before the command:
$ srun sarus --verbose run debian:latest cat /etc/os-release
To run a command printing extensive details about internal workings, enter the --debug global option before the command:
$ srun sarus --debug run debian:latest cat /etc/os-release
To print a general help message about Sarus, use sarus --help.
To print information about a command (e.g. command-specific options), use sarus help <command>:
$ sarus help run
Usage: sarus run [OPTIONS] REPOSITORY[:TAG] [COMMAND] [ARG...]
Run a command in a new container
Note: REPOSITORY[:TAG] has to be specified as
displayed by the "sarus images" command.
Options:
--centralized-repository Use centralized repository instead of the local one
-t [ --tty ] Allocate a pseudo-TTY in the container
--entrypoint arg Overwrite the default ENTRYPOINT of the image
--mount arg Mount custom directories into the container
-m [ --mpi ] Enable MPI support
-n [ --name ] arg Assign a name to the container
--ssh Enable SSH in the container
Support for container customization through hooks
Sarus allows containers to be customized by other programs or scripts leveraging the interface defined by the Open Container Initiative Runtime Specification for POSIX-platform hooks (OCI hooks for short). These customizations are especially valuable in HPC use cases, where the dedicated hardware and highly-tuned software adopted by high-performance systems are in contrast with the infrastructure-agnostic nature of software containers. OCI hooks provide a solution to open access to these resources inside containers.
The hooks that can be enabled on a given Sarus installation are configured by the system administrators. The sarus hooks command can be used to list the currently configured hooks:
$ sarus hooks
NAME PATH STAGES
01-glibc-hook /opt/sarus/default/bin/glibc_hook createContainer
03-nvidia-container-runtime-hook /usr/bin/nvidia-container-runtime-hook prestart
05-mpi-hook /opt/sarus/default/bin/mpi_hook createContainer
07-ssh-hook /opt/sarus/default/bin/ssh_hook createRuntime
09-slurm-global-sync-hook /opt/sarus/default/bin/slurm_global_sync_hook createContainer
11-amdgpu-hook /opt/sarus/default/bin/amdgpu_hook createContainer
12-mount-hook /opt/sarus/default/bin/mount_hook createContainer
Please refer to your site documentation or your system administrator for information about the conditions to enable hooks on a specific system.
The following subsections illustrate a few cases of general interest for HPC from an end-user perspective.
Native MPI support (MPICH-based)
Sarus comes with a hook able to import native MPICH-based MPI implementations inside the container. This is useful in case the host system features a vendor-specific or high-performance MPI stack based on MPICH (e.g. Intel MPI, Cray MPT, MVAPICH) which is required to fully leverage a high-speed interconnect.
To take advantage of this feature, the MPI installed in the container (and dynamically linked to your application) needs to be ABI-compatible with the MPI on the host system. Taking as an example the Piz Daint Cray XC50 supercomputer at CSCS, to best meet the required ABI compatibility we recommend that the container application uses one of the following MPI implementations:
MPICH v3.1.4 (February 2015)
MVAPICH2 2.2 (September 2016)
Intel MPI Library 2017 Update 1
The following is an example Dockerfile to create a Debian image with MPICH 3.1.4:
FROM debian:jessie
RUN apt-get update && apt-get install -y \
build-essential \
wget \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
RUN wget -q http://www.mpich.org/static/downloads/3.1.4/mpich-3.1.4.tar.gz \
&& tar xf mpich-3.1.4.tar.gz \
&& cd mpich-3.1.4 \
&& ./configure --disable-fortran --enable-fast=all,O3 --prefix=/usr \
&& make -j$(nproc) \
&& make install \
&& ldconfig \
&& cd .. \
&& rm -rf mpich-3.1.4 \
&& rm mpich-3.1.4.tar.gz
Note
Applications that are statically linked to MPI libraries will not work with the native MPI support provided by the hook.
Once the system administrator has configured the hook, containers with native MPI support can be launched by passing the --mpi option to the sarus run command, e.g.:
$ srun -N 16 -n 16 sarus run --mpi <repo name>/<image name> <mpi_application>
If multiple hooks are configured on the system, the --mpi-type option of sarus run can be used to choose a specific hook, for example:
$ srun -N 16 -n 16 sarus run --mpi-type=mpich-libfabric <repo name>/<image name> <mpi_application>
When multiple hooks are configured, the system administrator can define one of them as the default, which can then be accessed through just the --mpi option. This provides compatibility with workflows that do not use the --mpi-type option, for reasons of portability or legacy.
The MPI-related hooks configured on the system can be listed through the --mpi option of the sarus hooks command, for example:
$ sarus hooks --mpi
NAME MPI TYPE
05-mpich-hook ^mpich$ (default)
051-mpich-ofi-hook ^mpich-libfabric$
If the value passed to the sarus run --mpi-type option matches one of the regular expressions under the MPI TYPE column, the corresponding hook is enabled. The (default) qualifier alongside an MPI type indicates the default MPI hook for the Sarus installation, which can be enabled with just the sarus run --mpi option.
NVIDIA GPU support
NVIDIA provides access to GPU devices and their driver stacks inside OCI containers through the NVIDIA Container Toolkit hook.
When Sarus is configured to use this hook, the GPU devices to be made available inside the container can be selected by setting the CUDA_VISIBLE_DEVICES environment variable on the host system. Such selection is often performed automatically by the workload manager or other site-specific software (e.g. the SLURM workload manager sets CUDA_VISIBLE_DEVICES when GPUs are requested via the Generic Resource Scheduling plugin). Be sure to check the setup provided by your computing site.
The container image needs to include a CUDA runtime that is suitable for both the target container applications and the available GPUs in the host system. One way to achieve this is by basing your image on one of the official Docker images provided by NVIDIA, i.e. the Dockerfile should start with a line like this:
FROM nvidia/cuda:8.0
Note
To check all the available CUDA images provided by NVIDIA, visit https://hub.docker.com/r/nvidia/cuda/
When developing GPU-accelerated images on your workstation, we recommend using nvidia-docker to run and test containers using an NVIDIA GPU.
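For instance, a quick smoke test with nvidia-docker might look like this (assuming the wrapper and a compatible NVIDIA driver are installed on the workstation):
$ nvidia-docker run --rm nvidia/cuda:8.0 nvidia-smi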
SSH connection within containers
Sarus also comes with a hook which enables support for SSH connections within containers, leveraging the Dropbear SSH software.
When Sarus is configured to use this hook, the sarus ssh-keygen command must be run before attempting SSH connections to/from containers, in order to generate the keys that will be used by the SSH daemons and the SSH clients inside the containers.
It is sufficient to generate the keys just once, as they are persistent between sessions.
It is then possible to execute a container passing the --ssh option to sarus run, e.g. sarus run --ssh <image> <command>. Using the previously generated SSH keys, the hook instantiates an SSH daemon and sets up a custom ssh binary inside each container created with the same command.
Within a container spawned with the --ssh option, it is possible to connect into other containers by simply issuing the ssh command available in the default search PATH, e.g.:
ssh <hostname of other node>
The custom ssh binary takes care of using the proper keys and the non-standard port in order to connect to the remote container.
When the ssh program is called without a command argument, it opens a login shell into the remote container. In this situation, the SSH hook attempts to reproduce the environment variables that were defined upon the launch of the remote container. The aim is to replicate the experience of actually accessing a shell in the container as it was created.
The hook supports the com.hooks.ssh.authorize_ssh_key annotation, which allows the user to add a public key to the container's authorized_keys file, e.g.
sarus run --ssh --annotation com.hooks.ssh.authorize_ssh_key=$HOME/.ssh/<key>.pub <image>
Notice that the annotation value must be a public key file, not the public key itself. The annotation allows remote access via SSH to the running container through user-specified (and potentially ephemeral) keys.
The com.hooks.ssh.pidfile_container annotation allows the user to define the PID file of the Dropbear daemon inside the container. The com.hooks.ssh.pidfile_host annotation can be used to copy the PID file of the Dropbear daemon to the host. The com.hooks.ssh.port annotation can be used to set an arbitrary port for the Dropbear server and client, overriding the value from the default hook configuration.
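For instance, overriding the server port could look like this (2222 is an arbitrary example value):
$ sarus run --ssh --annotation com.hooks.ssh.port=2222 <image> <command>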
Warning
The SSH hook currently does not implement a poststop functionality and requires the use of a private PID namespace for the container in order to clean up the Dropbear daemon. Thus, the --ssh option of sarus run implies --pid=private, and is incompatible with the use of --pid=host.
OpenMPI communication through SSH
The MPICH-based MPI hook described above does not support OpenMPI libraries. As an alternative, OpenMPI programs can communicate through SSH connections created by the SSH hook.
To run an OpenMPI program using the SSH hook, we need to manually provide a list of hosts and explicitly launch mpirun on only one node of the allocation. We can do so with the following commands:
salloc -C gpu -N4 -t5
srun hostname > $SCRATCH/hostfile
srun sarus run --ssh \
--mount=src=/users,dst=/users,type=bind \
--mount=src=$SCRATCH,dst=$SCRATCH,type=bind \
ethcscs/openmpi:3.1.3 \
bash -c 'if [ $SLURM_PROCID -eq 0 ]; then mpirun --hostfile $SCRATCH/hostfile -npernode 1 /openmpi-3.1.3/examples/hello_c; else sleep 10; fi'
Upon establishing a remote connection, the SSH hook provides a $PATH covering the most common default locations: /usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin. If your OpenMPI installation uses a custom location, consider using an absolute path to mpirun together with the --prefix option, as advised in the official OpenMPI FAQ.
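For example, assuming a hypothetical OpenMPI installation under /opt/openmpi inside the image, the mpirun invocation in the previous snippet could become:
/opt/openmpi/bin/mpirun --prefix /opt/openmpi --hostfile $SCRATCH/hostfile -npernode 1 <application>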
Glibc replacement
Sarus's source code includes a hook able to inject glibc libraries from the host inside the container, overriding the glibc of the container.
This is useful when injecting host resources (e.g. MPI libraries) into the container, if said resources depend on a newer glibc than the one in the container.
The host glibc stack to be injected is configured by the system administrator.
If Sarus is configured to use this hook, the glibc replacement can be activated by passing the --glibc option to sarus run. Since native MPI support is the most common occurrence of host resource injection, the hook is also implicitly activated when using the --mpi option.
Even when the hook is configured and activated, the glibc libraries in the container will only be replaced if the following conditions apply:
the container's libraries are older than the host's libraries;
host and container glibc libraries have the same soname and are ABI compatible.
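For example, an invocation activating the hook explicitly might look like this:
$ srun sarus run --glibc <image> <application>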
Running MPI applications without the native MPI hook
The MPI replacement mechanism controlled by the --mpi option is not mandatory for running distributed applications in Sarus containers. It is possible to run containers using the MPI implementation embedded in the image, foregoing the performance of custom high-performance hardware.
This can be useful in a number of scenarios:
the software stack in the container should not be altered in any way
non-performance-critical testing
impossibility to satisfy ABI compatibility for native hardware acceleration
Sarus can be launched as a normal MPI program, and the execution context will be propagated to the container application:
mpiexec -n <number of ranks> sarus run <image> <MPI application>
The important aspect to consider is that the process management system from the host must be able to communicate with the MPI libraries in the container.
MPICH-based MPI implementations by default use the Hydra process manager and the PMI-2 interface to communicate between processes. OpenMPI by default uses the OpenRTE (ORTE) framework and the PMIx interface, but can be configured to support PMI-2.
Note
Additional information about the support provided by PMIx for containers and cross-version use cases can be found here: https://openpmix.github.io/support/faq/how-does-pmix-work-with-containers
As a general rule of thumb, corresponding MPI implementations (e.g. using an MPICH-compiled mpiexec on the host to launch Sarus containers featuring MPICH libraries) should work fine together. As mentioned previously, if an OpenMPI library in the container has been configured to support PMI, the container should also be able to communicate with an MPICH-compiled mpiexec from the host.
The following is a minimal Dockerfile example of building OpenMPI 4.0.2 with PMI-2 support on Ubuntu 18.04:
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y \
build-essential \
ca-certificates \
automake \
autoconf \
libpmi2-0-dev \
wget \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
RUN wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.2.tar.gz \
&& tar xf openmpi-4.0.2.tar.gz \
&& cd openmpi-4.0.2 \
&& ./configure --prefix=/usr --with-pmi=/usr/include/slurm-wlm --with-pmi-libdir=/usr/lib/x86_64-linux-gnu CFLAGS=-I/usr/include/slurm-wlm \
&& make -j$(nproc) \
&& make install \
&& ldconfig \
&& cd .. \
&& rm -rf openmpi-4.0.2.tar.gz openmpi-4.0.2
When running under the Slurm workload manager, the process management interface can be selected with the --mpi option to srun. The following example shows how to run the OSU point-to-point latency test from the Sarus cookbook on CSCS' Piz Daint Cray XC50 system without native interconnect support:
$ srun -C gpu -N2 -t2 --mpi=pmi2 sarus run ethcscs/mpich:ub1804_cuda92_mpi314_osu ./osu_latency
###MPI-3.0
# OSU MPI Latency Test v5.6.1
# Size Latency (us)
0 6.82
1 6.80
2 6.80
4 6.75
8 6.79
16 6.86
32 6.82
64 6.82
128 6.85
256 6.87
512 6.92
1024 9.77
2048 10.75
4096 11.32
8192 12.17
16384 14.08
32768 17.20
65536 29.05
131072 57.25
262144 83.84
524288 139.52
1048576 249.09
2097152 467.83
4194304 881.02
Notice that the --mpi=pmi2 option was passed to srun but not to sarus run.
Remote development with Visual Studio Code
Visual Studio Code is a popular programming IDE that can be configured to edit and test code on remote systems. Remote systems are accessed via the SSH protocol, hence the Sarus SSH hook can be used to extend the IDE to work with remote Sarus container environments.
Follow these steps to configure Visual Studio Code to access remote Sarus instances:
On the remote system start a Sarus container enabling SSH:
Copy your public SSH key on the remote host that will run the container.
<me>@<laptop>:~$ scp ~/.ssh/id_ed25519.pub <remote_user>@<remote_node>:~/.ssh/
Run the remote container with the --ssh option and the annotation to authorize your public key.
<remote_user>@<remote_node>:~> sarus run --ssh --tty --entrypoint bash \
>     --annotation com.hooks.ssh.authorize_ssh_key=$HOME/.ssh/id_ed25519.pub \
>     <container_image>
Test that you can now access the remote container via SSH on port 15263.
<me>@<laptop>:~$ ssh -o StrictHostKeyChecking=no -o ControlMaster=no \
>     -o UserKnownHostsFile=/dev/null -p 15263 <remote_user>@<remote_node>
Configure Visual Studio Code to access the remote Sarus container:
On Visual Studio Code, install "Remote Development" and "Remote - SSH" extensions.
Click on the bottom left corner symbol on Visual Studio Code.
Select "Connect to Host..." .
Select "+ Add New SSH Host..." .
On the "Enter SSH Connection Command" window, insert the SSH command line that you tested on step 3 .
Select the SSH client configuration file where you prefer to add the host, i.e. $HOME/.ssh/config. In this file you should now find a new entry like:
Host <remote_node>
    HostName <remote_node>
    StrictHostKeyChecking no
    ControlMaster no
    UserKnownHostsFile /dev/null
    Port 15263
    User <remote_user>
Select "Connect" to connect the IDE to the remote container environment.
Important
In order to establish connections through the "Remote - SSH" extension, the scp program must be available within the container. This is required by Visual Studio Code to transfer and set up the VS Code Server in the remote container.
For more details about the "Remote - SSH" Visual Studio Code extension, you can refer to this tutorial from the official Visual Studio Code documentation.