**This is still experimental!**
# nomad-onload

`nomad-onload` is tooling to integrate Nomad and OpenOnload.

It provides a Nomad Device Plugin that exposes OpenOnload capabilities to Nomad via virtual devices. This enables kernel-bypass of the networking stack for any Docker-driver Nomad Job. In addition to TCP and UDP acceleration, facilities like `epoll` and pipes are brought to userspace as well.
After installing the plugin and adding `device "onload" {}` to a Nomad Task's `resources` stanza, that container becomes Onload-accelerated (see the sketch below)! With proper tuning, you can get extreme performance 😎.

Running high-performance kernel-bypass workloads is a vast topic. The High Performance Redis gist includes an introduction to it, covering Onload and kernel tuning.
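As a taste, here is a minimal sketch of a Docker-driver Job using the `onload` device; the Job name, image, and task layout are illustrative, not prescriptive:

```hcl
# redis-onload.nomad.hcl -- illustrative Job using the onload device
job "redis" {
  group "cache" {
    task "redis" {
      driver = "docker"

      config {
        image = "redis:7" # assumption: any Docker image works here
      }

      resources {
        # Requesting this device triggers the plugin's Onload injection
        device "onload" {}
      }
    }
  }
}
```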
- Installation
- Onload Devices
- Timekeeping Devices
- Plugin Configuration
- Tips
- Building
- Motivation
- Roadmap
- Credits and License
## Installation

Binaries for multiple platforms are released on GitHub.

To install the Nomad Onload Device Plugin on your Nomad Client instance, copy the `nomad-device-onload` binary to the host's Nomad `plugin_dir` (e.g. `/opt/nomad/data/plugins`). Then add a `plugin` config stanza to your Nomad configuration:
```hcl
# onload.hcl
plugin "nomad-device-onload" {
  config {
    # Mount Onload into images
    mount_onload = true
  }
}
```
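Note that Nomad's `plugin_dir` defaults to `<data_dir>/plugins`; if your agent uses a different layout, a minimal sketch of the relevant agent settings (paths are illustrative):

```hcl
# /etc/nomad.d/client.hcl -- illustrative agent configuration
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/data/plugins"
```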
## Onload Devices

When installed, this plugin discovers an Onload installation and makes plugin-based devices available to Nomad Clients. The plugin publishes the following "device types"; which ones are available depends on whether the host OS has Onload and TCPDirect installed.
| Device Type | Onload? | TCPDirect? | Notes |
|---|---|---|---|
| *(none)* | N | N | Nothing mounted. No devices published, even with SFC hardware |
| `onload` | Y | N | Onload mounted, `LD_PRELOAD` per `set_preload` config |
| `zf` | N | Y | Only TCPDirect mounted, `LD_PRELOAD` skipped |
| `onloadzf` | Y | Y | Like `onload`, but TCPDirect is also mounted |
When one of those device types, such as `onload`, is specified in a Nomad Job's `resources` stanza, the plugin installs Onload binaries, libraries, and device files into the Task, and optionally `LD_PRELOAD`s Onload. Onload performance tuning may be applied via its various `EF_` environment variable knobs:
```hcl
task {
  env {
    EF_TCP_SERVER_LOOPBACK = "2"
    EF_TCP_CLIENT_LOOPBACK = "4"
  }
  resources {
    device "onload" {}
  }
}
```
Here is how Onload devices are fingerprinted:

- If Onload/TCPDirect is not installed, there are no devices available.
- Each "SFC interface" (Solarflare/Xilinx/AMD network card) is discovered with Vendor `amd`.
- If there are no SFC interfaces found, we create a fake one called `none`.
So if we have both Onload and TCPDirect installed along with two SFC interfaces `eth0` and `eth1`, we'd have the following devices available to a Nomad Client:

- `amd/onload/eth0`
- `amd/zf/eth0`
- `amd/onloadzf/eth0`
- `amd/onload/eth1`
- `amd/zf/eth1`
- `amd/onloadzf/eth1`

Or similarly, with Onload and TCPDirect installed, but without SFC interfaces:

- `amd/onload/none`
- `amd/zf/none`
- `amd/onloadzf/none`
Nomad allows devices to be selected by any of these device name forms:

- `<device_type>`
- `<vendor>/<device_type>`
- `<vendor>/<device_type>/<model>`

Thus, by simply specifying the Device Type name `onload`, we get the Onload capability. However, the fuller forms can be used in the device `name`, and the fingerprinted attributes can be used in `constraint` and `affinity` stanzas, as sketched below.
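For example, here is a sketch of selecting a specific device by full name and steering placement with Nomad's standard `${device.*}` job-spec variables; the interface name `eth0` is just an illustration:

```hcl
resources {
  # Select a specific model: <vendor>/<device_type>/<model>
  device "amd/onload/eth0" {
    count = 1

    # Only place on nodes whose device vendor fingerprinted as "amd"
    constraint {
      attribute = "${device.vendor}"
      value     = "amd"
    }

    # Prefer (but don't require) a particular model
    affinity {
      attribute = "${device.model}"
      value     = "eth0"
      weight    = 50
    }
  }
}
```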
## Timekeeping Devices

If configured with `probe_pps` or `probe_ptp`, this plugin will also detect devices under `/dev/pps*` and `/dev/ptp*`. They will be made available as `pps` and `ptp` device types:

- `none/pps/<interface>`: for example, `none/pps/pps0`
- `none/ptp/<interface>`: for example, `none/ptp/ptp1`
## Plugin Configuration

The following settings are available to configure the plugin behavior, per above.

Devices and libraries are always installed when a `nomad-onload` device is used in the `resources` stanza. Setting any particular path to the empty string `""` will disable that mount. For example, `host_profile_dir_path = ""` will prevent mounting profiles (see the sketch after the table).

`mount_onload` enables mounting of all the files and paths configured below it. All mounts are read-only.
| Name | Type | Default | Description |
|---|---|---|---|
| `set_preload` | `bool` | `true` | Should the Device Plugin set the `LD_PRELOAD` environment variable in the Nomad Task? |
| `mount_onload` | `bool` | `true` | Should the Device Plugin mount Onload files into the Nomad Task? |
| `probe_nic` | `bool` | `true` | Probe for SFC network interfaces |
| `probe_xdp` | `bool` | `true` | Probe for XDP-accelerated interfaces (see Roadmap) |
| `probe_pps` | `bool` | `true` | Probe for PPS devices under `/dev/pps*` |
| `probe_ptp` | `bool` | `true` | Probe for PTP devices under `/dev/ptp*` |
| `ignored_interfaces` | `list(string)` | `[]` | List of interfaces to ignore. Include `none` to prevent that pseudo-device's creation |
| `num_nic` | `number` | `10` | |
| `num_pps` | `number` | `10` | |
| `num_ptp` | `number` | `10` | |
| `task_device_path` | `string` | `"/dev"` | Path to place device files in the Nomad Task |
| `host_device_path` | `string` | `"/dev"` | Path to find device files on the Host |
| `task_onload_lib_path` | `string` | `"/opt/onload/usr/lib64"` | Path to place Onload libraries in the Nomad Task |
| `host_onload_lib_path` | `string` | `"/usr/lib64"` | Path to find Onload libraries on the Host |
| `task_onload_bin_path` | `string` | `"/opt/onload/usr/bin"` | Path to place Onload binaries in the Nomad Task |
| `host_onload_bin_path` | `string` | `"/usr/bin"` | Path to find Onload binaries on the Host |
| `task_profile_dir_path` | `string` | `"/usr/libexec/onload/profiles"` | Path to place the Onload profile directory in the Nomad Task |
| `host_profile_dir_path` | `string` | `"/usr/libexec/onload/profiles"` | Path to find the Onload profile directory on the Host |
| `task_zf_bin_path` | `string` | `"/opt/onload/usr/bin/"` | Path to place TCPDirect/ZF binaries in the Nomad Task |
| `host_zf_bin_path` | `string` | `"/usr/bin"` | Path to find TCPDirect/ZF binaries on the Host |
| `task_zf_lib_path` | `string` | `"/opt/onload/usr/lib64"` | Path to place TCPDirect/ZF libraries in the Nomad Task |
| `host_zf_lib_path` | `string` | `"/usr/lib64"` | Path to find TCPDirect/ZF libraries on the Host |
| `fingerprint_period` | `string` | `"1m"` | Period of time between attempts to fingerprint devices |
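Putting a few of these together, a configuration sketch (the interface name `lo` is just an illustration):

```hcl
plugin "nomad-device-onload" {
  config {
    # Mount Onload files into Tasks, but skip the profiles directory;
    # an empty string disables that particular mount
    mount_onload          = true
    host_profile_dir_path = ""

    # Don't fingerprint these interfaces; "none" suppresses the pseudo-device
    ignored_interfaces = ["lo", "none"]
  }
}
```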
## Tips

See the examples directory.
The binary distribution includes `nomad-probe-onload`, which scans a system using the same code as `nomad-device-onload`:

```
$ ./bin/nomad-probe-onload
Onload version: 8.1.2.26
TCPDirect version: 8.1.2
Onload hardware-accelerated interfaces:
  eth0  0000:b1:00.0
  eth1  0000:b1:00.1
XDP hardware-accelerated interfaces: (FAKE, ROADMAP)
PPS devices:
  /dev/pps0
PTP devices:
  /dev/ptp0
  /dev/ptp1
  /dev/ptp2
```
You can run `onload_stackdump` inside the container, but you must remove `LD_PRELOAD` first:

```
$ docker exec -it -e LD_PRELOAD= redis-d3e926be-72c4-1940-953b-7e1bbb7a75dd /usr/bin/onload_stackdump lots | head
============================================================
ci_netif_dump_to_logger: stack=6 name=
  cplane_pid=142738
  namespace=net:[4026531840]
  Onload 8.1.2.26 uid=999 pid=1 ns_flags=80
...
```
## Building

Building is performed with Taskfile, creating the following binaries:

- `nomad-probe-onload` (simple test tool)
- `nomad-device-onload` (the plugin)

```
$ task
task: [tidy] go mod tidy
task: [tidy] go mod tidy
task: [tidy] go mod tidy
task: [install-deps] go build -o ./bin/launcher github.com/hashicorp/nomad/plugins/shared/cmd/launcher
task: [build-onload-probe] go build -o ./bin/nomad-probe-onload cmd/onload-probe/*.go
task: [build-plugin] go build -o ./bin/nomad-device-onload cmd/nomad-device-onload/*.go
```
We publish with GitHub Actions and Goreleaser.
## Motivation

When using Onload in a containerized environment, all of the Onload devices, libraries, and executables need to be present inside the container. Furthermore, the versions of everything need to match exactly between Host and Container.

One way to manage this is to build Onload into your image. Neomantra maintains the `docker-onload` tooling, which creates Onload-enabled Docker base images. You can then build your application on top of those base images. If you maintain multiple Onload versions in your cluster, you would need a CI/CD build matrix covering all your Dockerfiles and all your Onload versions.

If you wanted to Onload-enable a third-party application, such as Redis, you would need to either build Redis from an Onload base image, or add the matching binaries/libraries to a new image derived from a Redis image.

Then, when you actually want to run the image, you must hook up `/usr/bin/onload` to activate `LD_PRELOAD`, and you would need to tell Docker to mount devices like `/dev/sfc`. Clearly this is all cumbersome, but it is necessary. Typically, teams build scripts and tooling to manage this complexity.

Cluster Orchestrators can help with this, as they manage the control plane and prepare Containers for launch. The Onload team released Kubernetes Onload, which provides a Kubernetes Operator and resources for automatically injecting the required environment into a Kubernetes Pod.

`nomad-onload` brings this same capability to HashiCorp Nomad. Simply ask Nomad for an `onload` device in any Docker-driver Nomad Job and the plugin will take care of the rest of the plumbing!
## Roadmap

- Device Attributes
- Device Statistics
- Redis example
- XDP example
- Expose Nomad-selected interfaces via environment variables, e.g. how does `device "ptp" {}` become `/dev/ptp1` inside the config?
## Credits and License

Thanks to the Nomad and Onload teams and the organizations that have supported their open collaboration!
Much of the code has been reviewed and adapted from:
Made with ❤️ and 🔥 by Evan Wies.
Copyright (c) 2024 Neomantra BV.
Released under the MIT License, see LICENSE.txt.