The virt driver task plugin expands the types of workloads Nomad can run to include virtual machines. Leveraging the power of libvirt, the virt driver allows users to define virtual machine tasks using the Nomad job specification.
IMPORTANT: This plugin is an alpha version in tech preview and still under active development; there may be breaking changes in future releases.
- Use the job's `task.config` to define the cloud image for your virtual machine
- Start/stop virtual machines
- Nomad runtime environment is populated
- Use Nomad alloc data in the virtual machine
- Publish ports
- Monitor the memory consumption
- Monitor CPU usage
- Task config cpu value is used to populate virtual machine CpuShares
- The task's `task`, `alloc`, and `secrets` directories are mounted within the VM at the filesystem root. These are currently mounted read-only (RO), so that VMs cannot write excessive amounts of data and fill the host filesystem. Once correct guardrails are in place, this restriction will be lifted. Please see the filesystem concepts page for more detail about an allocation's working directory.
Here is an example of a simple Python server running on Ubuntu:
job "python-server" {
group "virt-group" {
count = 1
network {
mode = "host"
port "http" {
to = 8000
}
}
task "virt-task" {
driver = "nomad-driver-virt"
artifact {
source = "http://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img"
destination = "local/focal-server-cloudimg-amd64.img"
mode = "file"
}
config {
image = "local/focal-server-cloudimg-amd64.img"
primary_disk_size = 10000
use_thin_copy = true
default_user_password = "password"
cmds = ["python3 -m http.server 8000"]
network_interface {
bridge {
name = "virbr0"
ports = ["http"]
}
}
}
resources {
cores = 2
memory = 4000
}
}
}
}
$ nomad job run examples/python.nomad.hcl
==> 2024-09-10T13:01:22+02:00: Monitoring evaluation "c0424142"
2024-09-10T13:01:22+02:00: Evaluation triggered by job "python-server"
2024-09-10T13:01:23+02:00: Evaluation within deployment: "d546f16e"
2024-09-10T13:01:23+02:00: Allocation "db146826" created: node "c20ee15a", group "virt-group"
2024-09-10T13:01:23+02:00: Evaluation status changed: "pending" -> "complete"
==> 2024-09-10T13:01:23+02:00: Evaluation "c0424142" finished with status "complete"
$ virsh list
Id Name State
------------------------------------
4 virt-task-5a6e215e running
To build the binary, the `libvirt-dev` package is required. Install it with your package manager:
sudo apt install libvirt-dev
git clone git@github.com:hashicorp/nomad-driver-virt
cd nomad-driver-virt
make dev
The compiled binary will be located at `./build/nomad-driver-virt`.
- Nomad 1.9.0+
- libvirt-daemon-system
- qemu-utils
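On Debian or Ubuntu based hosts, these runtime dependencies can typically be installed with apt (package names assumed for Debian-family distributions):
# Install the libvirt daemon and QEMU utilities listed above
sudo apt install libvirt-daemon-system qemu-utils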
Make sure the node where the client will run supports virtualization. On Linux, you can verify this in a couple of ways:
- Reading the CPU flags:
egrep -o '(vmx|svm)' /proc/cpuinfo
- Reading the kernel modules and looking for the virtualization ones:
lsmod | grep -E '(kvm_intel|kvm_amd)'
If the result is empty for either call, the machine does not support virtualization and the Nomad client won't be able to run any virtualization workload.
- Verify permissions: Nomad runs as root. Add the user `root` and the group `root` to the QEMU configuration to allow it to execute the workloads. Remember to start the libvirtd daemon if it is not running yet, or to restart it after adding the QEMU user/group configuration:
systemctl start libvirtd
or
systemctl restart libvirtd
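For reference, the user and group settings mentioned above typically live in libvirt's QEMU configuration file, commonly `/etc/libvirt/qemu.conf` (the exact location may vary by distribution), for example:
# /etc/libvirt/qemu.conf (assumed default location)
# Run QEMU guests as the root user and group so the driver can execute the workloads
user  = "root"
group = "root"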
Ensure that Nomad can find the plugin; see `plugin_dir`.
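For example, a minimal client agent configuration pointing Nomad at the plugin directory might look like this (path is illustrative):
# Agent configuration snippet: directory Nomad scans for external task driver plugins
plugin_dir = "/opt/nomad/plugins"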
- emulator block:
  - uri - Since libvirt supports many different kinds of virtualization (often referred to as "drivers" or "hypervisors"), it is necessary to use a `uri` to specify which one to use. It defaults to `"qemu:///system"`.
  - user - User for the connection authentication.
  - password - Password for the connection authentication.
- data_dir - The plugin will create VM configuration files and intermediate files in this directory. If not defined, it defaults to `/var/lib/virt`.
- image_paths - Specifies the host paths the QEMU driver is allowed to load images from. If not defined, it defaults to the plugin `data_dir` directory and the alloc directory.
plugin "nomad-driver-virt" {
config {
emulator {
uri = "qemu:///default"
}
data_dir = "/opt/ubuntu/virt_temp"
image_paths = ["/var/local/statics/images/"]
}
}
- image - Path to the .img cloud image to base the VM disk on; it should be located in an allowed path. It is very important that the cloud image includes cloud-init, otherwise most features will not be available for the task.
- use_thin_copy - Make a thin copy of the image using qemu, and use it as the backing cloud image for the VM.
- hostname - The hostname to assign which defaults to a short uuid that will be unique to every VM, to avoid clashes when there are multiple instances of the same task running. Since it's used as a network host name, it must be a valid DNS label according to RFC 1123.
- os - Guest configuration for a specific machine and architecture to emulate if they are to be different from the host. Both the architecture and machine have to be available for KVM. If not defined, libvirt will use the same ones as the host machine.
- cmds - List of commands to execute on the VM once it is running. They provide the operator with a quick and easy way to start a process on the newly created VM; used in conjunction with a template block, this can be a simple yet powerful startup tool.
- default_user_password - Initial password to be configured for the default user on the newly created VM; it will have to be updated on first connection.
- default_user_authorized_ssh_key - SSH public key that will be added to the SSH configuration for the default user of the cloud image distribution.
- user_data - Path to a cloud-init compliant user data file to be used as the user-data for the cloud-init configuration.
- primary_disk_size - Disk space to assign to the VM; bear in mind it must be large enough to fit the VM's OS.
Regarding resources, the driver currently supports cpusets (cores) and memory. Every core will be treated as a vCPU. Do not use `resources.cpu`; it will be ignored.
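For example, a task resources block using the supported fields (values illustrative):
resources {
  cores  = 2      # each core becomes a vCPU in the guest
  memory = 2048   # MB
}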
driver = "nomad-driver-virt"
artifact {
source = "http://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img"
destination = "local/focal-server-cloudimg-amd64.img"
mode = "file"
}
config {
image = "local/focal-server-cloudimg-amd64.img"
primary_disk_size = 9000
use_thin_copy = true
default_user_password = "password"
cmds = ["touch /home/ubuntu/file.txt"]
default_user_authorized_ssh_key = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIC31v1..."
}
The following configuration options are available within the task's driver configuration block:
- network_interface - A list of network interfaces that should be attached to the VM. This currently only supports a single entry.
  - bridge - Attach the VM to a network of bridged type. `virbr0`, the default libvirt network, is a bridged network.
    - name - The name of the bridge interface to use. This relates to the output seen from commands such as `virsh net-info`.
    - ports - A list of port labels which will be exposed on the host via mapping to the network interface. These labels must exist within the job specification network block.
The example below shows the network configuration and task configuration required to expose and map ports `22` and `80`:
group "virt-group" {
network {
mode = "host"
port "ssh" {
to = 22
}
port "http" {
to = 80
}
}
task "virt-task" {
driver = "nomad-driver-virt"
config {
network_interface {
bridge {
name = "virbr0"
ports = ["ssh", "http"]
}
}
}
}
}
Exposed ports and services can make use of the existing service block, so that registrations can be performed using the specified backend provider.
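As a sketch, assuming the `http` port label from the example above and the Nomad service provider, a service registration could look like this:
service {
  name     = "python-server"   # illustrative service name
  port     = "http"            # port label defined in the group network block
  provider = "nomad"           # or "consul", depending on the configured backend
}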
Make sure the node supports virtualization.
# Build the task driver plugin
make dev
# Copy the built nomad-driver-virt executable to the plugin dir
cp ./build/nomad-driver-virt /opt/nomad/plugins
# Start Nomad
nomad agent -config=examples/server.nomad.hcl > server.log 2>&1 &
# Run the client as sudo
sudo nomad agent -config=examples/client.nomad.hcl > client.log 2>&1 &
# Run a job
nomad job run examples/job.nomad.hcl
# Verify
nomad job status virt-example
virsh list
If you run into errors when running a job for the first time, remember to verify the runtime dependencies.
It is important to know that, to protect the host machine from guests overusing the disk, managed VMs don't have write access to the Nomad filesystem.
If Nomad is not running as root, the permissions for the directories used by both Nomad and the virt driver need to be adjusted.
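As an illustrative sketch only (paths and ownership assumed; adjust to your setup), this could mean granting the user running Nomad access to the driver's data and plugin directories:
# Hypothetical example: give the nomad user ownership of the driver directories
sudo chown -R nomad:nomad /var/lib/virt /opt/nomad/plugins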
Sometimes the VM is running but things still don't go as planned, and extra tools are necessary to find the problem. Here are some strategies to debug a failing VM:
By default, cloud images are password protected. By adding a `default_user_password`, a new password is assigned to the default user of the distribution in use (for example, `ubuntu` for Ubuntu, `fedora` for Fedora, or `root` for Alpine).
By running `virsh console [vm-name]`, a terminal is started inside the VM, allowing internal inspection of it.
$ virsh list
Id Name State
------------------------------------
1 virt-task-8bc0a63f running
$ virsh console virt-task-8bc0a63f
Connected to domain 'virt-task-8bc0a63f'
Escape character is ^] (Ctrl + ])
nomad-virt-task-8bc0a63f login: ubuntu
Password:
If no login prompt shows up, it can mean the virtual machine is not booting, and adding some extra space to the disk may solve the problem. Remember the disk has to fit the root image plus any other processes running in the VM.
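For example, increasing `primary_disk_size` in the task config may give the root image enough room to boot (value illustrative):
config {
  # Larger disk so the root image and any running processes fit
  primary_disk_size = 15000
}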
The virt driver heavily relies on `cloud-init` to execute the virtual machine's configuration. Once you have managed to connect to the terminal, the results of cloud-init can be found in two different places:
- `/var/log/cloud-init.log`
- `/var/log/cloud-init-output.log`
Looking into these files can give a better understanding of any possible execution errors.
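For example, from inside the VM:
# Inspect the tail of the cloud-init output for errors
tail -n 50 /var/log/cloud-init-output.log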
If connecting to the terminal is not an option, it is possible to stop the job and mount the VM's disk to inspect it. If the `use_thin_copy` option is used, the driver will create the disk image in the directory `${plugin_config.data_dir}/virt/vm-name.img`:
# Find the virtual machine disk image
$ ls /var/lib/virt
virt-task-8bc0a63f.img
# Enable Network Block Devices on the Host
modprobe nbd max_part=8
# Connect the disk as network block device
qemu-nbd --connect=/dev/nbd0 '/var/lib/virt/virt-task-dc8187e3.img'
# Find The Virtual Machine Partitions
fdisk /dev/nbd0 -l
# Mount the partition from the VM
mount /dev/nbd0p1 /mnt/somepoint/
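With the partition mounted, the guest's filesystem, including the cloud-init logs mentioned above, can be inspected from the host, for example:
# Mount point is illustrative, matching the command above
less /mnt/somepoint/var/log/cloud-init-output.log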
Important: Don't forget to unmount the disk after finishing:
umount /mnt/somepoint/
qemu-nbd --disconnect /dev/nbd0
rmmod nbd
For networking, the plugin leverages the libvirt default network, `default`:
$ virsh net-list
Name State Autostart Persistent
--------------------------------------------
default active yes yes
Under the hood, libvirt uses dnsmasq to lease IP addresses to the virtual machines. There are multiple ways to find the IP assigned to the Nomad task. Using virsh to find the leased IP:
$ virsh net-dhcp-leases default
Expiry Time MAC address Protocol IP address Hostname Client ID or DUID
----------------------------------------------------------------------------------------------------------------------------------------------------------------
2024-10-07 18:48:09 52:54:00:b5:0b:d4 ipv4 192.168.122.211/24 nomad-virt-task-dc8187e3 ff:08:24:45:0e:00:02:00:00:ab:11:63:3c:26:5b:b7:fe:b3:13
Or use the MAC address to find the IP via ARP:
$ virsh dumpxml virt-task-8473ccfb | grep "mac address" | awk -F\' '{ print $2}'
52:54:00:b5:0b:d4
$ arp -an | grep 52:54:00:b5:0b:d4
? (192.168.122.211) at 52:54:00:b5:0b:d4 [ether] on virbr0
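Once the IP is known, you can reach the VM directly, for example over SSH with the distribution's default user and the key configured via `default_user_authorized_ssh_key` (user and address illustrative):
# Connect to the guest using the IP found above
ssh ubuntu@192.168.122.211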