Skip to content

Commit

Permalink
Revamp Testing Infrastructure and run Multi-Kernel Tests in CI (#111)
Browse files Browse the repository at this point in the history
* Fix invalid enum relocation

* Improve error logging on JSON unmarshal failure

EventsTrace was going into a print "\n" loop on a misconfigured kernel,
which was hard to diagnose with the current error logging.

* Fix broken probe_set_features logic

See comment, libbpf_probe_bpf_prog_type will return true on both aarch64
and x86_64 as it checks for the ability to _load_, not the ability to
_attach_, which is what will fail on aarch64.

* Rework multi-kernel-tester and run in CI

- Move the core initramfs and init process logic to Bluebox
- Set up the whole thing to run in CI with mainline kernels v5.11-v5.18
- Redo testing/README.md for new changes

* Add debug build target

* Clarify format / test-format targets, format code

* Disable fail-fast in multikernel test action

* Clarify and cleanup formatting/build CI workflow

* Dump contents of tracefs trace file on test fail

This file contains the result of all bpf_prink's, which often
encompasses a wealth of information when probes fail to do something.

* Fix typo in README.md

Co-authored-by: Nicholas Berlin <[email protected]>

* Fix test failures on Linux 5.11/aarch64

We were attempting to dereference a struct dentry into a struct dentry
*, which resulted in garbage data. Add FUNC_ARG_READ_PTREGS_NODEREF
to skip the dereference step and use it to fix the test failures.

* Fix broken build-debug target in Makefile

Additionally, remove nonexistent .PHONY targets and simplify build-debug
by removing _internal-build-debug target.

* Remove -DENABLE_BPF_PRINTK from CI builds

Enabling bpf_printk has a huge performance penalty in the emulated VMs
running in CI. Running aarch64 tests on Linux 5.11 without bpf_printk
takes ~1m30s while the same run without bpf_printk takes ~2m10s. While
this is larger than I'd expect, I can see it happening on an emulated
VM, every bpf_printk call corresponds to a lot of extra instructions
that have to be emulated.

Disable the flag. This reduces the debug info present in CI artifacts,
but if tests fail, devs should be able to run the failing test locally
with -DENABLE_BPF_PRINTK to get debug output.

* Improve logging on test failure

- Log stacktrace to stdout so logs are all serialized
- Log banner warning that the trace file is always empty on CI
    - I can see this eating a day of a dev's time otherwise :P

* Fix BPF tramp detection, add multi-kernel tests

This logic should properly detect a lack of BPF trampoline support on
x86 kernels that don't have it enabled (unlike the previous simple arch
check).

It's also incredibly subject to change in a new kernel release (what if
taskstats_exit dissapears?), so add a multi-kernel test for feature
probing. This should be nicely extensible to other features in the
future.

* Document unintuitive API

* Add comment to TestFeaturesCorrect RE: x86

* Dockerize code formatting Makefile target

Differences in clang-format versions were causing a headache here.
clang-format 13 was fine with the code as-is, but clang-format 11 would
make this change:

diff --git a/non-GPL/TcLoader/TcLoader.c b/non-GPL/TcLoader/TcLoader.c
index 9a8bc0c..1ed7c1d 100644
--- a/non-GPL/TcLoader/TcLoader.c
+++ b/non-GPL/TcLoader/TcLoader.c
@@ -371,10 +371,10 @@ static int netlink_qdisc(int cmd, unsigned int flags, const char *ifname)
     int rv                            = -1;
     struct rtnetlink_handle qdisc_rth = {.fd = -1};
     struct netlink_msg qdisc_req      = {
-        .n.nlmsg_len   = NLMSG_LENGTH(sizeof(struct tcmsg)),
-        .n.nlmsg_flags = NLM_F_REQUEST | flags,
-        .n.nlmsg_type  = cmd,
-        .t.tcm_family  = AF_UNSPEC,
+             .n.nlmsg_len   = NLMSG_LENGTH(sizeof(struct tcmsg)),
+             .n.nlmsg_flags = NLM_F_REQUEST | flags,
+             .n.nlmsg_type  = cmd,
+             .t.tcm_family  = AF_UNSPEC,
     };

     if (!ifname) {

This seems to be a result of the AlignConsecutiveAssignments directive in
.clang-format (removing it makes the two versions agree). Dockerize the format
step to cleanly solve the problem for all such cases in the future.

* Fix incorrect docker tag s/_TAG/_PULL_TAG/g

Should be using the remote instead of the local.

* Fix clang-format not running in a container

find was running in the container, clang-format was not due to how the
command was being interpreted aargh.

* Fix incorrect script args in comment

Co-Authored-By: Florian Lehner <[email protected]>

* Remove time estimate in README

Co-Authored-By: Florian Lehner <[email protected]>

* Cleanup run_tests.sh arguments

-d really should have allowed users to pass individual images if they
want something more specific tested. Also remove old information from
usage text and update for new -k arg.

* Add note about debootstrap to builder script

Co-Authored-By: Florian Lehner <[email protected]>

* Fix improper getopts usage

An option cannot have > 1 argument. Just put the list kernel images at
the end (no specific flag required).

* Fix incorrect variable name

* Fix faulty bash list logic

* Change BPFTOOL_VERSION to LINUX_TOOLS_VERSION

Co-Authored-By: Florian Lehner <[email protected]>

* Add gen_initramfs.sh script, update debug docs

* Greatly clean up bash scripts

- Use readonly/local
- Make conditionals more concise with || and &&
- Pull everything into functions, no naked code
- Standardize on lowercase for local vars
- Make command line arg format consistent
- Rename run_single_test.sh to invoke_qemu.sh, pull out all test logic
  for better separation of concerns
      - Update README.md to suggest usage of invoke_qemu.sh instead of
        directly invoking QEMU
- Add ~/go/bin to PATH in bash instead of workflow YAML
- Greatly clean up GNU parallel invocation

* Fix caching logic in GH actions workflow

Co-Authored-By: Lovel Rishi <[email protected]>

* Fix inspecific restore key

We could have restored the set of aarch64 kernels in the x86 tests with
the workflow as it was.

* Remove reduntant apt-get update

Co-Authored-By: Lovel Rishi <[email protected]>

* Cleanup Makefile

- Add tag-container target and cleanup / separate docker related vars
- Remove DOCKER_IMG_UBUNUT_VERSION

* Fix broken test-format target

* Add section to README.md on userspace debugging

Co-Authored-By: Florian Lehner <[email protected]>

* Add missing arg to gen_initramfs.sh help

* Remove KVM section from README.md

We're not currently using it in CI, and all dev work on my end has
been done without it. We can discuss enabling it if/when we start
using a custom runner, but for now, remove the clutter.

* Fixups to scripts/invoke_qemu.sh

- Make KVM probing an option, turned off by default (it seems to break
  debugging on my machine, and provides no real benefit anyways nested
  in a VBox VM)
- Only provide one -append flag to QEMU, it will ignore all but the
  last, causing the -d option to not work

* Add missing -o to find invocation in Makefile

Co-authored-by: Nicholas Berlin <[email protected]>
Co-authored-by: Florian Lehner <[email protected]>
Co-authored-by: Lovel Rishi <[email protected]>
  • Loading branch information
4 people committed Jul 21, 2022
1 parent a7e8fab commit e9e5d74
Show file tree
Hide file tree
Showing 26 changed files with 1,219 additions and 713 deletions.
34 changes: 0 additions & 34 deletions .github/workflows/ci.yml

This file was deleted.

47 changes: 47 additions & 0 deletions .github/workflows/formatting-build.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
name: Formatting and Build

on:
push:
branches:
- main
pull_request:

jobs:
formatting-and-build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Test code formatting
run: make test-format
- name: Cache hash
run: echo "GITHUB_CACHE_HASH=$(md5sum Makefile docker/Dockerfile.builder | md5sum | awk '{print $1}')" >> $GITHUB_ENV
- name: Setup docker cache
uses: actions/cache@v3
with:
path: ~/.cache/docker
key: ${{ env.GITHUB_CACHE_HASH }}-v0
restore-keys: |
${{ env.GITHUB_CACHE_HASH }}-v0
- name: Configure Docker
run: |
sudo systemctl stop docker
sudo rm -rf /var/lib/docker
sudo mkdir -p /var/lib/docker
sudo mkdir -p ~/.cache/docker
sudo chown -fR root:root ~/.cache/docker
sudo mount --rbind ~/.cache/docker /var/lib/docker
sudo systemctl start docker
- name: Test build
run: make build
- name: Test for source differences post-build
run: git diff --exit-code
- name: Test container image build
run: make container
- name: Fix permissions (last step)
run: |
docker system prune -f
sudo systemctl stop docker
sudo umount /var/lib/docker
sudo chown -R $USER:$USER ~/.cache/docker
sudo rm -rf ~/.cache/docker/volumes/backingFsBlockDev
sudo find ~/.cache/docker -name work | sudo xargs chmod -R 700
75 changes: 75 additions & 0 deletions .github/workflows/multikernel-tester.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
name: Multi Kernel Testing
on: [ pull_request ]

jobs:
multikernel_tester:
strategy:
# If a failure occurs, run the other arches/distros to the end. It's useful to see if it
# occurs on other kernels as well
fail-fast: false
matrix:
kernel_flavor: [ mainline ]
arch: [ x86_64, aarch64 ]
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v2
- name: Build eBPF probes and userspace components
run: make build ARCH=${{ matrix.arch }}
- name: Auth GCP
uses: 'google-github-actions/auth@v0'
with:
credentials_json: '${{ secrets.ACTIONS_GCP_JSON_CREDENTIALS }}'
- name: 'Setup gcloud'
uses: 'google-github-actions/setup-gcloud@v0'
- name: Create kernel images directory
run: |
sudo mkdir -p /kernel-images
sudo chown -fR $USER:$USER /kernel-images
- name: Get current timestamp for GCS cache key
run: echo "TIMESTAMP=$(date +%s)" >> $GITHUB_ENV
- name: Setup GCS cache
id: cache
uses: actions/cache@v3
with:
path: /kernel-images
key: gcs-cache-${{ matrix.kernel_flavor }}-${{ matrix.arch }}-${{ env.TIMESTAMP }}
restore-keys: gcs-cache-${{ matrix.kernel_flavor }}-${{ matrix.arch }}
- name: Rsync kernel images from GCS
run: gsutil -m rsync -d -r gs://ebpf-ci-kernel-images/${{ matrix.kernel_flavor }}/${{ matrix.arch }}/images /kernel-images/
- name: Install packages needed for testing
run: |
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
gcc-aarch64-linux-gnu \
libc6-dev-arm64-cross \
parallel \
qemu-system-x86 \
qemu-system-arm
env:
DEBIAN_FRONTEND: noninteractive
- name: Install Go
uses: actions/setup-go@v3
with:
go-version: '1.17'
- name: Install Bluebox
run: go install github.com/florianl/bluebox@b8590fb1850f56df6e6d7786931fcabdc1e9173d
- name: Run tests
working-directory: testing
run: |
./run_tests.sh \
${{ matrix.arch }} \
../artifacts-${{ matrix.arch }}/non-GPL/EventsTrace/EventsTrace \
/kernel-images/*
- name: Archive test summary
if: always()
uses: actions/upload-artifact@v3
with:
name: run-summary-${{ matrix.kernel_flavor }}-${{ matrix.arch }}.txt
path: testing/bpf-check-summary.txt
- name: Archive individual test results
if: always()
uses: actions/upload-artifact@v3
with:
name: results-${{ matrix.kernel_flavor }}-${{ matrix.arch }}
path: testing/results
2 changes: 1 addition & 1 deletion GPL/EventProbe/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@

# BPF program

if (CMAKE_BUILD_TYPE STREQUAL Debug)
if (CMAKE_BUILD_TYPE STREQUAL Debug OR ENABLE_BPF_PRINTK)
set(BPF_DEBUG_TRACE 1)
else()
set(BPF_DEBUG_TRACE 0)
Expand Down
4 changes: 2 additions & 2 deletions GPL/EventProbe/File/Probe.bpf.c
Original file line number Diff line number Diff line change
Expand Up @@ -329,12 +329,12 @@ int BPF_KPROBE(kprobe__vfs_rename)
new_dentry = rd.new_dentry;
} else {
/* Dentries are accessible from ctx */
err = FUNC_ARG_READ_PTREGS(old_dentry, vfs_rename, old_dentry);
err = FUNC_ARG_READ_PTREGS_NODEREF(old_dentry, vfs_rename, old_dentry);
if (err) {
bpf_printk("kprobe__vfs_rename: error reading old_dentry\n");
return 0;
}
err = FUNC_ARG_READ_PTREGS(new_dentry, vfs_rename, new_dentry);
err = FUNC_ARG_READ_PTREGS_NODEREF(new_dentry, vfs_rename, new_dentry);
if (err) {
bpf_printk("kprobe__vfs_rename: error reading new_dentry\n");
return 0;
Expand Down
50 changes: 50 additions & 0 deletions GPL/EventProbe/Helpers.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,56 @@
_ret; \
})

/*
* Reads the specified argument from struct pt_regs without dereferencing it
* (unlike FUNC_ARG_READ_PTREGS) (i.e. we get a pointer to the argument, not
* the argument itself). Note that we first have to read the value in struct
* pt_regs into a volatile temporary (_dst). Without this, LLVM can generate
* code like the following, which will fail to verify:
*
* r3 = 8 # The register value we want to read is at offset 8 in the context
* r2 = r1 # r1 = ctx pointer
* r2 += r3 # Increment ctx ptr to register value we're interested in
* r3 = *(u64 *)(r2 +0) # Dereference it (fail)
* dereference of modified ctx ptr R2 off=8 disallowed
*
* The verifier disallows dereferencing the context pointer when it's been
* modified. This will often happen as an inlining optimization if dst is
* immediately passed into a function. We instead want code like the following
* to be generated:
*
* r2 = r1 # r1 = ctx pointer
* r3 = *(u64 *)(r2 + 8) # Dereference it, putting the increment in the dereference insn
* ...pass r3 to a function
*/
#define FUNC_ARG_READ_PTREGS_NODEREF(dst, func, arg) \
({ \
int ret = 0; \
volatile typeof(dst) _dst; \
switch (arg__##func##__##arg##__) { \
case 0: \
_dst = (typeof(dst))PT_REGS_PARM1(ctx); \
break; \
case 1: \
_dst = (typeof(dst))PT_REGS_PARM2(ctx); \
break; \
case 2: \
_dst = (typeof(dst))PT_REGS_PARM3(ctx); \
break; \
case 3: \
_dst = (typeof(dst))PT_REGS_PARM4(ctx); \
break; \
case 4: \
_dst = (typeof(dst))PT_REGS_PARM5(ctx); \
break; \
default: \
ret = -1; \
}; \
dst = _dst; \
barrier(); \
ret; \
})

#define FUNC_ARG_READ_PTREGS(dst, func, arg) \
({ \
int ret = 0; \
Expand Down
16 changes: 15 additions & 1 deletion GPL/EventProbe/PathResolver.h
Original file line number Diff line number Diff line change
Expand Up @@ -238,7 +238,21 @@ static void ebpf_resolve_kernfs_node_to_string(char *buf, struct kernfs_node *kn

static void ebpf_resolve_pids_ss_cgroup_path_to_string(char *buf, const struct task_struct *task)
{
struct kernfs_node *kn = BPF_CORE_READ(task, cgroups, subsys[pids_cgrp_id], cgroup, kn);
/*
* Since pids_cgrp_id is an enum value, we need to get it at runtime as it
* can change kernel-to-kernel depending on the kconfig or possibly not be
* enabled at all.
*/
int cgrp_id;
if (bpf_core_enum_value_exists(enum cgroup_subsys_id, pids_cgrp_id)) {
cgrp_id = bpf_core_enum_value(enum cgroup_subsys_id, pids_cgrp_id);
} else {
/* Pids cgroup is not enabled on this kernel */
buf[0] = '\0';
return;
}

struct kernfs_node *kn = BPF_CORE_READ(task, cgroups, subsys[cgrp_id], cgroup, kn);
ebpf_resolve_kernfs_node_to_string(buf, kn);
}

Expand Down
86 changes: 63 additions & 23 deletions Makefile
Original file line number Diff line number Diff line change
@@ -1,39 +1,79 @@
ARCH ?= $(shell arch)
BUILD_DIR ?= artifacts-${ARCH}
SUDO ?=
PWD ?= $(shell pwd)
CONTAINER_RUNTIME ?= docker
DOCKER_IMG_UBUNTU_VERSION ?= jammy
BUILDER_PULL_TAG ?= us-docker.pkg.dev/elastic-security-dev/ebpf-public/builder:20220620-0715
BUILDER_TAG ?= us-docker.pkg.dev/elastic-security-dev/ebpf-public/builder:${USER}-latest
C_INCLUDE_PATH ?=
DOCKER_CACHE ?=

DOCKER_IMAGE = us-docker.pkg.dev/elastic-security-dev/ebpf-public/builder
DOCKER_PULL_TAG = 20220711-1742
DOCKER_LOCAL_TAG = ${USER}-latest
CURRENT_DATE_TAG = $(shell date +%Y%m%d-%H%M)

.PHONY = build build-local clean container fix-permissions format test-format
PWD = $(shell pwd)
BUILD_DIR = artifacts-${ARCH}
CMAKE_FLAGS = -DARCH=${ARCH} -DBUILD_STATIC_EVENTSTRACE=True -DUSE_BUILTIN_VMLINUX=True -B${BUILD_DIR} -S${PWD}

container:
${CONTAINER_RUNTIME} build ${DOCKER_CACHE} -t ${BUILDER_TAG} --build-arg PULL_TAG=${DOCKER_IMG_UBUNTU_VERSION} -f docker/Dockerfile.builder .
# Directories to search recursively for c/cpp source files to clang-format
FORMAT_DIRS = GPL/ non-GPL/ testing/test_bins

.PHONY = build build-debug _internal-build clean container format test-format

# Kludge to get around a missing header. If we don't do this, we'll get the following error when
# building:
#
# In file included from /home/vagrant/ebpf/contrib/libbpf/include/uapi/linux/bpf.h:11:
# In file included from /home/vagrant/ebpf/contrib/libbpf/include/linux/types.h:8:
# In file included from /usr/lib/llvm-14/lib/clang/14.0.0/include/stdint.h:52:
# /usr/include/stdint.h:26:10: fatal error: 'bits/libc-header-start.h' file not found
# include <bits/libc-header-start.h>
# ^~~~~~~~~~~~~~~~~~~~~~~~~~
#
# The HostIsolation probes include linux/bpf.h (copied into the libbpf repo) which includes
# linux/types.h (also copied into the libbpf repo) which includes stdint.h. The clang stdint.h
# includes bits/libc-header-start.h which is not in our include path. The correct one to use
# depends on which arch we're compiling for.
ifeq ($(ARCH),x86_64)
export C_INCLUDE_PATH = /usr/include/x86_64-linux-gnu
else
export C_INCLUDE_PATH = /usr/aarch64-linux-gnu/include
endif

build-local:
mkdir -p ${BUILD_DIR}
C_INCLUDE_PATH=${C_INCLUDE_PATH} cmake -DUSE_BUILTIN_VMLINUX=True -B${BUILD_DIR} -S${PWD}
C_INCLUDE_PATH=${C_INCLUDE_PATH} make -C${BUILD_DIR}
export CC=${ARCH}-linux-gnu-gcc
export CXX=${ARCH}-linux-gnu-g++
export AR=${ARCH}-linux-gnu-ar
export LD=${ARCH}-linux-gnu-ld

build:
docker run --rm -v${PWD}:${PWD} -w${PWD} ${BUILDER_PULL_TAG}
docker run --rm -v${PWD}:${PWD} -w${PWD} ${DOCKER_IMAGE}:${DOCKER_PULL_TAG} \
/usr/bin/env make _internal-build ARCH=${ARCH} EXTRA_CMAKE_FLAGS=${EXTRA_CMAKE_FLAGS}
sudo chown -fR ${USER}:${USER} ${BUILD_DIR}
@echo "\n++ Build Successful at `date` ++\n"

fix-permissions:
# Convenience target to pass -DCMAKE_BUILD_TYPE=Debug and -DCMAKE_C_FLAGS="-g -O0"
build-debug:
docker run --rm -v${PWD}:${PWD} -w${PWD} ${DOCKER_IMAGE}:${DOCKER_PULL_TAG} \
/usr/bin/env make _internal-build ARCH=${ARCH} EXTRA_CMAKE_FLAGS='-DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_FLAGS="-g -O0"'
sudo chown -fR ${USER}:${USER} ${BUILD_DIR}
@echo "\n++ Build Successful at `date` ++\n"

_internal-build:
mkdir -p ${BUILD_DIR}/
cmake ${EXTRA_CMAKE_FLAGS} ${CMAKE_FLAGS}
make -C${BUILD_DIR} -j$(shell nproc)

container:
docker build -t ${DOCKER_LOCAL_TAG} -f docker/Dockerfile.builder .

tag-container:
docker tag ${DOCKER_IMAGE}:${DOCKER_LOCAL_TAG} ${DOCKER_IMAGE}:$CURRENT_DATE_TAG
@echo "\n++ Tagged image as ${DOCKER_IMAGE}:${CURRENT_DATE_TAG} ++\n"

# We dockerize code formatting because differences in clang-format versions can
# lead to different formatting decisions. This way, everyone is using
# clang-format 14 (default in the Ubuntu jammy repos).
format:
find . \( -path ./contrib -o -path ./artifacts* \) -prune \
-o -name "*.c" -o -name "*.cpp" -o -name "*.h" -print | xargs /usr/bin/env clang-format -i
docker run --rm -v${PWD}:${PWD} -w${PWD} ${DOCKER_IMAGE}:${DOCKER_PULL_TAG} \
sh -c 'find ${FORMAT_DIRS} -name "*.cpp" -o -name "*.c" -o -name "*.h" -o -name "*.cpp" | xargs /usr/bin/env clang-format -i'

test-format:
find . \( -path ./contrib -o -path ./artifacts* \) -prune \
-o -name "*.c" -o -name "*.cpp" -o -name "*.h" -print | xargs /usr/bin/env clang-format --dry-run -Werror
docker run --rm -v${PWD}:${PWD} -w${PWD} ${DOCKER_IMAGE}:${DOCKER_PULL_TAG} \
sh -c 'find ${FORMAT_DIRS} -name "*.cpp" -o -name "*.c" -o -name "*.h" -o -name "*.cpp" | xargs /usr/bin/env clang-format -i --dry-run -Werror'

clean:
${SUDO} rm -rf artifacts-*
sudo rm -rf artifacts-*
Loading

0 comments on commit e9e5d74

Please sign in to comment.