Revamp Testing Infrastructure and run Multi-Kernel Tests in CI (#111)
* Fix invalid enum relocation

* Improve error logging on JSON unmarshal failure

  EventsTrace was going into a print "\n" loop on a misconfigured kernel, which was hard to diagnose with the current error logging.

* Fix broken probe_set_features logic

  See comment: libbpf_probe_bpf_prog_type will return true on both aarch64 and x86_64, as it checks for the ability to _load_, not the ability to _attach_, which is what will fail on aarch64.

* Rework multi-kernel-tester and run in CI

  - Move the core initramfs and init process logic to Bluebox
  - Set up the whole thing to run in CI with mainline kernels v5.11-v5.18
  - Redo testing/README.md for new changes

* Add debug build target

* Clarify format / test-format targets, format code

* Disable fail-fast in multikernel test action

* Clarify and clean up formatting/build CI workflow

* Dump contents of tracefs trace file on test fail

  This file contains the output of all bpf_printk calls, which often encompasses a wealth of information when probes fail to do something.

* Fix typo in README.md

  Co-authored-by: Nicholas Berlin <[email protected]>

* Fix test failures on Linux 5.11/aarch64

  We were attempting to dereference a struct dentry into a struct dentry *, which resulted in garbage data. Add FUNC_ARG_READ_PTREGS_NODEREF to skip the dereference step and use it to fix the test failures.

* Fix broken build-debug target in Makefile

  Additionally, remove nonexistent .PHONY targets and simplify build-debug by removing the _internal-build-debug target.

* Remove -DENABLE_BPF_PRINTK from CI builds

  Enabling bpf_printk has a huge performance penalty in the emulated VMs running in CI. Running aarch64 tests on Linux 5.11 without bpf_printk takes ~1m30s, while the same run with bpf_printk takes ~2m10s. While this is larger than I'd expect, I can see it happening on an emulated VM: every bpf_printk call corresponds to a lot of extra instructions that have to be emulated. Disable the flag.

  This reduces the debug info present in CI artifacts, but if tests fail, devs should be able to run the failing test locally with -DENABLE_BPF_PRINTK to get debug output.

* Improve logging on test failure

  - Log stacktrace to stdout so logs are all serialized
  - Log banner warning that the trace file is always empty on CI
    - I can see this eating a day of a dev's time otherwise :P

* Fix BPF tramp detection, add multi-kernel tests

  This logic should properly detect a lack of BPF trampoline support on x86 kernels that don't have it enabled (unlike the previous simple arch check). It's also incredibly subject to change in a new kernel release (what if taskstats_exit disappears?), so add a multi-kernel test for feature probing. This should be nicely extensible to other features in the future.

* Document unintuitive API

* Add comment to TestFeaturesCorrect RE: x86

* Dockerize code formatting Makefile target

  Differences in clang-format versions were causing a headache here. clang-format 13 was fine with the code as-is, but clang-format 11 would make this change:

      diff --git a/non-GPL/TcLoader/TcLoader.c b/non-GPL/TcLoader/TcLoader.c
      index 9a8bc0c..1ed7c1d 100644
      --- a/non-GPL/TcLoader/TcLoader.c
      +++ b/non-GPL/TcLoader/TcLoader.c
      @@ -371,10 +371,10 @@ static int netlink_qdisc(int cmd, unsigned int flags, const char *ifname)
       int rv = -1;
       struct rtnetlink_handle qdisc_rth = {.fd = -1};
       struct netlink_msg qdisc_req = {
      - .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
      - .n.nlmsg_flags = NLM_F_REQUEST | flags,
      - .n.nlmsg_type = cmd,
      - .t.tcm_family = AF_UNSPEC,
      + .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
      + .n.nlmsg_flags = NLM_F_REQUEST | flags,
      + .n.nlmsg_type = cmd,
      + .t.tcm_family = AF_UNSPEC,
       };
       if (!ifname) {

  This seems to be a result of the AlignConsecutiveAssignments directive in .clang-format (removing it makes the two versions agree). Dockerize the format step to cleanly solve the problem for all such cases in the future.

* Fix incorrect docker tag

  s/_TAG/_PULL_TAG/g

  Should be using the remote instead of the local.

* Fix clang-format not running in a container

  find was running in the container, but clang-format was not, due to how the command was being interpreted. Aargh.

* Fix incorrect script args in comment

  Co-Authored-By: Florian Lehner <[email protected]>

* Remove time estimate in README

  Co-Authored-By: Florian Lehner <[email protected]>

* Clean up run_tests.sh arguments

  -d really should have allowed users to pass individual images if they want something more specific tested. Also remove old information from usage text and update for new -k arg.

* Add note about debootstrap to builder script

  Co-Authored-By: Florian Lehner <[email protected]>

* Fix improper getopts usage

  An option cannot have > 1 argument. Just put the list of kernel images at the end (no specific flag required).

* Fix incorrect variable name

* Fix faulty bash list logic

* Change BPFTOOL_VERSION to LINUX_TOOLS_VERSION

  Co-Authored-By: Florian Lehner <[email protected]>

* Add gen_initramfs.sh script, update debug docs

* Greatly clean up bash scripts

  - Use readonly/local
  - Make conditionals more concise with || and &&
  - Pull everything into functions, no naked code
  - Standardize on lowercase for local vars
  - Make command line arg format consistent
  - Rename run_single_test.sh to invoke_qemu.sh, pull out all test logic for better separation of concerns
  - Update README.md to suggest usage of invoke_qemu.sh instead of directly invoking QEMU
  - Add ~/go/bin to PATH in bash instead of workflow YAML
  - Greatly clean up GNU parallel invocation

* Fix caching logic in GH actions workflow

  Co-Authored-By: Lovel Rishi <[email protected]>

* Fix nonspecific restore key

  We could have restored the set of aarch64 kernels in the x86 tests with the workflow as it was.

* Remove redundant apt-get update

  Co-Authored-By: Lovel Rishi <[email protected]>

* Clean up Makefile

  - Add tag-container target and clean up / separate docker related vars
  - Remove DOCKER_IMG_UBUNUT_VERSION

* Fix broken test-format target

* Add section to README.md on userspace debugging

  Co-Authored-By: Florian Lehner <[email protected]>

* Add missing arg to gen_initramfs.sh help

* Remove KVM section from README.md

  We're not currently using it in CI, and all dev work on my end has been done without it. We can discuss enabling it if/when we start using a custom runner, but for now, remove the clutter.

* Fixups to scripts/invoke_qemu.sh

  - Make KVM probing an option, turned off by default (it seems to break debugging on my machine, and provides no real benefit anyways nested in a VBox VM)
  - Only provide one -append flag to QEMU; it ignores all but the last, which caused the -d option to not work

* Add missing -o to find invocation in Makefile

Co-authored-by: Nicholas Berlin <[email protected]>
Co-authored-by: Florian Lehner <[email protected]>
Co-authored-by: Lovel Rishi <[email protected]>
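
For reference, the getopts fix described above (an option takes at most one argument, so the list of kernel images goes at the end with no flag) could look roughly like the sketch below. This is not the actual run_tests.sh code: the function name parse_args, the -a flag, and the variables arch, kernel_images and image_count are all hypothetical and only illustrate the pattern.

```shell
#!/bin/sh
# Hypothetical sketch: names and flags are illustrative, not the real script's.

parse_args() {
    OPTIND=1        # reset so the function can be called more than once
    arch="x86_64"   # hypothetical default

    # Leading ":" selects silent error reporting; "a:" means -a takes
    # exactly one argument (getopts cannot bind more than one).
    while getopts ":a:" opt; do
        case "${opt}" in
            a) arch="${OPTARG}" ;;
            :) echo "option -${OPTARG} requires an argument" >&2; return 1 ;;
            *) echo "unknown option -${OPTARG}" >&2; return 1 ;;
        esac
    done

    # Drop the parsed options; whatever remains is the image list.
    shift $((OPTIND - 1))
    image_count=$#
    kernel_images="$*"   # a bash script would more likely keep an array
}

parse_args -a aarch64 vmlinuz-5.11 vmlinuz-5.18
echo "arch=${arch} images=${image_count}"   # prints "arch=aarch64 images=2"
```

Keeping the variable-length list as trailing positional arguments avoids inventing a flag that would need several values, which getopts has no way to express.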