Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PPing minor userspace fixes #57

Merged
merged 7 commits into from
Nov 8, 2022

Conversation

simosund
Copy link
Contributor

@simosund simosund commented Sep 5, 2022

A few unrelated minor fixes to the userspace component of ePPing. Threw them together in the same PR to avoid lots of tiny PRs.

  • The first commit fixes an issue where the clsact would not be cleaned up if using tc mode (--ingress-hook tc)
  • The second commit adds a small hint suggesting trying a newer kernel or using tc mode if it fails loading the XDP program. This addresses Unclear kernel requirements for ePPing #49.
  • The third commit simply adds a couple of calls to bfp_object__close() on shutdown/error. I'm not sure if this is really needed when the program ends directly after anyways, but most recent examples seem to do it so figured I'd better do it as well in case it cleans up some resources that's not automatically freed by the kernel when the program ends.

As for the second commit, I'm not sure if it's the best way to address #49. Simone suggested to automatically try to load and attach the tc ingress program in case the XDP one fails. While this should be possible (although a bit more of a hassle) and in some cases quite convenient, I'm not sure if automatically changing user-specified arguments is a good idea (even if it's not done silently) as it might be easy to miss unless paying close attention to stderr, especially if ePPing is started via ex. a script.

A somewhat related discussion is if we should perhaps always try to load the XDP program in native mode (XDP_MODE_NATIVE), instead of in unspecified mode (XDP_MODE_UNSPEC) which will (silently) fall back on generic XDP if there's no driver support. From some old discussions with Jesper I got the impression that the performance of XDP generic is rather lackluster, and it probably being better to run with tc than with generic XDP. If that's the case I'm not sure if it makes any sense to allow running in XDP generic mode (and also being unclear to the user whether it's is running in native or generic).

Finally, should XDP mode really be the default? It's less likely to work (due to requiring a more recent kernel), does (still not?) work with ex. GRO and may offer worse performance than tc if running in generic mode. Even if it's running in native XDP mode do we really expect it to have much of a performance benefit here anyways? As ePPing only monitors the packets it's always letting all packets through to the kernel anyways (always return XDP_PASS), so we don't really benefit from being able to create a fastpath that bypasses the kernel like some other XDP use cases might. So perhaps it makes more sense to have tc as the default instead (which can be changed to XDP by passing -I/--ingress-hook xdp)?

Copy link
Member

@tohojo tohojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it makes more sense to just default to TC mode, and also only use XDP in native mode :)

@simosund simosund force-pushed the pping-minor-userspace-fixes branch from 2ee87a1 to 7830efd Compare November 3, 2022 17:35
@simosund
Copy link
Contributor Author

simosund commented Nov 7, 2022

So, I've expanded this PR a bit. A brief overview of the commits as of now:

  • The first commit fixes an issue where the clsact qdisc would not be cleaned up if using tc ingress (-I/--ingress-hook tc).
  • The second commit #defines the BPF program names in the user space loader to avoid them being hardcoded in multiple places. The idea is that if the program names were to change, it might lead to (potentially sneaky) errors if the program names are not correctly updated in all the places. Ideally both the BPF progs and the user space side would use names from a single place (ex. defined in pping.h), but I'm not aware of a way to use a single definition that works both as a (function) identifier and as a string.
  • The third commit changes ePPing to use the tc ingress hook instead of XDP as default (as TC should be more likely to work correctly, and for our use case we shouldn't expect XDP to provide a considerable performance advantage)
  • The fourth commit adds the option to chose which mode to load the XDP program in (set to native by default). I'm not sure if this option is really needed (can't see much of a use case with running ePPing with XDP in any other mode than native). I've also left "hardware" mode out as I suspect that will not work well with the rest of ePPing (still need to share maps tc egress etc.), although I know too little about hardware offloaded XDP to say if this is really the case. Finally, as this commit does two things (adds the option and changes default from unspecified to native) it should arguably be two separate commits, but figured that this PR already contains plenty of small commits and these were highly related.
  • The fifth commit simply adds an additional error message/hint in case the XDP load fails. After the change to use tc ingress as default, I'm not sure to what degree this is needed anymore. I suspect it could also be improved by checking the actual error code to distinguish between different errors instead of the very simple assumption it does right now based on XDP mode.
  • The sixth commit adds calls to bpf_object__close on error/shutdown. Not sure how necessary this is as I would assume the kernel would free up most resources when the program terminates anyways, but as other examples do so I assume it's good practice.
  • The seventh commit aborts the program if the periodical map cleanup fails. While a single failure to run the cleanup might not be too problematic, in case the cleanup fails constantly the maps will eventually fill up with stale entries. This would leave ePPing in a state where it still adds overhead due to processing every packet, but being unable to will be unable to provide any measurements as it will be unable to add new flows/timestamps. Therefore, it probably makes more sense to just abort the program as soon as the cleanup fails rather than letting it keep running and hope it recovers.

@tohojo
Copy link
Member

tohojo commented Nov 7, 2022

The changes LGTM, but seems the PR needs a rebase...

The userspace loader would only check if the tc clsact was created
when the egress program was loaded. Thus, if the ingress program
created the clsact the egress program would not have to create the
clsact, the ePPing would thus falsely believe it did not create a
clsact and fail to remove it on shutdown even if --force was used. Fix
this by checking if either ingress or egress created clsact.

This bug was introduced as a sneaky side effect of commit
78b45bd (pping: Use libxdp to load
and attach XDP program). Before this commit the egress program (for
which there is only a tc alternative) would be loaded first, and thus
it was sufficient to check if it created the clsact. When switching to
libxdp however, the ingress program (specifically the XDP program) had
to be loaded first, and thus the order of loading ingress and egress
program were swapped. Therefore, it was no longer sufficient to only
check the egress program as the tc ingress program may have created
the clsact before the the egress program is attached (and only
checking the ingress program would also not be enough as the tc
ingress program may never be loaded if XDP mode is used instead).

Signed-off-by: Simon Sundberg <[email protected]>
Define the BPF program names in the user space component. The strings
corresponding to the BPF program names were before inserted in several
places, including in multiple string comparison, which is error prone
and could leave to subtle errors if the program names are changed and
not updated correctly in all places. With the program name string
being defined, they only have to be changed in a single place.

Currently only the names of the ingress programs occur in multiple
places, but also define the name for the egress program to be
consistent.

Note that even after this change one has the sync the defined values
with the actual program names declared in the pping_kern.c
file. Ideally, these would all be defined in a single place, but not
aware of a convenient way to make that happen (cannot use the defined
strings as function names as they are not identifiers, and if defined
as identifiers instead it would not be possible to use them as
strings).

Signed-off-by: Simon Sundberg <[email protected]>
Using the XDP ingress hook requires a newer kernel (needs Toke's patch
fixing the verification of global function for BPF_PROG_TYPE_EXT
programs) than tc mode, is will likely perform worse than tc if
running in generic mode (due to no driver support for
XDP). Furthermore, even when XDP works and has driver support, its
performance benefit over tc is likely small as the packets are always
passed on to the network stack regardless (not creating a fast-path
that bypasses the network stack). Therefore, use the tc ingress hook
as default instead, and only use XDP if explicitly required by the
user (-I/--ingress hook xdp).

This partly addresses issue xdp-project#49, as ePPing should no longer by default
get the confusing error message from failing verification if the
kernel lacks Toke's verifier patch.

Signed-off-by: Simon Sundberg <[email protected]>
Add an option to let the user configure which mode to load the XDP
program in (unspecified, native or generic).

Set the default mode to native (was unspecified previously) as that is
what the user most likely wants to use (generic or unpsecified falling
back on generic will likely have worse performance).

Signed-off-by: Simon Sundberg <[email protected]>
Due to a kernel bug for XDP programs loaded via libxdp that use global
functions (see https://lore.kernel.org/bpf/[email protected]/t/),
XDP mode only works on relatively recent kernels where the bug is
patched (or kernels where the patch has been backported). As many
users may not have such a recent kernel they will only see a confusing
verifier error like the following:

Starting ePPing in standard mode tracking TCP on test123
libbpf: elf: skipping unrecognized data section(7) xdp_metadata
libbpf: elf: skipping unrecognized data section(7) xdp_metadata
libbpf: prog 'pping_xdp_ingress': BPF program load failed: Invalid argument
libbpf: prog 'pping_xdp_ingress': -- BEGIN PROG LOAD LOG --
Func#1 is safe for any args that match its prototype
Validating pping_xdp_ingress() func#0...
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
; int pping_xdp_ingress(struct xdp_md *ctx)
0: (bf) r6 = r1                       ; R1=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0)
1: (b7) r7 = 0                        ; R7_w=invP0
; struct packet_info p_info = { 0 };
2: (7b) *(u64 *)(r10 -8) = r7         ; R7_w=invP0 R10=fp0 fp-8_w=00000000
3: (7b) *(u64 *)(r10 -16) = r7        ; R7_w=invP0 R10=fp0 fp-16_w=00000000
4: (7b) *(u64 *)(r10 -24) = r7        ; R7_w=invP0 R10=fp0 fp-24_w=00000000
5: (7b) *(u64 *)(r10 -32) = r7        ; R7_w=invP0 R10=fp0 fp-32_w=00000000
6: (7b) *(u64 *)(r10 -40) = r7        ; R7_w=invP0 R10=fp0 fp-40_w=00000000
7: (7b) *(u64 *)(r10 -48) = r7        ; R7_w=invP0 R10=fp0 fp-48_w=00000000
8: (7b) *(u64 *)(r10 -56) = r7        ; R7_w=invP0 R10=fp0 fp-56_w=00000000
9: (7b) *(u64 *)(r10 -64) = r7        ; R7_w=invP0 R10=fp0 fp-64_w=00000000
10: (7b) *(u64 *)(r10 -72) = r7       ; R7_w=invP0 R10=fp0 fp-72_w=00000000
11: (7b) *(u64 *)(r10 -80) = r7       ; R7_w=invP0 R10=fp0 fp-80_w=00000000
12: (7b) *(u64 *)(r10 -88) = r7       ; R7_w=invP0 R10=fp0 fp-88_w=00000000
13: (7b) *(u64 *)(r10 -96) = r7       ; R7_w=invP0 R10=fp0 fp-96_w=00000000
14: (7b) *(u64 *)(r10 -104) = r7      ; R7_w=invP0 R10=fp0 fp-104_w=00000000
15: (7b) *(u64 *)(r10 -112) = r7      ; R7_w=invP0 R10=fp0 fp-112_w=00000000
16: (7b) *(u64 *)(r10 -120) = r7      ; R7_w=invP0 R10=fp0 fp-120_w=00000000
17: (7b) *(u64 *)(r10 -128) = r7      ; R7_w=invP0 R10=fp0 fp-128_w=00000000
18: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
;
19: (07) r2 += -128                   ; R2=fp-128
; if (parse_packet_identifer_xdp(ctx, &p_info) < 0)
20: (85) call pc+13
R1 type=ctx expected=fp
Caller passes invalid args into func#1
processed 206542 insns (limit 1000000) max_states_per_insn 32 total_states 13238 peak_states 792 mark_read 40
-- END PROG LOAD LOG --
libbpf: failed to load program 'pping_xdp_ingress'
libbpf: failed to load object 'pping_kern.o'
Failed attaching ingress BPF program on interface test123: Invalid argument
Failed loading and attaching BPF programs in pping_kern.o

To help users that run into this issue when loading the program in
generic or unspecified mode, add a small hint suggesting to
upgrade the kernel or use the tc ingress mode instead in case
attaching the XDP program fails.

However, if loaded in native mode, instead give the suggestion to try
loading in generic mode instead. While libbpf and libxdp already add
some messages hinting at this, this hint clarifies how to do this with
ePPing (using the --xdp-mode argument).

Signed-off-by: Simon Sundberg <[email protected]>
Previously the program would only print out an error message if the
cleanup of a map failed, and then keep running. Each time the
periodical cleanup failed the error message would be repeated, but no
further action taken. Change this behavior so that the program instead
terminates the cleanup thread and aborts the rest of the program.

Signed-off-by: Simon Sundberg <[email protected]>
@simosund simosund force-pushed the pping-minor-userspace-fixes branch from 7830efd to 91d7242 Compare November 8, 2022 08:57
@simosund
Copy link
Contributor Author

simosund commented Nov 8, 2022

Yes, had not rebased it after you #55 and #62, but fixed that now.

@tohojo tohojo merged commit 5e5c557 into xdp-project:master Nov 8, 2022
@simosund simosund deleted the pping-minor-userspace-fixes branch September 11, 2023 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants