-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pping #7
Add pping #7
Conversation
A few high-level comments:
|
Thank you for your comments @tohojo. I will probably be busy with some other things the next couple of days, but after that I'll get back to addressing these points. |
Alright, that's totally fine of course. You can just force-push to the existing branch to update the PR, but please leave a comment as well when you do that - Github doesn't always send out notifications on branch updates... |
f7b4d04
to
8981833
Compare
Hi @tohojo,
Any advise on how to handle this? My current workaround is to simply take the fd of the first map from the object, but that does not feel like a very good long term solution. |
libbpf will reload the name from the kernel when reusing the fd, so if there's no name that would be because your kernel doesn't have one? Which kernel version are you running? And what output does |
I'm running kernel version 5.4.0-60 (under Ubuntu 20.04.1LTS), and the output of bpftool map is:
Also tried it on a VM running Ubuntu 20.10 with kernel version 5.8.0-38, with the same result. I tc version |
Simon Sundberg <[email protected]> writes:
> libbpf will reload the name from the kernel when reusing the fd, so
> if there's no name that would be because your kernel doesn't have
> one? Which kernel version are you running? And what output does
> `bpftool map` show?
I'm running kernel version 5.4.0-60 (under Ubuntu 20.04.1LTS), and the output of bpftool map is:
```
1: hash flags 0x0
key 16B value 16B max_entries 16384 memlock 1576960B
7: perf_event_array name rtt_events flags 0x0
key 4B value 4B max_entries 8 memlock 4096B
```
Also tried it on a VM running Ubuntu 20.10 with kernel version 5.8.0-38, with the same result.
I tc version `tc-utility, iproute2-ss200127` (and `tc-utility,
iproute2-ss200602` on the VM) if that matters, as the map is pinned by
tc.
Ah, 'tc' doesn't have BTF support so it can't read the name of a
BTF-defined map. With the changes you made it shouldn't be necessary to
pin it with TC, though. You can just call bpf_map__pin() and libbpf will
pin it in the same path as it would reuse it from. You can check with
bpf_map__is_pinned() if it's necessary to pin or if it already was.
Alternatively, you can build the latest iproute2-next from git which has
libbpf support - then you should get full BTF support in tc as well :)
|
No, my tc (or iproute2) version doesn't have libbpf support, so therefore the tc-bpf program doesn't use a BTF-defined map but rather But you are completely right in that I don't need to pin the map with TC. I tried simply switching order so that the XDP-program which is loaded with libbpf is loaded first (and thus pins the map if it doesn't already exist), and then TC gets to reuse that map instead and the name is preserved. I'm not sure if I understand why I would need to "manually" pin the map with bpf_map__pin() though. Just leaving libbpf to handle that automatically based on the pinning attribute in the BTF-defined map seems to work fine? As far as I understand it, libbpf handles the automatic pinning similarly to iproute2, in that it will automatically reuse the map if it already exists, or otherwise pin it. |
Simon Sundberg <[email protected]> writes:
> Ah, 'tc' doesn't have BTF support so it can't read the name of a
BTF-defined map. With the changes you made it shouldn't be necessary to
pin it with TC, though. You can just call bpf_map__pin() and libbpf will
pin it in the same path as it would reuse it from. You can check with
bpf_map__is_pinned() if it's necessary to pin or if it already was.
No, my tc (or iproute2) version doesn't have libbpf support, so
therefore the tc-bpf program doesn't use a BTF-defined map but rather
`struct bpf_elf_map`. I guess I could do something similar as in
[traffic-pacing-edt/edt_pacer_vlan.c](https://github.com/xdp-project/bpf-examples/blob/master/traffic-pacing-edt/edt_pacer_vlan.c#L61-L84)
to use BTF-defined maps if they're supported.
But you are completely right in that I don't need to pin the map with
TC. I tried simply switching order so that the XDP-program which is
loaded with libbpf is loaded first (and thus pins the map if it
doesn't already exist), and then TC gets to reuse that map instead and
the name is preserved.
Yeah, just swapping the load order is probably the easiest :)
I'm not sure if I understand why I would need to "manually" pin the
map with bpf_map__pin() though. Just leaving libbpf to handle that
automatically based on the pinning attribute in the BTF-defined map
seems to work fine? As far as I understand it, libbpf handles the
automatic pinning similarly to iproute2, in that it will automatically
reuse the map if it already exists, or otherwise pin it.
No, you're right, of course, that was me misremembering how this works.
Just using the pinning attribute should be fine!
|
Hi again @tohojo. As of commit 5a40220 I think I've addressed all points from your initial high-level comments, with exception of using libxdp (think I'll wait with that until you've made it easy to include from this repository). So if you have the time, it would be great if you could take another look at this and see if there are any other issues. Otherwise I'll probably start working on adding IPv6 support. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall in pretty good shape; these are mostly smallish nits and style issues :)
@@ -0,0 +1,85 @@ | |||
#!/bin/bash | |||
# | |||
# Author: Jesper Dangaaard Brouer <[email protected]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably add some indication here that you modified the script (which I'm assuming you did?); either adding yourself, or a comment that you modified it, or turning the 'author' into a series of copyright lines. @netoptimizer any opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a comment here that I've made extended it by adding a "section" option (and have also changed a default fallback value, although this comment doesn't state that, but could be rectified).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely just a copy of my script.
My plan is to generalize this script and place it one level up.
It is ugly that we have several copies of the same script, but it is fine for this PR... I should be assigned to clean it up (maybe create a github issue to track this?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #9 for this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a few runtime issues as well, now that I ran out of code style issues ;)
Also, a couple of points about commit messages:
- The git formatter expects a completely blank line between the subject (first line) and the rest of the text.
- Please break lines at 80 characters so it doesn't look weird when looking at 'git log' in a console.
- Try to keep the language imperative in the changes of the code ("change something" instead of "I changed something" or "this patch changes something")
So instead of:
Minor refactor and fixed weird spacing
Have refactored tc_bpf_load and tc_bpf_clear to use a common run_program function which does the fork+execv.
Have also fixed some weird spacing in pping_helpers.h, and ran clang-format with the kernel source tree .clang-format on the other source files which resultet in some minor changes.
Signed-off-by: Simon Sundberg <[email protected]>
this could be something like:
Minor refactor and whitespace fixes
Refactor tc_bpf_load and tc_bpf_clear to use a common run_program
function which does the fork+execv.
Also fix some weird spacing in pping_helpers.h, and fix some formatting
issues, using clang-format with the kernel source tree .clang-format on the
whole tree.
Signed-off-by: Simon Sundberg <[email protected]>
So, would you like me to rebase in order to rewrite the commit messages in this style, or is messing with the history at this point an even worse sin? |
Yeah, please go ahead and rewrite history. As long as it is pre-merge this is fine; and in fact often encouraged. The final version of the commits is what will be preserved in the history for posterity, so the important thing is the readability of those. For kernel submissions this also means folding fixes into the original patches. We're not as strict with that in this repository (as you can probably tell by looking at the history), so you don't have to do that for this PR. But fixing the commit messages is more important as that will go into the history, and right now If you do want to play with reshuffling commits and folding fixes into other patches (for this or for future series), I can recommend Oh, and if you're not already using |
Hi @tohojo, Have also added the check for if the program is run as root or not (stealing your code snippet) and attempt to create the /sys/fs/bpf/tc folder before the map is pinned. So feel free to take another look at it and see if there's anything else that needs fixing (or if the fixes themselves need fixing...). |
Also, thanks for the tip about |
Looks much better! But yeah, now that you mention it, prefixing the subjects with
Other than the above and the comment on |
Can rewrite the commits prepending |
Add a XDP program to parse TCP timestamps and a simple loader Signed-off-by: Simon Sundberg <[email protected]>
Signed-off-by: Simon Sundberg <[email protected]>
The XDP program pushes the calculated RTTs to userspace through the perf-buffer and the userspace program polls it to print them out Signed-off-by: Simon Sundberg <[email protected]>
Split and rename files so there is one userspace program (pping) and two kernel-space ones (one for XDP and one for TC-BPF). Copy the shell script for loading the TC-BPF program from traffic-pacing-edt folder, but add support for loading a specific section. The XDP and TC-BPF programs do not share the ts_start map, so program does not work. Signed-off-by: Simon Sundberg <[email protected]>
Make tc pin the ts_start map when loading the TC-BPF program, and rewrite XDP loader to reuse map pinned by tc. Also add comment with TODO list in pping.c. Testing pping by adding a delay through a netem qdisc in the test environment shows that the reported RTT will approach 100ms for any delay lower than 100ms, but the correct RTT for any delay over 100ms. Root cause is unknown, but Pollere's original pping implementation (as well as a bpftrace based pping implementation) shows the same issue. This issue has not been observed when running on real interfaces without a netem qdisc. Signed-off-by: Simon Sundberg <[email protected]>
Format the code using the .clang-format from the kernel source tree, with a few manual tweaks here and there. Also, remove the TODO list from comment of pping.c and instead put it in TODO.md. Signed-off-by: Simon Sundberg <[email protected]>
Format the header files in the Linux kernel style (missed in previous commit). Also fix a formating error in TODO.md that cause empty checkboxes to not display correctly. Signed-off-by: Simon Sundberg <[email protected]>
Make loader use libbpf's existing functionality for reusing pinned maps. The name for map not kept by tc, so cannot get fd of map by name. Use fd of first encountered map as temporary workaround. Signed-off-by: Simon Sundberg <[email protected]>
Switch order so XDP program loads first, so the ts_start map is automatically pinned by libbpf (solves issue with tc not preserving the name of the map). Unload the TCP-BPF program (or rather remove the entire clsact qdisc it is attached to) using bpg_egress_loader script once program exits. Also unpin ts_start map on program shutdown. Signed-off-by: Simon Sundberg <[email protected]>
Copy setup from traffic-pacing-edt to use BTF-defined map if configure detects that iproute2 has libbpf support, otherwise fall back on bpf_elf_map. Also fix a minor bug with setting default value for SEC in bpf_egress_loader.sh. Signed-off-by: Simon Sundberg <[email protected]>
Split the print statements for RTTs into two parts to avoid inet_ntoa overwriting one of the IP-addresses (causing both source and destitionation address to appear the same). Also flip the order of source and destination to be the same as Pollere's pping. Signed-off-by: Simon Sundberg <[email protected]>
Perform various fixes and tweaks: - Rename several defines to make them more informative - Remove unrolling of loop in BPF programs - Reuse defines for program sections between userspace and kernel space programs - Perform fork+exec to run bpf_egress_loader script instead of system() - Add comment to copied scripts indicating I've modified them - Add pping.h and pping_helpers.h as dependencies in Makefile Also, add a brief description of what PPing is and how it works to README Signed-off-by: Simon Sundberg <[email protected]>
Refactor tc_bpf_load and tc_bpf_clear to use a common run_program function which does the fork+execv. Enclose compound statement defines in parenthesis. Removed argument CLOCK_MONOTONIC from callers to parameterless function get_time_ns(). Also fix some weird spacing in pping_helpers.h, and fix some formatting issues, using clang-format with the kernel source tree .clang-format on the whole tree. Signed-off-by: Simon Sundberg <[email protected]>
Create the /sys/fs/bpf/tc folder if it does not exist. Also check if pping is run as root, otherwise inform user that it must run as root. Libbpf will attempt to create the /sys/fs/bpf/tc/globals directory when pinning the map, however it will not do so recursivly (so will fail if /sys/fs/bpf/tc does not exist). So as a temporary solution, attempt to create /sys/fs/bpf/tc (however, if sys/fs/bpf is not mounted this will still fail). Signed-off-by: Simon Sundberg <[email protected]>
So @tohojo, now I've finally gotten around to refactoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, looks good - nice work! :)
This is the initial version of pping that I showed @tohojo and @netoptimizer yesterday (but with a little bit of cleanup). It only supports IPv4 at the moment (IPv6 support is on the TODO list).
Regarding the issue you saw with it reporting RTTs of around 100ms when adding just 30ms latency (on the veth0 interface in the test environment), I'm not sure what's causing it. After the first couple of packets it seems to approach 100ms for any delay <100ms, and the correct value for delays >100ms. I get the same issue when running Pollere's pping as well as my bpftrace pping implementation.