Add pping #7

Merged 14 commits on Feb 4, 2021
34 changes: 34 additions & 0 deletions pping/Makefile
@@ -0,0 +1,34 @@
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)

USER_TARGETS := pping
TC_BPF_TARGETS := pping_kern_tc
BPF_TARGETS := pping_kern_xdp
BPF_TARGETS += $(TC_BPF_TARGETS)

LDFLAGS += -pthread
EXTRA_DEPS += config.mk pping.h pping_helpers.h

LIB_DIR = ../lib

include $(LIB_DIR)/common.mk
include config.mk

all: config.mk

config.mk: configure
	@sh configure

ifndef HAVE_TC_LIBBPF
# If the iproute2 'tc' tool doesn't understand BTF debug info
# use llvm-strip to remove this debug info from object file
#
# *BUT* cannot strip everything as it removes ELF elems needed for
# creating maps
#
.PHONY: strip_tc_obj
strip_tc_obj: ${TC_BPF_TARGETS:=.o}
	$(Q) echo "TC doesn't support libbpf - strip BTF info"
	$(Q) llvm-strip --no-strip-all --remove-section .BTF $?

all: strip_tc_obj
endif
19 changes: 19 additions & 0 deletions pping/README.md
@@ -0,0 +1,19 @@
# PPing using XDP and TC-BPF
An implementation of the passive ping ([pping](https://github.com/pollere/pping)) utility based on XDP (for ingress) and TC-BPF (for egress).

## Simple description
Passive Ping (PPing) uses the TCP Timestamp option to calculate the RTT for TCP traffic passing through.
PPing can be used to measure RTTs on end hosts or on any device that sees both directions of the TCP flow.

For outgoing packets, it looks for the TCP timestamp value (TSval) in the TCP header. If it finds one, it records
the time at which it first saw that TSval in that particular flow. For incoming packets, it parses the TCP timestamp
echo reply (TSecr, which is the TSval echoed by the receiving host) and checks whether it has seen an outgoing packet
with that TSval. If it has, the RTT is calculated as the difference between the time it saw the outgoing packet
with that TSval and the time it received the incoming packet from the reverse flow with the matching TSecr.

Note that TCP timestamps may not be unique for every packet in a flow. PPing therefore only matches the first
outgoing packet with a particular TSval against the first incoming packet with a matching TSecr. Duplicate
TSval/TSecr values are ignored.
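
The matching rule described above can be illustrated with a small shell sketch. This is not the eBPF implementation from this PR: the function names (`record_egress`, `match_ingress`), the `ts_seen` string table and the millisecond timestamps are all hypothetical, and the flow identifier is assumed to be already normalized so that a flow and its reverse flow share the same key. It sticks to plain string operations so it does not need associative arrays.

```shell
# Hypothetical sketch of first-TSval / first-TSecr matching (not pping code).
# Pending timestamps are kept as "|flow:tsval=time_ms" entries in one string.
ts_seen=""
rtt_ms=""

# Outgoing packet: remember when a TSval was first seen on a flow.
record_egress() {
    local key="$1:$2" now_ms="$3"
    case "$ts_seen" in
        *"|$key="*) return 0 ;;   # duplicate TSval, keep the first timestamp
    esac
    ts_seen="$ts_seen|$key=$now_ms"
}

# Incoming packet: on the first matching TSecr, store the RTT in $rtt_ms and
# drop the entry so later duplicates are ignored. Returns 1 if nothing matches.
match_ingress() {
    local key="$1:$2" now_ms="$3" rest seen
    case "$ts_seen" in
        *"|$key="*) ;;
        *) return 1 ;;
    esac
    rest=${ts_seen#*"|$key="}     # text following the matching key
    seen=${rest%%"|"*}            # the stored egress timestamp
    ts_seen="${ts_seen%%"|$key="*}${rest#"$seen"}"
    rtt_ms=$(( $now_ms - $seen ))
}

record_egress flow1 100 1000   # first packet with TSval 100, seen at t=1000
record_egress flow1 100 1005   # same TSval again: ignored
match_ingress flow1 100 1030 && echo "RTT: ${rtt_ms} ms"   # prints "RTT: 30 ms"
```

Removing the entry on the first match is one simple way to make sure a repeated TSecr does not produce a second, inflated RTT sample.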

## Planned design
![Design of eBPF pping](./eBPF_pping_design.png)
18 changes: 18 additions & 0 deletions pping/TODO.md
@@ -0,0 +1,18 @@
# TODO

## For initial merge
- [x] Clean up commits and add signed-off-by tags
- [x] Add SPDX-license-identifier tags
- [x] Format C-code in kernel style
- [x] Use existing functionality to reuse maps by using BTF-defined maps
- [x] Use BTF-defined maps for TC-BPF as well if iproute has libbpf support

## Future
- [ ] Use libxdp to load XDP program
- [x] Cleanup: Unload TC-BPF at program shutdown, and unpin map - In userspace part
- [ ] Add IPv6 support - In TC-BPF, XDP and userspace part
- [ ] Check for existence of reverse flow before adding to hash-map (to avoid adding timestamps for flows that we can't see the reverse traffic for) - In TC-BPF part
  - This could miss the first few packets, which would not be ideal for short flows
- [ ] Keep track of minimum RTT for each flow (done by Pollere's pping, and helps identify buffer bloat) - In XDP part
- [ ] Add configurable rate-limit for how often each flow can add entries to the map (prevent high-rate flows from quickly filling up the map) - In TC-BPF part
- [ ] Improve map cleaning: Use a dynamic time to live for hash map entries based on flow's RTT, instead of static 10s limit - In TC-BPF, XDP and userspace
87 changes: 87 additions & 0 deletions pping/bpf_egress_loader.sh
@@ -0,0 +1,87 @@
#!/bin/bash
#
# Author: Jesper Dangaard Brouer <[email protected]>

Member:

You should probably add some indication here that you modified the script (which I'm assuming you did?): either adding yourself, or a comment that you modified it, or turning the 'author' into a series of copyright lines. @netoptimizer any opinion?

Contributor Author:

I have added a comment here stating that I've extended it by adding a "section" option. I have also changed a default fallback value; the comment doesn't state that yet, but this could be rectified.

Member:

This is likely just a copy of my script.
My plan is to generalize this script and place it one level up.
It is ugly that we have several copies of the same script, but it is fine for this PR... I should be assigned to clean it up (maybe create a GitHub issue to track this?)

Member:

Opened #9 for this...

# License: GPLv2
#
# Modified by Simon Sundberg <[email protected]> to add support
# of optional section (--sec) option and changed default BPF_OBJ
#
basedir=$(dirname "$0")
source "${basedir}/functions.sh"

root_check_run_with_sudo "$@"

# Use common parameters
source "${basedir}/parameters.sh"

export TC=/sbin/tc

# This can be changed via --file or --obj
if [[ -z ${BPF_OBJ} ]]; then
    # Fallback default
    BPF_OBJ=pping_kern_tc.o
fi

# This can be changed via --sec
if [[ -z ${SEC} ]]; then
    # Fallback default
    SEC=pping_egress
fi

info "Applying TC-BPF egress setup on device: $DEV with object file: $BPF_OBJ"

function tc_remove_clsact()
{
    local device=${1:-$DEV}
    shift

    # Removing qdisc clsact, also deletes all filters
    call_tc_allow_fail qdisc del dev "$device" clsact 2> /dev/null
}

function tc_init_clsact()
{
    local device=${1:-$DEV}
    shift

    # TODO: find method that avoids flushing (all users)

    # Also deletes all filters
    call_tc_allow_fail qdisc del dev "$device" clsact 2> /dev/null

    # Load qdisc clsact, which allows us to attach BPF-progs as TC filters
    call_tc qdisc add dev "$device" clsact
}

function tc_egress_bpf_attach()
{
    local device=${1:-$DEV}
    local objfile=${2:-$BPF_OBJ}
    local section=${3:-$SEC}
    shift 3

    call_tc filter add dev "$device" pref 2 handle 2 \
        egress bpf da obj "$objfile" sec "$section"
}

function tc_egress_list()
{
    local device=${1:-$DEV}

    call_tc filter show dev "$device" egress
}

if [[ -n $REMOVE ]]; then
    tc_remove_clsact "$DEV"
    exit 0
fi

tc_init_clsact "$DEV"
tc_egress_bpf_attach "$DEV" "$BPF_OBJ" "$SEC"

# Practical to list egress filters after setup.
# (It's a common mistake to have several progs loaded)
if [[ -n $LIST ]]; then
    info "Listing egress filter on device"
    tc_egress_list "$DEV"
fi
29 changes: 29 additions & 0 deletions pping/configure
@@ -0,0 +1,29 @@
#!/bin/bash
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
# This is not an autoconf generated configure
#

# Output file which is input to Makefile
CONFIG=config.mk

# Assume tc is in $PATH
TC=tc

check_tc_libbpf()
{
    tc_version=$($TC -V)
    if echo "$tc_version" | grep -q libbpf; then
        # Keep only the text after the last "libbpf " in the version string
        libbpf_version=${tc_version##*libbpf }
        echo "HAVE_TC_LIBBPF:=y" >> $CONFIG
        echo "BPF_CFLAGS += -DHAVE_TC_LIBBPF" >> $CONFIG
        echo "yes ($libbpf_version)"
    else
        echo "no"
    fi
}

echo "# Generated config" > $CONFIG
echo "Detecting available features on system"

echo -n " - libbpf support in tc tool: "
check_tc_libbpf
Binary file added pping/eBPF_pping_design.png
64 changes: 64 additions & 0 deletions pping/functions.sh
@@ -0,0 +1,64 @@
#
# Common functions used by scripts in this directory
# - Depending on bash 3 (or higher) syntax
#
# Author: Jesper Dangaard Brouer <[email protected]>
# License: GPLv2

## -- sudo trick --
function root_check_run_with_sudo() {
    # Trick so the program can be run as a normal user; it will just use "sudo".
    # Call as: root_check_run_with_sudo "$@"
    if [ "$EUID" -ne 0 ]; then
        if [ -x "$0" ]; then # Directly executable, use sudo
            echo "# (Not root, running with sudo)" >&2
            sudo "$0" "$@"
            exit $?
        fi
        echo "cannot perform sudo run of $0"
        exit 1
    fi
}

## -- General shell logging cmds --
function err() {
    local exitcode=$1
    shift
    echo -e "ERROR: $*" >&2
    exit "$exitcode"
}

function warn() {
    echo -e "WARN : $*" >&2
}

function info() {
    if [[ -n "$VERBOSE" ]]; then
        echo "# $*"
    fi
}

## -- Wrapper calls for TC --
function _call_tc() {
    local allow_fail="$1"
    shift
    if [[ -n "$VERBOSE" ]]; then
        echo "tc $*"
    fi
    if [[ -n "$DRYRUN" ]]; then
        return
    fi
    $TC "$@"
    local status=$?
    if (( status != 0 )); then
        if [[ -z "$allow_fail" ]]; then
            err 3 "Exec error($status) occurred cmd: \"$TC $*\""
        fi
    fi
}
function call_tc() {
    _call_tc "" "$@"
}
function call_tc_allow_fail() {
    _call_tc "allow_fail" "$@"
}
94 changes: 94 additions & 0 deletions pping/parameters.sh
@@ -0,0 +1,94 @@
#
# Common parameter parsing used by scripts in this directory
# - Depending on bash 3 (or higher) syntax
#
# Author: Jesper Dangaard Brouer <[email protected]>
# License: GPLv2
#
# Modified by Simon Sundberg <[email protected]> to add support
# of optional section (--sec) option
#

function usage() {
    echo ""
    echo "Usage: $0 [-vh] --dev ethX"
    echo " -d | --dev : (\$DEV) Interface/device (required)"
    echo " -v | --verbose : (\$VERBOSE) verbose"
    echo " --remove : (\$REMOVE) Remove the rules"
    echo " --dry-run : (\$DRYRUN) Dry-run only (echo tc commands)"
    echo " -s | --stats : (\$STATS_ONLY) Call statistics command"
    echo " -l | --list : (\$LIST) List setup after setup"
    echo " --file | --obj : (\$BPF_OBJ) BPF-object file to load"
    echo " --sec : (\$SEC) Section of BPF-object to load"
    echo ""
}

# Using external program "getopt" to get --long-options
OPTIONS=$(getopt -o vshd:l \
    --long verbose,dry-run,remove,stats,list,help,dev:,file:,obj:,sec: -- "$@")
if (( $? != 0 )); then
    usage
    err 2 "Error calling getopt"
fi
eval set -- "$OPTIONS"

## --- Parse command line arguments / parameters ---
while true; do
    case "$1" in
        -d | --dev ) # device
            export DEV=$2
            info "Device set to: DEV=$DEV" >&2
            shift 2
            ;;
        --file | --obj )
            export BPF_OBJ=$2
            info "BPF-object file: $BPF_OBJ" >&2
            shift 2
            ;;
        --sec )
            export SEC=$2
            info "Section to load: $SEC" >&2
            shift 2
            ;;
        -v | --verbose)
            export VERBOSE=yes
            # info "Verbose mode: VERBOSE=$VERBOSE" >&2
            shift
            ;;
        --dry-run )
            export DRYRUN=yes
            export VERBOSE=yes
            info "Dry-run mode: enable VERBOSE and don't call TC" >&2
            shift
            ;;
        --remove )
            export REMOVE=yes
            shift
            ;;
        -s | --stats )
            export STATS_ONLY=yes
            shift
            ;;
        -l | --list )
            export LIST=yes
            shift
            ;;
        -- )
            shift
            break
            ;;
        -h | --help )
            usage;
            exit 0
            ;;
        * )
            shift
            break
            ;;
    esac
done

if [ -z "$DEV" ]; then
    usage
    err 2 "Please specify net_device (\$DEV)"
fi