-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DSCP value to outgoing HTTP requests #464
base: main
Are you sure you want to change the base?
Conversation
Add ability to specify a DSCP value in IPv4/IPv6 header of outgoing HTTP requests Signed-off-by: Gary Lee <[email protected]>
We would like to add DSCP support to remote_write in Prometheus, specifically when it's running under Kubernetes. This PR is the first step. The change in prometheus will be small, something like:
Thanks |
Hi @garyclee, Thank you for the pull request. To be honest, I didn't know about this feature before. It seems very niche. However, after reviewing the code, I don't think this is a good idea to use A better way to achieve this might be to use a proxy server that can be configured to set the DSCP value in the IP header. What do you think about this approach? |
Thanks a lot for the tip about SetTOS() and SetTrafficClass(). I've started changing it and it looks promising. Would you be more inclined to accept the PR if I removed the the syscall? On platforms that don't support it, it will just ignore the DSCP option, and continue with DSCP = 0. Thanks |
Replace syscall with golang.org/x/net calls Signed-off-by: Gary Lee <[email protected]>
Remove whitespace Signed-off-by: Gary Lee <[email protected]>
The syscalls have been removed. I hope it's a little more acceptable. Using a proxy would be an option, but we would like to avoid it if possible, given it's relatively easy to do directly from Prometheus. Thanks a lot |
OpenShift supports setting specific DSCP for a pod's egress traffic: https://cloud.redhat.com/blog/using-qos-dscp-in-openshift-container-platform Perhaps standard k8s does also? Requiring the application to set TOS / DSCP may not work out that well, since setting some of the higher priorities will require CAP_NET_ADMIN (or equivalent on non-Linux platforms), which is why it usually handled at the kernel level via iptable / nftables packet mangling or tc. |
@dswarbrick I don't think setting TOS/DSCP require CAP_NET_ADMIN in k8s, I always configure DSCP (e.g. set to 63) and TCP KA (also one socket option) in istio ingress gateway, which default will drop all linux capabilities in securityContext. I also verified it in my docker image and k8s deployment with drop ALL. I captured the pkt to verify the DSCP value is correct. |
@hobbytp https://man7.org/linux/man-pages/man7/socket.7.html
These restrictions exist for a reason. On a multi-user system, if all applications marked all their traffic as highest priority, it would defeat the entire purpose of QoS. |
But I set DSCP by using following option level and name, instead of SO_PRIORITY. |
@hobbytp In that case... https://man7.org/linux/man-pages/man7/ip.7.html
|
I did some testing with SetTOS, and it looks like it successfully sets any desired TOS value, including high values like DSCP EF. Captured traffic confirms that the TOS value is being set as expected. The void __ip_sock_set_tos(struct sock *sk, int val)
{
if (sk->sk_type == SOCK_STREAM) {
val &= ~INET_ECN_MASK;
val |= inet_sk(sk)->tos & INET_ECN_MASK;
}
if (inet_sk(sk)->tos != val) {
inet_sk(sk)->tos = val;
sk->sk_priority = rt_tos2priority(val);
sk_dst_reset(sk);
}
} You would need to poke around in the mapping that Going back to the case SO_PRIORITY:
if ((val >= 0 && val <= 6) ||
sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) ||
sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
sk->sk_priority = val;
else
ret = -EPERM;
break; In summary, if you only care about the TOS bits being set and (presumably) honoured by downstream routers, then you can simply use SetTOS as @roidelapluie suggested. However, if you want to also ensure that the packets (or skbs, technically) are handled with high priority in the kernel, you may have to dig a little deeper. |
Thank dswarbrick for the testing and detail explanation from linux code level . It is quite clear and useful. #define TC_PRIO_BESTEFFORT 0
#define TC_PRIO_FILLER 1
#define TC_PRIO_BULK 2
#define TC_PRIO_INTERACTIVE_BULK 4
#define TC_PRIO_INTERACTIVE 6
#define TC_PRIO_CONTROL 7 //==>Only this one meanS SO_PRIORITY>6?
#define ECN_OR_COST(class) TC_PRIO_##class
const __u8 ip_tos2prio[16] = { //==>Key points: Seems all items <=6 ?
TC_PRIO_BESTEFFORT,
ECN_OR_COST(BESTEFFORT),
TC_PRIO_BESTEFFORT,
ECN_OR_COST(BESTEFFORT),
TC_PRIO_BULK,
ECN_OR_COST(BULK),
TC_PRIO_BULK,
ECN_OR_COST(BULK),
TC_PRIO_INTERACTIVE,
ECN_OR_COST(INTERACTIVE),
TC_PRIO_INTERACTIVE,
ECN_OR_COST(INTERACTIVE),
TC_PRIO_INTERACTIVE_BULK,
ECN_OR_COST(INTERACTIVE_BULK),
TC_PRIO_INTERACTIVE_BULK,
ECN_OR_COST(INTERACTIVE_BULK)
};
void ip_sock_set_tos(struct sock *sk, int val)
{
lock_sock(sk);
__ip_sock_set_tos(sk, val);
release_sock(sk);
}
void __ip_sock_set_tos(struct sock *sk, int val)
{
if (sk->sk_type == SOCK_STREAM) {
val &= ~INET_ECN_MASK;
val |= inet_sk(sk)->tos & INET_ECN_MASK;
}
if (inet_sk(sk)->tos != val) {
inet_sk(sk)->tos = val;
sk->sk_priority = rt_tos2priority(val); //==> so sk_priority always <=6?
sk_dst_reset(sk);
}
}
#define IPTOS_TOS_MASK 0x1E
#define IPTOS_TOS(tos) ((tos)&IPTOS_TOS_MASK)
static inline char rt_tos2priority(u8 tos)
{
return ip_tos2prio[IPTOS_TOS(tos)>>1]; //==>Seems only can return value <=6, e.g. DSCP EP(e.g. 46) will return 4
} |
Hi @dswarbrick Thanks a lot for the info regarding OpenShift. My colleage Hobby checked a number of CNIs, including Calico, Flannel, Weave Net, Cilium CNI and CNI-Genie. The only one that seems to support DSCP marking is ovn-kubernetes, as you pointed out, that is the default in OpenShift. So the only portable way to achieve this seems to be doing in Prometheus directly. Thanks |
To me it still sound like you are bringing more proof that this feature is too niche to be integrated upstream. |
Add ability to specify a DSCP value in IPv4/IPv6 header of outgoing HTTP requests