SecondaryNetwork breaks network connection of compute nodes #6448
Comments
This is very early during initialization (based on the logs), and at that point the uplink should not have been moved to the bridge yet, so there should still be connectivity and kube-proxy should be able to let you access the K8s API. Is it possible that there was a previous run of the Agent, with a different failure? I suppose there could have been an Agent crash after connecting the uplink to the bridge, in which case you could end up in this situation. |
It would be good to have the output of the following commands:
Also check the previous antrea-agent logs on the host. |
I am on it. But since every install breaks the network connection I want to script the creation of the debug setup. |
ip route
ip addr
ovs-vsctl show
|
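Since the plan above is to script the debug collection, here is a minimal sketch of what such a script might look like. It only wraps the three commands listed in this comment (`ip route`, `ip addr`, `ovs-vsctl show`). The labels, function names, and report layout are illustrative assumptions, not part of Antrea, and it assumes the commands are run somewhere `ovs-vsctl` is actually available.

```python
import subprocess
from typing import Callable, Dict, List

# The three commands from this thread. The dictionary keys are
# illustrative labels chosen for this sketch.
DEBUG_COMMANDS: Dict[str, List[str]] = {
    "ip-route": ["ip", "route"],
    "ip-addr": ["ip", "addr"],
    "ovs-vsctl-show": ["ovs-vsctl", "show"],
}


def run_command(argv: List[str]) -> str:
    """Run one command and return stdout plus stderr as text.

    A missing binary or a timeout is reported in the returned text
    instead of raising, so one failure does not abort the collection.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"failed to run {' '.join(argv)}: {exc}\n"
    return proc.stdout + proc.stderr


def collect_debug_output(
    runner: Callable[[List[str]], str] = run_command,
) -> Dict[str, str]:
    """Run every command in DEBUG_COMMANDS and map label -> output."""
    return {label: runner(argv) for label, argv in DEBUG_COMMANDS.items()}


def format_report(outputs: Dict[str, str]) -> str:
    """Render the collected outputs as one text report, one section per command."""
    sections = []
    for label, text in outputs.items():
        command_line = " ".join(DEBUG_COMMANDS[label])
        sections.append(f"$ {command_line}\n{text.rstrip()}\n")
    return "\n".join(sections)
```

Calling `print(format_report(collect_debug_output()))` on the affected node, ideally from a console that does not depend on the broken network path, would produce a single report that can be attached to the issue.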
root@node2:/var/log/antrea# cat antrea-agent.ERROR
root@node2:/var/log/antrea# cat antrea-agent.FATAL
|
@meibensteiner there are logs missing here, maybe you should provide a tarball of the entire directory contents. In your original post, the transport interface was named "eth0", but now it is showing as "enp0s1", so I assume this is a different testbed? Finally, the network configuration looks correct to me. Do you not have connectivity to your network gateway (192.168.64.1)? IIRC, the log |
I reran the test and got that tarball. In order to regain access to the node via ssh I needed to uninstall the rke2-agent entirely though. Just FYI. var-log-antrea.tar.gz This is a different testbed. Had to get dev environment working again. My two nodes seem to both get the PodCIDR populated:
The network gateway (192.168.64.1) is also unreachable by ping. |
@hongliangl will be looking at this issue. |
@meibensteiner Thanks for providing the logs. After looking into them, I found the following:
More time is needed for further investigation. |
The root cause is that antrea-agent does not remove flow-restore-wait="true" when attaching the uplink to the secondary OVS bridge, so the “normal” flow on the secondary OVS bridge cannot forward packets between the OVS internal port and the uplink as expected. Because the Node IP NIC is the same physical NIC used for the secondary network, the Node loses its connections to kube-apiserver and antrea-controller, and these network errors cause antrea-agent to stop itself. The agent therefore never gets a chance to remove flow-restore-wait="true" from the Open_vSwitch configuration, so the OpenFlow flows never work as expected. A patch has been created to resolve the issue: #6504 |
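To check whether a node is stuck in the state described above, one could inspect the `flow-restore-wait` key in the `other_config` column of the `Open_vSwitch` table. The sketch below is a manual diagnostic only, not the fix from #6504. The function and constant names are hypothetical. The two `ovs-vsctl` invocations use the standard `get` and `remove` database commands, and the parser accepts the value with or without surrounding quotes.

```python
from typing import List

# Command that reads the key; with --if-exists, a missing key yields
# an empty line instead of an error.
GET_FLOW_RESTORE_WAIT: List[str] = [
    "ovs-vsctl", "--if-exists", "get",
    "Open_vSwitch", ".", "other_config:flow-restore-wait",
]

# Command that removes the key. Shown for reference only: whether it is
# safe to run by hand on a live node is an assumption to verify first.
REMOVE_FLOW_RESTORE_WAIT: List[str] = [
    "ovs-vsctl", "remove",
    "Open_vSwitch", ".", "other_config", "flow-restore-wait",
]


def is_flow_restore_wait_set(get_output: str) -> bool:
    """Interpret the text printed by GET_FLOW_RESTORE_WAIT.

    Returns True only when the value is "true" (quoted or unquoted);
    an empty line, meaning the key is absent, returns False.
    """
    value = get_output.strip().strip('"').lower()
    return value == "true"
```

If `is_flow_restore_wait_set` returns True long after the agent has stopped, that would be consistent with the failure mode described in this comment.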
Describe the bug
When using the SecondaryNetwork via OVS feature, the control-plane nodes start perfectly fine. On compute nodes, the antrea-agent startup fails, leaving the nodes inaccessible.
To Reproduce
Ubuntu 24.04 LTS
RKE2 1.29
Helm chart values:
Expected
antrea-agent is supposed to attach eth0 to br-ext, and the node should keep its network connection.
Actual behavior
antrea-agent on the agent node:
Versions:
Additional context