[Help] v3.11.2: compute node is offline and reports: Host instance init error: Setup OVN Chassis: normalize db host: dns lookup (default-ovn-north) failed #20317

Open
chenjacken opened this issue May 20, 2024 · 0 comments
Labels: question (Further information is requested), stale, state/awaiting processing

chenjacken commented May 20, 2024

1. Version:
v3.11.2
Deployed in high-availability mode.

2. One compute node is offline

1) Pod status:

[root@master1 ~]# kubectl get pods -n onecloud -owide -w | grep node10
default-host-deployer-h7zff                          0/1     CrashLoopBackOff        251        19h   172.16.1.234    node10     <none>           <none>
default-host-health-t9vc5                            0/1     CrashLoopBackOff        251        19h   172.16.1.234    node10     <none>           <none>
default-host-image-gbgn5                             0/1     CrashLoopBackOff        251        19h   172.16.1.234    node10     <none>           <none>
default-host-xvztn                                   1/3     CrashLoopBackOff        494        19h   172.16.1.234    node10     <none>           <none>
default-telegraf-f52sw                               0/1     Init:CrashLoopBackOff   228        19h   172.16.1.234    node10     <none>           <none>

2) Logs of the host container:

[root@master1 ~]# kubectl logs default-host-xvztn -n onecloud -c host
[info 240520 02:46:56 procutils.WaitZombieLoop(zombie_others.go:36)] My pid is not 1 and no need to wait zombies
[info 240520 02:46:56 options.parseOptions(options.go:334)] Use configuration file: /etc/yunion/host.conf
[info 240520 02:46:56 options.parseOptions(options.go:357)] Set log level to "info"
[info 2024-05-20 02:46:56 options.parseOptions(options.go:334)] Use configuration file: /etc/yunion/common/common.conf
[info 2024-05-20 02:46:56 options.parseOptions(options.go:357)] Set log level to "info"
[info 2024-05-20 02:46:56 hostman.(*SHostService).InitService(host_services.go:64)] exec socket path: /var/run/onecloud/exec.sock
[info 2024-05-20 02:46:56 app.InitApp(app.go:32)] RequestWorkerCount: 8
[info 2024-05-20 02:46:56 appsrv.NewApplication(appsrv.go:121)] App hostId: 4bhtR-oqKZELSL1qp4GCmt0ZpOM= (host,node10,172.16.1.234)
2024/05/20 02:46:56 Allow hosts []
[info 2024-05-20 02:46:56 appsrv.(*Application).SetDefaultTimeout(appsrv.go:137)] adjust application default timeout to 60.000000 seconds
[info 2024-05-20 02:46:56 hostinfo.DetectCpuInfo(hostinfohelper.go:78)] cpuinfo freq 2700
[info 2024-05-20 02:46:56 hostinfo.NewHostInfo(hostinfo.go:2446)] CPU Model Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz Microcode 0x2006e05
[info 2024-05-20 02:46:56 hostinfo.NewHostInfo(hostinfo.go:2466)] Get kubelet container image Fs: /opt/docker, eviction config: {"evictionHard":{"imagefs.available":{"Signal":"imagefs.available","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.05}},"memory.available":{"Signal":"memory.available","Operator":"LessThan","Value":{"Quantity":"100Mi","Percentage":0}},"nodefs.available":{"Signal":"nodefs.available","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.05}},"nodefs.inodesFree":{"Signal":"nodefs.inodesFree","Operator":"LessThan","Value":{"Quantity":null,"Percentage":0.05}}}}
[error 2024-05-20 02:46:59 fileutils2.GetAllBlkdevsIoSchedulers(fileutils.go:171)] no block device avaiable
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).prepareEnv(hostinfo.go:411)] I/O Scheduler switch to none
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).getKubeReservedMemMb(hostinfo.go:1572)] Kubelet memory threshold subtracted: 100MB
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).Init(hostinfo.go:196)] Start detectHostInfo
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:885)] KVM API VERSION 12
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:890)] KVM CAP MAX VCPUS: 288
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectKVMMaxCpus(hostinfo.go:898)] KVM CAP NR VCPUS: 240
[info 2024-05-20 02:46:59 sysutils.detectNestSupport(kvm.go:146)] Host is support kvm nest ...
[info 2024-05-20 02:46:59 sysutils.detectNestSupport(kvm.go:151)] Host kvm nest is enabled ...
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:778)] DetectOsDist CentOS Linux 7.9.2009
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectQemuVersion(hostinfo.go:852)] Detect qemu version is 4.2.0
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectOvsVersion(hostinfo.go:993)] Detect OVS version is 2.12.4
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).detectOvsKOVersion(hostinfo.go:1010)] kernel module openvswitch vermagic:       5.4.130-1.yn20230805.el7.x86_64 SMP mod_unload modversions 
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).Init(hostinfo.go:205)] Start parseConfig
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:241)] IP 172.16.1.234/br0/bond1
[info 2024-05-20 02:46:59 hostbridge.(*SBaseBridgeDriver).ConfirmToConfig(hostbridge.go:180)] bridge br0 already has ip 172.16.1.234
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:291)] Confirm to configuration!!
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:241)] IP 10.0.1.234/br1/bond0
[info 2024-05-20 02:46:59 hostbridge.(*SBaseBridgeDriver).ConfirmToConfig(hostbridge.go:180)] bridge br1 already has ip 10.0.1.234
[info 2024-05-20 02:46:59 hostinfo.NewNIC(hostinfohelper.go:291)] Confirm to configuration!!
[info 2024-05-20 02:46:59 hostinfo.(*SNIC).SetupDhcpRelay(hostinfohelper.go:203)] Not enable dhcp relay on nic: &hostinfo.SNIC{Inter:"bond1", Bridge:"br0", Ip:"172.16.1.234", Wire:"", WireId:"", Mask:24, Bandwidth:1000, BridgeDev:(*hostbridge.SOVSBridgeDriver)(0xc00151ec60), dhcpServer:(*hostdhcp.SGuestDHCPServer)(0xc00151f5f0)}
[info 2024-05-20 02:46:59 hostinfo.(*SNIC).SetupDhcpRelay(hostinfohelper.go:203)] Not enable dhcp relay on nic: &hostinfo.SNIC{Inter:"bond0", Bridge:"br1", Ip:"10.0.1.234", Wire:"", WireId:"", Mask:24, Bandwidth:1000, BridgeDev:(*hostbridge.SOVSBridgeDriver)(0xc0016e5590), dhcpServer:(*hostdhcp.SGuestDHCPServer)(0xc0016e5ec0)}
[info 2024-05-20 02:46:59 hostinfo.(*SHostInfo).setupOvnChassis(hostinfo.go:223)] Start setting up ovn chassis
goroutine 1 [running]:
runtime/debug.Stack()
        /usr/lib/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /usr/lib/go/src/runtime/debug/stack.go:16 +0x19
yunion.io/x/onecloud/pkg/util/ovnutils.InitOvn.func1()
        /root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:125 +0x3b
panic({0x2c24140, 0xc000b9e810})
        /usr/lib/go/src/runtime/panic.go:838 +0x207
yunion.io/x/onecloud/pkg/util/ovnutils.mustPrepOvsdbConfig({{0xc0016b9b40, 0x1b}, {0xc0016b7fa8, 0x5}, {0x0, 0x0}, {0xc0016b7f80, 0xa}, 0x5dc, {0xc0016b7fd0, ...}, ...})
        /root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:93 +0x645
yunion.io/x/onecloud/pkg/util/ovnutils.InitOvn({{0xc0016b9b40, 0x1b}, {0xc0016b7fa8, 0x5}, {0x0, 0x0}, {0xc0016b7f80, 0xa}, 0x5dc, {0xc0016b7fd0, ...}, ...})
        /root/go/src/yunion.io/x/onecloud/pkg/util/ovnutils/ovnutils.go:130 +0xb8
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*OvnHelper).Init(...)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostovn.go:41
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).setupOvnChassis(0xc000e82000?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:225 +0xb8
yunion.io/x/onecloud/pkg/hostman/hostinfo.(*SHostInfo).Init(0x5674ad0?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/hostinfo/hostinfo.go:210 +0xdc
yunion.io/x/onecloud/pkg/hostman.(*SHostService).RunService(0xc000010160?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:80 +0x6f
yunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc00000e108)
        /root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xe4
yunion.io/x/onecloud/pkg/hostman.StartService(...)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:163
main.main()
        /root/go/src/yunion.io/x/onecloud/cmd/host/main.go:30 +0x10a
goroutine 1 [running]:
runtime/debug.Stack()
        /usr/lib/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /usr/lib/go/src/runtime/debug/stack.go:16 +0x19
yunion.io/x/log.Fatalf({0x30fd118, 0x1c}, {0xc0016dfea8, 0x1, 0x1})
        /root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/log/log.go:138 +0x32
yunion.io/x/onecloud/pkg/hostman.(*SHostService).RunService(0xc000010160?)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:81 +0xb4
yunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc00000e108)
        /root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xe4
yunion.io/x/onecloud/pkg/hostman.StartService(...)
        /root/go/src/yunion.io/x/onecloud/pkg/hostman/host_services.go:163
main.main()
        /root/go/src/yunion.io/x/onecloud/cmd/host/main.go:30 +0x10a
[fatal 2024-05-20 02:46:59 hostman.(*SHostService).RunService(host_services.go:81)] Host instance init error: Setup OVN Chassis: normalize db host: dns lookup (default-ovn-north) failed: lookup default-ovn-north on 10.96.0.10:53: no such host

3) ifconfig output on the compute node:

[root@node10 ~]# ifconfig 
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        ether 9c:74:1a:c1:89:46  txqueuelen 1000  (Ethernet)
        RX packets 5903  bytes 868402 (848.0 KiB)
        RX errors 0  dropped 6  overruns 0  frame 0
        TX packets 43  bytes 2870 (2.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

bond1: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        ether 04:42:1a:cb:4b:6a  txqueuelen 1000  (Ethernet)
        RX packets 20225  bytes 5402646 (5.1 MiB)
        RX errors 0  dropped 6  overruns 0  frame 0
        TX packets 11854  bytes 1268500 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.1.234  netmask 255.255.255.0  broadcast 172.16.1.255
        inet6 fe80::642:1aff:fecb:4b6a  prefixlen 64  scopeid 0x20<link>
        ether 04:42:1a:cb:4b:6a  txqueuelen 1000  (Ethernet)
        RX packets 14997  bytes 4428205 (4.2 MiB)
        RX errors 0  dropped 249  overruns 0  frame 0
        TX packets 10986  bytes 1159598 (1.1 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

br1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.1.234  netmask 255.255.255.0  broadcast 10.0.1.255
        inet6 fe80::9e74:1aff:fec1:8946  prefixlen 64  scopeid 0x20<link>
        ether 9c:74:1a:c1:89:46  txqueuelen 1000  (Ethernet)
        RX packets 5361  bytes 724815 (707.8 KiB)
        RX errors 0  dropped 289  overruns 0  frame 0
        TX packets 19  bytes 1282 (1.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eno1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 04:42:1a:cb:4b:6a  txqueuelen 1000  (Ethernet)
        RX packets 2969  bytes 178698 (174.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xd3620000-d363ffff  

eno2: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 04:42:1a:cb:4b:6a  txqueuelen 1000  (Ethernet)
        RX packets 17265  bytes 5224750 (4.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11868  bytes 1272200 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xd3600000-d361ffff  

enp28s0f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 9c:74:1a:c1:89:46  txqueuelen 1000  (Ethernet)
        RX packets 1332  bytes 330415 (322.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 22  bytes 1428 (1.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp28s0f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 9c:74:1a:c1:89:46  txqueuelen 1000  (Ethernet)
        RX packets 4571  bytes 537987 (525.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 21  bytes 1442 (1.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

genev_sys_6081: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65000
        inet6 fe80::ec61:f8ff:fe76:a380  prefixlen 64  scopeid 0x20<link>
        ether ee:61:f8:76:a3:80  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 13 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 306  bytes 18416 (17.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 306  bytes 18416 (17.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

4) Network settings in host.conf:

ovn_encap_ip: 10.0.1.234
networks:
- bond1/br0/172.16.1.234
- bond0/br1/10.0.1.234

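No configuration was changed; the error appeared right after this compute node was rebooted.

Any suggestions on how to troubleshoot this and where the problem might be would be appreciated. Thanks!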
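
Since the fatal line says the short name default-ovn-north could not be resolved by 10.96.0.10:53, one thing that may help narrow this down is checking which forms of the service name resolve from the affected node or from a pod on it. Below is a minimal sketch in Python, not part of onecloud. The function names are made up for illustration, the `onecloud` namespace is taken from the kubectl commands above, and `cluster.local` is only the usual default cluster domain, so it is an assumption here.

```python
import socket


def candidate_names(service, namespace, cluster_domain="cluster.local"):
    """Return name forms commonly used for a Kubernetes Service, shortest first.

    cluster_domain is an assumption (the usual default); adjust it if the
    cluster was installed with a different domain.
    """
    return [
        service,
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",
    ]


def resolve(name):
    """Return the sorted IP addresses for name, or an empty list if lookup fails."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def main():
    """Print, for each candidate name, the addresses it resolves to (if any)."""
    for name in candidate_names("default-ovn-north", "onecloud"):
        addrs = resolve(name)
        print(f"{name}: {', '.join(addrs) if addrs else 'no such host'}")
```

Calling `main()` where the failing lookup happens prints one line per name. If only the fully qualified form resolves, the DNS search domains in that environment would be worth checking; if none resolve, the service itself or the cluster DNS would be the next thing to look at. This uses the system resolver, so it reflects whatever resolv.conf is in effect where it runs.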

@chenjacken chenjacken added the question Further information is requested label May 20, 2024
@github-actions github-actions bot added the stale label Jun 20, 2024