When using Trivy to scan a Kubernetes cluster, the scan gets stuck if any node in the cluster has taints applied. For example, a control-plane node with the taint node-role.kubernetes.io/control-plane causes this issue, and the scan eventually fails with a timeout:
2024-12-12T17:56:21+06:00 FATAL Fatal error get k8s artifacts with node info error: running node-collector job: runner received timeout
To improve usability, Trivy should handle such cases more gracefully. It could skip nodes that cannot be scanned without additional tolerations applied, instead of causing the scan to get stuck.
Desired behavior:
Skip nodes that require tolerations to scan.
Provide clear warnings or logs about the skipped nodes.
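As a rough illustration of the desired behavior, the check could look something like the sketch below. It is not Trivy's implementation: the function names and the input format (one line per node, the node name followed by its taints as key=value:effect) are assumptions made for this example, and matching is simplified to exact string equality, ignoring operators such as Exists.

```shell
# Hypothetical helper: print one line per node, "name taint1 taint2 ...",
# with each taint written as key=value:effect.
# Assumes kubectl points at the target cluster and that its JSONPath
# support handles nested range blocks.
list_nodes_with_taints() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{range .spec.taints[*]}{" "}{.key}={.value}:{.effect}{end}{"\n"}{end}'
}

# Read "name taint..." lines on stdin and print the nodes that carry a
# NoSchedule or NoExecute taint not listed in $1 (comma-separated
# key=value:effect entries). These are the nodes a scan would skip.
nodes_needing_tolerations() {
  tolerations=$1
  while read -r node taints; do
    for taint in $taints; do
      case "$taint" in
        *:NoSchedule|*:NoExecute) ;;   # these effects block scheduling
        *) continue ;;                 # PreferNoSchedule does not
      esac
      case ",$tolerations," in
        *",$taint,"*) ;;               # tolerated, check the next taint
        *) printf '%s\n' "$node"; break ;;
      esac
    done
  done
}
```

For example, `list_nodes_with_taints | nodes_needing_tolerations ''` would list every node that cannot be scanned without extra tolerations, which is the set a warning should mention.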
I'm running trivy ... commands on my laptop here in Melbourne, Australia, against a cluster in the AWS us-east-2 region, and I believe the latency is what causes the trouble. I have just completed a scan successfully from an LXC container local to the cluster. After that, I ran another scan from my laptop with --timeout 1h; it also finished successfully, and this time I got to see the output from all scanner types.
I was starting to feel dense thinking I just couldn't master such a simple task. I believe using --scanners vuln was helping in that the scan was limited to just the vulnerability scan.
My only critique is that nothing indicates progress once the scan actually begins, i.e. after the downloads are done. A progress bar would go a long way toward easing the urge to Ctrl-C out of it, thinking it is stuck.
A full scan took ~18 minutes for me. I think the 5m default is too short for a full cluster scan, especially when the cluster has many deployments; this PoC cluster has only 25. Raising the default to 30 minutes and adding a progress bar would go a long way toward helping those who are new to Trivy and run scans against remote clusters.
Alternatively, maybe implement something like how k8s does CrashLoopBackOff but in reverse - i.e. reset the timeout counter whenever the scan moves a step forward.
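For anyone hitting the same timeout, here is a small sketch of how the two flags mentioned above combine. The wrapper name trivy_k8s_cmd and its 30m fallback are made up for this example (30m is the value suggested above, not Trivy's default); the function only prints the command line so it can be reviewed before running it.

```shell
# Print a trivy k8s command line with an explicit timeout.
# $1: timeout (defaults to 30m here, an assumption for this sketch)
# $2: optional scanner list, e.g. "vuln", to narrow the scan
trivy_k8s_cmd() {
  timeout=${1:-30m}
  scanners=${2:-}
  set -- trivy k8s --report summary --timeout "$timeout"
  if [ -n "$scanners" ]; then
    set -- "$@" --scanners "$scanners"
  fi
  printf '%s\n' "$*"
}
```

Calling `trivy_k8s_cmd 1h vuln` prints a command limited to the vulnerability scanner with a one-hour timeout; `trivy_k8s_cmd` alone prints the full scan with the longer timeout.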
Workaround:
You can set tolerations with the --tolerations flag:
$ trivy k8s --report summary --tolerations node-role.kubernetes.io/control-plane="":NoSchedule
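If the cluster has several different taints, the value for --tolerations can be assembled from the taints themselves. This is a minimal sketch, assuming kubectl points at the target cluster and that the flag accepts a comma-separated list of key=value:effect entries; both function names are made up for this example. Keep in mind that tolerating every taint also schedules the node-collector job onto nodes that were tainted on purpose.

```shell
# Hypothetical helper: print every taint in the cluster, one per line,
# as key=value:effect (the value is empty for taints without one).
list_node_taints() {
  kubectl get nodes -o jsonpath='{range .items[*].spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
}

# Read key=value:effect lines on stdin, drop blank lines and duplicates,
# and join the rest with commas.
taints_to_tolerations() {
  grep -v '^[[:space:]]*$' | sort -u | paste -s -d ',' -
}
```

The two pieces can then be combined as `trivy k8s --report summary --tolerations "$(list_node_taints | taints_to_tolerations)"`.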
Steps to Reproduce
$ kind delete cluster --name cilium && kind create cluster --config config.yaml
$ kubectl get nodes
$ trivy k8s --report summary
config.yaml
Discussed in #5639 (comment)