
bug(k8s): Trivy gets stuck when scanning a cluster with taints on nodes #8087

Open
afdesk opened this issue Dec 12, 2024 · Discussed in #5639 · 3 comments

Labels: bug, target/kubernetes (Issues relating to kubernetes cluster scanning)

afdesk (Contributor) commented Dec 12, 2024

Description

When using Trivy to scan a Kubernetes cluster, the scan process gets stuck if any node in the cluster has taints applied. For example, a control-plane node with the taint node-role.kubernetes.io/control-plane causes this issue.

2024-12-12T17:56:21+06:00	FATAL	Fatal error	get k8s artifacts with node info error: running node-collector job: runner received timeout

To improve usability, Trivy should handle such cases more gracefully. It could skip nodes that cannot be scanned without additional tolerations applied, instead of causing the scan to get stuck.

Desired behavior:

  • Skip nodes that require tolerations to scan (a rough sketch of the idea follows below this list).
  • Provide clear warnings or logs about the skipped nodes.
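
A very rough sketch of what the skipping could look like (this is not Trivy's actual implementation, and the package/function names are made up; it only illustrates the idea using the upstream k8s.io/api/core/v1 toleration helpers):

package collector // hypothetical package name, for illustration only

import (
	"log"

	corev1 "k8s.io/api/core/v1"
)

// skippableNodes returns the names of nodes whose NoSchedule/NoExecute taints
// are not covered by the configured tolerations. The node-collector could log
// these as warnings and scan the remaining nodes instead of hanging.
func skippableNodes(nodes []corev1.Node, tolerations []corev1.Toleration) []string {
	var skipped []string
	for _, node := range nodes {
		for i := range node.Spec.Taints {
			taint := &node.Spec.Taints[i]
			if taint.Effect != corev1.TaintEffectNoSchedule && taint.Effect != corev1.TaintEffectNoExecute {
				continue
			}
			tolerated := false
			for j := range tolerations {
				if tolerations[j].ToleratesTaint(taint) {
					tolerated = true
					break
				}
			}
			if !tolerated {
				log.Printf("WARN: skipping node %q: taint %s is not tolerated", node.Name, taint.ToString())
				skipped = append(skipped, node.Name)
				break
			}
		}
	}
	return skipped
}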

Workaround:

You can set up tolerations with the --tolerations flag:

$ trivy k8s --report summary --tolerations node-role.kubernetes.io/control-plane="":NoSchedule

Steps to Reproduce

$ kind delete cluster --name cilium && kind create cluster --config config.yaml
$ kubectl get nodes  
$ trivy k8s --report summary
config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: cilium
nodes:
  # control-plane nodes
- role: control-plane
  image: kindest/node:v1.31.2
- role: control-plane
  image: kindest/node:v1.31.2
- role: control-plane
  image: kindest/node:v1.31.2
  # worker nodes
- role: worker
  image: kindest/node:v1.31.2
- role: worker
  image: kindest/node:v1.31.2
- role: worker
  image: kindest/node:v1.31.2

Discussed in #5639 (comment)

@afdesk afdesk added target/kubernetes Issues relating to kubernetes cluster scanning bug labels Dec 12, 2024
@afdesk afdesk self-assigned this Dec 12, 2024

ak2766 commented Dec 13, 2024

I've discovered why I'm getting failures.

I'm running trivy ... commands on my laptop here in Melbourne, Australia, against a cluster in the AWS us-east-2 region. I believe the latency is playing havoc with the scan. I've just successfully finished a scan using an LXC container local to the cluster. After that, I ran another scan from my local laptop with --timeout 1h; it finished successfully, and this time I got to see the output from all scanner types.

I was starting to feel dense, thinking I just couldn't master such a simple task. I believe using --scanners vuln was helping in that it limited the scan to just the vulnerability checks.
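
For anyone else scanning a remote cluster, combining those two options looks something like this (both --scanners and --timeout are standard trivy flags):

$ trivy k8s --report summary --scanners vuln --timeout 1h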

@afdesk - Thanks muchly for your patience.

The only critique I have is that there's no indication anything is happening when the scan actually begins - i.e. after the downloads are done. A progress bar would go a long way to alleviate the urge to ctrl-c out of it thinking it is stuck...


ak2766 commented Dec 13, 2024

A full scan for me took ~18 minutes. I think the default of 5m is too short for a full cluster scan, especially when there are multiple deployments - I only have 25 deployments in this PoC cluster. Increasing the default to 30 minutes and adding a progress bar would go a long way in helping those new to Trivy who are running scans against remote clusters.

Alternatively, maybe implement something like how k8s does CrashLoopBackOff but in reverse - i.e. reset the timeout counter whenever the scan moves a step forward.
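
Just to make that idea concrete, here is a minimal sketch of an inactivity-style timeout in Go (not Trivy's code; package, function, and channel names are made up): the deadline only fires when nothing has progressed for the configured duration.

package scanwatch // hypothetical package name

import (
	"context"
	"time"
)

// runWithInactivityTimeout cancels the scan only when no progress event has
// arrived for `idle`, instead of enforcing one fixed deadline for the whole run.
func runWithInactivityTimeout(ctx context.Context, idle time.Duration, progress <-chan struct{}) error {
	timer := time.NewTimer(idle)
	defer timer.Stop()

	for {
		select {
		case _, ok := <-progress:
			if !ok {
				return nil // progress channel closed: the scan finished
			}
			// A resource/node finished scanning, so push the deadline out again.
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(idle)
		case <-timer.C:
			return context.DeadlineExceeded // nothing happened for `idle`
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}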


ak2766 commented Dec 13, 2024

One other thing I forgot to mention was that my v1.30.4 cluster is still using master for the taints - i.e.:

$ kubectl get nodes -l node-role.kubernetes.io/control-plane -o jsonpath="{range .items[*]}{.metadata.name}{': '}{.spec.taints}{'\n'}{end}"
node1: [{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"}]
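
So on clusters like this one, I assume the --tolerations workaround from the description needs to target the legacy key instead, something like:

$ trivy k8s --report summary --tolerations node-role.kubernetes.io/master="":NoSchedule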
