trivy operator throwing constantly reconcile errors #2137

Open
lkaluza-fadi opened this issue Jun 12, 2024 · 25 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@lkaluza-fadi

What steps did you take and what happened:

Upgraded the Helm chart from version 0.23.1 to 0.23.3.

What did you expect to happen:

That everything works smoothly

Anything else you would like to add:

This is the error that we get:

{"level":"error","ts":"2024-06-12T08:42:18Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-785c48587c","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-785c48587c","reconcileID":"624c0d2f-2cdb-4ea3-9d13-052f27ee7e87","error":"illegal base64 data at input byte 6; illegal base64 data at input byte 6","errorCauses":[{"error":"illegal base64 data at input byte 6"},{"error":"illegal base64 data at input byte 6"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}
{"level":"error","ts":"2024-06-12T08:42:19Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-7cb7c95664","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-7cb7c95664","reconcileID":"5a67a1d8-fc0b-4e90-9991-d09bc2ba55e5","error":"illegal base64 data at input byte 6","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}
{"level":"error","ts":"2024-06-12T08:42:50Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-6f777d44b8","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-6f777d44b8","reconcileID":"9dba26aa-115a-4787-8291-5ead70458e94","error":"illegal base64 data at input byte 6","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}
  • Trivy-Operator version (use trivy-operator version): 0.21.3
  • Kubernetes version (use kubectl version): 1.28.9-gke.1000000
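
To triage log output like the lines above, it can help to group the errors by scan job. The following is a minimal Python sketch, assuming the operator logs are available as text with one JSON object per line. The function name `summarize_reconcile_errors` is hypothetical and is not part of trivy-operator.

```python
import json
from collections import defaultdict


def summarize_reconcile_errors(log_text):
    """Group "Reconciler error" messages by scan job name.

    Assumes one JSON object per line, shaped like the operator log
    lines quoted in this issue. Lines that are not JSON objects are skipped.
    """
    errors_by_job = defaultdict(set)
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a structured log line
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") != "Reconciler error":
            continue
        job = entry.get("Job") or {}
        name = job.get("name", "<unknown>")
        errors_by_job[name].add(entry.get("error", ""))
    # Sort the error strings so the result is stable across runs.
    return {name: sorted(errs) for name, errs in errors_by_job.items()}
```

Passing it the saved output of `kubectl logs` for the operator pod would give one entry per failing scan job, which makes it easier to see whether every job fails with the same error.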
@lkaluza-fadi added the kind/bug label on Jun 12, 2024
@chen-keinan
Contributor

@lkaluza-fadi please clean up all scan jobs and restart the operator:

kubectl delete jobs -n trivy-system $(kubectl get jobs -n trivy-system -o custom-columns=:.metadata.name)

@lkaluza-fadi
Author

@chen-keinan After deleting the jobs, everything seemed to be fine, but once the new jobs completed, the reconcile errors returned.

@chen-keinan
Contributor

@lkaluza-fadi is the pod stuck in Completed status?

@lkaluza-fadi
Author

> @lkaluza-fadi is the pod stuck in Completed status?

Yes, that's correct.

@chen-keinan
Contributor

@lkaluza-fadi can you please get its output and send it (you can send it to me on Slack if you do not want to expose it here):

kubectl logs <scan-pod-name> -n trivy-system

@lkaluza-fadi
Author

> @lkaluza-fadi can you please get its output and send it (you can send it to me on Slack if you do not want to expose it here):
>
> kubectl logs <scan-pod-name> -n trivy-system

Unfortunately, there are no pod logs anymore.

@chen-keinan
Contributor

@lkaluza-fadi are you able to reproduce it?

@lkaluza-fadi
Author

> @lkaluza-fadi are you able to reproduce it?

I tried to reproduce it, but the logs are gone again.

@chen-keinan
Contributor

Is the pod stuck in Completed status? If so, the logs should be there.

@chen-keinan
Contributor

@lkaluza-fadi do you get any reports?

@lkaluza-fadi
Author

lkaluza-fadi commented Jun 20, 2024

@chen-keinan yes, I just sent it to you by email, at the address listed in your profile.

@chen-keinan
Contributor

chen-keinan commented Jun 20, 2024

@lkaluza-fadi can you please do another check:

  1. uninstall trivy-operator: helm uninstall trivy-operator -n trivy-system
  2. delete all CRDs:
    kubectl delete crd vulnerabilityreports.aquasecurity.github.io
    kubectl delete crd exposedsecretreports.aquasecurity.github.io
    kubectl delete crd configauditreports.aquasecurity.github.io
    kubectl delete crd clusterconfigauditreports.aquasecurity.github.io
    kubectl delete crd rbacassessmentreports.aquasecurity.github.io
    kubectl delete crd infraassessmentreports.aquasecurity.github.io
    kubectl delete crd clusterrbacassessmentreports.aquasecurity.github.io
    kubectl delete crd clustercompliancereports.aquasecurity.github.io
    kubectl delete crd clusterinfraassessmentreports.aquasecurity.github.io
    kubectl delete crd sbomreports.aquasecurity.github.io
    kubectl delete crd clustersbomreports.aquasecurity.github.io
    kubectl delete crd clustervulnerabilityreports.aquasecurity.github.io
  3. make sure no pods or jobs are running in the trivy-system namespace
  4. re-install trivy-operator with helm and set this flag to false
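
The twelve CRD deletions in step 2 can be generated from a list instead of being typed by hand. Below is a minimal Python sketch; the helper name `build_crd_delete_commands` is hypothetical, the CRD names are copied from the list above, and running the script only prints the commands (a dry run) so that they can be reviewed before being executed against a cluster.

```python
import shlex

# API group shared by every CRD named in the list above.
API_GROUP = "aquasecurity.github.io"

# Plural resource names, in the same order as the list above.
CRD_PLURALS = [
    "vulnerabilityreports",
    "exposedsecretreports",
    "configauditreports",
    "clusterconfigauditreports",
    "rbacassessmentreports",
    "infraassessmentreports",
    "clusterrbacassessmentreports",
    "clustercompliancereports",
    "clusterinfraassessmentreports",
    "sbomreports",
    "clustersbomreports",
    "clustervulnerabilityreports",
]


def build_crd_delete_commands(plurals=CRD_PLURALS, group=API_GROUP):
    """Return one `kubectl delete crd` argument list per CRD."""
    return [["kubectl", "delete", "crd", f"{plural}.{group}"] for plural in plurals]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in build_crd_delete_commands():
        print(shlex.join(command))
```

Keeping the names in one list makes it harder to miss a CRD, and the printed commands can be pasted into a shell once they look right.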

@lkaluza-fadi
Author

> @lkaluza-fadi can you please do another check:
>
>   1. uninstall trivy-operator: helm uninstall trivy-operator -n trivy-system
>   2. delete all CRDs:
>     kubectl delete crd vulnerabilityreports.aquasecurity.github.io
>     kubectl delete crd exposedsecretreports.aquasecurity.github.io
>     kubectl delete crd configauditreports.aquasecurity.github.io
>     kubectl delete crd clusterconfigauditreports.aquasecurity.github.io
>     kubectl delete crd rbacassessmentreports.aquasecurity.github.io
>     kubectl delete crd infraassessmentreports.aquasecurity.github.io
>     kubectl delete crd clusterrbacassessmentreports.aquasecurity.github.io
>     kubectl delete crd clustercompliancereports.aquasecurity.github.io
>     kubectl delete crd clusterinfraassessmentreports.aquasecurity.github.io
>     kubectl delete crd sbomreports.aquasecurity.github.io
>     kubectl delete crd clustersbomreports.aquasecurity.github.io
>     kubectl delete crd clustervulnerabilityreports.aquasecurity.github.io
>   3. make sure no pods or jobs are running in the trivy-system namespace
>   4. re-install trivy-operator with helm and set this flag to false

done that!

@lkaluza-fadi
Author

What has changed so far is that the pods for the jobs are now removed once they finish, and for that reason the operator no longer logs any reconcile errors.

@chen-keinan
Contributor

@lkaluza-fadi I'm not sure I understand. Are you getting reports after the change above?

@lkaluza-fadi
Author

@chen-keinan to wrap this up: the reconcile errors are back, but they are now a bit different:

{"level":"error","ts":"2024-06-24T10:44:56Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-6f849756bb","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-6f849756bb","reconcileID":"71886afd-c52b-45e5-a36c-b7737c65d5cf","error":"invalid character 'u' looking for beginning of value; invalid character 'u' looking for beginning of value","errorCauses":[{"error":"invalid character 'u' looking for beginning of value"},{"error":"invalid character 'u' looking for beginning of value"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}
{"level":"error","ts":"2024-06-24T10:47:00Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-65df45bb54","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-65df45bb54","reconcileID":"beaf874f-73ca-473d-875e-ea520c90018b","error":"invalid character 'u' looking for beginning of value","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}
{"level":"error","ts":"2024-06-24T10:47:01Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-8448d97cbb","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-8448d97cbb","reconcileID":"dfdd5a68-09c3-45c5-a880-f90bbb0f88cb","error":"invalid character 'u' looking for beginning of value","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}
{"level":"error","ts":"2024-06-24T10:59:04Z","msg":"Reconciler error","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"scan-vulnerabilityreport-599dbf4488","namespace":"trivy-system"},"namespace":"trivy-system","name":"scan-vulnerabilityreport-599dbf4488","reconcileID":"8ecb8af1-af3e-4fcb-b2ff-8294f41b7e63","error":"invalid character 'u' looking for beginning of value","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"}

So, back to your question: we are getting reports after the changes. But I think we are back where we started, getting this reconcile error, only now in a different flavor!
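
Both messages seen in this thread match the formats used by Go's standard decoders: "illegal base64 data at input byte N" comes from `encoding/base64`, and "invalid character 'u' looking for beginning of value" comes from `encoding/json`. That suggests the operator is reading scan job output that is not in the encoding it expects, although the thread does not confirm this. The Python sketch below is a simplified illustration of the two checks, not the operator's code; the function names and the sample string are hypothetical.

```python
import json
import string

# Characters of the standard base64 alphabet, plus the padding character.
_B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def first_illegal_base64_offset(text):
    """Return the index of the first character outside the standard
    base64 alphabet, or None if every character is in the alphabet.

    Simplified: padding placement and line breaks are not validated.
    """
    for offset, ch in enumerate(text):
        if ch not in _B64_ALPHABET:
            return offset
    return None


def diagnose_scan_output(text):
    """Describe why `text` would fail a JSON parse or a base64 decode."""
    stripped = text.strip()
    try:
        json.loads(stripped)
        return "valid JSON"
    except json.JSONDecodeError:
        pass
    offset = first_illegal_base64_offset(stripped)
    if offset is not None:
        return f"not JSON; illegal base64 character {stripped[offset]!r} at offset {offset}"
    return "not JSON; every character is in the base64 alphabet"


# Hypothetical sample: plain text that starts with 'u' is neither JSON nor base64.
print(diagnose_scan_output("unexpected text"))
```

Running a check like this on the raw output of a completed scan pod would show whether the pod printed plain text (for example an error message) where the operator expected encoded or JSON data.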

@chen-keinan
Contributor

@lkaluza-fadi I'll be happy to jump on a Zoom call to look at the issue; it's very difficult to find what is wrong in your environment.

@lkaluza-fadi
Author

@chen-keinan I'm fine with that. When would suit you?

@chen-keinan
Contributor

chen-keinan commented Jun 24, 2024

@lkaluza-fadi find me on Slack; we can discuss scheduling details there.

@chen-keinan
Contributor

@lkaluza-fadi I mean find me via the Aqua Security Slack.

@lkaluza-fadi
Author

@chen-keinan I'm not using Slack. How do I do that?

@daanschipper

This seems related to #1792.

@mib93

mib93 commented Jul 29, 2024

Hi, I'm facing the same problem:
[image]

Is there any solution?

@Xeroxxx

Xeroxxx commented Sep 3, 2024

My cluster is starting to have the same issue. I have already reinstalled trivy-operator.

EDIT: I'm running Kubernetes 1.31, where the Job SuccessPolicy changed. See #2251.

@benni-as

benni-as commented Sep 5, 2024

Same error:

{
  "level": "error",
  "ts": "2024-09-05T09:42:22Z",
  "msg": "Reconciler error",
  "controller": "job",
  "controllerGroup": "batch",
  "controllerKind": "Job",
  "Job": {
    "name": "scan-vulnerabilityreport-86c64f59b9",
    "namespace": "trivy-operator"
  },
  "namespace": "trivy-operator",
  "name": "scan-vulnerabilityreport-86c64f59b9",
  "reconcileID": "18547f15-5d01-42ed-b1b4-f208335a0fae",
  "error": "unrecognized scan job condition: SuccessCriteriaMet",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"
}

I am using the latest Helm chart, 0.24.1, and I don't see any vulnerability or SBOM reports.
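
The message "unrecognized scan job condition: SuccessCriteriaMet" suggests that the operator maps each Job condition type to an outcome and rejects types it does not know. That is an inference from the error text, not from the operator's source. The Python sketch below shows a more tolerant approach under that assumption: only the `Complete` and `Failed` conditions decide the outcome, and any other condition type is ignored. The function name `job_outcome` is hypothetical.

```python
def job_outcome(conditions):
    """Return "complete", "failed", or None (not finished yet).

    `conditions` is a list of dicts shaped like the entries of a
    Kubernetes Job's `status.conditions`, each with "type" and "status".
    Condition types other than Complete and Failed are ignored rather
    than treated as errors.
    """
    for condition in conditions:
        if condition.get("status") != "True":
            continue  # the condition is present but not currently true
        if condition.get("type") == "Complete":
            return "complete"
        if condition.get("type") == "Failed":
            return "failed"
    return None


# A Job that carries SuccessCriteriaMet alongside Complete is still "complete".
example = [
    {"type": "SuccessCriteriaMet", "status": "True"},
    {"type": "Complete", "status": "True"},
]
print(job_outcome(example))
```

With this shape, a Job that so far carries only `SuccessCriteriaMet` is treated as not finished yet, so a reconciler could simply wait for the next update instead of returning an error.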


6 participants