This repository has been archived by the owner on Sep 5, 2024. It is now read-only.

Tekton issues when upgrading from 0.7 to 0.8 #1065

Open
yuregir opened this issue Nov 24, 2020 · 0 comments
Labels
bug Something isn't working


yuregir commented Nov 24, 2020

Describe the bug

I am writing this up as a guide for other people who hit a similar issue.

This happened when I upgraded from 0.70-rc2 to 0.8.

After the update (downloading the new binary and running rio install), the first problem I had was with Tekton: the configmap named config-logging could not be installed, so the Tekton pods were failing to start.

The error in the logs was:

Internal error occurred: failed calling webhook "config.webhook.pipeline.tekton.dev": Post https://tekton-pipelines-webhook.tekton-pipelines.svc:443/config-validation?timeout=30s: no endpoints available for service "tekton-pipelines-webhook"

When I dug through the logs and GitHub issues, I found the cause:

This is due to a circular dependency.
The tekton-pipelines-webhook pod can't start if it doesn't have the configmap, but the configmap can't be installed because it can't reach the tekton-pipelines-webhook pod for validation.
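
To confirm you are in this state before deleting anything, a quick check along these lines may help. This is only a sketch: the namespace and Service name are taken from the webhook URL in the error above, the function names are made up for illustration, and the commands only read from the cluster.

```shell
# Namespace and Service name come from the webhook URL in the error message.
# Override them via the environment if your install differs.
TEKTON_NS="${TEKTON_NS:-tekton-pipelines}"
WEBHOOK_SVC="${WEBHOOK_SVC:-tekton-pipelines-webhook}"

# List admission webhook configurations whose name mentions tekton.dev.
# "|| true" keeps the function from failing when nothing matches.
list_tekton_webhooks() {
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o name \
    | grep -F 'tekton.dev' || true
}

# Show the Endpoints object behind the webhook Service. An empty ENDPOINTS
# column matches the "no endpoints available" error.
show_webhook_endpoints() {
  kubectl -n "$TEKTON_NS" get endpoints "$WEBHOOK_SVC"
}
```

Run list_tekton_webhooks and then show_webhook_endpoints. If the webhook configurations exist while the Service has no endpoints, the API server has nothing to call for validation, which is the deadlock described above.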

Link to the issue and fix in the Tekton repo: here
Fix

To summarize, the fix is to delete the old webhook resources:

kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io config.webhook.pipeline.tekton.dev
kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io validation.webhook.pipeline.tekton.dev
kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io webhook.pipeline.tekton.dev

After this fix the Tekton pods started spawning, but I got a new error and the pods still would not start.

The new error in the logs was:

OCI runtime create failed: container_linux.go:345: starting container process caused "chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied": unknown

After some searching I found that the problem is caused by the security settings in the Pod.
Issue Link
Fix

Temporary solution:
I had to edit the Deployments for both the webhook and the controller, changing runAsUser from 1001 to 65532.

After this fix Tekton started working properly. I call it temporary because if I change anything in rio-config, rio-controller restarts, and after that restart runAsUser goes back to 1001 and Tekton stops working.
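
For anyone who wants to script the runAsUser change, here is a rough sketch using kubectl patch. Several things in it are assumptions and not confirmed by this issue: the namespace tekton-pipelines, the Deployment names tekton-pipelines-webhook and tekton-pipelines-controller, and that runAsUser sits on the first container's securityContext. Check the real layout with kubectl get deployment -o yaml first and adjust the path if needed.

```shell
# ASSUMPTIONS: namespace, Deployment names, and the JSON path to runAsUser.
# Verify with: kubectl -n tekton-pipelines get deployment <name> -o yaml
TEKTON_NS="${TEKTON_NS:-tekton-pipelines}"

# Replace runAsUser on one Deployment with a JSON patch.
# $1 = Deployment name, $2 = numeric UID.
set_run_as_user() {
  deployment="$1"
  uid="$2"
  kubectl -n "$TEKTON_NS" patch deployment "$deployment" --type=json \
    -p "[{\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/securityContext/runAsUser\",\"value\":${uid}}]"
}

# Apply the 1001 -> 65532 change to both Deployments (names are assumed).
patch_tekton_run_as_user() {
  for d in tekton-pipelines-webhook tekton-pipelines-controller; do
    set_run_as_user "$d" 65532 || return 1
  done
}
```

Calling patch_tekton_run_as_user applies the change to both Deployments. It is still only the temporary workaround: as described above, the value reverts to 1001 when rio-controller restarts, so the patch has to be reapplied each time.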

In the current state, code on my local PC builds on Rio and works, but the code in my GitHub repo does not build.

rio ps output for the GitHub repo build:

not ready; BuildDeployed: failed to update dev/iot-dashboard-9d5c1-b47e9 tekton.dev/v1alpha1, Kind=TaskRun for service-build dev/iot-dashboard: admission webhook "webhook.pipeline.tekton.dev" denied the request: mutation failed: cannot decode incoming new object: json: unknown field "digest"(Error); iot-dashboard waiting on build

I would like to show you the build-history logs, but rio build-history throws a fatal error:

$ rio build-history

FATA[0000] template: :1:44: executing "" at <findRevision>: error calling findRevision: runtime error: invalid memory address or nil pointer dereference

I am not able to fix this; please help me find a solution.

(I tried uninstalling and reinstalling rio several times with no luck, then gave up and rolled back to 0.7.1. Everything works properly in 0.7.)

Expected behavior

Clean upgrade from 0.7 to 0.8

Kubernetes version & type (GKE, on-prem): kubectl version

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:49Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Rio version: rio info

Rio Version: v0.8.0 (af7ad687)
Rio CLI Version: v0.8.0 (af7ad687)
Cluster Domain: service.metacore.io
Cluster Domain IPs:
System Namespace: rio-system
Wildcard certificates: service.metacore.io(true)