
Harbor is throwing 503s #21250

Open
veerendra2 opened this issue Nov 26, 2024 · 10 comments
Labels
help wanted The issues that is valid but needs help from community

Comments

veerendra2 commented Nov 26, 2024

We get 503 errors once in a while on the /v2/* endpoint and have to re-run the GitHub pipeline to make it work.

$ k get pods -n harbor
NAME                                 READY   STATUS    RESTARTS      AGE
harbor-core-5d79cd78d5-f8lrs         2/2     Running   0             107m
harbor-core-5d79cd78d5-gxcdq         2/2     Running   0             107m
harbor-core-5d79cd78d5-nf2z2         2/2     Running   0             107m
harbor-exporter-7946548cbd-vnp7m     2/2     Running   0             107m
harbor-jobservice-66bf496776-6qv4h   2/2     Running   0             107m
harbor-jobservice-66bf496776-kmp6p   2/2     Running   0             107m
harbor-jobservice-66bf496776-pksgz   2/2     Running   0             106m
harbor-portal-5f47b6c7fb-478nv       2/2     Running   0             107m
harbor-portal-5f47b6c7fb-5kcq8       2/2     Running   0             107m
harbor-portal-5f47b6c7fb-kfnkg       2/2     Running   0             107m
harbor-postgres-db-0                 4/4     Running   1 (88m ago)   88m
harbor-postgres-db-1                 4/4     Running   2 (86m ago)   86m
harbor-postgres-db-2                 4/4     Running   2 (87m ago)   87m
harbor-registry-557df7f8c5-jqwxz     3/3     Running   0             107m
harbor-registry-557df7f8c5-tdr8w     3/3     Running   0             107m
harbor-registry-557df7f8c5-wlzf8     3/3     Running   0             107m
harbor-trivy-0                       2/2     Running   0             106m
harbor-trivy-1                       2/2     Running   0             106m
harbor-trivy-2                       2/2     Running   0             107m
redis-node-0                         4/4     Running   0             27h
redis-node-1                         4/4     Running   0             27h
redis-node-2                         4/4     Running   0             27h
  • We are able to access Harbor via the portal without any problem

Attaching screenshots:

  • Harbor metrics Grafana dashboard (screenshot)

  • istio-gateway logs for Harbor showing HTTP 503 responses (screenshot)

Steps to reproduce the problem:

  • Deploy Harbor via the Helm chart with Redis and PostgreSQL
  • Set the nginx proxy replicas to 0
  • Deploy a VirtualService to access Harbor (a sketch follows below)
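
A minimal sketch of such a VirtualService, for reference; the hostname, gateway reference and path split between core and portal are assumptions, not our exact manifest:

apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: harbor
  namespace: harbor
spec:
  hosts:
  - harbor.example.com            # illustrative hostname
  gateways:
  - istio-system/harbor-gateway   # illustrative gateway reference
  http:
  - match:                        # API/registry paths go to harbor-core
    - uri:
        prefix: /v2/
    - uri:
        prefix: /api/
    - uri:
        prefix: /service/
    - uri:
        prefix: /c/
    route:
    - destination:
        host: harbor-core.harbor.svc.cluster.local
        port:
          number: 80
  - route:                        # everything else goes to the portal UI
    - destination:
        host: harbor-portal.harbor.svc.cluster.local
        port:
          number: 80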

Versions:
Please specify the versions of following systems.

  • harbor helm chart version: 1.16.0
  • harbor version: [v2.12.0](https://github.com/goharbor/harbor/releases/tag/v2.12.0)
  • AKS kubernetes version: v1.30.3

Additional context:

veerendra2 changed the title from "Error: Error response from daemon: login attempt to /v2/ failed with status: 503 Service Unavailable" to "Harbor is throwing 503s" on Nov 26, 2024
stonezdj (Contributor) commented:

Maybe something is wrong with Redis and PostgreSQL; the Postgres DB restarted 88 minutes ago. What is the deployment type of the three PostgreSQL instances?
For Redis:

2024-11-26 15:37:20;redis: 2024/11/26 14:37:20 pubsub.go:159: redis: discarding bad PubSub connection: read tcp 10.244.9.47:51360->10.244.0.149:26379: read: connection reset by peer
2024-11-26 15:37:20;redis: 2024/11/26 14:37:20 sentinel.go:587: sentinel: GetMasterAddrByName name="mymaster" failed: EOF
2024-11-26 15:37:19;redis: 2024/11/26 14:37:19 sentinel.go:587: sentinel: GetMasterAddrByName name="mymaster" failed: read tcp 10.244.20.157:60812->10.244.0.149:26379: read: connection reset by peer
2024-11-26 15:37:19;redis: 2024/11/26 14:37:19 pubsub.go:159: redis: discarding bad PubSub connection: read tcp 10.244.2.30:56824->10.244.0.149:26379: read: connection reset by peer
2024-11-26 15:37:19;redis: 2024/11/26 14:37:19 pubsub.go:159: redis: discarding bad PubSub connection: read tcp 10.244.2.30:56838->10.244.0.149:26379: read: connection reset by peer
2024-11-26 15:37:19;redis: 2024/11/26 14:37:19 pubsub.go:159: redis: discarding bad PubSub connection: read tcp 10.244.2.30:56854->10.244.0.149:26379: read: connection reset by peer
2024-11-26 15:37:19;redis: 2024/11/26 14:37:19 pubsub.go:159: redis: discarding bad PubSub connection: read tcp 10.244.20.157:60700->10.244.0.149:26379: read: connection reset by peer
2024-11-26 15:37:19;redis: 2024/11/26 14:37:19 pubsub.go:159: redis: discarding bad PubSub connection: read tcp 10.244.20.157:60708->10.244.0.149:26379: read: connection reset by peer
2024-11-26 15:37:19;redis: 2024/11/26 14:37:19 pubsub.go:159: redis: discarding bad PubSub connection: read tcp 10.244.9.47:51270->10.244.0.149:26379: read: connection reset by peer
2024-11-26 15:37:19;redis: 2024/11/26 14:37:19 pubsub.go:159: redis: discarding bad PubSub connection: EOF

veerendra2 (Author) commented Nov 27, 2024

@stonezdj We use https://github.com/zalando/postgres-operator to manage PostgreSQL

$ k get postgresql
NAME                 TEAM     VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE    STATUS
harbor-postgres-db   harbor   15        3      2Gi      100m          1024Mi           130d   Running

$ k get sts
NAME                 READY   AGE
harbor-postgres-db   3/3     19h
harbor-trivy         3/3     130d
redis-node           3/3     130d

$ k get svc
NAME                        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
harbor                      ClusterIP   10.0.134.156   <none>        80/TCP                       130d
harbor-core                 ClusterIP   10.0.198.244   <none>        80/TCP,8001/TCP              130d
harbor-exporter             ClusterIP   10.0.255.72    <none>        8001/TCP                     49d
harbor-jobservice           ClusterIP   10.0.233.25    <none>        80/TCP,8001/TCP              130d
harbor-portal               ClusterIP   10.0.129.86    <none>        80/TCP                       130d
harbor-postgres-db          ClusterIP   10.0.218.184   <none>        5432/TCP                     130d
harbor-postgres-db-config   ClusterIP   None           <none>        <none>                       130d
harbor-postgres-db-repl     ClusterIP   10.0.5.126     <none>        5432/TCP                     130d
harbor-registry             ClusterIP   10.0.234.173   <none>        5000/TCP,8080/TCP,8001/TCP   130d
harbor-trivy                ClusterIP   10.0.134.5     <none>        8080/TCP                     130d
patroni-metrics             ClusterIP   10.0.31.14     <none>        9547/TCP                     30d
postgres-exporter           ClusterIP   10.0.197.179   <none>        9187/TCP                     30d
redis                       ClusterIP   10.0.186.250   <none>        6379/TCP,26379/TCP           130d
redis-headless              ClusterIP   None           <none>        6379/TCP,26379/TCP           130d
redis-metrics               ClusterIP   10.0.250.114   <none>        9121/TCP                     130d

We updated the PostgreSQL sidecar container earlier; that's why there were some restarts.

Please let me know if any further details are needed.

EDIT

Attaching Jaeger traces for an endpoint (screenshot)

veerendra2 (Author) commented Nov 27, 2024

harbor-core is throwing 404 errors; I can see them in the istio-proxy container logs below (screenshot).

I increased the harbor-core replicas from 3 to 5 to see if there are any improvements.


veerendra2 (Author) commented:

Update

I even checked whether the sha256 layer path really exists in my storage account; indeed, the sha256 layer for the image does exist in the storage account (screenshot).

I searched for the same sha256 layer in Azure Storage Explorer (screenshot).

There are also a lot of ClientErrors/failed transactions in the Azure storage account insights, although these ClientErrors have existed for a long time (screenshot).

veerendra2 (Author) commented:

It seems that mainly the upstream (harbor-core) is resetting the connection. By the way, I increased the number of replicas from 3 to 5:

$ k get deploy
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
harbor-core         5/5     5            5           131d
harbor-exporter     1/1     1            1           50d
harbor-jobservice   5/5     5            5           131d
harbor-nginx        0/0     0            0           131d
harbor-portal       3/3     3            3           131d
harbor-registry     5/5     5            5           131d

Still the same, getting 503s (screenshot).

stonezdj (Contributor) commented Dec 2, 2024

What is the output in the harbor-core log? Harbor core doesn't throw a 503 error in its own code; this error is usually thrown by front-end components.

reasonerjt added the "help wanted" label on Dec 2, 2024
veerendra2 (Author) commented:

@stonezdj

What is the output in the harbor-core log?

I already attached the harbor-core debug logs here.

The upstream (harbor-core) is resetting the connection; that's why the istio-proxy sidecar is throwing 503s.

We had to add retries to the VirtualService, like below, to fix "login attempt to https://[REDACTED]/v2/ failed with status: 503 Service Unavailable":

    match:
    - uri:
        prefix: /v2/
    retries:
      attempts: 3
      retryOn: "503"
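
For context, a hedged sketch of how that fragment might sit inside the full http entry of the VirtualService; the destination host and port are assumptions based on the services listed above:

http:
- match:
  - uri:
      prefix: /v2/
  retries:
    attempts: 3
    retryOn: "503"
  route:
  - destination:
      host: harbor-core.harbor.svc.cluster.local
      port:
        number: 80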

hajnalmt (Contributor) commented Dec 3, 2024

Hello @veerendra2
I can see some proxy-cache errors in the core logs.

Can you add some additional logs from the istio-proxy (for all the core pods), the actual container logs instead of the klogs output?

I am curious whether some of your traffic is being routed towards the blackhole cluster. Additionally, are you using the REGISTRY_ONLY outbound traffic policy instead of ALLOW_ANY? If yes, are you properly configuring the ServiceEntry, the DestinationRules and the Gateway? (A sketch of where that policy lives follows below.)

Most of the 503s I have investigated on Istio were solely because the traffic was routed to the blackhole cluster due to a misconfigured VirtualService, Gateway or DestinationRule.
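
For reference, a minimal sketch of where that outbound policy is usually configured (shown here as an IstioOperator meshConfig; it may equally be set in the istio ConfigMap, and the resource name is illustrative):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio
  namespace: istio-system
spec:
  meshConfig:
    outboundTrafficPolicy:
      mode: ALLOW_ANY   # REGISTRY_ONLY routes unknown hosts to the BlackHoleCluster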

veerendra2 (Author) commented Dec 4, 2024

@hajnalmt

Can you add some additional logs from the istio-proxy (for all the core pods), actual container logs instead of the klogs output?

Please find attached the logs of istio-proxy (harbor-core pods) ->
kobs-export-logs.log

I am curious if some of your traffic is routed towards the blackhole cluster.

If this were the case, it should happen all the time, but in my case it happens only once in a while. After adding retries, the client experience is a lot better and there are almost no 503s from clients (i.e. docker login in GitHub Actions and docker pulls).

Additinally are you using REGISTRY_ONLY outbound traffic policy instead of ALLOW_ANY? If yes, are you properly configuring the service-entry, the destination rules and the gateway?

It should be ALLOW_ANY (we didn't set any outbound traffic policy), and there are services in our cluster that are able to access things outside of the cluster/mesh (for example, Azure Blob Storage, etc.).
Yesterday, I also added a ServiceEntry and DestinationRule to see if there were any improvements (but it's still the same, I don't see any)
-> https://gist.github.com/veerendra2/5f946d073aff391aff894407bc646281

EDIT
Forgot to mention before: the DestinationRule below already existed.

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  labels:
    app: harbor-core
    kustomize.toolkit.fluxcd.io/name: harbor
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: harbor-core
  namespace: harbor
spec:
  host: harbor-core.harbor.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 50s
    loadBalancer:
      simple: LEAST_REQUEST

And similar for harbor-portal
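
A related knob that is sometimes tried for intermittent upstream connection resets (503s with Envoy's UC flag) is limiting keep-alive connection reuse towards the upstream; this is only a sketch on top of the existing rule, not a confirmed fix, and the maxRequestsPerConnection value is illustrative:

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: harbor-core
  namespace: harbor
spec:
  host: harbor-core.harbor.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 50s
        maxRequestsPerConnection: 1   # illustrative: disables keep-alive reuse towards harbor-core
    loadBalancer:
      simple: LEAST_REQUEST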

veerendra2 (Author) commented Dec 4, 2024

There are also still a lot of client errors shown in the storage account insights (screenshot).
