Frontend pod not coming up for a Non-GKE cluster on AWS EC2 #2151

Open
piyushsachdeva opened this issue Jul 5, 2024 · 0 comments
piyushsachdeva commented Jul 5, 2024

Describe the bug

I have a 3-node Kubernetes cluster (1.30) running on AWS EC2 machines. I followed the instructions for a non-GKE cluster and set the environment variables below in all the manifests:

            - name: ENABLE_TRACING
              value: "false"
            - name: ENABLE_METRICS
              value: "false" 

When I applied the YAML, all the pods came up except the frontend.

The frontend pod logs are included in the Logs section below.

To Reproduce

Followed the instructions from the non-GKE setup README.

Logs

kubectl logs frontend-6b7668877-kp6nw

{"timestamp": "2024-07-05 11:08:10", "message": "create_app | Unable to retrieve cluster name from metadata server metadata.google.internal.", "severity": "WARNING"}
{"timestamp": "2024-07-05 11:08:16", "message": "critical | WORKER TIMEOUT (pid:10)", "severity": "CRITICAL"}
{"timestamp": "2024-07-05 11:08:17", "message": "error | Worker (pid:10) was sent SIGKILL! Perhaps out of memory?", "severity": "ERROR"}
{"timestamp": "2024-07-05 11:08:17", "message": "info | Booting worker with pid: 11", "severity": "INFO"}
{"timestamp": "2024-07-05 11:08:41", "message": "create_app | Unable to retrieve cluster name from metadata server metadata.google.internal.", "severity": "WARNING"}
{"timestamp": "2024-07-05 11:08:47", "message": "critical | WORKER TIMEOUT (pid:11)", "severity": "CRITICAL"}
{"timestamp": "2024-07-05 11:08:48", "message": "error | Worker (pid:11) was sent SIGKILL! Perhaps out of memory?", "severity": "ERROR"}
{"timestamp": "2024-07-05 11:08:48", "message": "info | Booting worker with pid: 12", "severity": "INFO"}

Output from kubectl describe pod:

  Warning  Unhealthy  3m7s (x3 over 4m7s)   kubelet            Liveness probe failed: Get "http://192.168.189.72:8080/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    3m7s                  kubelet            Container front failed liveness probe, will be restarted
  Normal   Pulled     3m2s (x2 over 5m52s)  kubelet            Container image "us-central1-docker.pkg.dev/bank-of-anthos-ci/bank-of-anthos/frontend:v0.6.4@sha256:f25db63509515fb6caf98c8c76e906f3c2868e345767d12565ab3750e52963f0" already present on machine
  Warning  Unhealthy  3m2s                  kubelet            Readiness probe failed: Get "http://192.168.189.72:8080/ready": read tcp 172.31.86.155:36298->192.168.189.72:8080: read: connection reset by peer
  Warning  Unhealthy  3m2s                  kubelet            Readiness probe failed: Get "http://192.168.189.72:8080/ready": dial tcp 192.168.189.72:8080: connect: connection refused
  Warning  Unhealthy  51s (x27 over 5m27s)  kubelet            Readiness probe failed: Get "http://192.168.189.72:8080/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
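The kubelet events show the readiness and liveness probes timing out against /ready on port 8080 before the app ever answers, so the container is killed and restarted in a loop. One way to separate "slow to start" from "never starts" would be to loosen the probe timings on the frontend container. The httpGet path and port below come from the events above; the timing values are illustrative, not the repo defaults.

          # Diagnostic sketch: relaxed probe timings; values are examples only.
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 10
          livenessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 15
            timeoutSeconds: 10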

Screenshots

https://github.com/GoogleCloudPlatform/bank-of-anthos/assets/40286378/bdf3f99f-1939-4294-8edd-f39d5086d059

Environment

Ubuntu (latest), 3-node Kubernetes 1.30 cluster on AWS EC2.

Additional context

I have performed a fresh installation and bumped up the memory for the pod; even then, I am facing the same issue.
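Bumping up the memory on the frontend container looks roughly like the fragment below (the numbers are examples, not the repo defaults). Since each SIGKILL in the logs is preceded by gunicorn's own WORKER TIMEOUT, the kills appear to come from the worker timeout rather than the kernel OOM killer, which would explain why adding memory did not change the behavior.

          # Example only - values are illustrative, not the repo defaults.
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 300m
              memory: 512Mi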

Exposure

Persistent for me.