
Flink Native HA on Kubernetes is not supported #243

Open
borah-hemanga opened this issue Dec 6, 2021 · 4 comments


@borah-hemanga

I tried out the native HA on Kubernetes using the operator.

Here is the general synopsis:

  • If I start a fresh new deployment, the deployment succeeds and the application comes up perfectly
  • If I try to perform a deployment (update) of an existing application, then the deployment fails.

The deployment (update) of an existing application goes through the following:

  • New job manager and task manager pods for the new deployment are created
  • The old job is canceled on the old pods
  • The new job manager tries to start the job and prints "Submitting Job with JobId=<>", but fails repeatedly with "The connection was unexpectedly closed by the client."
  • The old job manager eventually starts a job with a new job id
  • The new pods are destroyed and the old cluster continues running with the old code

Has anyone successfully used native Kubernetes HA with Flink through this FlinkK8sOperator?

@nikolasten

nikolasten commented Dec 14, 2021

You will need to change the kubernetes.cluster-id config every time you deploy a Flink app, i.e. on any FlinkApplication config change (increment it or use the current timestamp). That way, when the operator starts the upgrade and the new cluster comes up, it won't try to behave as a failover of the existing cluster you are already running.

I think for the operator to support the scenario of keeping the same kubernetes.cluster-id, it would first need to shut down the job that is already running and stop the cluster, and only then start the new cluster and deploy the app. Currently it tries to minimize downtime by having both clusters running during the upgrade. It would be nice to have that mode too.
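
A rough Go sketch of that suggestion, for illustration only (the helper name and the timestamp suffix are assumptions, not part of the operator or of Flink): generate a fresh cluster id per deploy and set it as kubernetes.cluster-id in the FlinkApplication spec's Flink configuration before applying the change.

package main

import (
    "fmt"
    "time"
)

// clusterIDFor derives a new kubernetes.cluster-id for every deploy by
// suffixing the current Unix timestamp to the application name.
func clusterIDFor(appName string) string {
    return fmt.Sprintf("%s-%d", appName, time.Now().Unix())
}

func main() {
    // Prints something like "my-flink-app-1700000000"; use this value as
    // kubernetes.cluster-id in the application's Flink config.
    fmt.Println(clusterIDFor("my-flink-app"))
}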

@anandswaminathan
Contributor

@nikolasten Is it only kubernetes.cluster-id?

@anandswaminathan
Contributor

It's here

env.Value = fmt.Sprintf("%s\nhigh-availability.cluster-id: %s-%s\n", env.Value, app.Name, hash)

@nikolasten

That config option is for ZooKeeper only, not for Kubernetes HA. We did this in our fork to enable it and to make sure it's different every time we deploy or upgrade the app:
bluelabs-eu@fa64278#diff-0e21f32f488d8c4a8aeb58de476274825e4004216515b5bcbcbe0045efe08b00R215-R218

PR #170 addresses changing the cluster id on every deploy, but it does not add a config option for the Kubernetes-based HA mode.
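
For reference, a sketch of what such a fork-style change amounts to inside the operator (an assumption based on this thread, not the exact diff): inject kubernetes.cluster-id the same way the snippet above already injects high-availability.cluster-id, reusing the app name and the deploy hash so the id changes on every upgrade.

// Hypothetical mirror of the line quoted above, using the Kubernetes HA key;
// env, app and hash are the same variables as in that snippet.
env.Value = fmt.Sprintf("%s\nkubernetes.cluster-id: %s-%s\n", env.Value, app.Name, hash)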
