Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core feature] Add PodTemplate support for Spark driver and executor pods #4105

Open
2 tasks done
andrewwdye opened this issue Sep 29, 2023 · 9 comments
Open
2 tasks done
Assignees
Labels
backlogged For internal use. Reserved for contributor team workflow. enhancement New feature or request

Comments

@andrewwdye
Copy link
Contributor

Motivation: Why do you think this is important?

Today Spark driver and executor pods are not configurable on a per task basis; they use the shared propeller k8s plugin config. This disallows, for instance, setting additional tolerations when running a Spark job.

This extends work PodTemplate investments made in

Goal: What should the final outcome look like, ideally?

flytekit should accept optional driver and executor pod templates in Spark task config

@task(
    task_config=Spark(
        ...
        driver_pod_template=PodTemplate(...),
        executor_pod_template=PodTemplate(...),
    ),
    ...
)

The Spark plugin should use these optional templates and existing ToK8sPodSpec logic to populate the SparkApplicationSpec.

Describe alternatives you've considered

The Spark on K8s operator supports a bunch of k8s pod settings via spark_conf, which is already available in flytekit. While it doesn't support tolerations, for example, we could extend this concept with the addition of keys like spark.kubernetes.driver.tolerations and either get support added to Spark on K8s operator or parse out separately in flyteplugins and apply to the SparkApplicationSpec when building it.

Propose: Link/Inline OR Additional context

One solution here is to add driver and executor pod template parameters to the Spark dataclass and underlying SparkJob proto. Then in flyteplugins we can populate the driver and executor specs.

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@Tom-Newton
Copy link
Contributor

This looks like a really valuable feature.

It looks like the backend stuff on flyte propeller has been completed #4183. Am I right in thinking there is still a change required to flytekit to expose driver_pod_template and executor_pod_template?

@andrewwdye
Copy link
Contributor Author

@Tom-Newton actually the final design drifted from this original proposal and doesn't require a flytekit change. Instead I made use of the existing singular pod_spec and shared it across the driver and executor pods. See discussion note in the description of #4183. I imagine some point in the future wanting to have separate pod specs, but moving forward on this now proved premature.

@Tom-Newton
Copy link
Contributor

Tom-Newton commented Dec 4, 2023

Thanks for explaining. Probably I should have read the PR description on #4183. I think it would definitely be very useful to have separate pod templates for driver and executor. We always use use spot nodes usually of a different node type for executors.

@Tom-Newton
Copy link
Contributor

I think there is also a small issue with environment variables that reference secrets. I created a github issue and I have a fix I can contribute.

@machichima
Copy link
Contributor

#take

@machichima
Copy link
Contributor

machichima commented Dec 13, 2024

Hi,
I am working on this feature and would like to discuss the way of handling default podTemplate and custom ones.

If we set a field in driverPod or executorPod, and these fields are also defined in default podSpec, should we merge them or totally use what we set in custom podTemplate?

For example, if default podTemplate have Toleration: with key "x/custom", and I defined a driverPod with Toleration with key "x/driver", should I merge them together into {"x/custom", "x/driver"} or use "x/driver" only?

Thanks!

@Tom-Newton
Copy link
Contributor

Tom-Newton commented Dec 13, 2024

If we set a field in driverPod or executorPod, and these fields are also defined in default podSpec, should we merge them or totally use what we set in custom podTemplate?

For example, if default podTemplate have Toleration: with key "x/custom", and I defined a driverPod with Toleration with key "x/driver", should I merge them together into {"x/custom", "x/driver"} or use "x/driver" only?

My vote would be to merge them. I think this is the behaviour for simple python tasks.

Thanks for working on this 🙂

@machichima
Copy link
Contributor

My vote would be to merge them. I think this is the behaviour for simple python tasks.

Thanks for working on this 🙂

Hi @Tom-Newton, thanks for your reply!

I just merged them by using the existing mergePodSpecs function in pod_helper.go. However, I discovered that if we set the same field in two different places (e.g. one from pod_template and one from driver_pod in spark_job config) with the same name, the mergePodSpec function will simply merge them without checking whether the duplicate value is set. See following images for more details.

Just wondering whether this is an acceptable behaviour or do I need to fix this?

  1. Same environment variable name "Env_Var" in the container of default podTemplate and driver_pod, causing two EnvVar with same Name

image

  1. Set Toleration in default porTemplate and driver_pod with same key of "x/flyte"

image

  1. Set same Args value in the container of default podTemplate and driver_pod

image

@Tom-Newton
Copy link
Contributor

I just merged them by using the existing mergePodSpecs function in pod_helper.go. However, I discovered that if we set the same field in two different places (e.g. one from pod_template and one from driver_pod in spark_job config) with the same name, the mergePodSpec function will simply merge them without checking whether the duplicate value is set. See following images for more details.

Just wondering whether this is an acceptable behaviour or do I need to fix this?

Using the existing mergePodSpecs sounds very reasonable to me.

I've never noticed any problems caused by the double values behaviour you describe on non-spark tasks, so personally I'm not really concerned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlogged For internal use. Reserved for contributor team workflow. enhancement New feature or request
Projects
Status: In progress
Development

No branches or pull requests

4 participants