
[dagster-k8s] per step k8s config not respecting container config image #26536

Open
abhinavDhulipala opened this issue Dec 17, 2024 · 0 comments
Labels
type: bug Something isn't working

Comments


abhinavDhulipala commented Dec 17, 2024

What's the issue?

Given the following simple job:

from time import sleep

from dagster import job, op
from dagster_k8s import k8s_job_executor


@job(executor_def=k8s_job_executor)
def simple_prod_sink():
    @op
    def producer() -> int:
        return 10

    @op
    def sink(producer: int):
        sleep(producer)

    sink(producer())

And the following execution config:

execution:
  config:
    per_step_k8s_config:
      sink:
        container_config:
          image: 0123456789101.dkr.ecr.us-west-2.amazonaws.com/codelocation:sink_image
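For reference, `per_step_k8s_config` maps step names to raw k8s overrides. A minimal sketch in plain Python (the `per_step_override` helper is hypothetical, for illustration only) of looking up the override the executor is expected to honor:

```python
# Sketch only: the YAML above mirrored as a Python dict, plus a
# hypothetical lookup of the per-step override for a given step key.
execution_config = {
    "per_step_k8s_config": {
        "sink": {
            "container_config": {
                "image": "0123456789101.dkr.ecr.us-west-2.amazonaws.com/codelocation:sink_image"
            }
        }
    }
}


def per_step_override(config: dict, step_key: str) -> dict:
    """Return the raw k8s override for a step, or {} if none is configured."""
    return config.get("per_step_k8s_config", {}).get(step_key, {})


# "sink" has an image override; "producer" has none.
assert per_step_override(execution_config, "sink")["container_config"]["image"].endswith(":sink_image")
assert per_step_override(execution_config, "producer") == {}
```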

The sink op should respect this and use the 0123456789101.dkr.ecr.us-west-2.amazonaws.com/codelocation:sink_image image.

What did you expect to happen?

Instead, with the default K8sRunLauncher and the above config, the code location server's image is used.

How to reproduce?

From the problem statement, add the following simple job:

@job(executor_def=k8s_job_executor)
def simple_prod_sink():
    @op
    def producer() -> int:
        return 1000
    @op
    def sink(producer: int):
        sleep(producer)

    sink(producer())

And the following execution config:

execution:
  config:
    per_step_k8s_config:
      sink:
        container_config:
          image: 0123456789101.dkr.ecr.us-west-2.amazonaws.com/codelocation:sink_image

Replace the above image with a separately tagged image of your code location and run kubectl get pod <sink run pod> -o yaml. Observe that the image is the code location server's image, not sink_image.

The problem lies in the launch_step of the K8sStepHandler while constructing the job_config.

Either K8sContainerContext#get_k8s_job_image needs to be updated to respect K8sContainerContext#server_k8s_config or (preferably) K8sContainerContext#run_k8s_config,

or the launch step needs to do something like the following instead:

        # Prefer the per-step image from the merged run config, falling back
        # to the executor image when no override is present.
        container_image = container_context.run_k8s_config.container_config.get("image")
        job_config = container_context.get_k8s_job_config(
            container_image or self._executor_image, step_handler_context.instance.run_launcher
        )
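The fallback the snippet above proposes can be sketched in plain Python (the names below are illustrative, not Dagster's actual internals):

```python
# Illustrative sketch of the proposed precedence: a per-step image
# override wins; otherwise fall back to the executor/code-location image.
def resolve_step_image(run_k8s_config: dict, executor_image: str) -> str:
    """Return the image the step pod should use."""
    container_config = run_k8s_config.get("container_config") or {}
    return container_config.get("image") or executor_image


per_step = {
    "container_config": {
        "image": "0123456789101.dkr.ecr.us-west-2.amazonaws.com/codelocation:sink_image"
    }
}

# With the override from the issue's config, the per-step image is used:
assert resolve_step_image(per_step, "codelocation:default").endswith(":sink_image")
# Without an override, the executor image is used:
assert resolve_step_image({}, "codelocation:default") == "codelocation:default"
```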

Dagster version

dagster, version 1.9.2. This is still a problem in the latest release.

Deployment type

Helm Chart

Deployment details

Helm Chart deployment on EKS 1.29 Ubuntu 22.04 AMD64 nodes

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.
By submitting this issue, you agree to follow Dagster's Code of Conduct.

I can push a PR to fix this if the maintainers will acknowledge that this is indeed unintended behavior and are happy with one of the proposed solutions.
