[rfc] Support partitions in add_output_metadata #26562

Open
wants to merge 1 commit into
base: output_metadata_refactor

@dpeng817 dpeng817 commented Dec 18, 2024

Summary & Motivation

This PR attempts to support specifying metadata at the partition level for outputs. The use case is that when you materialize many partitions in a single computation, you may want to specify different metadata for each processed partition.

I deliberately restricted this to assets rather than ops, since I think the use case is less valid in op-land. But we can discuss whether ops are worth supporting.

How I Tested These Changes

Extended an existing test that operates over partition ranges. It shows the behavior when specifying metadata at both the partition level and the non-partition level, and shows that the two are combined on the partitioned materialization in the end.
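To make the combining behavior concrete, here is a minimal pure-Python sketch. It is not Dagster's implementation and does not use this PR's code: `OutputMetadataStore` and `metadata_for` are hypothetical names invented for illustration, and the rule that partition-level values win on a key conflict is an assumption, not something this PR states.

```python
from collections import defaultdict
from typing import Any, Dict, Mapping, Optional


class OutputMetadataStore:
    """Toy stand-in for an execution context (hypothetical, not Dagster's API)."""

    def __init__(self) -> None:
        # Metadata that applies to every partition of an output.
        self._shared: Dict[str, Dict[str, Any]] = defaultdict(dict)
        # Metadata scoped to one partition: output name -> partition key -> metadata.
        self._per_partition: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def add_output_metadata(
        self,
        metadata: Mapping[str, Any],
        output_name: str = "result",
        partition_key: Optional[str] = None,
    ) -> None:
        if partition_key is None:
            # Non-partition-level metadata.
            self._shared[output_name].update(metadata)
        else:
            # Partition-level metadata.
            self._per_partition[output_name][partition_key].update(metadata)

    def metadata_for(self, output_name: str, partition_key: str) -> Dict[str, Any]:
        """Combine shared and partition-level metadata for one materialization."""
        combined = dict(self._shared.get(output_name, {}))
        # Assumption: partition-level entries override shared entries on conflict.
        combined.update(
            self._per_partition.get(output_name, {}).get(partition_key, {})
        )
        return combined


store = OutputMetadataStore()
store.add_output_metadata({"source": "nightly"})
store.add_output_metadata({"row_count": 10}, partition_key="2024-12-01")
store.add_output_metadata({"row_count": 25}, partition_key="2024-12-02")

print(store.metadata_for("result", "2024-12-01"))
print(store.metadata_for("result", "2024-12-02"))
```

In this sketch each partition's materialization would carry the shared `source` entry plus its own `row_count`, which mirrors the combination the test is described as checking.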

dpeng817 commented Dec 18, 2024

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

@dpeng817 dpeng817 changed the title Support partitions in add_output_metadata [rfc] Support partitions in add_output_metadata Dec 18, 2024
@dpeng817 dpeng817 marked this pull request as ready for review December 18, 2024 16:05
@OwenKephart OwenKephart left a comment

Oh, this is actually quite nice. I really only have a minor comment on naming, and I do think this is a pretty low-risk change that would unblock existing users.

metadata=metadata,
output_name=output_name,
mapping_key=mapping_key,
asset_partition_key=partition_key,
Contributor

why partition_key above and asset_partition_key here?

I think just using partition_key everywhere makes sense

Contributor Author

I kept it as asset_partition_key here to make it clear that this is only really applicable to asset events, and not to partitioned ops, for example.

@dpeng817 dpeng817 force-pushed the output_metadata_refactor branch from aab7bd1 to 298f9ae Compare December 19, 2024 01:57
@dpeng817 dpeng817 force-pushed the dpeng817/partitioned_output_metadata branch from 39b5894 to 674a249 Compare December 19, 2024 01:57