Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Offload literals #2872

Merged
merged 28 commits into from
Nov 22, 2024
Merged

Offload literals #2872

merged 28 commits into from
Nov 22, 2024

Conversation

eapolinario
Copy link
Collaborator

@eapolinario eapolinario commented Oct 28, 2024

Tracking issue

RFC flyteorg/flyte#5103

Why are the changes needed?

We need to be able to offload literals from flytekit.

What changes were proposed in this pull request?

This PR adds support for offloading of literals gated by the environment variables _F_L_MIN_SIZE_MB and _F_L_MAX_SIZE_MB, the minimal literal size for offloading and the max size, respectively.

Since we need to use the output name (i.e. "o0", "o1", etc) in the name of the offloaded literal, we use the entrypoint (since it's there that we have natural access to the metadata bucket) to write the offloaded literals to files named "o0_offloaded_metadata.pb", "o1_offloaded_metadata.pb", etc.

How hashing is handled

We need a stable representation of offloaded literals for caching purposes. In golang we use jsonpb.Marshaler + objecthash to generate a stable hash for a proto message, but unfortunately there's no readily available equivalent in python. Instead, we rely on the stable json representation of the proto Literal and use SHA256 to produce the hash.

In case a task makes use of HashMethod, then we use that hash in the offloaded literal.

How was this patch tested?

Unit tests and testing locally in sandbox.

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
offloaded_metadata=_literal_models.LiteralOffloadedMetadata(
uri=f"{ctx.user_space_params.output_metadata_prefix}/{offloaded_filename}",
size_bytes=lit.ByteSize(),
# TODO: do I have to set the inferred literal type?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmahindrakar-oss , in what conditions we have to set this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed when we have launcplan execution created by propeller
eg : here
https://demo.hosted.unionai.cloud/console/projects/flytesnacks/domains/development/executions/akzthr8gxbdvpnq6f5lr/nodes

[UserError] failed to launch workflow, caused by: rpc error: code = InvalidArgument desc = invalid input input wrong type. Expected collection_type:{simple:STRING}, but got collection_type:{}

Code link in my comments below https://github.com/flyteorg/flytekit/pull/2872/files#r1819666906

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented in about separating those discussions and how we use the task typed interface to set inferred_type with the appropriate literal type.

flytekit/bin/entrypoint.py Outdated Show resolved Hide resolved
offloaded_metadata=_literal_models.LiteralOffloadedMetadata(
uri=f"{ctx.user_space_params.output_metadata_prefix}/{offloaded_filename}",
size_bytes=lit.ByteSize(),
# TODO: do I have to set the inferred literal type?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to get rid of LiteralTypeForLiteral in flyteorg/flyte#5909. Still pending a review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we get rid of the literaltypeforliteral function, and we don't need inferredtype, we should deprecate the field in the idl. don't want to keep it around and have future us wonder if it's necessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's separate those two discussions. We might be able to remove LiteralTypeForLiteral, but that shouldn't block us from getting the actual literal type from the typed interface in flytekit and use it here.

Also, if/when we end up removing LiteralTypeForLiteral we should use a different field to store the offloaded literal literal type and not use inferred in the name since it'll be the exact type.

Comment on lines 143 to 144
min_offloaded_size = int(os.environ.get("FK_L_MIN_SIZE_MB", "10")) * 1024 * 1024
max_offloaded_size = int(os.environ.get("FK_L_MAX_SIZE_MB", "1000")) * 1024 * 1024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll need to update the injection of environment variables to match those names.

The need for short env vars is described flyteorg/flyte#5665.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok with keeping the current ones .

if lit.ByteSize() >= min_offloaded_size:
logger.debug(f"Literal {k} is too large to be inlined, offloading to metadata bucket")

# TODO: hash calculation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +64 to +66
if lit.offloaded_metadata:
# TODO: load literal from offloaded literal?
return f"Offloaded literal metadata: {lit.offloaded_metadata}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an example we can try to see if we need to load the literal here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is relevant for the pyflyte fetch command:

❯ pyflyte --config ~/.flyte/config-sandbox.yaml fetch flyte://v1/flytesnacks/development/asvskwn766f5v492pzgt/n0-0-n1/o
Fetching data from flyte://v1/flytesnacks/development/asvskwn766f5v492pzgt/n0-0-n1/o...
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ {                                                                                                                                                                                                                                                                                             │
│     'o0': [                                                                                                                                                                                                                                                                                   │
│         'Offloaded literal metadata: Flyte Serialized object (LiteralOffloadedMetadata):\n  uri: s3://my-s3-bucket/metadata/propeller/flytesnacks-development- [...]\n  size_bytes: 39936015',                                                                                                │
│         'Offloaded literal metadata: Flyte Serialized object (LiteralOffloadedMetadata):\n  uri: s3://my-s3-bucket/metadata/propeller/flytesnacks-development- [...]\n  size_bytes: 39936015',                                                                                                │
...
│         'Offloaded literal metadata: Flyte Serialized object (LiteralOffloadedMetadata):\n  uri: s3://my-s3-bucket/metadata/propeller/flytesnacks-development- [...]\n  size_bytes: 39936015'                                                                                                 │
│     ]                                                                                                                                                                                                                                                                                         │
│ }                                                                                                                                                                                                                                                                                             │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

if lit.ByteSize() >= min_offloaded_size:
logger.debug(f"Literal {k} is too large to be inlined, offloading to metadata bucket")

# TODO: hash calculation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how will this hash interoperate with hashmethod as specified by the user? how does caching work again?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the hash calculation in entrypoint only happens after dispatch_execute (in other words, if a user defined a HashMethod then we can reuse it here). I just pushed an update that implements this idea.

offloaded_metadata=_literal_models.LiteralOffloadedMetadata(
uri=f"{ctx.user_space_params.output_metadata_prefix}/{offloaded_filename}",
size_bytes=lit.ByteSize(),
# TODO: do I have to set the inferred literal type?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we get rid of the literaltypeforliteral function, and we don't need inferredtype, we should deprecate the field in the idl. don't want to keep it around and have future us wonder if it's necessary.

flytekit/bin/entrypoint.py Show resolved Hide resolved
Copy link

codecov bot commented Nov 7, 2024

Codecov Report

Attention: Patch coverage is 12.50000% with 21 lines in your changes missing coverage. Please review.

Project coverage is 51.45%. Comparing base (faee3da) to head (b3a1b0d).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
flytekit/utils/pbhash.py 0.00% 19 Missing ⚠️
flytekit/interaction/string_literals.py 0.00% 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (faee3da) and HEAD (b3a1b0d). Click for more details.

HEAD has 6 uploads less than BASE
Flag BASE (faee3da) HEAD (b3a1b0d)
8 2
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2872       +/-   ##
===========================================
- Coverage   79.32%   51.45%   -27.88%     
===========================================
  Files         199      200        +1     
  Lines       20870    20894       +24     
  Branches     2684     2687        +3     
===========================================
- Hits        16555    10750     -5805     
- Misses       3566     9544     +5978     
+ Partials      749      600      -149     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

Signed-off-by: Eduardo Apolinario <[email protected]>
@eapolinario
Copy link
Collaborator Author

hold on, I had a bad merge.

Signed-off-by: Eduardo Apolinario <[email protected]>
wild-endeavor
wild-endeavor previously approved these changes Nov 19, 2024
Copy link
Contributor

@wild-endeavor wild-endeavor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 with dm

Signed-off-by: Eduardo Apolinario <[email protected]>
wild-endeavor
wild-endeavor previously approved these changes Nov 19, 2024
@eapolinario eapolinario merged commit 40c9540 into master Nov 22, 2024
103 of 104 checks passed
eapolinario added a commit that referenced this pull request Nov 22, 2024
* wip - Implement offloading of literals

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix use of metadata bucket prefix

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix repeated use of uri

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add temporary representation for offloaded literal

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add one unit test

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add another test

Signed-off-by: Eduardo Apolinario <[email protected]>

* Stylistic changes to the two tests

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add test for min offloading threshold set to 1MB

Signed-off-by: Eduardo Apolinario <[email protected]>

* Pick a unique engine-dir for tests

Signed-off-by: Eduardo Apolinario <[email protected]>

* s/new_outputs/literal_map_copy/

Signed-off-by: Eduardo Apolinario <[email protected]>

* Remove unused constant

Signed-off-by: Eduardo Apolinario <[email protected]>

* Use output_prefix in definition of offloaded literals

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add initial version of pbhash.py

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add tests to verify that overriding the hash is carried over to offloaded literals

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add a few more tests

Signed-off-by: Eduardo Apolinario <[email protected]>

* Always import ParamSpec from `typing_extensions`

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix lint warnings

Signed-off-by: Eduardo Apolinario <[email protected]>

* Set inferred_type using the task type interface

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add comment about offloaded literals files and how they are uploaded to the metadata bucket

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add offloading_enabled

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add more unit tests including a negative test

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix bad merge

Signed-off-by: Eduardo Apolinario <[email protected]>

* Incorporate feedback.

Signed-off-by: Eduardo Apolinario <[email protected]>

* Fix image name (unrelated to this PR - just a nice-to-have to decrease flakiness)

Signed-off-by: Eduardo Apolinario <[email protected]>

* Add `is_map_task` to `_dispatch_execute`

Signed-off-by: Eduardo Apolinario <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
@eapolinario eapolinario mentioned this pull request Nov 22, 2024
3 tasks
eapolinario added a commit that referenced this pull request Nov 22, 2024
* wip - Implement offloading of literals



* Fix use of metadata bucket prefix



* Fix repeated use of uri



* Add temporary representation for offloaded literal



* Add one unit test



* Add another test



* Stylistic changes to the two tests



* Add test for min offloading threshold set to 1MB



* Pick a unique engine-dir for tests



* s/new_outputs/literal_map_copy/



* Remove unused constant



* Use output_prefix in definition of offloaded literals



* Add initial version of pbhash.py



* Add tests to verify that overriding the hash is carried over to offloaded literals



* Add a few more tests



* Always import ParamSpec from `typing_extensions`



* Fix lint warnings



* Set inferred_type using the task type interface



* Add comment about offloaded literals files and how they are uploaded to the metadata bucket



* Add offloading_enabled



* Add more unit tests including a negative test



* Fix bad merge



* Incorporate feedback.



* Fix image name (unrelated to this PR - just a nice-to-have to decrease flakiness)



* Add `is_map_task` to `_dispatch_execute`



---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants