Auto Cache Plugin #2971
Signed-off-by: Daniel Sola <[email protected]>
@@ -132,9 +133,9 @@ def task(

 @overload
 def task(
-    _task_function: Callable[P, FuncOut],
+    _task_function: Callable[..., FuncOut],
Why change `P` to `...`?
self.cache_serialize = cache_serialize
self.cache_version = cache_version
self.cache_ignore_input_vars = cache_ignore_input_vars
What's the purpose of saving this state here? Aren't these just forwarded to the underlying `TaskMetadata`?
@@ -95,7 +96,7 @@ def find_pythontask_plugin(cls, plugin_config_type: type) -> Type[PythonFunction
 def task(
     _task_function: None = ...,
     task_config: Optional[T] = ...,
-    cache: bool = ...,
+    cache: Union[bool, CachePolicy] = ...,
Can we make this accept any `AutoCache`-compliant object?
Basically, the user can provide just a single autocache object like `CacheFunctionBody` or compose multiple into a `CachePolicy`, but users shouldn't be forced to always use a `CachePolicy` object.
    cache_version_val = cache_version or cache.get_version(params=params)
    cache_serialize_val = cache_serialize or cache.cache_serialize
    cache_ignore_input_vars_val = cache_ignore_input_vars or cache.cache_ignore_input_vars
else:
    cache_val = cache
    cache_version_val = cache_version
    cache_serialize_val = cache_serialize
    cache_ignore_input_vars_val = cache_ignore_input_vars
What's the purpose of forwarding all of these parameters via the `CachePolicy` object? It doesn't look like it's being modified there.
cache_policy = CachePolicy(
    auto_cache_policies=[
        CacheFunctionBody(),
        CachePrivateModules(root_dir="../my_package"),
        ...,
    ],
    salt="my_salt",
)
Let's also provide an example of not needing to provide a `CachePolicy` object, e.g. just passing in `CacheFunctionBody`.
Why are the changes needed?
Make caching easier to use in flytekit by reducing the cognitive burden of specifying cache versions.
What changes were proposed in this pull request?
To use the caching mechanism in a Flyte task, you can define a `CachePolicy` that combines multiple caching strategies. Here’s an example of how to set it up:
Salt Parameter
The `salt` parameter in the `CachePolicy` adds uniqueness to the generated hash. It can be used to differentiate between different versions of the same task. This ensures that even if the underlying code remains unchanged, the hash will vary if a different salt is provided. This feature is particularly useful for invalidating the cache for specific versions of a task.
Cache Implementations
Users can add any number of cache policies that implement the `AutoCache` protocol defined in `auto_cache.py`. Below are the implementations available so far:
1. CacheFunctionBody
This implementation hashes the contents of the function of interest, ignoring any formatting or comment changes. It ensures that the core logic of the function is considered for versioning.
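For readers of this thread, here is a minimal sketch of how hashing a function while ignoring formatting and comments could work. It is an illustration only, not the code in this PR: `hash_source`, `hash_function`, and the way `salt` is mixed in are hypothetical names and choices made for this example. The idea is to parse the source into an AST, which drops comments and whitespace, and hash the dumped tree.

```python
import ast
import hashlib
import inspect
import textwrap
from typing import Callable


def hash_source(source: str, salt: str = "") -> str:
    """Hash Python source by its AST so formatting and comments do not matter.

    Hypothetical helper for illustration; not part of flytekit.
    """
    # Parsing discards comments and whitespace; ast.dump omits line and
    # column information by default, so only the code structure remains.
    tree = ast.parse(textwrap.dedent(source))
    normalized = ast.dump(tree)
    # Assumption: the salt is simply prepended before hashing.
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def hash_function(fn: Callable, salt: str = "") -> str:
    """Hash a live function object; requires its source file to be available."""
    return hash_source(inspect.getsource(fn), salt=salt)


# Three variants of the same function, given as source strings.
ORIGINAL = "def add_one(x):\n    return x + 1\n"
REFORMATTED = "def add_one(x):  # increment\n\n    return x+1\n"
CHANGED = "def add_one(x):\n    return x + 2\n"

same_logic = hash_source(ORIGINAL) == hash_source(REFORMATTED)
different_logic = hash_source(ORIGINAL) != hash_source(CHANGED)
salted_differs = hash_source(ORIGINAL, salt="v2") != hash_source(ORIGINAL)
```

In this sketch, docstrings are part of the AST, so editing a docstring changes the hash. The text produced by `ast.dump` can also differ between Python versions, so a real implementation would need to decide whether hashes must be stable across interpreters.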
2. CacheImage
This implementation includes the hash of the `container_image` object passed. If the image is specified as a name, that string is hashed. If it is an `ImageSpec`, the parametrization of the `ImageSpec` is hashed, allowing for precise versioning of the container image used in the task.
3. CachePrivateModules
This implementation recursively searches the task of interest for all callables and constants used. The contents of any callable (function or class) utilized by the task are hashed, ignoring formatting or comments. The values of the literal constants used are also included in the hash.
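The recursive search can be pictured with a deliberately simplified sketch. It is not the implementation in this PR: `reachable_definitions` is a hypothetical helper that works on a single module's source text, follows only names defined at the top level of that module, and leaves out constants and imports entirely.

```python
import ast
from typing import Set


def reachable_definitions(module_source: str, entry_point: str) -> Set[str]:
    """Return the top-level functions and classes reachable from entry_point.

    Hypothetical helper for illustration; not part of flytekit.
    """
    tree = ast.parse(module_source)
    # Map each top-level function or class name to its AST node.
    definitions = {
        node.name: node
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }
    reachable: Set[str] = set()
    pending = [entry_point]
    while pending:
        name = pending.pop()
        if name in reachable or name not in definitions:
            continue
        reachable.add(name)
        # Any name used inside this definition that matches another
        # top-level definition is treated as a dependency to visit next.
        for node in ast.walk(definitions[name]):
            if isinstance(node, ast.Name) and node.id in definitions:
                pending.append(node.id)
    return reachable


MODULE = '''
SCALE = 3

def helper(x):
    return x * SCALE

def unused(x):
    return x - 1

def my_task(x):
    return helper(x) + 1
'''

deps = reachable_definitions(MODULE, "my_task")
```

Here `deps` contains `my_task` and `helper` but not `unused`, so a change to `unused` would not affect a hash built from the reachable definitions.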
`CachePrivateModules` accounts for both `import` and `from ... import` statements at the global and local levels within a module or function. Any callables that are within site-packages (i.e., external libraries) are ignored.
4. CacheExternalDependencies
This implementation recursively searches through all the callables like `CachePrivateModules`, but when an external package is found, it records the version of the package, which is included in the hash. This ensures that changes in external dependencies are reflected in the task's versioning.
How was this patch tested?
Unit tests for the following: