When converting local dataframes to a Ray Dataset or a Dask DataFrame, or when running a group-map operation, Ray requires users to specify the number of partitions and reducers explicitly. However, most users are not aware of this, so their transform function ends up using a single CPU to process everything. We need to make this process smarter (but also safer). For example, can we find the total number of CPUs in the current Ray cluster and use it to set the parallelism when it is not specified? This approach has pros and cons, so we also need to be careful about the strategy.
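One possible strategy can be sketched as a small, engine-agnostic heuristic. This is a minimal illustration, not Fugue's or Ray's actual behavior: the function `guess_default_partitions` and its parameters `min_rows_per_partition` and `partitions_per_cpu` are hypothetical. It assumes an explicit user setting always wins, otherwise targets roughly one partition per cluster CPU, and caps the count so small inputs are not split into many near-empty partitions (the "safer" part).

```python
from typing import Optional


def guess_default_partitions(
    num_rows: int,
    total_cpus: float,
    user_partitions: Optional[int] = None,
    min_rows_per_partition: int = 1000,
    partitions_per_cpu: int = 1,
) -> int:
    """Hypothetical heuristic for a default partition count.

    1. A positive, user-specified value is always respected.
    2. Otherwise aim for partitions_per_cpu partitions per CPU so the
       transform can use the whole cluster instead of a single core.
    3. Never create partitions smaller than min_rows_per_partition, so a
       tiny dataframe is not spread over many nearly empty tasks.
    """
    if user_partitions is not None and user_partitions > 0:
        return user_partitions

    # Cluster resource APIs may report CPUs as floats (e.g. 8.0).
    cpus = max(1, int(total_cpus))
    target = cpus * max(1, partitions_per_cpu)

    # Upper bound implied by the minimum useful partition size.
    max_by_size = max(1, num_rows // max(1, min_rows_per_partition))

    return max(1, min(target, max_by_size))


if __name__ == "__main__":
    print(guess_default_partitions(1_000_000, 8.0))  # large data: one per CPU
    print(guess_default_partitions(100, 8))          # tiny data: capped
```

In practice the CPU count would come from the running engine, for example `ray.cluster_resources().get("CPU", 1)` on Ray or `sum(client.nthreads().values())` on a Dask distributed `Client`, and the result could be passed as `npartitions` to `dask.dataframe.from_pandas`. One caveat: on an autoscaling cluster the CPU count reflects only the moment of the call, which is part of why the strategy needs care.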
goodwanghan changed the title from "[FEATURE] Ray engine should automatically find optimal number of partitions if not specified" to "[FEATURE] Ray/Dask engines should automatically find optimal number of partitions if not specified" on Dec 29, 2022
goodwanghan changed the title from "[FEATURE] Ray/Dask engines should automatically find optimal number of partitions if not specified" to "[FEATURE] Ray/Dask engines guess optimal default partitions" on Dec 29, 2022