Is your feature request related to a problem? Please describe.
This feature request is for adding customization like #5561. Eventually, we might want something similar for mapInArrow and withColumn as well. As a downstream user, my hope is that everything works out of the box with Spark, but I understand that there may be difficulties in upstreaming GPU-specific changes.
Describe the solution you'd like
The feature request is to create a new spark_rapids Python package as an interface for Python users, in the same spirit as dask_cudf and dask_cuda. With this new package, we can add customization points for handling RAPIDS-backed DataFrames, either via free functions like mapInCUDF(dataset, ...) or by using Python reflection to modify existing PySpark functions like mapInPandas so that they yield cuDF iterators.
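A minimal sketch of what the two customization points above could look like. Everything here is hypothetical: DataFrame is a stand-in for pyspark.sql.DataFrame, and mapInCUDF is the proposed (not existing) API; the cuDF conversion step is only indicated in a comment.

```python
# Hypothetical sketch of the proposed spark_rapids customization layer.
# "DataFrame" is a stand-in for pyspark.sql.DataFrame; mapInCUDF and the
# patched method are illustrative names from this proposal, not a real API.

class DataFrame:
    """Stand-in for pyspark.sql.DataFrame holding batches of data."""

    def __init__(self, batches):
        self._batches = batches  # pretend these are Arrow record batches

    def mapInPandas(self, func, schema):
        # Upstream behaviour: func receives an iterator of batches and
        # returns an iterator of result batches.
        return DataFrame(list(func(iter(self._batches))))


def mapInCUDF(df, func, schema):
    """Free-function customization point: feed `func` cuDF-backed batches."""

    def adapter(batch_iter):
        # In a real implementation, each Arrow batch would be converted to a
        # cudf.DataFrame (e.g. via cudf.DataFrame.from_arrow) before calling
        # func; here we pass the batches through unchanged.
        return func(batch_iter)

    return df.mapInPandas(adapter, schema)


# Reflection-based alternative: attach the method to the existing class so
# downstream code can call df.mapInCUDF(...) without extra imports.
DataFrame.mapInCUDF = lambda self, func, schema: mapInCUDF(self, func, schema)
```

For example, df.mapInCUDF(lambda it: (x * 2 for x in it), schema) would then run the user function over each batch, just as mapInPandas does today, but with cuDF objects instead of pandas ones.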
Describe alternatives you've considered
@wbo4958 proposed that we generate an IPC handle in spark-rapids and decode the custom cuDF Spark IPC message in XGBoost, which is really out of scope for XGBoost and difficult to implement due to memory management spanning two different projects.
cuDF developers suggested that we use UCX to handle IPC messages, which also requires some modification in upstream Spark. [WIP] Implement IPC for pyspark. rapidsai/cudf#11564 is currently under discussion as to whether it is necessary to have such code in cuDF.
It would be great if there were a reusable project to host all these customizations for Spark Python.
Similar to #5561, I can help work on this feature if it is approved by the maintainers here, although we would need some additional help from ops to maintain a new Python package.