Is your feature request related to a problem? Please describe.
This feature request is for adding customization like #5561. Eventually, we might want something similar for mapInArrow and withColumn as well. As a downstream user, my hope is that everything works out of the box with Spark, but I understand that there may be difficulties in upstreaming GPU-specific changes.
Describe the solution you'd like
The feature request is to create a new spark_rapids Python package as an interface for Python users, in the same spirit as dask_cudf and dask_cuda. With this new package, we can add customization points for handling RAPIDS-backed DataFrames, either via free functions like mapInCUDF(dataset, ...) or by using Python reflection to modify existing PySpark functions like mapInPandas so that they yield cuDF iterators.
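A minimal sketch of what the two customization points above could look like. Everything here is hypothetical: DataFrame is a stand-in for pyspark.sql.DataFrame, and mapInCUDF is the proposed (not existing) API; the cuDF conversion step is only indicated in a comment.

```python
# Hypothetical sketch of the proposed spark_rapids customization layer.
# "DataFrame" is a stand-in for pyspark.sql.DataFrame; mapInCUDF and the
# patched method are illustrative names from this proposal, not a real API.

class DataFrame:
    """Stand-in for pyspark.sql.DataFrame holding batches of data."""

    def __init__(self, batches):
        self._batches = batches  # pretend these are Arrow record batches

    def mapInPandas(self, func, schema):
        # Upstream behaviour: func receives an iterator of batches and
        # returns an iterator of result batches.
        return DataFrame(list(func(iter(self._batches))))


def mapInCUDF(df, func, schema):
    """Free-function customization point: feed `func` cuDF-backed batches."""

    def adapter(batch_iter):
        # In a real implementation, each Arrow batch would be converted to a
        # cudf.DataFrame (e.g. via cudf.DataFrame.from_arrow) before calling
        # func; here we pass the batches through unchanged.
        return func(batch_iter)

    return df.mapInPandas(adapter, schema)


# Reflection-based alternative: attach the method to the existing class so
# downstream code can call df.mapInCUDF(...) without extra imports.
DataFrame.mapInCUDF = lambda self, func, schema: mapInCUDF(self, func, schema)
```

For example, df.mapInCUDF(lambda it: (x * 2 for x in it), schema) would then run the user function over each batch, just as mapInPandas does today, but with cuDF objects instead of pandas ones.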
Describe alternatives you've considered
@wbo4958 proposed that we generate an IPC handle in spark-rapids and decode the custom cuDF Spark IPC message in XGBoost, which is really out of scope for XGBoost and difficult to implement due to memory management spanning two different projects.
cuDF developers suggested that we use UCX to handle IPC messages, which also requires some modification in upstream Spark. [WIP] Implement IPC for pyspark. rapidsai/cudf#11564 is currently under discussion as to whether it is necessary to have such code in cuDF.
It would be great if there were a reusable project to host all these customizations for Spark Python.
Similar to #5561, I can help work on this feature if it is approved by the maintainers here, although we would need some additional help from ops to maintain a new Python package.