[FEA] A Python package for handling customization. #6391

Open
trivialfis opened this issue Aug 22, 2022 · 1 comment
Labels
feature request New feature or request

Comments

@trivialfis

Is your feature request related to a problem? Please describe.
This feature request is for adding customization like #5561. Eventually, we might want something similar for mapInArrow and withColumn as well. As a downstream user, I hope that everything can work out of the box with Spark, but I understand that upstreaming GPU-specific changes might be difficult.

Describe the solution you'd like
The feature request is to have a new spark_rapids Python package that serves as the interface for Python users, in the same spirit as dask_cudf and dask_cuda. With this new package, we can add customization points for handling RAPIDS-backed DataFrames, either by using free functions like mapInCUDF(dataset, ...) or by using Python reflection to modify existing PySpark functions like mapInPandas so that they return cuDF iterators.
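
To make the two customization styles concrete, here is a minimal sketch, not a proposed implementation. It deliberately avoids importing PySpark or cuDF: `FakeDataFrame` and `GpuBatch` are hypothetical stand-ins for `pyspark.sql.DataFrame` and a cuDF DataFrame, and `patch_map_in_pandas` is an illustrative name that does not exist in any library. Only the shape of the two approaches is meant to carry over.

```python
from typing import Callable, Iterator, List


class GpuBatch:
    """Hypothetical stand-in for a cuDF DataFrame (just wraps a list of rows)."""

    def __init__(self, rows: List[dict]):
        self.rows = rows


class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame, holding row batches."""

    def __init__(self, batches: List[List[dict]]):
        self.batches = batches

    def mapInPandas(self, func: Callable, schema=None) -> "FakeDataFrame":
        # Stand-in behaviour: pass an iterator of plain batches to func
        # and collect whatever batches it yields.
        return FakeDataFrame([list(b) for b in func(iter(self.batches))])


def _to_gpu(batches: Iterator[List[dict]]) -> Iterator[GpuBatch]:
    # Convert host batches into the (stand-in) GPU representation.
    for batch in batches:
        yield GpuBatch(batch)


def _from_gpu(batches: Iterator[GpuBatch]) -> Iterator[List[dict]]:
    # Convert (stand-in) GPU batches back into host batches.
    for batch in batches:
        yield batch.rows


def _wrap(func: Callable) -> Callable:
    # Adapt a user function over GPU batches to the host-batch interface.
    def wrapper(batches: Iterator[List[dict]]) -> Iterator[List[dict]]:
        return _from_gpu(func(_to_gpu(batches)))

    return wrapper


# Style 1: a free function that leaves the DataFrame class untouched.
def mapInCUDF(dataset: FakeDataFrame, func: Callable, schema=None) -> FakeDataFrame:
    return dataset.mapInPandas(_wrap(func), schema)


# Style 2: reflection, replacing the existing method on the class so that
# user functions receive GPU batches. Returns the original for restoring.
def patch_map_in_pandas(cls) -> Callable:
    original = cls.mapInPandas

    def patched(self, func: Callable, schema=None):
        return original(self, _wrap(func), schema)

    cls.mapInPandas = patched
    return original
```

The free function keeps existing behaviour intact and is explicit at the call site. The reflection approach lets existing `mapInPandas` calls pick up the GPU path without code changes, at the cost of globally altering the class, so the two should not be combined on the same class without care (the free function above would wrap the user function twice if called after patching).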

Describe alternatives you've considered

  • @wbo4958 proposed that we generate an IPC handle in spark-rapids and decode the custom cuDF Spark IPC message in XGBoost. This is really out of scope for XGBoost and difficult to implement, because memory would have to be managed across two different projects.
  • cuDF developers suggested using UCX to handle IPC messages, which also requires some modification in upstream Spark. Whether such code belongs in cuDF is currently under discussion in rapidsai/cudf#11564 ("[WIP] Implement IPC for pyspark.").

It would be great to have a reusable project to host all these customizations for Spark Python.

trivialfis added the "? - Needs Triage" (Need team to review and classify) and "feature request" (New feature or request) labels Aug 22, 2022

trivialfis commented Aug 22, 2022

As with #5561, I can help work on the feature if it's approved by the maintainers here, although we would need some additional help from ops to maintain a new Python package.
