[WIP] POC for supporting cuda ipc for XGBoost scenario #6440
GpuMapInPandas still has to copy data from the GPU in the Java process and pass it to the Python process, which may cause performance issues; see https://docs.google.com/spreadsheets/d/1PLZSxyjDAyt9cVe2x3Spgc8iGJq5l0rSibD86Mz5SB8/edit#gid=0.
So we propose an alternative based on CUDA IPC, which lets two processes on the same machine exchange data with zero copying. The Java process passes the CUDA IPC meta info exported by the cudf Table API in a row of the pandas DataFrame, and the Python process first reconstructs the cudf Table by importing that meta info. This way only a few bytes of CUDA IPC information are passed instead of the actual data. BTW, this PR depends on rapidsai/cudf#11564.
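To make the handle-passing idea concrete, here is a minimal CPU-only sketch of the same pattern. It is not the code in this PR and does not use CUDA IPC or cudf: it substitutes Python's standard `multiprocessing.shared_memory` for GPU memory, and the helper names `export_buffer` and `import_buffer` are hypothetical. The point is only that the producer sends a small metadata blob and the consumer maps the same memory instead of receiving a copy.

```python
import array
import pickle
from multiprocessing import shared_memory


def export_buffer(values):
    """Producer side (hypothetical helper): put the data in shared memory
    and return the segment plus a small metadata blob describing it."""
    data = array.array("d", values)
    nbytes = len(data) * data.itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    shm.buf[:nbytes] = data.tobytes()
    # Only this metadata travels to the consumer, not the data itself.
    meta = pickle.dumps(
        {"name": shm.name, "count": len(data), "itemsize": data.itemsize}
    )
    return shm, meta


def import_buffer(meta_bytes):
    """Consumer side (hypothetical helper): attach to the segment named in
    the metadata and return a typed view over the same memory."""
    meta = pickle.loads(meta_bytes)
    shm = shared_memory.SharedMemory(name=meta["name"])
    # Slice first: some platforms round the segment size up to a page.
    view = shm.buf[: meta["count"] * meta["itemsize"]].cast("d")
    return shm, view


def demo():
    """Round-trip in one process; in practice meta would cross a process boundary."""
    producer, meta = export_buffer([1.0, 2.0, 3.0])
    consumer, view = import_buffer(meta)
    result = list(view)
    # Release views before closing, then free the segment.
    view.release()
    consumer.close()
    producer.close()
    producer.unlink()
    return result
```

In this analogue the metadata plays the role of the CUDA IPC meta info carried in the DataFrame row, and `import_buffer` plays the role of reconstructing the table on the Python side. The real flow would go through the cudf APIs that rapidsai/cudf#11564 adds.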
The initial design doc is here, and the performance testing for the XGBoost scenario is here.