Commit
Use PyTorch's p2p access enable function (pytorch#2000)
Summary:
Pull Request resolved: pytorch#2000

We split the diff after adding a needed lazy CUDA init call to the enable-p2p-access function.

Diff 1: D48939723 [PyTorch] Add the lazy init call for p2p access function

*Prior context*

cudaEnablePeerAccess only enables cross-device access for memory allocated with cudaMalloc. When using other CUDA APIs such as cuMemMap, peer access is managed differently. expandable_segments:True in PyTorch uses cuMemMap, so code that just calls cudaEnablePeerAccess is not sufficient to enable cross-device copies.

This patch switches the p2p access enabling functions to use PyTorch's `get_p2p_access`, which lets its allocator figure out how to correctly enable p2p access for that memory.

In the normal case (expandable_segments:False), this code performs exactly the same CUDA calls as before.

Reviewed By: zdevito

Differential Revision: D49021817

fbshipit-source-id: 7ffb4b477b1d1cddccc891dd9fc8f9a2a986585e
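To make the reasoning in the summary concrete, here is a minimal toy model in plain Python. It is not PyTorch or CUDA code: `ToyAllocator`, `naive_enable`, and `delegated_enable` are hypothetical names invented for illustration, and the strings in the call log only stand in for the real CUDA calls (the runtime call is spelled `cudaDeviceEnablePeerAccess`; `cuMemSetAccess` is the driver call that grants access on mapped ranges). The assumption that the allocator issues the device-wide call first and then a per-mapping grant is a simplification, not a description of PyTorch's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAllocator:
    """Hypothetical stand-in for a caching allocator; records calls instead of making them."""

    expandable_segments: bool = False
    calls: list = field(default_factory=list)

    def enable_peer_access(self, src: int, dst: int) -> None:
        # Device-wide switch: sufficient for memory that came from cudaMalloc.
        self.calls.append(("cudaDeviceEnablePeerAccess", src, dst))
        if self.expandable_segments:
            # Assumption: cuMemMap-backed segments need access granted per mapping.
            self.calls.append(("cuMemSetAccess", src, dst))


def naive_enable(allocator: ToyAllocator, src: int, dst: int) -> None:
    """Old approach: the caller issues the runtime call itself, bypassing the allocator."""
    allocator.calls.append(("cudaDeviceEnablePeerAccess", src, dst))


def delegated_enable(allocator: ToyAllocator, src: int, dst: int) -> None:
    """New approach: let the allocator decide what its memory needs."""
    allocator.enable_peer_access(src, dst)


def recorded_calls(enable, expandable_segments: bool) -> list:
    """Run one enable strategy against a fresh toy allocator and return its call log."""
    allocator = ToyAllocator(expandable_segments=expandable_segments)
    enable(allocator, 0, 1)
    return allocator.calls
```

In this model both strategies record the same single call when `expandable_segments` is off, which mirrors the summary's claim about the normal case. With it on, only the delegated path records the extra per-mapping grant.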