GPU Direct Storage #227
Hello @ryxli. Direct transfer from S3 into GPU memory is not currently supported. However, you can use the torch.load / torch.save API with map_location to place tensors on the GPU.
This approach still involves the CPU step, but it allows you to load the data onto the GPU immediately after deserialization. Please let us know if this answers your question and if there is anything else we can do to help.
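For reference, a minimal sketch of that flow using the S3Checkpoint interface from s3torchconnector; the region, bucket name, and model are placeholders:

```python
import torch
from s3torchconnector import S3Checkpoint

model = torch.nn.Linear(8, 8)  # placeholder model

checkpoint = S3Checkpoint(region="us-east-1")

# Save: tensors are staged in host memory and pickled on the CPU,
# then streamed to S3.
with checkpoint.writer("s3://my-bucket/model.pt") as writer:
    torch.save(model.state_dict(), writer)

# Load: the object is deserialized on the CPU; map_location moves the
# tensors onto the GPU immediately after deserialization.
with checkpoint.reader("s3://my-bucket/model.pt") as reader:
    state_dict = torch.load(reader, map_location="cuda")
```

Note that map_location="cuda" only controls where the deserialized tensors land; the pickle step itself still runs on the CPU.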
Thanks, I know about the torch.load / torch.save API. In 99% of cases today, checkpointing is composed of D2H -> serialization (CPU) -> dump to storage (filesystem or S3), and the inverse for loading. From your answer, it's currently not supported, but I was asking whether your team or the Mountpoint team has any items on the roadmap to look into https://docs.nvidia.com/gpudirect-storage/overview-guide/index.html for direct memory transfer from device to storage (or S3 in this case), hence skipping the CPU step.
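To make that pipeline concrete, here is a hedged sketch of the conventional three-step save path using plain boto3; the bucket, key, and model are placeholders:

```python
import io

import boto3
import torch

model = torch.nn.Linear(8, 8).to("cuda")  # placeholder model on the GPU

# 1. D2H: copy tensors from GPU memory into host memory.
cpu_state = {k: v.cpu() for k, v in model.state_dict().items()}

# 2. Serialization (CPU): pickle the state dict into a host-side buffer.
buffer = io.BytesIO()
torch.save(cpu_state, buffer)
buffer.seek(0)

# 3. Dump to storage: upload the serialized bytes to S3.
boto3.client("s3").upload_fileobj(buffer, "my-bucket", "model.pt")
```

Steps 1 and 2 both stage the data in host memory; that is the hop GPU Direct Storage aims to remove.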
At the moment, there are no immediate plans to investigate GPU Direct Storage integration for the s3-connector-for-pytorch project. However, I appreciate you raising this idea, as it could potentially benefit certain use cases.

I'm curious to understand the rationale behind your suggestion better. My understanding is that torch.load and torch.save require CPU processing for serialization/deserialization. If that's the case, would the intention be to turn off object compression to skip the CPU step? While this could potentially reduce CPU overhead, it may also result in larger object sizes and longer download times from S3. Alternatively, if the goal is to bypass torch.load and torch.save altogether when transferring data directly to the GPU, could you please elaborate on the tools or approaches you have in mind? Understanding the specific use case and requirements would help evaluate the feasibility and potential impact of exploring this feature.

Regarding GPU Direct Storage, my understanding is that it enables direct data transfer between GPU memory and storage devices by leveraging specialized storage drivers that support GPU-accelerated file operations (cuFile* primitives) at the kernel level. Could you please confirm if this high-level understanding is correct?
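As a point of comparison, and assuming a local file rather than S3 (the cuFile API operates on files backed by supporting filesystems and drivers, not object stores), here is a minimal sketch of a device-to-storage transfer using NVIDIA's kvikio bindings, which wrap cuFile. The path is a placeholder, and kvikio transparently falls back to a host bounce-buffer path when GDS is unavailable:

```python
import cupy as cp
import kvikio

# Allocate a buffer directly in GPU memory.
gpu_buffer = cp.arange(1024, dtype=cp.float32)

# Write device memory straight to a file via cuFile; with GDS enabled,
# the transfer bypasses host memory entirely.
with kvikio.CuFile("/mnt/nvme/checkpoint.bin", "w") as f:
    f.write(gpu_buffer)

# Read it back into GPU memory, again skipping the CPU staging copy.
restored = cp.empty_like(gpu_buffer)
with kvikio.CuFile("/mnt/nvme/checkpoint.bin", "r") as f:
    f.read(restored)
```

In a GDS-enabled setup, the read and write above move data between GPU memory and the NVMe device via DMA, without staging through a host buffer.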
Tell us more about this new feature.
Is there any possibility with the current CRT/boto3 stack of doing GPU Direct to S3? I'm wondering whether it is possible to skip the S3 -> CPU -> GPU hop in torch.load, or whether there is already functionality that supports this.