GPU Direct Storage #227

Open
ryxli opened this issue Aug 27, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

ryxli (Contributor) commented Aug 27, 2024

Tell us more about this new feature.

Is there any possibility, with the current CRT/boto3, of doing GPU Direct to S3? I'm wondering whether it is possible to skip the S3 -> CPU -> GPU hop with torch.load, or whether there is already functionality that supports this.

ryxli added the enhancement (New feature or request) label on Aug 27, 2024
IsaevIlya (Contributor) commented

Hello @ryxli,
Thank you for your interest in s3-connector-for-pytorch.
Currently, with the existing mountpoint-s3-client library that we use for communication with S3, it is not possible to load data from S3 directly into GPU memory, skipping the CPU step. The torch.load function, per its documentation, deserializes data on the CPU before loading it into tensors.

However, you can use the map_location parameter in torch.load to load the deserialized data directly onto the GPU after the CPU step. For example:

import torch

MODEL_PATH = 's3://bucket_name/checkpoint.chk'  # illustrative URI; see the reader-based sketch below
MAP_LOCATION = torch.device('cuda:0')  # substitute the target GPU's device id
model.load_state_dict(torch.load(MODEL_PATH, map_location=MAP_LOCATION))

This approach still involves the CPU step but allows you to load the data onto the GPU immediately after deserialization.
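
Note that torch.load by itself expects a local path or file-like object rather than an s3:// URI, so with this connector you would wrap the load in an S3 reader. A minimal sketch, assuming the S3Checkpoint API as documented in this repository's README (region, bucket, and key are placeholders):

import torch
from s3torchconnector import S3Checkpoint

checkpoint = S3Checkpoint(region="us-east-1")  # placeholder region

# Stream the object from S3, deserialize on the CPU, and materialize
# the tensors on the target GPU via map_location
with checkpoint.reader("s3://bucket_name/checkpoint.chk") as reader:
    state_dict = torch.load(reader, map_location=torch.device("cuda:0"))

model.load_state_dict(state_dict)  # 'model' is an existing nn.Module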

Please let us know if this answers your question and if there is anything else we can do to help.

ryxli (Contributor, Author) commented Sep 3, 2024

Thanks, I know about the torch.load / torch.save API. In 99% of cases today, checkpointing is composed of a D2H copy -> serialization (CPU) -> dump to storage (filesystem or S3), and the inverse of that for loading.
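
For concreteness, a minimal sketch of that conventional pipeline (the bucket/key and helper name are hypothetical; torch.save and boto3's upload_fileobj are standard APIs):

import io

import boto3
import torch

def save_checkpoint(model: torch.nn.Module, bucket: str, key: str) -> None:
    # D2H: copy parameters from GPU to CPU memory
    cpu_state = {name: t.cpu() for name, t in model.state_dict().items()}
    # Serialization happens on the CPU
    buffer = io.BytesIO()
    torch.save(cpu_state, buffer)
    buffer.seek(0)
    # Dump to storage (S3 here); loading is the inverse of these steps
    boto3.client("s3").upload_fileobj(buffer, bucket, key)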

From your answer it's currently not supported, but I was asking whether your team / the Mountpoint team has any roadmap items to look into https://docs.nvidia.com/gpudirect-storage/overview-guide/index.html for direct memory transfer between device and storage (or S3 in this case), hence skipping the CPU step.

IsaevIlya (Contributor) commented

There are no immediate plans to investigate GPU Direct Storage integration for the s3-connector-for-pytorch project. However, I appreciate you raising this idea, as it could benefit certain use cases.

I'm curious to understand the rationale behind your suggestion better. My understanding is that torch.load and torch.save require CPU processing for serialization/deserialization. If that's the case, would the intention be to turn off object compression to skip the CPU step? While this could potentially reduce CPU overhead, it may also result in larger object sizes and longer download times from S3.

Alternatively, if the goal is to bypass torch.load and torch.save altogether when transferring data directly to the GPU, could you please elaborate on the tools or approaches you have in mind? Understanding the specific use case and requirements would help evaluate the feasibility and potential impact of exploring this feature.

Regarding GPU Direct Storage, my understanding is that it enables direct data transfer between GPU memory and storage devices by leveraging specialized storage drivers that support GPU-accelerated file operations (cuFile* primitives) at the kernel level. Could you please confirm if this high-level understanding is correct?
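
For context, my mental model corresponds to something like the following sketch, using NVIDIA's kvikio Python bindings to the cuFile API (kvikio, CuPy, and the file path are illustrative assumptions, not part of this project; this targets a GDS-capable local filesystem rather than S3):

import cupy
import kvikio

# Allocate a buffer directly in GPU memory
a = cupy.arange(1_000_000, dtype=cupy.float32)

# cuFile write: GPU memory -> storage, bypassing a CPU bounce buffer
# when the driver and filesystem support GDS (kvikio falls back to a
# POSIX read/write path otherwise)
f = kvikio.CuFile("/mnt/nvme/checkpoint.bin", "w")
f.write(a)
f.close()

# cuFile read: storage -> GPU memory
b = cupy.empty_like(a)
f = kvikio.CuFile("/mnt/nvme/checkpoint.bin", "r")
f.read(b)
f.close()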
