Can we get some benchmarks? #127

Open
tchaton opened this issue Feb 2, 2024 · 7 comments

@tchaton

tchaton commented Feb 2, 2024

Tell us more about this new feature.

Hey there,

It would be great to get some benchmarks about this dataset.

@tchaton tchaton added the enhancement New feature or request label Feb 2, 2024
@dnanuti
Contributor

dnanuti commented Feb 7, 2024

Hello @tchaton! Thank you for your interest in s3-connector-for-pytorch and for raising this. We will review your request as a team and get back to you with an update.

@dnanuti dnanuti self-assigned this Feb 7, 2024
@dnanuti
Contributor

dnanuti commented Feb 9, 2024

Hi @tchaton! We are working on providing a way for our customers to run benchmarks: #135. Any feedback is much appreciated. Thank you!

@tchaton
Author

tchaton commented Feb 13, 2024

Sounds good @dnanuti. I will review it tomorrow.

@tchaton
Author

tchaton commented Feb 13, 2024

Would you mind adding a comparison with PyTorch Lightning Data: https://github.com/Lightning-AI/pytorch-lightning/tree/master/src/lightning/data?

@dnanuti
Contributor

dnanuti commented Feb 15, 2024

Hi @tchaton, thanks for the suggestion. Our intention is to enable customers to run benchmarks on their own. Would incorporating PyTorch Lightning Data into our benchmarking framework fit your needs?

@tchaton
Author

tchaton commented Feb 16, 2024

Hey dnanuti, I think enabling users to run their own benchmarks is great! This is key, and I am keen to try it out myself.

However, I think it would be great to see where this new library fits among streaming libraries such as Lightning Data or WebDataset. Every single library uses ImageNet 1M without alteration for its benchmarks. I strongly recommend that the s3-connector-for-pytorch team do the same.

Here are Lightning Data benchmarks for example: https://lightning.ai/lightning-ai/studios/benchmark-cloud-data-loading-libraries.

Furthermore, I believe there is room for improvement in Lightning Data by moving the backend to Rust, as this library does.

Note: Naming the client mountpoint_s3_client is quite confusing. It isn't really related to the mountpoint-s3 mount solution.

@dnanuti
Contributor

dnanuti commented Mar 22, 2024

Hey @tchaton!
Just checking in with a couple of updates:

Regarding your note: the crate for the client we are using is actually published by mountpoint-s3, and the naming is meant to signal alignment with that solution. This crate is not intended for general-purpose use and we consider its interface to be unstable, as mentioned here.
