Can we get some benchmarks ? #127

tchaton · 2024-02-02T22:42:50Z

Tell us more about this new feature.

Hey there,

It would be great to get some benchmarks about this dataset.

dnanuti · 2024-02-07T10:10:14Z

Hello @tchaton! Thank you for your interest in s3-connector-for-pytorch and for raising this. We will review your request as a team and get back to you with an update.

dnanuti · 2024-02-09T15:10:57Z

Hi @tchaton! We are working on providing a way for our customers to run benchmarks: #135. Any feedback is much appreciated. Thank you!

tchaton · 2024-02-13T19:36:59Z

Sounds good @dnanuti. I will review it tomorrow.

tchaton · 2024-02-13T19:37:28Z

Would you mind adding a comparison with PyTorch Lightning Data: https://github.com/Lightning-AI/pytorch-lightning/tree/master/src/lightning/data?

dnanuti · 2024-02-15T11:32:23Z

Hi @tchaton, thanks for the suggestion. Our intention is to enable customers to run benchmarks on their own. Would incorporating PyTorch Lightning Data into our benchmarking framework fit your needs?

tchaton · 2024-02-16T08:23:14Z

Hey dnanuti, I think enabling users to run their own benchmarks is great! This is key, and I am keen to try it out myself.

However, I think it would be great to see where this new library fits among streaming libraries such as Lightning Data or WebDataset. Every single library uses ImageNet 1M without alteration for its benchmarks. I strongly recommend that the s3-connector-for-pytorch team do the same.

Here are Lightning Data benchmarks for example: https://lightning.ai/lightning-ai/studios/benchmark-cloud-data-loading-libraries.

Furthermore, I believe there is room for improvement in Lightning Data by moving the backend to Rust, as this library does.

Note: Naming the client mountpoint_s3_client is quite confusing, since it isn't really related to the mountpoint-s3 mount solution.

dnanuti · 2024-03-22T17:02:11Z

Hey @tchaton!
Just checking in with a couple of updates:

We published a benchmarking module: https://github.com/awslabs/s3-connector-for-pytorch/tree/main/s3torchbenchmarking
We are working on prioritising benchmark numbers, as you suggested. I will come back with an update when those are available.

Regarding the note: the crate of the client we are using is actually published by mountpoint-s3, and the naming is meant to signal alignment with that solution. This crate is not intended for general-purpose use and we consider its interface to be unstable, as mentioned here.
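
For readers wondering what a like-for-like comparison across data-loading libraries might measure, here is a minimal sketch. It is not the s3torchbenchmarking module and does not use any library's API; `measure_throughput` and `fake_loader` are hypothetical names introduced only for illustration, and the sketch assumes each loader can be exposed as an iterable of raw `bytes` payloads.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_throughput(
    samples: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, int, float]:
    """Iterate once over `samples`; return (count, total_bytes, elapsed_seconds)."""
    count = 0
    total_bytes = 0
    start = clock()
    for payload in samples:
        count += 1
        total_bytes += len(payload)
    elapsed = clock() - start
    return count, total_bytes, elapsed


def fake_loader(num_samples: int, sample_size: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a real loader: yields zero-filled payloads."""
    for _ in range(num_samples):
        yield bytes(sample_size)


if __name__ == "__main__":
    count, total_bytes, elapsed = measure_throughput(fake_loader(1000, 1024))
    print(f"{count} samples, {total_bytes} bytes in {elapsed:.6f} s")
```

Running the same function over each library's loader on the same unaltered dataset would give comparable sample counts, byte totals, and wall-clock times. A real benchmark would also need to control for caching, worker counts, and network conditions, which this sketch ignores.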

tchaton added the enhancement New feature or request label Feb 2, 2024

dnanuti self-assigned this Feb 7, 2024
