Scale up `lambkin.shepherd.data` APIs to larger datasets #100

hidmic · 2024-09-28T17:43:08Z

Feature description

By default, `lambkin.shepherd.data` APIs yield `pandas.DataFrame` instances when accessing benchmark results. These instances can grow very large. We need a way to keep the current UX while scaling up to huge amounts of data.

Implementation considerations

There are plenty of things we could do here: lazy loading, chunking, compression, parallelization (see dask), and more. The solution may also involve introducing other storage formats better suited to big data (e.g. Parquet).
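To make the chunking option concrete, here is a minimal sketch of aggregating a column without ever holding the full table in memory. It is not based on the current `lambkin.shepherd.data` code: the helper names (`iter_chunks`, `chunked_mean`), the CSV layout, and the `error` column are all hypothetical, and a CSV source is assumed only because `pandas.read_csv` supports `chunksize` out of the box.

```python
import io

import pandas as pd


def iter_chunks(source, chunksize):
    """Yield DataFrame chunks of at most `chunksize` rows (hypothetical helper)."""
    # With `chunksize` set, pandas.read_csv returns an iterator of DataFrames
    # instead of loading the whole file at once.
    yield from pd.read_csv(source, chunksize=chunksize)


def chunked_mean(source, column, chunksize=2):
    """Compute the mean of `column` one chunk at a time (hypothetical helper)."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(source, chunksize):
        total += chunk[column].sum()      # NaNs are skipped by default
        count += chunk[column].count()    # counts non-NaN values only
    return total / count if count else float("nan")


# Made-up benchmark results, standing in for a large file on disk.
CSV = "run,error\n0,0.10\n1,0.20\n2,0.30\n3,0.40\n4,0.50\n"
mean_error = chunked_mean(io.StringIO(CSV), "error")
```

Only one chunk is resident at a time, so peak memory is bounded by `chunksize` rather than by the dataset size. The trade-off is that callers get an iterator instead of a single `DataFrame`, which is exactly the UX question this issue raises.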


hidmic added the enhancement label Sep 28, 2024

hidmic changed the title ~~Scale up lambkin.data APIs to larger datasets~~ Scale up lambkin.shepherd.data APIs to larger datasets Sep 28, 2024
