Scale up lambkin.shepherd.data
APIs to larger datasets
#100
Labels
enhancement
New feature or request
Feature description
By default, lambkin.data APIs yield pandas.DataFrame instances when accessing benchmark results. These instances can grow very large. We need a way to keep the current UX while scaling up to much larger amounts of data.
Implementation considerations
There are plenty of things we could do here: lazy loading, chunking, compression, parallelization (see dask), and more. The solution may also involve introducing other storage formats better suited to big data (e.g. Parquet).
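As a rough illustration of the chunking idea, here is a minimal sketch using plain pandas. It is not a proposal for the actual lambkin API: the helper names (`iter_result_chunks`, `mean_latency`), the column names, and the in-memory CSV are all hypothetical, and real benchmark results would come from files on disk.

```python
import io

import pandas as pd

# Hypothetical benchmark results (assumption: tabular data with a
# "latency_ms" column). Built in memory so the sketch is self-contained.
CSV_TEXT = "run,latency_ms\n" + "\n".join(f"{i},{i % 10}" for i in range(1000))


def iter_result_chunks(source, chunksize=100):
    """Yield DataFrame chunks lazily instead of loading everything at once."""
    # With chunksize set, pandas.read_csv returns an iterator of DataFrames.
    yield from pd.read_csv(source, chunksize=chunksize)


def mean_latency(source, chunksize=100):
    """Compute a mean over all rows while holding one chunk in memory."""
    total = 0.0
    count = 0
    for chunk in iter_result_chunks(source, chunksize):
        total += chunk["latency_ms"].sum()
        count += len(chunk)
    return total / count


result = mean_latency(io.StringIO(CSV_TEXT))
n_chunks = sum(1 for _ in iter_result_chunks(io.StringIO(CSV_TEXT)))
```

Each chunk is still a regular pandas.DataFrame, so existing per-frame code can be reused, but aggregations have to be written incrementally. Something like dask or a Parquet-backed store could hide that bookkeeping, at the cost of an extra dependency.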