
Zarr Benchmarks

This repository (LDeakin/zarr_benchmarks) contains benchmarks of Zarr V3 implementations.

Note

Contributions are welcome: additional benchmarks, more implementations, or general cleanup of this repository.

Also consider restarting development of the official zarr benchmark repository: https://github.com/zarr-developers/zarr-benchmark

Implementations Benchmarked

Warning

Python benchmarks (tensorstore and zarr-python) are subject to the overheads of Python and may not use an optimal API or parameters.

Please open a PR if you can improve these benchmarks.

make Targets

  • pydeps: install Python dependencies (activating a venv first is recommended)
  • zarrs_tools: install zarrs_tools (set CARGO_HOME to override the installation directory)
  • generate_data: generate the benchmark data
  • benchmark_read_all: run the read-all benchmark
  • benchmark_read_chunks: run the chunk-by-chunk read benchmark
  • benchmark_roundtrip: run the roundtrip benchmark
  • benchmark_all: run all benchmarks
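A typical invocation order for the targets above might look like the following (a sketch; it assumes `make` is run from the repository root and that `python3` is available):

```shell
# Create and activate a venv, then drive the benchmarks via make.
python3 -m venv .venv && . .venv/bin/activate
make pydeps           # install Python dependencies into the venv
make zarrs_tools      # install zarrs_tools (set CARGO_HOME to override the install dir)
make generate_data    # create the datasets under data/
make benchmark_all    # run all benchmarks
```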

Benchmark Data

All datasets are $1024 \times 1024 \times 2048$ uint16 arrays.

| Name | Chunk Shape | Shard Shape | Compression | Size |
| --- | --- | --- | --- | --- |
| data/benchmark.zarr | $512^3$ | None | None | 8.0 GB |
| data/benchmark_compress.zarr | $512^3$ | None | blosclz 9 + bitshuffling | 377 MB |
| data/benchmark_compress_shard.zarr | $32^3$ | $512^3$ | blosclz 9 + bitshuffling | 1.1 GB |
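As a sanity check on the shapes above, the chunk and shard grids can be computed with plain Python (shapes taken from the table; the helper below is not part of the benchmark code):

```python
import math

# Array and chunk/shard shapes from the table above.
array_shape = (1024, 1024, 2048)   # uint16 elements
chunk_shape = (512, 512, 512)      # data/benchmark.zarr
shard_shape = (512, 512, 512)      # data/benchmark_compress_shard.zarr (outer shards)
inner_chunk = (32, 32, 32)         # data/benchmark_compress_shard.zarr (inner chunks)

def grid(outer, inner):
    """Number of inner blocks along each axis, rounding up for partial blocks."""
    return tuple(math.ceil(o / i) for o, i in zip(outer, inner))

chunks = grid(array_shape, chunk_shape)           # chunk grid of benchmark.zarr
shards = grid(array_shape, shard_shape)           # shard grid of the sharded dataset
chunks_per_shard = grid(shard_shape, inner_chunk)

print(chunks, math.prod(chunks))                  # (2, 2, 4) -> 16 chunks
print(shards, math.prod(shards))                  # (2, 2, 4) -> 16 shards
print(chunks_per_shard, math.prod(chunks_per_shard))  # (16, 16, 16) -> 4096 chunks/shard
```

So the unsharded datasets have 16 large chunks, while the sharded dataset packs 4096 small $32^3$ chunks into each of its 16 shards.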

Benchmark System

  • AMD Ryzen 5900X
  • 64GB DDR4 3600MHz (16-19-19-39)
  • 2TB Samsung 990 Pro
  • Ubuntu 22.04 (in Windows 11 WSL2, swap disabled, 32GB available memory)

Read All Benchmark

This benchmark measures the minimum time and peak memory usage to read an entire dataset into memory.

  • The disk cache is cleared between each measurement
  • These are best of 3 measurements
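The best-of-N, cache-cleared methodology can be sketched as follows. Both `clear_disk_cache` and `read_all` are hypothetical stand-ins, not the repository's actual benchmark code:

```python
import time

def clear_disk_cache():
    """Hypothetical stand-in: the real benchmarks drop the OS disk cache
    between measurements; here it is a no-op."""
    pass

def read_all():
    """Hypothetical stand-in for reading an entire dataset into memory."""
    return sum(range(100_000))  # dummy workload

def best_of(n, workload):
    """Return the minimum elapsed wall-clock time over n runs."""
    times = []
    for _ in range(n):
        clear_disk_cache()
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return min(times)

elapsed = best_of(3, read_all)
print(f"best of 3: {elapsed:.6f} s")
```

Taking the minimum over repeated cold-cache runs reduces noise from unrelated system activity.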

read all benchmark image

Note

zarr-python is excluded from the sharded benchmark; it takes too long.

Table of raw measurements (benchmarks_read_all.md)

Read Chunk-By-Chunk Benchmark

This benchmark measures the minimum time and peak memory usage to read a dataset chunk-by-chunk into memory.

  • The disk cache is cleared between each measurement
  • These are best of 1 measurements
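The access pattern being measured is one read per chunk, visiting every chunk in the grid. A minimal illustration of that iteration (the real benchmarks use each implementation's own chunk API; this helper is only illustrative):

```python
from itertools import product

def chunk_slices(array_shape, chunk_shape):
    """Yield a tuple of slices selecting each chunk, clipped at the array edges."""
    counts = [-(-a // c) for a, c in zip(array_shape, chunk_shape)]  # ceil division
    for idx in product(*(range(n) for n in counts)):
        yield tuple(
            slice(i * c, min((i + 1) * c, a))
            for i, c, a in zip(idx, chunk_shape, array_shape)
        )

# data/benchmark.zarr: a 1024x1024x2048 array with 512^3 chunks -> 2*2*4 = 16 reads.
slices_list = list(chunk_slices((1024, 1024, 2048), (512, 512, 512)))
print(len(slices_list))  # 16
```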

read chunks benchmark image

Note

zarr-python is excluded from the sharded benchmark; it takes too long.

Table of raw measurements (benchmarks_read_chunks.md)

Round Trip Benchmark

This benchmark measures time and peak memory usage to "round trip" a dataset (potentially chunk-by-chunk).

  • The disk cache is cleared between each measurement
  • These are best of 3 measurements
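A "round trip" here means reading data and writing it back out, verifying nothing is lost along the way. A toy sketch of that idea (the chunk contents and file-per-chunk layout below are illustrative only, not the benchmark's actual storage format):

```python
import hashlib
import os
import tempfile

def round_trip(chunks, store_dir):
    """Write each chunk to its own file, read it back, and verify it matches."""
    for key, data in chunks.items():
        path = os.path.join(store_dir, key)
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            assert f.read() == data, f"round trip mismatch for chunk {key}"

# Toy "chunks": key -> raw bytes (the real benchmarks round-trip Zarr chunks on disk).
chunks = {f"c{i}": hashlib.sha256(str(i).encode()).digest() * 64 for i in range(4)}
with tempfile.TemporaryDirectory() as d:
    round_trip(chunks, d)
    print("round trip ok:", len(chunks), "chunks")
```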

roundtrip benchmark image

Note

zarr-python is excluded from the sharded benchmark; it takes too long.

Table of raw measurements (benchmarks_roundtrip.md)