- Petabyte-scale 3D image processing is slow and computationally demanding;
- Computation has to be distributed with linear scalability;
- Local clusters and public clouds are rarely fully utilized at the same time;
- Duplicated code across a variety of routine tasks is hard to maintain.
- Composable operators. The chunk operators can be composed on the command line for flexible usage.
- Hybrid Cloud. Distributed computation on both local and cloud computers. The task-scheduling frontend and the computationally heavy backend are decoupled using AWS Simple Queue Service; the backend can be any computer with an internet connection and cloud authentication (a conceptual sketch of this queue pattern follows this list). Thanks to this robust design, cheap unstable instances (preemptible instances in Google Cloud, spot instances in AWS) can be used to reduce cost by about threefold!
- Petabyte scale. We have used chunkflow to produce over eighteen petabytes of output images, scaling up to 3,600 nodes with NVIDIA GPUs across three Google Cloud regions, and chunkflow has remained reliable.
- Operators work with 3D image volumes.
- You can plug in your own code as an operator.
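The hybrid cloud decoupling described above can be pictured with a minimal boto3 sketch. This is not chunkflow's internal code, only the general producer/consumer pattern it relies on; the queue name and the process function are hypothetical placeholders.

# Conceptual sketch of frontend/backend decoupling via AWS SQS (not chunkflow code).
import boto3

sqs = boto3.client('sqs')
# Hypothetical queue name; chunkflow manages its own task queues.
queue_url = sqs.get_queue_url(QueueName='chunkflow-tasks')['QueueUrl']

# Frontend: enqueue a task describing one bounding box to process.
sqs.send_message(QueueUrl=queue_url, MessageBody='0-1024_0-1024_0-1024')

def process(task: str):
    # Placeholder for the computationally heavy backend work.
    print('processing task:', task)

# Backend worker (local cluster node or cheap cloud instance): fetch, run, delete.
response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for message in response.get('Messages', []):
    process(message['Body'])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])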
Check out the Documentation for installation and usage. Try it out by following the tutorial.
Perform convolutional net inference to segment a 3D image volume with a single command!
#!/bin/bash
chunkflow \
load-tif --file-name path/of/image.tif -o image \
inference --convnet-model path/of/model.py --convnet-weight-path path/of/weight.pt \
--input-patch-size 20 256 256 --output-patch-overlap 4 64 64 --num-output-channels 3 \
-f pytorch --batch-size 12 --mask-output-chunk -i image -o affs \
plugin -f agglomerate --threshold 0.7 --aff-threshold-low 0.001 --aff-threshold-high 0.9999 -i affs -o seg \
neuroglancer -i image,affs,seg -p 33333 -v 30 6 6
You can then see your 3D image and segmentation directly in Neuroglancer!
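The plugin step above imports local Python code as a customized operator. Below is a minimal hypothetical sketch of such a plugin, assuming the plugin file exposes an execute function that receives a chunk and returns the processed result; please check the Documentation for the exact interface expected by the plugin operator.

# my_threshold.py -- a hypothetical plugin sketch, not an official example.
# Assumption: "chunkflow plugin -f my_threshold" imports this file and calls
# execute() with the input chunk; see the Documentation for the real interface.
import numpy as np

def execute(chunk, threshold: float = 0.5):
    # Binarize a probability map chunk at the given threshold.
    return (np.asarray(chunk) > threshold).astype(np.uint8)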
After installation, you can simply type chunkflow
and it will list all the operators with their help messages. We keep adding new operators and will keep this list updated. For detailed usage, please check out our Documentation.
Operator Name | Function |
---|---|
adjust-bbox | Adjust the corner offset of the bounding box |
channel-voting | Vote across channels of semantic map |
cleanup | Remove empty files to clean up storage |
connected-components | Threshold the boundary map to get a segmentation |
copy-var | Copy a variable to a new name |
create-chunk | Create a fake chunk for easy testing |
create-info | Create info file of Neuroglancer Precomputed volume |
crop-margin | Crop the margin of a chunk |
debug | Add breakpoint to debug the task content |
delete-chunk | Delete chunk in task to reduce RAM requirement |
delete-task-in-queue | Delete the task in AWS SQS queue |
downsample-upload | Downsample the chunk hierarchically and upload to volume |
download-mesh | Download meshes from Neuroglancer Precomputed volume |
evaluate-segmentation | Compare segmentation chunks |
fetch-task-from-file | Fetch task from a file |
fetch-task-from-sqs | Fetch task from AWS SQS queue one by one |
generate-tasks | Generate tasks one by one |
gaussian-filter | Apply 2D Gaussian blurring in place |
inference | Convolutional net inference |
log-summary | Summary of logs |
mark-complete | Mark task completion with an empty file |
mask | Black out the chunk based on another mask chunk |
mask-out-objects | Mask out selected or small objects |
multiply | Multiply chunks with another chunk |
mesh | Build 3D meshes from segmentation chunk |
mesh-manifest | Collect mesh fragments for object |
neuroglancer | Visualize chunks using neuroglancer |
normalize-contrast-nkem | Normalize image contrast using histograms |
normalize-intensity | Normalize image intensity to -1:1 |
normalize-section-shang | Normalization algorithm created by Shang |
plugin | Import local code as a customized operator |
quantize | Quantize the affinity map |
load-h5 | Read HDF5 files |
load-npy | Read NPY files |
load-json | Read JSON files |
load-pngs | Read PNG files |
load-precomputed | Cut out a chunk from a local/cloud storage volume |
load-tif | Read TIFF files |
load-skeleton | Load skeletons |
load-synapses | Load synapses from a file |
load-zarr | Read Zarr files |
setup-env | Prepare storage info files and produce tasks |
skip-task-by-file | If a result/flag file already exists, skip this task |
skip-task-by-blocks-in-volume | If all the blocks already exist in the volume, skip this task |
skip-all-zero | If a chunk is all zeros, skip this task |
skip-none | If an item in task is None, skip this task |
threshold | Use a threshold to segment the probability map |
view | Another chunk viewer in browser using CloudVolume |
save-h5 | Save chunk as HDF5 file |
save-points | Save a point cloud as an HDF5 file |
save-pngs | Save a chunk as a series of PNG files |
save-precomputed | Save chunk to local/cloud storage volume |
save-tif | Save chunk as TIFF file |
save-synapses | Save synapses as an HDF5 file |
save-swc | Save skeletons as an SWC file |
save-zarr | Save volume as a Zarr folder |
This package is developed at Princeton University and the Flatiron Institute.
We have a paper describing this work:
@article{wu_chunkflow_2021,
title = {Chunkflow: hybrid cloud processing of large {3D} images by convolutional nets},
issn = {1548-7105},
shorttitle = {Chunkflow},
url = {https://www.nature.com/articles/s41592-021-01088-5},
doi = {10.1038/s41592-021-01088-5},
journal = {Nature Methods},
author = {Wu, Jingpeng and Silversmith, William M. and Lee, Kisuk and Seung, H. Sebastian},
year = {2021},
pages = {1--2}
}