
ImgLib2 Cache Python

The imglib2-cache-python package provides a way to integrate CPython into ImgLib2 on the JVM. It combines imglib2-cache and Jep into the PythonCacheLoader, which can be used to populate ImgLib2 data structures such as the CachedCellImg using native CPython code.

This package is still in an exploratory phase and interface-breaking changes may occur.

Requirements

imglib2-cache-python requires Java version 8 or later and a CPython interpreter with the jep package installed. At the moment, there is no release of imglib2-cache-python available. To use it as a dependency in another package, first install it into your local Maven repository:

mvn clean install

and then add the appropriate dependency, e.g.

<dependency>
	<groupId>net.imglib2</groupId>
	<artifactId>imglib2-cache-python</artifactId>
	<version>0.1.0-SNAPSHOT</version>
</dependency>

in pom.xml for a maven-based project, or

implementation("net.imglib2:imglib2-cache-python:0.1.0-SNAPSHOT")

in build.gradle.kts for gradle-based projects (you will need to use mavenLocal).

To install jep into your Python interpreter, run

python -m pip install jep

All packages that are available to that interpreter will also be available for use in Java. If you use a Python interpreter in a non-standard location, e.g. through Conda, you will need to set the PYTHONHOME environment variable appropriately. In all cases, numpy must be installed.
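Before starting the JVM, it can help to confirm that the interpreter you intend to use actually sees jep and numpy. The following is a minimal sketch using only the Python standard library; the function name check_python_setup is hypothetical and not part of imglib2-cache-python or Jep. Run it with the same interpreter that PYTHONHOME points to.

```python
import importlib.util
import os
import sys


def check_python_setup(required=("jep", "numpy")):
    """Report which of the required packages this interpreter can locate.

    Hypothetical helper for illustration; not part of imglib2-cache-python.
    """
    # find_spec returns None when a top-level module cannot be located.
    packages = {
        name: importlib.util.find_spec(name) is not None for name in required
    }
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "PYTHONHOME": os.environ.get("PYTHONHOME"),
        "packages": packages,
    }


if __name__ == "__main__":
    report = check_python_setup()
    print("interpreter:", report["executable"])
    print("prefix:     ", report["prefix"])
    print("PYTHONHOME: ", report["PYTHONHOME"])
    for name, found in report["packages"].items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package is reported as missing, install it into that interpreter with python -m pip install before running the Java side.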

Usage

To create a Python-backed CachedCellImg, first create a PythonCacheLoaderQueue:

import net.imglib2.cache.python.PythonCacheLoaderQueue;

final int numWorkers = 3;
final String init = "# expensive Python initialization, e.g. Tensorflow";
final PythonCacheLoaderQueue queue = new PythonCacheLoaderQueue(numWorkers, init);

The optional constructor parameters numWorkers and init specify, respectively, the number of Python interpreters that execute requests in parallel (the GIL still applies) and a Python code block that is executed on each Python interpreter upon initialization. The queue is used by the PythonCacheLoader to load data for individual grid cells. A CachedCellImg can be conveniently created with the PythonCacheLoader.createCachedCellImg method.

import net.imglib2.RandomAccessible;
import net.imglib2.cache.img.CachedCellImg;
import net.imglib2.cache.python.Halo;
import net.imglib2.cache.python.PythonCacheLoader;
import net.imglib2.img.basictypeaccess.nio.BufferAccess;
import net.imglib2.img.cell.CellGrid;
import net.imglib2.type.NativeType;
import net.imglib2.type.numeric.integer.LongType;

final long[] dims = //
final int[] blockSize = //
final CellGrid grid = new CellGrid(dims, blockSize);
final String code = "# Python code to populate cell data. The output should be written into block.data, e.g. block.data[...] = 42";
final RandomAccessible<? extends NativeType<?>> input1 = //
final RandomAccessible<? extends NativeType<?>> input2 = //

final PythonCacheLoader<LongType, ? extends BufferAccess<?>> loader = PythonCacheLoader.fromRandomAccessibles(
                grid,
                queue,
                code,
                new LongType(),
                Halo.empty(grid.numDimensions()),
                input1,
                input2
);

final int maximumCacheSize = 30;
final CachedCellImg<LongType, ? extends BufferAccess<?>> img = loader.createCachedCellImg(maximumCacheSize);

The dimensions (dims) and block size (blockSize) define the cell grid of the CachedCellImg (img). The loader generates data for each cell of img on demand. Cells are cached in a Cache with at most maximumCacheSize entries. The code defines how the data for a cell is populated in Python. The type of the data must be specified in the loader (in this case, LongType), and a halo can be added if padding is needed to compute the cell data. RandomAccessibles can optionally be passed as inputs (input1, input2, ...). In general, block-sized cells of the inputs are copied into direct/native buffers that are then passed into the Python code as numpy.ndarrays. The copy can be avoided for any input that is an (extended) CachedCellImg<?, ? extends BufferAccess<?>> that is backed by direct/native buffers and has a compatible blockSize. All relevant variables can be accessed from the Python code through the block variable of type Block, defined as

from dataclasses import dataclass
import numpy as np
@dataclass
class Block:
    data: np.ndarray
    inputs: list
    index: int
    min: tuple
    max: tuple
    dim: tuple
    halo: tuple

with the following members:

Member   Description
data     Holds the cell/block data. Write output into this ndarray.
inputs   List of ndarrays that hold input data (if any).
index    Block index within the cell grid.
min      Minimum coordinate of block.
max      Maximum coordinate of block.
dim      Dimension (shape) of block.
halo     Slicing to crop any arrays, if necessary to remove padding.
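To make the relationship between index, min, max and dim concrete, the sketch below computes the geometry of a block from the image dimensions and block size. It rests on assumptions that the text above does not confirm: the flat index is taken to run with the first dimension varying fastest, max is taken to be inclusive, and blocks at the image border are taken to be truncated. The function block_geometry is hypothetical and not part of the package; the values the library passes in block may follow different conventions (for example, a different axis order on the NumPy side).

```python
from math import prod


def block_geometry(dims, block_size, index):
    """Return (min, max, dim) of the block with the given flat index.

    Assumptions (not confirmed by the package documentation):
    first dimension varies fastest, max is inclusive, and border
    blocks are truncated to fit inside the image.
    """
    # Number of blocks along each dimension (ceiling division).
    grid = [-(-d // b) for d, b in zip(dims, block_size)]
    if not 0 <= index < prod(grid):
        raise IndexError(f"block index {index} is outside the grid {grid}")

    # Convert the flat index into a position in the block grid.
    position = []
    remainder = index
    for g in grid:
        position.append(remainder % g)
        remainder //= g

    block_min = tuple(p * b for p, b in zip(position, block_size))
    # Border blocks may be smaller than the nominal block size.
    block_dim = tuple(
        min(b, d - m) for b, d, m in zip(block_size, dims, block_min)
    )
    block_max = tuple(m + s - 1 for m, s in zip(block_min, block_dim))
    return block_min, block_max, block_dim


if __name__ == "__main__":
    # A 10x7 image with 4x4 blocks has a 3x2 grid of blocks.
    print(block_geometry((10, 7), (4, 4), 0))
    print(block_geometry((10, 7), (4, 4), 5))
```

Under these assumptions, the last block of the 10x7 example covers only a 2x3 region, which is why Python code should size its output from block.data rather than from the nominal block size.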

Please refer to these working examples:

Note: The CachedCellImg is backed by a cache that, for some implementations, may rely on the JVM garbage collector to free unused entries. Direct buffers are used for shared memory access between Java and CPython, but their native memory allocation does not count towards the JVM heap, so the garbage collector will not be prompted to remove unused entries from the cache. To avoid OutOfMemoryErrors, we recommend using a bounded cache like GuardedStrongRefLoaderCache.
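When choosing a bound for the cache, a rough estimate of the native memory held by cached cells can be useful. The sketch below multiplies the number of cache entries by the number of elements per full block and the element size. The block size of 64x64x64 is a hypothetical value chosen for illustration, and the estimate covers only the output cells; it ignores any buffers used for inputs or halos and any per-entry overhead.

```python
from math import prod


def cache_native_bytes(block_size, bytes_per_element, max_entries):
    """Upper-bound estimate of native memory held by cached output cells.

    Illustrative arithmetic only: ignores input buffers, halos and
    per-entry overhead, and assumes every cell has the full block size.
    """
    return max_entries * prod(block_size) * bytes_per_element


if __name__ == "__main__":
    # LongType elements are 64-bit, i.e. 8 bytes each.
    total = cache_native_bytes((64, 64, 64), 8, 30)
    print(f"{total} bytes = {total / 2**20:.0f} MiB")
```

With these hypothetical numbers, 30 cached cells of 64-bit values occupy 62914560 bytes (60 MiB) of native memory outside the JVM heap.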
