Holoscan SDK v1.0.3
Release Artifacts
- Docker container: tags v1.0.3-dgpu and v1.0.3-igpu
- Python wheels: holoscan==1.0.3
- Debian packages: 1.0.3.2-1 (from the CUDA repository)
- Documentation
Release Notes
New Features and Improvements
Core
- Allow operator input and output ports to have matching names
- Application graphs with cycles are now supported (example)
- Cycles in the graph are also supported in Data Flow Tracking
- An informative error message is now raised if an unsupported condition type is provided to IOSpec::condition.
- User-defined operators can now define parameters of type complex<float> or complex<double>. These parameters can either be parsed from a YAML config (e.g. using a string like "1.0 + 2.0j") or passed as a holoscan::Arg to the operator constructor.
- Holoscan tensors containing data of type complex<float> or complex<double> can now be used.
- Python applications can now send CuPy, NumPy or other tensor types with complex-valued data between fragments of a multi-fragment application. Previously, this only worked within a single fragment.
- Many C++ API description methods and corresponding Python API __repr__ methods have been improved.
  - The IOSpec class now has a description method and a corresponding Python __repr__ method.
  - A bug was fixed where the Arg class __repr__ could raise UnicodeDecodeError for uint8_t or int8_t argument types.
  - The NetworkContext and Scheduler print more comprehensive information.
  - Python bindings for GXF conditions, resources and operators have an improved __repr__ that makes use of the underlying C++ description methods.
- The HOLOSCAN_UCX_PORTS environment variable allows users to define preferred port numbers for the SDK's inter-node communication in a distributed application. This is especially useful in environments where specific ports need to be predetermined, such as Kubernetes.
- A Condition or Resource can be added to a Python operator after construction via the operator's add_arg method.
- Distributed applications can now leverage RDMA transports with MLNX_OFED drivers (tested with RoCE).
- The HOLOSCAN_HEALTH_CHECK_PORT environment variable allows users to define a port number for the SDK's health check endpoint in a distributed application.
- The set of keys available in an Application's or Fragment's YAML configuration file can now be determined via the new config_keys() method in the C++ API or the config_keys method in Python.
- Debugging (tracing and profiling) of Python operators is now fully supported.
  - Previously, the compute, initialize, start, and stop methods of the Holoscan Operator were not compatible with Python tracing/profiling.
  - Methods of Python operators can now be debugged with the VSCode/PyCharm debugger using PyDev.Debugger (pydevd), and profiled or measured for coverage using cProfile or coverage.py.
  - For comprehensive information, refer to the Debugging section in the SDK User Guide.
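The snippet below is a minimal sketch of how the two new environment variables might be set before launching the driver and workers of a distributed application. You could put them in a shell session, a container, or a pod spec. The port values are arbitrary placeholders. The sketch also assumes that HOLOSCAN_UCX_PORTS accepts a starting port number, so check the SDK user guide for the exact format it accepts.

```shell
# Placeholder values: choose ports that your firewall or Kubernetes service exposes.
# Preferred port number(s) for UCX inter-node communication (format assumed; see the user guide).
export HOLOSCAN_UCX_PORTS=10000
# Port for the distributed application's health check endpoint (default: 8777).
export HOLOSCAN_HEALTH_CHECK_PORT=8780
```

Because these are ordinary environment variables, they must be visible to every process (driver and workers) that should use them.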
Operators
- HoloViz
- Inference
  - The ONNX Runtime (ORT) inference backend is now a plugin, like the Torch backend. This allows you to use the inference operator without an installation of ORT when using other backends (like TensorRT or Torch).
Utils
- Added a Dockerfile that contains only runtime dependencies. This Dockerfile can be built by running ./run build_run_image at the root of the repository. The resulting image is ~8.6 GB, compared with the ~13 GB build container created by ./run build_image. (doc)
- The run script in the git repository received several updates and improvements, including:
  - Allow building as root
  - Allow running the build container without a display
  - Name the build image, build directories, and install directories after the target architecture and GPU (e.g. build -> build-aarch64-dgpu)
  - Support building on systems without tty support
  - Support running on systems without xhost support
  - Add more flags (see ./run help and ./run <cmd> --help for details)
Packaging
- Mellanox OFED user libraries were added to the NGC container to allow the use of RDMA transports from the container.
Documentation
- The user guide source code and tooling are now released on GitHub (link).
Breaking Changes
- The H264 operator and applications were moved from the SDK to HoloHub (MR).
- For distributed applications, the emit/receive behavior changed for array-like objects (e.g. PyTorch tensors) passed between operators within a fragment. Previously (in v0.6.x), the array-like object type was always preserved for within-fragment emit/receive. Now, any host array-like object is received as a NumPy array, and any device array-like object is received as a CuPy array. Making within-fragment emit/receive behavior consistent with between-fragment emit/receive behavior was necessary to implement the fix for issue 4290043.
- Because they are built against Ubuntu 22.04, the Debian packages and Python wheels require GLIBC_2.35 or above.
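The sketch below shows one way to check whether a host meets the GLIBC_2.35 requirement before installing the Debian packages or Python wheels. It assumes a glibc-based Linux system with GNU coreutils, which provides sort -V. The version_ge helper is hypothetical and is not part of the SDK.

```shell
# Hypothetical helper: succeeds if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# getconf prints e.g. "glibc 2.35"; keep only the version number.
current="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')" || current=""

if [ -z "$current" ]; then
  echo "Could not determine the glibc version (non-glibc system?)"
elif version_ge "$current" 2.35; then
  echo "glibc $current meets the GLIBC_2.35 requirement"
else
  echo "glibc $current is older than 2.35"
fi
```

Version ordering matters here: a plain string comparison would treat 2.4 as newer than 2.35, while sort -V compares each numeric component.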
Bug fixes
Issue | Description |
---|---|
4185976 | Cycle in a graph is not supported. As a consequence, the endoscopy tool tracking example using input from an AJA video card with overlay enabled is non-functional. This is planned to be addressed in the next version of the SDK. |
4196152 | Getting "Unable to find component from the name ''" error message when using InferenceOp with Data Flow Tracking enabled. |
4211747 | Communication of GPU tensors between fragments in a distributed application can only use device 0 |
4212743 | The Holoscan CLI packager copies unrelated files and folders located in the same folder as the model file into the App Package. |
4232453 | A segfault occurs if a native Python operator __init__ assigns a new attribute that overrides an existing base class attribute or method. A segfault will also occur if any exception is raised during Operator.__init__ or Application.__init__ before the parent class __init__ was called. |
4206197 | Distributed apps hang if multiple input/output ports are connected between two operators in different fragments. |
3599303 | Linux kernel is not built with security hardening flags. Future releases will include a Linux kernel built with security hardening flags. |
4187787 | TensorRT backend in the Inference operator prints "Unknown embedded device detected. Using 52000MiB as the allocation cap for memory on embedded devices" on IGX Orin (iGPU). Addressed in TensorRT 8.6+. |
4194109 | AppDriver executes the fragments' compose() methods even though this can be avoided. |
4260969 | Calling the application's add_flow more than once between the same pair of operators causes issues. |
4265393 | Release 1.0-ea1 and 1.0-ea2 fail to run distributed applications with workers on two or more nodes. |
4272363 | A segfault may occur if an operator's output port containing GXF Tensor data is linked to multiple operators within the MultiThreadScheduler. |
4290043 | Bug in Python implicit broadcast of non-TensorMap types when at least one target operator is in a different fragment. |
4293729 | Python application using MultiThreadScheduler (including distributed application) may fail with GIL related error if SDK was compiled in debug mode. |
4101714 | Vulkan applications fail (vk::UnknownError) in containers on iGPU because the iGPU device node is not mounted in the container. Workaround documented in run instructions. |
3881725 | VK_ERROR_INITIALIZATION_FAILED with segmentation fault while running High-Speed Endoscopy gxf/cpp app on Clara AGX developer kits. Fix available in CUDA drivers 520. Workaround implemented since v0.4 to retry automatically. |
4293741 | Python application with more than two operators (mixed use of pure Python operator and operator wrapping C++ operator), using MultiThreadScheduler (including distributed app) and sending Python tensor can deadlock at runtime. |
4313690 | Failure to initialize BayerDemosaicOp in applications using the C++ API |
4187826 | Torch backend in the Inference operator is not supported on Tegra's integrated GPU. |
4336947 | The dev_id parameter of the CudaStreamPool resource is ignored. |
4344061 | Native Python operator overrides of the start, stop or initialize methods don't handle exceptions properly |
4344408 | The distributed application displays an error message if port 8777 is already in use. |
4363945 | Checking if a key exists in an application's config file results in an error being logged. |
| | Fixed a bad cast exception when defining optional port enablement (buffer input, output, camera pose) for the Holoviz operator from a YAML configuration file. |
| | Fixed invalid stride alignment of video buffer inputs in the Holoviz operator. |
4367627 | The distributed application does not handle IPv6 addresses and hostnames properly. |
4368977 | The DownstreamMessageAffordableCondition was not added to the optional output ports of the AJASourceOp and HolovizOp operators. This omission leads to a GXF_EXCEEDING_PREALLOCATED_SIZE error when the data in the output port's queue is not consumed quickly enough. |
| | Fixed FormatConverterOp input stride handling; the input stride was previously ignored. |
4381269 | Compiling the SDK with the VSCode Dev Container (using 'Tasks: Run Build Task') may lead to memory exhaustion due to the absence of the CMAKE_BUILD_PARALLEL_LEVEL environment variable. |
4398018 | Intermittent 'Deserialize entity header failed' error with the distributed app when running all fragments locally on the same node. |
4371324 | In the distributed application, a bug causes crashes ('Serialization failed') due to null tensor pointers during mixed broadcasts. This results from not using mutexes when sending GXF Tensors to remote endpoints with UCX. |
4414990 | Crash in Fragment::port_info with the distributed app if any of the fragments did not have any operators added to it during compose. |
4449149 | Unable to debug, trace, or profile the 'compute' method of Python API operators using VSCode debugger/profile/coverage. |
3878494 | Inference fails after a TensorRT engine file is first created using BlockMemoryPool. Fix available in TensorRT 8.4.1. Use UnboundedAllocator as a workaround. |
4171337 | AJA with RDMA is not working on integrated GPU (IGX or AGX Orin) due to conflicts between the nvidia-p2p and nvidia driver symbols (nvidia_p2p_dma_map_pages). |
4233845 | The UCX Transmitter might select the wrong local IP address when creating a UCX client. This can cause the distributed application to fail if the selected IP address cannot be reached from the other computer. The Holoscan SDK automatically sets the HOLOSCAN_UCX_SOURCE_ADDRESS environment variable based on the --worker-address CLI argument if the worker address is a local IP address. In addition, the UCX_CM_USE_ALL_DEVICES environment variable is set to n by default to disable consideration of all devices for data transfer. |
Supported Platforms
Note: This release is intended for use with the listed platforms only. NVIDIA does not provide support for this release on products other than those listed below.
Platform | OS |
---|---|
NVIDIA IGX Orin | IGX SW 1.0 DP (L4T r36.1) or Meta Tegra Holoscan 1.0.0 (L4T r36.2) |
NVIDIA Jetson AGX Orin and Orin Nano | NVIDIA JetPack 6.0 DP (L4T r36.2) |
NVIDIA Clara AGX* | NVIDIA HoloPack 1.2 (L4T r34.1.2) or Meta Tegra Holoscan 0.6.0 (L4T r35.3.1) |
x86_64 platforms with Ampere GPU or above | Ubuntu 22.04 |
*Only the NGC container is supported.
Known Issues
This section supplies details about issues discovered during development and QA but not resolved in this release.
Issue | Description |
---|---|
4062979 | When Operators connected in a Directed Acyclic Graph (DAG) are executed by a multithreaded scheduler, their execution order in the graph is not guaranteed to be respected. |
4267272 | AJA drivers cannot be built with RDMA on IGX SW 1.0 DP iGPU due to missing nv-p2p.h. Expected to be addressed in IGX SW 1.0 GA. |
4384768 | No RDMA support on JetPack 6.0 DP and IGX SW 1.0 DP iGPU due to missing nv-p2p kernel module. Expected to be addressed in JP 6.0 GA and IGX SW 1.0 GA respectively. |
4190019 | Holoviz segfaults on a multi-GPU setup when specifying the device using the --gpus flag with docker run. The current workaround is to use CUDA_VISIBLE_DEVICES in the container instead. |
4210082 | The v4l_camera example segfaults at exit. |
4339399 | High CPU usage observed with the video_replayer_distributed application. While the high CPU usage associated with the GXF UCX extension has been fixed since v1.0, distributed applications using the MultiThreadScheduler (with the check_recession_period_ms parameter set to 0 by default) may still experience high CPU usage. Setting the HOLOSCAN_CHECK_RECESSION_PERIOD_MS environment variable to a value greater than 0 (e.g. 1.5) can help reduce CPU usage. However, this may result in increased latency for the application until the MultiThreadScheduler switches to an event-based multithreaded scheduler. |
4318442 | UCX cuda_ipc protocol doesn't work in Docker containers on x86_64. As a workaround, we are currently disabling the UCX cuda_ipc protocol on all platforms via the UCX_TLS environment variable. |
4325468 | The V4L2VideoCapture operator only supports YUYV and AB24 source pixel formats, and only outputs the RGBA GXF video format. Other source pixel formats compatible with V4L2 can be manually defined by the user, but they're assumed to be equivalent to RGBA8888. |
4325585 | Applications using MultiThreadScheduler may exit early due to timeouts. This occurs when the stop_on_deadlock_timeout parameter is improperly set to a value equal to or less than check_recession_period_ms, particularly if check_recession_period_ms is greater than zero. |
4301203 | HDMI IN fails in v4l2_camera on IGX Orin Devkit for some resolutions or formats. Try the latest firmware as a partial fix. Driver-level fixes expected in IGX SW 1.0 GA. |
4384348 | UCX termination (pressing Ctrl+C or Esc, or clicking the close button) is not smooth and can show multiple error messages. |
4481171 | Running the driver for a distributed application on IGX Orin devkits fails when connected to other systems through eth1. A workaround is to use the eth0 port to connect to other systems for distributed workloads. |
4458192 | In scenarios where distributed applications have both the driver and workers running on the same host, either within a Docker container or directly on the host, there's a possibility of encountering "Address already in use" errors. A potential solution is to assign a different port number to the HOLOSCAN_HEALTH_CHECK_PORT environment variable (default: 8777), for example, by using export HOLOSCAN_HEALTH_CHECK_PORT=8780. |
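Two of the environment-variable workarounds from the table above are sketched here, for example as commands in a shell inside the container before the application starts. The value 1.5 is the example given for issue 4339399. The GPU index is a placeholder for whichever device you want Holoviz to use.

```shell
# Issue 4339399: a recession period greater than 0 makes the MultiThreadScheduler wait
# between checks, reducing CPU usage at the cost of possible added latency.
export HOLOSCAN_CHECK_RECESSION_PERIOD_MS=1.5

# Issue 4190019: select the GPU inside the container with CUDA_VISIBLE_DEVICES
# instead of the docker run --gpus flag (index 1 is a placeholder).
export CUDA_VISIBLE_DEVICES=1
```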