# Holoscan SDK v2.0.0
## Release Artifacts

- 🐋 Docker container: tags `v2.0.0-dgpu` and `v2.0.0-igpu`
- 🐍 Python wheel: `pip install holoscan==2.0.0`
- 📦️ Debian packages: `2.0.0.2-1`
- 📖 Documentation
See supported platforms for compatibility.
## Release Notes

### New Features and Improvements

#### Core
- `make_condition`, `make_fragment`, `make_network_context`, `make_operator`, `make_resource`, and `make_scheduler` now accept a non-const string or character array for the `name` parameter.
- A new event-based multi-thread scheduler (`EventBasedScheduler`) is available. It is an alternative to the existing, polling-based `MultiThreadScheduler` and can be used as a drop-in replacement (see the first sketch after this list). The only difference in parameters is that it does not take a `check_recession_period_ms` parameter, as there is no polling interval for this scheduler. It should give performance similar to the `MultiThreadScheduler` with a very short polling interval, but without the high CPU usage seen for the multi-thread scheduler in that case (due to constant polling for work by one thread).
- When an exception is raised from the `Operator` methods `start`, `stop`, or `compute`, that exception will first trigger the underlying GXF scheduler to terminate the application graph, and then the exception will be raised by the Holoscan SDK. This resolves an inconsistency between how Python and C++ apps handled exceptions and fixes a crash in C++ apps when an operator raised an exception from the `start` or `stop` methods.
- When an exception occurs during the execution of a Holoscan application, it is now propagated to the application's `run` method, allowing users to catch and manage exceptions within their application. Previously, the Holoscan runtime would catch and log exceptions, and the application would continue to run (in Python) or exit (in C++) without a clear indication of the exception's origin. Users can catch and manage exceptions by enclosing the `run` method in a `try` block (see the second sketch after this list).
- The `holoscan::Fragment::run_async` and `holoscan.Application.run_async` methods for C++ and Python return `std::future` and `concurrent.futures.Future`, respectively. The revised documentation advises using `future.get()` in C++ and `future.result()` in Python to wait until the application has completed execution and to address any exceptions that occurred.
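A minimal Python sketch of swapping in the new scheduler; the `PingOp` operator, its count, and the thread count are illustrative values, not from this release:

```python
from holoscan.conditions import CountCondition
from holoscan.core import Application, Operator
from holoscan.schedulers import EventBasedScheduler


class PingOp(Operator):
    """Trivial operator that prints a message each tick."""

    def compute(self, op_input, op_output, context):
        print("ping")


class MyApp(Application):
    def compose(self):
        # Run the operator five times, then stop.
        self.add_operator(PingOp(self, CountCondition(self, 5), name="ping"))


app = MyApp()
# Drop-in replacement for the polling-based MultiThreadScheduler; note that
# there is no check_recession_period_ms parameter, since no thread polls for work.
app.scheduler(EventBasedScheduler(app, name="event_scheduler", worker_thread_number=2))
app.run()
```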
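And a sketch of the new exception behavior, reusing `MyApp` from above; the exception type caught here is illustrative (whatever the operator raised propagates in Python):

```python
app = MyApp()
try:
    # Exceptions raised from an operator's start/compute/stop now propagate
    # out of run() instead of only being logged by the runtime.
    app.run()
except Exception as err:
    print(f"application failed: {err}")

# Asynchronous variant: run_async() returns a concurrent.futures.Future;
# future.result() waits for completion and re-raises any exception that occurred.
app2 = MyApp()
future = app2.run_async()
try:
    future.result()
except Exception as err:
    print(f"application failed: {err}")
```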
#### Operators
- V4L2 Video Capture: added support for setting manual `exposure` and `gain` values for cameras that support it (see the sketch after this list).
- Inference: multiple instances of the Inference operator can now run in a single application without resource conflicts.
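A hedged Python sketch of the new manual camera controls; the parameter names `exposure_time` and `gain` and their values are assumptions for illustration, and they apply only to cameras that support manual control:

```python
from holoscan.core import Application
from holoscan.operators import HolovizOp, V4L2VideoCaptureOp
from holoscan.resources import UnboundedAllocator


class V4L2App(Application):
    def compose(self):
        source = V4L2VideoCaptureOp(
            self,
            name="source",
            allocator=UnboundedAllocator(self, name="pool"),
            device="/dev/video0",
            # Manual controls (names and values assumed for illustration);
            # ignored by devices that do not support them.
            exposure_time=500,
            gain=100,
        )
        visualizer = HolovizOp(self, name="visualizer")
        self.add_flow(source, visualizer, {("signal", "receivers")})


V4L2App().run()
```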
#### Utils
- The SDK can now be built from source for iGPU (IGX iGPU, JetPack) from a non-iGPU system (IGX dGPU, x86_64).
- The NGC container now supports packaging and running Holoscan Application Packages using the Holoscan CLI.
- CLI runner: improved handling of available GPUs by reading the package manifest file and checking the system for available GPUs. A new `--gpus` argument can be used to override the default values.
### Breaking Changes
- The `VideoStreamRecorderOp` and `VideoStreamReplayerOp` now work without requiring the `libgxf_stream_playback.so` extension. Now that the extension is unused, it has been removed from the SDK and should no longer be listed under the `extensions` section of application YAML files using these operators.
- As of version 2.0, certain Python bindings have been removed to align with the unified logger interface:
  - Removed APIs:
    - `holoscan.logger.enable_backtrace()`
    - `holoscan.logger.disable_backtrace()`
    - `holoscan.logger.dump_backtrace()`
    - `holoscan.logger.should_backtrace()`
    - `holoscan.logger.flush()`
    - `holoscan.logger.flush_level()`
    - `holoscan.logger.flush_on()`
  - However, the following APIs remain accessible from Python. They are intended for logging in Holoscan's core or from C++ operators (e.g., using the `HOLOSCAN_LOG_INFO` macro) and are not designed for Python's logging framework. Python API users are advised to use the standard `logging` module for their logging needs (see the sketch after this list):
    - `holoscan.logger.LogLevel`
    - `holoscan.logger.log_level()`
    - `holoscan.logger.set_log_level()`
    - `holoscan.logger.set_log_pattern()`
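A minimal sketch of the recommended split after this change: Python application logs go through the standard `logging` module, while the remaining `holoscan.logger` APIs only affect the C++ core and operator logs; the logger name and messages are illustrative:

```python
import logging

from holoscan.logger import LogLevel, set_log_level

# Python-side application logging: use the standard logging module.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("my_app")
logger.info("composing application")

# C++-side logging (Holoscan core and C++ operators, e.g. HOLOSCAN_LOG_INFO):
# still controlled through the remaining holoscan.logger APIs.
set_log_level(LogLevel.WARN)
```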
- Several GXF headers have moved from `gxf/std` to `gxf/core`:
  - `parameter_parser.hpp`
  - `parameter_parser_std.hpp`
  - `parameter_registrar.hpp`
  - `parameter_storage.hpp`
  - `parameter_wrapper.hpp`
  - `resource_manager.hpp`
  - `resource_registrar.hpp`
  - `type_registry.hpp`
- Some C++ code for tensor interoperability has been upstreamed from Holoscan SDK into GXF. The public `holoscan::Tensor` class will remain, but there have been a small number of backward-incompatible changes in related C++ classes and methods in this release. Most of these were used internally and are unlikely to affect existing applications.
  - The supporting classes `holoscan::gxf::GXFTensor` and `holoscan::gxf::GXFMemoryBuffer` have been removed. The DLPack functionality that was formerly in `holoscan::gxf::GXFTensor` is now available upstream in GXF's `nvidia::gxf::Tensor`.
  - The struct `holoscan::gxf::DLManagedTensorCtx` has been renamed to `holoscan::gxf::DLManagedTensorContext` and is now just an alias for `nvidia::gxf::DLManagedTensorContext`. It also has two additional fields (`dl_shape` and `dl_strides`) to hold the shape/stride information used by DLPack.
  - `holoscan::gxf::DLManagedMemoryBuffer` is now an alias for `nvidia::gxf::DLManagedMemoryBuffer`.
- The GXF UCX extension, used in distributed applications, now sends data asynchronously by default, which can lead to issues such as insufficient memory on the transmitter side when a memory pool is used. Specifically, the concern applies only to operators that have a memory pool and connect to an operator in a separate fragment of the distributed application. As a workaround, users can increase the `num_blocks` parameter to a higher value in the `BlockMemoryPool` or use the `UnboundedAllocator` to avoid the problem. This issue will be addressed in a future release by providing a more robust solution to handle the asynchronous data transmission feature of the UCX extension, eliminating the need for manual intervention (see Known Issue 4601414).
  - For fragments using a `BlockMemoryPool`, the `num_blocks` parameter can be increased to a higher value to avoid the issue. For example, the following C++ and Python snippets show the existing `BlockMemoryPool` resource being created with a higher number of blocks:

    ```cpp
    recorder_format_converter = make_operator<ops::FormatConverterOp>(
        "recorder_format_converter",
        from_config("recorder_format_converter"),
        Arg("pool") =
            // make_resource<BlockMemoryPool>("pool", 1, source_block_size, source_num_blocks));
            make_resource<BlockMemoryPool>("pool", 1, source_block_size, source_num_blocks * 2));
    ```
    ```python
    source_pool_kwargs = dict(
        storage_type=MemoryStorageType.DEVICE,
        block_size=source_block_size,
        # num_blocks=source_num_blocks,
        num_blocks=source_num_blocks * 2,
    )
    recorder_format_converter = FormatConverterOp(
        self,
        name="recorder_format_converter",
        pool=BlockMemoryPool(self, name="pool", **source_pool_kwargs),
        **self.kwargs("recorder_format_converter"),
    )
    ```
  - Since the underlying UCX transmitter attempts to send the emitted data regardless of the status of the downstream operator input port's message queue, simply doubling `num_blocks` may not suffice in cases where the receiver operator's processing time is slower than that of the sender.
  - If you encounter the issue, consider using the `UnboundedAllocator` instead of the `BlockMemoryPool`. The `UnboundedAllocator` does not have a fixed number of blocks and can allocate memory as needed, though it can introduce some overhead due to the lack of a fixed memory pool size and may lead to memory exhaustion if memory is not released in a timely manner. The following C++ and Python snippets show how to use the `UnboundedAllocator`:

    ```cpp
    ...
    Arg("pool") = make_resource<UnboundedAllocator>("pool");
    ```

    ```python
    from holoscan.resources import UnboundedAllocator
    ...
    pool=UnboundedAllocator(self, name="pool"),
    ...
    ```
### Bug fixes
| Issue | Description |
|---|---|
| 4381269 | Fixed a bug that caused memory exhaustion when compiling the SDK in the VS Code Dev Container (using "Tasks: Run Build Task") due to the missing `CMAKE_BUILD_PARALLEL_LEVEL` environment variable. Users can specify the number of jobs with the `--parallel` option (e.g., `./run vscode --parallel 16`). |
| 4569102 | Fixed an issue where the log level was not updated from the environment variable when multiple `Application` classes were created during the session. The log level setting in the `Application` class now allows a reset from the environment variable if overridden. |
| 4578099 | Fixed a segfault in `FormatConverterOp` when used with a `BlockMemoryPool` with insufficient capacity to create the output tensor. |
| 4571581 | Fixed an issue where the documentation for the built-in operators was either missing or incorrectly rendered. |
| 4591763 | Fixed a crash when an exception was thrown from `Operator::start` or `Operator::stop`. |
| 4595680 | Fixed an issue that caused the Inference operator to fail when multiple instances were composed in a single application graph. |
### Known Issues
This section supplies details about issues discovered during development and QA but not resolved in this release.
| Issue | Description |
|---|---|
| 4062979 | When operators connected in a Directed Acyclic Graph (DAG) are executed by a multithreaded scheduler, their execution order within the graph is not guaranteed. |
| 4267272 | AJA drivers cannot be built with RDMA on IGX SW 1.0 DP iGPU due to a missing `nv-p2p.h`. Expected to be addressed in IGX SW 1.0 GA. |
| 4384768 | No RDMA support on JetPack 6.0 DP and IGX SW 1.0 DP iGPU due to a missing nv-p2p kernel module. Expected to be addressed in JetPack 6.0 GA and IGX SW 1.0 GA, respectively. |
| 4190019 | Holoviz segfaults on multi-GPU setups when specifying the device using the `--gpus` flag with `docker run`. The current workaround is to use `CUDA_VISIBLE_DEVICES` in the container instead. |
| 4210082 | The `v4l_camera` example segfaults at exit. |
| 4339399 | High CPU usage observed with the `video_replayer_distributed` application. While the high CPU usage associated with the GXF UCX extension has been fixed since v1.0, distributed applications using the `MultiThreadScheduler` (with the `check_recession_period_ms` parameter set to `0` by default) may still experience high CPU usage. Setting the `HOLOSCAN_CHECK_RECESSION_PERIOD_MS` environment variable to a value greater than `0` (e.g., `1.5`) can help reduce CPU usage. However, this may result in increased latency for the application until the `MultiThreadScheduler` switches to an event-based multithreaded scheduler. |
| 4318442 | The UCX `cuda_ipc` protocol doesn't work in Docker containers on x86_64. As a workaround, the UCX `cuda_ipc` protocol is currently disabled on all platforms via the `UCX_TLS` environment variable. |
| 4325468 | The `V4L2VideoCapture` operator only supports the YUYV and AB24 source pixel formats, and only outputs the RGBA GXF video format. Other source pixel formats compatible with V4L2 can be manually defined by the user, but they are assumed to be equivalent to RGBA8888. |
| 4325585 | Applications using `MultiThreadScheduler` may exit early due to timeouts. This occurs when the `stop_on_deadlock_timeout` parameter is improperly set to a value equal to or less than `check_recession_period_ms`, particularly if `check_recession_period_ms` is greater than zero. |
| 4301203 | HDMI IN fails in `v4l2_camera` on the IGX Orin Devkit for some resolutions or formats. Try the latest firmware as a partial fix. Driver-level fixes are expected in IGX SW 1.0 GA. |
| 4384348 | UCX termination (either `ctrl+c`, pressing `Esc`, or clicking the close button) is not smooth and can show multiple error messages. |
| 4481171 | Running the driver for a distributed application on IGX Orin devkits fails when connected to other systems through eth1. A workaround is to use the eth0 port to connect to other systems for distributed workloads. |
| 4458192 | In scenarios where distributed applications have both the driver and workers running on the same host, either within a Docker container or directly on the host, "Address already in use" errors may occur. A potential solution is to assign a different port number to the `HOLOSCAN_HEALTH_CHECK_PORT` environment variable (default: `8777`), for example, by using `export HOLOSCAN_HEALTH_CHECK_PORT=8780`. |
| 4601414 | The UCX extension's asynchronous data transmission feature causes a regression in distributed applications, such as insufficient memory on the transmitter side. As a workaround, users can increase the `num_blocks` parameter in the `BlockMemoryPool` or use the `UnboundedAllocator` instead of the `BlockMemoryPool`. |